Commits · b278db77f029bf89deb8f9f7f1542f2306a151a4 · Summer2022 / 22b970497

Jul 19, 2021

Make compile successful when CONFIG_BCACHE is not set. · b278db77

Xu Wei authored 3 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

When the kernel config does not enable CONFIG_BCACHE, compiling the bcache
module fails. This patch adds a check for the CONFIG_BCACHE macro to make
sure the bcache module compiles successfully.
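
As a rough illustration of this kind of guard (not the patch itself), a config macro can select between a real helper and a stub so that callers compile either way. Here CONFIG_BCACHE is an ordinary preprocessor symbol, and bcache_hook() and account_io() are hypothetical names invented for the sketch.

```c
/*
 * Sketch only: CONFIG_BCACHE stands in for the Kconfig symbol, and
 * bcache_hook()/account_io() are hypothetical names, not taken from
 * the patch.
 */
#ifdef CONFIG_BCACHE
/* Real helper, built only when bcache is configured. */
static inline int bcache_hook(int sectors)
{
	return sectors;
}
#else
/* Stub so that callers still compile when CONFIG_BCACHE is not set. */
static inline int bcache_hook(int sectors)
{
	(void)sectors;
	return 0;
}
#endif

/* This caller builds in both configurations. */
int account_io(int sectors)
{
	return bcache_hook(sectors);
}
```

Built without -DCONFIG_BCACHE, account_io() falls through to the stub and returns 0.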

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b278db77

Move only dirty data when gc is running, in order to reduce write amplification. · 59a67a69

Xu Wei authored 3 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

Bcache moves all data in a bucket, both clean and dirty, when gc is
running. This causes large write amplification, which may reduce the
cache device's lifetime. This patch provides a switch for gc to move only
dirty data, which can reduce write amplification.
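
The decision this switch controls might look roughly like the sketch below. The structures and the move_dirty_only field are hypothetical stand-ins; the real bcache data structures and the name of the knob are not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cached extent; not a bcache structure. */
struct cached_extent {
	bool dirty;	/* data not yet written back to the backing device */
};

/* Hypothetical switch; the patch exposes its own knob. */
struct gc_policy {
	bool move_dirty_only;
};

/* Decide whether moving gc should copy this extent to a new bucket. */
bool gc_should_move(const struct gc_policy *p, const struct cached_extent *e)
{
	if (p->move_dirty_only)
		return e->dirty;	/* clean data is skipped, not rewritten */
	return true;			/* default: move everything */
}
```

With the switch on, only extents whose dirty flag is set are rewritten, which is where the reduction in writes to the cache device would come from.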

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

59a67a69

Add traffic policy for low cache available. · cb004e12

Xu Wei authored 3 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

When the available cache is low, bcache switches to writethrough mode, so
all write IO is sent directly to the backend device, which is usually an
HDD. At the same time, the cache device flushes dirty data to the backend
device in the bcache writeback process. Write IO from the user therefore
breaks the sequentiality of writeback, and if there is a lot of IO from
writeback, the user's write IO may be blocked. This patch adds a traffic
policy to bcache to solve the problem and improve bcache performance when
the available cache is low.

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cb004e12

igmp: Add ip_mc_list lock in ip_check_mc_rcu · 06d62ce5

Liu Jian authored 3 years ago

mainline inclusion
from mainline-net-next
commit 23d2b94043ca8835bd1e67749020e839f396a1c2
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

I got the panic below when doing a fuzz test:

Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G    B             5.14.0-rc1-00195-gcff5c4254439-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x7a/0x9b
panic+0x2cd/0x5af
end_report.cold+0x5a/0x5a
kasan_report+0xec/0x110
ip_check_mc_rcu+0x556/0x5d0
__mkroute_output+0x895/0x1740
ip_route_output_key_hash_rcu+0x2d0/0x1050
ip_route_output_key_hash+0x182/0x2e0
ip_route_output_flow+0x28/0x130
udp_sendmsg+0x165d/0x2280
udpv6_sendmsg+0x121e/0x24f0
inet6_sendmsg+0xf7/0x140
sock_sendmsg+0xe9/0x180
____sys_sendmsg+0x2b8/0x7a0
___sys_sendmsg+0xf0/0x160
__sys_sendmmsg+0x17e/0x3c0
__x64_sys_sendmmsg+0x9e/0x100
do_syscall...

06d62ce5

Jul 18, 2021

memcg: fix unsuitable null check after alloc memory · bed274bc

卢佳琳 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I


CVE: NA

--------

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bed274bc

Jul 16, 2021

cpuidle: fix a build error when compiling haltpoll into module · f6ca4176

GONG, Ruiqi authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZURN


CVE: NA

--------

Kernel build would fail in case of CONFIG_HALTPOLL_CPUIDLE=m, caused by
haltpoll_switch_governor() not marked as an exported symbol. Fix this
by complementing the EXPORT_SYMBOL statement.

Fixes: 97c22788 ("cpuidle: fix container_of err in cpuidle_device and cpuidle_driver")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Cc: Jiajun Chen <chenjiajun8@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>

4.19.90-2107.2.0

f6ca4176

config: enable KASAN and UBSAN by default · e01c1bf7

Yang Yingliang authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

Enable KASAN and UBSAN by default for test.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e01c1bf7

KVM: x86: expose AVX512_BF16 feature to guest · bbea3a3a

Jing Liu authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 0b774629
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG
CVE: NA

-----------------------------

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

Intel adds AVX512 BFLOAT16 feature in CooperLake, which is CPUID.7.1.EAX[5].

Detailed information of the CPUID bit can be found here,
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf.
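
The bit position named above can be tested with a single mask. The sketch below takes the EAX value of CPUID leaf 7, subleaf 1 as a plain integer; the macro and function names are made up for illustration and are not KVM identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=1):EAX bit 5 reports AVX512_BF16. */
#define CPUID_7_1_EAX_AVX512_BF16	(UINT32_C(1) << 5)

/* eax is the EAX value returned by CPUID leaf 7, subleaf 1. */
bool cpuid_7_1_has_avx512_bf16(uint32_t eax)
{
	return (eax & CPUID_7_1_EAX_AVX512_BF16) != 0;
}
```

On an x86 host built with GCC or Clang, the EAX value could be obtained with __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) from <cpuid.h>; the sketch keeps that step out so it stays portable.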

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
[Fix type mismatch in min, changing constant "1" to "1u". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bbea3a3a

KVM: cpuid: remove has_leaf_count from struct kvm_cpuid_param · 16b22d73

Paolo Bonzini authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 60cec433
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

The has_leaf_count member was originally added for KVM's paravirtualization
CPUID leaves.  However, since then the leaf count _has_ been added to those
leaves as well, so we can drop that special case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

16b22d73

KVM: cpuid: rename do_cpuid_1_ent · a61f03da

Paolo Bonzini authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 50a9e1a4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

do_cpuid_1_ent does not do the entire processing for a CPUID entry, it
only retrieves the host's values.  Rename it to match reality.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a61f03da

KVM: cpuid: set struct kvm_cpuid_entry2 flags in do_cpuid_1_ent · ad2a90fb

Paolo Bonzini authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit d9aadaf6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

do_cpuid_1_ent is typically called in two places by __do_cpuid_func
for CPUID functions that have subleafs.  Both places have to set
the KVM_CPUID_FLAG_SIGNIFCANT_INDEX.  Set that flag, and
KVM_CPUID_FLAG_STATEFUL_FUNC as well, directly in do_cpuid_1_ent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ad2a90fb

KVM: cpuid: extract do_cpuid_7_mask and support multiple subleafs · 18f2e790

Paolo Bonzini authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 54d360d4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

CPUID function 7 has multiple subleafs.  Instead of having nested
switch statements, move the logic to filter supported features to
a separate function, and call it for each subleaf.
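
A simplified picture of that structure, assuming a cut-down entry type and made-up feature masks (neither is taken from KVM): one function filters a single subleaf, and the caller applies it to each subleaf in turn.

```c
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical masks of the features the hypervisor agrees to expose. */
#define F7_0_EBX_SUPPORTED	UINT32_C(0x00000029)
#define F7_1_EAX_SUPPORTED	(UINT32_C(1) << 5)
#define F7_MAX_SUBLEAF		1u

/* Filter one subleaf of CPUID function 7 in place. */
void cpuid_7_mask(struct cpuid_entry *e)
{
	switch (e->index) {
	case 0:
		/* EAX of subleaf 0 reports the highest valid subleaf. */
		if (e->eax > F7_MAX_SUBLEAF)
			e->eax = F7_MAX_SUBLEAF;
		e->ebx &= F7_0_EBX_SUPPORTED;
		/* This sketch exposes nothing from ECX/EDX. */
		e->ecx = 0;
		e->edx = 0;
		break;
	case 1:
		e->eax &= F7_1_EAX_SUPPORTED;
		e->ebx = e->ecx = e->edx = 0;
		break;
	default:
		e->eax = e->ebx = e->ecx = e->edx = 0;
		break;
	}
}

/* Apply the same filter to every subleaf instead of nesting switches. */
void cpuid_7_filter_all(struct cpuid_entry *entries, uint32_t count)
{
	for (uint32_t i = 0; i < count; i++) {
		entries[i].index = i;
		cpuid_7_mask(&entries[i]);
	}
}
```

Keeping the per-subleaf logic in one function means adding a subleaf only needs a new case, not another level of nesting in the caller.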

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

18f2e790

KVM: cpuid: do_cpuid_ent works on a whole CPUID function · b2c1c889

Paolo Bonzini authored 3 years ago

mainline inclusion
from mainline-v5.3-rc1
commit ab8bcf64
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

Rename it as well as __do_cpuid_ent and __do_cpuid_ent_emulated to have
"func" in its name, and drop the index parameter which is always 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b2c1c889

Jul 14, 2021

ext4: fix possible UAF when remounting r/o a mmp-protected file system · 7d7d26aa

Theodore Ts'o authored 3 years ago

mainline inclusion
from mainline-5.14
commit	61bb4a1c417e5b95d9edb4f887f131de32e419cb
category: bugfix
bugzilla: 173880
CVE: NA

-------------------------------------------------

After commit 618f003199c6 ("ext4: fix memory leak in
ext4_fill_super"), after the file system is remounted read-only, there
is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to
point at freed memory, which the call to ext4_stop_mmpd() can trip
over.

Fix this by only allowing kmmpd() to exit when it is stopped via
ext4_stop_mmpd().

Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu


Reported-by: Ye Bin <yebin10@huawei.com>
Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

Conflicts:
	fs/ext4/super.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangying...

7d7d26aa

locks: Fix UBSAN undefined behaviour in flock64_to_posix_lock · 1b84bd6f

Luo Meng authored 3 years ago

mainline inclusion
from mainline-v5.11-rc1
commit 16238415eb9886328a89fe7a3cb0b88c7335fe16
category: bugfix
bugzilla: 38689
CVE: NA

-----------------------------------------------

When the sum of fl->fl_start and l->l_len overflows,
UBSAN shows the following warning:

UBSAN: Undefined behaviour in fs/locks.c:482:29
signed integer overflow: 2 + 9223372036854775806
cannot be represented in type 'long long int'
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xe4/0x14e lib/dump_stack.c:118
 ubsan_epilogue+0xe/0x81 lib/ubsan.c:161
 handle_overflow+0x193/0x1e2 lib/ubsan.c:192
 flock64_to_posix_lock fs/locks.c:482 [inline]
 flock_to_posix_lock+0x595/0x690 fs/locks.c:515
 fcntl_setlk+0xf3/0xa90 fs/locks.c:2262
 do_fcntl+0x456/0xf60 fs/fcntl.c:387
 __do_sys_fcntl fs/fcntl.c:483 [inline]
 __se_sys_fcntl fs/fcntl.c:468 [inline]
 __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468
 do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293
 entry_SY...
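
The overflow in the warning above (2 + 9223372036854775806) can be avoided by checking the remaining headroom before adding, and by subtracting 1 from the length first. The function below is a simplified, hypothetical stand-in for the range conversion, not the kernel code; unlike flock, it treats a non-positive length as invalid.

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical helper: compute the inclusive end offset of a range.
 * Simplification: a length <= 0 is rejected here.
 */
int range_end(long long start, long long len, long long *end)
{
	if (start < 0 || len <= 0)
		return -EINVAL;
	/* start + len may overflow, so compare against the headroom. */
	if (len - 1 > LLONG_MAX - start)
		return -EOVERFLOW;
	/* Subtract first so no intermediate value exceeds LLONG_MAX. */
	*end = start + (len - 1);
	return 0;
}
```

With start = 2 and len = 9223372036854775806, start + len would overflow, while start + (len - 1) is exactly LLONG_MAX and is still representable.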

1b84bd6f

iomap: Mark read blocks uptodate in write_begin · a259beb0

Matthew Wilcox (Oracle) authored 3 years ago


mainline inclusion
from mainline-v5.10
commit 14284fed
category: bugfix
bugzilla: 43547
CVE: NA

-----------------------------------------------

When bringing (portions of) a page uptodate, we were marking blocks that
were zeroed as being uptodate, but not blocks that were read from storage.

Like the previous commit, this problem was found with generic/127 and
a kernel which failed readahead I/Os.  This bug causes writes to be
silently lost when working with flaky storage.

Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyi...

a259beb0

iomap: Clear page error before beginning a write · b3a0aab5

Matthew Wilcox (Oracle) authored 3 years ago


mainline inclusion
from mainline-v5.10-rc1
commit e6e7ca92
category: bugfix
bugzilla: 43551
CVE: NA

-----------------------------------------------

If we find a page in write_begin which is !Uptodate, we need
to clear any error on the page before starting to read data
into it.  This matches how filemap_fault(), do_read_cache_page()
and generic_file_buffered_read() handle PageError on !Uptodate pages.
When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks
were not being marked as uptodate.

This was found with generic/127 and a specially modified kernel which
would fail (some) readahead I/Os.  The test read some bytes in a prior
page which caused readahead to extend into page 0x34.  There was
a subsequent write to page 0x34, followed by a read to page 0x34.
Because the blocks were still marked as !Uptodate, the read caused all
blocks to be re-read, overwriting the write.  With this change, and the
next one, the bytes which were written are marked as being Uptodate, so
even though the page is still marked as !Uptodate, the blocks containing
the written data are not re-read from storage.

Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b3a0aab5

iomap: move the zeroing case out of iomap_read_page_sync · 9fdbab15

Christoph Hellwig authored 3 years ago


mainline inclusion
from mainline-v5.5-rc1
commit d3b40439
category: bugfix
bugzilla: 43551
CVE: NA

-----------------------------------------------

That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9fdbab15

nbd: handle device refs for DESTROY_ON_DISCONNECT properly · c72b1648

Josef Bacik authored 3 years ago

mainline inclusion
from mainline-5.12-rc1
commit c9a2f90f4d6b
category: bugfix
bugzilla: 50455
CVE: NA

-------------------------------------------------

There exists a race where we can be attempting to create a new nbd
configuration while a previous configuration is going down, both
configured with DESTROY_ON_DISCONNECT.  Normally devices all have a
reference of 1, as they won't be cleaned up until the module is torn
down.  However with DESTROY_ON_DISCONNECT we'll make sure that there is
only 1 reference (generally) on the device for the config itself, and
then once the config is dropped, the device is torn down.

The race that exists looks like this

TASK1					TASK2
nbd_genl_connect()
  idr_find()
    refcount_inc_not_zero(nbd)
      * count is 2 here ^^
					nbd_config_put()
					  nbd_put(nbd) (count is 1)
    setup new config
      check DESTROY_ON_DISCONNECT
	put_dev = true
    if (put_dev) nbd_put(nbd)
	* free'd here ^^

In nbd_genl_conne...

c72b1648

cifs: Fix leak when handling lease break for cached root fid · 25ebc65f

Paul Aurich authored 3 years ago

mainline inclusion
from mainline-5.9-rc1
commit baf57b56
category: bugfix
bugzilla: 40791
CVE: NA

---------------------------

Handling a lease break for the cached root didn't free the
smb2_lease_break_work allocation, resulting in a leak:

    unreferenced object 0xffff98383a5af480 (size 128):
      comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s)
      hex dump (first 32 bytes):
        c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff  ..........Z:8...
        88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff  ..Z:8...........
      backtrace:
        [<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0
        [<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0
        [<00000000905fa372>] kthread+0x11c/0x150
        [<0000000079378e4e>] ret_from_fork+0x22/0x30

Avoid this leak by only allocating when necessary.

Fixes: a93864d9 ("cifs: add lease tracking to the cached root fid")
Signed-o...

25ebc65f

Jul 12, 2021

mm/memcontrol.c: fix kasan slab-out-of-bounds in mem_cgroup_css_alloc · 7395879d

卢佳琳 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I


CVE: NA

--------

static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
	...
	pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
	if (!pn)
		return 1;

	pnext = to_mgpn_ext(pn);
	pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
}

The object that pnext points to is larger than *pn, but only sizeof(*pn)
bytes were allocated, so the access to pnext->lruvec_stat_local is out of
bounds.
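
A userspace sketch of the pattern, under the assumption that the extension structure embeds the base structure and is recovered from it by pointer arithmetic. All names below are hypothetical. The point is that the allocation has to cover the whole extension, not just the embedded base.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical base object embedded in a larger extension object. */
struct node_info {
	int id;
};

struct node_info_ext {
	void *stat_local;	/* field that exists only in the extension */
	struct node_info base;
};

/* Recover the enclosing extension from a pointer to the embedded base. */
static struct node_info_ext *to_ext(struct node_info *pn)
{
	return (struct node_info_ext *)
		((char *)pn - offsetof(struct node_info_ext, base));
}

/* Allocate the whole extension, not just sizeof(struct node_info). */
struct node_info *node_info_alloc(void)
{
	struct node_info_ext *ext = calloc(1, sizeof(*ext));

	return ext ? &ext->base : NULL;
}

void node_info_free(struct node_info *pn)
{
	if (pn)
		free(to_ext(pn));
}
```

If node_info_alloc() allocated only sizeof(struct node_info), any access through to_ext() would touch memory outside the allocation, which is the kind of access KASAN reports.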

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7395879d

module: limit enabling module.sig_enforce · e1f55683

Mimi Zohar authored 3 years ago


stable inclusion
from linux-4.19.196
commit ff660863628fb144badcb3395cde7821c82c13a6
CVE: CVE-2021-35039

--------------------------------

[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]

Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.

Fixes: fda784e5 ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@hu...>

e1f55683

selftests/bpf: add test_spec_readahead_xfs_file to support special async readahead · 69513cfb

Yufen Yu authored 3 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

For hibench applications, such as kmeans, wordcount and terasort,
we can try to use this bpf tool to improve IO performance.

Usage:
	make -C bpf
	./test_xfs_file spec_readahead

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

69513cfb

mm: support special async readahead · cc549516

Yufen Yu authored 3 years ago

hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

For hibench applications, including kmeans, wordcount and terasort,
the whole blk_xxx and blk_xxx.meta files are read from disk sequentially,
and almost all of the reads issued to disk are triggered by async
readahead.

However, sequential reads by a single thread do not mean sequential IO
on disk when multiple threads do this concurrently. Interleaved
sequential reads from multiple threads can make the IO issued to disk
random, which limits disk IO throughput.

To reduce disk randomization, we can consider increasing the readahead
window, so that the IO generated by the filesystem is bigger for each
async readahead. But, limited by the disk's max_hw_sectors_kb, a big IO
will be split, and the whole bio needs to wait for all split bios to
complete, which can cause longer IO latency.

Our trace shows that many long latency in threads are caused by waiting
async readahead IO complete when set readahead window wit...

cc549516

selftests/bpf: test_xfs_file support to clear FMODE_RANDOM · 38abc1bb

Yufen Yu authored 3 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

If the ra->prev_pos page index is equal to the current pos, the read is
sequential, so clear the FMODE_RANDOM flag to enable async readahead.
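
The check could be sketched as below. This version assumes that both positions are byte offsets compared by page index, that pages are 4 KiB, and that the flag has the value 0x1000. The SKETCH_* names and update_f_mode() are hypothetical and are not the bpf program from the selftest.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define SKETCH_FMODE_RANDOM	0x1000u	/* stand-in for FMODE_RANDOM */

/*
 * Clear the random-access hint when the new read starts in the page
 * where the previous read ended. Returns the updated mode bits.
 */
unsigned int update_f_mode(unsigned int f_mode, uint64_t prev_pos,
			   uint64_t pos)
{
	if ((prev_pos >> SKETCH_PAGE_SHIFT) == (pos >> SKETCH_PAGE_SHIFT))
		f_mode &= ~SKETCH_FMODE_RANDOM;
	return f_mode;
}
```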

Usage:
	make -C bpf
	./test_xfs_file clear

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

38abc1bb

xfs: let writable tracepoint enable to clear flag of f_mode · b1e9dddb

Yufen Yu authored 3 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

Add a new member clear_f_mode to struct xfs_writable_file, so that
some flags of file->f_mode can be cleared.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b1e9dddb

Jul 09, 2021

jbd2: fix kabi broken in struct journal_s · c18bc51c

yangerkun authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: 172974
CVE: NA
---------------------------

72c9e4df ('jbd2: ensure abort the journal if detect IO error when
writing original buffer back') adds 'j_atomic_flags', which breaks kabi
for many symbols such as jbd2_journal_destroy and jbd2_journal_abort.

Fix it by adding a wrapper.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c18bc51c

btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation · ed46208d

Qu Wenruo authored 3 years ago

mainline inclusion
from mainline-v5.13-rc5
commit 6d4572a9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I39MZM
CVE: NA

------------------------------------------------------

[BUG]
When the data space is exhausted, even if the inode has NOCOW attribute,
we will still refuse to truncate unaligned range due to ENOSPC.

The following script can reproduce it pretty easily:
	#!/bin/bash

	dev=/dev/test/test
	mnt=/mnt/btrfs

	umount $dev &> /dev/null
	umount $mnt &> /dev/null

	mkfs.btrfs -f $dev -b 1G
	mount -o nospace_cache $dev $mnt
	touch $mnt/foobar
	chattr +C $mnt/foobar

	xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
	xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
	sync

	xfs_io -c "fpunch 0 2k" $mnt/foobar
	umount $mnt

Currently this will fail at the fpunch part.

[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
...

ed46208d

NFSv4.1: fix kabi for struct rpc_xprt · 3a50d694

ChenXiaoSong authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

Commit a2ff6d97 ("NFSv4.1: Don't rebind to the same source port when reconnecting to the server")
adds a new member to struct rpc_xprt,
which breaks KABI. This patch tries to fix it.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3a50d694

Jul 08, 2021

usb: gadget: rndis: Fix info leak of rndis · 7f1196b6

王海 authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: 172330
CVE: NA

--------------------------------

We can construct some special USB packets that cause kernel
info leak by the following steps of rndis.

1. construct the packet to make rndis call gen_ndis_set_resp().

In gen_ndis_set_resp(), BufOffset comes from the USB packet and
it is not checked so that BufOffset can be any value. Therefore,
if OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER, then *params->filter
can get data at any address.

2. construct the packet to make rndis call rndis_query_response().

In rndis_query_response(), if OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER,
then the data of *params->filter is fetched and returned, resulting in
info leak.

Therefore, we need to check the BufOffset to prevent info leak. Here,
buf size is USB_COMP_EP0_BUFSIZ, as long as "8 + BufOffset + BufLength"
is less than USB_COMP_EP0_BUFSIZ, it will be considered legal.
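
A bounds check along these lines might be written as follows. The buffer size 4096 is an assumed value for USB_COMP_EP0_BUFSIZ, and the function name is hypothetical. Each field is range-checked before the addition so that the 32-bit sum cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

#define EP0_BUFSIZ	4096u	/* assumed value for USB_COMP_EP0_BUFSIZ */

/* Accept the request only if "8 + BufOffset + BufLength" fits the buffer. */
bool rndis_set_range_ok(uint32_t buf_offset, uint32_t buf_length)
{
	/* Check each term first so the u32 sum below cannot wrap. */
	if (buf_offset >= EP0_BUFSIZ || buf_length >= EP0_BUFSIZ)
		return false;
	return 8u + buf_offset + buf_length < EP0_BUFSIZ;
}
```

Without the per-field checks, a BufOffset close to UINT32_MAX could wrap the sum to a small value and pass the comparison.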

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7f1196b6

Jul 05, 2021

once: Fix panic when module unload · 23eb8e37

王克锋 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 172153
CVE: NA

-------------------------------------------------

DO_ONCE
DEFINE_STATIC_KEY_TRUE(___once_key);
__do_once_done
  once_disable_jump(once_key);
    INIT_WORK(&w->work, once_deferred);
    struct once_work *w;
    w->key = key;
    schedule_work(&w->work);                     module unload
                                                   //*the key is destroy*
process_one_work
  once_deferred
    BUG_ON(!static_key_enabled(work->key));
       static_key_count((struct static_key *)x)    //*access key, crash*

When a module uses the DO_ONCE mechanism, it can crash due to the
concurrency problem above; it can be reproduced with link [1].

Fix it by taking and putting the module refcount in the once work process.

[1]
https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/



Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: Minmin chen <chenmingmin@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4.19.90-2107.1.0

23eb8e37

SUNRPC: Should wake up the privileged task firstly. · 3d5dba2f

Zhang Xiaoxu authored 3 years ago


mainline inclusion
from mainline-v5.14
commit 5483b904bf336948826594610af4c9bbb0d9e3aa
category: bugfix
bugzilla: 51898
CVE: NA

---------------------------

When looking for a task to wake up from the wait queue, a non-privileged
task may be found rather than the privileged one. This may lead to a
deadlock, the same as in commit dfe1fe75e00e ("NFSv4: Fix deadlock between
nfs4_evict_inode() and nfs4_opendata_get_inode()"):

The privileged delegreturn task is queued to the privileged list because all
the slots are assigned. If there are not enough slots to wake up the
non-privileged batch tasks (the session has fewer than 8 slots), then the
privileged delegreturn task may never be woken up, because the task that was
found cannot get a slot while the session is draining.

So we should treat the privileged task as the emergency task, and
execute it as far as we can.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
Cc: stable@vger...

3d5dba2f

SUNRPC: Fix the batch tasks count wraparound. · 9b06b695

Zhang Xiaoxu authored 3 years ago


mainline inclusion
from mainline-v5.14
commit fcb170a9d825d7db4a3fb870b0300f5a40a8d096
category: bugfix
bugzilla: 51898
CVE: NA

---------------------------

The 'queue->nr' counter wraps around from 0 to 255 when only the current
priority queue has tasks. This may lead to a deadlock, the same as in commit
dfe1fe75e00e ("NFSv4: Fix deadlock between nfs4_evict_inode()
and nfs4_opendata_get_inode()"):

The privileged delegreturn task is queued to the privileged list because all
the slots are assigned. When a non-privileged task completes and releases
its slot, a non-privileged task may be picked. It may fail to allocate a
slot while the session is draining.

If 'queue->nr' has wrapped around to 255 and there are not enough slots to
service it, then the privileged delegreturn task will never be woken up.

So we should avoid the wraparound of 'queue->nr'.
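
The wraparound comes from decrementing an 8-bit unsigned counter that is already 0. A guard of the following shape avoids it. The structure and function are hypothetical stand-ins for illustration, not the SUNRPC code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the batch counter of a priority queue. */
struct prio_queue {
	uint8_t nr;	/* tasks left in the current batch */
};

/* Consume one batch slot without letting the counter wrap to 255. */
int batch_consume(struct prio_queue *q)
{
	if (q->nr == 0)
		return 0;	/* batch exhausted: caller should move on */
	q->nr--;
	return 1;
}
```

An unguarded decrement of a uint8_t holding 0 stores 255, which would make the queue look as if it had a large batch left to service.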

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9b06b695

bpf: Fix leakage under speculation on mispredicted branches · 78d76ae7

Daniel Borkmann authored 3 years ago

mainline inclusion
from mainline-v5.13-rc7
commit 9183671af6dbf60a1219371d4ed73e23f43b49db
category: bugfix
bugzilla: NA
CVE: CVE-2021-33624

--------------------------------

The verifier only enumerates valid control-flow paths and skips paths that
are unreachable in the non-speculative domain. And so it can miss issues
under speculative execution on mispredicted branches.

For example, a type confusion has been demonstrated with the following
crafted program:

  // r0 = pointer to a map array entry
  // r6 = pointer to readable stack slot
  // r9 = scalar controlled by attacker
  1: r0 = *(u64 *)(r0) // cache miss
  2: if r0 != 0x0 goto line 4
  3: r6 = r9
  4: if r0 != 0x1 goto line 6
  5: r9 = *(u8 *)(r6)
  6: // leak r9

Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
concludes that the pointer dereference on line 5 is safe. But: if the
attacker trains both the branches to fall-through, such that the following
is spe...

78d76ae7

bpf: Do not mark insn as seen under speculative path verification · b2fdc6d8

Daniel Borkmann authored 3 years ago


mainline inclusion
from mainline-v5.13-rc7
commit fe9a5ca7e370e613a9a75a13008a3845ea759d6e
category: bugfix
bugzilla: NA
CVE: CVE-2021-33624

--------------------------------

... in such circumstances, we do not want to mark the instruction as seen given
the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
from the non-speculative path verification. We do however want to verify it for
safety regardless.

With the patch as-is all the insns that have been marked as seen before the
patch will also be marked as seen after the patch (just with a potentially
different non-zero count). An upcoming patch will also verify paths that are
unreachable in the non-speculative domain, hence this extension is needed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

Conflicts:
  kernel/bpf/verifier.c

pass_cnt is not introduced in kernel-4.19.

Signed-off-by: He Fengqing <hefengqing@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b2fdc6d8

bpf: Inherit expanded/patched seen count from old aux data · 9d1b583d

Daniel Borkmann authored 3 years ago


mainline inclusion
from mainline-v5.13-rc7
commit d203b0fd863a2261e5d00b97f3d060c4c2a6db71
category: bugfix
bugzilla: NA
CVE: CVE-2021-33624

--------------------------------

Instead of relying on current env->pass_cnt, use the seen count from the
old aux data in adjust_insn_aux_data(), and expand it to the new range of
patched instructions. This change is valid given we always expand 1:n
with n>=1, so what applies to the old/original instruction needs to apply
for the replacement as well.

Not relying on env->pass_cnt is a prerequisite for a later change where we
want to avoid marking an instruction seen when verified under speculative
execution path.
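
As a rough illustration of the 1:n expansion, the sketch below models the aux data as a plain Python list of seen values. The function name `expand_seen` and the list representation are assumptions made for this example; the kernel works on an array of struct bpf_insn_aux_data.

```python
def expand_seen(old_seen, off, cnt):
    """Expand the seen value of the insn at `off` over `cnt` patched insns.

    old_seen: per-instruction seen values before patching.
    off:      index of the instruction being replaced.
    cnt:      number of replacement instructions (1:n with n >= 1).
    """
    if cnt < 1:
        raise ValueError("a patch always expands 1:n with n >= 1")
    inherited = old_seen[off]          # taken from the old aux data
    # Entries before `off` are kept, the cnt - 1 new slots inherit the old
    # value, and the old entry at `off` and everything after it shift right.
    return old_seen[:off] + [inherited] * (cnt - 1) + old_seen[off:]
```

For example, `expand_seen([True, False, True], 1, 3)` yields five entries in which all three replacement slots carry the original `False`.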

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>

Conflicts:
  kernel/bpf/verifier.c

seen of bpf_insn_aux_data is bool in kernel-4.19.

Signed-off-by: He Fengqing <hefengqing@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9d1b583d

bpf: Update selftests to reflect new error states · 040bd002

Daniel Borkmann authored 3 years ago


stable inclusion
from linux-4.19.193
commit 138b0ec1064c8f154a32297458e562591a94773f

--------------------------------

commit d7a5091351756d0ae8e63134313c455624e36a13 upstream

Update various selftest error messages:

 * The 'Rx tried to sub from different maps, paths, or prohibited types'
   is reworked into more specific/differentiated error messages for better
   guidance.

 * The change into 'value -4294967168 makes map_value pointer be out of
   bounds' is due to moving the mixed bounds check into the speculation
   handling and thus occurring slightly later than above mentioned sanity
   check.

 * The change into 'math between map_value pointer and register with
   unbounded min value' is similarly due to register sanity check coming
   before the mixed bounds check.

 * The case of 'map access: known scalar += value_ptr from different maps'
   now loads fine given masks are the same from the different paths (despite
   max map value size being different).

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
[OP: 4.19 backport, account for split test_verifier and
different / missing tests]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

040bd002

bpf, test_verifier: switch bpf_get_stack's 0 s> r8 test · 0dae2841

Daniel Borkmann authored 3 years ago


stable inclusion
from linux-4.19.193
commit d1e281d6cb8841122c4677b47fcebdc6f410bd74

--------------------------------

[ no upstream commit ]

Switch the comparison, so that is_branch_taken() will recognize that the
branch below is never taken:

  [...]
  17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  17: (67) r8 <<= 32
  18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...]
  18: (c7) r8 s>>= 32
  19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  19: (6d) if r1 s> r8 goto pc+16
  [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
  [...]

Currently we check for is_branch_taken() only if either K is source, or source
is a scalar value that is const. For upstream it would be good to extend this
properly to check whether dst is const and src not.

For the sake of the test_verifier, it is probably not needed here:

  # ./test_verifier 101
  #101/p bpf_get_stack return R0 within range OK
  Summary: 1 PASSED, 0 SKIPPED, 0 FAILED

I haven't seen this issue in test_progs* though, they are passing fine:

  # ./test_progs-no_alu32 -t get_stack
  Switching to flavor 'no_alu32' subdirectory...
  #20 get_stack_raw_tp:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

  # ./test_progs -t get_stack
  #20 get_stack_raw_tp:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[OP: backport to 4.19]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0dae2841

bpf: Test_verifier, bpf_get_stack return value add <0 · 011e1131

John Fastabend authored 3 years ago


stable inclusion
from linux-4.19.193
commit f915e7975fc2d593ddb60b67d14eef314eb6dd08

--------------------------------

commit 9ac26e99 upstream.

With the current ALU32 subreg handling and the retval refine fix from the
last patches we see an expected failure in test_verifier. With verbose
verifier state printed at each step for clarity, we have the
following relevant lines [I omit register states that are not
necessarily useful to see the failure cause],

#101/p bpf_get_stack return R0 within range FAIL
Failed to load prog 'Success'!
[..]
14: (85) call bpf_get_stack#67
 R0_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
 R3_w=inv48
15:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
15: (b7) r1 = 0
16:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
16: (bf) r8 = r0
17:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
 R8_w=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
17: (67) r8 <<= 32
18:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
 R8_w=inv(id=0,smax_value=9223372032559808512,
               umax_value=18446744069414584320,
               var_off=(0x0; 0xffffffff00000000),
               s32_min_value=0,
               s32_max_value=0,
               u32_max_value=0,
               var32_off=(0x0; 0x0))
18: (c7) r8 s>>= 32
19:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
 R8_w=inv(id=0,smin_value=-2147483648,
               smax_value=2147483647,
               var32_off=(0x0; 0xffffffff))
19: (cd) if r1 s< r8 goto pc+16
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
 R8_w=inv(id=0,smin_value=-2147483648,
               smax_value=0,
               var32_off=(0x0; 0xffffffff))
20:
 R0=inv(id=0,smax_value=48,var32_off=(0x0; 0xffffffff))
 R1_w=inv0
 R8_w=inv(id=0,smin_value=-2147483648,
               smax_value=0,
               var32_off=(0x0; 0xffffffff))
 R9=inv48
20: (1f) r9 -= r8
21: (bf) r2 = r7
22:
 R2_w=map_value(id=0,off=0,ks=8,vs=48,imm=0)
22: (0f) r2 += r8
value -2147483648 makes map_value pointer be out of bounds

After the call to bpf_get_stack() on line 14 and some moves, we have at
line 16 an r8 bound with max_value 48 but an unknown min value. This is to
be expected: the bpf_get_stack call can only return a max of the input size
but is free to return any negative error in the 32-bit register space. The
C helper returns an int, so it will use the lower 32 bits.

Lines 17 and 18 clear the top 32 bits with a left/right shift but use
ARSH so we still have worst case min bound before line 19 of -2147483648.
At this point the signed check 'r1 s< r8' meant to protect the addition
on line 22 where dst reg is a map_value pointer may very well return
true with a large negative number. Then the final line 22 will detect
this as an invalid operation and fail the program. What we want to do
is proceed only if r8 is positive non-error. So change 'r1 s< r8' to
'r1 s> r8' so that we jump if r8 is negative.
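
The effect of the shift pair and of the two comparisons can be checked with a small Python model of 64-bit register arithmetic. This is only a sketch: the helper names are hypothetical, and -14 is just an example negative return value.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Interpret the low 64 bits of x as a signed 64-bit value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def shift_pair(r8):
    """Model 'r8 <<= 32' followed by 'r8 s>>= 32' (arithmetic shift)."""
    r8 = (r8 << 32) & MASK64                 # r8 <<= 32
    r8 = (to_signed64(r8) >> 32) & MASK64    # r8 s>>= 32 keeps the sign
    return to_signed64(r8)


def old_check_jumps(r8, r1=0):
    """Old test 'r1 s< r8': jumps only when r8 is positive."""
    return r1 < r8


def new_check_jumps(r8, r1=0):
    """New test 'r1 s> r8': jumps when r8 is negative."""
    return r1 > r8
```

With a negative 32-bit value in the low half of r8, `shift_pair` produces a negative 64-bit number. The old test then falls through to the pointer addition, while the new test jumps away.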

Next we will throw an error because we access past the end of the map
value. The map value size is 48 and sizeof(struct test_val) is 48 so
we walk off the end of the map value on the second call to
bpf_get_stack(). Fix this by changing sizeof(struct test_val) to
24 by using 'sizeof(struct test_val) / 2'. After this everything passes
as expected.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/158560426019.10843.3285429543232025187.stgit@john-Precision-5820-Tower


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[OP: backport to 4.19]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

011e1131

bpf: extend is_branch_taken to registers · 6147ca1f

Alexei Starovoitov authored 3 years ago


stable inclusion
from linux-4.19.193
commit e0b86677fb3e4622b444dcdd8546caa0dba8a689

--------------------------------

commit fb8d251e upstream

This patch extends is_branch_taken() logic from JMP+K instructions
to JMP+X instructions.
Conditional branches are often done when src and dst registers
contain known scalars. In such cases the verifier can follow
the branch that is going to be taken when the program executes.
That speeds up verification and is an essential feature for supporting
bounded loops.
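
A minimal Python sketch of the idea follows, assuming registers are modelled only by their signed (smin, smax) bounds. The function names, the string opcodes, and the reduced opcode set are simplifications for illustration, not the kernel implementation.

```python
JEQ, JSGT, JSLT = "JEQ", "JSGT", "JSLT"


def is_branch_taken(smin, smax, val, opcode):
    """Return 1 if always taken, 0 if never taken, -1 if unknown.

    The dst register lies in [smin, smax] and is compared with constant val.
    """
    if opcode == JEQ:
        if smin == smax:
            return 1 if smin == val else 0
    elif opcode == JSGT:
        if smin > val:
            return 1
        if smax <= val:
            return 0
    elif opcode == JSLT:
        if smax < val:
            return 1
        if smin >= val:
            return 0
    return -1


def branch_taken_regs(dst, src, opcode):
    """JMP+X case: decide only when the src register is a known constant."""
    src_min, src_max = src
    if src_min != src_max:
        return -1                      # src is not a known scalar
    return is_branch_taken(dst[0], dst[1], src_min, opcode)
```

When the result is 1 or 0 the verifier in this model needs to follow only one successor; -1 means both branches still have to be explored.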

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[OP: drop is_jmp32 parameter from is_branch_taken() calls and
     adjust context]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6147ca1f