Commits · 7a15f88aaaba933fac5cc22230a3387aa708a85b · Summer2022 / 22b970497

Oct 19, 2022

!137 net: bonding: Add support for IPV6 ns/na to balance-alb/balance-tlb mode · 7a15f88a

Merge Pull Request from: @babyihc 
 
This PR is cherry-pick from upstream "net: bonding: Add support for IPV6 ns/na to balance-alb/balance-tlb mode"

Upstream patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0da8aa00bfcfeb3f4e6537dd8e2001e0727ba549
 
 
Link:https://gitee.com/openeuler/kernel/pulls/137

 
Reviewed-by: Laibin Qiu <qiulaibin@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

Unverified

7a15f88a

kcm: avoid potential race in kcm_tx_work · a4c35ce0

Eric Dumazet authored 2 years ago

mainline inclusion
from mainline-v6.1-rc1
commit ec7eede369fe5b0d085ac51fdbb95184f87bfc6c
category: bugfix
bugzilla: 187823, https://gitee.com/src-openeuler/kernel/issues/I5VZ0N


CVE: CVE-2022-3521

--------------------------------

syzbot found that kcm_tx_work() could crash [1] in:

	/* Primarily for SOCK_SEQPACKET sockets */
	if (likely(sk->sk_socket) &&
	    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
<<*>>	clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
		sk->sk_write_space(sk);
	}

I think the reason is that another thread might concurrently
run in kcm_release() and call sock_orphan(sk) while sk is not
locked. kcm_tx_work() find sk->sk_socket being NULL.

[1]
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53

CPU: 0 PID: 53 Comm: kworker/u4:3 Not tainted 5.19.0-rc3-next-20220621-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: kkcmd kcm_tx_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
kasan_report+0xbe/0x1f0 mm/kasan/report.c:495
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
instrument_atomic_write include/linux/instrumented.h:86 [inline]
clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
process_one_work+0x996/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>

Fixes: ab7ac4eb ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Link: https://lore.kernel.org/r/20221012133412.519394-1-edumazet@google.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a4c35ce0

net: bonding: Add support for IPV6 ns/na to balance-alb/balance-tlb mode · 1eefd25a

Sun Shouxin authored 3 years ago

mainline inclusion
from mainline-v5.4
commit 0da8aa00bfcfeb3f4e6537dd8e2001e0727ba549
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UY5E


CVE: NA

--------------------------------

Since ipv6 neighbor solicitation and advertisement messages
isn't handled gracefully in bond6 driver, we can see packet
drop due to inconsistency between mac address in the option
message and source MAC .

Another examples is ipv6 neighbor solicitation and advertisement
messages from VM via tap attached to host bridge, the src mac
might be changed through balance-alb mode, but it is not synced
with Link-layer address in the option message.

The patch implements bond6's tx handle for ipv6 neighbor
solicitation and advertisement messages.

Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>

1eefd25a

Oct 18, 2022

!139 nvme: Assign subsys instance from first ctrl · c13fd95d

openeuler-ci-bot authored 2 years ago

Merge Pull Request from: @babyihc 
 
This PR is to fix nvme disk names not unique for the lifetime of the subsystem.
Kernel issuse:
https://gitee.com/openeuler/kernel/issues/I5986O
 
 
Link:https://gitee.com/openeuler/kernel/pulls/139

 
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

Unverified

c13fd95d

fbdev: smscufx: Fix use-after-free in ufx_ops_open() · 4b472e21

Hyunwoo Kim authored 2 years ago

mainline inclusion
from mainline-v6.0-rc1
commit 5610bcfe8693c02e2e4c8b31427f1bdbdecc839c
category: bugfix
bugzilla: 187798, https://gitee.com/src-openeuler/kernel/issues/I5U1NZ


CVE: CVE-2022-41849

--------------------------------

A race condition may occur if the user physically removes the
USB device while calling open() for this device node.

This is a race condition between the ufx_ops_open() function and
the ufx_usb_disconnect() function, which may eventually result in UAF.

So, add a mutex to the ufx_ops_open() and ufx_usb_disconnect() functions
to avoid race contidion of krefs.

Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4b472e21

nvme: fix controller instance leak · 1e83e42b

Keith Busch authored 4 years ago

mainline inclusion
from mainline-v5.9
commit 192f6c29
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5986O


CVE: NA

----------------------------------

If the driver has to unbind from the controller for an early failure
before the subsystem has been set up, there won't be a subsystem holding
the controller's instance, so the controller needs to free its own
instance in this case.

Fixes: 733e4b69 ("nvme: Assign subsys instance from first ctrl")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Xibo.Wang <wangxb12@chinatelecom.cn>

1e83e42b

nvme: Assign subsys instance from first ctrl · 7d574b4c

Keith Busch authored 5 years ago

mainline inclusion
from mainline-v5.4
commit 733e4b69
category: bugfix
bugzilla:https://gitee.com/openeuler/kernel/issues/I5986O


CVE: NA

--------------------------------

The namespace disk names must be unique for the lifetime of the
subsystem. This was accomplished by using their parent subsystems'
instances which were allocated independently from the controllers
connected to that subsystem. This allowed name prefixes assigned to
namespaces to match a controller from an unrelated subsystem, and has
created confusion among users examining device nodes.

Ensure a namespace's subsystem instance never clashes with a controller
instance of another subsystem by transferring the instance ownership
to the parent subsystem from the first controller discovered in that
subsystem.

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Xibo.Wang <wangxb12@chinatelecom.cn>

7d574b4c

binder: fix UAF of ref->proc caused by race condition · edb25e44

Carlos Llamas authored 2 years ago

mainline inclusion
from mainline-v6.0-rc4
commit a0e44c64b6061dda7e00b7c458e4523e2331b739
category: bugfix
bugzilla: 187805, https://gitee.com/src-openeuler/kernel/issues/I5U713


CVE: CVE-2022-20421

--------------------------------

A transaction of type BINDER_TYPE_WEAK_HANDLE can fail to increment the
reference for a node. In this case, the target proc normally releases
the failed reference upon close as expected. However, if the target is
dying in parallel the call will race with binder_deferred_release(), so
the target could have released all of its references by now leaving the
cleanup of the new failed reference unhandled.

The transaction then ends and the target proc gets released making the
ref->proc now a dangling pointer. Later on, ref->node is closed and we
attempt to take spin_lock(&ref->proc->inner_lock), which leads to the
use-after-free bug reported below. Let's fix this by cleaning up the
failed reference on the spot instead of relying on the target to do so.

  ==================================================================
  BUG: KASAN: use-after-free in _raw_spin_lock+0xa8/0x150
  Write of size 4 at addr ffff5ca207094238 by task kworker/1:0/590

  CPU: 1 PID: 590 Comm: kworker/1:0 Not tainted 5.19.0-rc8 #10
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events binder_deferred_func
  Call trace:
   dump_backtrace.part.0+0x1d0/0x1e0
   show_stack+0x18/0x70
   dump_stack_lvl+0x68/0x84
   print_report+0x2e4/0x61c
   kasan_report+0xa4/0x110
   kasan_check_range+0xfc/0x1a4
   __kasan_check_write+0x3c/0x50
   _raw_spin_lock+0xa8/0x150
   binder_deferred_func+0x5e0/0x9b0
   process_one_work+0x38c/0x5f0
   worker_thread+0x9c/0x694
   kthread+0x188/0x190
   ret_from_fork+0x10/0x20

Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Cc: stable <stable@kernel.org> # 4.14+
Link: https://lore.kernel.org/r/20220801182511.3371447-1-cmllamas@google.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2210.3.0

edb25e44

arm64: fix oops in concurrently setting insn_emulation sysctls · 08f3f0b2

haibinzhang (张海斌) authored 2 years ago

stable inclusion
from stable-v4.19.256
commit b51881b1da57fe9877125dfdd0aac5172958fcfd
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5U71M
CVE: CVE-2022-20422

---------------------------

[ Upstream commit af483947d472eccb79e42059276c4deed76f99a6 ]

emulation_proc_handler() changes table->data for proc_dointvec_minmax
and can generate the following Oops if called concurrently with itself:

 | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
 | Internal error: Oops: 96000006 [#1] SMP
 | Call trace:
 | update_insn_emulation_mode+0xc0/0x148
 | emulation_proc_handler+0x64/0xb8
 | proc_sys_call_handler+0x9c/0xf8
 | proc_sys_write+0x18/0x20
 | __vfs_write+0x20/0x48
 | vfs_write+0xe4/0x1d0
 | ksys_write+0x70/0xf8
 | __arm64_sys_write+0x20/0x28
 | el0_svc_common.constprop.0+0x7c/0x1c0
 | el0_svc_handler+0x2c/0xa0
 | el0_svc+0x8/0x200

To fix this issue, keep the table->data as &insn->cur...

08f3f0b2

Oct 17, 2022

mm/hotplug: silence a lockdep splat with printk() · 988dd3e9

Qian Cai authored 2 years ago

mainline inclusion
from mainline-v5.6-rc1
commit 4a55c0474a92d5c418bcbbe122368de0910aeac2q
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5T5DD
CVE: NA

--------------------------------

It is not that hard to trigger lockdep splats by calling printk from
under zone->lock.  Most of them are false positives caused by lock
chains introduced early in the boot process and they do not cause any
real problems (although most of the early boot lock dependencies could
happen after boot as well).  There are some console drivers which do
allocate from the printk context as well and those should be fixed.  In
any case, false positives are not that trivial to workaround and it is
far from optimal to lose lockdep functionality for something that is a
non-issue.

So change has_unmovable_pages() so that it no longer calls dump_page()
itself - instead it returns a "struct page *" of the unmovable page back
to the caller so that in the case of a has_u...

988dd3e9

init/Kconfig: Add SMP to the dependencies of QOS_SCHED · 72b3422d

Xia Fukun authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UPB0
CVE: NA

------------------------------------------------------------

After CONFIG_SMP is disabled during kernel compilation,
CONFIG_QOS_SCHED is not disabled.
As a result, the following error occurs:

kernel/sched/fair.c: In function ‘check_qos_cfs_rq’:
kernel/sched/fair.c:7324:4: error: implicit declaration of function
‘sched_idle_cpu’; did you mean ‘sched_idle_rq’?
[-Werror=implicit-function-declaration]
 7324 |   !sched_idle_cpu(smp_processor_id()) &&
      |    ^~~~~~~~~~~~~~
./include/linux/compiler.h:78:42: note: in definition of macro ‘unlikely’
   78 | # define unlikely(x) __builtin_expect(!!(x), 0)
      |                                          ^
  CC      mm/highmem.o
kernel/sched/fair.c: In function ‘pick_next_task_fair’:
kernel/sched/fair.c:7599:43: error: ‘struct rq’ has no member named ‘online’
 7599 |   if (cfs_rq->...

72b3422d

Oct 14, 2022

mm/rmap: Fix kabi broken in anon_vma · 6d5d324d

Ma Wupeng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5USOP


CVE: CVE-2022-42703

--------------------------------

In order to fix CVE-2022-42703, degree in struct anon_vma is splited into
two variables(num_children, num_active_vmas) and this will lead to broken
kabi.

Since struct anon_vma is only used by mm module and referenced by other
struct as pointers. So we can ignore this kabi change warning.

For variable degree in struct anon_vma, previous patch has already
delete this but this will lead to kabi change. Add it back at the same
position. For variables(num_children, num_active_vmas), put them
between macro __GENKSYMS__.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6d5d324d

mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse · a3544c89

Jann Horn authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 6dbfc25d68d922736381988d64156a649ccf7bf1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5USOP


CVE: CVE-2022-42703

--------------------------------

commit 2555283eb40df89945557273121e9393ef9b542b upstream.

anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.

anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.

This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created.  So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.

This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same.  When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.

Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.

Fixes: 7a3ef208 ("mm: prevent endless growth of anon_vma hierarchy")
Cc: stable@kernel.org
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a3544c89

HID: roccat: Fix use-after-free in roccat_read() · 4d870684

Hyunwoo Kim authored 2 years ago

mainline inclusion
from mainline master
commit cacdb14b1c8d3804a3a7d31773bc7569837b71a4
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5U1PE


CVE: CVE-2022-41850

--------------------------------

roccat_report_event() is responsible for registering
roccat-related reports in struct roccat_device.

int roccat_report_event(int minor, u8 const *data)
{
	struct roccat_device *device;
	struct roccat_reader *reader;
	struct roccat_report *report;
	uint8_t *new_value;

	device = devices[minor];

	new_value = kmemdup(data, device->report_size, GFP_ATOMIC);
	if (!new_value)
		return -ENOMEM;

	report = &device->cbuf[device->cbuf_end];

	/* passing NULL is safe */
	kfree(report->value);
	...

The registered report is stored in the struct roccat_device member
"struct roccat_report cbuf[ROCCAT_CBUF_SIZE];".
If more reports are received than the "ROCCAT_CBUF_SIZE" value,
kfree() the saved report from cbuf[0] and allocates a new reprot.
Since there is no lock when this kfree() is performed,
kfree() can be performed even while reading the saved report.

static ssize_t roccat_read(struct file *file, char __user *buffer,
		size_t count, loff_t *ppos)
{
	struct roccat_reader *reader = file->private_data;
	struct roccat_device *device = reader->device;
	struct roccat_report *report;
	ssize_t retval = 0, len;
	DECLARE_WAITQUEUE(wait, current);

	mutex_lock(&device->cbuf_lock);

	...

	report = &device->cbuf[reader->cbuf_start];
	/*
	 * If report is larger than requested amount of data, rest of report
	 * is lost!
	 */
	len = device->report_size > count ? count : device->report_size;

	if (copy_to_user(buffer, report->value, len)) {
		retval = -EFAULT;
		goto exit_unlock;
	}
	...

The roccat_read() function receives the device->cbuf report and
delivers it to the user through copy_to_user().
If the N+ROCCAT_CBUF_SIZE th report is received while copying of
the Nth report->value is in progress, the pointer that copy_to_user()
is working on is kfree()ed and UAF read may occur. (race condition)

Since the device node of this driver does not set separate permissions,
this is not a security vulnerability, but because it is used for
requesting screen display of profile or dpi settings,
a user using the roccat device can apply udev to this device node or
There is a possibility to use it by giving.

Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Ruiqi Gong <gongruiqi1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4d870684

Oct 11, 2022

ext4: fix dir corruption when ext4_dx_add_entry() fails · e921a80d

Zhihao Cheng authored 2 years ago

mainline inclusion
from mainline-v6.0-rc2
commit 7177dd009c7c04290891e9a534cd47d1b620bd04
category: bugfix
bugzilla: 187586, https://gitee.com/openeuler/kernel/issues/I5V3VI
CVE: NA

--------------------------------

Following process may lead to fs corruption:
1. ext4_create(dir/foo)
 ext4_add_nondir
  ext4_add_entry
   ext4_dx_add_entry
     a. add_dirent_to_buf
      ext4_mark_inode_dirty
      ext4_handle_dirty_metadata   // dir inode bh is recorded into journal
     b. ext4_append    // dx_get_count(entries) == dx_get_limit(entries)
       ext4_bread(EXT4_GET_BLOCKS_CREATE)
        ext4_getblk
         ext4_map_blocks
          ext4_ext_map_blocks
            ext4_mb_new_blocks
             dquot_alloc_block
              dquot_alloc_space_nodirty
               inode_add_bytes    // update dir's i_blocks
            ext4_ext_insert_extent
	     ext4_ext_dirty  // record extent bh into journal
              ext4_handle_dirty_metadata(bh)
	      ...

4.19.90-2210.2.0

e921a80d

quota: Add more checking after reading from quota file · 5de13cee

Zhihao Cheng authored 2 years ago

mainline inclusion
from mainline-v6.0-rc2
commit 191249f708897fc34c78f4494f7156896aaaeca9
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X
CVE: NA

--------------------------------

It would be better to do more sanity checking (eg. dqdh_entries,
block no.) for the content read from quota file, which can prevent
corrupting the quota file.

Link: https://lore.kernel.org/r/20220923134555.2623931-4-chengzhihao1@huawei.com


Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5de13cee

quota: Replace all block number checking with helper function · 5606887f

Zhihao Cheng authored 2 years ago

mainline inclusion
from mainline-v6.0-rc2
commit 3fc61e0e96a3261aacfd3150fb3a9228f7ce5dd6
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X
CVE: NA

--------------------------------

Cleanup all block checking places, replace them with helper function
do_check_range().

Link: https://lore.kernel.org/r/20220923134555.2623931-3-chengzhihao1@huawei.com


Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5606887f

quota: Check next/prev free block number after reading from quota file · 8b165737

Zhihao Cheng authored 2 years ago

mainline inclusion
from mainline-v6.0-rc2
commit 6c8ea8b8cd4722efd419f91ca46a2dc81b7d89a3
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X
CVE: NA

--------------------------------

Following process:
 Init: v2_read_file_info: <3> dqi_free_blk 0 dqi_free_entry 5 dqi_blks 6

 Step 1. chown bin f_a -> dquot_acquire -> v2_write_dquot:
  qtree_write_dquot
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      write_blk(info->dqi_blocks) // info->dqi_blocks = 6, failure. The
	   content in physical block (corresponding to blk 6) is random.

 Step 2. chown root f_a -> dquot_transfer -> dqput_all -> dqput ->
         ext4_release_dquot -> v2_release_dquot -> qtree_delete_dquot:
  dquot_release
   remove_tree
    free_dqentry
     put_free_dqblk(6)
      info->dqi_free_blk = blk    // info->dqi_free_blk = 6

 Step 3. drop cache (buffer head for block 6 is released)

 Step 4. chown bin f_b -> dquot_acquire -> commit_dqblk -> v2_write_dquot:
  qtree_write_dquot
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      dh = (struct qt_disk_dqdbheader *)buf
      blk = info->dqi_free_blk     // 6
      ret = read_blk(info, blk, buf)  // The content of buf is random
      info->dqi_free_blk = le32_to_cpu(dh->dqdh_next_free)  // random blk

 Step 5. chown bin f_c -> notify_change -> ext4_setattr -> dquot_transfer:
  dquot = dqget -> acquire_dquot -> ext4_acquire_dquot -> dquot_acquire ->
          commit_dqblk -> v2_write_dquot -> dq_insert_tree:
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      blk = info->dqi_free_blk    // If blk < 0 and blk is not an error
				     code, it will be returned as dquot

  transfer_to[USRQUOTA] = dquot  // A random negative value
  __dquot_transfer(transfer_to)
   dquot_add_inodes(transfer_to[cnt])
    spin_lock(&dquot->dq_dqb_lock)  // page fault

, which will lead to kernel page fault:
 Quota error (device sda): qtree_write_dquot: Error -8000 occurred
 while creating quota
 BUG: unable to handle page fault for address: ffffffffffffe120
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP
 CPU: 0 PID: 5974 Comm: chown Not tainted 6.0.0-rc1-00004
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:_raw_spin_lock+0x3a/0x90
 Call Trace:
  dquot_add_inodes+0x28/0x270
  __dquot_transfer+0x377/0x840
  dquot_transfer+0xde/0x540
  ext4_setattr+0x405/0x14d0
  notify_change+0x68e/0x9f0
  chown_common+0x300/0x430
  __x64_sys_fchownat+0x29/0x40

In order to avoid accessing invalid quota memory address, this patch adds
block number checking of next/prev free block read from quota file.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216372
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220923134555.2623931-2-chengzhihao1@huawei.com


Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8b165737

Revert "quota: Check next/prev free block number after reading from quota file" · be12b77e

Zhong Jinghua authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X


CVE: NA

--------------------------------

This reverts commit 6c27d754.

The mainline of this patch already exists, revert the interim patch.

Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

be12b77e

Revert "quota: Replace all block number checking with helper function" · 592e66e1

Zhong Jinghua authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X


CVE: NA

--------------------------------

This reverts commit 1e9a49cf.

The mainline of this patch already exists, revert the interim patch

Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

592e66e1

Revert "quota: Add more checking after reading from quota file" · 8cad765b

Zhong Jinghua authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X


CVE: NA

--------------------------------

This reverts commit f66997d9.

The mainline of this patch already exists, revert the interim patch.

Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8cad765b

tracefs: Only clobber mode/uid/gid on remount if asked · b15e501f

Brian Norris authored 2 years ago

stable inclusion
from stable-v4.19.259
commit 27591187f95e6f67a40bba46f8306699d2ecbd81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5V45M
CVE: NA

---------------------------

[ Upstream commit 47311db8e8f33011d90dee76b39c8886120cdda4 ]

Users may have explicitly configured their tracefs permissions; we
shouldn't overwrite those just because a second mount appeared.

Only clobber if the options were provided at mount time.

Note: the previous behavior was especially surprising in the presence of
automounted /sys/kernel/debug/tracing/.

Existing behavior:

  ## Pre-existing status: tracefs is 0755.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

  ## (Re)trigger the automount.
  # umount /sys/kernel/debug/tracing
  # stat -c '%A' /sys/kernel/debug/tracing/.
  drwx------

  ## Unexpected: the automount changed mode for other mount instances.
  # stat -c '%A' /sys/kernel/tracing/
  drwx------

New behavior (after this change):

  ## Pre-existing status: tracefs is 0755.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

  ## (Re)trigger the automount.
  # umount /sys/kernel/debug/tracing
  # stat -c '%A' /sys/kernel/debug/tracing/.
  drwxr-xr-x

  ## Expected: the automount does not change other mount instances.
  # stat -c '%A' /sys/kernel/tracing/
  drwxr-xr-x

Link: https://lkml.kernel.org/r/20220826174353.2.Iab6e5ea57963d6deca5311b27fb7226790d44406@changeid



Cc: stable@vger.kernel.org
Fixes: 4282d606 ("tracefs: Add new tracefs file system")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b15e501f

netfilter: ebtables: fix memory leak when blob is malformed · 2265bd1b

Florian Westphal authored 2 years ago

stable inclusion
from stable-v4.19.260
commit 1e98318af2f163eadaff815abcef38d27ca92c1e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 62ce44c4fff947eebdf10bb582267e686e6835c9 ]

The bug fix was incomplete, it "replaced" crash with a memory leak.
The old code had an assignment to "ret" embedded into the conditional,
restore this.

Fixes: 7997eff82828 ("netfilter: ebtables: reject blobs that don't provide all entry points")
Reported-and-tested-by:  <syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

2265bd1b

netfilter: ebtables: reject blobs that don't provide all entry points · 6c76c592

Florian Westphal authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 358765beb836f5fc2ed26b5df4140d5d3548ac11
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]

Harshit Mogalapalli says:
 In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
 can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
 nf_hook_slow+0xb1/0x170
 __br_forward+0x289/0x730
 maybe_deliver+0x24b/0x380
 br_flood+0xc6/0x390
 br_dev_xmit+0xa2e/0x12c0

For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.

t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets.  Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.

Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6c76c592

mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() · e7a18d60

Jann Horn authored 2 years ago

stable inclusion
from stable-v4.19.259
commit 56fa5f3dd44a05a5eacd75ae9d00c5415046d371
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

This is a stable-specific patch.
I botched the stable-specific rewrite of
commit b67fbebd4cf98 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"):
As Hugh pointed out, unmap_region() actually operates on a list of VMAs,
and the variable "vma" merely points to the first VMA in that list.
So if we want to check whether any of the VMAs we're operating on is
PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e7a18d60

SUNRPC: use _bh spinlocking on ->transport_lock · 7cbdfd4c

NeilBrown authored 2 years ago

stable inclusion
from stable-v4.19.258
commit bcab4d551a3d15bdf8efaac7b61420219bd154b6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

Prior to Linux 5.3, ->transport_lock in sunrpc required the _bh style
spinlocks (when not called from a bottom-half handler).

When upstream 3848e96edf4788f772d83990022fa7023a233d83 was backported to
stable kernels, the spin_lock/unlock calls should have been changed to
the _bh version, but this wasn't noted in the patch and didn't happen.

So convert these lock/unlock calls to the _bh versions.

This patch is required for any stable kernel prior to 5.3 to which the
above mentioned patch was backported.  Namely 4.9.y, 4.14.y, 4.19.y.

Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Reviewed-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

7cbdfd4c

tcp: fix early ETIMEDOUT after spurious non-SACK RTO · d2189b9e

Neal Cardwell authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 4d941fdf910bd6da3a017627ef9c08e9f3f17c06
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 686dc2db2a0fdc1d34b424ec2c0a735becd8d62b ]

Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.

Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:

(*1) Send one data packet on a non-SACK connection

(*2) Because no ACK packet is received, the packet is retransmitted
     and we enter CA_Loss; but this retransmission is spurious.

(*3) The ACK for the original data is received. The transmitted packet
     is acknowledged.  The TCP timestamp is before the retrans_stamp,
     so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
     true without changing state to Open (because tcp_is_sack() is
     false), and tcp_process_loss() returns without calling
     tcp_try_undo_recovery().  Normally after undoing a CA_Loss
     episode, tcp_fastretrans_alert() would see that the connection
     has returned to CA_Open and fall through and call
     tcp_try_to_open(), which would set retrans_stamp to 0.  However,
     for non-SACK connections we hold the connection in CA_Loss, so do
     not fall through to call tcp_try_to_open() and do not set
     retrans_stamp to 0. So retrans_stamp is (erroneously) still
     non-zero.

     At this point the first "retransmission event" has passed and
     been recovered from. Any future retransmission is a completely
     new "event". However, retrans_stamp is erroneously still
     set. (And we are still in CA_Loss, which is correct.)

(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
     packet is sent. Note: No data is transmitted between (*3) and
     (*4) and we disabled keep alives.

     The socket's timeout SHOULD be calculated from this point in
     time, but instead it's calculated from the prior "event" 16
     minutes ago (step (*2)).

(*5) Because no ACK packet is received, the packet is retransmitted.

(*6) At the time of the 2nd retransmission, the socket returns
     ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
     too far in the past (set at the time of (*2)).

This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.

Fixes: da34ac76 ("tcp: only undo on partial ACKs in CA_Loss")
Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/


Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d2189b9e

netfilter: br_netfilter: Drop dst references before setting. · 858c6aad

Harsh Modi authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 89810dbbffff7efae8d83b39557a50a7dfc65b44
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d047283a7034140ea5da759a494fd2274affdd46 ]

The IPv6 path already drops dst in the daddr changed case, but the IPv4
path does not. This change makes the two code paths consistent.

Further, it is possible that there is already a metadata_dst allocated from
ingress that might already be attached to skbuff->dst while following
the bridge path. If it is not released before setting a new
metadata_dst, it will be leaked. This is similar to what is done in
bpf_set_tunnel_key() or ip6_route_input().

It is important to note that the memory being leaked is not the dst
being set in the bridge code, but rather memory allocated from some
other code path that is not being freed correctly before the skb dst is
overwritten.

An example of the leakage fixed by this commit found using kmemleak:

unreferenced object 0xffff888010112b00 (size 256):
  comm "softirq", pid 0, jiffies 4294762496 (age 32.012s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 80 16 f1 83 ff ff ff ff  ................
    e1 4e f6 82 ff ff ff ff 00 00 00 00 00 00 00 00  .N..............
  backtrace:
    [<00000000d79567ea>] metadata_dst_alloc+0x1b/0xe0
    [<00000000be113e13>] udp_tun_rx_dst+0x174/0x1f0
    [<00000000a36848f4>] geneve_udp_encap_recv+0x350/0x7b0
    [<00000000d4afb476>] udp_queue_rcv_one_skb+0x380/0x560
    [<00000000ac064aea>] udp_unicast_rcv_skb+0x75/0x90
    [<000000009a8ee8c5>] ip_protocol_deliver_rcu+0xd8/0x230
    [<00000000ef4980bb>] ip_local_deliver_finish+0x7a/0xa0
    [<00000000d7533c8c>] __netif_receive_skb_one_core+0x89/0xa0
    [<00000000a879497d>] process_backlog+0x93/0x190
    [<00000000e41ade9f>] __napi_poll+0x28/0x170
    [<00000000b4c0906b>] net_rx_action+0x14f/0x2a0
    [<00000000b20dd5d4>] __do_softirq+0xf4/0x305
    [<000000003a7d7e15>] __irq_exit_rcu+0xc3/0x140
    [<00000000968d39a2>] sysvec_apic_timer_interrupt+0x9e/0xc0
    [<000000009e920794>] asm_sysvec_apic_timer_interrupt+0x16/0x20
    [<000000008942add0>] native_safe_halt+0x13/0x20

Florian Westphal says: "Original code was likely fine because nothing
ever did set a skb->dst entry earlier than bridge in those days."

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Harsh Modi <harshmodi@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

858c6aad

debugfs: add debugfs_lookup_and_remove() · a1c25212

Greg Kroah-Hartman authored 2 years ago

stable inclusion
from stable-v4.19.258
commit ebfb744bb6036485dac2a45c46e82b911c3726d1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit dec9b2f1e0455a151a7293c367da22ab973f713e upstream.

There is a very common pattern of using
debugfs_remove(debufs_lookup(..)) which results in a dentry leak of the
dentry that was looked up.  Instead of having to open-code the correct
pattern of calling dput() on the dentry, create
debugfs_lookup_and_remove() to handle this pattern automatically and
properly without any memory leaks.

Cc: stable <stable@kernel.org>
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a1c25212

tcp: annotate data-race around challenge_timestamp · fbf064eb

Eric Dumazet authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 12f99f07a5f4d7ec8d72da6ee8ef66f048634f6a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 8c70521238b7863c2af607e20bcba20f974c969b ]

challenge_timestamp can be read an written by concurrent threads.

This was expected, but we need to annotate the race to avoid potential issues.

Following patch moves challenge_timestamp and challenge_count
to per-netns storage to provide better isolation.

Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

fbf064eb

Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" · 6afb63d8

Yee Lee authored 2 years ago

stable inclusion
from stable-v4.19.258
commit aabdbd63c46ff6fa6661dd9d3aaaebb62141db90
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

This reverts commit 23c2d497de21f25898fbea70aeb292ab8acc8c94.

Commit 23c2d497de21 ("mm: kmemleak: take a full lowmem check in
kmemleak_*_phys()") brought false leak alarms on some archs like arm64
that does not init pfn boundary in early booting. The final solution
lands on linux-6.0: commit 0c24e061196c ("mm: kmemleak: add rbtree and
store physical address for objects allocated with PA").

Revert this commit before linux-6.0. The original issue of invalid PA
can be mitigated by additional check in devicetree.

The false alarm report is as following: Kmemleak output: (Qemu/arm64)
unreferenced object 0xffff0000c0170a00 (size 128):
  comm "swapper/0", pid 1, jiffies 4294892404 (age 126.208s)
  hex dump (first 32 bytes):
 62 61 73 65 00 00 00 00 00 00 00 00 00 00 00 00  base............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<(____ptrval____)>] __kmalloc_track_caller+0x1b0/0x2e4
    [<(____ptrval____)>] kstrdup_const+0x8c/0xc4
    [<(____ptrval____)>] kvasprintf_const+0xbc/0xec
    [<(____ptrval____)>] kobject_set_name_vargs+0x58/0xe4
    [<(____ptrval____)>] kobject_add+0x84/0x100
    [<(____ptrval____)>] __of_attach_node_sysfs+0x78/0xec
    [<(____ptrval____)>] of_core_init+0x68/0x104
    [<(____ptrval____)>] driver_init+0x28/0x48
    [<(____ptrval____)>] do_basic_setup+0x14/0x28
    [<(____ptrval____)>] kernel_init_freeable+0x110/0x178
    [<(____ptrval____)>] kernel_init+0x20/0x1a0
    [<(____ptrval____)>] ret_from_fork+0x10/0x20

This pacth is also applicable to linux-5.17.y/linux-5.18.y/linux-5.19.y

Cc: <stable@vger.kernel.org>
Signed-off-by: Yee Lee <yee.lee@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6afb63d8

net: neigh: don't call kfree_skb() under spin_lock_irqsave() · afb3765b

Yang Yingliang authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 509e1b324aad95019ff14b8eb88fdfa500d9a489
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit d5485d9dd24e1d04e5509916515260186eb1455c upstream.

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

afb3765b

neigh: fix possible DoS due to net iface start/stop loop · 68d9a8b1

Denis V. Lunev authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 05fdce1ae744dee43c9181fd063c9c0db4f777f2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 66ba215cb51323e4e55e38fd5f250e0fae0cbc94 ]

Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
  procedure
* the request is put into the queue using pneigh_enqueue() if
  corresponding ARP record is not local (common case for container
  records on the host)
* the request is processed by timer (within 80 jiffies by default) and
  ARP reply is sent from the same arp_process() using
  NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
  pneigh_enqueue())

And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().

This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.

Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.

The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.

v2:
	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
	- rebase to net tree

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: kernel@openvz.org
Cc: devel@openvz.org
Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

68d9a8b1

mm/hugetlb: fix hugetlb not supporting softdirty tracking · 67a9ed93

David Hildenbrand authored 2 years ago

stable inclusion
from stable-v4.19.257
commit b18376f359baff4fa5132d3accd1a05cd8d526e4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees.  The first fix needs a
small adjustment.

This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping.  In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
         if (!fd) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.

Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com


Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

67a9ed93

asm-generic: sections: refactor memory_intersects · 3d014e93

Quanyang Wang authored 2 years ago

stable inclusion
from stable-v4.19.257
commit b5cc57e43f2e5c51ef6ee6d45d97ff7844e9d4ba
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.

There are two problems with the current code of memory_intersects:

First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).

The second problem is if vend is equal to begin, it will return true but
this is wrong since vend (virt + size) is not the last address of the
memory region but (virt + size -1) is.  The wrong determination will
trigger the misreporting when the function check_for_illegal_area calls
memory_intersects to check if the dma region intersects with stext region.

The misreporting is as below (stext is at 0x80100000):
 WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
 Modules linked in:
 CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
 Hardware name: Xilinx Zynq Platform
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xb0/0x198
  __warn from warn_slowpath_fmt+0x80/0xb4
  warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
  check_for_illegal_area from debug_dma_map_sg+0x94/0x368
  debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
  __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
  dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
  usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
  usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
  usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
  usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
  usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
  usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
  usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
  usb_stor_control_thread from kthread+0xf8/0x104
  kthread from ret_from_fork+0x14/0x2c

Refactor memory_intersects to fix the two problems above.

Before the 1d7db834a027e ("dma-debug: use memory_intersects()
directly"), memory_intersects is called only by printk_late_init:

printk_late_init -> init_section_intersects ->memory_intersects.

There were few places where memory_intersects was called.

When commit 1d7db834a027e ("dma-debug: use memory_intersects()
directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
subsystem uses it to check for an illegal area and the calltrace above
is triggered.

[akpm@linux-foundation.org: fix nearby comment typo]
Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com


Fixes: 97955936 ("asm/sections: add helpers to check for section data")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thierry Reding <treding@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

3d014e93

loop: Check for overflow while configuring loop · ba3f75d0

Siddh Raman Pant authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 2035c770bfdbcc82bd52e05871a7c82db9529e0f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
	lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
	WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e



Reported-and-tested-by:  <syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com>
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ba3f75d0

net: Fix a data-race around sysctl_somaxconn. · c608e294

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 1e4142b95269a394229c89d132f8e226fb1b4f71
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

c608e294

net: Fix a data-race around netdev_budget_usecs. · 87b0a8b8

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 2a341830849e1ced57e40455bcaa6fc9b79662d5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

87b0a8b8

net: Fix a data-race around netdev_budget. · f9c68f46

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit c3d0b6b3c808ff7d5439fbe6a41108bbb000ab26
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bded ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f9c68f46

net: Fix a data-race around sysctl_net_busy_read. · 094eb9a6

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d908fd060927cf23bae5b2975140c7ed81bd4013
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67f ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

094eb9a6