Commits · 5cabb5b5b8ab85c6862f3467613377d173df3630 · Summer2022 / 22b970264

Jul 21, 2022

sched: Introduce dynamic affinity for cfs scheduler · 5cabb5b5

Hui Tang authored 2 years ago

hulk inclusion
category: feature
bugzilla: 187173, https://gitee.com/openeuler/kernel/issues/I5G4IH


CVE: NA

--------------------------------

Dynamic affinity set preferred cpus for task. When the utilization of
taskgroup's preferred cpu is low, task only run in cpus preferred to
enhance cpu resource locality and reduce interference between task cgroups,
otherwise task can burst preferred cpus to use external cpu within
cpus allowed.

Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5cabb5b5

crypto: hisilicon/sec - don't sleep when in softirq · 2d4185ae

Zhengchao Shao authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GEVR


CVE: NA

--------------------------------

When kunpeng920 encryption driver is used to deencrypt and decrypt
packets during the softirq, it is not allowed to use mutex lock. The
kernel will report the following error:

BUG: scheduling while atomic: swapper/57/0/0x00000300
Call trace:
dump_backtrace+0x0/0x1e4
show_stack+0x20/0x2c
dump_stack+0xd8/0x140
__schedule_bug+0x68/0x80
__schedule+0x728/0x840
schedule+0x50/0xe0
schedule_preempt_disabled+0x18/0x24
__mutex_lock.constprop.0+0x594/0x5dc
__mutex_lock_slowpath+0x1c/0x30
mutex_lock+0x50/0x60
sec_request_init+0x8c/0x1a0 [hisi_sec2]
sec_process+0x28/0x1ac [hisi_sec2]
sec_skcipher_crypto+0xf4/0x1d4 [hisi_sec2]
sec_skcipher_encrypt+0x1c/0x30 [hisi_sec2]
crypto_skcipher_encrypt+0x2c/0x40
crypto_authenc_encrypt+0xc8/0xfc [authenc]
crypto_aead_encrypt+0x2c/0x40
echainiv_encrypt+0x144/0x1a0 [echainiv]
crypto_aead_encrypt+0x2c/0x40
esp_output_tail+0x348/0x5c0 [esp4]
esp_output+0x120/0x19c [esp4]
xfrm_output_one+0x25c/0x4d4
xfrm_output_resume+0x6c/0x1fc
xfrm_output+0xac/0x3c0
xfrm4_output+0x64/0x130
ip_build_and_send_pkt+0x158/0x20c
tcp_v4_send_synack+0xdc/0x1f0
tcp_conn_request+0x7d0/0x994
tcp_v4_conn_request+0x58/0x6c
tcp_v6_conn_request+0xf0/0x100
tcp_rcv_state_process+0x1cc/0xd60
tcp_v4_do_rcv+0x10c/0x250
tcp_v4_rcv+0xfc4/0x10a4
ip_protocol_deliver_rcu+0xf4/0x200
ip_local_deliver_finish+0x58/0x70
ip_local_deliver+0x68/0x120
ip_sublist_rcv_finish+0x70/0x94
ip_list_rcv_finish.constprop.0+0x17c/0x1d0
ip_sublist_rcv+0x40/0xb0
ip_list_rcv+0x140/0x1dc
__netif_receive_skb_list_core+0x154/0x28c
__netif_receive_skb_list+0x120/0x1a0
netif_receive_skb_list_internal+0xe4/0x1f0
napi_complete_done+0x70/0x1f0
gro_cell_poll+0x9c/0xb0
napi_poll+0xcc/0x264
net_rx_action+0xd4/0x21c
__do_softirq+0x130/0x358
irq_exit+0x11c/0x13c
__handle_domain_irq+0x88/0xf0
gic_handle_irq+0x78/0x2c0
el1_irq+0xb8/0x140
arch_cpu_idle+0x18/0x40
default_idle_call+0x5c/0x1c0
cpuidle_idle_call+0x174/0x1b0
do_idle+0xc8/0x160
cpu_startup_entry+0x30/0x11c
secondary_start_kernel+0x158/0x1e4
softirq: huh, entered softirq 3 NET_RX 0000000093774ee4 with
preempt_count 00000100, exited with fffffe00?

Fixes: 416d8220 ("crypto: hisilicon - add HiSilicon SEC V2 driver")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

2d4185ae

video: fbdev: sm712fb: Fix crash in smtcfb_write() · ffc9dfba

Zheyu Ma authored 2 years ago

stable inclusion
from stable-v4.19.238
commit 1aea36a62f0a0ad67eccc945bac0bd6422ef720f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GKWU


CVE: CVE-2022-2380

--------------------------------

When the sm712fb driver writes three bytes to the framebuffer, the
driver will crash:

    BUG: unable to handle page fault for address: ffffc90001ffffff
    RIP: 0010:smtcfb_write+0x454/0x5b0
    Call Trace:
     vfs_write+0x291/0xd60
     ? do_sys_openat2+0x27d/0x350
     ? __fget_light+0x54/0x340
     ksys_write+0xce/0x190
     do_syscall_64+0x43/0x90
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix it by removing the open-coded endianness fixup-code.

Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Xia Longlong <xialonglong1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ffc9dfba

video: fbdev: sm712fb: Fix crash in smtcfb_read() · 45ea46ea

Helge Deller authored 2 years ago

stable inclusion
from stable-v4.19.238
commit 1caa40af491dcfe17b3ae870a854388d8ea01984
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GKWU


CVE: CVE-2022-2380

--------------------------------

Zheyu Ma reported this crash in the sm712fb driver when reading
three bytes from the framebuffer:

 BUG: unable to handle page fault for address: ffffc90001ffffff
 RIP: 0010:smtcfb_read+0x230/0x3e0
 Call Trace:
  vfs_read+0x198/0xa00
  ? do_sys_openat2+0x27d/0x350
  ? __fget_light+0x54/0x340
  ksys_read+0xce/0x190
  do_syscall_64+0x43/0x90

Fix it by removing the open-coded endianess fixup-code and
by moving the pointer post decrement out the fb_readl() function.

Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Zheyu Ma <zheyuma97@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Xia Longlong <xialonglong1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

45ea46ea

Jul 20, 2022

scsi: ses: fix slab-out-of-bounds in ses_enclosure_data_process · b101167f

Zhang Wensheng authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187025, https://gitee.com/src-openeuler/kernel/issues/I5HPS5


CVE: NA

--------------------------------

Kasan report a bug like below:
[  494.865170] ==================================================================
[  494.901335] BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x234/0x6f0 [ses]
[  494.901347] Write of size 1 at addr ffff8882f3181a70 by task systemd-udevd/1704
[  494.931929] i801_smbus 0000:00:1f.4: SPD Write Disable is set

[  494.944092] CPU: 12 PID: 1704 Comm: systemd-udevd Tainted: G
[  494.944101] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.01 11/13/2019
[  494.964003] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[  494.978532] Call Trace:
[  494.978544]  dump_stack+0xbe/0xf9
[  494.978558]  print_address_description.constprop.0+0x19/0x130
[  495.092838]  ? ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092846]  __kasan_report.cold+0x68/0x80
[  495.092855]  ? __kasan_kmalloc.constprop.0+0x71/0xd0
[  495.092862]  ? ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092868]  kasan_report+0x3a/0x50
[  495.092875]  ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092882]  ? mutex_unlock+0x1d/0x40
[  495.092889]  ses_intf_add+0x57f/0x910 [ses]
[  495.092900]  class_interface_register+0x26d/0x290
[  495.092906]  ? class_destroy+0xd0/0xd0
[  495.092912]  ? 0xffffffffc0bf8000
[  495.092919]  ses_init+0x18/0x1000 [ses]
[  495.092927]  do_one_initcall+0xcb/0x370
[  495.092934]  ? initcall_blacklisted+0x1b0/0x1b0
[  495.092942]  ? create_object.isra.0+0x330/0x3a0
[  495.092950]  ? kasan_unpoison_shadow+0x33/0x40
[  495.092957]  ? kasan_unpoison_shadow+0x33/0x40
[  495.092966]  do_init_module+0xe4/0x3a0
[  495.092972]  load_module+0xd0a/0xdd0
[  495.092980]  ? layout_and_allocate+0x300/0x300
[  495.092989]  ? seccomp_run_filters+0x1d6/0x2c0
[  495.092999]  ? kernel_read_file_from_fd+0xb3/0xe0
[  495.093006]  __se_sys_finit_module+0x11b/0x1b0
[  495.093012]  ? __ia32_sys_init_module+0x40/0x40
[  495.093023]  ? __audit_syscall_entry+0x226/0x290
[  495.093032]  do_syscall_64+0x33/0x40
[  495.093041]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  495.093046] RIP: 0033:0x7f39c3376089
[  495.093054] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 dd 0b 00 f7 d8 64 89 01 48
[  495.093058] RSP: 002b:00007ffdc6009e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  495.093068] RAX: ffffffffffffffda RBX: 000055d4192801c0 RCX: 00007f39c3376089
[  495.093072] RDX: 0000000000000000 RSI: 00007f39c2fae99d RDI: 000000000000000f
[  495.093076] RBP: 00007f39c2fae99d R08: 0000000000000000 R09: 0000000000000001
[  495.093080] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
[  495.093084] R13: 000055d419282e00 R14: 0000000000020000 R15: 000055d41927f1f0

[  495.093091] Allocated by task 1704:
[  495.093098]  kasan_save_stack+0x1b/0x40
[  495.093105]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  495.093111]  ses_enclosure_data_process+0x65d/0x6f0 [ses]
[  495.093117]  ses_intf_add+0x57f/0x910 [ses]
[  495.093123]  class_interface_register+0x26d/0x290
[  495.093129]  ses_init+0x18/0x1000 [ses]
[  495.093134]  do_one_initcall+0xcb/0x370
[  495.093139]  do_init_module+0xe4/0x3a0
[  495.093144]  load_module+0xd0a/0xdd0
[  495.093150]  __se_sys_finit_module+0x11b/0x1b0
[  495.093155]  do_syscall_64+0x33/0x40
[  495.093162]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  495.093168] The buggy address belongs to the object at ffff8882f3181a40
                which belongs to the cache kmalloc-64 of size 64
[  495.093173] The buggy address is located 48 bytes inside of
                64-byte region [ffff8882f3181a40, ffff8882f3181a80)
[  495.093175] The buggy address belongs to the page:
[  495.093181] page:ffffea000bcc6000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f3180
[  495.093186] head:ffffea000bcc6000 order:2 compound_mapcount:0 compound_pincount:0
[  495.093194] flags: 0x17ffe0000010200(slab|head|node=0|zone=2|lastcpupid=0x3fff)
[  495.093204] raw: 017ffe0000010200 ffffea0016e5fb08 ffffea0016921508 ffff888100050e00
[  495.093211] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[  495.093213] page dumped because: kasan: bad access detected

[  495.093216] Memory state around the buggy address:
[  495.093222]  ffff8882f3181900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093227]  ffff8882f3181980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093231] >ffff8882f3181a00: fc fc fc fc fc fc fc fc 00 00 00 00 01 fc fc fc
[  495.093234]                                                              ^
[  495.093239]  ffff8882f3181a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093244]  ffff8882f3181b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093246] ==================================================================

After analysis on vmcore, it was found that the line "desc_ptr[len] =
'\0';" has slab-out-of-bounds problem in ses_enclosure_data_process.
In ses_enclosure_data_process, "desc_ptr" point to "buf", so it have
to be limited in the memory of "buf", however. although there is
"desc_ptr >= buf + page7_len" judgment, it does not work because
"desc_ptr + 4 + len" may bigger than "buf + page7_len", which will
lead to slab-out-of-bounds problem.

Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b101167f

block: don't delete queue kobject before its children · 0698a51d

Eric Biggers authored 3 years ago

stable inclusion
from linux-4.19.238
commit b2001eb10f59363da930cdd6e086a2861986fa18
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5EWKO


CVE: NA

--------------------------------

[ Upstream commit 0f69288253e9fc7c495047720e523b9f1aba5712 ]

kobjects aren't supposed to be deleted before their child kobjects are
deleted.  Apparently this is usually benign; however, a WARN will be
triggered if one of the child kobjects has a named attribute group:

    sysfs group 'modes' not found for kobject 'crypto'
    WARNING: CPU: 0 PID: 1 at fs/sysfs/group.c:278 sysfs_remove_group+0x72/0x80
    ...
    Call Trace:
      sysfs_remove_groups+0x29/0x40 fs/sysfs/group.c:312
      __kobject_del+0x20/0x80 lib/kobject.c:611
      kobject_cleanup+0xa4/0x140 lib/kobject.c:696
      kobject_release lib/kobject.c:736 [inline]
      kref_put include/linux/kref.h:65 [inline]
      kobject_put+0x53/0x70 lib/kobject.c:753
      blk_crypto_sysfs_unregister+0x10/0x20 block/blk-crypto-sysfs.c:159
      blk_unregister_queue+0xb0/0x110 block/blk-sysfs.c:962
      del_gendisk+0x117/0x250 block/genhd.c:610

Fix this by moving the kobject_del() and the corresponding
kobject_uevent() to the correct place.

Fixes: 2c2086af ("block: Protect less code with sysfs_lock in blk_{un,}register_queue()")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124215938.2769-3-ebiggers@kernel.org


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

Conflict:
	block/blk-sysfs.c
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

0698a51d

etmem:fix kernel stack overflow in do_swapcache_reclaim · 89ea0cd3

liubo authored 3 years ago

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GN7K


CVE: NA

--------------------------------
In the do_swapcache_reclaim interface,
there are the following local variables.

unsigned long nr[MAX_NUMNODES],
unsigned long nr_to_reclaim[MAX_NUMNODES],
struct list_head swapcache_list[MAX_NUMNODES],

In the kernel, MAX_NUMNODES is defined as follows:

Under the x86_64 architecture, CONFIG_NODES_SHIFT is
defined as follows:
CONFIG_NODES_SHIFT=10

Therefore, under the X86_64 architecture, local variables
may cause kernel stack overflow.

Modify the above variable acquisition method
and change it to dynamic application.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: wangkefeng <wangkefeng.wang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

89ea0cd3

etmem:fix kasan slab-out-of-bounds in do_swapcache_reclaim · c9a24119

liubo authored 3 years ago

euleros inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GN7K


CVE: NA

--------------------------------
In the do_swapcache_reclaim interface,
there is a slab-out-of-bounds kasan problem;

The reason for the problem is that when
list_for_each_entry_safe_reverse_from traverses
the LRU linked list, it does not consider that next may be
equal to the head address, which may lead to the
head address being accessed as the page address,
causing problems.

In response to the above problems,
add a judgment about whether pos is head.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: wangkefeng <wangkefeng.wang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

c9a24119

nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed · 622ecb59

余快 authored 3 years ago

mainline inclusion
from mainline-v5.19-rc1
commit 2895f1831e911ca87d4efdf43e35eb72a0c7e66e
category: bugfix
bugzilla: 187081, https://gitee.com/src-openeuler/kernel/issues/I5H341


CVE: NA

--------------------------------

Otherwise io will hung because request will only be completed if the
cmd has the flag 'NBD_CMD_INFLIGHT'.

Fixes: 07175cb1baf4 ("nbd: make sure request completion won't concurrent")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-4-yukuai3@huawei.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: fake timeout is not supported yet, clear_bit() in
nbd_handle_reply() directly.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

622ecb59

blk-throttle: fix io hung due to configuration updates · f4df027e

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 186731, https://gitee.com/src-openeuler/kernel/issues/I5HWTF


CVE: NA

--------------------------------

If new configuration is submitted while a bio is throttled, then new
waiting time is recaculated regardless that the bio might aready wait
for some time:

tg_conf_updated
 throtl_start_new_slice
  tg_update_disptime
  throtl_schedule_next_dispatch

Then io hung can be triggered by always submmiting new configuration
before the throttled bio is dispatched.

Fix the problem by respecting the time that throttled bio aready waited.
In order to do that, instead of start new slice in tg_conf_updated(),
just update 'bytes_disp' and 'io_disp' based on the new configuration.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f4df027e

block: fix NULL pointer dereference in disk_release() · dc2c2374

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 186769, https://gitee.com/openeuler/kernel/issues/I5FYJY


CVE: NA

--------------------------------

Our test report a crash:

run fstests generic/349 at 2022-05-20 20:55:10
sd 3:0:0:0: Power-on or device reset occurred
BUG: kernel NULL pointer dereference, address: 0000000000000030
Call Trace:
 disk_release+0x42/0x170
 device_release+0x92/0x120
 kobject_put+0x183/0x350
 put_disk+0x23/0x30
 sg_device_destroy+0x77/0xd0
 sg_remove_device+0x1b8/0x220
 device_del+0x19b/0x610
 ? kfree_const+0x3e/0x50
 ? kobject_put+0x1d1/0x350
 device_unregister+0x36/0xa0
 __scsi_remove_device+0x1ba/0x240
 scsi_forget_host+0x95/0xd0
 scsi_remove_host+0xba/0x1f0
 sdebug_driver_remove+0x30/0x110 [scsi_debug]
 device_release_driver_internal+0x1ab/0x340
 device_release_driver+0x16/0x20
 bus_remove_device+0x167/0x220
 device_del+0x23e/0x610
 device_unregister+0x36/0xa0
 sdebug_do_remove_host+0x159/0x190 [scsi_debug]
 scsi_debug_exit+0x2d/0x120 [scsi_debug]
 __se_sys_delete_module+0x34c/0x420
 ? exit_to_user_mode_prepare+0x93/0x210
 __x64_sys_delete_module+0x1a/0x30
 do_syscall_64+0x4d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Such crash happened since commit 2a19b28f7929 ("blk-mq: cancel blk-mq
dispatch work in both blk_cleanup_queue and disk_release()") was
backported from mainline.

commit 61a35cfc2633 ("block: hold a request_queue reference for the
lifetime of struct gendisk") is not backported, thus we can't ensure
request_queue still exist in disk_release(), and that's why
blk_mq_cancel_work_sync() will triggered the problem in disk_release().
However, in order to backport it, there are too many relied patches and
kabi will be broken.

Since we didn't backport related patches to tear down file system I/O in
del_gendisk, which fix issues introduced by refactor patches to move bdi
from request_queue to the disk, there is no need to call
blk_mq_cancel_work_sync() from disk_release(). This patch just remove
blk_mq_cancel_work_sync() from disk_release() to fix the above crash.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

dc2c2374

block, bfq: make bfq_has_work() more accurate · e27ff551

余快 authored 3 years ago

mainline inclusion
from mainline-v5.19-rc1
commit ddc25c86b466d2359b57bc7798f167baa1735a44
category: bugfix
bugzilla: 186769, https://gitee.com/openeuler/kernel/issues/I5FYJY


CVE: NA

--------------------------------

bfq_has_work() is using busy_queues currently, which is not accurate
because bfq_queue is busy doesn't represent that it has requests. Since
bfqd aready has a counter 'queued' to record how many requests are in
bfq, use it instead of busy_queues.

Noted that bfq_has_work() can be called with 'bfqd->lock' held, thus the
lock can't be held in bfq_has_work() to protect 'bfqd->queued'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220513023507.2625717-3-yukuai3@huawei.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e27ff551

blk-mq: fix panic during blk_mq_run_work_fn() · ea93e21d

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 186769, https://gitee.com/openeuler/kernel/issues/I5FYJY


CVE: NA

--------------------------------

Our test report a following crash:

BUG: kernel NULL pointer dereference, address: 0000000000000018
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 6 PID: 265 Comm: kworker/6:1H Kdump: loaded Tainted: G           O      5.10.0-60.17.0.h43.eulerosv2r11.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:blk_mq_delay_run_hw_queues+0xb6/0xe0
RSP: 0018:ffffacc6803d3d88 EFLAGS: 00010246
RAX: 0000000000000006 RBX: ffff99e2c3d25008 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff99e2c911ae18
RBP: ffffacc6803d3dd8 R08: 0000000000000000 R09: ffff99e2c0901f6c
R10: 0000000000000018 R11: 0000000000000018 R12: ffff99e2c911ae18
R13: 0000000000000000 R14: 0000000000000003 R15: ffff99e2c911ae18
FS:  0000000000000000(0000) GS:ffff99e6bbf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000018 CR3: 000000007460a006 CR4: 00000000003706e0
Call Trace:
 __blk_mq_do_dispatch_sched+0x2a7/0x2c0
 ? newidle_balance+0x23e/0x2f0
 __blk_mq_sched_dispatch_requests+0x13f/0x190
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x47/0xd0
 process_one_work+0x1b0/0x350
 worker_thread+0x49/0x300
 ? rescuer_thread+0x3a0/0x3a0
 kthread+0xfe/0x140
 ? kthread_park+0x90/0x90
 ret_from_fork+0x22/0x30

After digging from vmcore, I found that the queue is cleaned
up(blk_cleanup_queue() is done) and tag set is
freed(blk_mq_free_tag_set() is done).

There are two problems here:

1) blk_mq_delay_run_hw_queues() will only be called from
__blk_mq_do_dispatch_sched() if e->type->ops.has_work() return true.
This seems impossible because blk_cleanup_queue() is done, and there
should be no io. However, bfq_has_work() can return true even if no
io is queued. This is because bfq_has_work() is using busy queues, and
bfq_queue can stay busy after dispatching all the requests.

2) 'hctx->run_work' still exists after blk_cleanup_queue().
blk_mq_cancel_work_sync() is called from blk_cleanup_queue() to cancel
all the 'run_work'. However, there is no guarantee that new 'run_work'
won't be queued after that(and before blk_mq_exit_queue() is done).

The first problem is not the root cause, this patch just fix the second
problem by grabbing 'q_usage_counter' before queuing 'hctx->run_work'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ea93e21d

blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() · a27f3109

Ming Lei authored 3 years ago

mainline inclusion
from mainline-v5.16-rc2
commit 2a19b28f7929866e1cec92a3619f4de9f2d20005
category: bugfix
bugzilla: 186769, https://gitee.com/openeuler/kernel/issues/I5FYJY


CVE: NA

--------------------------------

For avoiding to slow down queue destroy, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(), instead of delaying to
cancel dispatch work in blk_release_queue().

However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.

Fixes the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():

1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.

2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.

[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS:  0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055]  <TASK>
[12622.918394]  scsi_mq_get_budget+0x1a/0x110
[12622.922969]  __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404]  ? pick_next_task_fair+0x39/0x390
[12622.933268]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194]  blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829]  __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593]  process_one_work+0x1e8/0x3c0
[12622.954059]  worker_thread+0x50/0x3b0
[12622.958144]  ? rescuer_thread+0x370/0x370
[12622.962616]  kthread+0x158/0x180
[12622.966218]  ? set_kthread_struct+0x40/0x40
[12622.970884]  ret_from_fork+0x22/0x30
[12622.974875]  </TASK>
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]

Reported-by: ChanghuiZhong <czhong@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
./block/blk-mq.c
./block/blk-mq.h
./block/blk-sysfs.c
./block/genhd.c
./block/blk-core.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a27f3109

blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue · 1cf7ad3c

Yang Yang authored 3 years ago

mainline inclusion
from mainline-v5.10-rc1
commit 47ce030b
category: bugfix
bugzilla: 186769, https://gitee.com/openeuler/kernel/issues/I5FYJY


CVE: NA

--------------------------------

blk_exit_queue will free elevator_data, while blk_mq_run_work_fn
will access it. Move cancel of hctx->run_work to the front of
blk_exit_queue to avoid use-after-free.

Fixes: 1b97871b ("blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

1cf7ad3c

ext4: fix race condition between ext4_ioctl_setflags and ext4_fiemap · b8493652

Baokun Li authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187222, https://gitee.com/openeuler/kernel/issues/I5H3KE


CVE: NA

--------------------------------

Hulk Robot reported a BUG:

==================================================================
kernel BUG at fs/ext4/extents_status.c:762!
invalid opcode: 0000 [#1] SMP KASAN PTI
[...]
Call Trace:
 ext4_cache_extents+0x238/0x2f0
 ext4_find_extent+0x785/0xa40
 ext4_fiemap+0x36d/0xe90
 do_vfs_ioctl+0x6af/0x1200
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
           cpu1		    cpu2
_____________________|_____________________
do_vfs_ioctl
 ext4_ioctl
  ext4_ioctl_setflags
   ext4_ind_migrate
                        do_vfs_ioctl
                         ioctl_fiemap
                          ext4_fiemap
                           ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)
                           ext4_fill_fiemap_extents
    down_write(&EXT4_I(inode)->i_data_sem);
    ext4_ext_check_inode
    ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS)
    memset(ei->i_data, 0, sizeof(ei->i_data))
    up_write(&EXT4_I(inode)->i_data_sem);
                            down_read(&EXT4_I(inode)->i_data_sem);
                            ext4_find_extent
                             ext4_cache_extents
                              ext4_es_cache_extent
                               BUG_ON(end < lblk)

We can easily reproduce this problem with the syzkaller testcase:
```
02:37:07 executing program 3:
r0 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x26e1, 0x0)
ioctl$FS_IOC_FSSETXATTR(r0, 0x40086602, &(0x7f0000000080)={0x17e})
mkdirat(0xffffffffffffff9c, &(0x7f00000000c0)='./file1\x00', 0x1ff)
r1 = openat(0xffffffffffffff9c, &(0x7f0000000100)='./file1\x00', 0x0, 0x0)
ioctl$FS_IOC_FIEMAP(r1, 0xc020660b, &(0x7f0000000180)={0x0, 0x1, 0x0, 0xef3, 0x6, []}) (async, rerun: 32)
ioctl$FS_IOC_FSSETXATTR(r1, 0x40086602, &(0x7f0000000140)={0x17e}) (rerun: 32)
```

To solve this issue, we use __generic_block_fiemap() instead of
generic_block_fiemap() and add inode_lock_shared to avoid race condition.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b8493652

Jul 14, 2022

block: fix that part scan is disabled in device_add_disk() · faf2662e

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

Commit f20a726b ("block: Fix warning in bd_link_disk_holder()")
moves the setting of flag 'GENHD_FL_UP' behind blkdev_get, which will
disabled part scan:

devcie_add_disk
 register_disk
  blkdev_get
   __blkdev_get
    bdev_get_gendisk
     get_gendisk -> failed because 'GENHD_FL_UP' is not set

And this will cause tests block/017, block/018 and scsi/004 to fail.

Fix the problem by moving part scan as well.

Fixes: f20a726b ("block: Fix warning in bd_link_disk_holder()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2207.3.0

faf2662e

Revert "block: rename bd_invalidated" · 13879644

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

This reverts commit b6113052.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

13879644

Revert "block: move the NEED_PART_SCAN flag to struct gendisk" · acc38712

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

This reverts commit b2f0e44f.

Because it will introduce following problem in ltp zram tests:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000600
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 28 PID: 172121 Comm: sh Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0+ #2
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 5.15 05/21/2019
RIP: 0010:flush_disk+0x1d/0x50
RSP: 0018:ffffaf14a516fe20 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff899e26bac380 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff899e26bac380
RBP: ffff899e26bac380 R08: 00000000000006a9 R09: 0000000000000004
R10: ffff89cd878ff440 R11: 0000000000000001 R12: 0000000000000000
R13: ffff899e26bac398 R14: ffffaf14a516ff00 R15: ffff89cd8709c3e0
FS:  00007f78d6840740(0000) GS:ffff89fcbf480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000600 CR3: 000000308afc0002 CR4: 00000000001606e0
Call Trace:
 revalidate_disk+0x57/0x80
 reset_store+0xaf/0x120 [zram]
 kernfs_fop_write+0x10f/0x190
 vfs_write+0xad/0x1a0
 ksys_write+0x52/0xc0
 do_syscall_64+0x5d/0x1d0
 entry_SYSCALL_64_after_hwframe+0x65/0xca

This is because "bdev->bd_disk" is not ensured to exist, just convert
"set_bit(BDEV_NEED_PART_SCAN, &bdev->bd_flags)" to
"set_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state)" is wrong.

The reason to backport it is that commit 2a57456c8973 ("block:
Fix warning in bd_link_disk_holder()") has a regression that part scan
is disabled in device_add_disk(), and this problem will be fixed in
later patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

acc38712

Revert "block:Fix kabi broken" · 3e2bfb3a

余快 authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

This reverts commit 64ba823f.

The patches that broke kabi will be reverted together.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

3e2bfb3a

rcu/tree: Mark functions as notrace · 9b5e728e

Zheng Yejian authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187209, https://gitee.com/openeuler/kernel/issues/I5GWFT
CVE: NA

--------------------------------

Syzkaller report a softlockup problem, see following logs:
  [   41.463870] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!  [ksoftirqd/0:9]
  [   41.509763] Modules linked in:
  [   41.512295] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.90 #13
  [   41.516134] Hardware name: linux,dummy-virt (DT)
  [   41.519182] pstate: 80c00005 (Nzcv daif +PAN +UAO)
  [   41.522415] pc : perf_trace_buf_alloc+0x138/0x238
  [   41.525583] lr : perf_trace_buf_alloc+0x138/0x238
  [   41.528656] sp : ffff8000c137e880
  [   41.531050] x29: ffff8000c137e880 x28: ffff20000850ced0
  [   41.534759] x27: 0000000000000000 x26: ffff8000c137e9c0
  [   41.538456] x25: ffff8000ce5c2ae0 x24: ffff200008358b08
  [   41.542151] x23: 0000000000000000 x22: ffff2000084a50ac
  [   41.545834] x21: ffff8000c137e880 x20: 000000000000001c
  [   41.549516] x19: ffff7dffbfdf88e8 x18: 0000000000000000
  [   41.553202] x17: 0000000000000000 x16: 0000000000000000
  [   41.556892] x15: 1ffff00036e07805 x14: 0000000000000000
  [   41.560592] x13: 0000000000000004 x12: 0000000000000000
  [   41.564315] x11: 1fffefbff7fbf120 x10: ffff0fbff7fbf120
  [   41.568003] x9 : dfff200000000000 x8 : ffff7dffbfdf8904
  [   41.571699] x7 : 0000000000000000 x6 : ffff0fbff7fbf121
  [   41.575398] x5 : ffff0fbff7fbf121 x4 : ffff0fbff7fbf121
  [   41.579086] x3 : ffff20000850cdc8 x2 : 0000000000000008
  [   41.582773] x1 : ffff8000c1376000 x0 : 0000000000000100
  [   41.586495] Call trace:
  [   41.588922]  perf_trace_buf_alloc+0x138/0x238
  [   41.591912]  perf_ftrace_function_call+0x1ac/0x248
  [   41.595123]  ftrace_ops_no_ops+0x3a4/0x488
  [   41.597998]  ftrace_graph_call+0x0/0xc
  [   41.600715]  rcu_dynticks_curr_cpu_in_eqs+0x14/0x70
  [   41.603962]  rcu_is_watching+0xc/0x20
  [   41.606635]  ftrace_ops_no_ops+0x240/0x488
  [   41.609530]  ftrace_graph_call+0x0/0xc
  [   41.612249]  __read_once_size_nocheck.constprop.0+0x1c/0x38
  [   41.615905]  unwind_frame+0x140/0x358
  [   41.618597]  walk_stackframe+0x34/0x60
  [   41.621359]  __save_stack_trace+0x204/0x3b8
  [   41.624328]  save_stack_trace+0x2c/0x38
  [   41.627112]  __kasan_slab_free+0x120/0x228
  [   41.630018]  kasan_slab_free+0x10/0x18
  [   41.632752]  kfree+0x84/0x250
  [   41.635107]  skb_free_head+0x70/0xb0
  [   41.637772]  skb_release_data+0x3f8/0x730
  [   41.640626]  skb_release_all+0x50/0x68
  [   41.643350]  kfree_skb+0x84/0x278
  [   41.645890]  kfree_skb_list+0x4c/0x78
  [   41.648595]  __dev_queue_xmit+0x1a4c/0x23a0
  [   41.651541]  dev_queue_xmit+0x28/0x38
  [   41.654254]  ip6_finish_output2+0xeb0/0x1630
  [   41.657261]  ip6_finish_output+0x2d8/0x7f8
  [   41.660174]  ip6_output+0x19c/0x348
  [   41.663850]  mld_sendpack+0x560/0x9e0
  [   41.666564]  mld_ifc_timer_expire+0x484/0x8a8
  [   41.669624]  call_timer_fn+0x68/0x4b0
  [   41.672355]  expire_timers+0x168/0x498
  [   41.675126]  run_timer_softirq+0x230/0x7a8
  [   41.678052]  __do_softirq+0x2d0/0xba0
  [   41.680763]  run_ksoftirqd+0x110/0x1a0
  [   41.683512]  smpboot_thread_fn+0x31c/0x620
  [   41.686429]  kthread+0x2c8/0x348
  [   41.688927]  ret_from_fork+0x10/0x18

Look into above call stack, we found a recursive call in
'ftrace_graph_call', see a snippet:
    __read_once_size_nocheck.constprop.0
      ftrace_graph_call
        ......
          rcu_dynticks_curr_cpu_in_eqs
            ftrace_graph_call

We analyze that 'rcu_dynticks_curr_cpu_in_eqs' should not be tracable,
and we verify that mark related functions as 'notrace' can avoid the
problem.

Comparing mainline kernel, we find that commit ff5c4f5c ("rcu/tree:
Mark the idle relevant functions noinstr") mark related functions as
'noinstr' which implies notrace, noinline and sticks things in the
.noinstr.text section.
Link: https://lore.kernel.org/all/20200416114706.625340212@infradead.org/



Currently 'noinstr' mechanism has not been introduced, so we would not
directly backport that commit (otherwise more changes may be introduced).
Instead, we mark the functions as 'notrace' where it is 'noinstr' in
that commit.

Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

9b5e728e

Jul 13, 2022

netfilter: nf_tables: stricter validation of element data · 29b7a991

Pablo Neira Ayuso authored 3 years ago

mainline inclusion
from mainline-v5.19-rc6
commit 7e6bc1f6cabcd30aba0b11219d8e01b952eacbb6
category: bugfix
bugzilla: 187147, https://gitee.com/src-openeuler/kernel/issues/I5GCQH


CVE: CVE-2022-34918

--------------------------------

Make sure element data type and length do not mismatch the one specified
by the set declaration.

Fixes: 7d740264 ("netfilter: nf_tables: variable sized set element keys / data")
Reported-by: Hugues ANGUELKOV <hanguelkov@randorisec.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

conflict:
	net/netfilter/nf_tables_api.c

Signed-off-by: Lu Wei <luwei32@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

29b7a991

net: rose: fix UAF bugs caused by timer handler · 390f573b

Duoming Zhou authored 3 years ago

stable inclusion
from stable-v4.19.251
commit 2661f2d88f40e35791257d73def0319b4560b74b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5FNWN


CVE: CVE-2022-2318

--------------------------------

commit 9cc02ede696272c5271a401e4f27c262359bc2f6 upstream.

There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry()
and rose_idletimer_expiry(). The root cause is that del_timer()
could not stop the timer handler that is running and the refcount
of sock is not managed properly.

One of the UAF bugs is shown below:

    (thread 1)          |        (thread 2)
                        |  rose_bind
                        |  rose_connect
                        |    rose_start_heartbeat
rose_release            |    (wait a time)
  case ROSE_STATE_0     |
  rose_destroy_socket   |  rose_heartbeat_expiry
    rose_stop_heartbeat |
    sock_put(sk)        |    ...
  sock_put(sk) // FREE  |
                        |    bh_lock_sock(sk) // USE

The sock is deallocated by sock_put() in rose_release() and
then used by bh_lock_sock() in rose_heartbeat_expiry().

Although rose_destroy_socket() calls rose_stop_heartbeat(),
it could not stop the timer that is running.

The KASAN report triggered by POC is shown below:

BUG: KASAN: use-after-free in _raw_spin_lock+0x5a/0x110
Write of size 4 at addr ffff88800ae59098 by task swapper/3/0
...
Call Trace:
 <IRQ>
 dump_stack_lvl+0xbf/0xee
 print_address_description+0x7b/0x440
 print_report+0x101/0x230
 ? irq_work_single+0xbb/0x140
 ? _raw_spin_lock+0x5a/0x110
 kasan_report+0xed/0x120
 ? _raw_spin_lock+0x5a/0x110
 kasan_check_range+0x2bd/0x2e0
 _raw_spin_lock+0x5a/0x110
 rose_heartbeat_expiry+0x39/0x370
 ? rose_start_heartbeat+0xb0/0xb0
 call_timer_fn+0x2d/0x1c0
 ? rose_start_heartbeat+0xb0/0xb0
 expire_timers+0x1f3/0x320
 __run_timers+0x3ff/0x4d0
 run_timer_softirq+0x41/0x80
 __do_softirq+0x233/0x544
 irq_exit_rcu+0x41/0xa0
 sysvec_apic_timer_interrupt+0x8c/0xb0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1b/0x20
RIP: 0010:default_idle+0xb/0x10
RSP: 0018:ffffc9000012fea0 EFLAGS: 00000202
RAX: 000000000000bcae RBX: ffff888006660f00 RCX: 000000000000bcae
RDX: 0000000000000001 RSI: ffffffff843a11c0 RDI: ffffffff843a1180
RBP: dffffc0000000000 R08: dffffc0000000000 R09: ffffed100da36d46
R10: dfffe9100da36d47 R11: ffffffff83cf0950 R12: 0000000000000000
R13: 1ffff11000ccc1e0 R14: ffffffff8542af28 R15: dffffc0000000000
...
Allocated by task 146:
 __kasan_kmalloc+0xc4/0xf0
 sk_prot_alloc+0xdd/0x1a0
 sk_alloc+0x2d/0x4e0
 rose_create+0x7b/0x330
 __sock_create+0x2dd/0x640
 __sys_socket+0xc7/0x270
 __x64_sys_socket+0x71/0x80
 do_syscall_64+0x43/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Freed by task 152:
 kasan_set_track+0x4c/0x70
 kasan_set_free_info+0x1f/0x40
 ____kasan_slab_free+0x124/0x190
 kfree+0xd3/0x270
 __sk_destruct+0x314/0x460
 rose_release+0x2fa/0x3b0
 sock_close+0xcb/0x230
 __fput+0x2d9/0x650
 task_work_run+0xd6/0x160
 exit_to_user_mode_loop+0xc7/0xd0
 exit_to_user_mode_prepare+0x4e/0x80
 syscall_exit_to_user_mode+0x20/0x40
 do_syscall_64+0x4f/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch adds refcount of sock when we use functions
such as rose_start_heartbeat() and so on to start timer,
and decreases the refcount of sock when timer is finished
or deleted by functions such as rose_stop_heartbeat()
and so on. As a result, the UAF bugs could be mitigated.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Tested-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220629002640.5693-1-duoming@zju.edu.cn


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

390f573b

xen/arm: Fix race in RB-tree based P2M accounting · 85886239

Oleksandr Tyshchenko authored 3 years ago

mainline inclusion
from mainline-v5.19-rc6
commit b75cd218274e01d026dc5240e86fdeb44bbed0c8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5G5TV


CVE: CVE-2022-33744

--------------------------------

xen/arm: Fix race in RB-tree based P2M accounting

During the PV driver life cycle the mappings are added to
the RB-tree by set_foreign_p2m_mapping(), which is called from
gnttab_map_refs() and are removed by clear_foreign_p2m_mapping()
which is called from gnttab_unmap_refs(). As both functions end
up calling __set_phys_to_machine_multi() which updates the RB-tree,
this function can be called concurrently.

There is already a "p2m_lock" to protect against concurrent accesses,
but the problem is that the first read of "phys_to_mach.rb_node"
in __set_phys_to_machine_multi() is not covered by it, so this might
lead to the incorrect mappings update (removing in our case) in RB-tree.

In my environment the related issue happens rarely and only when
PV net backend is running, the xen_add_phys_to_mach_entry() claims
that it cannot add new pfn <-> mfn mapping to the tree since it is
already exists which results in a failure when mapping foreign pages.

But there might be other bad consequences related to the non-protected
root reads such use-after-free, etc.

While at it, also fix the similar usage in __pfn_to_mfn(), so
initialize "struct rb_node *n" with the "p2m_lock" held in both
functions to avoid possible bad consequences.

This is CVE-2022-33744 / XSA-406.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

85886239

vt: drop old FONT ioctls · d0f0f567

Jiri Slaby authored 3 years ago

stable inclusion
from stable-4.19.250
commit b15d5731b708a2190fec836990b8aefbbf36b07a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GKVX


CVE: CVE-2021-33656

--------------------------------

commit ff2047fb755d4415ec3c70ac799889371151796d upstream.

Drop support for these ioctls:
* PIO_FONT, PIO_FONTX
* GIO_FONT, GIO_FONTX
* PIO_FONTRESET

As was demonstrated by commit 90bfdeef (tty: make FONTX ioctl use
the tty pointer they were actually passed), these ioctls are not used
from userspace, as:
1) they used to be broken (set up font on current console, not the open
   one) and racy (before the commit above)
2) KDFONTOP ioctl is used for years instead

Note that PIO_FONTRESET is defunct on most systems as VGA_CONSOLE is set
on them for ages. That turns on BROKEN_GRAPHICS_PROGRAMS which makes
PIO_FONTRESET just return an error.

We are removing KD_FONT_FLAG_OLD here as it was used only by these
removed ioctls. kd.h header exists both in kernel and uapi headers, so
we can remove the kernel one completely. Everyone includeing kd.h will
now automatically get the uapi one.

There are now unused definitions of the ioctl numbers and "struct
consolefontdesc" in kd.h, but as it is a uapi header, I am not touching
these.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20210105120239.28031-8-jslaby@suse.cz


Cc: guodaxing <guodaxing@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d0f0f567

dm thin: Fix crash in dm_sm_register_threshold_callback() · 306383ad

Luo Meng authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GRX6


CVE: NA

--------------------------------

Fault inject on pool metadata device report:
  BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80
  Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950

  CPU: 7 PID: 950 Comm: dmsetup Tainted: G        W         5.19.0-rc6 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_address_description.constprop.0.cold+0xeb/0x3f4
   kasan_report.cold+0xe6/0x147
   dm_pool_register_metadata_threshold+0x40/0x80
   pool_ctr+0xa0a/0x1150
   dm_table_add_target+0x2c8/0x640
   table_load+0x1fd/0x430
   ctl_ioctl+0x2c4/0x5a0
   dm_ctl_ioctl+0xa/0x10
   __x64_sys_ioctl+0xb3/0xd0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

This can be easy reproduce:
  echo offline > /sys/block/sda/device/state
  dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10
  dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0"

If metadata commit failed, the transaction will be aborted and the metadata
space manager will be destroyed. If load table on this pool, when register the
metadata threshold callback, the UAF will happen on metadata space manager.

So return error when load table if the pool is on FAIL status.

Fixes: ac8c3f3d ("dm thin: generate event when metadata threshold passed")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

306383ad

xen/blkfront: force data bouncing when backend is untrusted · 6f522137

Roger Pau Monne authored 3 years ago

stable inclusion
from stable-v4.19.251
commit 981de55fb6b5253fa7ae345827c6c3ca77912e5c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GM0S


CVE: CVE-2022-33742

--------------------------------

commit 2400617da7eebf9167d71a46122828bc479d64c9 upstream.

Split the current bounce buffering logic used with persistent grants
into it's own option, and allow enabling it independently of
persistent grants.  This allows to reuse the same code paths to
perform the bounce buffering required to avoid leaking contiguous data
in shared pages not part of the request fragments.

Reporting whether the backend is to be trusted can be done using a
module parameter, or from the xenstore frontend path as set by the
toolstack when adding the device.

This is CVE-2022-33742, part of XSA-403.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6f522137

xen/netfront: force data bouncing when backend is untrusted · e319c741

Roger Pau Monne authored 3 years ago

stable inclusion
from stable-v4.19.251
commit 4b67d8e42dbba42cfafe22ac3e4117d9573fdd74
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GM0S


CVE: CVE-2022-33741

--------------------------------

commit 4491001c2e0fa69efbb748c96ec96b100a5cdb7e upstream.

Bounce all data on the skbs to be transmitted into zeroed pages if the
backend is untrusted. This avoids leaking data present in the pages
shared with the backend but not part of the skb fragments.  This
requires introducing a new helper in order to allocate skbs with a
size multiple of XEN_PAGE_SIZE so we don't leak contiguous data on the
granted pages.

Reporting whether the backend is to be trusted can be done using a
module parameter, or from the xenstore frontend path as set by the
toolstack when adding the device.

This is CVE-2022-33741, part of XSA-403.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e319c741

xen-netfront: fix potential deadlock in xennet_remove() · 9d638a9c

Andrea Righi authored 3 years ago

stable inclusion
from stable-v4.19.137
commit 4b635fc2b3491bdaa69fd80b376b216fae2d1461
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GLZZ


CVE: CVE-2022-33741

--------------------------------

[ Upstream commit c2c63310 ]

There's a potential race in xennet_remove(); this is what the driver is
doing upon unregistering a network device:

  1. state = read bus state
  2. if state is not "Closed":
  3.    request to set state to "Closing"
  4.    wait for state to be set to "Closing"
  5.    request to set state to "Closed"
  6.    wait for state to be set to "Closed"

If the state changes to "Closed" immediately after step 1 we are stuck
forever in step 4, because the state will never go back from "Closed" to
"Closing".

Make sure to check also for state == "Closed" in step 4 to prevent the
deadlock.

Also add a 5 sec timeout any time we wait for the bus state to change,
to avoid getting stuck forever in wait_event().

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

9d638a9c

xen/netfront: fix leaking data in shared pages · e24409ef

Roger Pau Monne authored 3 years ago

stable inclusion
from stable-v4.19.251
commit 3650ac3218c1640a3d597a8cee17d8e2fcf0ed4e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GLYP


CVE: CVE-2022-33740

--------------------------------

commit 307c8de2b02344805ebead3440d8feed28f2f010 upstream.

When allocating pages to be used for shared communication with the
backend always zero them, this avoids leaking unintended data present
on the pages.

This is CVE-2022-33740, part of XSA-403.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e24409ef

xen/blkfront: fix leaking data in shared pages · b8cb95e3

Roger Pau Monne authored 3 years ago

stable inclusion
from stable-v4.19.251
commit f4a1391185e30c977bfe1648435c152f806211c7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GLXT


CVE: CVE-2022-26365

--------------------------------

commit 2f446ffe9d737e9a844b97887919c4fda18246e7 upstream.

When allocating pages to be used for shared communication with the
backend always zero them, this avoids leaking unintended data present
on the pages.

This is CVE-2022-26365, part of XSA-403.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b8cb95e3

xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() · 6bbf7b40

Juergen Gross authored 3 years ago

stable inclusion
from stable-v4.19.116
commit 5f547e7cbd8435be3aa2a27e2ae594b4fd94865b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5GLXT


CVE: CVE-2022-26365

--------------------------------

commit 3a169c0b upstream.

Commit 1d5c76e6 ("xen-blkfront: switch kcalloc to kvcalloc for
large array allocation") didn't fix the issue it was meant to, as the
flags for allocating the memory are GFP_NOIO, which will lead the
memory allocation falling back to kmalloc().

So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation
in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section.

Fixes: 1d5c76e6 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com


Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6bbf7b40

Jul 12, 2022

tmpfs: fix the issue that the mount and remount results are inconsistent. · e98687e5

ZhaoLong Wang authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: 187184, https://gitee.com/openeuler/kernel/issues/I5G9EK


CVE: NA

--------------------------------

An undefined-behavior issue has not been completely fixed since commit
908d0a9001a8("tmpfs: fix undefined-behaviour in shmem_reconfigure()").
In the commit, check in the shmem_reconfigure() is added in remount
process to avoid the Ubsan problem.  However, the check is not added to
the mount process.  It causes inconsistent results between mount and
remount.  The operations to reproduce the problem in user mode as follows:

If nr_blocks is set to 0x8000000000000000, the mounting is successful.

  # mount tmpfs /dev/shm/ -t tmpfs -o nr_blocks=0x8000000000000000

However, when -o remount is used, the mount fails because of the
check in the shmem_reconfigure()

  # mount tmpfs /dev/shm/ -t tmpfs -o remount,nr_blocks=0x8000000000000000
  mount: /dev/shm: mount point not mounted or bad option.

Therefore, add checks in the shmem_parse_one() function and remove the
check in shmem_reconfigure() to avoid this problem.

Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e98687e5

tmpfs: fix undefined-behaviour in shmem_reconfigure() · d271201b

Luo Meng authored 3 years ago

mainline inclusion
from mainline-v5.19-rc3
commit d14f5efadd846dbde561bd734318de6a9e6b26e6
category: bugfix
bugzilla: 186471 https://gitee.com/openeuler/kernel/issues/I5DRKR
CVE: NA

--------------------------------

When shmem_reconfigure() calls __percpu_counter_compare(), the second
parameter is unsigned long long.  But in the definition of
__percpu_counter_compare(), the second parameter is s64.  So when
__percpu_counter_compare() executes abs(count - rhs), UBSAN shows the
following warning:

================================================================================
UBSAN: Undefined behaviour in lib/percpu_counter.c:209:6
signed integer overflow:
0 - -9223372036854775808 cannot be represented in type 'long long int'
CPU: 1 PID: 9636 Comm: syz-executor.2 Tainted: G                 ---------r-  - 4.18.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack home/install/linux-rh-3-10/lib/dump_stack.c:77 [inline]
 dump_stack+0x125/0x1ae home/install/linux-rh-3-10/lib/dump_stack.c:117
 ubsan_epilogue+0xe/0x81 home/install/linux-rh-3-10/lib/ubsan.c:159
 handle_overflow+0x19d/0x1ec home/install/linux-rh-3-10/lib/ubsan.c:190
 __percpu_counter_compare+0x124/0x140 home/install/linux-rh-3-10/lib/percpu_counter.c:209
 percpu_counter_compare home/install/linux-rh-3-10/./include/linux/percpu_counter.h:50 [inline]
 shmem_remount_fs+0x1ce/0x6b0 home/install/linux-rh-3-10/mm/shmem.c:3530
 do_remount_sb+0x11b/0x530 home/install/linux-rh-3-10/fs/super.c:888
 do_remount home/install/linux-rh-3-10/fs/namespace.c:2344 [inline]
 do_mount+0xf8d/0x26b0 home/install/linux-rh-3-10/fs/namespace.c:2844
 ksys_mount+0xad/0x120 home/install/linux-rh-3-10/fs/namespace.c:3075
 __do_sys_mount home/install/linux-rh-3-10/fs/namespace.c:3089 [inline]
 __se_sys_mount home/install/linux-rh-3-10/fs/namespace.c:3086 [inline]
 __x64_sys_mount+0xbf/0x160 home/install/linux-rh-3-10/fs/namespace.c:3086
 do_syscall_64+0xca/0x5c0 home/install/linux-rh-3-10/arch/x86/entry/common.c:298
 entry_SYSCALL_64_after_hwframe+0x6a/0xdf
RIP: 0033:0x46b5e9
Code: 5d db fa ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b db fa ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f54d5f22c68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 000000000077bf60 RCX: 000000000046b5e9
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000077bf60 R08: 0000000020000140 R09: 0000000000000000
R10: 00000000026740a4 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd1fb1592f R14: 00007f54d5f239c0 R15: 000000000077bf6c
================================================================================

[akpm@linux-foundation.org: tweak error message text]
Link: https://lkml.kernel.org/r/20220513025225.2678727-1-luomeng12@huawei.com


Signed-off-by: Luo Meng <luomeng12@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
        mm/shmem.c

Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d271201b

mm/sharepool: Check sp_is_enabled() before show spa_stat · 713b8198

Wang Wensheng authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GO4E


CVE: NA

--------------------------------

When we read file /proc/sharepool/spa_stat, we use sp_mapping_normal to
iterator all the normal sp_area. The global pointer sp_mapping_normal is
NULL if the sharepool feature is not enabled via kernel bootarg. This
leads to a null-pointer-dereference issue.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

713b8198

Jul 08, 2022

x86: Fix return value of __setup handlers · 2467bc50

Randy Dunlap authored 3 years ago

stable inclusion
from stable-4.19.247
commit 61967ac7ba2814fefd033eb3979058057a18edc1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 12441ccdf5e2f5a01a46e344976cbbd3d46845c9 ]

__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled. A return
of 0 causes the boot option/value to be listed as an Unknown kernel
parameter and added to init's (limited) argument (no '=') or environment
(with '=') strings. So return 1 from these x86 __setup handlers.

Examples:

  Unknown kernel command line parameters "apicpmtimer
    BOOT_IMAGE=/boot/bzImage-517rc8 vdso=1 ring3mwait=disable", will be
    passed to user space.

  Run /sbin/init as init process
   with arguments:
     /sbin/init
     apicpmtimer
   with environment:
     HOME=/
     TERM=linux
     BOOT_IMAGE=/boot/bzImage-517rc8
     vdso=1
     ring3mwait=disable

Fixes: 2aae950b ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
Fixes: 77b52b4c ("x86: add "debugpat" boot option")
Fixes: e16fd002 ("x86/cpufeature: Enable RING3MWAIT for Knights Landing")
Fixes: b8ce3359 ("x86_64: convert to clock events")
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Link: https://lore.kernel.org/r/20220314012725.26661-1-rdunlap@infradead.org


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

4.19.90-2207.2.0

2467bc50

x86/delay: Fix the wrong asm constraint in delay_loop() · 5b682ab6

Ammar Faizi authored 3 years ago

stable inclusion
from stable-4.19.247
commit 12ffed97ae3303c3c4bc772c9329a9977a9941d6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit b86eb74098a92afd789da02699b4b0dd3f73b889 ]

The asm constraint does not reflect the fact that the asm statement can
modify the value of the local variable loops. Which it does.

Specifying the wrong constraint may lead to undefined behavior, it may
clobber random stuff (e.g. local variable, important temporary value in
regs, etc.). This is especially dangerous when the compiler decides to
inline the function and since it doesn't know that the value gets
modified, it might decide to use it from a register directly without
reloading it.

Change the constraint to "+a" to denote that the first argument is an
input and an output argument.

  [ bp: Fix typo, massage commit message. ]

Fixes: e01b70ef ("x86: fix bug in arch/i386/lib/delay.c file, delay_loop function")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220329104705.65256-2-ammarfaizi2@gnuweeb.org


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

5b682ab6

ACPI: sysfs: Fix BERT error region memory mapping · 19adf775

Lorenzo Pieralisi authored 3 years ago

stable inclusion
from stable-4.19.247
commit b322e833697c61a1ee1c239dd4a42b38f6ad531
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

commit 1bbc21785b7336619fb6a67f1fff5afdaf229acc upstream.

Currently the sysfs interface maps the BERT error region as "memory"
(through acpi_os_map_memory()) in order to copy the error records into
memory buffers through memory operations (eg memory_read_from_buffer()).

The OS system cannot detect whether the BERT error region is part of
system RAM or it is "device memory" (eg BMC memory) and therefore it
cannot detect which memory attributes the bus to memory support (and
corresponding kernel mapping, unless firmware provides the required
information).

The acpi_os_map_memory() arch backend implementation determines the
mapping attributes. On arm64, if the BERT error region is not present in
the EFI memory map, the error region is mapped as device-nGnRnE; this
triggers alignment faults since memcpy unaligned accesses are not
allowed in device-nGnRnE regions.

The ACPI sysfs code cannot therefore map by default the BERT error
region with memory semantics but should use a safer default.

Change the sysfs code to map the BERT error region as MMIO (through
acpi_os_map_iomem()) and use the memcpy_fromio() interface to read the
error region into the kernel buffer.

Link: https://lore.kernel.org/linux-arm-kernel/31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0g+OVbhuUUDrLUCfX_mVqY_e8ubgLTU98=jfjTeb4t+Pw@mail.gmail.com


Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

 Conflicts:
	drivers/acpi/sysfs.c
[wangxiongfeng: fix conflicts caused by commit bdd56d7d89
    ACPI: sysfs: Make sparse happy about address space in use]
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

19adf775

tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd · 9e2de13c

Eric Dumazet authored 3 years ago

stable inclusion
from stable-4.19.247
commit f2845e1504a3bc4f3381394f057e8b63cb5f3f7a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 11825765291a93d8e7f44230da67b9f607c777bf upstream.

syzbot got a new report [1] finally pointing to a very old bug,
added in initial support for MTU probing.

tcp_mtu_probe() has checks about starting an MTU probe if
tcp_snd_cwnd(tp) >= 11.

But nothing prevents tcp_snd_cwnd(tp) to be reduced later
and before the MTU probe succeeds.

This bug would lead to potential zero-divides.

Debugging added in commit 40570375356c ("tcp: add accessors
to read/set tp->snd_cwnd") has paid off :)

While we are at it, address potential overflows in this code.

[1]
WARNING: CPU: 1 PID: 14132 at include/net/tcp.h:1219 tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
Modules linked in:
CPU: 1 PID: 14132 Comm: syz-executor.2 Not tainted 5.18.0-syzkaller-07857-gbabf0bb978e3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcp_snd_cwnd_set include/net/tcp.h:1219 [inline]
RIP: 0010:tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
Code: 74 08 48 89 ef e8 da 80 17 f9 48 8b 45 00 65 48 ff 80 80 03 00 00 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 aa b0 c5 f8 <0f> 0b e9 16 fe ff ff 48 8b 4c 24 08 80 e1 07 38 c1 0f 8c c7 fc ff
RSP: 0018:ffffc900079e70f8 EFLAGS: 00010287
RAX: ffffffff88c0f7f6 RBX: ffff8880756e7a80 RCX: 0000000000040000
RDX: ffffc9000c6c4000 RSI: 0000000000031f9e RDI: 0000000000031f9f
RBP: 0000000000000000 R08: ffffffff88c0f606 R09: ffffc900079e7520
R10: ffffed101011226d R11: 1ffff1101011226c R12: 1ffff1100eadcf50
R13: ffff8880756e72c0 R14: 1ffff1100eadcf89 R15: dffffc0000000000
FS:  00007f643236e700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ab3f1e2a0 CR3: 0000000064fe7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tcp_clean_rtx_queue+0x223a/0x2da0 net/ipv4/tcp_input.c:3356
 tcp_ack+0x1962/0x3c90 net/ipv4/tcp_input.c:3861
 tcp_rcv_established+0x7c8/0x1ac0 net/ipv4/tcp_input.c:5973
 tcp_v6_do_rcv+0x57b/0x1210 net/ipv6/tcp_ipv6.c:1476
 sk_backlog_rcv include/net/sock.h:1061 [inline]
 __release_sock+0x1d8/0x4c0 net/core/sock.c:2849
 release_sock+0x5d/0x1c0 net/core/sock.c:3404
 sk_stream_wait_memory+0x700/0xdc0 net/core/stream.c:145
 tcp_sendmsg_locked+0x111d/0x3fc0 net/ipv4/tcp.c:1410
 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1448
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 __sys_sendto+0x439/0x5c0 net/socket.c:2119
 __do_sys_sendto net/socket.c:2131 [inline]
 __se_sys_sendto net/socket.c:2127 [inline]
 __x64_sys_sendto+0xda/0xf0 net/socket.c:2127
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f6431289109
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f643236e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f643139c100 RCX: 00007f6431289109
RDX: 00000000d0d0c2ac RSI: 0000000020000080 RDI: 000000000000000a
RBP: 00007f64312e308d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff372533af R14: 00007f643236e300 R15: 0000000000022000

Fixes: 5d424d5a ("[TCP]: MTU probing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

9e2de13c

nbd: fix io hung while disconnecting device · 5cade1d2

余快 authored 3 years ago

stable inclusion
from stable-4.19.247
commit 62d227f67a8c25d5e16f40e5290607f9306d2188
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 09dadb5985023e27d4740ebd17e6fea4640110e5 ]

In our tests, "qemu-nbd" triggers a io hung:

INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd        state:D stack:    0 pid:11445 ppid:     1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following
message was found:

block nbd0: Send disconnect failed -32

Which indicate that something is wrong with the server. Then,
"qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear
requests after commit 2516ab15("nbd: only clear the queue on device
teardown"). And in the meantime, request can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such request will never be completed in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to call nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

5cade1d2