Commits · a9c92fd420fb83ac0060a89ffd24386b7ea96ccf · Summer2022 / 22b970497

Jul 19, 2021

x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing · a9c92fd4

Thomas Gleixner authored 4 years ago


stable inclusion
from linux-4.19.194
commit 7e25cb1b22f81239ae3332e14a1d0cff7014bccd

--------------------------------

commit 7d65f9e80646c595e8c853640a9d0768a33e204c upstream.

PIC interrupts do not support affinity setting and they can end up on
any online CPU. Therefore, it's required to mark the associated vectors
as system-wide reserved. Otherwise, the corresponding irq descriptors
are copied to the secondary CPUs but the vectors are not marked as
assigned or reserved. This works correctly for the IO/APIC case.

When the IO/APIC is disabled via config, kernel command line or lack of
enumeration then all legacy interrupts are routed through the PIC, but
nothing marks them as system-wide reserved vectors.

As a consequence, a subsequent allocation on a secondary CPU can result in
allocating one of these vectors, which triggers the BUG() in
apic_update_vector() because the interrupt descriptor slot is not empty.

Imran tried to work around that by marking those interrupts as allocated
when a CPU comes online. But that's wrong in case that the IO/APIC is
available and one of the legacy interrupts, e.g. IRQ0, has been switched to
PIC mode because then marking them as allocated will fail as they are
already marked as system vectors.

Stay consistent and update the legacy vectors after attempting IO/APIC
initialization and mark them as system vectors in case that no IO/APIC is
available.

Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
Reported-by: Imran Khan <imran.f.khan@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a9c92fd4

pid: take a reference when initializing `cad_pid` · 79663d99

Mark Rutland authored 4 years ago

stable inclusion
from linux-4.19.194
commit d106f05432e60f9f62d456ef017687f5c73cb414

--------------------------------

commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream.

During boot, kernel_init_freeable() initializes `cad_pid` to the init
task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
when this happens proc_do_cad_pid() will increment the refcount on the
new pid via get_pid(), and will decrement the refcount on the old pid
via put_pid().  As we never called get_pid() when we initialized
`cad_pid`, we decrement a reference we never incremented, can therefore
free the init task's struct pid early.  As there can be dangling
references to the struct pid, we can later encounter a use-after-free
(e.g.  when delivering signals).

This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
have been around since the conversion of `cad_pid` to struct pid in
commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the
pre-KASAN stone age of v2.6.19.

Fix this by getting a reference to the init task's struct pid when we
assign it to `cad_pid`.

Full KASAN splat below.

   ==================================================================
   BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
   BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
   Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273

   CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
   Hardware name: linux,dummy-virt (DT)
   Call trace:
    ns_of_pid include/linux/pid.h:153 [inline]
    task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
    do_notify_parent+0x308/0xe60 kernel/signal.c:1950
    exit_notify kernel/exit.c:682 [inline]
    do_exit+0x2334/0x2bd0 kernel/exit.c:845
    do_group_exit+0x108/0x2c8 kernel/exit.c:922
    get_signal+0x4e4/0x2a88 kernel/signal.c:2781
    do_signal arch/arm64/kernel/signal.c:882 [inline]
    do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
    work_pending+0xc/0x2dc

   Allocated by task 0:
    slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
    slab_alloc_node mm/slub.c:2907 [inline]
    slab_alloc mm/slub.c:2915 [inline]
    kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
    alloc_pid+0xdc/0xc00 kernel/pid.c:180
    copy_process+0x2794/0x5e18 kernel/fork.c:2129
    kernel_clone+0x194/0x13c8 kernel/fork.c:2500
    kernel_thread+0xd4/0x110 kernel/fork.c:2552
    rest_init+0x44/0x4a0 init/main.c:687
    arch_call_rest_init+0x1c/0x28
    start_kernel+0x520/0x554 init/main.c:1064
    0x0

   Freed by task 270:
    slab_free_hook mm/slub.c:1562 [inline]
    slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
    slab_free mm/slub.c:3161 [inline]
    kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
    put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
    put_pid+0x30/0x48 kernel/pid.c:109
    proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
    proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
    proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
    call_write_iter include/linux/fs.h:1977 [inline]
    new_sync_write+0x3ac/0x510 fs/read_write.c:518
    vfs_write fs/read_write.c:605 [inline]
    vfs_write+0x9c4/0x1018 fs/read_write.c:585
    ksys_write+0x124/0x240 fs/read_write.c:658
    __do_sys_write fs/read_write.c:670 [inline]
    __se_sys_write fs/read_write.c:667 [inline]
    __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
    __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
    invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
    el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
    do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
    el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
    el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
    el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701

   The buggy address belongs to the object at ffff23794dda0000
    which belongs to the cache pid of size 224
   The buggy address is located 4 bytes inside of
    224-byte region [ffff23794dda0000, ffff23794dda00e0)
   The buggy address belongs to the page:
   page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
   head:(____ptrval____) order:1 compound_mapcount:0
   flags: 0x3fffc0000010200(slab|head)
   raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
   raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
   page dumped because: kasan: bad access detected

   Memory state around the buggy address:
    ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
   >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
    ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
   ==================================================================

Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com


Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

79663d99

netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches · 89db94b1

Pablo Neira Ayuso authored 4 years ago


stable inclusion
from linux-4.19.194
commit 8aed10cd9497933ebe3fc0afe8d8ddc1af8b0d48

--------------------------------

[ Upstream commit 8971ee8b087750a23f3cd4dc55bff2d0303fd267 ]

The private helper data size cannot be updated. However, updates that
contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is
the same.

Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

89db94b1

ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service · 32bedd8f

Julian Anastasov authored 4 years ago


stable inclusion
from linux-4.19.194
commit 6895ac910b73f7f886bb292c0a87d6e4d6eb6801

--------------------------------

[ Upstream commit 56e4ee82e850026d71223262c07df7d6af3bd872 ]

syzbot reported memory leak [1] when adding service with
HASHED flag. We should ignore this flag both from sockopt
and netlink provided data, otherwise the service is not
hashed and not visible while releasing resources.

[1]
BUG: memory leak
unreferenced object 0xffff888115227800 (size 512):
  comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline]
    [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline]
    [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343
    [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570
    [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101
    [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435
    [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857
    [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117
    [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline]
    [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline]
    [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125
    [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
    [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-and-tested-by:  <syzbot+e562383183e4b1766930@syzkaller.appspotmail.com>
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

32bedd8f

vfio/platform: fix module_put call in error flow · 3c02fed5

Max Gurtovoy authored 4 years ago


stable inclusion
from linux-4.19.194
commit 2bd07ebcb949b17a431aacd0e99fa4558c4a5600

--------------------------------

[ Upstream commit dc51ff91cf2d1e9a2d941da483602f71d4a51472 ]

The ->parent_module is the one that use in try_module_get. It should
also be the one the we use in module_put during vfio_platform_open().

Fixes: 32a2d71c ("vfio: platform: introduce vfio-platform-base module")
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20210518192133.59195-1-mgurtovoy@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3c02fed5

vfio/pci: zap_vma_ptes() needs MMU · 53728502

Randy Dunlap authored 4 years ago


stable inclusion
from linux-4.19.194
commit 2b2ca3ee36e4e40d98c01b41a5819243197d4609

--------------------------------

[ Upstream commit 2a55ca37350171d9b43d561528f23d4130097255 ]

zap_vma_ptes() is only available when CONFIG_MMU is set/enabled.
Without CONFIG_MMU, vfio_pci.o has build errors, so make
VFIO_PCI depend on MMU.

riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `vfio_pci_mmap_open':
vfio_pci.c:(.text+0x1ec): undefined reference to `zap_vma_ptes'
riscv64-linux-ld: drivers/vfio/pci/vfio_pci.o: in function `.L0 ':
vfio_pci.c:(.text+0x165c): undefined reference to `zap_vma_ptes'

Fixes: 11c4cd07 ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210515190856.2130-1-rdunlap@infradead.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

53728502

vfio/pci: Fix error return code in vfio_ecap_init() · 06fe2528

Zhen Lei authored 4 years ago


stable inclusion
from linux-4.19.194
commit c953aee0d8a162d249dedaa47111e95dde0461cd

--------------------------------

[ Upstream commit d1ce2c79156d3baf0830990ab06d296477b93c26 ]

The error code returned from vfio_ext_cap_len() is stored in 'len', not
in 'ret'.

Fixes: 89e1f7d4 ("vfio: Add PCI device driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20210515020458.6771-1-thunder.leizhen@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

06fe2528

efi: cper: fix snprintf() use in cper_dimm_err_location() · 5e342e81

Rasmus Villemoes authored 4 years ago


stable inclusion
from linux-4.19.194
commit dd47a33e11fdd6c9d31570500e6ecc369ba1f08e

--------------------------------

[ Upstream commit 942859d969de7f6f7f2659a79237a758b42782da ]

snprintf() should be given the full buffer size, not one less. And it
guarantees nul-termination, so doing it manually afterwards is
pointless.

It's even potentially harmful (though probably not in practice because
CPER_REC_LEN is 256), due to the "return how much would have been
written had the buffer been big enough" semantics. I.e., if the bank
and/or device strings are long enough that the "DIMM location ..."
output gets truncated, writing to msg[n] is a buffer overflow.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Fixes: 3760cd20 ("CPER: Adjust code flow of some functions")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5e342e81

efi: Allow EFI_MEMORY_XP and EFI_MEMORY_RO both to be cleared · 3f680876

Heiner Kallweit authored 4 years ago


stable inclusion
from linux-4.19.194
commit 5fdb418b14e41b33880b949db64b18f1f448916b

--------------------------------

[ Upstream commit 45add3cc99feaaf57d4b6f01d52d532c16a1caee ]

UEFI spec 2.9, p.108, table 4-1 lists the scenario that both attributes
are cleared with the description "No memory access protection is
possible for Entry". So we can have valid entries where both attributes
are cleared, so remove the check.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: 10f0d2f5 ("efi: Implement generic support for the Memory Attributes table")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3f680876

lib/clear_user: ensure loop in __arch_clear_user cache-aligned · 04197a72

Cheng Jian authored 4 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3OX0C


CVE: NA

--------------------------------

We must ensure that the following four instructions are cache-aligned.
Otherwise, it will cause problems with the performance of libMicro
pread.

1:
	# uao_user_alternative 9f, str, sttr, xzr, x0, 8
	str     xzr, [x0], #8
	nop
	subs    x1, x1, #8
	b.pl    1b

with this patch:

             prc thr   usecs/call      samples   errors cnt/samp     size
pread_z100     1   1      5.88400          807        0	1	     102400

The result of pread can range from 5 to 9 depending on  the
alignment performance of this function.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

04197a72

scsi: core: Treat device offline as a failure · ed0c958e

Jason Yan authored 4 years ago

mainline inclusion
from mainline-v5.14-rc1
commit 1ee2753422349723d27009f2f973d03289d430ab
category: bugfix
bugzilla: NA
CVE: NA

-----------------------------------------------

When a SCSI device is offline a MODE SENSE command will return a result
with only DID_NO_CONNECT set. In sd_read_write_protect_flag() only the
status byte of the result is checked. Despite a returned status of
DID_NO_CONNECT the command is considered successful and we read
sdkp->write_prot from a buffer containing garbage.

Modify scsi_status_is_good() to treat DID_NO_CONNECT as a failure case.

Link: https://lore.kernel.org/r/20210330114727.234467-1-yanaijie@huawei.com


Signed-off-by: Jason Yan <yanaijie@huawei.com>

conflicts:
include/scsi/scsi.h

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ed0c958e

Revert "scsi: check the whole result for reading write protect flag" · 62371e76

Ye Bin authored 4 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

-----------------------------------------------

This reverts commit e9e58048.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

62371e76

ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock · 8a92b248

Ye Bin authored 4 years ago


mainline inclusion
from mainline-v5.14-rc1
commit 558d6450c7755aa005d89021204b6cdcae5e848f
category: bugfix
bugzilla: 109280
CVE: NA

-----------------------------------------------

If a writeback of the superblock fails with an I/O error, the buffer
is marked not uptodate.  However, this can cause a WARN_ON to trigger
when we attempt to write superblock a second time.  (Which might
succeed this time, for cerrtain types of block devices such as iSCSI
devices over a flaky network.)

Try to detect this case in flush_stashed_error_work(), and also change
__ext4_handle_dirty_metadata() so we always set the uptodate flag, not
just in the nojournal case.

Before this commit, this problem can be repliciated via:

1. dmsetup  create dust1 --table  '0 2097152 dust /dev/sdc 0 4096'
2. mount  /dev/mapper/dust1  /home/test
3. dmsetup message dust1 0 addbadblock 0 10
4. cd /home/test
5. echo "XXXXXXX" > t

After a few seconds, we got following warning:

[   80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0
[   80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write
[   85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata
[   91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0
[   91.417038] ------------[ cut here ]------------
[   91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e
[   91.440322] Call Trace:
[   91.440652]  __jbd2_journal_temp_unlink_buffer+0x135/0x220
[   91.441354]  __jbd2_journal_unfile_buffer+0x24/0x90
[   91.441981]  __jbd2_journal_refile_buffer+0x134/0x1d0
[   91.442628]  jbd2_journal_commit_transaction+0x249a/0x3240
[   91.443336]  ? put_prev_entity+0x2a/0x200
[   91.443856]  ? kjournald2+0x12e/0x510
[   91.444324]  kjournald2+0x12e/0x510
[   91.444773]  ? woken_wake_function+0x30/0x30
[   91.445326]  kthread+0x150/0x1b0
[   91.445739]  ? commit_timeout+0x20/0x20
[   91.446258]  ? kthread_flush_worker+0xb0/0xb0
[   91.446818]  ret_from_fork+0x1f/0x30
[   91.447293] ---[ end trace 66f0b6bf3d1abade ]---

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8a92b248

arm64/config: Set CONFIG_TXGBE=m by default · 56c6274b

zhenpengzheng authored 4 years ago


hulk inclusion
category: feature
bugzilla: 50777
CVE: NA

-------------------------------------------------------------------------

Ensure the netswift 10G NIC driver ko can be distributed in ISO on X86.

Signed-off-by: zhenpengzheng <zhenpengzheng@net-swift.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

56c6274b

make bch_btree_check() to be multiple threads · b4a620ff

Coly Li authored 4 years ago

mainline inclusion
from mainline-v5.7-rc1
commit 8e710227
category: performance
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

When registering a cache device, bch_btree_check() is called to check
all btree nodes, to make sure the btree is consistent and not
corrupted.

bch_btree_check() is recursively executed in a single thread, when there
are a lot of data cached and the btree is huge, it may take very long
time to check all the btree nodes. In my testing, I observed it took
around 50 minutes to finish bch_btree_check().

When checking the bcache btree nodes, the cache set is not running yet,
and indeed the whole tree is in read-only state, it is safe to create
multiple threads to check the btree in parallel.

This patch tries to create multiple threads, and each thread tries to
one-by-one check the sub-tree indexed by a key from the btree root node.
The parallel thread number depends on how many keys in the btree root
node. At most BCH_BTR_CHKTHREAD_MAX (64) threads can be created, but in
practice is should be min(cpu-number/2, root-node-keys-number).

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b4a620ff

Make compile successful when CONFIG_BCACHE is not set. · b278db77

Xu Wei authored 4 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

When kernel config don't enbale CONFIG_BCACHE, compiling bcache module will
fail. This patch add the judgment for CONFIG_BCACHE macro to make sure
compiling bcache module success.

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b278db77

Move only dirty data when gc runnning, in order to reducing write amplification. · 59a67a69

Xu Wei authored 4 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

Bcache will move all data, including clean and dirty data, in bucket when
gc running. This will cause big write amplification, which may reduce the
cache device's life. This patch provice a switch for gc to move only dirty
data, which can reduce write amplification.

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

59a67a69

Add traffic policy for low cache available. · cb004e12

Xu Wei authored 4 years ago

euleros inclusion
category: feature
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=327


CVE: NA

When cache available is low, bcache turn to writethrough mode. Therefore,
All write IO will be directly sent to backend device, which is usually
HDD. At same time, cache device flush dirty data to the backend device
in the bcache writeback process. So write IO from user will damage the
sequentiality of writeback. And if there is lots of IO from writeback,
user's write IO may be block. This patch add traffic policy in bcache
to solve the problem and improve the performance for bcache when cache
available is low.

Signed-off-by: qinghaixiang <xuweiqhx@163.com>
Signed-off-by: Xu Wei <xuwei56@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Li Ruilin <liruilin4@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cb004e12

igmp: Add ip_mc_list lock in ip_check_mc_rcu · 06d62ce5

Liu Jian authored 4 years ago


mainline inclusion
from mainline-net-next
commit 23d2b94043ca8835bd1e67749020e839f396a1c2
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

I got below panic when doing fuzz test:

Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G    B             5.14.0-rc1-00195-gcff5c4254439-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack_lvl+0x7a/0x9b
panic+0x2cd/0x5af
end_report.cold+0x5a/0x5a
kasan_report+0xec/0x110
ip_check_mc_rcu+0x556/0x5d0
__mkroute_output+0x895/0x1740
ip_route_output_key_hash_rcu+0x2d0/0x1050
ip_route_output_key_hash+0x182/0x2e0
ip_route_output_flow+0x28/0x130
udp_sendmsg+0x165d/0x2280
udpv6_sendmsg+0x121e/0x24f0
inet6_sendmsg+0xf7/0x140
sock_sendmsg+0xe9/0x180
____sys_sendmsg+0x2b8/0x7a0
___sys_sendmsg+0xf0/0x160
__sys_sendmmsg+0x17e/0x3c0
__x64_sys_sendmmsg+0x9e/0x100
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x462eb9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8
 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9
RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc
R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff

It is one use-after-free in ip_check_mc_rcu.
In ip_mc_del_src, the ip_sf_list of pmc has been freed under pmc->lock protection.
But access to ip_sf_list in ip_check_mc_rcu is not protected by the lock.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

06d62ce5

Jul 18, 2021

memcg: fix unsuitable null check after alloc memory · bed274bc

卢佳琳 authored 4 years ago

hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I


CVE: NA

--------

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bed274bc

Jul 16, 2021

cpuidle: fix a build error when compiling haltpoll into module · f6ca4176

GONG, Ruiqi authored 4 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZURN


CVE: NA

--------

Kernel build would fail in case of CONFIG_HALTPOLL_CPUIDLE=m, caused by
haltpoll_switch_governor() not marked as an exported symbol. Fix this
by complementing the EXPORT_SYMBOL statement.

Fixes: 97c22788 ("cpuidle: fix container_of err in cpuidle_device and cpuidle_driver")
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Cc: Jiajun Chen <chenjiajun8@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>

4.19.90-2107.2.0

f6ca4176

config: enable KASAN and UBSAN by default · e01c1bf7

Yang Yingliang authored 4 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

Enable KASAN and UBSAN by default for test.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e01c1bf7

KVM: x86: expose AVX512_BF16 feature to guest · bbea3a3a

Jing Liu authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 0b774629
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG
CVE: NA

-----------------------------

AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point
format (BF16) for deep learning optimization.

Intel adds AVX512 BFLOAT16 feature in CooperLake, which is CPUID.7.1.EAX[5].

Detailed information of the CPUID bit can be found here,
https://software.intel.com/sites/default/files/managed/c5/15/\


architecture-instruction-set-extensions-programming-reference.pdf.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
[Fix type mismatch in min, changing constant "1" to "1u". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bbea3a3a

KVM: cpuid: remove has_leaf_count from struct kvm_cpuid_param · 16b22d73

Paolo Bonzini authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 60cec433
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

The has_leaf_count member was originally added for KVM's paravirtualization
CPUID leaves.  However, since then the leaf count _has_ been added to those
leaves as well, so we can drop that special case.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

16b22d73

KVM: cpuid: rename do_cpuid_1_ent · a61f03da

Paolo Bonzini authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 50a9e1a4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

do_cpuid_1_ent does not do the entire processing for a CPUID entry, it
only retrieves the host's values.  Rename it to match reality.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a61f03da

KVM: cpuid: set struct kvm_cpuid_entry2 flags in do_cpuid_1_ent · ad2a90fb

Paolo Bonzini authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit d9aadaf6
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

do_cpuid_1_ent is typically called in two places by __do_cpuid_func
for CPUID functions that have subleafs.  Both places have to set
the KVM_CPUID_FLAG_SIGNIFCANT_INDEX.  Set that flag, and
KVM_CPUID_FLAG_STATEFUL_FUNC as well, directly in do_cpuid_1_ent.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ad2a90fb

KVM: cpuid: extract do_cpuid_7_mask and support multiple subleafs · 18f2e790

Paolo Bonzini authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit 54d360d4
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

CPUID function 7 has multiple subleafs.  Instead of having nested
switch statements, move the logic to filter supported features to
a separate function, and call it for each subleaf.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

18f2e790

KVM: cpuid: do_cpuid_ent works on a whole CPUID function · b2c1c889

Paolo Bonzini authored 4 years ago

mainline inclusion
from mainline-v5.3-rc1
commit ab8bcf64
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAEG


CVE: NA

-----------------------------

Rename it as well as __do_cpuid_ent and __do_cpuid_ent_emulated to have
"func" in its name, and drop the index parameter which is always 0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b2c1c889

Jul 14, 2021

ext4: fix possible UAF when remounting r/o a mmp-protected file system · 7d7d26aa

Theodore Ts'o authored 4 years ago

mainline inclusion
from mainline-5.14
commit	61bb4a1c417e5b95d9edb4f887f131de32e419cb
category: bugfix
bugzilla: 173880
CVE: NA

-------------------------------------------------

After commit 618f003199c6 ("ext4: fix memory leak in
ext4_fill_super"), after the file system is remounted read-only, there
is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to
point at freed memory, which the call to ext4_stop_mmpd() can trip
over.

Fix this by only allowing kmmpd() to exit when it is stopped via
ext4_stop_mmpd().

Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu


Reported-by: Ye Bin <yebin10@huawei.com>
Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

Conflicts:
	fs/ext4/super.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7d7d26aa

locks: Fix UBSAN undefined behaviour in flock64_to_posix_lock · 1b84bd6f

Luo Meng authored 4 years ago


mainline inclusion
from mainline-v5.11-rc1
commit 16238415eb9886328a89fe7a3cb0b88c7335fe16
category: bugfix
bugzilla: 38689
CVE: NA

-----------------------------------------------

When the sum of fl->fl_start and l->l_len overflows,
UBSAN shows the following warning:

UBSAN: Undefined behaviour in fs/locks.c:482:29
signed integer overflow: 2 + 9223372036854775806
cannot be represented in type 'long long int'
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xe4/0x14e lib/dump_stack.c:118
 ubsan_epilogue+0xe/0x81 lib/ubsan.c:161
 handle_overflow+0x193/0x1e2 lib/ubsan.c:192
 flock64_to_posix_lock fs/locks.c:482 [inline]
 flock_to_posix_lock+0x595/0x690 fs/locks.c:515
 fcntl_setlk+0xf3/0xa90 fs/locks.c:2262
 do_fcntl+0x456/0xf60 fs/fcntl.c:387
 __do_sys_fcntl fs/fcntl.c:483 [inline]
 __se_sys_fcntl fs/fcntl.c:468 [inline]
 __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468
 do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix it by parenthesizing 'l->l_len - 1'.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1b84bd6f

iomap: Mark read blocks uptodate in write_begin · a259beb0

Matthew Wilcox (Oracle) authored 4 years ago


mainline inclusion
from mainline-v5.10
commit 14284fed
category: bugfix
bugzilla: 43547
CVE: NA

-----------------------------------------------

When bringing (portions of) a page uptodate, we were marking blocks that
were zeroed as being uptodate, but not blocks that were read from storage.

Like the previous commit, this problem was found with generic/127 and
a kernel which failed readahead I/Os.  This bug causes writes to be
silently lost when working with flaky storage.

Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a259beb0

iomap: Clear page error before beginning a write · b3a0aab5

Matthew Wilcox (Oracle) authored 4 years ago


mainline inclusion
from mainline-v5.10-rc1
commit e6e7ca92
category: bugfix
bugzilla: 43551
CVE: NA

-----------------------------------------------

If we find a page in write_begin which is !Uptodate, we need
to clear any error on the page before starting to read data
into it.  This matches how filemap_fault(), do_read_cache_page()
and generic_file_buffered_read() handle PageError on !Uptodate pages.
When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks
were not being marked as uptodate.

This was found with generic/127 and a specially modified kernel which
would fail (some) readahead I/Os.  The test read some bytes in a prior
page which caused readahead to extend into page 0x34.  There was
a subsequent write to page 0x34, followed by a read to page 0x34.
Because the blocks were still marked as !Uptodate, the read caused all
blocks to be re-read, overwriting the write.  With this change, and the
next one, the bytes which were written are marked as being Uptodate, so
even though the page is still marked as !Uptodate, the blocks containing
the written data are not re-read from storage.

Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b3a0aab5

iomap: move the zeroing case out of iomap_read_page_sync · 9fdbab15

Christoph Hellwig authored 4 years ago


mainline inclusion
from mainline-v5.5-rc1
commit d3b40439
category: bugfix
bugzilla: 43551
CVE: NA

-----------------------------------------------

That keeps the function a little easier to understand, and easier to
modify for pending enhancements.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

conflicts:
fs/iomap.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9fdbab15

nbd: handle device refs for DESTROY_ON_DISCONNECT properly · c72b1648

Josef Bacik authored 4 years ago


mainline inclusion
from mainline-5.12-rc1
commit c9a2f90f4d6b
category: bugfix
bugzilla: 50455
CVE: NA

-------------------------------------------------

There exists a race where we can be attempting to create a new nbd
configuration while a previous configuration is going down, both
configured with DESTROY_ON_DISCONNECT.  Normally devices all have a
reference of 1, as they won't be cleaned up until the module is torn
down.  However with DESTROY_ON_DISCONNECT we'll make sure that there is
only 1 reference (generally) on the device for the config itself, and
then once the config is dropped, the device is torn down.

The race that exists looks like this

TASK1					TASK2
nbd_genl_connect()
  idr_find()
    refcount_inc_not_zero(nbd)
      * count is 2 here ^^
					nbd_config_put()
					  nbd_put(nbd) (count is 1)
    setup new config
      check DESTROY_ON_DISCONNECT
	put_dev = true
    if (put_dev) nbd_put(nbd)
	* free'd here ^^

In nbd_genl_connect() we assume that the nbd ref count will be 2,
however clearly that won't be true if the nbd device had been setup as
DESTROY_ON_DISCONNECT with its prior configuration.  Fix this by getting
rid of the runtime flag to check if we need to mess with the nbd device
refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if
we need to adjust the ref counts.  This was reported by syzkaller with
the following kasan dump

BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451

CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230
 __kasan_report mm/kasan/report.c:396 [inline]
 kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
 check_memory_region_inline mm/kasan/generic.c:179 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
 refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115
 nbd_put drivers/block/nbd.c:248 [inline]
 nbd_release+0x116/0x190 drivers/block/nbd.c:1508
 __blkdev_put+0x548/0x800 fs/block_dev.c:1579
 blkdev_put+0x92/0x570 fs/block_dev.c:1632
 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
 __fput+0x283/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x190 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fc1e92b5270
Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24
RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270
RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008
R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000
R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e

Allocated by task 1:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:401 [inline]
 ____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:682 [inline]
 nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673
 nbd_init+0x250/0x271 drivers/block/nbd.c:2394
 do_one_initcall+0x103/0x650 init/main.c:1223
 do_initcall_level init/main.c:1296 [inline]
 do_initcalls init/main.c:1312 [inline]
 do_basic_setup init/main.c:1332 [inline]
 kernel_init_freeable+0x605/0x689 init/main.c:1533
 kernel_init+0xd/0x1b8 init/main.c:1421
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296

Freed by task 8451:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356
 ____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362
 kasan_slab_free include/linux/kasan.h:192 [inline]
 slab_free_hook mm/slub.c:1547 [inline]
 slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580
 slab_free mm/slub.c:3143 [inline]
 kfree+0xdb/0x3b0 mm/slub.c:4139
 nbd_dev_remove drivers/block/nbd.c:243 [inline]
 nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251
 nbd_put drivers/block/nbd.c:295 [inline]
 nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242
 nbd_release+0x103/0x190 drivers/block/nbd.c:1507
 __blkdev_put+0x548/0x800 fs/block_dev.c:1579
 blkdev_put+0x92/0x570 fs/block_dev.c:1632
 blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
 __fput+0x283/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x190 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff888143bf7000
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 416 bytes inside of
 1024-byte region [ffff888143bf7000, ffff888143bf7400)
The buggy address belongs to the page:
page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0
head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0
flags: 0x57ff00000010200(slab|head)
raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Reported-and-tested-by:  <syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c72b1648

cifs: Fix leak when handling lease break for cached root fid · 25ebc65f

Paul Aurich authored 4 years ago


mainline inclusion
from mainline-5.9-rc1
commit baf57b56
category: bugfix
bugzilla: 40791
CVE: NA

---------------------------

Handling a lease break for the cached root didn't free the
smb2_lease_break_work allocation, resulting in a leak:

    unreferenced object 0xffff98383a5af480 (size 128):
      comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s)
      hex dump (first 32 bytes):
        c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff  ..........Z:8...
        88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff  ..Z:8...........
      backtrace:
        [<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0
        [<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0
        [<00000000905fa372>] kthread+0x11c/0x150
        [<0000000079378e4e>] ret_from_fork+0x22/0x30

Avoid this leak by only allocating when necessary.

Fixes: a93864d9 ("cifs: add lease tracking to the cached root fid")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org> # v4.18+
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Conflicts:
  fs/cifs/smb2misc.c
  [ Not apply 9bd45408("CIFS: Properly process SMB3 lease breaks") ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

25ebc65f

Jul 12, 2021

mm/memcontrol.c: fix kasan slab-out-of-bounds in mem_cgroup_css_alloc · 7395879d

卢佳琳 authored 4 years ago

hulk inclusion
category: bugfix
bugzilla: 51815, https://gitee.com/openeuler/kernel/issues/I3IJ9I


CVE: NA

--------

static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
{
...
pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
if (!pn)
	return 1;

	pnext = to_mgpn_ext(pn);
	pnext->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
}
the size of pnext is larger than pn, so pnext->lruvec_stat_local is out
of bounds

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7395879d

module: limit enabling module.sig_enforce · e1f55683

Mimi Zohar authored 4 years ago


stable inclusion
from linux-4.19.196
commit ff660863628fb144badcb3395cde7821c82c13a6
CVE: CVE-2021-35039

--------------------------------

[ Upstream commit 0c18f29aae7ce3dadd26d8ee3505d07cc982df75 ]

Irrespective as to whether CONFIG_MODULE_SIG is configured, specifying
"module.sig_enforce=1" on the boot command line sets "sig_enforce".
Only allow "sig_enforce" to be set when CONFIG_MODULE_SIG is configured.

This patch makes the presence of /sys/module/module/parameters/sig_enforce
dependent on CONFIG_MODULE_SIG=y.

Fixes: fda784e5 ("module: export module signature enforcement status")
Reported-by: Nayna Jain <nayna@linux.ibm.com>
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Tested-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e1f55683

selftests/bpf: add test_spec_readahead_xfs_file to support specail async readahead · 69513cfb

Yufen Yu authored 4 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

For hibench applications, likely kmeans, wordcount, terasort,
we can try to use this bpf tool to improve io performance.

Usage:
	make -C bpf
	./test_xfs_file spec_readahead

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

69513cfb

mm: support special async readahead · cc549516

Yufen Yu authored 4 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

For hibench applications, include kmeans, wordcount and terasort,
they will read whole blk_xxx and blk_xxx.meta from disk in sequential.
And almost all of the read issued to disk are triggered by async
readahead.

While sequential read of single thread does't means sequential io
on disk when multiple threads cocurrently do that. Multiple threads
interleaving sequentail read can make io issued into disk become
random, which will limit disk IO throughput.

To reduce disk randomization, we can consider to increase readahead
window. Then IO generated by filesystem will be bigger in each time
of async readahead. But, limited by disk max_hw_sectors_kb, big IO
will be split and the whole bio need to wait all split bios complete,
which can cause longer io latency.

Our trace shows that many long latency in threads are caused by waiting
async readahead IO complete when set readahead window with a big value.
That means, thread read read speed is faster than async readahead io
complete.

To improve performance, we try to provide a special async readahead
method:

    * On the one hand, we try to read more sequential data from disk,
      which can reduce disk randomization when multiple thread
      interleaving.

    * On the other hand, size of each IO issued to disk is 2M, which
      can avoid big IO split and long io latency.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cc549516

selftests/bpf: test_xfs_file support to clear FMODE_RANDOM · 38abc1bb

Yufen Yu authored 4 years ago


hulk inclusion
category: feature
bugzilla: 173267
CVE: NA
---------------------------

If ra->prev_pos page index is equal to current pos, that means
it is sequential read, then clear FMODE_RANDOM flag to enable
async readahead.

Usage:
	make -C bpf
	./test_xfs_file clear

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

38abc1bb