Commits · 589b2a6c46843453b69338e86a81e1f95f4b44df · Summer2022 / 22b970264

Sep 29, 2022

Revert "cifs: fix double free race when mount fails in cifs_get_root()" · 589b2a6c

Luo Meng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TMYD


CVE: NA

--------------------------------

This reverts commit 7959a470.

Commit 2fe0e281f7ad (cifs: fix double free race when mount fails
in cifs_get_root()) fixes a double free. However there is no such
issue on 4.19 because it will return after cifs_cleanup_volume_info().

Since merge this patch, cifs_cleanup_volume_info() is skipped, leading
to a memory leak.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

589b2a6c

Sep 26, 2022

ext4: fix use-after-free in ext4_ext_shift_extents · ae52ee4a

Baokun Li authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187600, https://gitee.com/openeuler/kernel/issues/I5SV2U


CVE: NA

--------------------------------

If the starting position of our insert range happens to be in the hole
between the two ext4_extent_idx, because the lblk of the ext4_extent in
the previous ext4_extent_idx is always less than the start, which leads
to the "extent" variable access across the boundary, the following UAF is
triggered:

==================================================================
BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790
Read of size 4 at addr ffff88819807a008 by task fallocate/8010
CPU: 3 PID: 8010 Comm: fallocate Tainted: G            E     5.10.0+ #492
Call Trace:
 dump_stack+0x7d/0xa3
 print_address_description.constprop.0+0x1e/0x220
 kasan_report.cold+0x67/0x7f
 ext4_ext_shift_extents+0x257/0x790
 ext4_insert_range+0x5b6/0x700
 ext4_fallocate+0x39e/0x3d0
 vfs_fallocate+0x26f/0x470
 ksys_fallocate+0x3a/0x70
 __x64_sys_fallocate+0x4f/0x60
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================

For right shifts, we can divide them into the following situations：

1. When the first ee_block of ext4_extent_idx is greater than or equal to
   start, make right shifts directly from the first ee_block.
    1) If it is greater than start, we need to continue searching in the
       previous ext4_extent_idx.
    2) If it is equal to start, we can exit the loop (iterator=NULL).

2. When the first ee_block of ext4_extent_idx is less than start, then
   traverse from the last extent to find the first extent whose ee_block
   is less than start.
    1) If extent is still the last extent after traversal, it means that
       the last ee_block of ext4_extent_idx is less than start, that is,
       start is located in the hole between idx and (idx+1), so we can
       exit the loop directly (break) without right shifts.
    2) Otherwise, make right shifts at the corresponding position of the
       found extent, and then exit the loop (iterator=NULL).

Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ae52ee4a

quota: Add more checking after reading from quota file · f66997d9

Zhihao Cheng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X


CVE: NA

--------------------------------

It would be better to do more sanity checking (eg. dqdh_entries,
block no.) for the content read from quota file, which can prevent
corrupting the quota file.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f66997d9

quota: Replace all block number checking with helper function · 1e9a49cf

Zhihao Cheng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X


CVE: NA

--------------------------------

Cleanup all block checking places, replace them with helper function
do_check_range().

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

1e9a49cf

quota: Check next/prev free block number after reading from quota file · 6c27d754

Zhihao Cheng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187046, https://gitee.com/openeuler/kernel/issues/I5QH0X
CVE: NA

--------------------------------

Following process:
 Init: v2_read_file_info: <3> dqi_free_blk 0 dqi_free_entry 5 dqi_blks 6

 Step 1. chown bin f_a -> dquot_acquire -> v2_write_dquot:
  qtree_write_dquot
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      write_blk(info->dqi_blocks) // info->dqi_blocks = 6, failure. The
	   content in physical block (corresponding to blk 6) is random.

 Step 2. chown root f_a -> dquot_transfer -> dqput_all -> dqput ->
         ext4_release_dquot -> v2_release_dquot -> qtree_delete_dquot:
  dquot_release
   remove_tree
    free_dqentry
     put_free_dqblk(6)
      info->dqi_free_blk = blk    // info->dqi_free_blk = 6

 Step 3. drop cache (buffer head for block 6 is released)

 Step 4. chown bin f_b -> dquot_acquire -> commit_dqblk -> v2_write_dquot:
  qtree_write_dquot
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      dh = (struct qt_disk_dqdbheader *)buf
      blk = info->dqi_free_blk     // 6
      ret = read_blk(info, blk, buf)  // The content of buf is random
      info->dqi_free_blk = le32_to_cpu(dh->dqdh_next_free)  // random blk

 Step 5. chown bin f_c -> notify_change -> ext4_setattr -> dquot_transfer:
  dquot = dqget -> acquire_dquot -> ext4_acquire_dquot -> dquot_acquire ->
          commit_dqblk -> v2_write_dquot -> dq_insert_tree:
   do_insert_tree
    find_free_dqentry
     get_free_dqblk
      blk = info->dqi_free_blk    // If blk < 0 and blk is not an error
				     code, it will be returned as dquot

  transfer_to[USRQUOTA] = dquot  // A random negative value
  __dquot_transfer(transfer_to)
   dquot_add_inodes(transfer_to[cnt])
    spin_lock(&dquot->dq_dqb_lock)  // page fault

, which will lead to kernel page fault:
 Quota error (device sda): qtree_write_dquot: Error -8000 occurred
 while creating quota
 BUG: unable to handle page fault for address: ffffffffffffe120
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 Oops: 0002 [#1] PREEMPT SMP
 CPU: 0 PID: 5974 Comm: chown Not tainted 6.0.0-rc1-00004
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 RIP: 0010:_raw_spin_lock+0x3a/0x90
 Call Trace:
  dquot_add_inodes+0x28/0x270
  __dquot_transfer+0x377/0x840
  dquot_transfer+0xde/0x540
  ext4_setattr+0x405/0x14d0
  notify_change+0x68e/0x9f0
  chown_common+0x300/0x430
  __x64_sys_fchownat+0x29/0x40

In order to avoid accessing invalid quota memory address, this patch adds
block number checking of next/prev free block read from quota file.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216372


Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6c27d754

Sep 20, 2022

jfs: prevent NULL deref in diFree · 01334116

Haimin Zhang authored 2 years ago

stable inclusion
from stable-v5.10.111
commit b9c5ac0a15f24d63b20f899072fa6dd8c93af136
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5RX0N?from=project-issue


CVE: CVE-2022-3202

--------------------------------

[ Upstream commit a53046291020ec41e09181396c1e829287b48d47 ]

Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref
in diFree since diFree uses it without do any validations.
When function jfs_mount calls diMount to initialize fileset inode
allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be
initialized. Then it calls diFreeSpecial to close fileset inode allocation
map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode
just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use
JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.

Reported-by: TCS Robot <tcs_robot@tencent.com>
Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

4.19.90-2209.5.0

01334116

jfs: fix GPF in diFree · 6a2d6565

Pavel Skripkin authored 2 years ago

mainline inclusion
from mainline-6.0-rc4
commit 9d574f985fe33efd6911f4d752de6f485a1ea732
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5RX0N?from=project-issue


CVE: CVE-2022-3202

--------------------------------

Avoid passing inode with
JFS_SBI(inode->i_sb)->ipimap == NULL to
diFree()[1]. GFP will appear:

	struct inode *ipimap = JFS_SBI(ip->i_sb)->ipimap;
	struct inomap *imap = JFS_IP(ipimap)->i_imap;

JFS_IP() will return invalid pointer when ipimap == NULL

Call Trace:
 diFree+0x13d/0x2dc0 fs/jfs/jfs_imap.c:853 [1]
 jfs_evict_inode+0x2c9/0x370 fs/jfs/inode.c:154
 evict+0x2ed/0x750 fs/inode.c:578
 iput_final fs/inode.c:1654 [inline]
 iput.part.0+0x3fe/0x820 fs/inode.c:1680
 iput+0x58/0x70 fs/inode.c:1670

Reported-and-tested-by:  <syzbot+0a89a7b56db04c21a656@syzkaller.appspotmail.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

6a2d6565

Sep 07, 2022

ext4: avoid resizing to a partial cluster size · d4a36267

Kiselev, Oleg authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 7bdfb01fc5f6b3696728aeb527c50386e0ee09a1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

[ Upstream commit 69cb8e9d8cd97cdf5e293b26d70a9dee3e35e6bd ]

This patch avoids an attempt to resize the filesystem to an
unaligned cluster boundary.  An online resize to a size that is not
integral to cluster size results in the last iteration attempting to
grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
with a corrupted in-memory superblock.

Signed-off-by: Oleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2209.2.0

d4a36267

NFSv4/pnfs: Fix a use-after-free bug in open · 40f93f4d

Trond Myklebust authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 0fffb46ff3d5ed4668aca96441ec7a25b793bd6f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit 2135e5d56278ffdb1c2e6d325dc6b87f669b9dac upstream.

If someone cancels the open RPC call, then we must not try to free
either the open slot or the layoutget operation arguments, since they
are likely still in use by the hung RPC call.

Fixes: 6949493884fe ("NFSv4: Don't hold the layoutget locks across multiple RPC calls")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

40f93f4d

NFSv4.1: RECLAIM_COMPLETE must handle EACCES · 520df356

Zhang Xianwei authored 2 years ago

stable inclusion
from stable-v4.19.256
commit a0b3a9b0e6832be88b426b242aa3320c3817e2fb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit e35a5e782f67ed76a65ad0f23a484444a95f000f upstream.

A client should be able to handle getting an EACCES error while doing
a mount operation to reclaim state due to NFS4CLNT_RECLAIM_REBOOT
being set. If the server returns RPC_AUTH_BADCRED because authentication
failed when we execute "exportfs -au", then RECLAIM_COMPLETE will go a
wrong way. After mount succeeds, all OPEN call will fail due to an
NFS4ERR_GRACE error being returned. This patch is to fix it by resending
a RPC request.

Signed-off-by: Zhang Xianwei <zhang.xianwei8@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Fixes: aa5190d0 ("NFSv4: Kill nfs4_async_handle_error() abuses by NFSv4.1")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

520df356

ext4: fix extent status tree race in writeback error recovery path · f3253e04

Eric Whitney authored 2 years ago

stable inclusion
from stable-v4.19.256
commit adf404e9d73dcae4f28fff9f7bcd857cfee2b7de
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit 7f0d8e1d607c1a4fa9a27362a108921d82230874 upstream.

A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback.  Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages().  That function
removes any corresponding delayed allocated blocks from the extent
status tree.  If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of an new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.

Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem.  Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.

Cc: stable@kernel.org
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f3253e04

ext4: update s_overhead_clusters in the superblock during an on-line resize · 4901bf82

Theodore Ts'o authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 38ac2511a56a74dfc6e0143bae58ef5fc4e318ea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit de394a86658ffe4e89e5328fd4993abfe41b7435 upstream.

When doing an online resize, the on-disk superblock on-disk wasn't
updated.  This means that when the file system is unmounted and
remounted, and the on-disk overhead value is non-zero, this would
result in the results of statfs(2) to be incorrect.

This was partially fixed by Commits 10b01ee92df5 ("ext4: fix overhead
calculation to account for the reserved gdt blocks"), 85d825dbf489
("ext4: force overhead calculation if the s_overhead_cluster makes no
sense"), and eb7054212eac ("ext4: update the cached overhead value in
the superblock").

However, since it was too expensive to forcibly recalculate the
overhead for bigalloc file systems at every mount, this didn't fix the
problem for bigalloc file systems.  This commit should address the
problem when resizing file systems with the bigalloc feature enabled.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.edu


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4901bf82

ext4: make sure ext4_append() always allocates new block · 48d149dc

Lukas Czerner authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 4816177c9f154594c8a841ed95b883eb07e4d6ce
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit b8a04fe77ef1360fbf73c80fddbdfeaa9407ed1b upstream.

ext4_append() must always allocate a new block, otherwise we run the
risk of overwriting existing directory block corrupting the directory
tree in the process resulting in all manner of problems later on.

Add a sanity check to see if the logical block is already allocated and
error out if it is.

Cc: stable@kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-2-lczerner@redhat.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

48d149dc

fs: check FMODE_LSEEK to control internal pipe splicing · ea235ec6

Jason A. Donenfeld authored 2 years ago

stable inclusion
from stable-v4.19.256
commit ab37175dd3593b9098f2242e370a7b1af4c35368
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

[ Upstream commit 97ef77c52b789ec1411d360ed99dca1efe4b2c81 ]

The original direct splicing mechanism from Jens required the input to
be a regular file because it was avoiding the special socket case. It
also recognized blkdevs as being close enough to a regular file. But it
forgot about chardevs, which behave the same way and work fine here.

This is an okayish heuristic, but it doesn't totally work. For example,
a few chardevs should be spliceable here. And a few regular files
shouldn't. This patch fixes this by instead checking whether FMODE_LSEEK
is set, which represents decently enough what we need rewinding for when
splicing to internal pipes.

Fixes: b92ce558 ("[PATCH] splice: add direct fd <-> fd splicing support")
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ea235ec6

fs: Add missing umask strip in vfs_tmpfile · 844ace54

Yang Xu authored 2 years ago

stable inclusion
from stable-v4.19.256
commit a7abfbc07dedb69840a81f40f21696390436fa0e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ
CVE: NA

--------------------------------

commit ac6800e279a22b28f4fc21439843025a0d5bf03e upstream.

All creation paths except for O_TMPFILE handle umask in the vfs directly
if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
does then umask handling is deferred until posix_acl_create().
Because, O_TMPFILE misses umask handling in the vfs it will not honor
umask settings. Fix this by adding the missing umask handling.

Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fujitsu.com


Fixes: 60545d0d ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

844ace54

vfs: Check the truncate maximum size in inode_newsize_ok() · 131c56c6

David Howells authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 76a633ff35b6434f9c93b87d5dcbc655bb52246a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0SQ


CVE: NA

--------------------------------

commit e2ebff9c57fe4eb104ce4768f6ebcccf76bef849 upstream.

If something manages to set the maximum file size to MAX_OFFSET+1, this
can cause the xfs and ext4 filesystems at least to become corrupt.

Ordinarily, the kernel protects against userspace trying this by
checking the value early in the truncate() and ftruncate() system calls
calls - but there are at least two places that this check is bypassed:

 (1) Cachefiles will round up the EOF of the backing file to DIO block
     size so as to allow DIO on the final block - but this might push
     the offset negative. It then calls notify_change(), but this
     inadvertently bypasses the checking. This can be triggered if
     someone puts an 8EiB-1 file on a server for someone else to try and
     access by, say, nfs.

 (2) ksmbd doesn't check the value it is given in set_end_of_file_info()
     and then calls vfs_truncate() directly - which also bypasses the
     check.

In both cases, it is potentially possible for a network filesystem to
cause a disk filesystem to be corrupted: cachefiles in the client's
cache filesystem; ksmbd in the server's filesystem.

nfsd is okay as it checks the value, but we can then remove this check
too.

Fix this by adding a check to inode_newsize_ok(), as called from
setattr_prepare(), thereby catching the issue as filesystems set up to
perform the truncate with minimal opportunity for bypassing the new
check.

Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling")
Fixes: f44158485826 ("cifsd: add file operations")
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: stable@kernel.org
Acked-by: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Steve French <sfrench@samba.org>
cc: Hyunchul Lee <hyc.lee@gmail.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

131c56c6

Sep 06, 2022

configfs: fix a race in configfs_lookup() · 006398db

Sishuai Gong authored 2 years ago

mainline inclusion
from mainline-v5.15-rc1
commit c42dd069be8dfc9b2239a5c89e73bbd08ab35de0
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

When configfs_lookup() is executing list_for_each_entry(),
it is possible that configfs_dir_lseek() is calling list_del().
Some unfortunate interleavings of them can cause a kernel NULL
pointer dereference error

Thread 1                  Thread 2
//configfs_dir_lseek()    //configfs_lookup()
list_del(&cursor->s_sibling);
                         list_for_each_entry(sd, ...)

Fix this by grabbing configfs_dirent_lock in configfs_lookup()
while iterating ->s_children.

Signed-off-by: Sishuai Gong <sishuai@purdue.edu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

006398db

configfs: fold configfs_attach_attr into configfs_lookup · 5e2e2285

Christoph Hellwig authored 2 years ago

mainline inclusion
from mainline-v5.15-rc1
commit d07f132a225c013e59aa77f514ad9211ecab82ee
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

This makes it more clear what gets added to the dcache and prepares
for an additional locking fix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5e2e2285

configfs: make configfs_create() return inode · ba59298f

Al Viro authored 2 years ago

mainline inclusion
from mainline-v5.4-rc1
commit 2743c515
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

Get rid of the callback, deal with that and dentry in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ba59298f

configfs: factor dirent removal into helpers · d2151528

Christoph Hellwig authored 2 years ago

mainline inclusion
from mainline-v5.4-rc1
commit 1cf7a003
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

Lots of duplicated code that benefits from a little consolidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d2151528

configfs: simplify the configfs_dirent_is_ready · 2392db1c

Christoph Hellwig authored 2 years ago

mainline inclusion
from mainline-v5.15-rc1
commit 899587c8d0908e5124fd074d52bf05b4b0633a79
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

Return the error directly instead of using a goto.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

2392db1c

configfs: return -ENAMETOOLONG earlier in configfs_lookup · 6b2f49a9

Christoph Hellwig authored 2 years ago

mainline inclusion
from mainline-v5.15-rc1
commit 417b962ddeca2b70eb72d28c87541bdad4e234f8
category: bugfix
bugzilla: 187567, https://gitee.com/openeuler/kernel/issues/I5PK1G


CVE: NA

--------------------------------

Just like most other file systems: get the simple checks out of the
way first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6b2f49a9

Aug 27, 2022

CIFS: Fix retry mid list corruption on reconnects · 28b546c6

Pavel Shilovsky authored 2 years ago

mainline inclusion
from mainline-v5.4-rc5
commit abe57073
category: bugfix
bugzilla: 24367, https://gitee.com/openeuler/kernel/issues/I5OE1W


CVE: NA

--------------------------------

When the client hits reconnect it iterates over the mid
pending queue marking entries for retry and moving them
to a temporary list to issue callbacks later without holding
GlobalMid_Lock. In the same time there is no guarantee that
mids can't be removed from the temporary list or even
freed completely by another thread. It may cause a temporary
list corruption:

[  430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[  430.464668] ------------[ cut here ]------------
[  430.466569] kernel BUG at lib/list_debug.c:51!
[  430.468476] invalid opcode: 0000 [#1] SMP PTI
[  430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[  430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
...
[  430.510426] Call Trace:
[  430.511500]  cifs_reconnect+0x25e/0x610 [cifs]
[  430.513350]  cifs_readv_from_socket+0x220/0x250 [cifs]
[  430.515464]  cifs_read_from_socket+0x4a/0x70 [cifs]
[  430.517452]  ? try_to_wake_up+0x212/0x650
[  430.519122]  ? cifs_small_buf_get+0x16/0x30 [cifs]
[  430.521086]  ? allocate_buffers+0x66/0x120 [cifs]
[  430.523019]  cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[  430.525116]  kthread+0xfb/0x130
[  430.526421]  ? cifs_handle_standard+0x190/0x190 [cifs]
[  430.528514]  ? kthread_park+0x90/0x90
[  430.530019]  ret_from_fork+0x35/0x40

Fix this by obtaining extra references for mids being retried
and marking them as MID_DELETED which indicates that such a mid
has been dequeued from the pending list.

Also move mid cleanup logic from DeleteMidQEntry to
_cifs_mid_q_entry_release which is called when the last reference
to a particular mid is put. This allows to avoid any use-after-free
of response buffers.

The patch needs to be backported to stable kernels. A stable tag
is not mentioned below because the patch doesn't apply cleanly
to any actively maintained stable kernel.

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

conflicts:
	fs/cifs/connect.c
	fs/cifs/transport.c

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

28b546c6

Aug 08, 2022

exec: Force single empty string when argv is empty · 72b32d44

Kees Cook authored 2 years ago

stable inclusion
from stable-v4.19.246
commit b50fb8dbc8b81aaa126387de428f4c42a7c72a73
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IGTA
CVE: NA

--------------------------------

exec: Force single empty string when argv is empty

commit dcd46d897adb70d63e025f175a00a89797d31a43 upstream.

Quoting[1] Ariadne Conill:

"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:

    The argument arg0 should point to a filename string that is
    associated with the process being started by one of the exec
    functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.

This issue is being tracked in the KSPP issue tracker[5]."

While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.

The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.

Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:

    process './argc0' launched './argc0' with NULL argv: empty string added

Additionally WARN() and reject NULL argv usage for kernel threads.

[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.org/
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+*NULL&literal=0
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%2C%5Cs*NULL&literal=0
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/



Reported-by: Ariadne Conill <ariadne@dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Ariadne Conill <ariadne@dereferenced.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org


[vegard: fixed conflicts due to missing
 886d7de6^- and
 3950e975^- and
 655c16a8]
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

72b32d44

ext4: make variable "count" signed · fe95c98a

Ding Xiang authored 2 years ago

stable inclusion
from stable-4.19.249
commit c86fce1d8c083d268103d2d19e5a9a66000a8e6f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5I4FP


CVE: NA

--------------------------------

commit bc75a6eb856cb1507fa907bf6c1eda91b3fef52f upstream.

Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to
be a signed integer so we can correctly check for an error code returned
by dx_make_map().

Fixes: 46c116b920eb ("ext4: verify dir block before splitting it")
Cc: stable@kernel.org
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

fe95c98a

Aug 05, 2022

io_uring: add missing item types for various requests · f0a1bfae

Jens Axboe authored 2 years ago

stable inclusion
from stable-v5.10.125
commit df3f3bb5059d20ef094d6b2f0256c4bf4127a859
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5IM3T


CVE: CVE-2022-2327

--------------------------------

Any read/write should grab current->nsproxy, denoted by IO_WQ_WORK_FILES
as it refers to current->files as well, and connect and recv/recvmsg,
send/sendmsg should grab current->fs which is denoted by IO_WQ_WORK_FS.

No upstream commit exists for this issue.

Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
1. read/write already grab current->nsproxy, thus don't modified
read/write related ops.
2. 'work_flags' doesn't exist, io_op_def is using field 'needs_fs' to
decide if 'current->fs' should be grabbed.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f0a1bfae

Aug 04, 2022

io-wq: Switch io_wqe_worker's fs before releasing request · b56e6431

Zhihao Cheng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187369, https://gitee.com/openeuler/kernel/issues/I5K66O


CVE: NA

--------------------------------

Following process triggers an use-after-free problem while iterating
every process's fs:

main
   fd = setup_iouring
   fork => main'                      // task->fs->users = 1
   submit(fd, STATX, async)
     id->fs = current->fs
     req->work.identity = id
    io_submit_sqes
     ...
      io_grab_identity
       id = req->work.identity
       id->fs->users++               // fs->users = 2

io_wqe_worker
    current->fs = work->identity->fs
    io_req_clean_work
      fs = req->work.identity->fs
      --fs->users                    // fs->user = 1

main' exit
    exit_fs
     --fs->users                     // fs->user = 0
     free_fs_struct(fs)              // FREE fs

pivot_root
chroot_fs_refs
 do_each_thread(g, p) {
  fs = p->fs                // io_wqe_worker->fs
  if (fs)
    spin_lock(&fs->lock)    // UAF!
 }

io_wqe_worker
    io_worker_exit
     __io_worker_unuse
      if (current->fs != worker->restore_fs)
        current->fs = worker->restore_fs

Task's fs_struct is used in do_work() and destroyed in free_work(),
this problem can be fixed by switching io_wqe_worker's fs before
releasing request.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b56e6431

Jul 28, 2022

jbd2: Fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted · 7dd12442

Zhihao Cheng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187185, https://gitee.com/openeuler/kernel/issues/I5JAOC
CVE: NA

--------------------------------

Following process will fail assertion 'jh->b_frozen_data == NULL' in
jbd2_journal_dirty_metadata():

                   jbd2_journal_commit_transaction
unlink(dir/a)
 jh->b_transaction = trans1
 jh->b_jlist = BJ_Metadata
                    journal->j_running_transaction = NULL
                    trans1->t_state = T_COMMIT
unlink(dir/b)
 handle->h_trans = trans2
 do_get_write_access
  jh->b_modified = 0
  jh->b_frozen_data = frozen_buffer
  jh->b_next_transaction = trans2
 jbd2_journal_dirty_metadata
  is_handle_aborted
   is_journal_aborted // return false

           --> jbd2 abort <--

                     while (commit_transaction->t_buffers)
                      if (is_journal_aborted)
                       jbd2_journal_refile_buffer
                        __jbd2_journal_refile_buffer
                         WRITE_ONCE(jh->b_transaction,
						jh->b_next_transaction)
                         WRITE_ONCE(jh->b_next_transaction, NULL)
                         __jbd2_journal_file_buffer(jh, BJ_Reserved)
        J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure !

The reproducer (See detail in [Link]) reports:
 ------------[ cut here ]------------
 kernel BUG at fs/jbd2/transaction.c:1629!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 2 PID: 584 Comm: unlink Tainted: G        W
 5.19.0-rc6-00115-g4a57a8400075-dirty #697
 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470
 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202
 Call Trace:
  <TASK>
  __ext4_handle_dirty_metadata+0xa0/0x290
  ext4_handle_dirty_dirblock+0x10c/0x1d0
  ext4_delete_entry+0x104/0x200
  __ext4_unlink+0x22b/0x360
  ext4_unlink+0x275/0x390
  vfs_unlink+0x20b/0x4c0
  do_unlinkat+0x42f/0x4c0
  __x64_sys_unlink+0x37/0x50
  do_syscall_64+0x35/0x80

After journal aborting, __jbd2_journal_refile_buffer() is executed with
holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()'
into the area protected by @jh->b_state_lock.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251


Fixes: 470decc6 ("[PATCH] jbd2: initial copy of files from jbd")
Conflicts:
	fs/jbd2/transaction.c
	[ 46417064("jbd2: Make state lock a spinlock") is not
	  applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

7dd12442

ext4: Fix race when reusing xattr blocks · cf6b7789

Jan Kara authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169



--------------------------------

When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:

CPU1                                    CPU2
ext4_xattr_block_set()                  ext4_xattr_release_block()
  new_bh = ext4_xattr_block_cache_find()

                                          lock_buffer(bh);
                                          ref = le32_to_cpu(BHDR(bh)->h_refcount);
                                          if (ref == 1) {
                                            ...
                                            mb_cache_entry_delete();
                                            unlock_buffer(bh);
                                            ext4_free_blocks();
                                              ...
                                              ext4_forget(..., bh, ...);
                                                jbd2_journal_revoke(..., bh);

  ext4_journal_get_write_access(..., new_bh, ...)
    do_get_write_access()
      jbd2_journal_cancel_revoke(..., new_bh);

Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reusal of the block but the revoke stays canceled and so in
case of block reuse and journal replay the filesystem can get corrupted.
If the race works out slightly differently, we can also hit assertions
in the jbd2 code.

Fix the problem by making sure that once matching mbcache entry is
found, code dropping the last xattr block reference (or trying to modify
xattr block in place) waits until the mbcache entry reference is
dropped. This way code trying to reuse xattr block is protected from
someone trying to drop the last reference to xattr block.

Reported-and-tested-by: Ritesh Harjani <ritesh.list@gmail.com>
CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

cf6b7789

ext4: Unindent codeblock in ext4_xattr_block_set() · 3347776f

Jan Kara authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169



--------------------------------

Remove unnecessary else (and thus indentation level) from a code block
in ext4_xattr_block_set(). It will also make following code changes
easier. No functional changes.

CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Conflicts:
	fs/ext4/xattr.c
	[ 188c299e2a26 ("ext4: Support for checksumming from journal
	  triggers") is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

3347776f

ext4: Remove EA inode entry from mbcache on inode eviction · e4dafb9b

Jan Kara authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169



--------------------------------

Currently we remove EA inode from mbcache as soon as its xattr refcount
drops to zero. However there can be pending attempts to reuse the inode
and thus refcount handling code has to handle the situation when
refcount increases from zero anyway. So save some work and just keep EA
inode in mbcache until it is getting evicted. At that moment we are sure
following iget() of EA inode will fail anyway (or wait for eviction to
finish and load things from the disk again) and so removing mbcache
entry at that moment is fine and simplifies the code a bit.

CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e4dafb9b

mbcache: Add functions to delete entry if unused · b2ec0867

Jan Kara authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169



--------------------------------

Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
it is unused and also add a function to wait for entry to become unused
- mb_cache_entry_wait_unused(). We do not share code between the two
deleting function as one of them will go away soon.

CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b2ec0867

mbcache: Don't reclaim used entries · e249cdb4

Jan Kara authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169



--------------------------------

Do not reclaim entries that are currently used by somebody from a
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect pending increment of xattr block
refcount.

CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e249cdb4

Jul 25, 2022

inotify: show inotify mask flags in proc fdinfo · e9035e63

Amir Goldstein authored 2 years ago

mainline inclusion
from mainline-v5.19-rc1
commit a32e697cda27679a0327ae2cafdad8c7170f548f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IGXM
CVE: NA

--------------------------------

The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal
to kernel" and should be exposed in procfs fdinfo so CRIU can restore
them.

Fixes: 69335996 ("inotify: hide internal kernel bits from fdinfo")
Link: https://lore.kernel.org/r/20220422120327.3459282-2-amir73il@gmail.com


Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

4.19.90-2207.4.0

e9035e63

Jul 22, 2022

io_uring: always grab file table for deferred statx · 46a8cd03

Jens Axboe authored 2 years ago

stable inclusion
from stable-v5.10.118
commit 3c48558be571e01f67e65edcf03193484eeb2b79
category: bugfix
bugzilla: 187134, https://gitee.com/openeuler/kernel/issues/I5HEXY
CVE: NA

--------------------------------

Lee reports that there's a use-after-free of the process file table.
There's an assumption that we don't need the file table for some
variants of statx invocation, but that turns out to be false and we
end up with not grabbing a reference for the request even if the
deferred execution uses it.

Get rid of the REQ_F_NO_FILE_TABLE optimization for statx, and always
grab that reference.

This issues doesn't exist upstream since the native workers got
introduced with 5.12.

Link: https://lore.kernel.org/io-uring/YoOJ%2FT4QRKC+fAZE@google.com/


Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

46a8cd03

Jul 20, 2022

ext4: fix race condition between ext4_ioctl_setflags and ext4_fiemap · b8493652

Baokun Li authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187222, https://gitee.com/openeuler/kernel/issues/I5H3KE


CVE: NA

--------------------------------

Hulk Robot reported a BUG:

==================================================================
kernel BUG at fs/ext4/extents_status.c:762!
invalid opcode: 0000 [#1] SMP KASAN PTI
[...]
Call Trace:
 ext4_cache_extents+0x238/0x2f0
 ext4_find_extent+0x785/0xa40
 ext4_fiemap+0x36d/0xe90
 do_vfs_ioctl+0x6af/0x1200
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
           cpu1		    cpu2
_____________________|_____________________
do_vfs_ioctl
 ext4_ioctl
  ext4_ioctl_setflags
   ext4_ind_migrate
                        do_vfs_ioctl
                         ioctl_fiemap
                          ext4_fiemap
                           ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)
                           ext4_fill_fiemap_extents
    down_write(&EXT4_I(inode)->i_data_sem);
    ext4_ext_check_inode
    ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS)
    memset(ei->i_data, 0, sizeof(ei->i_data))
    up_write(&EXT4_I(inode)->i_data_sem);
                            down_read(&EXT4_I(inode)->i_data_sem);
                            ext4_find_extent
                             ext4_cache_extents
                              ext4_es_cache_extent
                               BUG_ON(end < lblk)

We can easily reproduce this problem with the syzkaller testcase:
```
02:37:07 executing program 3:
r0 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x26e1, 0x0)
ioctl$FS_IOC_FSSETXATTR(r0, 0x40086602, &(0x7f0000000080)={0x17e})
mkdirat(0xffffffffffffff9c, &(0x7f00000000c0)='./file1\x00', 0x1ff)
r1 = openat(0xffffffffffffff9c, &(0x7f0000000100)='./file1\x00', 0x0, 0x0)
ioctl$FS_IOC_FIEMAP(r1, 0xc020660b, &(0x7f0000000180)={0x0, 0x1, 0x0, 0xef3, 0x6, []}) (async, rerun: 32)
ioctl$FS_IOC_FSSETXATTR(r1, 0x40086602, &(0x7f0000000140)={0x17e}) (rerun: 32)
```

To solve this issue, we use __generic_block_fiemap() instead of
generic_block_fiemap() and add inode_lock_shared to avoid race condition.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b8493652

Jul 14, 2022

Revert "block: rename bd_invalidated" · 13879644

余快 authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

This reverts commit b6113052.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

13879644

Revert "block: move the NEED_PART_SCAN flag to struct gendisk" · acc38712

余快 authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: 187190, https://gitee.com/src-openeuler/kernel/issues/I5GWOV


CVE: NA

--------------------------------

This reverts commit b2f0e44f.

Because it will introduce following problem in ltp zram tests:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000600
PGD 0 P4D 0
Oops: 0002 [#1] SMP PTI
CPU: 28 PID: 172121 Comm: sh Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0+ #2
Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 5.15 05/21/2019
RIP: 0010:flush_disk+0x1d/0x50
RSP: 0018:ffffaf14a516fe20 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff899e26bac380 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff899e26bac380
RBP: ffff899e26bac380 R08: 00000000000006a9 R09: 0000000000000004
R10: ffff89cd878ff440 R11: 0000000000000001 R12: 0000000000000000
R13: ffff899e26bac398 R14: ffffaf14a516ff00 R15: ffff89cd8709c3e0
FS:  00007f78d6840740(0000) GS:ffff89fcbf480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000600 CR3: 000000308afc0002 CR4: 00000000001606e0
Call Trace:
 revalidate_disk+0x57/0x80
 reset_store+0xaf/0x120 [zram]
 kernfs_fop_write+0x10f/0x190
 vfs_write+0xad/0x1a0
 ksys_write+0x52/0xc0
 do_syscall_64+0x5d/0x1d0
 entry_SYSCALL_64_after_hwframe+0x65/0xca

This is because "bdev->bd_disk" is not ensured to exist, just convert
"set_bit(BDEV_NEED_PART_SCAN, &bdev->bd_flags)" to
"set_bit(GD_NEED_PART_SCAN, &bdev->bd_disk->state)" is wrong.

The reason to backport it is that commit 2a57456c8973 ("block:
Fix warning in bd_link_disk_holder()") has a regression that part scan
is disabled in device_add_disk(), and this problem will be fixed in
later patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

acc38712

Jul 08, 2022

NFSv4: Don't hold the layoutget locks across multiple RPC calls · ae64d846

Trond Myklebust authored 2 years ago

stable inclusion
from stable-4.19.247
commit 6b3fc1496e7227cd6a39a80bbfb7588ef7c7a010
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 6949493884fe88500de4af182588e071cf1544ee ]

When doing layoutget as part of the open() compound, we have to be
careful to release the layout locks before we can call any further RPC
calls, such as setattr(). The reason is that those calls could trigger
a recall, which could deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

ae64d846

ext4: avoid cycles in directory h-tree · d0fdd7f3

Jan Kara authored 2 years ago

stable inclusion
from stable-4.19.247
commit b3ad9ff6f06c1dc6abf7437691c88ca3d6da3ac0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 3ba733f879c2a88910744647e41edeefbc0d92b2 upstream.

A maliciously corrupted filesystem can contain cycles in the h-tree
stored inside a directory. That can easily lead to the kernel corrupting
tree nodes that were already verified under its hands while doing a node
split and consequently accessing unallocated memory. Fix the problem by
verifying traversed block numbers are unique.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518093332.13986-2-jack@suse.cz


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

d0fdd7f3