  1. Jul 19, 2021
  2. Jul 18, 2021
  3. Jul 16, 2021
  4. Jul 14, 2021
    • ext4: fix possible UAF when remounting r/o a mmp-protected file system · 7d7d26aa
      Theodore Ts'o authored
      mainline inclusion
      from mainline-5.14
      commit	61bb4a1c417e5b95d9edb4f887f131de32e419cb
      category: bugfix
      bugzilla: 173880
      CVE: NA
      
      -------------------------------------------------
      
      After commit 618f003199c6 ("ext4: fix memory leak in
      ext4_fill_super"), after the file system is remounted read-only, there
      is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to
      point at freed memory, which the call to ext4_stop_mmpd() can trip
      over.
      
      Fix this by only allowing kmmpd() to exit when it is stopped via
      ext4_stop_mmpd().
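A userspace sketch of the lifetime rule this fix describes, assuming POSIX threads stand in for kthreads: the worker never returns on its own, even after an error, and is reaped only by the stop routine, so the owner's thread handle cannot go stale. toy_mmpd and its helpers are hypothetical names; the real kmmpd() work (MMP sequence updates, sleeping between checks) is left out, and the should_stop flag only plays the role of kthread_should_stop().

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a kmmpd-like worker thread. */
struct toy_mmpd {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool should_stop;  /* plays the role of kthread_should_stop() */
	bool parked;       /* worker hit its error path and is waiting */
	bool exited;       /* worker function has returned */
	int retval;
};

static void *toy_mmpd_fn(void *arg)
{
	struct toy_mmpd *m = arg;

	pthread_mutex_lock(&m->lock);
	/* Pretend the MMP update failed after a read-only remount. Returning
	 * here would be the racy pattern; instead, park until told to stop. */
	m->retval = -EROFS;
	m->parked = true;
	pthread_cond_broadcast(&m->cond);
	while (!m->should_stop)
		pthread_cond_wait(&m->cond, &m->lock);
	m->exited = true;
	pthread_mutex_unlock(&m->lock);
	return NULL;
}

static int toy_mmpd_start(struct toy_mmpd *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->cond, NULL);
	m->should_stop = m->parked = m->exited = false;
	m->retval = 0;
	return pthread_create(&m->tid, NULL, toy_mmpd_fn, m);
}

/* Block until the worker has reached its error path. */
static void toy_mmpd_wait_parked(struct toy_mmpd *m)
{
	pthread_mutex_lock(&m->lock);
	while (!m->parked)
		pthread_cond_wait(&m->cond, &m->lock);
	pthread_mutex_unlock(&m->lock);
}

static bool toy_mmpd_has_exited(struct toy_mmpd *m)
{
	pthread_mutex_lock(&m->lock);
	bool exited = m->exited;
	pthread_mutex_unlock(&m->lock);
	return exited;
}

/* Models ext4_stop_mmpd(): the only place where the worker is reaped. */
static int toy_mmpd_stop(struct toy_mmpd *m)
{
	pthread_mutex_lock(&m->lock);
	m->should_stop = true;
	pthread_cond_broadcast(&m->cond);
	pthread_mutex_unlock(&m->lock);
	pthread_join(m->tid, NULL);
	pthread_mutex_destroy(&m->lock);
	pthread_cond_destroy(&m->cond);
	return m->retval;
}
```

After toy_mmpd_wait_parked() returns, the worker is still alive, and toy_mmpd_stop() returns -EROFS. On glibc older than 2.34, link with -pthread.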
      
      Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu
      
      
      Reported-by: Ye Bin <yebin10@huawei.com>
      Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      
      Conflicts:
      	fs/ext4/super.c
      
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • locks: Fix UBSAN undefined behaviour in flock64_to_posix_lock · 1b84bd6f
      Luo Meng authored
      
      mainline inclusion
      from mainline-v5.11-rc1
      commit 16238415eb9886328a89fe7a3cb0b88c7335fe16
      category: bugfix
      bugzilla: 38689
      CVE: NA
      
      -----------------------------------------------
      
      When the sum of fl->fl_start and l->l_len overflows,
      UBSAN shows the following warning:
      
      UBSAN: Undefined behaviour in fs/locks.c:482:29
      signed integer overflow: 2 + 9223372036854775806
      cannot be represented in type 'long long int'
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xe4/0x14e lib/dump_stack.c:118
       ubsan_epilogue+0xe/0x81 lib/ubsan.c:161
       handle_overflow+0x193/0x1e2 lib/ubsan.c:192
       flock64_to_posix_lock fs/locks.c:482 [inline]
       flock_to_posix_lock+0x595/0x690 fs/locks.c:515
       fcntl_setlk+0xf3/0xa90 fs/locks.c:2262
       do_fcntl+0x456/0xf60 fs/fcntl.c:387
       __do_sys_fcntl fs/fcntl.c:483 [inline]
       __se_sys_fcntl fs/fcntl.c:468 [inline]
       __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468
       do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix it by parenthesizing 'l->l_len - 1'.
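The overflow can be reproduced with plain 64-bit arithmetic. In this sketch, which assumes int64_t stands in for loff_t and uses hypothetical helper names, start + len - 1 is evaluated as (start + len) - 1, so with the values from the UBSAN report the intermediate sum exceeds INT64_MAX even though the final end offset fits; grouping len - 1 first keeps every step in range. Overflow is detected with a pre-check rather than by evaluating the expression, since signed overflow is undefined behaviour in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a + b would overflow int64_t; assumes b >= 0. */
static bool toy_add_overflows(int64_t a, int64_t b)
{
	return a > INT64_MAX - b;
}

/* Pre-fix shape: start + len - 1 parses as (start + len) - 1, so the
 * intermediate start + len is what can overflow. */
static bool toy_old_expr_overflows(int64_t start, int64_t len)
{
	return toy_add_overflows(start, len);
}

/* Fixed shape: start + (len - 1). Assumes len >= 1. Returns false if the
 * end offset itself does not fit; otherwise stores it in *end. */
static bool toy_lock_end(int64_t start, int64_t len, int64_t *end)
{
	if (toy_add_overflows(start, len - 1))
		return false;
	*end = start + (len - 1);
	return true;
}
```

With start 2 and length 9223372036854775806, the old expression overflows while the parenthesized one yields exactly INT64_MAX.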
      
      Signed-off-by: Luo Meng <luomeng12@huawei.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Luo Meng <luomeng12@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • iomap: Mark read blocks uptodate in write_begin · a259beb0
      Matthew Wilcox (Oracle) authored
      
      mainline inclusion
      from mainline-v5.10
      commit 14284fed
      category: bugfix
      bugzilla: 43547
      CVE: NA
      
      -----------------------------------------------
      
      When bringing (portions of) a page uptodate, we were marking blocks that
      were zeroed as being uptodate, but not blocks that were read from storage.
      
      Like the previous commit, this problem was found with generic/127 and
      a kernel which failed readahead I/Os.  This bug causes writes to be
      silently lost when working with flaky storage.
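A simplified model of the per-block state, assuming a page of four blocks and one byte per block; toy_page, toy_write_begin() and toy_read_page() are hypothetical and leave out the write_end bookkeeping and the real iomap structures. It shows why the omission matters: a block that was read in write_begin but not marked uptodate is read again later, discarding whatever was written into it.

```c
#include <stdbool.h>

#define TOY_BLOCKS 4   /* hypothetical: four blocks per page */

struct toy_page {
	bool uptodate[TOY_BLOCKS];   /* per-block uptodate state */
	char data[TOY_BLOCKS];       /* one byte stands in for each block */
};

static const char toy_disk[TOY_BLOCKS] = { 'a', 'b', 'c', 'd' };

/* Bring blocks [first, last] uptodate before a write. hole[i] says whether
 * block i is unmapped (zeroed) or has to be read from "storage". */
static void toy_write_begin(struct toy_page *p, int first, int last,
			    const bool *hole, bool fixed)
{
	for (int i = first; i <= last; i++) {
		if (p->uptodate[i])
			continue;
		if (hole[i]) {
			p->data[i] = 0;
			p->uptodate[i] = true;      /* zeroed blocks: always marked */
		} else {
			p->data[i] = toy_disk[i];
			if (fixed)
				p->uptodate[i] = true;  /* read blocks: the missing step */
		}
	}
}

/* A later read only fetches blocks that are not marked uptodate. */
static void toy_read_page(struct toy_page *p)
{
	for (int i = 0; i < TOY_BLOCKS; i++) {
		if (!p->uptodate[i]) {
			p->data[i] = toy_disk[i];
			p->uptodate[i] = true;
		}
	}
}
```

If block 1 is read in write_begin, then overwritten with new data, the unfixed model's later read restores the on-disk byte, which is the silent write loss described above.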
      
      Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      
      conflicts:
      fs/iomap.c
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyi...
    • iomap: Clear page error before beginning a write · b3a0aab5
      Matthew Wilcox (Oracle) authored
      
      mainline inclusion
      from mainline-v5.10-rc1
      commit e6e7ca92
      category: bugfix
      bugzilla: 43551
      CVE: NA
      
      -----------------------------------------------
      
      If we find a page in write_begin which is !Uptodate, we need
      to clear any error on the page before starting to read data
      into it.  This matches how filemap_fault(), do_read_cache_page()
      and generic_file_buffered_read() handle PageError on !Uptodate pages.
      When calling iomap_set_range_uptodate() in __iomap_write_begin(), blocks
      were not being marked as uptodate.
      
      This was found with generic/127 and a specially modified kernel which
      would fail (some) readahead I/Os.  The test read some bytes in a prior
      page which caused readahead to extend into page 0x34.  There was
      a subsequent write to page 0x34, followed by a read to page 0x34.
      Because the blocks were still marked as !Uptodate, the read caused all
      blocks to be re-read, overwriting the write.  With this change, and the
      next one, the bytes which were written are marked as being Uptodate, so
      even though the page is still marked as !Uptodate, the blocks containing
      the written data are not re-read from storage.
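The first rule in the commit message can be sketched as follows, using hypothetical flag fields in place of PageUptodate() and PageError(): only a page that is not uptodate has its error cleared, right before data is read into it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the page's Uptodate and Error flags. */
struct toy_page_flags {
	bool uptodate;
	bool error;
};

/* Returns true if data must be read into the page before the write. */
static bool toy_write_begin_prepare(struct toy_page_flags *p)
{
	if (p->uptodate)
		return false;   /* nothing to read; leave the flags alone */
	p->error = false;   /* drop a stale error, e.g. from failed readahead */
	return true;
}
```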
      
      Fixes: 9dc55f13 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      
      conflicts:
      fs/iomap.c
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • iomap: move the zeroing case out of iomap_read_page_sync · 9fdbab15
      Christoph Hellwig authored
      
      mainline inclusion
      from mainline-v5.5-rc1
      commit d3b40439
      category: bugfix
      bugzilla: 43551
      CVE: NA
      
      -----------------------------------------------
      
      That keeps the function a little easier to understand, and easier to
      modify for pending enhancements.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      
      conflicts:
      fs/iomap.c
      
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • nbd: handle device refs for DESTROY_ON_DISCONNECT properly · c72b1648
      Josef Bacik authored
      
      mainline inclusion
      from mainline-5.12-rc1
      commit c9a2f90f4d6b
      category: bugfix
      bugzilla: 50455
      CVE: NA
      
      -------------------------------------------------
      
      There exists a race where we can be attempting to create a new nbd
      configuration while a previous configuration is going down, both
      configured with DESTROY_ON_DISCONNECT.  Normally devices all have a
      reference of 1, as they won't be cleaned up until the module is torn
      down.  However with DESTROY_ON_DISCONNECT we'll make sure that there is
      only 1 reference (generally) on the device for the config itself, and
      then once the config is dropped, the device is torn down.
      
      The race that exists looks like this:
      
      TASK1					TASK2
      nbd_genl_connect()
        idr_find()
          refcount_inc_not_zero(nbd)
            * count is 2 here ^^
      					nbd_config_put()
      					  nbd_put(nbd) (count is 1)
          setup new config
            check DESTROY_ON_DISCONNECT
      	put_dev = true
          if (put_dev) nbd_put(nbd)
      	* free'd here ^^
      
      In nbd_genl_connect() we assume that the nbd ref count will be 2;
      however, that won't be true if the nbd device had been set up as
      DESTROY_ON_DISCONNECT with its prior configuration.  Fix this by getting
      rid of the runtime flag to check if we need to mess with the nbd device
      refcount, and use the device NBD_DESTROY_ON_DISCONNECT flag to check if
      we need to adjust the ref counts.  This was reported by syzkaller with
      the following KASAN dump:
      
      BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
      BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
      BUG: KASAN: use-after-free in refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
      Read of size 4 at addr ffff888143bf71a0 by task systemd-udevd/8451
      
      CPU: 0 PID: 8451 Comm: systemd-udevd Not tainted 5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:230
       __kasan_report mm/kasan/report.c:396 [inline]
       kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       refcount_dec_not_one+0x71/0x1e0 lib/refcount.c:76
       refcount_dec_and_mutex_lock+0x19/0x140 lib/refcount.c:115
       nbd_put drivers/block/nbd.c:248 [inline]
       nbd_release+0x116/0x190 drivers/block/nbd.c:1508
       __blkdev_put+0x548/0x800 fs/block_dev.c:1579
       blkdev_put+0x92/0x570 fs/block_dev.c:1632
       blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
       __fput+0x283/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x190 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fc1e92b5270
      Code: 73 01 c3 48 8b 0d 38 7d 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c1 20 00 00 75 10 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24
      RSP: 002b:00007ffe8beb2d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00007fc1e92b5270
      RDX: 000000000aba9500 RSI: 0000000000000000 RDI: 0000000000000007
      RBP: 00007fc1ea16f710 R08: 000000000000004a R09: 0000000000000008
      R10: 0000562f8cb0b2a8 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000562f8cb0afd0 R14: 0000000000000003 R15: 000000000000000e
      
      Allocated by task 1:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:401 [inline]
       ____kasan_kmalloc.constprop.0+0x82/0xa0 mm/kasan/common.c:429
       kmalloc include/linux/slab.h:552 [inline]
       kzalloc include/linux/slab.h:682 [inline]
       nbd_dev_add+0x44/0x8e0 drivers/block/nbd.c:1673
       nbd_init+0x250/0x271 drivers/block/nbd.c:2394
       do_one_initcall+0x103/0x650 init/main.c:1223
       do_initcall_level init/main.c:1296 [inline]
       do_initcalls init/main.c:1312 [inline]
       do_basic_setup init/main.c:1332 [inline]
       kernel_init_freeable+0x605/0x689 init/main.c:1533
       kernel_init+0xd/0x1b8 init/main.c:1421
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      
      Freed by task 8451:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:356
       ____kasan_slab_free+0xe1/0x110 mm/kasan/common.c:362
       kasan_slab_free include/linux/kasan.h:192 [inline]
       slab_free_hook mm/slub.c:1547 [inline]
       slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1580
       slab_free mm/slub.c:3143 [inline]
       kfree+0xdb/0x3b0 mm/slub.c:4139
       nbd_dev_remove drivers/block/nbd.c:243 [inline]
       nbd_put.part.0+0x180/0x1d0 drivers/block/nbd.c:251
       nbd_put drivers/block/nbd.c:295 [inline]
       nbd_config_put+0x6dd/0x8c0 drivers/block/nbd.c:1242
       nbd_release+0x103/0x190 drivers/block/nbd.c:1507
       __blkdev_put+0x548/0x800 fs/block_dev.c:1579
       blkdev_put+0x92/0x570 fs/block_dev.c:1632
       blkdev_close+0x8c/0xb0 fs/block_dev.c:1640
       __fput+0x283/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x190 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
       syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff888143bf7000
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 416 bytes inside of
       1024-byte region [ffff888143bf7000, ffff888143bf7400)
      The buggy address belongs to the page:
      page:000000005238f4ce refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x143bf0
      head:000000005238f4ce order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x57ff00000010200(slab|head)
      raw: 057ff00000010200 ffffea00004b1400 0000000300000003 ffff888010c41140
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888143bf7080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888143bf7100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff888143bf7180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
       ffff888143bf7200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
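A single-threaded replay of the TASK1/TASK2 interleaving shown earlier, written as a userspace model with C11 atomics. toy_nbd, the atomic destroy_on_disconnect flag (standing in for the device's NBD_DESTROY_ON_DISCONNECT bit) and the helper names are hypothetical. One way to express the flag-based check is a test-and-set: the extra reference is dropped only when this call is the one that sets the flag, so a device inherited from a DESTROY_ON_DISCONNECT configuration keeps the reference the new config needs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an nbd device: a reference count plus one flag. */
struct toy_nbd {
	atomic_int refs;
	atomic_bool destroy_on_disconnect;  /* models NBD_DESTROY_ON_DISCONNECT */
};

/* Like refcount_inc_not_zero(): take a reference unless the count is 0. */
static bool toy_ref_inc_not_zero(struct toy_nbd *d)
{
	int old = atomic_load(&d->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns the count left (0 means the device is freed). */
static int toy_put(struct toy_nbd *d)
{
	return atomic_fetch_sub(&d->refs, 1) - 1;
}

/* Pre-fix shape: requesting DESTROY_ON_DISCONNECT always drops one ref,
 * assuming the count is 2 at this point. */
static void toy_connect_old(struct toy_nbd *d)
{
	atomic_store(&d->destroy_on_disconnect, true);
	toy_put(d);
}

/* Fixed shape: drop the extra ref only if this call set the flag; if it was
 * already set, the device no longer holds that extra ref. */
static void toy_connect_fixed(struct toy_nbd *d)
{
	if (!atomic_exchange(&d->destroy_on_disconnect, true))
		toy_put(d);
}
```

Replaying the race with a device whose previous config set the flag, the old shape drives the count to 0 while TASK1 still uses the device; the fixed shape leaves it at 1. A fresh device without the flag ends at 1 either way.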
      
      Reported-and-tested-by: <syzbot+429d3f82d757c211bff3@syzkaller.appspotmail.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Luo Meng <luomeng12@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • cifs: Fix leak when handling lease break for cached root fid · 25ebc65f
      Paul Aurich authored
      
      mainline inclusion
      from mainline-5.9-rc1
      commit baf57b56
      category: bugfix
      bugzilla: 40791
      CVE: NA
      
      ---------------------------
      
      Handling a lease break for the cached root didn't free the
      smb2_lease_break_work allocation, resulting in a leak:
      
          unreferenced object 0xffff98383a5af480 (size 128):
            comm "cifsd", pid 684, jiffies 4294936606 (age 534.868s)
            hex dump (first 32 bytes):
              c0 ff ff ff 1f 00 00 00 88 f4 5a 3a 38 98 ff ff  ..........Z:8...
              88 f4 5a 3a 38 98 ff ff 80 88 d6 8a ff ff ff ff  ..Z:8...........
            backtrace:
              [<0000000068957336>] smb2_is_valid_oplock_break+0x1fa/0x8c0
              [<0000000073b70b9e>] cifs_demultiplex_thread+0x73d/0xcc0
              [<00000000905fa372>] kthread+0x11c/0x150
              [<0000000079378e4e>] ret_from_fork+0x22/0x30
      
      Avoid this leak by only allocating when necessary.
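The shape of the leak and of the fix can be shown with a hypothetical work item drawn from a small static pool, so that running the example does not really leak: allocating before the cached-root early return strands the item, while allocating only once the item is going to be queued does not. This illustrates the pattern only; it is not the smb2misc.c code, and every name in it is made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lease-break work allocation. */
struct toy_lease_work {
	bool in_use;
};

#define TOY_POOL_SIZE 4
static struct toy_lease_work toy_pool[TOY_POOL_SIZE];

static struct toy_lease_work *toy_alloc_work(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		if (!toy_pool[i].in_use) {
			toy_pool[i].in_use = true;
			return &toy_pool[i];
		}
	}
	return NULL;
}

/* In this model the queued handler runs at once and releases the item. */
static void toy_queue_work(struct toy_lease_work *w)
{
	w->in_use = false;
}

/* Number of items allocated but never released. */
static size_t toy_outstanding(void)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_POOL_SIZE; i++)
		n += toy_pool[i].in_use;
	return n;
}

/* Pre-fix shape: allocate first, then bail out early for the cached root. */
static bool toy_break_old(bool is_cached_root)
{
	struct toy_lease_work *w = toy_alloc_work();

	if (!w)
		return false;
	if (is_cached_root)
		return true;            /* w is never released: the leak */
	toy_queue_work(w);
	return true;
}

/* Fixed shape: only allocate when the work item will actually be queued. */
static bool toy_break_fixed(bool is_cached_root)
{
	if (is_cached_root)
		return true;
	struct toy_lease_work *w = toy_alloc_work();

	if (!w)
		return false;
	toy_queue_work(w);
	return true;
}
```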
      
      Fixes: a93864d9 ("cifs: add lease tracking to the cached root fid")
      Signed-off-by: Paul Aurich <paul@darkrain42.org>
      CC: Stable <stable@vger.kernel.org> # v4.18+
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Conflicts:
        fs/cifs/smb2misc.c
        [ Not apply 9bd45408("CIFS: Properly process SMB3 lease breaks") ]
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  5. Jul 12, 2021
  6. Jul 09, 2021
  7. Jul 08, 2021
    • usb: gadget: rndis: Fix info leak of rndis · 7f1196b6
      王海 authored
      
      hulk inclusion
      category: bugfix
      bugzilla: 172330
      CVE: NA
      
      --------------------------------
      
      Specially crafted USB packets can cause a kernel info leak in rndis
      through the following steps.
      
      1. construct the packet to make rndis call gen_ndis_set_resp().
      
      In gen_ndis_set_resp(), BufOffset comes from the USB packet and
      it is not checked so that BufOffset can be any value. Therefore,
      if OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER, then *params->filter
      can get data at any address.
      
      2. construct the packet to make rndis call rndis_query_response().
      
      In rndis_query_response(), if OID is RNDIS_OID_GEN_CURRENT_PACKET_FILTER,
      then the data of *params->filter is fetched and returned, resulting in
      info leak.
      
      Therefore, BufOffset must be checked to prevent the info leak. The
      buffer size here is USB_COMP_EP0_BUFSIZ, so a request is considered
      legal only if "8 + BufOffset + BufLength" is less than USB_COMP_EP0_BUFSIZ.
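The rule above, written as a standalone predicate. The function name and the 64-bit widening are assumptions of this sketch; widening matters because both fields come from the packet, so a 32-bit sum could wrap around and pass the check. The constant 8 presumably accounts for the offset being measured from the RequestID field, 8 bytes into the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept a set request only if 8 + BufOffset + BufLength < bufsiz. The sum is
 * computed in 64 bits so attacker-chosen 32-bit values cannot wrap around. */
static bool toy_rndis_set_buf_ok(uint32_t buf_offset, uint32_t buf_length,
				 uint32_t bufsiz)
{
	return (uint64_t)8 + buf_offset + buf_length < bufsiz;
}
```

With a hypothetical 4096-byte buffer, an offset of 4083 and a length of 4 is accepted (8 + 4083 + 4 = 4095), while an offset of 4084 is rejected, as is an offset near UINT32_MAX that would wrap in 32-bit arithmetic.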
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  8. Jul 05, 2021
    • once: Fix panic when module unload · 23eb8e37
      王克锋 authored
      hulk inclusion
      category: bugfix
      bugzilla: 172153
      CVE: NA
      
      -------------------------------------------------
      
      DO_ONCE
      DEFINE_STATIC_KEY_TRUE(___once_key);
      __do_once_done
        once_disable_jump(once_key);
          struct once_work *w;
          INIT_WORK(&w->work, once_deferred);
          w->key = key;
          schedule_work(&w->work);                     module unload
                                                         //*the key is destroyed*
      process_one_work
        once_deferred
          BUG_ON(!static_key_enabled(work->key));
             static_key_count((struct static_key *)x)    //*access key, crash*
      
      When a module uses the DO_ONCE mechanism, it can crash due to the race
      above; it can be reproduced as described in [1].

      Fix it by taking a module reference when the once work is queued and
      dropping it when the once work has run.
      
      [1]
      https://lore.kernel.org/netdev/eaa6c371-465e-57eb-6be9-f4b16b9d7cbf@huawei.com/
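A userspace model of the fix, assuming a module reference is taken when the once work is queued and dropped after the work has run; toy_module, toy_once_work and the helpers are hypothetical names, and the real static key is reduced to a bool. While the work is pending the module cannot be unloaded, so the key the work touches is still valid when it runs.

```c
#include <stdbool.h>

/* Hypothetical module: unloading is refused while references are held. */
struct toy_module {
	int refs;
	bool loaded;
	bool key_enabled;   /* stands in for the module's static key */
};

struct toy_once_work {
	struct toy_module *mod;
	bool *key;
};

/* Like once_disable_jump() after the fix: pin the module while work is queued. */
static void toy_once_queue(struct toy_once_work *w, struct toy_module *mod)
{
	w->mod = mod;
	w->key = &mod->key_enabled;
	mod->refs++;            /* take a module reference */
}

/* Like once_deferred(): check and disable the key, then drop the reference.
 * Returns whether the key was still valid and enabled when the work ran. */
static bool toy_once_run(struct toy_once_work *w)
{
	bool was_enabled = w->mod->loaded && *w->key;

	*w->key = false;
	w->mod->refs--;         /* put the module reference */
	return was_enabled;
}

static bool toy_try_unload(struct toy_module *mod)
{
	if (mod->refs > 0)
		return false;       /* pending once work pins the module */
	mod->loaded = false;
	return true;
}
```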
      
      
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: Minmin chen <chenmingmin@huawei.com>
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • SUNRPC: Should wake up the privileged task firstly. · 3d5dba2f
      Zhang Xiaoxu authored
      
      mainline inclusion
      from mainline-v5.14
      commit 5483b904bf336948826594610af4c9bbb0d9e3aa
      category: bugfix
      bugzilla: 51898
      CVE: NA
      
      ---------------------------
      
      When finding a task in the wait queue to wake up, a non-privileged task
      may be picked instead of the privileged one. This may lead to a deadlock
      like the one fixed by commit dfe1fe75e00e ("NFSv4: Fix deadlock between
      nfs4_evict_inode() and nfs4_opendata_get_inode()"):

      The privileged delegreturn task is queued on the privileged list because
      all the slots are assigned. If there are not enough slots to wake up the
      non-privileged batch tasks (a session with fewer than 8 slots), the
      privileged delegreturn task may never be woken up, because the task that
      was picked cannot get a slot while the session is draining.

      So we should treat the privileged task as the emergency task and execute
      it as soon as we can.
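A sketch of the selection order, using hypothetical per-priority counters instead of the real rpc_wait_queue task lists and assuming the privileged level is the highest index: the privileged level is checked before the current batch is resumed, so a pending privileged task is never passed over. The fallback scan order is a simplification.

```c
#include <stdbool.h>

#define TOY_NR_PRIORITY 4
#define TOY_PRIV (TOY_NR_PRIORITY - 1)   /* hypothetical: privileged = highest */

/* Pending task counts per priority level. */
struct toy_wait_queue {
	int pending[TOY_NR_PRIORITY];
	int cur;                    /* level currently being serviced */
};

/* Returns the level of the task to wake, or -1 if the queue is empty. */
static int toy_pick_next(const struct toy_wait_queue *q, bool privileged_first)
{
	if (privileged_first && q->pending[TOY_PRIV] > 0)
		return TOY_PRIV;
	/* Otherwise resume the current level, then scan the others downward. */
	for (int i = 0; i < TOY_NR_PRIORITY; i++) {
		int lvl = (q->cur - i + TOY_NR_PRIORITY) % TOY_NR_PRIORITY;

		if (q->pending[lvl] > 0)
			return lvl;
	}
	return -1;
}
```

With a batch pending at level 1 and one privileged task, the old order picks level 1 while the privileged-first order picks the privileged task.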
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
      Cc: stable@vger.kernel.org
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • SUNRPC: Fix the batch tasks count wraparound. · 9b06b695
      Zhang Xiaoxu authored
      
      mainline inclusion
      from mainline-v5.14
      commit fcb170a9d825d7db4a3fb870b0300f5a40a8d096
      category: bugfix
      bugzilla: 51898
      CVE: NA
      
      ---------------------------
      
      The 'queue->nr' counter wraps around from 0 to 255 when only the current
      priority queue has tasks. This may lead to a deadlock like the one fixed
      by commit dfe1fe75e00e ("NFSv4: Fix deadlock between nfs4_evict_inode()
      and nfs4_opendata_get_inode()"):

      The privileged delegreturn task is queued on the privileged list because
      all the slots are assigned. When a non-privileged task completes and
      releases its slot, another non-privileged task may be picked, and it may
      fail to allocate a slot while the session is draining.

      If 'queue->nr' has wrapped around to 255 and there are not enough slots
      to service those tasks, the privileged delegreturn task will never be
      woken up.

      So we should avoid the wraparound of 'queue->nr'.
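The wraparound itself, assuming 'queue->nr' is an 8-bit unsigned counter (consistent with the 0 to 255 wrap described above); the helper names are hypothetical. The guarded decrement stops at zero instead of wrapping.

```c
#include <stdint.h>

/* Unguarded decrement: 0 - 1 converts back to uint8_t as 255. */
static uint8_t toy_nr_dec_old(uint8_t nr)
{
	return (uint8_t)(nr - 1);
}

/* Guarded decrement: only count down while the batch is non-empty. */
static uint8_t toy_nr_dec_fixed(uint8_t nr)
{
	return nr ? (uint8_t)(nr - 1) : 0;
}
```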
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 5fcdfacc ("NFSv4: Return delegations synchronously in evict_inode")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bpf: Fix leakage under speculation on mispredicted branches · 78d76ae7
      Daniel Borkmann authored
      
      mainline inclusion
      from mainline-v5.13-rc7
      commit 9183671af6dbf60a1219371d4ed73e23f43b49db
      category: bugfix
      bugzilla: NA
      CVE: CVE-2021-33624
      
      --------------------------------
      
      The verifier only enumerates valid control-flow paths and skips paths that
      are unreachable in the non-speculative domain. And so it can miss issues
      under speculative execution on mispredicted branches.
      
      For example, a type confusion has been demonstrated with the following
      crafted program:
      
        // r0 = pointer to a map array entry
        // r6 = pointer to readable stack slot
        // r9 = scalar controlled by attacker
        1: r0 = *(u64 *)(r0) // cache miss
        2: if r0 != 0x0 goto line 4
        3: r6 = r9
        4: if r0 != 0x1 goto line 6
        5: r9 = *(u8 *)(r6)
        6: // leak r9
      
      Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier
      concludes that the pointer dereference on line 5 is safe. But: if the
      attacker trains both the branches to fall-through, such that the following
      is speculatively executed ...
      
        r6 = r9
        r9 = *(u8 *)(r6)
        // leak r9
      
      ... then the program will dereference an attacker-controlled value and could
      leak its content under speculative execution via side-channel. This requires
      mistraining the branch predictor, which can be rather tricky, because the
      branches are mutually exclusive. However, such training can be done at
      congruent addresses in user space using different branches that are not
      mutually exclusive. That is, by training branches in user space ...
      
        A:  if r0 != 0x0 goto line C
        B:  ...
        C:  if r0 != 0x0 goto line D
        D:  ...
      
      ... such that addresses A and C collide to the same CPU branch prediction
      entries in the PHT (pattern history table) as those of the BPF program's
      lines 2 and 4, respectively. A non-privileged attacker could simply brute
      force such collisions in the PHT until observing the attack succeeding.
      
      Alternative methods to mistrain the branch predictor are also possible that
      avoid brute forcing the collisions in the PHT. A reliable attack has been
      demonstrated, for example, using the following crafted program:
      
        // r0 = pointer to a [control] map array entry
        // r7 = *(u64 *)(r0 + 0), training/attack phase
        // r8 = *(u64 *)(r0 + 8), oob address
        // [...]
        // r0 = pointer to a [data] map array entry
        1: if r7 == 0x3 goto line 3
        2: r8 = r0
        // crafted sequence of conditional jumps to separate the conditional
        // branch in line 193 from the current execution flow
        3: if r0 != 0x0 goto line 5
        4: if r0 == 0x0 goto exit
        5: if r0 != 0x0 goto line 7
        6: if r0 == 0x0 goto exit
        [...]
        187: if r0 != 0x0 goto line 189
        188: if r0 == 0x0 goto exit
        // load any slowly-loaded value (due to cache miss in phase 3) ...
        189: r3 = *(u64 *)(r0 + 0x1200)
        // ... and turn it into known zero for verifier, while preserving slowly-
        // loaded dependency when executing:
        190: r3 &= 1
        191: r3 &= 2
        // speculatively bypassed phase dependency
        192: r7 += r3
        193: if r7 == 0x3 goto exit
        194: r4 = *(u8 *)(r8 + 0)
        // leak r4
      
      As can be seen, in training phase (phase != 0x3), the condition in line 1
      turns into false and therefore r8 with the oob address is overridden with
      the valid map value address, which in line 194 we can read out without
      issues. However, in attack phase, line 2 is skipped, and due to the cache
      miss in line 189 where the map value is (zeroed and later) added to the
      phase register, the condition in line 193 takes the fall-through path due
      to prior branch predictor training, where under speculation, it'll load the
      byte at oob address r8 (unknown scalar type at that point) which could then
      be leaked via side-channel.
      
      One way to mitigate these is to 'branch off' an unreachable path, meaning,
      the current verification path keeps following the is_branch_taken() path
      and we push the other branch to the verification stack. Given this is
      unreachable from the non-speculative domain, this branch's vstate is
      explicitly marked as speculative. This is needed for two reasons: i) if
      this path is solely seen from speculative execution, then we later on still
      want the dead code elimination to kick in in order to sanitize these
      instructions with jmp-1s, and ii) to ensure that paths walked in the
      non-speculative domain are not pruned from earlier walks of paths walked in
      the speculative domain. Additionally, for robustness, we mark the registers
      which have been part of the conditional as unknown in the speculative path
      given there should be no assumptions made on their content.
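A toy version of the "branch off" idea, assuming a three-opcode instruction set with forward jumps only and no state pruning; every name is hypothetical and none of this is verifier code. When the branch outcome is known (what is_branch_taken() would conclude), the walk follows that side and pushes the other side as a speculative state, so it is still verified. Instructions reached only speculatively are recorded as verified but not seen, which is what leaves them eligible for the jmp-1 sanitization. Marking the compared registers unknown on the speculative path is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_NEXT, TOY_COND, TOY_EXIT };
enum toy_known { TOY_UNKNOWN, TOY_TAKEN, TOY_NOT_TAKEN };

struct toy_insn {
	enum toy_op op;
	int target;              /* jump target for TOY_COND */
	enum toy_known known;    /* statically known branch outcome, if any */
};

struct toy_state {
	int pc;
	bool speculative;
};

#define TOY_MAX_STACK 64

/* Explores every path. seen[] is set only on non-speculative paths;
 * verified[] on all paths. Returns false if the toy stack overflows. */
static bool toy_verify(const struct toy_insn *prog, size_t n,
		       bool *seen, bool *verified)
{
	struct toy_state stack[TOY_MAX_STACK];
	size_t sp = 0;

	stack[sp++] = (struct toy_state){ 0, false };
	while (sp > 0) {
		struct toy_state st = stack[--sp];

		while (st.pc >= 0 && (size_t)st.pc < n) {
			const struct toy_insn *in = &prog[st.pc];

			verified[st.pc] = true;
			if (!st.speculative)
				seen[st.pc] = true;
			if (in->op == TOY_EXIT)
				break;
			if (in->op == TOY_NEXT) {
				st.pc++;
				continue;
			}
			if (sp == TOY_MAX_STACK)
				return false;
			if (in->known == TOY_UNKNOWN) {
				/* Both sides reachable: same speculative status. */
				stack[sp++] = (struct toy_state){ in->target, st.speculative };
				st.pc++;
			} else if (in->known == TOY_TAKEN) {
				/* Follow the jump; verify fall-through speculatively. */
				stack[sp++] = (struct toy_state){ st.pc + 1, true };
				st.pc = in->target;
			} else {
				/* Follow fall-through; verify the jump speculatively. */
				stack[sp++] = (struct toy_state){ in->target, true };
				st.pc++;
			}
		}
	}
	return true;
}
```

For a program whose first instruction is a branch known to be taken over instruction 1, all three instructions end up verified, but instruction 1 is never marked seen.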
      
      The fix in here mitigates type confusion attacks described earlier due to
      i) all code paths in the BPF program being explored and ii) existing
      verifier logic already ensuring that given memory access instruction
      references one specific data structure.
      
      An alternative to this fix that has also been looked at in this scope was to
      mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as
      well as direction encoding (always-goto, always-fallthrough, unknown), such
      that mixing of different always-* directions themselves as well as mixing of
      always-* with unknown directions would cause a program rejection by the
      verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else
      { x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this
      would result in only single direction always-* taken paths, and unknown taken
      paths being allowed, such that the former could be patched from a conditional
      jump to an unconditional jump (ja). Compared to this approach here, it would
      have two downsides: i) valid programs that otherwise are not performing any
      pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are
      required to turn off path pruning for unprivileged, where both can be avoided
      in this work through pushing the invalid branch to the verification stack.
      
      The issue was originally discovered by Adam and Ofek, and later independently
      discovered and reported as a result of Benedict and Piotr's research work.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: Adam Morrison <mad@cs.tau.ac.il>
      Reported-by: Ofek Kirzner <ofekkir@gmail.com>
      Reported-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reported-by: Piotr Krysiuk <piotras@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      
      Conflicts:
        kernel/bpf/verifier.c
      [yyl: bypass_spec_v1 is not introduced in kernel-4.19,
        use allow_ptr_leaks instead]
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: He Fengqing <hefengqing@huawei.com>
      Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bpf: Do not mark insn as seen under speculative path verification · b2fdc6d8
      Daniel Borkmann authored
      
      mainline inclusion
      from mainline-v5.13-rc7
      commit fe9a5ca7e370e613a9a75a13008a3845ea759d6e
      category: bugfix
      bugzilla: NA
      CVE: CVE-2021-33624
      
      --------------------------------
      
      ... in such circumstances, we do not want to mark the instruction as seen given
      the goal is still to jmp-1 rewrite/sanitize dead code, if it is not reachable
      from the non-speculative path verification. We do however want to verify it for
      safety regardless.
      
      With the patch as-is, all the insns that have been marked as seen before the
      patch will also be marked as seen after the patch (just with a potentially
      different non-zero count). An upcoming patch will also verify paths that are
      unreachable in the non-speculative domain, hence this extension is needed.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      
      Conflicts:
        kernel/bpf/verifier.c
      
      pass_cnt does not exist in kernel-4.19.
      
      Signed-off-by: He Fengqing <hefengqing@huawei.com>
      Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bpf: Inherit expanded/patched seen count from old aux data · 9d1b583d
      Daniel Borkmann authored
      
      mainline inclusion
      from mainline-v5.13-rc7
      commit d203b0fd863a2261e5d00b97f3d060c4c2a6db71
      category: bugfix
      bugzilla: NA
      CVE: CVE-2021-33624
      
      --------------------------------
      
      Instead of relying on current env->pass_cnt, use the seen count from the
      old aux data in adjust_insn_aux_data(), and expand it to the new range of
      patched instructions. This change is valid given we always expand 1:n
      with n>=1, so what applies to the old/original instruction needs to apply
      for the replacement as well.
      
      Not relying on env->pass_cnt is a prerequisite for a later change where we
      want to avoid marking an instruction seen when verified under speculative
      execution path.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
      Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      
      Conflicts:
        kernel/bpf/verifier.c
      
      The seen field of bpf_insn_aux_data is a bool in kernel-4.19.
      
      Signed-off-by: He Fengqing <hefengqing@huawei.com>
      Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bpf: Update selftests to reflect new error states · 040bd002
      Daniel Borkmann authored
      
      stable inclusion
      from linux-4.19.193
      commit 138b0ec1064c8f154a32297458e562591a94773f
      
      --------------------------------
      
      commit d7a5091351756d0ae8e63134313c455624e36a13 upstream
      
      Update various selftest error messages:
      
       * The 'Rx tried to sub from different maps, paths, or prohibited types'
         is reworked into more specific/differentiated error messages for better
         guidance.
      
       * The change into 'value -4294967168 makes map_value pointer be out of
         bounds' is due to moving the mixed bounds check into the speculation
         handling and thus occurring slightly later than the above-mentioned sanity
         check.
      
       * The change into 'math between map_value pointer and register with
         unbounded min value' is similarly due to register sanity check coming
         before the mixed bounds check.
      
       * The case of 'map access: known scalar += value_ptr from different maps'
         now loads fine given masks are the same from the different paths (despite
         max map value size being different).
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      [OP: 4.19 backport, account for split test_verifier and
      different / missing tests]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • bpf, test_verifier: switch bpf_get_stack's 0 s> r8 test · 0dae2841
      Daniel Borkmann authored
      
      stable inclusion
      from linux-4.19.193
      commit d1e281d6cb8841122c4677b47fcebdc6f410bd74
      
      --------------------------------
      
      [ no upstream commit ]
      
      Switch the comparison so that is_branch_taken() will recognize that the
      branch below is never taken:
      
        [...]
        17: [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
        17: (67) r8 <<= 32
        18: [...] R8_w=inv(id=0,smax_value=-4294967296,umin_value=9223372036854775808,umax_value=18446744069414584320,var_off=(0x8000000000000000; 0x7fffffff00000000)) [...]
        18: (c7) r8 s>>= 32
        19: [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
        19: (6d) if r1 s> r8 goto pc+16
        [...] R1_w=inv0 [...] R8_w=inv(id=0,smin_value=-2147483648,smax_value=-1,umin_value=18446744071562067968,var_off=(0xffffffff80000000; 0x7fffffff)) [...]
        [...]
      
      Currently, is_branch_taken() is only checked if either the source is an
      immediate (K) or the source is a scalar register with a constant value. For
      upstream, it would be good to extend this properly to also check the case
      where dst is const and src is not.
      
      For the sake of test_verifier, such an extension is probably not needed here:
      
        # ./test_verifier 101
        #101/p bpf_get_stack return R0 within range OK
        Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
      
      I haven't seen this issue in test_progs*, though; they pass fine:
      
        # ./test_progs-no_alu32 -t get_stack
        Switching to flavor 'no_alu32' subdirectory...
        #20 get_stack_raw_tp:OK
        Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
      
        # ./test_progs -t get_stack
        #20 get_stack_raw_tp:OK
        Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [OP: backport to 4.19]
      Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>