  1. Dec 27, 2019
    • perf: Paper over the hw.target problems · 03804742
      Alexander Shishkin authored and 谢秀奇 committed
      euler inclusion
      category: bugfix
      bugzilla: 9513/11006/11050
      CVE: NA
      --------------------------------------------------
      
      [ Cheng Jian
      HULK-Syzkaller reported a problem that syzbot had already reported to
      mainline (lkml); this patch comes from the reply on lkml.
      v1	https://lkml.org/lkml/2019/2/28/529
      v2	https://lkml.org/lkml/2019/3/8/206
      We merged v1 first, but it caused bugzilla #11050, because
      perf_remove_from_context() is also used in perf_event_open() when we
      move events from a SW context to a HW context, so we can't destroy the
      event there.
      v2 does not exhibit that warning.
      It is equivalent to another patch at https://lkml.org/lkml/2019/3/8/536
      but clearer. ]
      
      First, we have a race between perf_event_release_kernel() and
      perf_free_event(), which happens when parent's event is released while the
      child's fork fails (because of a fatal signal, for example), that looks
      like this:
      
      cpu X                            cpu Y
      -----                            -----
                                       copy_process() error path
      perf_release(parent)             +->perf_event_free_task()
      +-> lock(child_ctx->mutex)       |  |
      +-> remove_from_context(child)   |  |
      +-> unlock(child_ctx->mutex)     |  |
      |                                |  +-> lock(child_ctx->mutex)
      |                                |  +-> unlock(child_ctx->mutex)
      |                                +-> free_task(child_task)
      +-> put_task_struct(child_task)
      
      Technically, we're still holding a reference to the task via
      parent->hw.target, but that's not stopping free_task(), so we end up
      poking at free'd memory, as is pointed out by KASAN in the syzkaller
      report (see Link below). The straightforward fix is to drop the
      hw.target reference while the task is still around.
      
      Therein lies the second problem: the users of hw.target (uprobe) assume
      that it's around at ->destroy() callback time, where they use it for
      context. So, in order to not break the uprobe teardown and avoid leaking
      stuff, we need to call ->destroy() at the same time.
      
      This patch fixes the race and the subsequent fallout by doing both these
      things at remove_from_context time.
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error · 40bdfdc7
      Al Viro authored and 谢秀奇 committed
      euler inclusion
      category: bugfix
      bugzilla: 10679
      CVE: NA
      
      ---------------------------
      
      We want iocb_put() happening on errors, to balance the extra reference
      we'd taken.  As it is, we end up with a leak.  The rules should be
      	* error: iocb_put() to deal with the extra ref, return error,
      let the caller do another iocb_put().
      	* async: iocb_put() to deal with the extra ref, return 0.
      	* no error, event present immediately: aio_poll_complete() to
      report it, iocb_put() to deal with the extra ref, return 0.
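
The three rules above can be pictured with a toy user-space model. This is only a sketch, not the kernel code: `toy_iocb`, `toy_iocb_put()`, `toy_aio_poll()` and the outcome enum are hypothetical names, and the plain integer counter stands in for the real reference count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the request; not the kernel's struct. */
struct toy_iocb {
	int refs;        /* simplified reference count */
	bool completed;  /* set when the event is reported */
};

enum toy_outcome { TOY_ERROR, TOY_ASYNC, TOY_READY };

static void toy_iocb_put(struct toy_iocb *req)
{
	req->refs--;
}

/*
 * Models the rule set: whatever the outcome, the extra reference held
 * across the poll call is dropped exactly once before returning.
 */
static int toy_aio_poll(struct toy_iocb *req, enum toy_outcome outcome)
{
	int ret = 0;

	req->refs++;                    /* the extra reference */

	switch (outcome) {
	case TOY_ERROR:
		ret = -1;               /* caller does its own put later */
		break;
	case TOY_ASYNC:
		break;                  /* completion happens later */
	case TOY_READY:
		req->completed = true;  /* report the event right away */
		break;
	}

	toy_iocb_put(req);              /* balance the extra reference */
	return ret;
}
```

In this model the reference count after the call always equals the count before it, which is the property the error path was missing.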
      
      Link: https://patchwork.kernel.org/patch/10842103/
      
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • aio_poll_wake(): don't set ->woken if we ignore the wakeup · 2d9e350d
      Al Viro authored and 谢秀奇 committed
      euler inclusion
      category: bugfix
      bugzilla: 10679
      CVE: NA
      
      ---------------------------
      
      In case of early wakeups, aio_poll() assumes that aio_poll_complete()
      has either already happened or is imminent.  In that case we do not
      want to put iocb on the list of cancellables.  However, ignored
      wakeups need to be treated as if wakeup has not happened at all.
      Trivially fixed by having aio_poll_wake() set ->woken only after
      it's committed to taking iocb out of the waitqueue.
      
      Link: https://patchwork.kernel.org/patch/10842107/
      
      
      Suggested-by: zhengbin <zhengbin13@huawei.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: brelse all indirect buffers in ext4_ind_remove_space() · 9b15cfc8
      zhangyi (F) authored and 谢秀奇 committed
      
      euler inclusion
      category: bugfix
      bugzilla: 11043
      CVE: NA
      ---------------------------
      
      All indirect buffers obtained by ext4_find_shared() should be released
      no matter whether the branch should be freed or not. But currently we
      forget to release the lower-depth indirect buffers when removing space
      from the same higher-depth indirect block. This leads to a buffer leak
      and, furthermore, may corrupt quota information when using old quota.
      Consider the following case:
      
       - Create and mount an empty ext4 filesystem without the extent and
         quota features.
       - Run quotacheck and enable the user & group quota.
       - Create some files and write some data to them, then punch holes in
         some of them; this may trigger the buffer leak described above.
       - Disable quota and run quotacheck again. It creates two new aquota
         files and writes the checked quota information to them, and may
         reuse the freed indirect block (whose buffer and page cache were
         not freed) as a data block.
       - Enable quota again. This invokes
         vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
         buffers and pagecache. Unfortunately, because the buffer of the
         quota data block is still referenced, the quota code cannot read
         the up-to-date quota info from the device, which leads to quota
         information corruption.
      
      This problem can be reproduced by xfstests generic/231 on ext3 filesystem
      or ext4 filesystem without extent and quota feature.
      
      This patch fixes the problem by calling brelse() on all indirect
      buffers, and also cleans up the brelse code in ext4_ind_remove_space().
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • kasan: remove use after scope bugs detection. · 363eeef5
      Andrey Ryabinin authored and 谢秀奇 committed
      mainline inclusion
      from mainline-v5.0
      commit 7771bdbb
      category: bugfix
      bugzilla: 10979
      CVE: NA
      
      ------------------------------------------------
      
      The use-after-scope bug detector seems to be almost entirely useless for
      the linux kernel.  It has existed for over two years, but I've seen only
      one valid bug so far [1], and that bug was fixed before it was reported.  There
      were some other use-after-scope reports, but they were false-positives due
      to different reasons like incompatibility with structleak plugin.
      
      This feature significantly increases stack usage, especially with GCC < 9
      version, and causes a 32K stack overflow.  It probably adds performance
      penalty too.
      
      Given all that, let's remove use-after-scope detector entirely.
      
      While preparing this patch I've noticed that we mistakenly enable
      use-after-scope detection for clang compiler regardless of
      CONFIG_KASAN_EXTRA setting.  This is also fixed now.
      
      [1] http://lkml.kernel.org/r/<2...
    • userfaultfd: use RCU to free the task struct when fork fails if MEMCG · 8eb04a7a
      Andrea Arcangeli authored and 谢秀奇 committed
      
      euler inclusion
      category: bugfix
      bugzilla: 10989
      CVE: NA
      
      ------------------------------------------------
      
      MEMCG depends on the task structure not to be freed under
      rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
      mm->owner.
      
      A better fix would be to avoid registering forked vmas in userfaultfd
      contexts reported to the monitor, in case fork ends up failing.
      
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • keys: Fix dependency loop between construction record and auth key · 4877d0fd
      David Howells authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 822ad64d
      category: bugfix
      bugzilla: 10783
      CVE: NA
      
      ---------------------------
      
      In the request_key() upcall mechanism there's a dependency loop by which if
      a key type driver overrides the ->request_key hook and the userspace side
      manages to lose the authorisation key, the auth key and the internal
      construction record (struct key_construction) can keep each other pinned.
      
      Fix this by the following changes:
      
       (1) Killing off the construction record and using the auth key instead.
      
       (2) Including the operation name in the auth key payload and making the
           payload available outside of security/keys/.
      
       (3) The ->request_key hook is given the authkey instead of the cons
           record and operation name.
      
      Changes (2) and (3) allow the auth key to naturally be cleaned up if the
      keyring it is in is destroyed or cleared or the auth key is unlinked.
      
      Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.morris@microsoft.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • keys: Timestamp new keys · 0013e500
      David Howells authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 7c1857bd
      category: bugfix
      bugzilla: 10783
      CVE: NA
      
      ---------------------------
      
      Set the timestamp on new keys rather than leaving it unset.
      
      Fixes: 31d5a79d ("KEYS: Do LRU discard in full keyrings")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.morris@microsoft.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt · 8282e42e
      Marcel Holtmann authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit af3d5d1c
      category: bugfix
      bugzilla: NA
      CVE: CVE-2019-3460
      
      -------------------------------------------------
      
      When doing option parsing for standard type values of 1, 2 or 4 octets,
      the value is converted directly into a variable instead of a pointer. To
      avoid being tricked into treating it as a pointer, check that the sizes
      actually match for these option types. In L2CAP every option is fixed
      size, so it is prudent anyway to ensure that the remote side sends us
      the right option size along with the option parameters.
      
      If the option size does not match the option type, then that option is
      silently ignored. It is a protocol violation, and instead of giving the
      remote attacker any further hints, just pretend that the option is not
      present and proceed with the default values. Implementations following
      the specification and its qualification procedures will always use the
      correct size and are thus not impacted here.
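
As a rough illustration of the size check described above, here is a minimal user-space sketch. It is not the l2cap_core.c code: `TOY_CONF_MTU`, `TOY_DEFAULT_MTU` and `toy_apply_mtu_opt()` are hypothetical names, and the option is assumed to have been decoded into a type, a claimed length and a value already.

```c
#include <stdint.h>

#define TOY_CONF_MTU    0x01  /* hypothetical option type with a 2-octet value */
#define TOY_DEFAULT_MTU 672   /* assumed default used when the option is ignored */

/*
 * Apply one decoded option. 'olen' is the size claimed by the remote
 * side; 'val' is the decoded value. A size that does not match the
 * option type means the option is silently ignored and the current
 * (default) value is kept.
 */
static uint16_t toy_apply_mtu_opt(uint8_t type, uint8_t olen,
				  uint32_t val, uint16_t cur_mtu)
{
	if (type != TOY_CONF_MTU)
		return cur_mtu;       /* not our option: nothing to do */

	if (olen != 2)
		return cur_mtu;       /* protocol violation: pretend absent */

	return (uint16_t)val;
}
```

A well-formed option updates the value; one with the wrong size leaves the default in place and gives the sender no error to probe.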
      
      To keep the code readable and consistent across all options, a few
      cosmetic changes were also required.
      
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer · a3dbdb59
      Marcel Holtmann authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 7c9cbd0b
      category: bugfix
      bugzilla: NA
      CVE: CVE-2019-3459
      
      -------------------------------------------------
      
      The function l2cap_get_conf_opt will return L2CAP_CONF_OPT_SIZE + opt->len
      as length value. The opt->len, however, is under the control of the
      remote user and can be used by an attacker to gain access beyond the
      bounds of the actual packet.
      
      To prevent any potential leak of heap memory, it is enough to check that
      the resulting len calculation after calling l2cap_get_conf_opt is not
      below zero. A well formed packet will always return >= 0 here and will
      end with the length value being zero after the last option has been
      parsed. In case of malformed packets messing with the opt->len field the
      length value will become negative. If that is the case, then just abort
      and ignore the option.
      
      In case an attacker uses a too short opt->len value, then garbage will
      be parsed, but that is protected by the unknown option handling and also
      the option parameter size checks.
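
The remaining-length check can be sketched as a small user-space loop. This is an illustration under assumptions, not the kernel parser: the 2-octet header layout (type, then length) is modeled by `TOY_OPT_HDR_SIZE`, and `toy_count_options()` is a hypothetical helper that only counts the options it accepts.

```c
#include <stdint.h>

#define TOY_OPT_HDR_SIZE 2  /* assumed header: 1 octet type + 1 octet length */

/*
 * Walk a buffer of type/length/value options and count those that fit.
 * After each option the remaining length is reduced by header + claimed
 * length; if it drops below zero, the claimed length reached past the
 * end of the packet, so parsing stops and that option is ignored.
 */
static int toy_count_options(const uint8_t *buf, int len)
{
	int count = 0;

	while (len >= TOY_OPT_HDR_SIZE) {
		int consumed = TOY_OPT_HDR_SIZE + buf[1];

		len -= consumed;
		if (len < 0)
			break;      /* malformed: option overruns the packet */

		count++;
		buf += consumed;
	}

	return count;
}
```

For a well-formed buffer the remaining length ends at exactly zero after the last option; an inflated length field makes it negative and stops the walk before any out-of-bounds read.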
      
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • Revert "perf: Paper over the hw.target problems" · db93c085
      Cheng Jian authored and 谢秀奇 committed
      
      euler inclusion
      category: bugfix
      bugzilla: 9513/11006
      CVE: NA
      --------------------------------------------------
      
      This reverts commit b772baf9a14ab4975e8884a399a4e0bab2fb6bf9.
      
      We merged the patch b772baf9a14a ("perf: Paper over the
      hw.target problems") to resolve a use-after-free issue
      (bugzilla #9513/#11006), but it caused new problems
      (bugzilla #11050/#11049) in this version.
      
      So just revert it.
      
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • net: hsr: fix memory leak in hsr_dev_finalize() · c91d7b3a
      Mao Wenan authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-master~13
      commit 6caabe7f
      category: bugfix
      bugzilla: 11026
      CVE: NA
      
      -------------------------------------------------
      
      If hsr_add_port(hsr, hsr_dev, HSR_PT_MASTER) fails to add the port,
      hsr_dev_finalize() directly returns res and forgets to free the node
      that was allocated in hsr_create_self_node(), and forgets to delete
      the node->mac_list linked in hsr->self_node_db.
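
The general shape of such a fix is an error path that undoes the earlier allocation. The following is a toy user-space sketch of that pattern only, not the hsr driver code: `toy_hsr`, `toy_add_port()` and `toy_dev_finalize()` are hypothetical, and malloc()/free() stand in for the kernel's node allocation and list removal.

```c
#include <stdlib.h>

/* Hypothetical device state; self_node models the allocated node. */
struct toy_hsr {
	void *self_node;
};

/* Stand-in for the port setup step; fails on request. */
static int toy_add_port(int should_fail)
{
	return should_fail ? -1 : 0;
}

static int toy_dev_finalize(struct toy_hsr *hsr, int port_fails)
{
	int res;

	hsr->self_node = malloc(64);    /* models creating the self node */
	if (!hsr->self_node)
		return -1;

	res = toy_add_port(port_fails);
	if (res)
		goto err_free_node;     /* previously: plain 'return res' */

	return 0;

err_free_node:
	free(hsr->self_node);           /* undo the earlier allocation */
	hsr->self_node = NULL;
	return res;
}
```

With the early `return res` replaced by a jump to the cleanup label, a failed port setup no longer leaves the node behind.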
      
      BUG: memory leak
      unreferenced object 0xffff8881cfa0c780 (size 64):
        comm "syz-executor.0", pid 2077, jiffies 4294717969 (age 2415.377s)
        hex dump (first 32 bytes):
          e0 c7 a0 cf 81 88 ff ff 00 02 00 00 00 00 ad de  ................
          00 e6 49 cd 81 88 ff ff c0 9b 87 d0 81 88 ff ff  ..I.............
        backtrace:
          [<00000000e2ff5070>] hsr_dev_finalize+0x736/0x960 [hsr]
          [<000000003ed2e597>] hsr_newlink+0x2b2/0x3e0 [hsr]
          [<000000003fa8c6b6>] __rtnl_newlink+0xf1f/0x1600 net/core/rtnetlink.c:3182
          [<000000001247a7ad>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3240
          [<00000000e7d1b61d>] rtnetlink_rcv_msg+0x54e/0xb90 net/core/rtnetlink.c:5130
          [<000000005556bd3a>] netlink_rcv_skb+0x129/0x340 net/netlink/af_netlink.c:2477
          [<00000000741d5ee6>] netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
          [<00000000741d5ee6>] netlink_unicast+0x49a/0x650 net/netlink/af_netlink.c:1336
          [<000000009d56f9b7>] netlink_sendmsg+0x88b/0xdf0 net/netlink/af_netlink.c:1917
          [<0000000046b35c59>] sock_sendmsg_nosec net/socket.c:621 [inline]
          [<0000000046b35c59>] sock_sendmsg+0xc3/0x100 net/socket.c:631
          [<00000000d208adc9>] __sys_sendto+0x33e/0x560 net/socket.c:1786
          [<00000000b582837a>] __do_sys_sendto net/socket.c:1798 [inline]
          [<00000000b582837a>] __se_sys_sendto net/socket.c:1794 [inline]
          [<00000000b582837a>] __x64_sys_sendto+0xdd/0x1b0 net/socket.c:1794
          [<00000000c866801d>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
          [<00000000fea382d9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<00000000e01dacb3>] 0xffffffffffffffff
      
      Fixes: c5a75911 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Reviewed-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns() · 2a3c5819
      Xiongfeng Wang authored and 谢秀奇 committed
      
      euler inclusion
      category: feature
      Bugzilla: 10876
      CVE: N/A
      
      ----------------------------------------
      
      When I ran the Syzkaller test suite, I got the following call trace.
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      
      ================================================================================
      UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27
      signed integer overflow:
      8243129037239968815 * 1000000000 cannot be represented in type 'long long int'
      CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xca/0x13e lib/dump_stack.c:113
       ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
       handle_overflow+0x193/0x1e2 lib/ubsan.c:190
       timespec64_to_ns include/linux/time64.h:120 [inline]
       posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687
       do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892
       __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline]
       __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline]
       __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904
       do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462eb9
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f14e4127c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000df
      RAX: ffffffffffffffda RBX: 000000000073bfa0 RCX: 0000000000462eb9
      RDX: 0000000020000080 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f14e41286bc
      R13: 00000000004c54cc R14: 0000000000704278 R15: 00000000ffffffff
      ================================================================================
      
      It is because 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX' and
      'it_interval.tv_sec * NSEC_PER_SEC' overflows in 'timespec64_to_ns()'.
      
      This patch checks whether 'it_interval.tv_sec' is larger than
      'KTIME_SEC_MAX' and saturates if that is the case.
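
A saturating conversion of this kind might look like the sketch below. It is a user-space illustration under assumptions, not the patch itself: the `TOY_` constants mirror the kernel names only by analogy, `toy_timespec64` is a hypothetical struct, the comparison uses `>=` as a conservative bound, and the input is assumed to be already validated as non-negative.

```c
#include <stdint.h>

#define TOY_NSEC_PER_SEC  1000000000LL
#define TOY_KTIME_MAX     INT64_MAX
#define TOY_KTIME_SEC_MAX (TOY_KTIME_MAX / TOY_NSEC_PER_SEC)

/* Hypothetical stand-in for the kernel's timespec64. */
struct toy_timespec64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/*
 * Convert to nanoseconds, saturating at TOY_KTIME_MAX instead of
 * letting tv_sec * NSEC_PER_SEC overflow a signed 64-bit value.
 * Assumes tv_sec >= 0 and 0 <= tv_nsec < NSEC_PER_SEC.
 */
static int64_t toy_timespec64_to_ns_sat(const struct toy_timespec64 *ts)
{
	if (ts->tv_sec >= TOY_KTIME_SEC_MAX)
		return TOY_KTIME_MAX;

	return ts->tv_sec * TOY_NSEC_PER_SEC + ts->tv_nsec;
}
```

With the tv_sec value from the UBSAN report above, the multiplication is never evaluated and the result is simply the maximum.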
      
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ntp: Avoid undefined behaviour in second_overflow() · d8df4fe5
      Xiongfeng Wang authored and 谢秀奇 committed
      
      euler inclusion
      category: feature
      Bugzilla: 11009
      CVE: N/A
      
      ----------------------------------------
      
      When I ran the Syzkaller test suite, I got the following call trace.
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      
      ================================================================================
      UBSAN: Undefined behaviour in kernel/time/ntp.c:457:16
      signed integer overflow:
      9223372036854775807 + 500 cannot be represented in type 'long int'
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.25-dirty #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xca/0x13e lib/dump_stack.c:113
       ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
       handle_overflow+0x193/0x1e2 lib/ubsan.c:190
       second_overflow+0x403/0x540 kernel/time/ntp.c:457
       accumulate_nsecs_to_secs kernel/time/timekeeping.c:2002 [inline]
       logarithmic_accumulation kernel/time/timekeeping.c:2046 [inline]
       timekeeping_advance+0x2bb/0xec0 kernel/time/timekeeping.c:2114
       tick_do_update_jiffies64.part.2+0x1a0/0x350 kernel/time/tick-sched.c:97
       tick_do_update_jiffies64 kernel/time/tick-sched.c:1229 [inline]
       tick_nohz_update_jiffies kernel/time/tick-sched.c:499 [inline]
       tick_nohz_irq_enter kernel/time/tick-sched.c:1232 [inline]
       tick_irq_enter+0x1fd/0x240 kernel/time/tick-sched.c:1249
       irq_enter+0xc4/0x100 kernel/softirq.c:353
       entering_irq arch/x86/include/asm/apic.h:517 [inline]
       entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
       smp_apic_timer_interrupt+0x20/0x480 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
       </IRQ>
      RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
      Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 c0 21 b1 83 e8 a1 0a 02 ff e9 ab fd ff ff 4c 89 e7 e8 77 b6 a5 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
      RSP: 0018:ffff888106307d20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881062e4f1c
      RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848c78a0
      R13: 0000000000000003 R14: 1ffff11020c60fae R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x2ca/0x420 kernel/sched/idle.c:262
       cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
       start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
      ================================================================================
      
      This happens because time_maxerror is set to 0x7FFFFFFFFFFFFFFF by the
      user. It overflows when 'MAXFREQ / NSEC_PER_USEC' is added to it in
      'second_overflow()'.
      
      This patch adds a limit check and saturates the value when the user
      sets 'time_maxerror'.
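
The idea of clamping at set time so that the later addition cannot overflow can be sketched as follows. This is a toy model, not the ntp.c change: `TOY_MAXERROR_LIMIT` is an assumed cap chosen for illustration, `TOY_MAXERROR_STEP` matches the "+ 500" seen in the report above, and both function names are hypothetical.

```c
#include <limits.h>

#define TOY_MAXERROR_LIMIT 16000000L  /* assumed cap, for illustration only */
#define TOY_MAXERROR_STEP  500L       /* per-second increment from the report */

/* Saturate the user-supplied value when it is set. */
static long toy_set_maxerror(long user_value)
{
	if (user_value > TOY_MAXERROR_LIMIT)
		return TOY_MAXERROR_LIMIT;
	return user_value;
}

/*
 * Per-second update. Because the stored value never exceeds the limit,
 * adding the step cannot overflow a signed long.
 */
static long toy_second_tick(long maxerror)
{
	maxerror += TOY_MAXERROR_STEP;
	if (maxerror > TOY_MAXERROR_LIMIT)
		maxerror = TOY_MAXERROR_LIMIT;
	return maxerror;
}
```

Passing LONG_MAX through `toy_set_maxerror()` first means the tick only ever adds to a bounded value.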
      
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • scsi: hisi_sas: add softreset behind abort device at I_T_nexus_reset() to ensure decoupling of SATA device · 55f77c08
      Luo Jiaxing authored and 谢秀奇 committed
      
      driver inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      -------------------------------------------------
      
      We found that a SATA disk could be read but not written after the
      system came up. No abnormal IO came back during init, but when we
      tried to write to the SATA disk, the IO did not return and timed out.
      
      We noticed that one if-check was removed from sas_I_T_nexus(), which
      allows internal_task_abort() to run outside of error handling, and
      softreset_ata() is evidently not run in that case, so the SATA disk
      is not decoupled.
      
      Fixes: 0de2941 ("scsi: hisi_sas: remove the check of sas_dev status in function hisi_sas_I_T_nexus_reset()")
      
      Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
      Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: add mask of ext4 flags to swap · 3d274170
      yangerkun authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-next
      commit abdc644e
      category: bugfix
      bugzilla: 5355
      CVE: NA
      --------------------------------------------------
      
      The reason is that while swapping two inodes, we swap the flags too.
      Some flags such as EXT4_JOURNAL_DATA_FL can really confuse things
      since we're not resetting the address operations structure.  The
      simplest way to keep things sane is to restrict the flags that can be
      swapped.
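
Restricting a swap to a set of allowed bits can be done with a mask. The snippet below is a generic sketch of that technique, not the ext4 code: `toy_swap_masked_flags()` is a hypothetical helper and the mask value is up to the caller.

```c
#include <stdint.h>

/*
 * Swap only the bits selected by 'mask' between *a and *b.
 * Bits outside the mask keep their original value in each word.
 */
static void toy_swap_masked_flags(uint32_t *a, uint32_t *b, uint32_t mask)
{
	uint32_t diff = (*a ^ *b) & mask;  /* masked bits that differ */

	*a ^= diff;
	*b ^= diff;
}
```

For example, with a = 0xB, b = 0x4 and mask = 0x3, only the low two bits change hands, leaving a = 0x8 and b = 0x7; a flag outside the mask stays with its original inode.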
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: update quota information while swapping boot loader inode · 8641b4d6
      yangerkun authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-next
      commit aa507b5f
      category: bugfix
      bugzilla: 5355
      CVE: NA
      --------------------------------------------------
      
      When swapping two inodes, i_data is swapped without updating the
      quota information. Also, swap_inode_boot_loader can sometimes do a
      "revert", so update the quota once all operations have finished.
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: cleanup pagecache before swap i_data · 14048703
      yangerkun authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-next
      commit a46c68a3
      category: bugfix
      bugzilla: 5355
      CVE: NA
      --------------------------------------------------
      
      While swapping, we must make sure no new dirty pages appear, since
      i_data is swapped between the two inodes:
      1. Take i_mmap_sem for write to avoid new pagecache from mmap
      read/write;
      2. Change filemap_flush to filemap_write_and_wait and move the calls
      into the region protected by the inode lock to avoid new pagecache
      from buffered read/write.
      
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ext4: fix check of inode in swap_inode_boot_loader · b8d0a408
      yangerkun authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-next
      commit 67a11611
      category: bugfix
      bugzilla: 5355
      CVE: NA
      --------------------------------------------------
      
      Before actually swapping the inode with the boot loader inode, some
      conditions need to be checked to avoid invalid or unpermitted
      operations, such as whether the inode has inline data. These checks
      should be done under the inode lock so the conditions cannot change
      while swapping. Some of the other conditions cannot change during the
      swap, but there is no harm in checking them under the inode lock too.
      
      Fixes: ee3c859409("ext4: disallow files with EXT4_JOURNAL_DATA_FL ...")
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • assoc_array: Fix shortcut creation · 46bf099f
      David Howells authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit bb2ba2d7
      category: bugfix
      bugzilla: 10759
      CVE: NA
      
      -------------------------------------------------
      
      Fix the creation of shortcuts for which the length of the index key value
      is an exact multiple of the machine word size.  The problem is that the
      code that blanks off the unused bits of the shortcut value malfunctions if
      the number of bits in the last word equals machine word size.  This is due
      to the "<<" operator being given a shift of zero in this case, and so the
      mask that should be all zeros is all ones instead.  This causes the
      subsequent masking operation to clear everything rather than clearing
      nothing.
      
      Ordinarily, the presence of the hash at the beginning of the tree index key
      makes the issue very hard to test for, but in this case, it was encountered
      due to a development mistake that caused the hash output to be either 0
      (keyring) or 1 (non-keyring) only.  This made it susceptible to the
      keyctl/unlink/valid test in the keyutils package.
      
      The fix is simply to skip the blanking if the shift would be 0.  For
      example, an index key that is 64 bits long would produce a 0 shift and thus
      a 'blank' of all 1s.  This would then be inverted and AND'd onto the
      index_key, incorrectly clearing the entire last word.
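
      A minimal user-space sketch of the masking problem described above. The
      function names and WORD_BITS are hypothetical stand-ins, not the kernel's
      identifiers; the sketch only assumes that the last word keeps its low
      (level modulo word size) bits.

```c
#include <assert.h>
#include <limits.h>

/* Bits per machine word; a stand-in for the kernel's key chunk size. */
#define WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Buggy shape: always build the blanking mask, even for a shift of 0. */
static unsigned long blank_last_word_buggy(unsigned long word, unsigned int level)
{
    unsigned int shift = level % WORD_BITS;
    unsigned long blank = ULONG_MAX << shift; /* shift == 0 gives all ones */

    return word & ~blank;                     /* ...so the whole word is cleared */
}

/* Fixed shape: skip the blanking when the shift would be 0. */
static unsigned long blank_last_word_fixed(unsigned long word, unsigned int level)
{
    unsigned int shift = level % WORD_BITS;

    if (shift)
        word &= ~(ULONG_MAX << shift);
    return word;
}

/* Small self-check for a key whose length is exactly one machine word. */
static void shortcut_blank_self_check(void)
{
    assert(blank_last_word_buggy(0xffUL, WORD_BITS) == 0UL);
    assert(blank_last_word_fixed(0xffUL, WORD_BITS) == 0xffUL);
}
```

      For key lengths that are not a multiple of the word size, both variants
      agree; they only differ when the shift is 0.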
      
      Fixes: 3cb98950 ("Add a generic associative array implementation.")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.morris@microsoft.com>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      46bf099f
    • bpf: fix lockdep false positive in stackmap · ec9d64f0
      Alexei Starovoitov authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 3defaf2f
      category: bugfix
      bugzilla: 10760
      CVE: NA
      
      -------------------------------------------------
      
      Lockdep emits a false-positive warning:
      [   11.211460] ------------[ cut here ]------------
      [   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
      [   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
      [   11.213134] Modules linked in:
      [   11.214954] RIP: 0010:lock_release+0x1ad/0x280
      [   11.223508] Call Trace:
      [   11.223705]  <IRQ>
      [   11.223874]  ? __local_bh_enable+0x7a/0x80
      [   11.224199]  up_read+0x1c/0xa0
      [   11.224446]  do_up_read+0x12/0x20
      [   11.224713]  irq_work_run_list+0x43/0x70
      [   11.225030]  irq_work_run+0x26/0x50
      [   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
      [   11.225662]  irq_work_interrupt+0xf/0x20
      
      This happens because the rw_semaphore is released in a different task
      than the one that locked it, which is expected behavior here.
      Fix the warning with up_read_non_owner() and rwsem_release() annotation.
      
      Fixes: bae77c5e ("bpf: enable stackmap with build_id in nmi context")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      ec9d64f0
    • perf: Paper over the hw.target problems · e597bc6a
      Alexander Shishkin authored and 谢秀奇 committed
      euler inclusion
      category: bugfix
      bugzilla: 9513/11006
      CVE: NA
      --------------------------------------------------
      
      [ Cheng Jian:
      HULK-Syzkaller reported a problem that syzbot had already reported
      to mainline (lkml); this patch comes from the reply on lkml:
      https://lkml.org/lkml/2019/2/28/529 ]
      
      First, we have a race between perf_event_release_kernel() and
      perf_free_event(), which happens when parent's event is released while the
      child's fork fails (because of a fatal signal, for example), that looks
      like this:
      
      cpu X                            cpu Y
      -----                            -----
                                       copy_process() error path
      perf_release(parent)             +->perf_event_free_task()
      +-> lock(child_ctx->mutex)       |  |
      +-> remove_from_context(child)   |  |
      +-> unlock(child_ctx->mutex)     |  |
      |                                |  +-> lock(child_ctx->mutex)
      |                                |  +-> unlock(child_ctx->mutex)
      |                                +-> free_task(child_task)
      +-> put_task_struct(child_task)
      
      Technically, we're still holding a reference to the task via
      parent->hw.target, that's not stopping free_task(), so we end up poking at
      free'd memory, as is pointed out by KASAN in the syzkaller report (see Link
      below). The straightforward fix is to drop the hw.target reference while
      the task is still around.
      
      Therein lies the second problem: the users of hw.target (uprobe) assume
      that it's around at ->destroy() callback time, where they use it for
      context. So, in order to not break the uprobe teardown and avoid leaking
      stuff, we need to call ->destroy() at the same time.
      
      This patch fixes the race and the subsequent fallout by doing both these
      things at remove_from_context time.
      
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98
      Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      e597bc6a
    • mm: hwpoison: fix thp split handing in soft_offline_in_use_page() · 41c46e00
      zhongjiang authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.x
      commit: <not-yet-available>
      category: bugfix
      bugzilla: 10883
      CVE: NA
      
      ------------------------------------------------
      
      When soft_offline_in_use_page() runs on a thp tail page after pmd is split,
      we trigger the following VM_BUG_ON_PAGE():
      
      Memory failure: 0x3755ff: non anonymous thp
      __get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
      Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
      page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
      flags: 0x2fffff80000000()
      raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
      raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      ------------[ cut here ]------------
      kernel BUG at ./include/linux/mm.h:519!
      
      soft_offline_in_use_page() passed refcount and page lock from tail page to
      head page, which is not needed because we can pass any subpage to
      split_huge_page().
      
      Naoya fixed a similar issue in commit c3901e72 ("mm: hwpoison: fix thp
      split handling in memory_failure()"), but that fix did not cover soft
      offline.
      
      Fixes: 61f5d698 ("mm: re-enable THP")
      Cc: <stable@vger.kernel.org>        [4.5+]
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: zhongjiang <zhongjiang@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      41c46e00
    • hugetlbfs: fix memory leak for resv_map · bd575e60
      Yufen Yu authored and 谢秀奇 committed
      
      euler inclusion
      category: bugfix
      bugzilla: 10984
      CVE: NA
      ---------------------------
      
      When .mknod creates a block device file in hugetlbfs, it allocates an
      inode and kmallocs a 'struct resv_map' in resv_map_alloc(). Currently,
      inode->i_mapping->private_data is used to point to the resv_map.
      However, when the device is opened, bd_acquire() sets i_mapping to
      bd_inode->i_mapping, which results in the resv_map being leaked.
      
      We fix the leak by adding a new resv_map field to hugetlbfs_inode_info
      to store the resv_map pointer.
      
      Steps to reproduce:
      	mount -t hugetlbfs nodev hugetlbfs
      	mknod hugetlbfs/dev b 0 0
      	exec 30<> hugetlbfs/dev
      	umount hugetlbfs/
      
      Fixes: 9119a41e ("mm, hugetlb: unify region structure handling")
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      bd575e60
    • Linux 4.19.27 · d95806c9
      Greg Kroah-Hartman authored and 谢秀奇 committed
      
      Merge 75 patches from the 4.19.27 stable branch (79 in total),
      excluding 4 patches that were already merged:
      
      0655618 irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
      5024f0a sched/wait: Fix rcuwait_wake_up() ordering
      2368e6d futex: Fix (possible) missed wakeup
      9ad6216 locking/rwsem: Fix (possible) missed wakeup
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      d95806c9
    • x86/uaccess: Don't leak the AC flag into __put_user() value evaluation · 9975334e
      Andy Lutomirski authored and 谢秀奇 committed
      
      commit 2a418cf3 upstream.
      
      When calling __put_user(foo(), ptr), the __put_user() macro would call
      foo() in between __uaccess_begin() and __uaccess_end().  If that code
      were buggy, then those bugs would be run without SMAP protection.
      
      Fortunately, there seem to be few instances of the problem in the
      kernel. Nevertheless, __put_user() should be fixed to avoid doing this.
      Therefore, evaluate __put_user()'s argument before setting AC.
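
      The ordering problem can be illustrated with a small user-space model.
      Everything here is a hypothetical stand-in: ac_flag models the AC bit,
      and the two macros only model the shape of the evaluation order, not the
      real __put_user() implementation.

```c
#include <assert.h>

static int ac_flag; /* stand-in for the AC bit that suspends SMAP */

static void uaccess_begin(void)
{
    assert(!ac_flag); /* the window is not expected to nest in this model */
    ac_flag = 1;
}

static void uaccess_end(void)
{
    ac_flag = 0;
}

/* Buggy shape: the value expression is evaluated inside the AC window. */
#define PUT_USER_BUGGY(expr, ptr) do { \
    uaccess_begin();                   \
    *(ptr) = (expr);                   \
    uaccess_end();                     \
} while (0)

/* Fixed shape: evaluate first, open the window only for the store itself. */
#define PUT_USER_FIXED(expr, ptr) do { \
    int val_ = (expr);                 \
    uaccess_begin();                   \
    *(ptr) = val_;                     \
    uaccess_end();                     \
} while (0)

/* Reports whether AC was set at the moment the argument was evaluated. */
static int ac_seen_by_argument(void)
{
    return ac_flag;
}
```

      Passing ac_seen_by_argument() as the value makes the difference visible:
      the buggy shape stores 1, the fixed shape stores 0.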
      
      This issue was noticed when an objtool hack by Peter Zijlstra complained
      about genregs_get() and I compared the assembly output to the C source.
      
       [ bp: Massage commit message and fixed up whitespace. ]
      
      Fixes: 11f1a4b9 ("x86: reorganize SMAP handling in user space accesses")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      9975334e
    • MIPS: eBPF: Fix icache flush end address · 466a5894
      Paul Burton authored and 谢秀奇 committed
      
      commit d1a2930d upstream.
      
      The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
      icache observes the code that we just wrote. Unfortunately it gets the
      end address calculation wrong due to some bad pointer arithmetic.
      
      The struct jit_ctx target field is of type pointer to u32, and as such
      adding one to it will increment the address being pointed to by 4 bytes.
      Therefore in order to find the address of the end of the code we simply
      need to add the number of 4 byte instructions emitted, but we mistakenly
      add the number of instructions multiplied by 4. This results in the call
      to flush_icache_range() operating on a memory region 4x larger than
      intended, which is always wasteful and can cause crashes if we overrun
      into an unmapped page.
      
      Fix this by correcting the pointer arithmetic to remove the bogus
      multiplication, and use braces to remove the need for a set of brackets
      whilst also making it obvious that the target field is a pointer.
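
      The arithmetic can be checked with a short sketch. The struct below is a
      hypothetical cut-down context (only a u32 pointer and an instruction
      count), and the buggy end address is computed through integers so the
      sketch itself never forms an out-of-bounds pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down JIT context: target points at 4-byte instructions. */
struct jit_ctx_sketch {
    uint32_t *target;  /* start of the emitted code */
    unsigned int idx;  /* number of instructions emitted */
};

/* Buggy end: same address as (target + idx * 4), i.e. scaled by 4 twice. */
static uintptr_t flush_end_buggy(const struct jit_ctx_sketch *ctx)
{
    return (uintptr_t)ctx->target + (uintptr_t)ctx->idx * 4 * sizeof(uint32_t);
}

/* Fixed end: indexing a u32 pointer already advances 4 bytes per element. */
static uintptr_t flush_end_fixed(const struct jit_ctx_sketch *ctx)
{
    return (uintptr_t)&ctx->target[ctx->idx];
}

/* Length in bytes of the region [target, end). */
static size_t flush_len(const struct jit_ctx_sketch *ctx, uintptr_t end)
{
    assert(end >= (uintptr_t)ctx->target);
    return (size_t)(end - (uintptr_t)ctx->target);
}
```

      With 8 emitted instructions the fixed variant covers 32 bytes, while the
      buggy one covers 128 bytes, four times the intended range.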
      
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: b6bd53f9 ("MIPS: Add missing file for eBPF JIT.")
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: netdev@vger.kernel.org
      Cc: bpf@vger.kernel.org
      Cc: linux-mips@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      466a5894
    • MIPS: BCM63XX: provide DMA masks for ethernet devices · 1dac92b6
      Jonas Gorski authored and 谢秀奇 committed
      
      commit 18836b48 upstream.
      
      The switch to the generic dma ops made dma masks mandatory, breaking
      devices having them not set. In case of bcm63xx, it broke ethernet with
      the following warning when trying to up the device:
      
      [    2.633123] ------------[ cut here ]------------
      [    2.637949] WARNING: CPU: 0 PID: 325 at ./include/linux/dma-mapping.h:516 bcm_enetsw_open+0x160/0xbbc
      [    2.647423] Modules linked in: gpio_button_hotplug
      [    2.652361] CPU: 0 PID: 325 Comm: ip Not tainted 4.19.16 #0
      [    2.658080] Stack : 80520000 804cd3ec 00000000 00000000 804ccc00 87085bdc 87d3f9d4 804f9a17
      [    2.666707]         8049cf18 00000145 80a942a0 00000204 80ac0000 10008400 87085b90 eb3d5ab7
      [    2.675325]         00000000 00000000 80ac0000 000022b0 00000000 00000000 00000007 00000000
      [    2.683954]         0000007a 80500000 0013b381 00000000 80000000 00000000 804a1664 80289878
      [    2.692572]         00000009 00000204 80ac0000 00000200 00000002 00000000 00000000 80a90000
      [    2.701191]         ...
      [    2.703701] Call Trace:
      [    2.706244] [<8001f3c8>] show_stack+0x58/0x100
      [    2.710840] [<800336e4>] __warn+0xe4/0x118
      [    2.715049] [<800337d4>] warn_slowpath_null+0x48/0x64
      [    2.720237] [<80289878>] bcm_enetsw_open+0x160/0xbbc
      [    2.725347] [<802d1d4c>] __dev_open+0xf8/0x16c
      [    2.729913] [<802d20cc>] __dev_change_flags+0x100/0x1c4
      [    2.735290] [<802d21b8>] dev_change_flags+0x28/0x70
      [    2.740326] [<803539e0>] devinet_ioctl+0x310/0x7b0
      [    2.745250] [<80355fd8>] inet_ioctl+0x1f8/0x224
      [    2.749939] [<802af290>] sock_ioctl+0x30c/0x488
      [    2.754632] [<80112b34>] do_vfs_ioctl+0x740/0x7dc
      [    2.759459] [<80112c20>] ksys_ioctl+0x50/0x94
      [    2.763955] [<800240b8>] syscall_common+0x34/0x58
      [    2.768782] ---[ end trace fb1a6b14d74e28b6 ]---
      [    2.773544] bcm63xx_enetsw bcm63xx_enetsw.0: cannot allocate rx ring 512
      
      Fix this by adding appropriate DMA masks for the platform devices.
      
      Fixes: f8c55dc6 ("MIPS: use generic dma noncoherent ops for simple noncoherent platforms")
      Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      1dac92b6
    • MIPS: fix truncation in __cmpxchg_small for short values · fed715e6
      Michael Clark authored and 谢秀奇 committed
      
      commit 94ee12b5 upstream.
      
      __cmpxchg_small erroneously uses u8 for load comparison which can
      be either char or short. This patch changes the local variable to
      u32 which is sufficiently sized, as the loaded value is already
      masked and shifted appropriately. Using an integer size avoids
      any unnecessary canonicalization from use of non native widths.
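
      A small sketch of the truncation, assuming the loaded sub-word is
      obtained by masking and shifting a 32-bit container as the message
      describes. The helper names are hypothetical and the atomic parts of the
      real routine are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy shape: the masked and shifted value is squeezed into 8 bits. */
static int matches_old_u8(uint32_t word, uint32_t mask, unsigned int shift,
                          uint32_t old)
{
    uint8_t load;

    assert(shift < 32);
    load = (uint8_t)((word & mask) >> shift); /* drops bits 8..15 of a short */
    return load == old;
}

/* Fixed shape: a 32-bit local holds both char and short values intact. */
static int matches_old_u32(uint32_t word, uint32_t mask, unsigned int shift,
                           uint32_t old)
{
    uint32_t load;

    assert(shift < 32);
    load = (word & mask) >> shift;
    return load == old;
}
```

      For a 16-bit field holding 0x1234, the u8 variant compares 0x34 against
      0x1234 and reports a mismatch, while the u32 variant matches. For 8-bit
      fields the two behave the same.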
      
      This patch is part of a series that adapts the MIPS small word
      atomics code for xchg and cmpxchg on short and char to RISC-V.
      
      Cc: RISC-V Patches <patches@groups.riscv.org>
      Cc: Linux RISC-V <linux-riscv@lists.infradead.org>
      Cc: Linux MIPS <linux-mips@linux-mips.org>
      Signed-off-by: Michael Clark <michaeljclark@mac.com>
      [paul.burton@mips.com:
        - Fix variable typo per Jonas Gorski.
        - Consolidate load variable with other declarations.]
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: 3ba7f44d ("MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      fed715e6
    • hugetlbfs: fix races and page leaks during migration · 58d3ea0e
      Mike Kravetz authored and 谢秀奇 committed
      
      commit cb6acd01 upstream.
      
      hugetlb pages should only be migrated if they are 'active'.  The
      routines set/clear_page_huge_active() modify the active state of hugetlb
      pages.
      
      When a new hugetlb page is allocated at fault time, set_page_huge_active
      is called before the page is locked.  Therefore, another thread could
      race and migrate the page while it is being added to page table by the
      fault code.  This race is somewhat hard to trigger, but can be seen by
      strategically adding udelay to simulate worst case scheduling behavior.
      Depending on 'how' the code races, various BUG()s could be triggered.
      
      To address this issue, simply delay the set_page_huge_active call until
      after the page is successfully added to the page table.
      
      Hugetlb pages can also be leaked at migration time if the pages are
      associated with a file in an explicitly mounted hugetlbfs filesystem.
      For example, consider a two node system with 4GB worth of huge pages
      available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
      then migrates the pages associated with the file from one node to
      another.  When the program exits, huge page counts are as follows:
      
        node0
        1024    free_hugepages
        1024    nr_hugepages
      
        node1
        0       free_hugepages
        1024    nr_hugepages
      
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      
      That is as expected.  2G of huge pages are taken from the free_hugepages
      counts, and 2G is the size of the file in the explicitly mounted
      filesystem.  If the file is then removed, the counts become:
      
        node0
        1024    free_hugepages
        1024    nr_hugepages
      
        node1
        1024    free_hugepages
        1024    nr_hugepages
      
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      
      Note that the filesystem still shows 2G of pages used, while there
      actually are no huge pages in use.  The only way to 'fix' the filesystem
      accounting is to unmount the filesystem.
      
      If a hugetlb page is associated with an explicitly mounted filesystem,
      this information is contained in the page_private field.  At migration
      time, this information is not preserved.  To fix, simply transfer
      page_private from old to new page at migration time if necessary.
      
      There is a related race with removing a huge page from a file and
      migration.  When a huge page is removed from the pagecache, the
      page_mapping() field is cleared, yet page_private remains set until the
      page is actually freed by free_huge_page().  A page could be migrated
      while in this state.  However, since page_mapping() is not set the
      hugetlbfs specific routine to transfer page_private is not called and we
      leak the page count in the filesystem.
      
      To fix that, check for this condition before migrating a huge page.  If
      the condition is detected, return EBUSY for the page.
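
      The two accounting fixes above can be sketched in user space as follows.
      struct hpage_sketch and migrate_hpage() are hypothetical; they only
      model the mapping and page_private fields mentioned in the text, not the
      real migration path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct hpage_sketch {
    void *mapping;         /* cleared when removed from the pagecache */
    unsigned long private; /* nonzero: charged to a mounted filesystem */
};

/* Returns 0 on success or -EBUSY if the page must not be migrated now. */
static int migrate_hpage(struct hpage_sketch *oldp, struct hpage_sketch *newp)
{
    assert(oldp != NULL && newp != NULL);

    /* Removed from its file but not yet freed: refuse to migrate. */
    if (oldp->private && !oldp->mapping)
        return -EBUSY;

    /* Carry the filesystem accounting over to the new page. */
    if (oldp->private) {
        newp->private = oldp->private;
        oldp->private = 0;
    }
    newp->mapping = oldp->mapping;
    return 0;
}
```

      In this model a page that still belongs to a file keeps its filesystem
      charge across migration, and a page in the intermediate state is left
      alone until it is actually freed.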
      
      Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
      Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
      
      
      Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>
      [mike.kravetz@oracle.com: v2]
        Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
      [mike.kravetz@oracle.com: update comment and changelog]
        Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
      
      
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      58d3ea0e
    • drm: Block fb changes for async plane updates · 466ab537
      Nicholas Kazlauskas authored and 谢秀奇 committed
      
      commit 22163229 upstream.
      
      The prepare_fb call always happens on new_plane_state.
      
      The drm_atomic_helper_cleanup_planes checks to see if
      plane state pointer has changed when deciding to call cleanup_fb on
      either the new_plane_state or the old_plane_state.
      
      For a non-async atomic commit the state pointer is swapped, so this
      helper calls prepare_fb on the new_plane_state and cleanup_fb on the
      old_plane_state. This makes sense, since we want to prepare the
      framebuffer we are going to use and clean up the framebuffer we are
      no longer using.
      
      For the async atomic update helpers this differs. The async atomic
      update helpers perform in-place updates on the existing state. They call
      drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
      This means that prepare_fb is called on the new_plane_state and
      cleanup_fb is called on the new_plane_state (not the old).
      
      In the case where old_plane_state->fb == new_plane_state->fb then
      there should be no behavioral difference between an async update
      and a non-async commit. But there are issues that arise when
      old_plane_state->fb != new_plane_state->fb.
      
      The first is that the new_plane_state->fb is immediately cleaned up
      after it has been prepared, so we're using a fb that we shouldn't
      be.
      
      The second occurs during a sequence of async atomic updates and
      non-async regular atomic commits. Suppose there are two framebuffers
      being interleaved in a double-buffering scenario, fb1 and fb2:
      
      - Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
      - Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
      - Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
      
      We call cleanup_fb on fb2 twice in this example scenario, and any
      further use will result in use-after-free.
      
      The simple fix to this problem is to block framebuffer changes
      in the drm_atomic_helper_async_check function for now.
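
      The double-buffering sequence above can be replayed with a small model
      that counts prepare_fb calls minus cleanup_fb calls per framebuffer. All
      types and helpers are hypothetical, and the check at the end only mirrors
      the idea of rejecting framebuffer changes; the -EINVAL return value is an
      assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fb_sketch {
    int prepared; /* prepare_fb calls minus cleanup_fb calls */
};

static void prepare_fb(struct fb_sketch *fb) { if (fb) fb->prepared++; }
static void cleanup_fb(struct fb_sketch *fb) { if (fb) fb->prepared--; }

/* Non-async commit: prepare the new fb, clean up the old one. */
static void commit_sync(struct fb_sketch **cur, struct fb_sketch *newfb)
{
    prepare_fb(newfb);
    cleanup_fb(*cur);
    *cur = newfb;
}

/* Async update as described: prepare and cleanup both hit the new fb. */
static void update_async(struct fb_sketch **cur, struct fb_sketch *newfb)
{
    prepare_fb(newfb);
    cleanup_fb(newfb);
    *cur = newfb;
}

/* Sketch of the check: allow an async update only if the fb is unchanged. */
static int async_check(const struct fb_sketch *cur, const struct fb_sketch *newfb)
{
    return cur == newfb ? 0 : -EINVAL;
}

/* Replays the three steps from the message and returns fb2's balance. */
static int replay_fb2_balance(void)
{
    struct fb_sketch fb1 = {0}, fb2 = {0};
    struct fb_sketch *cur = NULL;

    update_async(&cur, &fb1);
    update_async(&cur, &fb2);
    commit_sync(&cur, &fb1);
    assert(cur == &fb1);
    return fb2.prepared;
}
```

      In this model fb2 ends with a negative balance, which corresponds to the
      second cleanup_fb call on fb2 in the scenario above.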
      
      v2: Move check by itself, add a FIXME (Daniel)
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Harry Wentland <harry.wentland@amd.com>
      Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Fixes: fef9df8b ("drm/atomic: initial support for asynchronous plane update")
      Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
      Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Acked-by: Harry Wentland <harry.wentland@amd.com>
      Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: Harry Wentland <harry.wentland@amd.com>
      Link: https://patchwork.freedesktop.org/patch/275364/
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      466ab537
    • mm: enforce min addr even if capable() in expand_downwards() · 811c0f72
      Jann Horn authored and 谢秀奇 committed
      
      commit 0a1d5299 upstream.
      
      security_mmap_addr() does a capability check with current_cred(), but
      we can reach this code from contexts like a VFS write handler where
      current_cred() must not be used.
      
      This can be abused on systems without SMAP to make NULL pointer
      dereferences exploitable again.
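
      A minimal sketch of an unconditional minimum-address check, which is the
      behavior the subject line asks for. The page size, the minimum address
      value and the function name are assumptions made for illustration; no
      capability or credential is consulted.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SIZE_SKETCH 4096UL
#define PAGE_MASK_SKETCH (~(PAGE_SIZE_SKETCH - 1))

/* Hypothetical stand-in for the configured minimum mappable address. */
static unsigned long min_addr_sketch = 65536UL;

/* Returns 0 if growing down to 'address' is allowed, -EPERM otherwise. */
static int check_expand_downwards(unsigned long address)
{
    /* This sketch assumes the minimum address is page aligned. */
    assert((min_addr_sketch & ~PAGE_MASK_SKETCH) == 0);

    address &= PAGE_MASK_SKETCH;
    if (address < min_addr_sketch)
        return -EPERM;
    return 0;
}
```

      Because the check does not depend on who is running, it gives the same
      answer from any calling context.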
      
      Fixes: 8869477a ("security: protect from stack expansion into low vm addresses")
      Cc: stable@kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      811c0f72
    • mmc: sdhci-esdhc-imx: correct the fix of ERR004536 · e5e6aa1f
      BOUGH CHEN authored and 谢秀奇 committed
      
      commit e30be063 upstream.
      
      Commit 18094430 ("mmc: sdhci-esdhc-imx: add ADMA Length
      Mismatch errata fix") introduced a fix for ERR004536, but the
      fix is incorrect. After double confirming with IC, bit 7 of
      register 0x6c must be cleared rather than set.
      Here is the definition of bit 7 of 0x6c:
          0: enable the new IC fix for ERR004536
          1: do not use the IC fix, keep the same as before
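
      In terms of register values, the correction amounts to clearing bit 7
      instead of setting it. The macro and function names below are
      hypothetical; only the bit position and its meaning come from the
      definition above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 of the register at offset 0x6c: 1 means "do not use the IC fix". */
#define REG_6C_BIT7_SKETCH (UINT32_C(1) << 7)

/* Previous, incorrect behavior: setting the bit keeps the old behavior. */
static uint32_t err004536_incorrect(uint32_t reg)
{
    return reg | REG_6C_BIT7_SKETCH;
}

/* Corrected behavior: clearing the bit enables the IC fix. */
static uint32_t err004536_correct(uint32_t reg)
{
    uint32_t val = reg & ~REG_6C_BIT7_SKETCH;

    assert((val & REG_6C_BIT7_SKETCH) == 0);
    return val;
}
```

      All other bits of the register value are left untouched in both
      variants.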
      
      This issue was found on the i.MX845s-evk board with CMDQ enabled
      and the system under heavy load.
      
      root@imx8mmevk:~# dd if=/dev/mmcblk2 of=/dev/null bs=1M &
      root@imx8mmevk:~# memtester 1000M > /dev/zero &
      root@imx8mmevk:~# [  139.897220] mmc2: cqhci: timeout for tag 16
      [  139.901417] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
      [  139.907862] mmc2: cqhci: Caps:      0x0000310a | Version:  0x00000510
      [  139.914311] mmc2: cqhci: Config:    0x00001001 | Control:  0x00000000
      [  139.920753] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
      [  139.927193] mmc2: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
      [  139.933634] mmc2: cqhci: TDL base:  0x7809c000 | TDL up32: 0x00000000
      [  139.940073] mmc2: cqhci: Doorbell:  0x00030000 | TCN:      0x00000000
      [  139.946518] mmc2: cqhci: Dev queue: 0x00010000 | Dev Pend: 0x00010000
      [  139.952967] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
      [  139.959411] mmc2: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000000
      [  139.965857] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
      [  139.972308] mmc2: cqhci: Resp idx:  0x0000002e | Resp arg: 0x00000900
      [  139.978761] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
      [  139.985214] mmc2: sdhci: Sys addr:  0xb2c19000 | Version:  0x00000002
      [  139.991669] mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
      [  139.998127] mmc2: sdhci: Argument:  0x40110400 | Trn mode: 0x00000033
      [  140.004618] mmc2: sdhci: Present:   0x01088a8f | Host ctl: 0x00000030
      [  140.011113] mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
      [  140.017583] mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
      [  140.024039] mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
      [  140.030497] mmc2: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
      [  140.036972] mmc2: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000502
      [  140.043426] mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
      [  140.049867] mmc2: sdhci: Cmd:       0x00002c1a | Max curr: 0x00ffffff
      [  140.056314] mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
      [  140.062755] mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00
      [  140.069195] mmc2: sdhci: Host ctl2: 0x00000008
      [  140.073640] mmc2: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x7809c108
      [  140.080079] mmc2: sdhci: ============================================
      [  140.086662] mmc2: running CQE recovery
      
      Fixes: 18094430 ("mmc: sdhci-esdhc-imx: add ADMA Length Mismatch errata fix")
      Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      e5e6aa1f
    • mmc: cqhci: Fix a tiny potential memory leak on error condition · 865d5f28
      Alamy Liu authored and 谢秀奇 committed
      
      commit d07e9fad upstream.
      
      Free the allocated memory in the case of an error return.
      
      The value of mmc_host->cqe_enabled stays 'false'. Thus, cqhci_disable
      (mmc_cqe_ops->cqe_disable) won't be called to free the memory.  Also,
      cqhci_disable() seems to be designed to disable and free all resources, not
      suitable to handle this corner case.
      
      Fixes: a4080225 ("mmc: cqhci: support for command queue enabled host")
      Signed-off-by: default avatarAlamy Liu <alamy.liu@gmail.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      865d5f28
    • mmc: cqhci: fix space allocated for transfer descriptor · 18595dde
      Alamy Liu authored and 谢秀奇 committed
      
      commit 27ec9dc1 upstream.
      
      There is not enough space being allocated when DCMD is disabled.
      
      CQE_DCMD does not have to be enabled when CQE is enabled.
      (Software can halt CQE to send a command.)
      
      In the case that CQE_DCMD is not enabled, it still needs to allocate
      space for data transfer. For instance:
        CQE_DCMD is enabled:  31 slots space (one slot used by DCMD)
        CQE_DCMD is disabled: 32 slots space
      
      Fixes: a4080225 ("mmc: cqhci: support for command queue enabled host")
      Signed-off-by: Alamy Liu <alamy.liu@gmail.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      18595dde
    • mmc: core: Fix NULL ptr crash from mmc_should_fail_request · 658988db
      Ritesh Harjani authored and 谢秀奇 committed
      
      commit e5723f95 upstream.
      
      In case of CQHCI, mrq->cmd may be NULL for data requests (non DCMD).
      In that case, mmc_should_fail_request() dereferences mrq->cmd
      while it is NULL.
      Fix this by checking for mrq->cmd pointer.
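
      A minimal sketch of the guarded check, using simplified stand-in types
      rather than the kernel's own structures. All names ending in _sketch are
      hypothetical, and the real function does more than this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields the check needs. */
struct cmd_sketch  { int error; };
struct data_sketch { int error; };
struct req_sketch {
    struct cmd_sketch  *cmd;   /* may be NULL for CQE data requests */
    struct data_sketch *data;
};

/* True when the request already carries an error. */
static bool has_error_sketch(const struct req_sketch *mrq)
{
    /* Test the pointer before reading through it. */
    if (mrq->cmd && mrq->cmd->error)
        return true;
    return mrq->data && mrq->data->error;
}
```

      With the pointer test in place, a data-only request is evaluated on its
      data error alone instead of faulting.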
      
      Fixes: 72a5af55 ("mmc: core: Add support for handling CQE requests")
      Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      658988db
    • mmc: tmio: fix access width of Block Count Register · ec95b697
      Takeshi Saito authored and 谢秀奇 committed
      
      commit 5603731a upstream.
      
      In R-Car Gen2 or later, the maximum number of transfer blocks
      changed from 0xFFFF to 0xFFFFFFFF. Therefore, the Block Count Register
      should be written with iowrite32().
      
      If another system (U-Boot, hypervisor OS, etc.) uses bits [31:16], this
      value will not be cleared by a 16-bit write, so SD/MMC card initialization fails.
      
      So, check for the bigger register and use the appropriate write. Also, mark
      the register as extended on Gen2.
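
      The effect of the access width can be modelled in plain C. This sketch
      treats the register as an ordinary 32-bit variable and emulates a 16-bit
      write by preserving the upper half; write_blk_count_sketch() is a
      hypothetical name, and real code would go through the MMIO accessors.

```c
#include <stdint.h>

/*
 * Model of writing the block count. When the controller supports more
 * than 0xFFFF blocks, write all 32 bits so that stale upper bits left
 * by earlier software are cleared. Otherwise emulate a 16-bit access,
 * which leaves bits [31:16] untouched.
 */
static void write_blk_count_sketch(uint32_t *reg, uint32_t max_blk_count,
                                   uint32_t count)
{
    if (max_blk_count > 0xFFFFu)
        *reg = count;
    else
        *reg = (*reg & 0xFFFF0000u) | (count & 0xFFFFu);
}
```

      Starting from a register holding 0xABCD0000, the 32-bit path yields
      exactly the requested count, while the 16-bit path leaves 0xABCD in the
      upper half.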
      
      Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
      [wsa: use max_blk_count in if(), add Gen2, update commit message]
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Cc: stable@kernel.org
      Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
      [Ulf: Fixed build error]
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      ec95b697
    • mmc: tmio_mmc_core: don't claim spurious interrupts · 389cf99a
      Sergei Shtylyov authored and 谢秀奇 committed
      
      commit 5c27ff5d upstream.
      
      I have encountered an interrupt storm during the eMMC chip probing (and
      the chip finally didn't get detected).  It turned out that U-Boot left
      the DMAC interrupts enabled while the Linux driver didn't use those.
      The SDHI driver's interrupt handler somehow assumes that, even if an
      SDIO interrupt didn't happen, it should return IRQ_HANDLED.  I think
      that if none of the enabled interrupts happened and got handled, we
      should return IRQ_NONE -- that way the kernel IRQ code recognizes
      a spurious interrupt and masks it off pretty quickly...
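
      The return-value rule argued for above can be sketched as a small,
      self-contained handler. The enum mirrors the kernel's IRQ_NONE and
      IRQ_HANDLED values but is defined locally; irq_sketch() and its
      parameters are hypothetical, and the actual handling is elided.

```c
#include <stdint.h>

/* Local stand-in for the kernel's irqreturn_t values. */
enum irqreturn_sketch {
    IRQ_NONE_SKETCH    = 0,   /* interrupt was not from this device */
    IRQ_HANDLED_SKETCH = 1    /* interrupt was handled */
};

/*
 * Claim the interrupt only if at least one source that the driver
 * enabled is actually pending.
 */
static enum irqreturn_sketch irq_sketch(uint32_t status, uint32_t enabled_mask)
{
    uint32_t pending = status & enabled_mask;

    if (!pending)
        return IRQ_NONE_SKETCH;   /* lets the IRQ core spot a spurious IRQ */

    /* ... acknowledge and process the pending sources here ... */
    return IRQ_HANDLED_SKETCH;
}
```

      A status bit raised by a source the driver never enabled then produces
      IRQ_NONE instead of being silently claimed.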
      
      Fixes: 7729c7a2 ("mmc: tmio: Provide separate interrupt handlers")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      389cf99a
    • mmc: spi: Fix card detection during probe · a37f8380
      Jonathan Neuschäfer authored and 谢秀奇 committed
      
      commit c9bd505d upstream.
      
      When using the mmc_spi driver with a card-detect pin, I noticed that the
      card was not detected immediately after probe, but only after it was
      unplugged and plugged back in (and the CD IRQ fired).
      
      The call tree looks something like this:
      
      mmc_spi_probe
        mmc_add_host
          mmc_start_host
            _mmc_detect_change
              mmc_schedule_delayed_work(&host->detect, 0)
                mmc_rescan
                  host->bus_ops->detect(host)
                    mmc_detect
                      _mmc_detect_card_removed
                        host->ops->get_cd(host)
                          mmc_gpio_get_cd -> -ENOSYS (ctx->cd_gpio not set)
        mmc_gpiod_request_cd
          ctx->cd_gpio = desc
      
      To fix this issue, call mmc_detect_change after the card-detect GPIO/IRQ
      is registered.
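
      The ordering problem in the call tree can be modelled with a toy probe
      routine. This is a deliberately simplified sketch: the real rescan logic
      is more involved, and every name ending in _sketch is hypothetical. It
      only shows why a scan that runs before the card-detect GPIO is set cannot
      see the card, and why a second scan afterwards can.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct host_sketch {
    const int *cd_gpio;   /* NULL until the card-detect GPIO is registered */
    bool card_found;
};

/* Report the card-detect level, or -ENOSYS while no GPIO is set. */
static int get_cd_sketch(const struct host_sketch *h)
{
    if (!h->cd_gpio)
        return -ENOSYS;
    return *h->cd_gpio;
}

static void rescan_sketch(struct host_sketch *h)
{
    h->card_found = get_cd_sketch(h) > 0;
}

/* Probe order from the call tree, with the proposed extra scan optional. */
static void probe_sketch(struct host_sketch *h, const int *gpio_level,
                         bool rescan_after_cd)
{
    rescan_sketch(h);          /* first scan: GPIO not registered yet */
    h->cd_gpio = gpio_level;   /* card-detect GPIO becomes available */
    if (rescan_after_cd)
        rescan_sketch(h);      /* the fix: trigger detection again */
}
```

      With the card present, the first variant leaves card_found false until a
      later card-detect event, while the second finds the card during probe.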
      
      Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      a37f8380
    • kvm: selftests: Fix region overlap check in kvm_util · 8b835f79
      Ben Gardon authored and 谢秀奇 committed
      
      [ Upstream commit 94a980c3 ]
      
      Fix a call to userspace_mem_region_find to conform to its spec of
      taking an inclusive, inclusive range. It was previously being called
      with an inclusive, exclusive range. Also remove a redundant region bounds
      check in vm_userspace_mem_region_add. Region overlap checking is already
      performed by the call to userspace_mem_region_find.
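
      The off-by-one described above is easy to reproduce with a generic
      inclusive-range overlap test. The helpers below are hypothetical and are
      not the selftest library's functions; they only illustrate why an
      exclusive end passed to an inclusive interface reports a false overlap
      with an adjacent region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Overlap test where both ends of both ranges are inclusive. */
static bool overlap_inclusive_sketch(uint64_t a_start, uint64_t a_end,
                                     uint64_t b_start, uint64_t b_end)
{
    return a_start <= b_end && b_start <= a_end;
}

/* Last byte covered by a region: the correct inclusive end. */
static uint64_t last_byte_sketch(uint64_t start, uint64_t size)
{
    return start + size - 1;
}
```

      For a new region at 0x2000 of size 0x1000 next to an existing region
      covering 0x3000 to 0x3fff, the inclusive end 0x2fff gives no overlap,
      whereas passing start + size (0x3000) would wrongly report one.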
      
      Tested: Compiled tools/testing/selftests/kvm with -static
      	Ran all resulting test binaries on an Intel Haswell test machine
      	All tests passed
      
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      8b835f79