  1. Dec 27, 2019
    • irq/matrix: Spread managed interrupts on allocation · 5fc68885
      Dou Liyang authored and 谢秀奇 committed
      
      [ Upstream commit 76f99ae5 ]
      
      Linux spreads out the non-managed interrupts across the possible target CPUs
      to avoid vector space exhaustion.
      
      Managed interrupts are treated differently, as for them the vectors are
      reserved (with guarantee) when the interrupt descriptors are initialized.
      
      When the interrupt is requested a real vector is assigned. The assignment
      logic uses the first CPU in the affinity mask for assignment. If the
      interrupt has more than one CPU in the affinity mask, which happens when a
      multi-queue device has fewer queues than CPUs, then doing the same search as
      for non-managed interrupts makes sense, as it puts the interrupt on the
      least interrupt-plagued CPU. For single-CPU affine vectors that's obviously
      a NOOP.
      
      Restructure the matrix allocation code so it does the 'best CPU' search, add
      the sanity check for an empty affinity mask and adapt the call site in the
      x86 vector management code.
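
The 'best CPU' search with the empty-mask check can be sketched in plain userspace C. This is only an illustrative model, not the kernel code: `NR_CPUS`, `struct cpu_vectors`, `find_best_cpu()` and the use of `UINT_MAX` as the "no CPU found" value are assumptions made for the sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS 8 /* hypothetical CPU count, for the sketch only */

/* Hypothetical per-CPU bookkeeping: free vectors and online state. */
struct cpu_vectors {
	unsigned int available;
	int online;
};

/*
 * Return the online CPU in 'mask' with the most free vectors.
 * Returns UINT_MAX when the mask is empty, selects only offline CPUs,
 * or selects only CPUs without any free vector.
 */
static unsigned int find_best_cpu(const struct cpu_vectors *cpus,
				  unsigned long mask)
{
	unsigned int cpu, best_cpu = UINT_MAX, max_avail = 0;

	assert(cpus != NULL);
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1UL << cpu)))
			continue;	/* CPU not in the affinity mask */
		if (!cpus[cpu].online || cpus[cpu].available <= max_avail)
			continue;	/* offline, or not better than the current best */
		best_cpu = cpu;
		max_avail = cpus[cpu].available;
	}
	return best_cpu;
}
```

With a single-CPU mask the loop degenerates to picking that one CPU, which matches the "NOOP" remark above; a caller would treat `UINT_MAX` as an allocation failure.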
      
      [ tglx: Added the empty mask check to the core and improved change log ]
      
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • irq/matrix: Split out the CPU selection code into a helper · 916144b4
      Dou Liyang authored and 谢秀奇 committed
      
      [ Upstream commit 8ffe4e61 ]
      
      Linux finds the CPU which has the lowest vector allocation count to spread
      out the non-managed interrupts across the possible target CPUs, but does
      not do so for managed interrupts.
      
      Split out the CPU selection code into a helper function for reuse. No
      functional change.
      
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • perf machine: Update kernel map address and re-order properly · 35e44e29
      Wei Li authored and 谢秀奇 committed
      
      hulk inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      -------------------------------------------------
      
      Since commit 1fb87b8e ("perf machine: Don't search for active kernel
      start in __machine__create_kernel_maps"),
      __machine__create_kernel_maps() just creates a map whose start and end are
      both zero. Though the addresses are updated later, the position of the map
      in the rbtree may then be incorrect.
      
      Commit ee05d217 ("perf machine: Set main kernel end address
      properly") fixed the logic in machine__create_kernel_maps(), but it is
      still wrong in machine__process_kernel_mmap_event().
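
Why a late address update breaks lookups can be sketched with a sorted array standing in for the rbtree. This is a simplified model under stated assumptions, not the perf code: `struct map`, `find_map()` and `update_map()` are hypothetical names, and re-sorting the array stands in for removing and re-inserting the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct map {
	uint64_t start, end;
	const char *name;
};

static int cmp_start(const void *a, const void *b)
{
	const struct map *ma = a, *mb = b;

	return (ma->start > mb->start) - (ma->start < mb->start);
}

/* Binary search for the map containing addr; assumes 'maps' is sorted by start. */
static const struct map *find_map(const struct map *maps, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < maps[mid].start)
			hi = mid;
		else if (addr >= maps[mid].end)
			lo = mid + 1;
		else
			return &maps[mid];
	}
	return NULL;
}

/* Update one map's range; with 'reorder' set, restore the sort order afterwards. */
static void update_map(struct map *maps, size_t n, size_t idx,
		       uint64_t start, uint64_t end, int reorder)
{
	assert(idx < n && start < end);
	maps[idx].start = start;
	maps[idx].end = end;
	if (reorder)
		qsort(maps, n, sizeof(*maps), cmp_start);
}
```

If the kernel map is created with start 0, sorts before the modules, and is later updated in place without reordering, a lookup of a kernel text address walks the wrong way and finds nothing; updating and then restoring the order makes the same lookup succeed.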
      
      To reproduce this issue, we need an environment in which the module addresses
      are below the kernel text segment. I tested it on an aarch64 machine with
      kernel 4.19.25:
      
      [root@localhost hulk]# grep _stext /proc/kallsyms
      ffff000008081000 T _stext
      [root@localhost hulk]# grep _etext /proc/kallsyms
      ffff000009780000 R _etext
      [root@localhost hulk]# tail /proc/modules
      hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
      nvme_core 126976 7 nvme, Live 0xffff0000018b6000
      mdio 20480 1 ixgbe, Live 0xffff0000018ab000
      hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
      hns_mdio 20480 2 - Live 0xffff000001822000
      hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
      dm_mirror 40960 0 - Live 0xffff000001804000
      dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
      dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
      dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
      [root@localhost hulk]#
      
      Before fix:
      
      [root@localhost bin]# perf record sleep 3
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
      [root@localhost bin]# perf buildid-list -i perf.data
      4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
      [root@localhost bin]# perf buildid-list -i perf.data -H
      0000000000000000000000000000000000000000 /proc/kcore
      [root@localhost bin]#
      
      After fix:
      
      [root@localhost tools]# ./perf/perf record sleep 3
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
      [root@localhost tools]# ./perf/perf buildid-list -i perf.data
      28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
      106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
      [root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
      28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
      
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • arm64: Relax GIC version check during early boot · c6023d98
      Vladimir Murzin authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 74698f69
      category: bugfix
      bugzilla: 10793
      CVE: NA
      
      -------------------------------------------------
      
      Updates to the GIC architecture allow ID_AA64PFR0_EL1.GIC to have
      values other than 0 or 1. At the moment, Linux is quite strict in the
      way it handles this field at early boot stage (cpufeature is fine) and
      will refuse to use the system register CPU interface if it doesn't
      find the value 1.
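
The difference between the strict check and a relaxed one can be sketched as follows. The field position (bits [27:24] of ID_AA64PFR0_EL1) follows the architecture; treating "any non-zero value" as the relaxed rule is an assumption of this sketch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ID_AA64PFR0_GIC_SHIFT	24
#define ID_AA64PFR0_GIC_MASK	0xfULL

/* Extract the 4-bit GIC field from an ID_AA64PFR0_EL1 value. */
static unsigned int gic_field(uint64_t pfr0)
{
	return (unsigned int)((pfr0 >> ID_AA64PFR0_GIC_SHIFT) & ID_AA64PFR0_GIC_MASK);
}

/* Strict early-boot behaviour described above: only the value 1 is accepted. */
static int gic_sysreg_strict(uint64_t pfr0)
{
	return gic_field(pfr0) == 1;
}

/* Relaxed variant (assumed): any non-zero field value is accepted. */
static int gic_sysreg_relaxed(uint64_t pfr0)
{
	return gic_field(pfr0) != 0;
}

/* Usage: a hypothetical field value of 3 fails the strict check only. */
static void gic_example(void)
{
	uint64_t pfr0 = 3ULL << ID_AA64PFR0_GIC_SHIFT;

	assert(!gic_sysreg_strict(pfr0));
	assert(gic_sysreg_relaxed(pfr0));
}
```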
      
      Fixes: 021f6537 ("irqchip: gic-v3: Initial support for GICv3")
      Reported-by: Chase Conklin <Chase.Conklin@arm.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Xuefeng Wang <wxf.wang@hisilicon.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • tmpfs: fix uninitialized return value in shmem_link · c91994d7
      Darrick J. Wong authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc8
      commit 29b00e60
      category: bugfix
      bugzilla: 10938
      CVE: NA
      
      -------------------------------------------------
      
      When we made the shmem_reserve_inode call in shmem_link conditional, we
      forgot to update the declaration for ret so that it always has a known
      value.  Dan Carpenter pointed out this deficiency in the original patch.
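
The pattern behind this fix can be sketched in userspace C. It is a simplified model, not the shmem code: `reserve_inode()`, `link_file()` and the boolean flag are hypothetical stand-ins for the conditional shmem_reserve_inode() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reservation: 0 on success, -ENOSPC when no inode is left. */
static int reserve_inode(int *free_inodes)
{
	assert(free_inodes != NULL);
	if (*free_inodes == 0)
		return -ENOSPC;
	(*free_inodes)--;
	return 0;
}

static int link_file(bool first_link_of_tmpfile, int *free_inodes)
{
	int ret = 0;	/* must start at a known value: the call below may be skipped */

	if (!first_link_of_tmpfile)
		ret = reserve_inode(free_inodes);
	if (ret)
		return ret;

	/* ... the directory entry would be created here ... */
	return ret;
}
```

Without the `= 0` initializer, the path that skips the reservation would return whatever happened to be on the stack.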
      
      Fixes: 1062af92 ("tmpfs: fix link accounting when a tmpfile is linked in")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Matej Kupljen <matej.kupljen@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • tmpfs: fix link accounting when a tmpfile is linked in · 1e744256
      Darrick J. Wong authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc8
      commit 1062af92
      category: bugfix
      bugzilla: 10800
      CVE: NA
      
      -------------------------------------------------
      
      tmpfs has a peculiarity of accounting hard links as if they were
      separate inodes: so that when the number of inodes is limited, as it is
      by default, a user cannot soak up an unlimited amount of unreclaimable
      dcache memory just by repeatedly linking a file.
      
      But when v3.11 added O_TMPFILE, and the ability to use linkat() on the
      fd, we missed accommodating this new case in tmpfs: "df -i" shows that
      an extra "inode" remains accounted after the file is unlinked and the fd
      closed and the actual inode evicted.  If a user repeatedly links
      tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they
      are deleted.
      
      Just skip the extra reservation from shmem_link() in this case: there's
      a sense in which this first link of a tmpfile is then cheaper than a
      hard link of another file, but the accounting works out, and there's
      still good limiting, so no need to do anything more complicated.
      
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils
      
      
      Fixes: f4e0c30c ("allow the temp files created by open() to be linked to")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reported-by: Matej Kupljen <matej.kupljen@gmail.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • xfrm: Fix inbound traffic via XFRM interfaces across network namespaces · 77d883e7
      Tobias Brunner authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 660899dd
      category: bugfix
      bugzilla: 10928
      CVE: NA
      
      -------------------------------------------------
      
      After moving an XFRM interface to another namespace it stays associated
      with the original namespace (net in `struct xfrm_if` and the list keyed
      with `xfrmi_net_id`), allowing processes in the new namespace to use
      SAs/policies that were created in the original namespace.  For instance,
      this allows a keying daemon in one namespace to establish IPsec SAs for
      other namespaces without processes there having access to the keys or IKE
      credentials.
      
      This worked fine for outbound traffic, however, for inbound traffic the
      lookup for the interfaces and the policies used the incorrect namespace
      (the one the XFRM interface was moved to).
      
      Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces")
      Signed-off-by: Tobias Brunner <tobias@strongswan.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • xfrm: destroy xfrm_state synchronously on net exit path · d985029e
      Cong Wang authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit f75a2804
      category: bugfix
      bugzilla: 10919
      CVE: NA
      
      -------------------------------------------------
      
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync() which simply bypasses
      GC, and lets its callers decide whether to use this synchronous
      version. On net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: <syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • team: use operstate consistently for linkup · 48dd159a
      George Wilkie authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 8c7a7726
      category: bugfix
      bugzilla: 10921
      CVE: NA
      
      -------------------------------------------------
      
      When a port is added to a team, its initial state is derived
      from netif_carrier_ok rather than netif_oper_up.
      If it is carrier up but operationally down at the time of being
      added, the port state.linkup will be set prematurely.
      port state.linkup should be set consistently using
      netif_oper_up rather than netif_carrier_ok.
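
The mismatch between the two checks can be sketched with a small model. The types and helpers below are hypothetical simplifications, not the kernel's netdevice API; the sketch assumes that "operationally up" means an operstate of UP or UNKNOWN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of operational states. */
enum oper_state { OPER_UNKNOWN, OPER_DOWN, OPER_DORMANT, OPER_UP };

struct netdev_model {
	bool carrier;			/* physical carrier detected */
	enum oper_state operstate;	/* operational state */
};

static bool carrier_ok(const struct netdev_model *dev)
{
	return dev->carrier;
}

/* Assumed rule: UP or UNKNOWN counts as operationally up. */
static bool oper_up(const struct netdev_model *dev)
{
	return dev->operstate == OPER_UP || dev->operstate == OPER_UNKNOWN;
}

/* Usage: carrier is present but the port is still dormant. */
static void linkup_example(void)
{
	struct netdev_model port = { .carrier = true, .operstate = OPER_DORMANT };

	assert(carrier_ok(&port));	/* carrier-based check: linkup set too early */
	assert(!oper_up(&port));	/* operstate-based check: not yet up */
}
```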
      
      Fixes: f1d22a1e ("team: account for oper state")
      Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • sctp: don't compare hb_timer expire date before starting it · 44b885e6
      Maciej Kwiecien authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit d1f20c03
      category: bugfix
      bugzilla: 10912
      CVE: NA
      
      -------------------------------------------------
      
      hb_timer might not start at all for a particular transport because its
      start is conditional. As a result, a node does not send heartbeats.
      
      Function sctp_transport_reset_hb_timer has two roles:
          - initial start of hb_timer for a given transport,
          - update of the expiry time of hb_timer for a given transport.
      The function is optimized to update the timer's expiry only if it is before
      the newly calculated one, but this comparison is invalid for a timer which
      has not yet started. Such a timer has expire == 0, and if the new expire
      value is (MAX_JIFFIES / 2 + 2) or more then the "time_before" macro will
      fail and the timer will not start, resulting in no heartbeat packets being
      sent by the node.
      
      This was found when an association was initialized within the first 5 minutes
      after system boot, due to the initial jiffies value, which is close to
      MAX_JIFFIES.
      
      Test kernel version: 4.9.154 (ARCH=arm)
      hb_timer.expire = 0;                //initialized, not started timer
      new_expire = MAX_JIFFIES / 2 + 2;   //or more
      time_before(hb_timer.expire, new_expire) == false
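
The failing comparison can be reproduced in userspace. This is a sketch under assumptions: `time_before_ul()` mirrors the wrap-safe comparison idea, `ULONG_MAX` stands in for MAX_JIFFIES, and the `pending` flag with the "start if not pending" rule is a hypothetical model of a fix, not necessarily the code that was merged.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Wrap-safe "a is earlier than b". Relies on two's-complement conversion
 * of the unsigned difference to long, as GCC and Clang implement it.
 */
static bool time_before_ul(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

struct hb_timer_model {
	unsigned long expires;
	bool pending;		/* true once the timer has been started */
};

/* Buggy variant: only the expiry comparison decides whether to (re)arm. */
static bool reset_hb_timer_buggy(struct hb_timer_model *t, unsigned long new_expires)
{
	assert(t != NULL);
	if (!time_before_ul(t->expires, new_expires))
		return false;
	t->expires = new_expires;
	t->pending = true;
	return true;
}

/* Assumed fix: a timer that is not pending is always started. */
static bool reset_hb_timer_fixed(struct hb_timer_model *t, unsigned long new_expires)
{
	assert(t != NULL);
	if (t->pending && !time_before_ul(t->expires, new_expires))
		return false;
	t->expires = new_expires;
	t->pending = true;
	return true;
}
```

With `expires == 0` and a new value of `ULONG_MAX / 2 + 2`, the unsigned difference converts to a positive long, so the buggy variant never arms the timer, while the second variant starts it regardless of the stale expiry.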
      
      Fixes: ba6f5e33 ("sctp: avoid refreshing heartbeat timer too often")
      Reported-by: Marcin Stojek <marcin.stojek@nokia.com>
      Tested-by: Marcin Stojek <marcin.stojek@nokia.com>
      Signed-off-by: Maciej Kwiecien <maciej.kwiecien@nokia.com>
      Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • net: vrf: remove MTU limits for vrf device · 6be48433
      Hangbin Liu authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit ad49bc63
      category: bugfix
      bugzilla: 10915
      CVE: NA
      
      -------------------------------------------------
      
      Similar to commit e94cd811 ("net: remove MTU limits for dummy and
      ifb device"), MTU is irrelevant for the VRF device. Initializing it to 64K
      while limiting it to [68, 1500] may confuse users.
      
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • net/smc: fix smc_poll in SMC_INIT state · 818aa3ac
      Ursula Braun authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit d7cf4a3b
      category: bugfix
      bugzilla: 10930
      CVE: NA
      
      -------------------------------------------------
      
      smc_poll() returns with mask bit EPOLLPRI if the connection urg_state
      is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll() signals
      EPOLLPRI erroneously if called in state SMC_INIT before the connection
      is created, for instance in a non-blocking connect scenario.
      
      This patch switches to non-zero values for the urg states.
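
Why a zero-valued "valid" state misfires on a zero-initialized connection can be sketched as follows. The enum names, `struct conn_model`, `POLLPRI_BIT` and the concrete non-zero values are hypothetical; only the zero versus non-zero distinction comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Old numbering in this sketch: the "valid" state has the value 0. */
enum urg_state_old { OLD_URG_VALID, OLD_URG_NOTYET, OLD_URG_READ };

/* New numbering in this sketch: all states are non-zero. */
enum urg_state_new { NEW_URG_VALID = 1, NEW_URG_NOTYET = 2, NEW_URG_READ = 3 };

#define POLLPRI_BIT 0x2u	/* hypothetical stand-in for EPOLLPRI */

struct conn_model {
	int urg_state;		/* 0 until the connection is set up */
};

/* Report urgent data when urg_state equals the given "valid" value. */
static unsigned int poll_mask(const struct conn_model *conn, int urg_valid)
{
	assert(conn != NULL);
	return conn->urg_state == urg_valid ? POLLPRI_BIT : 0;
}
```

For a connection whose state is still all zeroes, `poll_mask(&conn, OLD_URG_VALID)` reports `POLLPRI_BIT`, while `poll_mask(&conn, NEW_URG_VALID)` reports nothing.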
      
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Fixes: de8474eb ("net/smc: urgent data support")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • mdio_bus: Fix use-after-free on device_register fails · 645488bd
      YueHaibing authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 6ff7b060
      category: bugfix
      bugzilla: 10910
      CVE: NA
      
      -------------------------------------------------
      
      KASAN has found a use-after-free in fixed_mdio_bus_init().
      Commit 0c692d07 ("drivers/net/phy/mdio_bus.c: call
      put_device on device_register() failure") calls put_device()
      when device_register() fails, which gives up the last reference
      to the device and allows mdiobus_release() to be executed,
      kfreeing the bus. However, most drivers call mdiobus_free()
      to free the bus when mdiobus_register() fails, so a
      use-after-free occurs when the bus is accessed again. This patch
      reverts that commit to let mdiobus_free() free the bus.
      
      KASAN report details as below:
      
      BUG: KASAN: use-after-free in mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
      Read of size 4 at addr ffff8881dc824d78 by task syz-executor.0/3524
      
      CPU: 1 PID: 3524 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       mdiobus_free+0x85/0x90 drivers/net/phy/mdio_bus.c:482
       fixed_mdio_bus_init+0x283/0x1000 [fixed_phy]
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       ? 0xffffffffc0e40000
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f6215c19c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
      RBP: 00007f6215c19c70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6215c1a6bc
      R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004
      
      Allocated by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       kmalloc include/linux/slab.h:545 [inline]
       kzalloc include/linux/slab.h:740 [inline]
       mdiobus_alloc_size+0x54/0x1b0 drivers/net/phy/mdio_bus.c:143
       fixed_mdio_bus_init+0x163/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 3524:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kfree+0xe1/0x270 mm/slub.c:3938
       device_release+0x78/0x200 drivers/base/core.c:919
       kobject_cleanup lib/kobject.c:662 [inline]
       kobject_release lib/kobject.c:691 [inline]
       kref_put include/linux/kref.h:67 [inline]
       kobject_put+0x146/0x240 lib/kobject.c:708
       put_device+0x1c/0x30 drivers/base/core.c:2060
       __mdiobus_register+0x483/0x560 drivers/net/phy/mdio_bus.c:382
       fixed_mdio_bus_init+0x26b/0x1000 [fixed_phy]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881dc824c80
       which belongs to the cache kmalloc-2k of size 2048
      The buggy address is located 248 bytes inside of
       2048-byte region [ffff8881dc824c80, ffff8881dc825480)
      The buggy address belongs to the page:
      page:ffffea0007720800 count:1 mapcount:0 mapping:ffff8881f6c02800 index:0x0 compound_mapcount: 0
      flags: 0x2fffc0000010200(slab|head)
      raw: 02fffc0000010200 0000000000000000 0000000500000001 ffff8881f6c02800
      raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881dc824c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8881dc824c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881dc824d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                      ^
       ffff8881dc824d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8881dc824e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 0c692d07 ("drivers/net/phy/mdio_bus.c: call put_device on device_register() failure")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • ipv6: route: purge exception on removal · f5b9b87d
      Paolo Abeni authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit f5b51fe8
      category: bugfix
      bugzilla: 10922
      CVE: NA
      
      -------------------------------------------------
      
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later when the dst will be
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device registration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue by purging all the references held by the
      exception at removal time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • esp: Skip TX bytes accounting when sending from a request socket · d774b790
      Martin Willi authored and 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 09db5124
      category: bugfix
      bugzilla: 10914
      CVE: NA
      
      -------------------------------------------------
      
      On ESP output, sk_wmem_alloc is incremented for the added padding if a
      socket is associated with the skb. When replying with TCP SYNACKs over
      IPsec, the associated sk is only a cast request socket. Increasing
      sk_wmem_alloc on a request socket results in a write at an arbitrary
      struct offset. In the best case, this produces the following WARNING:
      
      WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
      refcount_t: addition on 0; use-after-free.
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
      Hardware name: Marvell Armada 380/385 (Device Tree)
      [...]
      [<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
      [<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
      [<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
      [<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
      [<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
      [<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
      [<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
      [<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
      [<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
      [<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
      [<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
      [<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
      [<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
      [...]
      
      The issue triggers only when not using TCP syncookies, as for syncookies
      no socket is associated.
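
The hazard can be pictured with a minimal userspace model. This is a sketch under stated assumptions, not the kernel code: `struct mini_sock`, `is_full_sock()` and `account_padding()` are hypothetical names, and the guard only mirrors the idea of skipping the accounting for anything that is not a full socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for socket kinds; not the kernel's definitions. */
enum mini_kind { MINI_FULL_SOCK, MINI_REQUEST_SOCK };

struct mini_sock {
    enum mini_kind kind;
    unsigned int wmem_alloc;   /* only meaningful for full sockets */
};

static bool is_full_sock(const struct mini_sock *sk)
{
    return sk->kind == MINI_FULL_SOCK;
}

/*
 * Charge the padding bytes to the socket only if it is a full socket.
 * Returns true when the bytes were accounted.
 */
static bool account_padding(struct mini_sock *sk, unsigned int tailen)
{
    if (sk == NULL || !is_full_sock(sk))
        return false;
    sk->wmem_alloc += tailen;
    return true;
}
```

In this model a request socket is left untouched, which corresponds to skipping the TX bytes accounting described above.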
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      d774b790
    • Michal Soltys's avatar
      bonding: fix PACKET_ORIGDEV regression · 30b9156c
      Michal Soltys authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 3c963a33
      category: bugfix
      bugzilla: 10926
      CVE: NA
      
      -------------------------------------------------
      
      This patch fixes a subtle PACKET_ORIGDEV regression which was a side
      effect of fixes introduced by:
      
      6a9e461f bonding: pass link-local packets to bonding master also.
      
      ... to:
      
      b89f04c6 bonding: deliver link-local packets with skb->dev set to link that packets arrived on
      
      While 6a9e461f restored pre-b89f04c6 presence of link-local
      packets on bonding masters (which is required e.g. by linux bridges
      participating in spanning tree or needed for lab-like setups created
      with group_fwd_mask) it also caused the originating device
      information to be lost due to cloning.
      
      Maciej Żenczykowski proposed another solution that doesn't require
      packet cloning and retains original device information - instead of
      returning RX_HANDLER_PASS for all link-local packets it's now limited
      only to packets from inactive slaves.
      
      At the same time, packets passed to bonding masters retain correct
      information about the originating device and PACKET_ORIGDEV can be used
      to determine it.
      
      This elegantly solves all issues so far:
      
      - link-local packets that were removed from bonding masters
      - LLDP daemons being forced to explicitly bind to slave interfaces
      - PACKET_ORIGDEV having no effect on bond interfaces
      
      Fixes: 6a9e461f (bonding: pass link-local packets to bonding master also.)
      Reported-by: Vincent Bernat <vincent@bernat.ch>
      Signed-off-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      30b9156c
    • Sean Tranchetti's avatar
      af_key: unconditionally clone on broadcast · 7daeb62c
      Sean Tranchetti authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit fc2d5cfd
      category: bugfix
      bugzilla: 10929
      CVE: NA
      
      -------------------------------------------------
      
      Attempting to avoid cloning the skb when broadcasting by inflating
      the refcount with sock_hold/sock_put while under RCU lock is dangerous
      and violates RCU principles. It leads to subtle race conditions when
      attempting to free the SKB, as we may reference sockets that have
      already been freed by the stack.
      
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
      [006b6b6b6b6b6c4b] address between user and kernel address ranges
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      task: fffffff78f65b380 task.stack: ffffff8049a88000
      pc : sock_rfree+0x38/0x6c
      lr : skb_release_head_state+0x6c/0xcc
      Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
      Call trace:
      	sock_rfree+0x38/0x6c
      	skb_release_head_state+0x6c/0xcc
      	skb_release_all+0x1c/0x38
      	__kfree_skb+0x1c/0x30
      	kfree_skb+0xd0/0xf4
      	pfkey_broadcast+0x14c/0x18c
      	pfkey_sendmsg+0x1d8/0x408
      	sock_sendmsg+0x44/0x60
      	___sys_sendmsg+0x1d0/0x2a8
      	__sys_sendmsg+0x64/0xb4
      	SyS_sendmsg+0x34/0x4c
      	el0_svc_naked+0x34/0x38
      Kernel panic - not syncing: Fatal exception
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      7daeb62c
    • Rayagonda Kokatanur's avatar
      mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush timeout issue · 07234a89
      Rayagonda Kokatanur authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit d7bf31a0
      category: bugfix
      bugzilla: 10755
      CVE: NA
      
      -------------------------------------------------
      
      The RING_CONTROL register was not written because of a wrong address,
      so all subsequent ring flushes timed out.
      
      Fixes: a371c10e ("mailbox: bcm-flexrm-mailbox: Fix FlexRM ring flush sequence")
      
      Signed-off-by: Rayagonda Kokatanur <rayagonda.kokatanur@broadcom.com>
      Signed-off-by: Ray Jui <ray.jui@broadcom.com>
      Reviewed-by: Scott Branden <scott.branden@broadcom.com>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      07234a89
    • Andrey Konovalov's avatar
      kasan, slab: remove redundant kasan_slab_alloc hooks · 874e7071
      Andrey Konovalov authored and 谢秀奇's avatar 谢秀奇 committed
      mainline inclusion
      from mainline-v5.0-rc7
      commit 557ea253
      category: bugfix
      bugzilla: 10810
      CVE: NA
      
      -------------------------------------------------
      
      kasan_slab_alloc() calls in kmem_cache_alloc() and kmem_cache_alloc_node()
      are redundant as they are already called via slab_alloc/slab_alloc_node()->
      slab_post_alloc_hook()->kasan_slab_alloc().  Remove them.
      
      Link: http://lkml.kernel.org/r/4ca1655cdcfc4379c49c50f7bf80f81c4ad01485.1550602886.git.andreyknvl@google.com
      
      
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Qian Cai <cai@lca.pw>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Evgeniy Stepanov <eugenis@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      874e7071
    • Jeremy Linton's avatar
      arm64: Provide a command line to disable spectre_v2 mitigation · 66ff1b1d
      Jeremy Linton authored and 谢秀奇's avatar 谢秀奇 committed
      
      euler inclusion
      category: feature
      bugzilla: 11011
      CVE: NA
      
      Patch will be in mainline kernel 5.2
      --------------------------------------------------
      
      There are various reasons, including benchmarking, to disable the
      spectre_v2 mitigation on a machine. Provide a command-line option to do so.
      
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-doc@vger.kernel.org
      [Hanjun: fix conflicts; the 4.19 kernel does not have the PPC
       PPC_FSL_BOOK3E arch spectre-v2 mitigation, so remove PPC_FSL_BOOK3E]
      Conflicts:
      	Documentation/admin-guide/kernel-parameters.txt
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      66ff1b1d
    • Jeremy Linton's avatar
      Documentation: Document arm64 kpti control · 5f550a8e
      Jeremy Linton authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from linux-next
      commit de190555
      category: feature
      bugzilla: 11011
      CVE: NA
      
      -------------------------------------------------
      
      For a while Arm64 has been capable of force enabling
      or disabling the kpti mitigations. Let's make sure the
      documentation reflects that.
      
      Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      5f550a8e
    • YueHaibing's avatar
      xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink · 2d8e8d80
      YueHaibing authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.1
      commit: b805d78d
      category: bugfix
      bugzilla: 10662
      CVE: NA
      
      ---------------------------------------------------------
      
      UBSAN reports this:
      
      UBSAN: Undefined behaviour in net/xfrm/xfrm_policy.c:1289:24
      index 6 is out of range for type 'unsigned int [6]'
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.162-514.55.6.9.x86_64+ #13
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
       0000000000000000 1466cf39b41b23c9 ffff8801f6b07a58 ffffffff81cb35f4
       0000000041b58ab3 ffffffff83230f9c ffffffff81cb34e0 ffff8801f6b07a80
       ffff8801f6b07a20 1466cf39b41b23c9 ffffffff851706e0 ffff8801f6b07ae8
      Call Trace:
       <IRQ>  [<ffffffff81cb35f4>] __dump_stack lib/dump_stack.c:15 [inline]
       <IRQ>  [<ffffffff81cb35f4>] dump_stack+0x114/0x1a0 lib/dump_stack.c:51
       [<ffffffff81d94225>] ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
       [<ffffffff81d954db>] __ubsan_handle_out_of_bounds+0x16e/0x1b2 lib/ubsan.c:382
       [<ffffffff82a25acd>] __xfrm_policy_unlink+0x3dd/0x5b0 net/xfrm/xfrm_policy.c:1289
       [<ffffffff82a2e572>] xfrm_policy_delete+0x52/0xb0 net/xfrm/xfrm_policy.c:1309
       [<ffffffff82a3319b>] xfrm_policy_timer+0x30b/0x590 net/xfrm/xfrm_policy.c:243
       [<ffffffff813d3927>] call_timer_fn+0x237/0x990 kernel/time/timer.c:1144
       [<ffffffff813d8e7e>] __run_timers kernel/time/timer.c:1218 [inline]
       [<ffffffff813d8e7e>] run_timer_softirq+0x6ce/0xb80 kernel/time/timer.c:1401
       [<ffffffff8120d6f9>] __do_softirq+0x299/0xe10 kernel/softirq.c:273
       [<ffffffff8120e676>] invoke_softirq kernel/softirq.c:350 [inline]
       [<ffffffff8120e676>] irq_exit+0x216/0x2c0 kernel/softirq.c:391
       [<ffffffff82c5edab>] exiting_irq arch/x86/include/asm/apic.h:652 [inline]
       [<ffffffff82c5edab>] smp_apic_timer_interrupt+0x8b/0xc0 arch/x86/kernel/apic/apic.c:926
       [<ffffffff82c5c985>] apic_timer_interrupt+0xa5/0xb0 arch/x86/entry/entry_64.S:735
       <EOI>  [<ffffffff81188096>] ? native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:52
       [<ffffffff810834d7>] arch_safe_halt arch/x86/include/asm/paravirt.h:111 [inline]
       [<ffffffff810834d7>] default_idle+0x27/0x430 arch/x86/kernel/process.c:446
       [<ffffffff81085f05>] arch_cpu_idle+0x15/0x20 arch/x86/kernel/process.c:437
       [<ffffffff8132abc3>] default_idle_call+0x53/0x90 kernel/sched/idle.c:92
       [<ffffffff8132b32d>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
       [<ffffffff8132b32d>] cpu_idle_loop kernel/sched/idle.c:251 [inline]
       [<ffffffff8132b32d>] cpu_startup_entry+0x60d/0x9a0 kernel/sched/idle.c:299
       [<ffffffff8113e119>] start_secondary+0x3c9/0x560 arch/x86/kernel/smpboot.c:245
      
      The issue is triggered as follows:
      
      xfrm_add_policy
          -->verify_newpolicy_info  //checks the user-provided index against XFRM_POLICY_MAX
      			      //In my case, the index is 0x6E6BB6, so it passes the check.
          -->xfrm_policy_construct  //copies the user's policy and sets up xfrm_policy_timer
          -->xfrm_policy_insert
      	--> __xfrm_policy_link //uses the original dir, which in my case is 2
      	--> xfrm_gen_index   //generates the policy index, here 0x6E6BB6
      
      Then xfrm_policy_timer fires:
      
      xfrm_policy_timer
         --> xfrm_policy_id2dir  //gets dir from (policy index & 7), which in my case is 6
         --> xfrm_policy_delete
            --> __xfrm_policy_unlink //accesses policy_count[dir], triggering the out-of-range access
      
      Add an xfrm_policy_id2dir check to verify_newpolicy_info to make sure the
      computed dir is valid, which fixes the issue.
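
The arithmetic in the trace can be replayed with a small userspace sketch. The names `POLICY_MAX`, `old_check_passes()` and `index_is_valid()` are hypothetical; the masks (`& 3` for the old check, `& 7` for the direction) follow the description above, and the exact kernel condition is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POLICY_MAX 3                          /* stands in for XFRM_POLICY_MAX */
#define POLICY_COUNT_SLOTS (POLICY_MAX * 2)   /* 'unsigned int [6]' in the report */

/* Direction encoded in the low three bits of the policy index. */
static unsigned int policy_id2dir(uint32_t index)
{
    return index & 7;
}

/* Old-style check: only compares (index & POLICY_MAX) with the direction. */
static bool old_check_passes(uint32_t index, unsigned int dir)
{
    return index == 0 || (index & POLICY_MAX) == dir;
}

/* Stricter check: the direction derived from the index must equal dir. */
static bool index_is_valid(uint32_t index, unsigned int dir)
{
    if (dir >= POLICY_MAX)
        return false;
    return index == 0 || policy_id2dir(index) == dir;
}
```

With index 0x6E6BB6 and dir 2, the old check passes while the derived direction is 6, which is outside a six-slot array; the stricter check rejects that index.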
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: e682adf0 ("xfrm: Try to honor policy index if it's supplied by user")
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      2d8e8d80
    • YueHaibing's avatar
      net-sysfs: Fix mem leak in netdev_register_kobject · cddd2f5d
      YueHaibing authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.1
      commit: 895a5e96
      category: bugfix
      bugzilla: 10987
      CVE: NA
      
      ---------------------------------------------------------
      
      syzkaller reports this:
      BUG: memory leak
      unreferenced object 0xffff88837a71a500 (size 256):
        comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff  ........ .......
        backtrace:
          [<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751
          [<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516
          [<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline]
          [<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883
          [<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline]
          [<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690
          [<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
          [<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline]
          [<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline]
          [<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
          [<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
          [<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<00000000115be9bb>] 0xffffffffffffffff
      
      register_queue_kobjects should call kset_unregister to free
      'dev->queues_kset' in its error path; otherwise the memory is leaked.
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: 1d24eb48 ("xps: Transmit Packet Steering")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      cddd2f5d
    • Xiongfeng Wang's avatar
      timekeeping: Avoid undefined behaviour in 'ktime_get_with_offset()' · 85d6dbdf
      Xiongfeng Wang authored and 谢秀奇's avatar 谢秀奇 committed
      
      euler inclusion
      category: feature
      Bugzilla: 10683
      CVE: N/A
      
      ----------------------------------------
      
      When I ran the Syzkaller test suite, I got the following call trace.
      
      ================================================================================
      UBSAN: Undefined behaviour in kernel/time/timekeeping.c:801:8
      signed integer overflow:
      500152103386 + 9223372036854775807 cannot be represented in type 'long long int'
      CPU: 6 PID: 13904 Comm: syz-executor.0 Not tainted 4.19.25 #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xca/0x13e lib/dump_stack.c:113
       ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
       handle_overflow+0x193/0x1e2 lib/ubsan.c:190
       ktime_get_with_offset+0x26a/0x2d0 kernel/time/timekeeping.c:801
       common_hrtimer_arm+0x14d/0x220 kernel/time/posix-timers.c:817
       common_timer_set+0x337/0x530 kernel/time/posix-timers.c:863
       do_timer_settime+0x198/0x290 kernel/time/posix-timers.c:892
       __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline]
       __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline]
       __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904
       do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462eb9
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f7968072c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000df
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9
      RDX: 00000000200000c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f79680736bc
      R13: 00000000004c54cc R14: 0000000000704278 R15: 00000000ffffffff
      ================================================================================
      
      This happens because the global variable 'offsets' is set to a very large
      but still valid value. Adding 'tk->tkr_mono.base' to 'offsets' overflows.
      
      This patch uses 'ktime_add_safe()' to limit the result to 'KTIME_SEC_MAX'
      when the addition overflows.
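
A saturating addition of this kind can be sketched in userspace as follows. This is not the kernel's `ktime_add_safe()`; `add_ns_saturating()` and `KTIME_MAX_NS` are hypothetical names, and clamping to `INT64_MAX` is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define KTIME_MAX_NS INT64_MAX   /* assumed clamp value for this sketch */

/* Add two non-negative nanosecond values; clamp instead of overflowing. */
static int64_t add_ns_saturating(int64_t lhs, int64_t rhs)
{
    /* Unsigned addition is well defined, unlike signed overflow. */
    uint64_t sum = (uint64_t)lhs + (uint64_t)rhs;

    assert(lhs >= 0 && rhs >= 0);
    if (sum > (uint64_t)KTIME_MAX_NS)
        return KTIME_MAX_NS;
    return (int64_t)sum;
}
```

Fed the two operands from the UBSAN report, 500152103386 and 9223372036854775807, the sketch returns the clamp value instead of invoking undefined behaviour.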
      
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawe.com>
      Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      85d6dbdf
    • Magnus Karlsson's avatar
      xsk: add missing smp_rmb() in xsk_mmap · a4ff09a1
      Magnus Karlsson authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit e6762c8b
      category: bugfix
      bugzilla: 10762
      CVE: NA
      
      -------------------------------------------------
      
      All the setup code in AF_XDP is protected by a mutex with the
      exception of the mmap code that cannot use it. To make sure that a
      process banging on the mmap call at the same time as another process
      is setting up the socket, smp_wmb() calls were added in the umem
      registration code and the queue creation code, so that the published
      structures that xsk_mmap needs would be consistent. However, the
      corresponding smp_rmb() calls were not added to the xsk_mmap
      code. This patch adds these calls.
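
The pairing of a write barrier on the publishing side with a read barrier on the consuming side can be modelled with C11 fences. This is a userspace analogy only: `struct queue`, `publish_queue()` and `read_nentries()` are hypothetical, and the release/acquire fences merely stand in for smp_wmb()/smp_rmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct queue { unsigned int nentries; };   /* hypothetical published structure */

static struct queue *_Atomic published;    /* NULL until a queue is published */

/* Writer: fill in the structure, fence, then publish the pointer. */
static void publish_queue(struct queue *q, unsigned int nentries)
{
    q->nentries = nentries;
    atomic_thread_fence(memory_order_release);   /* models smp_wmb() */
    atomic_store_explicit(&published, q, memory_order_relaxed);
}

/* Reader: load the pointer, fence, and only then read the fields. */
static unsigned int read_nentries(void)
{
    struct queue *q = atomic_load_explicit(&published, memory_order_relaxed);

    if (q == NULL)
        return 0;
    atomic_thread_fence(memory_order_acquire);   /* models smp_rmb() */
    return q->nentries;
}
```

Without the reader-side fence, a reader that observes the pointer would have no ordering guarantee for the fields behind it, which is the gap the patch describes.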
      
      Fixes: 37b07693 ("xsk: add missing write- and data-dependency barrier")
      Fixes: c0c77d8f ("xsk: add user memory registration support sockopt")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      a4ff09a1
    • Greg Kroah-Hartman's avatar
      rpc: properly check debugfs dentry before using it · 5d946333
      Greg Kroah-Hartman authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit ad6fef77
      category: bugfix
      bugzilla: 10735
      CVE: NA
      
      -------------------------------------------------
      
      debugfs can now report an error code if something went wrong instead of
      just NULL.  So if the return value is to be used as a "real" dentry, it
      needs to be checked if it is an error before dereferencing it.
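
The error-pointer convention behind this can be imitated in userspace. The helpers below are lower-case look-alikes written for this sketch, not the kernel's ERR_PTR()/IS_ERR() macros, and `dentry_name_or_null()` is a hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, and detect such pointers. */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static bool is_err_or_null(const void *p) { return p == NULL || is_err(p); }

struct dentry { const char *name; };   /* simplified stand-in */

/* Dereference the dentry only when it is neither NULL nor an encoded error. */
static const char *dentry_name_or_null(const struct dentry *d)
{
    if (is_err_or_null(d))
        return NULL;
    return d->name;
}
```

A caller that only tests for NULL would dereference the encoded error value, which is the failure mode described above.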
      
      This is now happening because of ff9fb72b ("debugfs: return error
      values, not NULL"), but why debugfs files are not being created properly
      is an older issue, probably one that has always been there and should
      probably be looked at...
      
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Reported-by: David Howells <dhowells@redhat.com>
      Tested-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      5d946333
    • Willem de Bruijn's avatar
      bpf: only adjust gso_size on bytestream protocols · 7609c614
      Willem de Bruijn authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit b90efd22
      category: bugfix
      bugzilla: 10773
      CVE: NA
      
      -------------------------------------------------
      
      bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
      For GSO packets they adjust gso_size to maintain the same MTU.
      
      The gso size can only be safely adjusted on bytestream protocols.
      Commit d02f51cb ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
      to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
      
      Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
      gso_size unit per datagram. Also exclude these.
      
      Move from a blacklist to a whitelist check to future proof against
      additional such new GSO types, e.g., for fraglist based GRO.
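
The difference between the two styles of check can be sketched as below. The flag values are illustrative and do not match the kernel's SKB_GSO_* bits, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's SKB_GSO_* bits differ. */
enum {
    GSO_TCPV4  = 1u << 0,
    GSO_TCPV6  = 1u << 1,
    GSO_SCTP   = 1u << 2,
    GSO_UDP_L4 = 1u << 3,
};

/* Blacklist style: every new non-bytestream type must be added by hand. */
static bool may_adjust_blacklist(unsigned int gso_type)
{
    return !(gso_type & GSO_SCTP);
}

/* Whitelist style: only bytestream (TCP) GSO types qualify. */
static bool may_adjust_whitelist(unsigned int gso_type)
{
    return (gso_type & (GSO_TCPV4 | GSO_TCPV6)) != 0;
}
```

In this model the blacklist still lets the UDP_L4 type through, while the whitelist rejects it and any GSO type added later.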
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      7609c614
    • Nicolas Morey-Chaisemartin's avatar
      xprtrdma: Make sure Send CQ is allocated on an existing compvec · e4288d0f
      Nicolas Morey-Chaisemartin authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit a4cb5bdb
      category: bugfix
      bugzilla: 10737
      CVE: NA
      
      -------------------------------------------------
      
      Make sure the device has at least 2 completion vectors
      before allocating the Send CQ on compvec#1.
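
As a rough sketch of the selection logic, assuming the vector count is available as a plain integer (`send_cq_comp_vector()` is a hypothetical helper, not a function from the driver):

```c
#include <assert.h>

/* Use completion vector 1 for the Send CQ only if the device provides it. */
static int send_cq_comp_vector(int num_comp_vectors)
{
    return num_comp_vectors > 1 ? 1 : 0;
}
```

A device with a single completion vector falls back to vector 0 in this sketch.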
      
      Fixes: a4699f56 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
      Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      e4288d0f
    • ZhangXiaoxu's avatar
      inotify: Fix fsnotify_mark refcount leak in inotify_update_existing_watch() · 23717b34
      ZhangXiaoxu authored and 谢秀奇's avatar 谢秀奇 committed
      euler inclusion
      category: bugfix
      bugzilla: 10982
      CVE: NA
      --------------------------------------------------
      
      Commit 4d97f7d5 ("inotify: Add flag IN_MASK_CREATE for
      inotify_add_watch()") forgot to call fsnotify_put_mark() after
      fsnotify_find_mark() when IN_MASK_CREATE is set.
      
      Fixes: 4d97f7d5 ("inotify: Add flag IN_MASK_CREATE for inotify_add_watch()")
      Link: https://www.spinics.net/lists/linux-fsdevel/msg140540.html
      
      
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      23717b34
    • Paolo Abeni's avatar
      ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() · c6f71dda
      Paolo Abeni authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit bf1dc8ba
      category: bugfix
      bugzilla: 10918
      CVE: NA
      
      -------------------------------------------------
      
      We need an RCU critical section around the rt6_info->from dereference,
      and proper annotation.
      
      Fixes: 4ed591c8 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      c6f71dda
    • Paolo Abeni's avatar
      ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() · f5e84cdf
      Paolo Abeni authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit 193f3685
      category: bugfix
      bugzilla: 10918
      CVE: NA
      
      -------------------------------------------------
      
      We must access rt6_info->from under RCU read lock: move the
      dereference under such lock, with proper annotation.
      
      v1 -> v2:
       - avoid using multiple, racy, fetch operations for rt->from
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      f5e84cdf
    • Davidlohr Bueso's avatar
      xsk: share the mmap_sem for page pinning · 3220df4c
      Davidlohr Bueso authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-v5.0-rc7
      commit e451eb51
      category: bugfix
      bugzilla: 10768
      CVE: NA
      
      -------------------------------------------------
      
      Holding mmap_sem exclusively for a gup() is overkill. Let's
      share the lock and replace the gup call with gup_longterm(), as
      it is better suited for the lifetime of the pinning.
      
      Fixes: c0c77d8f ("xsk: add user memory registration support sockopt")
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Bjorn Topel <bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      CC: netdev@vger.kernel.org
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Lin Miaohe <linmiaohe@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      3220df4c
    • Alban Crequy's avatar
      bpf, lpm: fix lookup bug in map_delete_elem · 5f549338
      Alban Crequy authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 7c0cdf0b
      category: bugfix
      bugzilla: 10936
      CVE: NA
      
      -------------------------------------------------
      
      trie_delete_elem() was deleting an entry even though it was not matching
      if the prefixlen was correct. This patch adds a check on matchlen.
      
      Reproducer:
      
      $ sudo bpftool map create /sys/fs/bpf/mylpm type lpm_trie key 8 value 1 entries 128 name mylpm flags 1
      $ sudo bpftool map update pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 aa bb cc dd value hex 01
      $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
      key: 10 00 00 00 aa bb cc dd  value: 01
      Found 1 element
      $ sudo bpftool map delete pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 ff ff ff ff
      $ echo $?
      0
      $ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
      Found 0 elements
      
      A similar reproducer is added in the selftests.
      
      Without the patch:
      
      $ sudo ./tools/testing/selftests/bpf/test_lpm_map
      test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
      Aborted
      
      With the patch: test_lpm_map runs without errors.
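
The missing condition can be illustrated with a simplified model of the delete decision. This is a sketch only: `common_prefix_bits()` and `delete_matches()` are hypothetical helpers that work on the data bytes of a key, not on the kernel's trie nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading bits shared by a and b, capped at max_bits. */
static size_t common_prefix_bits(const uint8_t *a, const uint8_t *b, size_t max_bits)
{
    size_t bits = 0;

    while (bits < max_bits) {
        uint8_t mask = (uint8_t)(0x80u >> (bits % 8));

        if ((a[bits / 8] ^ b[bits / 8]) & mask)
            break;
        bits++;
    }
    return bits;
}

/* Delete only if the prefix lengths agree AND every prefix bit matches. */
static bool delete_matches(const uint8_t *node, size_t node_len,
                           const uint8_t *key, size_t key_len)
{
    if (node_len != key_len)
        return false;
    return common_prefix_bits(node, key, key_len) == key_len;
}
```

With the reproducer's values (prefix length 16, data aa bb cc dd versus ff ff ff ff), only one leading bit matches, so the delete is refused in this model even though the prefix lengths are equal.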
      
      Fixes: e454cf59 ("bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE")
      Cc: Craig Gallek <kraig@google.com>
      Signed-off-by: Alban Crequy <alban@kinvolk.io>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      5f549338
    • Eric Dumazet's avatar
      bpf, lpm: make longest_prefix_match() faster · 7e35327e
      Eric Dumazet authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 7c0cdf0b
      category: bugfix
      bugzilla: 10936
      CVE: NA
      
      -------------------------------------------------
      
      At LPC 2018 in Vancouver, Vlad Dumitrescu mentioned that longest_prefix_match()
      has a high cost [1].
      
      One reason for that cost is a loop handling one byte at a time.
      
      We can handle more bytes at a time, if enough attention is paid
      to endianness.
      
      I was able to remove ~55 % of longest_prefix_match() cpu costs.
      
      [1] https://linuxplumbersconf.org/event/2/contributions/88/attachments/76/87/lpc-bpf-2018-shaping.pdf
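
The idea of comparing several bytes per step can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: load_be32(), leading_zeros32(), match_bits_bytewise() and match_bits_wordwise() are hypothetical user-space helpers. The key point is that the bytes are combined in big-endian order, so that the first differing bit in memory order is also the most significant set bit of the XOR.

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian value: the first byte is most significant. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Number of leading zero bits; the caller guarantees v != 0. */
static uint32_t leading_zeros32(uint32_t v)
{
    uint32_t n = 0;

    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Reference version: one byte per iteration. */
static uint32_t match_bits_bytewise(const uint8_t *a, const uint8_t *b,
                                    size_t len, uint32_t limit)
{
    uint32_t bits = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t diff = a[i] ^ b[i];

        if (diff) {
            bits += leading_zeros32((uint32_t)diff << 24);
            break;
        }
        bits += 8;
    }
    return bits < limit ? bits : limit;
}

/* Word version: 4 bytes per iteration, then the tail byte by byte. */
static uint32_t match_bits_wordwise(const uint8_t *a, const uint8_t *b,
                                    size_t len, uint32_t limit)
{
    uint32_t bits = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        uint32_t diff = load_be32(a + i) ^ load_be32(b + i);

        if (diff) {
            bits += leading_zeros32(diff);
            return bits < limit ? bits : limit;
        }
        bits += 32;
        if (bits >= limit)
            return limit;
    }
    for (; i < len; i++) {
        uint8_t diff = a[i] ^ b[i];

        if (diff) {
            bits += leading_zeros32((uint32_t)diff << 24);
            break;
        }
        bits += 8;
    }
    return bits < limit ? bits : limit;
}
```

Both functions are meant to return the same result for any input; the word version simply reaches it in fewer iterations. On a little-endian machine a plain native 32-bit load would place the first byte in the least significant position, which is why the byte order has to be handled explicitly.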
      
      
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vlad Dumitrescu <vladum@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      7e35327e
    • Eugene Loh's avatar
      kallsyms: Handle too long symbols in kallsyms.c · eb6b1b22
      Eugene Loh authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 6db2983c
      category: bugfix
      bugzilla: 10868
      CVE: NA
      
      -------------------------------------------------
      
      When checking for symbols with excessively long names,
      account for the terminating null character.
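
The off-by-one can be illustrated with a small sketch. NAME_BUF_LEN and both function names are hypothetical and only stand in for the real limit and check in kallsyms.c, which are not shown here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical buffer size, counting the terminating null character. */
#define NAME_BUF_LEN 8

/* Off by one: accepts a name that fills the buffer and leaves no room
 * for the terminator. */
static bool name_too_long_off_by_one(const char *name)
{
    return strlen(name) > NAME_BUF_LEN;
}

/* Accounts for the terminator: at most NAME_BUF_LEN - 1 characters fit. */
static bool name_too_long(const char *name)
{
    return strlen(name) >= NAME_BUF_LEN;
}
```

A name of exactly NAME_BUF_LEN characters is the only case where the two checks disagree, and it is the one that would overflow the buffer by a single byte.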
      
      Fixes: f3462aa9 ("Kbuild: Handle longer symbols in kallsyms.c")
      Signed-off-by: default avatarEugene Loh <eugene.loh@oracle.com>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarCheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      eb6b1b22
    • Martin Wilck's avatar
      scsi: core: reset host byte in DID_NEXUS_FAILURE case · 38f8e68d
      Martin Wilck authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 4a067cf8
      category: bugfix
      bugzilla: 10864
      CVE: NA
      ---------------------------
      
      Up to 4.12, __scsi_error_from_host_byte() would reset the host byte to
      DID_OK for various cases including DID_NEXUS_FAILURE.  Commit
      2a842aca ("block: introduce new block status code type") replaced this
      function with scsi_result_to_blk_status() and removed the host-byte
      resetting code for the DID_NEXUS_FAILURE case.  As the line
      set_host_byte(cmd, DID_OK) was preserved for the other cases, I suppose
      this was an editing mistake.
      
      The fact that the host byte remains set after 4.13 is causing problems with
      the sg_persist tool, which now returns success rather than exit status 24
      when a RESERVATION CONFLICT error is encountered.
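
As a rough illustration of what resetting the host byte means, the sketch below models the SCSI result word in user space. The helper names are hypothetical; the bit layout (host byte in bits 16 to 23, SCSI status in the low byte) and the constant values are assumed to match the kernel's SCSI headers.

```c
#include <stdint.h>

/* Values assumed to match the kernel's SCSI definitions. */
#define DID_OK                        0x00
#define DID_NEXUS_FAILURE             0x11
#define SAM_STAT_RESERVATION_CONFLICT 0x18   /* decimal 24 */

/* Extract the host byte (bits 16-23) from a result word. */
static uint32_t host_byte(uint32_t result)
{
    return (result >> 16) & 0xff;
}

/* Return a copy of result with the host byte replaced. */
static uint32_t with_host_byte(uint32_t result, uint32_t host)
{
    return (result & 0xff00ffffu) | (host << 16);
}

/* Extract the SCSI status from the low byte. */
static uint32_t scsi_status(uint32_t result)
{
    return result & 0xff;
}
```

Resetting the host byte to DID_OK leaves only the SCSI status in the result word, so a consumer that inspects the whole word sees the RESERVATION CONFLICT status (0x18, decimal 24) rather than a host-level failure.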
      
      Fixes: 2a842aca ("block: introduce new block status code type")
      Signed-off-by: Martin Wilck <mwilck@suse.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      38f8e68d
    • Anoob Soman's avatar
      scsi: libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task · 6d70a601
      Anoob Soman authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0-rc8
      commit 79edd00d
      category: bugfix
      bugzilla: 10862
      CVE: NA
      ---------------------------
      
      When a target sends Check Condition while the initiator is busy
      transmitting re-queued data, iscsi_complete_task() can race with
      iscsi_xmit_task(), eventually crashing with the following kernel
      backtrace.
      
      [3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
      [3326150.987549] ALERT: IP: [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
      [3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0
      [3326150.987582] WARN: Oops: 0002 [#1] SMP
      [3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin
      [3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1
      [3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016
      [3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi]
      [3326150.987799] WARN: task: ffff8801d50f3800 ti: ffff8801f5458000 task.ti: ffff8801f5458000
      [3326150.987810] WARN: RIP: e030:[<ffffffffa05ce70d>] [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
      [3326150.987825] WARN: RSP: e02b:ffff8801f545bdb0 EFLAGS: 00010246
      [3326150.987831] WARN: RAX: 00000000ffffffc3 RBX: ffff880282d2ab20 RCX: ffff88026b6ac480
      [3326150.987842] WARN: RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: ffff880282d2ab20
      [3326150.987852] WARN: RBP: ffff8801f545bdc8 R08: 0000000000000000 R09: 0000000000000008
      [3326150.987862] WARN: R10: 0000000000000000 R11: 000000000000fe88 R12: 0000000000000000
      [3326150.987872] WARN: R13: ffff880282d2abe8 R14: ffff880282d2abd8 R15: ffff880282d2ac08
      [3326150.987890] WARN: FS: 00007f5a866b4840(0000) GS:ffff88028a640000(0000) knlGS:0000000000000000
      [3326150.987900] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      [3326150.987907] WARN: CR2: 0000000000000078 CR3: 0000000070244000 CR4: 0000000000042660
      [3326150.987918] WARN: Stack:
      [3326150.987924] WARN: ffff880282d2ad58 ffff880282d2ab20 ffff880282d2abe8 ffff8801f545be18
      [3326150.987938] WARN: ffffffffa05cea90 ffff880282d2abf8 ffff88026b59cc80 ffff88026b59cc00
      [3326150.987951] WARN: ffff88022acf32c0 ffff880289491800 ffff880255a80800 0000000000000400
      [3326150.987964] WARN: Call Trace:
      [3326150.987975] WARN: [<ffffffffa05cea90>] iscsi_xmitworker+0x2f0/0x360 [libiscsi]
      [3326150.987988] WARN: [<ffffffff8108862c>] process_one_work+0x1fc/0x3b0
      [3326150.987997] WARN: [<ffffffff81088f95>] worker_thread+0x2a5/0x470
      [3326150.988006] WARN: [<ffffffff8159cad8>] ? __schedule+0x648/0x870
      [3326150.988015] WARN: [<ffffffff81088cf0>] ? rescuer_thread+0x300/0x300
      [3326150.988023] WARN: [<ffffffff8108ddf5>] kthread+0xd5/0xe0
      [3326150.988031] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110
      [3326150.988040] WARN: [<ffffffff815a0bcf>] ret_from_fork+0x3f/0x70
      [3326150.988048] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110
      [3326150.988127] ALERT: RIP [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
      [3326150.988138] WARN: RSP <ffff8801f545bdb0>
      [3326150.988144] WARN: CR2: 0000000000000078
      [3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---
      
      Commit 6f8830f5 ("scsi: libiscsi: add lock around task lists to fix
      list corruption regression") introduced "taskqueuelock" to fix list
      corruption during the race, but this wasn't enough.
      
      Resetting conn->task to NULL could race with iscsi_xmit_task():
      iscsi_complete_task()
      {
          ....
          if (conn->task == task)
              conn->task = NULL;
      }
      
      conn->task in iscsi_xmit_task() could then be NULL, and so would the local
      task pointer. __iscsi_get_task(task) would crash with a NULL pointer
      dereference when trying to access the refcount.
      
      iscsi_xmit_task()
      {
          struct iscsi_task *task = conn->task;
      
          __iscsi_get_task(task);
      }
      
      This commit takes conn->session->back_lock in iscsi_xmit_task() so that,
      if iscsi_complete_task() wins the race, iscsi_xmit_task() waits for it to
      finish.  If iscsi_xmit_task() wins the race, it increments task->refcount
      (__iscsi_get_task), ensuring that iscsi_complete_task() will not call
      iscsi_free_task().
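
The locking scheme described above can be sketched in user space. This is a minimal model under stated assumptions, not the libiscsi code: struct task, struct conn, complete_task() and xmit_get_task() are hypothetical, a pthread mutex stands in for conn->session->back_lock, and a plain integer stands in for the task refcount.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the iSCSI task and connection. */
struct task {
    int refcount;
};

struct conn {
    pthread_mutex_t back_lock;  /* models conn->session->back_lock */
    struct task *task;          /* models conn->task */
};

/* Completion path: clear conn->task if it points at the finished task. */
static void complete_task(struct conn *c, struct task *t)
{
    pthread_mutex_lock(&c->back_lock);
    if (c->task == t)
        c->task = NULL;
    pthread_mutex_unlock(&c->back_lock);
}

/* Transmit path: read conn->task and take a reference under the same
 * lock, so the pointer cannot be cleared between the read and the get. */
static struct task *xmit_get_task(struct conn *c)
{
    struct task *t;

    pthread_mutex_lock(&c->back_lock);
    t = c->task;
    if (t)
        t->refcount++;
    pthread_mutex_unlock(&c->back_lock);

    return t;   /* NULL means the completion path won the race */
}
```

The caller of xmit_get_task() has to handle a NULL return instead of dereferencing the task, which is the case that crashed in the backtrace above.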
      
      Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Lee Duncan <lduncan@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      6d70a601
    • Mao Wenan's avatar
      net: sit: fix memory leak in sit_init_net() · 8f137c70
      Mao Wenan authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 07f12b26
      category: bugfix
      bugzilla: 10988
      CVE: NA
      
      -------------------------------------------------
      
      If register_netdev() fails to register sitn->fb_tunnel_dev, sit_init_net()
      jumps to err_reg_dev without freeing the netdev (sitn->fb_tunnel_dev).
      
      BUG: memory leak
      unreferenced object 0xffff888378daad00 (size 512):
        comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s)
        hex dump (first 32 bytes):
          00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      backtrace:
          [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline]
          [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline]
          [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline]
          [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970
          [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848
          [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129
          [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314
          [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437
          [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107
          [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165
          [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919
          [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline]
          [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224
          [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
          [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<0000000039acff8a>] 0xffffffffffffffff
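
The error-path pattern involved can be sketched as follows. Everything here is a hypothetical user-space model: fake_alloc(), fake_register(), fake_free() and init_net() only stand in for the allocate, register and free steps, and live_allocs is a counter added to make a leak visible.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Outstanding allocations, tracked only to make a leak observable. */
static int live_allocs;

struct fake_netdev {
    int id;
};

static struct fake_netdev *fake_alloc(void)
{
    struct fake_netdev *dev = malloc(sizeof(*dev));

    if (dev)
        live_allocs++;
    return dev;
}

static void fake_free(struct fake_netdev *dev)
{
    if (dev) {
        free(dev);
        live_allocs--;
    }
}

/* Registration step that can be forced to fail. */
static int fake_register(struct fake_netdev *dev, bool fail)
{
    (void)dev;
    return fail ? -1 : 0;
}

/* Allocate and register a device; undo the allocation if registration
 * fails, so the error label releases what was acquired before it. */
static int init_net(struct fake_netdev **out, bool fail_register)
{
    struct fake_netdev *dev = fake_alloc();
    int err;

    if (!dev)
        return -1;

    err = fake_register(dev, fail_register);
    if (err)
        goto err_reg_dev;

    *out = dev;
    return 0;

err_reg_dev:
    fake_free(dev);     /* the step a leaking error path would skip */
    *out = NULL;
    return err;
}
```

Calling init_net() with fail_register set to true leaves live_allocs at zero; dropping the fake_free() call under err_reg_dev would leave one allocation outstanding, which is the kind of leak the report above shows.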
      
      Fixes: 4ece9009 ("sit: fix sit0 percpu double allocations")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      8f137c70
    • Lorenzo Bianconi's avatar
      net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap · 5faad14c
      Lorenzo Bianconi authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 103d0244
      category: bugfix
      bugzilla: 10916
      CVE: NA
      
      -------------------------------------------------
      
      Report erspan version field to userspace in ip6gre_fill_info just for
      erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for
      erspan version 1.
      The issue can be triggered with the following reproducer:
      
      $ ip link add name gre6 type ip6gre local 2001::1 remote 2002::2
      $ ip link set gre6 up
      $ ip -d link sh gre6
      14: gre6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0
          ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64
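
The reporting rule can be modelled with a small sketch. The types and the function are hypothetical, and treating a non-zero erspan_ver as the marker for an ERSPAN tunnel is an assumption made for this illustration; the real fill_info code emits netlink attributes rather than booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunnel parameters; erspan_ver 0 means "not ERSPAN" here. */
struct tunnel_info {
    uint8_t erspan_ver;
    uint32_t erspan_index;
};

/* Which ERSPAN fields would be reported to userspace. */
struct report {
    bool has_erspan_ver;
    bool has_erspan_index;
};

static struct report fill_info(const struct tunnel_info *t)
{
    struct report r = { false, false };

    if (t->erspan_ver) {
        /* Only ERSPAN tunnels report a version. */
        r.has_erspan_ver = true;
        /* The index field exists only in version 1. */
        if (t->erspan_ver == 1)
            r.has_erspan_index = true;
    }
    return r;
}
```

For a plain ip6gre tunnel (erspan_ver 0 in this model) neither field is reported, so the erspan_index 0 and erspan_ver 0 entries seen in the output above would no longer appear.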
      
      Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      5faad14c
    • Lorenzo Bianconi's avatar
      net: ip_gre: do not report erspan_ver for gre or gretap · cb810d15
      Lorenzo Bianconi authored and 谢秀奇's avatar 谢秀奇 committed
      
      mainline inclusion
      from mainline-5.0
      commit 2bdf700e
      category: bugfix
      bugzilla: 10920
      CVE: NA
      
      -------------------------------------------------
      
      Report erspan version field to userspace in ipgre_fill_info just for
      erspan tunnels. The issue can be triggered with the following reproducer:
      
      $ ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1
      $ ip link set dev gre1 up
      $ ip -d link sh gre1
      13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0
          gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1
      
      Fixes: f551c91d ("net: erspan: introduce erspan v2 for ip_gre")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Mao Wenan <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      cb810d15