  1. Oct 11, 2022
  2. Sep 07, 2022
  3. Aug 08, 2022
  4. Jul 08, 2022
  5. Jun 14, 2022
    • sctp: use call_rcu to free endpoint · 7e4c999a
      Xin Long authored
      stable inclusion
      from stable-v4.19.224
      commit af6e6e58f7ebf86b4e7201694b1e4f3a62cbc3ec
      category: bugfix
      bugzilla: 186701, https://gitee.com/src-openeuler/kernel/issues/I5CAM2
      
      
      CVE: CVE-2022-20154
      
      --------------------------------
      
      [ Upstream commit 5ec7d18d1813a5bead0b495045606c93873aecbb ]
      
      This patch is to delay the endpoint free by calling call_rcu() to fix
      another use-after-free issue in sctp_sock_dump():
      
        BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
        Call Trace:
          __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
          lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
          __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
          _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
          spin_lock_bh include/linux/spinlock.h:334 [inline]
          __lock_sock+0x203/0x350 net/core/sock.c:2253
          lock_sock_nested+0xfe/0x120 net/core/sock.c:2774
          lock_sock include/net/sock.h:1492 [inline]
          sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324
          sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091
          sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527
          __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049
          inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065
          netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
          __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
          netlink_dump_start include/linux/netlink.h:216 [inline]
          inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170
          __sock_diag_cmd net/core/sock_diag.c:232 [inline]
          sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
          netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
          sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
      
      This issue occurs when asoc is peeled off and the old sk is freed after
      getting it by asoc->base.sk and before calling lock_sock(sk).
      
      To prevent the sk free, as a holder of the sk, ep should be alive when
      calling lock_sock(). This patch uses call_rcu() and moves sock_put and
      ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to
      hold the ep under rcu_read_lock in sctp_transport_traverse_process().
      
      If sctp_endpoint_hold() returns true, this ep is still alive, we now
      hold it, and we can continue to dump it; if it returns false, this ep
      is dead and can be freed after rcu_read_unlock, so we should skip it.
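
The hold-or-skip rule above can be sketched in userspace C with C11 atomics. This is only an illustration under stated assumptions: `struct ep_sketch`, `ep_hold()` and `ep_put()` are hypothetical names, not the kernel's `struct sctp_endpoint` or its helpers. The sketch also does not model RCU, which in the kernel is what keeps the memory valid while the count is being inspected.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an endpoint; not the kernel structure. */
struct ep_sketch {
	atomic_int refcnt;	/* 0 means the endpoint is already dead */
};

/* Take a reference only if the count is still non-zero.
 * Returns true when the caller now holds a reference. */
static bool ep_hold(struct ep_sketch *ep)
{
	int old = atomic_load(&ep->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&ep->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool ep_put(struct ep_sketch *ep)
{
	return atomic_fetch_sub(&ep->refcnt, 1) == 1;
}
```

A dumper following this pattern would call `ep_hold()` inside its read-side critical section and skip the entry whenever it returns false.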
      
      In sctp_sock_dump(), after locking the sk, if this ep is different from
      tsp->asoc->ep, it means that during this dump the asoc was peeled off
      before lock_sock() was called, and the sk should be skipped; if this ep
      is the same as tsp->asoc->ep, it means no peeloff happened on this asoc,
      and due to lock_sock, no peeloff will happen until release_sock.
      
      Note that delaying endpoint free won't delay the port release, as the
      port release happens in sctp_endpoint_destroy() before calling call_rcu().
      Also, freeing endpoint by call_rcu() makes it safe to access the sk by
      asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv().
      
      Thanks to Jones for bringing this issue up.
      
      v1->v2:
        - improve the changelog.
        - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed.
      
      Reported-by: <syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com>
      Reported-by: Lee Jones <lee.jones@linaro.org>
      Fixes: d25adbeb ("sctp: fix an use-after-free issue in sctp_sock_dump")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  6. Jun 02, 2022
    • tcp: make sure treq->af_specific is initialized · 5b15aac4
      Eric Dumazet authored
      stable inclusion
      from stable-4.19.242
      commit 6c2176f5ad48095aa1e2608b51bada5bebc568c1
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5A6BA
      
      
      CVE: NA
      
      --------------------------------
      
      commit ba5a4fdd63ae0c575707030db0b634b160baddd7 upstream.
      
      syzbot complained about a recent change in TCP stack,
      hitting a NULL pointer [1]
      
      tcp request sockets have an af_specific pointer, which
      was used before the blamed change only for SYNACK generation
      in non SYNCOOKIE mode.
      
      TCP request sockets momentarily created when the third packet
      comes from the client in SYNCOOKIE mode were not using
      treq->af_specific.
      
      Make sure this field is populated, in the same way normal
      TCP request sockets do in tcp_conn_request().
      
      [1]
      TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
      general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 1 PID: 3695 Comm: syz-executor864 Not tainted 5.18.0-rc3-syzkaller-00224-g5fd1fe4807f9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_create_openreq_child+0xe16/0x16b0 net/ipv4/tcp_minisocks.c:534
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 e5 07 00 00 4c 8b b3 28 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7e 08 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 c9 07 00 00 48 8b 3c 24 48 89 de 41 ff 56 08 48
      RSP: 0018:ffffc90000de0588 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff888076490330 RCX: 0000000000000100
      RDX: 0000000000000001 RSI: ffffffff87d67ff0 RDI: 0000000000000008
      RBP: ffff88806ee1c7f8 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff87d67f00 R11: 0000000000000000 R12: ffff88806ee1bfc0
      R13: ffff88801b0e0368 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007f517fe58700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffcead76960 CR3: 000000006f97b000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_v6_syn_recv_sock+0x199/0x23b0 net/ipv6/tcp_ipv6.c:1267
       tcp_get_cookie_sock+0xc9/0x850 net/ipv4/syncookies.c:207
       cookie_v6_check+0x15c3/0x2340 net/ipv6/syncookies.c:258
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1131 [inline]
       tcp_v6_do_rcv+0x1148/0x13b0 net/ipv6/tcp_ipv6.c:1486
       tcp_v6_rcv+0x3305/0x3840 net/ipv6/tcp_ipv6.c:1725
       ip6_protocol_deliver_rcu+0x2e9/0x1900 net/ipv6/ip6_input.c:422
       ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:464
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:473
       dst_input include/net/dst.h:461 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ipv6_rcv+0x27f/0x3b0 net/ipv6/ip6_input.c:297
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       process_backlog+0x3a0/0x7c0 net/core/dev.c:5847
       __napi_poll+0xb3/0x6e0 net/core/dev.c:6413
       napi_poll net/core/dev.c:6480 [inline]
       net_rx_action+0x8ec/0xc60 net/core/dev.c:6567
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
      
      Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [fruggeri: Account for backport conflicts from 35b2c321 and 6fc8c827]
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT · 6cba1671
      Eric Dumazet authored
      stable inclusion
      from stable-4.19.242
      commit cc639aa3c2f5ec7189a2917af49559006f678c62
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5A6BA
      
      
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 4bfe744ff1644fbc0a991a2677dc874475dd6776 ]
      
      I had this bug sitting for too long in my pile; it is time to fix it.
      
      Thanks to Doug Porter for reminding me of it!
      
      We had various attempts in the past, including commit
      0cbe6a8f ("tcp: remove SOCK_QUEUE_SHRUNK"),
      but the issue is that TCP stack currently only generates
      EPOLLOUT from input path, when tp->snd_una has advanced
      and skb(s) cleaned from rtx queue.
      
      If a flow has a big RTT, and/or receives SACKs, it is possible
      that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
      and no more data can be sent until tp->snd_una finally advances.
      
      What is needed is to also check if EPOLLOUT needs to be generated
      whenever tp->snd_nxt is advanced, from output path.
      
      This bug triggers more often after an idle period, as
      we do not receive ACK for at least one RTT. tcp_notsent_lowat
      could be a fraction of what CWND and pacing rate would allow to
      send during this RTT.
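
The condition discussed above can be illustrated with a small model. This is a sketch under assumptions rather than the kernel's code: `struct tx_state` and the three helpers are hypothetical, and the only quantities taken from the text are the unsent backlog (`write_seq - snd_nxt`) and the `tcp_notsent_lowat` threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of TCP sender state, for illustration only. */
struct tx_state {
	uint32_t write_seq;	/* next byte the application will queue */
	uint32_t snd_nxt;	/* next byte to put on the wire */
	uint32_t notsent_lowat;	/* wakeup threshold in bytes */
};

/* Bytes queued but not yet sent; unsigned subtraction copes with
 * sequence-number wraparound. */
static uint32_t notsent_bytes(const struct tx_state *tp)
{
	return tp->write_seq - tp->snd_nxt;
}

/* True when the writer should be woken (EPOLLOUT). */
static bool should_wake_writer(const struct tx_state *tp)
{
	return notsent_bytes(tp) < tp->notsent_lowat;
}

/* Output path: advancing snd_nxt shrinks the unsent backlog, so the
 * wakeup condition is re-evaluated here instead of waiting for
 * snd_una to move on the input path. */
static bool advance_snd_nxt(struct tx_state *tp, uint32_t sent)
{
	tp->snd_nxt += sent;
	return should_wake_writer(tp);
}
```

With a 500000-byte threshold and 1000000 bytes queued, sending 400000 bytes leaves 600000 unsent (no wakeup), and sending 200000 more leaves 400000 (wakeup), without any ACK having arrived.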
      
      In a followup patch, I will remove the bogus call
      to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
      from tcp_check_space(). The fact that we have decided to generate
      an EPOLLOUT does not mean the application has immediately
      refilled the transmit queue. This optimistic call
      might have been the reason the bug seemed not too serious.
      
      Tested:
      
      200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
      
      $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
      $ cat bench_rr.sh
      SUM=0
      for i in {1..10}
      do
       V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
       echo $V
       SUM=$(($SUM + $V))
      done
      echo SUM=$SUM
      
      Before patch:
      $ bench_rr.sh
      130000000
      80000000
      140000000
      140000000
      140000000
      140000000
      130000000
      40000000
      90000000
      110000000
      SUM=1140000000
      
      After patch:
      $ bench_rr.sh
      430000000
      590000000
      530000000
      450000000
      450000000
      350000000
      450000000
      490000000
      480000000
      460000000
      SUM=4680000000  # This is 410 % of the value before patch.
      
      Fixes: c9bee3b7 ("tcp: TCP_NOTSENT_LOWAT socket option")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Doug Porter <dsp@fb.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • ipv4: Invalidate neighbour for broadcast address upon address addition · 394ec9b5
      Ido Schimmel authored
      stable inclusion
      from stable-4.19.238
      commit 75517bd7e4afad5a800b4d14f80aa7e6f73a9681
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5A6BA
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 0c51e12e218f20b7d976158fdc18019627326f7a ]
      
      In case user space sends a packet destined to a broadcast address when a
      matching broadcast route is not configured, the kernel will create a
      unicast neighbour entry that will never be resolved [1].
      
      When the broadcast route is configured, the unicast neighbour entry will
      not be invalidated and continue to linger, resulting in packets being
      dropped.
      
      Solve this by invalidating unresolved neighbour entries for broadcast
      addresses after routes for these addresses are internally configured by
      the kernel. This allows the kernel to create a broadcast neighbour entry
      following the next route lookup.
      
      Another possible solution that is more generic but also more complex is
      to have the ARP code register a listener to the FIB notification chain
      and invalidate matching neighbour entries upon the addition of broadcast
      routes.
      
      It is also possible to wave off the issue as a user space problem, but
      it seems a bit excessive to expect user space to be that intimately
      familiar with the inner workings of the FIB/neighbour kernel code.
      
      [1] https://lore.kernel.org/netdev/55a04a8f-56f3-f73c-2aea-2195923f09d1@huawei.com/
      
      
      
      Reported-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  7. May 18, 2022
  8. May 13, 2022
  9. May 07, 2022
  10. Apr 16, 2022
  11. Apr 12, 2022
    • bonding: fix data-races around agg_select_timer · 68f74ceb
      Eric Dumazet authored
      
      stable inclusion
      from linux-4.19.231
      commit 4218e6995c19970aa7b32914be6c8e059837cbdf
      
      --------------------------------
      
      commit 9ceaf6f76b203682bb6100e14b3d7da4c0bedde8 upstream.
      
      syzbot reported that two threads might write over agg_select_timer
      at the same time. Make agg_select_timer atomic to fix the races.
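
One way to picture an atomic countdown timer is the userspace sketch below, written with C11 atomics. It is an illustration only: `agg_timer`, `timer_arm()` and `timer_advance()` are hypothetical names, and the kernel uses its own `atomic_t` primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter standing in for agg_select_timer. */
static atomic_int agg_timer;

/* Writer side: arm the timer with a single atomic store. */
static void timer_arm(int ticks)
{
	atomic_store(&agg_timer, ticks);
}

/* Periodic side: decrement unless the counter is already zero.
 * Returns true only on the tick that takes the counter to zero. */
static bool timer_advance(void)
{
	int val = atomic_load(&agg_timer);

	while (val != 0) {
		/* A concurrent timer_arm() makes the exchange fail; 'val'
		 * is then reloaded and the decrement is retried. */
		if (atomic_compare_exchange_weak(&agg_timer, &val, val - 1))
			return val - 1 == 0;
	}
	return false;
}
```

Because the decrement is a compare-and-exchange, a concurrent re-arm is never overwritten by a stale value computed from the old counter.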
      
      BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
      
      read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
       bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30
      
      write to 0xffff8881242aea90 of 4 bytes by task 25910 on cpu 0:
       bond_3ad_initiate_agg_selection+0x18/0x30 drivers/net/bonding/bond_3ad.c:1998
       bond_open+0x658/0x6f0 drivers/net/bonding/bond_main.c:3967
       __dev_open+0x274/0x3a0 net/core/dev.c:1407
       dev_open+0x54/0x190 net/core/dev.c:1443
       bond_enslave+0xcef/0x3000 drivers/net/bonding/bond_main.c:1937
       do_set_master net/core/rtnetlink.c:2532 [inline]
       do_setlink+0x94f/0x2500 net/core/rtnetlink.c:2736
       __rtnl_newlink net/core/rtnetlink.c:3414 [inline]
       rtnl_newlink+0xfeb/0x13e0 net/core/rtnetlink.c:3529
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2413
       ___sys_sendmsg net/socket.c:2467 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2496
       __do_sys_sendmsg net/socket.c:2505 [inline]
       __se_sys_sendmsg net/socket.c:2503 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000050 -> 0x0000004f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G        W         5.17.0-rc4-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • net: fix a memleak when uncloning an skb dst and its metadata · 0c4296c0
      Antoine Tenart authored
      
      stable inclusion
      from linux-4.19.230
      commit 0be943916d781df2b652793bb2d3ae4f9624c10a
      
      --------------------------------
      
      [ Upstream commit 9eeabdf17fa0ab75381045c867c370f4cc75a613 ]
      
      When uncloning an skb dst and its associated metadata, a new
      dst+metadata is allocated and later replaces the old one in the skb.
      This is helpful to have a non-shared dst+metadata attached to a specific
      skb.
      
      The issue is the uncloned dst+metadata is initialized with a refcount of
      1, which is increased to 2 before attaching it to the skb. When
      tun_dst_unclone returns, the dst+metadata is only referenced from a
      single place (the skb) while its refcount is 2. Its refcount will never
      drop to 0 (when the skb is consumed), leading to a memory leak.
      
      Fix this by removing the call to dst_hold in tun_dst_unclone, as the
      dst+metadata refcount is already 1.
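
The leak described above comes down to reference-count bookkeeping, which the following userspace sketch reproduces. All names (`struct md_dst_sketch`, `md_alloc()`, `md_release()`, `unclone()`) are hypothetical, and the `extra_hold` flag merely models the `dst_hold()` call that the patch removes.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for dst+metadata. */
struct md_dst_sketch {
	int refcnt;
};

static struct md_dst_sketch *md_alloc(void)
{
	struct md_dst_sketch *d = malloc(sizeof(*d));

	if (d)
		d->refcnt = 1;	/* the allocation already carries one reference */
	return d;
}

/* Release one reference; frees and returns true when the count hits 0. */
static bool md_release(struct md_dst_sketch *d)
{
	if (--d->refcnt > 0)
		return false;
	free(d);
	return true;
}

/* 'extra_hold' models the hold that the patch removes. */
static struct md_dst_sketch *unclone(bool extra_hold)
{
	struct md_dst_sketch *d = md_alloc();

	if (d && extra_hold)
		d->refcnt++;	/* count is 2 although there is one owner */
	return d;
}
```

With the extra hold, the single owner's release leaves the count at 1 and the object is never freed; without it, the same release frees the object.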
      
      Fixes: fc4099f1 ("openvswitch: Fix egress tunnel info.")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Reported-by: Vlad Buslov <vladbu@nvidia.com>
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • net: do not keep the dst cache when uncloning an skb dst and its metadata · ceaed295
      Antoine Tenart authored
      
      stable inclusion
      from linux-4.19.230
      commit 040e92ea3d7d6f27c1b71d6502e35c54a0939cb7
      
      --------------------------------
      
      [ Upstream commit cfc56f85e72f5b9c5c5be26dc2b16518d36a7868 ]
      
      When uncloning an skb dst and its associated metadata a new dst+metadata
      is allocated and the tunnel information from the old metadata is copied
      over there.
      
      The issue is the tunnel metadata has references to cached dst, which are
      copied along the way. When a dst+metadata refcount drops to 0 the
      metadata is freed including the cached dst entries. As they are also
      referenced in the initial dst+metadata, this ends up in UaFs.
      
      In practice the above did not happen because of another issue, the
      dst+metadata was never freed because its refcount never dropped to 0
      (this will be fixed in a subsequent patch).
      
      Fix this by initializing the dst cache after copying the tunnel
      information from the old metadata to also unshare the dst cache.
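
The copy-then-reinitialize idea can be sketched as follows. The types and helpers here are hypothetical simplifications (the kernel's dst cache and tunnel metadata are laid out differently); the sketch only shows why a plain structure copy shares the cache storage and how giving the copy a fresh cache avoids the shared free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types; the kernel's structures differ. */
struct cache_sketch {
	void *entries;		/* owned, heap-allocated cache storage */
};

struct tun_info_sketch {
	unsigned int tunnel_id;
	struct cache_sketch cache;
};

static int cache_init(struct cache_sketch *c)
{
	c->entries = calloc(4, sizeof(long));
	return c->entries ? 0 : -1;
}

static void cache_destroy(struct cache_sketch *c)
{
	free(c->entries);
	c->entries = NULL;
}

/* Copy the tunnel info, then give the copy its own cache so that
 * destroying one object never frees storage the other still uses. */
static int tun_info_unclone(struct tun_info_sketch *dst,
			    const struct tun_info_sketch *src)
{
	memcpy(dst, src, sizeof(*dst));	/* copies the cache pointer too */
	return cache_init(&dst->cache);	/* replace it with a fresh cache */
}
```

If the `cache_init()` call in `tun_info_unclone()` were omitted, destroying the copy would free storage that the original still points to.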
      
      Fixes: d71785ff ("net: add dst_cache to ovs vxlan lwtunnel")
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: Vlad Buslov <vladbu@nvidia.com>
      Tested-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • ipv6: annotate accesses to fn->fn_sernum · 89141213
      Eric Dumazet authored
      
      stable inclusion
      from linux-4.19.228
      commit 243e898452adcf6055c9b2632becef7c59f02874
      
      --------------------------------
      
      commit aafc2e3285c2d7a79b7ee15221c19fbeca7b1509 upstream.
      
      struct fib6_node's fn_sernum field can be
      read while other threads change it.
      
      Add READ_ONCE()/WRITE_ONCE() annotations.
      
      Do not change existing smp barriers in fib6_get_cookie_safe()
      and __fib6_update_sernum_upto_root()
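
For readers unfamiliar with these annotations, here is a simplified userspace approximation. The two macros below are stand-ins that rely on the GNU C `__typeof__` extension and are not the kernel's definitions; `struct node_sketch` and its accessors are hypothetical. The point is that each access goes through a volatile-qualified lvalue, so the compiler performs exactly one load or store and cannot tear, fuse or repeat it.

```c
/* Simplified stand-ins for the kernel macros (GNU C __typeof__). */
#define READ_ONCE(x)		(*(volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical node with a serial number read by lockless readers. */
struct node_sketch {
	int fn_sernum;
};

/* Lockless reader: a single load of the serial number. */
static int node_get_sernum(const struct node_sketch *fn)
{
	return READ_ONCE(fn->fn_sernum);
}

/* Writer: a single store of the new serial number. */
static void node_set_sernum(struct node_sketch *fn, int sernum)
{
	WRITE_ONCE(fn->fn_sernum, sernum);
}
```

These annotations document the intentional race and constrain the compiler; they do not add memory barriers, which is consistent with the note above that existing smp barriers are left unchanged.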
      
      syzbot reported:
      
      BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket
      
      write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
       fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
       fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
       fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
       fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
       __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
       fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
       rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
       addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
       addrconf_dad_work+0x908/0x1170
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:359
       ret_from_fork+0x1f/0x30
      
      read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
       fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
       rt6_get_cookie include/net/ip6_fib.h:306 [inline]
       ip6_dst_store include/net/ip6_route.h:234 [inline]
       inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
       inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
       mptcp_push_release net/mptcp/protocol.c:1491 [inline]
       __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
       mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       kernel_sendmsg+0x97/0xd0 net/socket.c:745
       sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
       inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
       kernel_sendpage+0x187/0x200 net/socket.c:3492
       sock_sendpage+0x5a/0x70 net/socket.c:1007
       pipe_to_sendpage+0x128/0x160 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x207/0x500 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x80/0xa0 fs/splice.c:936
       splice_direct_to_actor+0x345/0x650 fs/splice.c:891
       do_splice_direct+0x106/0x190 fs/splice.c:979
       do_sendfile+0x675/0xc40 fs/read_write.c:1245
       __do_sys_sendfile64 fs/read_write.c:1310 [inline]
       __se_sys_sendfile64 fs/read_write.c:1296 [inline]
       __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000026f -> 0x00000271
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      The Fixes tag I chose is probably arbitrary, I do not think
      we need to backport this patch to older kernels.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • ipv4: avoid using shared IP generator for connected sockets · b8ab968e
      Eric Dumazet authored
      
      stable inclusion
      from linux-4.19.228
      commit eb04c6d1ec67e30f3aa5ef82112cbfdbddfd4f65
      
      --------------------------------
      
      commit 23f57406b82de51809d5812afd96f210f8b627f3 upstream.
      
      ip_select_ident_segs() has been very conservative about using
      the connected socket private generator only for packets with IP_DF
      set, claiming it was needed for some VJ compression implementations.
      
      As mentioned in this referenced document, this can be abused.
      (Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)
      
      Before switching to pure random IPID generation and possibly hurting
      some workloads, let's use the private inet socket generator.
      
      Not only will this remove one vulnerability, it will also
      improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT.
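
The change in decision order can be sketched as below. Everything here is a hypothetical simplification: `struct sock_sketch`, `shared_id` and `select_ident()` are illustrative names, the shared generator is reduced to a plain counter, and returning 0 for DF packets that have no connected socket is an assumption about the fallback path, which the message above does not describe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified socket state. */
struct sock_sketch {
	uint32_t daddr;		/* non-zero once the socket is connected */
	uint16_t inet_id;	/* per-socket IP ID counter */
};

static uint16_t shared_id;	/* stand-in for the shared generator */

/* Connected sockets use their private counter regardless of DF;
 * only unconnected senders fall through to the other paths. */
static uint16_t select_ident(struct sock_sketch *sk, bool df, int segs)
{
	uint16_t id;

	if (sk && sk->daddr) {
		id = sk->inet_id;
		sk->inet_id += segs;
		return id;
	}
	if (df)
		return 0;	/* assumed fallback for DF packets */
	id = shared_id;
	shared_id += segs;
	return id;
}
```

In this model a connected socket never touches `shared_id`, so an off-path observer of the shared counter learns nothing about that flow's traffic.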
      
      Fixes: 73f156a6 ("inetpeer: get rid of ip_id_count")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reported-by: Ray Che <xijiache@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
  12. Mar 22, 2022
  13. Jan 17, 2022
  14. Dec 28, 2021
    • llc: fix out-of-bound array index in llc_sk_dev_hash() · b6a87f22
      Eric Dumazet authored
      
      stable inclusion
      from linux-4.19.218
      commit 0c727425668ddc43bcf1a19c77bad215de966e65
      
      --------------------------------
      
      [ Upstream commit 8ac9dfd58b138f7e82098a4e0a0d46858b12215b ]
      
      Both ifindex and LLC_SK_DEV_HASH_ENTRIES are signed.
      
      This means that (ifindex % LLC_SK_DEV_HASH_ENTRIES) is negative
      if @ifindex is negative.
      
      We could simply make LLC_SK_DEV_HASH_ENTRIES unsigned.
      
      In this patch I chose to use hash_32() to get more entropy
      from @ifindex, like llc_sk_laddr_hashfn().
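
The sign problem is easy to reproduce in plain C, where the remainder operator keeps the sign of the dividend. In the sketch below the table size of 64 matches the `hlist_head [64]` in the report that follows; `hash32_sketch()` is a hypothetical helper that imitates a multiplicative hash, and its multiplier is assumed to be the 32-bit golden-ratio constant that the kernel's `hash_32()` is understood to use.

```c
#include <stdint.h>

#define HASH_BITS	6
#define HASH_ENTRIES	(1 << HASH_BITS)	/* 64, a signed int */
#define GOLDEN_RATIO_32	0x61C88647u		/* assumed multiplier */

/* Old scheme: in C the remainder keeps the sign of the dividend,
 * so a negative ifindex yields a negative array index. */
static int index_by_modulo(int ifindex)
{
	return ifindex % HASH_ENTRIES;
}

/* Multiplicative hash keeping the top 'bits' bits of the product;
 * the result is always in [0, 2^bits). */
static uint32_t hash32_sketch(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* New scheme: hash the ifindex as an unsigned value. */
static uint32_t index_by_hash(int ifindex)
{
	return hash32_sketch((uint32_t)ifindex, HASH_BITS);
}
```

For example, `index_by_modulo(-43)` evaluates to -43, the same out-of-range index shown in the UBSAN report, while `index_by_hash()` stays within the table for any input.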
      
      UBSAN: array-index-out-of-bounds in ./include/net/llc.h:75:26
      index -43 is out of range for type 'hlist_head [64]'
      CPU: 1 PID: 20999 Comm: syz-executor.3 Not tainted 5.15.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
       __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291
       llc_sk_dev_hash include/net/llc.h:75 [inline]
       llc_sap_add_socket+0x49c/0x520 net/llc/llc_conn.c:697
       llc_ui_bind+0x680/0xd70 net/llc/af_llc.c:404
       __sys_bind+0x1e9/0x250 net/socket.c:1693
       __do_sys_bind net/socket.c:1704 [inline]
       __se_sys_bind net/socket.c:1702 [inline]
       __x64_sys_bind+0x6f/0xb0 net/socket.c:1702
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7fa503407ae9
      
      Fixes: 6d2e3ea2 ("llc: use a device based hash table to speed up multicast delivery")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  15. Dec 09, 2021
  16. Dec 08, 2021
  17. Oct 25, 2021
    • sctp: move 198 addresses from unusable to private scope · c0a17e35
      Xin Long authored
      
      stable inclusion
      from linux-4.19.200
      commit 53012dd6ca2f3c9420b5cc447279375a90290fb4
      
      --------------------------------
      
      [ Upstream commit 1d11fa231cabeae09a95cb3e4cf1d9dd34e00f08 ]
      
      The doc draft-stewart-tsvwg-sctp-ipv4-00 that restricts 198 addresses
      was never published. These addresses, as private addresses, should be
      allowed for use in SCTP.
      
      As Michael Tuexen suggested, this patch is to move 198 addresses from
      unusable to private scope.
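
A rough picture of the reclassification, under stated assumptions: the sketch takes "198 addresses" to mean the 198.18.0.0/15 range, works on host-byte-order addresses, and uses a hypothetical `enum scope_sketch` with fewer scopes than SCTP actually defines. None of these names come from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced set of address scopes. */
enum scope_sketch {
	SCOPE_UNUSABLE,
	SCOPE_LOOPBACK,
	SCOPE_PRIVATE,
	SCOPE_GLOBAL,
};

/* Assumed meaning of "198 addresses": 198.18.0.0/15. */
static bool is_test_198(uint32_t a)
{
	return (a & 0xFFFE0000u) == 0xC6120000u;
}

/* 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. */
static bool is_rfc1918(uint32_t a)
{
	return (a & 0xFF000000u) == 0x0A000000u ||
	       (a & 0xFFF00000u) == 0xAC100000u ||
	       (a & 0xFFFF0000u) == 0xC0A80000u;
}

/* Addresses are in host byte order to keep the sketch simple. */
static enum scope_sketch classify(uint32_t a)
{
	if (a == 0 || a == 0xFFFFFFFFu || (a & 0xF0000000u) == 0xE0000000u)
		return SCOPE_UNUSABLE;	/* any, broadcast, multicast */
	if ((a & 0xFF000000u) == 0x7F000000u)
		return SCOPE_LOOPBACK;
	if (is_rfc1918(a) || is_test_198(a))
		return SCOPE_PRIVATE;	/* 198.18/15 now lands here */
	return SCOPE_GLOBAL;
}
```

Under this model 198.18.0.1 and 198.19.255.254 classify as private, while 198.20.0.1 remains global.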
      
      Reported-by: Sérgio <surkamp@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • net: annotate data race around sk_ll_usec · f785ece9
      Eric Dumazet authored
      
      stable inclusion
      from linux-4.19.200
      commit c1a5cd807960d07381364c7b05aa3a43eb6d3a2f
      
      --------------------------------
      
      [ Upstream commit 0dbffbb5335a1e3aa6855e4ee317e25e669dd302 ]
      
      sk_ll_usec is read locklessly from sk_can_busy_loop()
      while another thread can change its value in sock_setsockopt().
      
      This is correct but needs annotations.
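
A minimal userspace sketch of what such annotations look like, assuming the fix uses the kernel's READ_ONCE()/WRITE_ONCE() helpers. The macros below are simplified volatile-access stand-ins (they rely on the GCC/Clang `__typeof__` extension), and the struct and function names are hypothetical.

```c
/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(); the real
 * macros do additional type checking (assumption: volatile access only). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, trimmed-down socket holding only the field of interest. */
struct demo_sock {
    unsigned int sk_ll_usec;
};

/* Writer side, as in a setsockopt() path: a single marked store. */
static void demo_set_busy_poll(struct demo_sock *sk, unsigned int usec)
{
    WRITE_ONCE(sk->sk_ll_usec, usec);
}

/* Lockless reader side, as in a busy-poll check: a single marked load. */
static int demo_can_busy_loop(struct demo_sock *sk)
{
    return READ_ONCE(sk->sk_ll_usec) != 0;
}
```

The marked accesses do not add locking. They tell the compiler not to tear, fuse, or re-read the value, and they tell KCSAN that the race is intentional.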
      
      BUG: KCSAN: data-race in __skb_try_recv_datagram / sock_setsockopt
      
      write to 0xffff88814eb5f904 of 4 bytes by task 14011 on cpu 0:
       sock_setsockopt+0x1287/0x2090 net/core/sock.c:1175
       __sys_setsockopt+0x14f/0x200 net/socket.c:2100
       __do_sys_setsockopt net/socket.c:2115 [inline]
       __se_sys_setsockopt net/socket.c:2112 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88814eb5f904 of 4 bytes by task 14001 on cpu 1:
       sk_can_busy_loop include/net/busy_poll.h:41 [inline]
       __skb_try_recv_datagram+0x14f/0x320 net/core/datagram.c:273
       unix_dgram_recvmsg+0x14c/0x870 net/unix/af_unix.c:2101
       unix_seqpacket_recvmsg+0x5a/0x70 net/unix/af_unix.c:2067
       ____sys_recvmsg+0x15d/0x310 include/linux/uio.h:244
       ___sys_recvmsg net/socket.c:2598 [inline]
       do_recvmmsg+0x35c/0x9f0 net/socket.c:2692
       __sys_recvmmsg net/socket.c:2771 [inline]
       __do_sys_recvmmsg net/socket.c:2794 [inline]
       __se_sys_recvmmsg net/socket.c:2787 [inline]
       __x64_sys_recvmmsg+0xcf/0x150 net/socket.c:2787
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x00000101
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 14001 Comm: syz-executor.3 Not tainted 5.13.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  18. Aug 31, 2021
    • Bluetooth: defer cleanup of resources in hci_unregister_dev() · 1917d1f6
      Tetsuo Handa authored
      stable inclusion
      from linux-4.19.203
      commit 3719acc161d5c1ce09912cc1c9eddc2c5faa3c66
      
      --------------------------------
      
      [ Upstream commit e04480920d1eec9c061841399aa6f35b6f987d8b ]
      
      syzbot is hitting a might_sleep() warning in hci_sock_dev_event()
      because lock_sock() is called with a rw spinlock held [1].
      
      The history of this locking problem appears to be one of trial and
      error.
      
      Commit b40df574 ("[PATCH] bluetooth: fix socket locking in
      hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to
      lock_sock() as an attempt to fix lockdep warning.
      
      Then, commit 4ce61d1c ("[BLUETOOTH]: Fix locking in
      hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to
      local_bh_disable() + bh_lock_sock_nested() as an attempt to fix the
      sleep in atomic context warning.
      
      Then, commit 4b5dd696 ("Bluetooth: Remove local_bh_disable() from
      hci_sock.c") in 3.3-rc1 removed local_bh_disable().
      
      Then, commit e305509e678b ("Bluetooth: use correct lock to prevent UAF
      of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to
      lock_sock() as an attempt to fix CVE-2021-3573.
      
      This difficulty comes from the current implementation, in which
      hci_sock_dev_event(HCI_DEV_UNREG) is responsible for dropping all
      references from sockets, because hci_unregister_dev() reclaims
      resources as soon as hci_sock_dev_event(HCI_DEV_UNREG) returns.
      
      But the history suggests that hci_sock_dev_event(HCI_DEV_UNREG) was not
      doing what it should do.
      
      Therefore, instead of trying to detach sockets from the device, let's
      accept not detaching them in hci_sock_dev_event(HCI_DEV_UNREG), and
      move the actual cleanup of resources from hci_unregister_dev() to
      hci_cleanup_dev(), which is called by bt_host_release() when all
      references to this unregistered device (which is a kobject) are gone.
      
      Since hci_sock_dev_event(HCI_DEV_UNREG) no longer resets
      hci_pi(sk)->hdev, we need to check whether this device was unregistered
      and return an error based on the HCI_UNREGISTER flag.  There might be
      subtle behavioral differences in the "monitor the hdev" functionality;
      please report if you find that something went wrong due to this patch.
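
The deferred-cleanup idea can be sketched in userspace as follows. This is a single-threaded illustration with hypothetical names and a plain integer reference count. The real code relies on the kobject refcount, and the error code returned here is chosen only for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device object; names are illustrative, not the kernel's. */
struct demo_dev {
    int  refcnt;        /* references held by the core and by sockets */
    bool unregistered;  /* plays the role of the HCI_UNREGISTER flag */
    int *resources;     /* stands in for state freed at cleanup time */
};

static struct demo_dev *demo_dev_alloc(void)
{
    struct demo_dev *d = calloc(1, sizeof(*d));

    if (!d)
        return NULL;
    d->resources = calloc(16, sizeof(int));
    if (!d->resources) {
        free(d);
        return NULL;
    }
    d->refcnt = 1;      /* the core's own reference */
    return d;
}

static void demo_dev_hold(struct demo_dev *d)
{
    d->refcnt++;
}

/* Release path: cleanup runs only when the last reference is dropped.
 * Returns true if the device was actually freed. */
static bool demo_dev_put(struct demo_dev *d)
{
    if (--d->refcnt > 0)
        return false;
    free(d->resources);
    free(d);
    return true;
}

/* Unregister only marks the device and drops the core's reference;
 * it does not try to detach sockets that still point at the device. */
static bool demo_dev_unregister(struct demo_dev *d)
{
    d->unregistered = true;
    return demo_dev_put(d);
}

/* Socket-side operation: the pointer is still valid, so check the flag
 * and fail instead of touching a device that is going away. */
static int demo_sock_ioctl(struct demo_dev *d)
{
    if (d->unregistered)
        return -ENODEV;
    return 0;
}
```

In this arrangement a socket that outlives the unregister call keeps a valid pointer and simply sees errors until it drops its reference, at which point the resources are reclaimed.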
      
      Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1]
      Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Fixes: e305509e678b ("Bluetooth: use correct lock to prevent UAF of hdev object")
      Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  19. Aug 19, 2021
  20. Aug 02, 2021