  1. Oct 25, 2021
    • net: annotate data race around sk_ll_usec · f785ece9
      Eric Dumazet authored
      stable inclusion
      from linux-4.19.200
      commit c1a5cd807960d07381364c7b05aa3a43eb6d3a2f
      
      --------------------------------
      
      [ Upstream commit 0dbffbb5335a1e3aa6855e4ee317e25e669dd302 ]
      
      sk_ll_usec is read locklessly from sk_can_busy_loop(),
      while another thread can change its value in sock_setsockopt().
      
      This is correct, but it needs annotations.
      
      BUG: KCSAN: data-race in __skb_try_recv_datagram / sock_setsockopt
      
      write to 0xffff88814eb5f904 of 4 bytes by task 14011 on cpu 0:
       sock_setsockopt+0x1287/0x2090 net/core/sock.c:1175
       __sys_setsockopt+0x14f/0x200 net/socket.c:2100
       __do_sys_setsockopt net/socket.c:2115 [inline]
       __se_sys_setsockopt net/socket.c:2112 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2112
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88814eb5f904 of 4 bytes by task 14001 on cpu 1:
       sk_can_busy_loop include/net/busy_poll.h:41 [inline]
       __skb_try_recv_datagram+0x14f/0x320 net/core/data...
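      
      The fix is the usual KCSAN annotation pattern: mark both the lockless
      reader and the writer. A minimal sketch of the two sides, based on
      the upstream commit (abridged; the surrounding setsockopt logic is
      elided):
      
        /* net/core/sock.c -- writer, racing with lockless readers */
        case SO_BUSY_POLL:
                if (val < 0)
                        ret = -EINVAL;
                else
                        WRITE_ONCE(sk->sk_ll_usec, val);
                break;
      
        /* include/net/busy_poll.h -- lockless reader */
        static inline bool sk_can_busy_loop(const struct sock *sk)
        {
                return READ_ONCE(sk->sk_ll_usec) && !signal_pending(current);
        }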
  2. Dec 27, 2019
    • net: annotate lockless accesses to sk->sk_napi_id · 72e3e1c0
      Eric Dumazet authored and Xie Xiuqi (谢秀奇) committed
      
      [ Upstream commit ee8d153d ]
      
      We already annotated most accesses to sk->sk_napi_id.
      
      We missed sk_mark_napi_id() and sk_mark_napi_id_once(),
      which might be called without the socket lock held in the UDP stack.
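      
      The fix annotates both helpers. A sketch of their post-patch shape
      (abridged from the upstream commit; unrelated lines in the real
      helpers are elided):
      
        static inline void sk_mark_napi_id(struct sock *sk,
                                           const struct sk_buff *skb)
        {
                WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
        }
      
        /* UDP variant: only record the first NAPI id seen */
        static inline void sk_mark_napi_id_once(struct sock *sk,
                                                const struct sk_buff *skb)
        {
                if (!READ_ONCE(sk->sk_napi_id))
                        WRITE_ONCE(sk->sk_napi_id, skb->napi_id);
        }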
      
      KCSAN reported:
      BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb
      
      write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0:
       sk_mark_napi_id include/net/busy_poll.h:125 [inline]
       __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
       udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
       udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
       udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
       __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
       udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
       ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
       ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
      
      write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1:
       sk_mark_napi_id include/net/busy_poll.h:125 [inline]
       __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline]
       udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672
       udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689
       udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832
       __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913
       udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015
       ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409
       ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459
       dst_input include/net/dst.h:442 [inline]
       ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: e68b6e50 ("udp: enable busy polling for all sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  3. Nov 18, 2016
    • udp: enable busy polling for all sockets · e68b6e50
      Eric Dumazet authored
      
      UDP busy polling is restricted to connected UDP sockets.
      
      This is because sk_busy_loop() only takes care of one NAPI context.
      
      There are cases where it could be extended.
      
      1) Some hosts receive traffic on a single NIC, with one RX queue.
      
      2) Some applications use SO_REUSEPORT and an associated BPF filter
         to split the incoming traffic onto one UDP socket per RX
         queue/thread/cpu.
      
      3) Some UDP sockets are used to send/receive traffic for one flow,
         but they do not bother with connect().
      
      This patch records the napi_id of the first received skb, giving
      more reach to busy polling.
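      
      The mechanism is a once-only variant of sk_mark_napi_id(). A sketch
      (hedged: the helper name comes from this patch, the body is abridged
      and uses the plain accesses of the 2016 code, before the
      READ_ONCE()/WRITE_ONCE() annotations added above):
      
        /* record the NAPI id of the first skb that reaches the socket */
        static inline void sk_mark_napi_id_once(struct sock *sk,
                                                const struct sk_buff *skb)
        {
                if (!sk->sk_napi_id)
                        sk->sk_napi_id = skb->napi_id;
        }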
      
      Tested:
      
      lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done
      
      Before patch:
         27867   28870   37324   41060   41215
         36764   36838   44455   41282   43843
      After patch:
         73920   73213   70147   74845   71697
         68315   68028   75219   70082   73707
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Jan 14, 2014
    • sched, net: Fixup busy_loop_us_clock() · 37089834
      Peter Zijlstra authored
      
      The only valid use of preempt_enable_no_resched() is if the very next
      line is schedule(), or if we know preemption cannot actually be
      enabled by that statement because additional preempt_count 'refs' are
      still held.
      
      This busy_poll stuff looks to be completely and utterly broken:
      sched_clock() can return utter garbage with interrupts enabled (rare,
      but still), and it can drift unbounded between CPUs.
      
      This means that if you get preempted/migrated and your new CPU is
      years behind the previous CPU, we get to busy-spin for a _very_ long
      time.
      
      There is a _REASON_ sched_clock() warns about preemptability -
      papering over it with a preempt_disable()/preempt_enable_no_resched()
      is just terminal brain damage on so many levels.
      
      Replace sched_clock() usage with local_clock() which has a bounded
      drift between CPUs (<2 jiffies).
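      
      A sketch of busy_loop_us_clock() after this change (abridged; the
      shift approximates a ns-to-us division):
      
        static inline u64 busy_loop_us_clock(void)
        {
                /* local_clock() has bounded (<2 jiffies) cross-CPU drift */
                return local_clock() >> 10;
        }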
      
      There is a further problem with the entire busy wait poll thing in
      that the spin time is additive to the syscall timeout, not inclusive.
      
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: rui.zhang@intel.com
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. Jun 26, 2013
    • net: poll/select low latency socket support · 2d48d67f
      Eliezer Tamir authored
      
      select/poll busy-poll support.
      
      Split the sysctl value into two separate ones, one for read and one
      for poll, and update Documentation/sysctl/net.txt accordingly.
      
      Add a new poll flag, POLL_LL. When this flag is set, sock_poll will
      call sk_poll_ll() if possible. sock_poll also sets this flag in its
      return value to tell select/poll when it has found a socket that can
      busy-poll.
      
      When poll/select have nothing to report, call the low-level
      sock_poll again until we are out of time or we find something.
      
      Once the system call finds something, it stops setting POLL_LL, so it can
      return the result to the user ASAP.
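      
      A sketch of the sock_poll() side described above (hedged: assembled
      from this commit message, not the exact upstream hunk; sk_valid_ll()
      and sk_poll_ll() are this era's names for what later became
      sk_can_busy_loop() and sk_busy_loop()):
      
        static unsigned int sock_poll(struct file *file, poll_table *wait)
        {
                struct socket *sock = file->private_data;
                unsigned int ll_flag = 0;
      
                if (sk_valid_ll(sock->sk)) {
                        /* this socket can busy-poll: tell select/poll */
                        ll_flag = POLL_LL;
      
                        /* busy poll once, only if the syscall asked for it */
                        if (wait && (wait->_key & POLL_LL))
                                sk_poll_ll(sock->sk, 1);
                }
      
                return ll_flag | sock->ops->poll(file, sock, wait);
        }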
      
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>