Commits · fdda68feeca82610ccbcdcbda7250623a6d187d2 · Summer2022 / 22b970497

Sep 22, 2020

arm64/ascend: Set mem_sleep_current to PM_SUSPEND_ON for ascend platform · fdda68fe

Ding Tianhong authored 4 years ago


ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

mem_sleep_current is set to PM_SUSPEND_TO_IDLE by default, which causes
the system to hang if no wake-up device is registered. Therefore
mem_sleep_current needs to be set to PM_SUSPEND_ON to prevent the system
from entering an endless loop.
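Why PM_SUSPEND_ON sidesteps the hang can be sketched in user space. The sketch assumes that a "mem" request resolves to mem_sleep_current, and that pm_suspend() rejects PM_SUSPEND_ON with -EINVAL, so no sleep state (and no wake-up source) is involved at all. This is a simplification of kernel/power. The demo_* functions and the is_ascend switch are hypothetical and do not show how this commit actually selects the Ascend default.

```c
#include <errno.h>
#include <stdbool.h>

/* Numeric values mirror include/linux/suspend.h. */
enum demo_suspend_state {
	PM_SUSPEND_ON = 0,
	PM_SUSPEND_TO_IDLE = 1,
	PM_SUSPEND_STANDBY = 2,
	PM_SUSPEND_MEM = 3,
	PM_SUSPEND_MAX = 4,
};

/* Hypothetical platform hook choosing the state used for "mem" requests. */
static enum demo_suspend_state demo_default_mem_sleep(bool is_ascend)
{
	return is_ascend ? PM_SUSPEND_ON : PM_SUSPEND_TO_IDLE;
}

/* Simplified model of the range check at the start of pm_suspend(). */
static int demo_pm_suspend(enum demo_suspend_state state)
{
	if (state <= PM_SUSPEND_ON || state >= PM_SUSPEND_MAX)
		return -EINVAL;	/* request rejected: no sleep state is entered */
	return 0;		/* would enter the requested sleep state */
}
```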

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm/swap_state: fix a data race in swapin_nr_pages · baa07aea

Qian Cai authored 4 years ago


mainline inclusion
from mainline-v5.8-rc1
commit d6c1f098
category: bugfix
bugzilla: 35806
CVE: NA

-------------------------------------------------

"prev_offset" is a static variable in swapin_nr_pages() that can be
accessed concurrently with only mmap_sem held in read mode as noticed by
KCSAN,

 BUG: KCSAN: data-race in swap_cluster_readahead / swap_cluster_readahead

 write to 0xffffffff92763830 of 8 bytes by task 14795 on cpu 17:
  swap_cluster_readahead+0x2a6/0x5e0
  swapin_readahead+0x92/0x8dc
  do_swap_page+0x49b/0xf20
  __handle_mm_fault+0xcfb/0xd70
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x715
  page_fault+0x34/0x40

 1 lock held by (dnf)/14795:
  #0: ffff897bd2e98858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
  do_user_addr_fault at arch/x86/mm/fault.c:1405
  (inlined by) do_page_fault at arch/x86/mm/fault.c:1535
 irq event stamp: 83493
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x365/0x589
 irq_exit+0xa2/0xc0

 read to 0xffffffff92763830 of 8 bytes by task 1 on cpu 22:
  swap_cluster_readahead+0xfd/0x5e0
  swapin_readahead+0x92/0x8dc
  do_swap_page+0x49b/0xf20
  __handle_mm_fault+0xcfb/0xd70
  handle_mm_fault+0xfc/0x2f0
  do_page_fault+0x263/0x715
  page_fault+0x34/0x40

 1 lock held by systemd/1:
  #0: ffff897c38f14858 (&mm->mmap_sem#2){++++}-{3:3}, at: do_page_fault+0x143/0x715
 irq event stamp: 43530289
 count_memcg_event_mm+0x1a6/0x270
 count_memcg_event_mm+0x119/0x270
 __do_softirq+0x365/0x589
 irq_exit+0xa2/0xc0
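The report alone does not show the fix. A common remedy for an intentionally racy heuristic like this one is to mark the accesses, which in the kernel means READ_ONCE()/WRITE_ONCE(). The sketch below is a user-space analogue under that assumption, using C11 relaxed atomics. demo_prev_offset and the "is it sequential" test are hypothetical and far simpler than the real readahead window logic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the static prev_offset in swapin_nr_pages().
 * Relaxed atomic loads/stores make the concurrent access well-defined
 * without imposing any ordering; a stale value only makes the readahead
 * heuristic less accurate, it cannot corrupt anything.
 */
static _Atomic unsigned long demo_prev_offset;

/* True when @offset directly follows the last recorded offset. */
static bool demo_looks_sequential(unsigned long offset)
{
	unsigned long prev = atomic_load_explicit(&demo_prev_offset,
						  memory_order_relaxed);

	return offset == prev + 1;
}

/* Record @offset for the next caller, possibly running on another CPU. */
static void demo_record_offset(unsigned long offset)
{
	atomic_store_explicit(&demo_prev_offset, offset, memory_order_relaxed);
}
```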

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200402213748.2237-1-cai@lca.pw


Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64: secomp: fix the secure computing mode 1 syscall check for ilp32 · 7c28351c

Xiongfeng Wang authored 4 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
---------------------------

ILP32 applications are compat applications, but their syscall numbers
differ from those of traditional compat AArch32 applications; they are
the same as the LP64 syscall numbers. So we need to fix the secure
computing mode 1 syscall check for ILP32.
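A minimal sketch of the corrected decision, assuming a hypothetical per-task ABI tag. The syscall numbers are the asm-generic values (read 63, write 64, exit 93, rt_sigreturn 139) and the ARM EABI values (read 3, write 4, exit 1, sigreturn 119). They are illustrative: the real arm64 tables, and the exact sigreturn variant in the 32-bit list, may differ. The point is only that ILP32 must be checked against the LP64 numbering, not the AArch32 one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ABI tag for an arm64 task. */
enum demo_abi { DEMO_ABI_LP64, DEMO_ABI_ILP32, DEMO_ABI_AARCH32 };

#define DEMO_NR_MODE1 4

/* asm-generic numbers: read, write, exit, rt_sigreturn. */
static const int demo_mode1_generic[DEMO_NR_MODE1] = { 63, 64, 93, 139 };
/* ARM EABI numbers: read, write, exit, sigreturn (illustrative). */
static const int demo_mode1_aarch32[DEMO_NR_MODE1] = { 3, 4, 1, 119 };

/* Strict-mode (mode 1) check: is syscall @nr allowed for a task with @abi? */
static bool demo_mode1_allowed(enum demo_abi abi, int nr)
{
	/*
	 * ILP32 tasks are "compat" but use the LP64 numbering, so only
	 * AArch32 tasks are checked against the 32-bit table.
	 */
	const int *tbl = (abi == DEMO_ABI_AARCH32) ? demo_mode1_aarch32
						    : demo_mode1_generic;

	for (size_t i = 0; i < DEMO_NR_MODE1; i++)
		if (tbl[i] == nr)
			return true;
	return false;
}
```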

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

vti4: removed duplicate log message. · 08800a74

Jeremy Sowden authored 4 years ago


stable inclusion
from linux-4.19.119
commit be2b6b4a22013ce7016388b276a6a9b147bb0a24

--------------------------------

commit 01ce31c5 upstream.

Removed info log-message if ipip tunnel registration fails during
module-initialization: it adds nothing to the error message that is
written on all failures.

Fixes: dd9ee344 ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

KEYS: Don't write out to userspace while holding key semaphore · 7893f356

Waiman Long authored 4 years ago


stable inclusion
from linux-4.19.118
commit 18779eac17b5df9dff57138d07631573553a41d4

--------------------------------

commit d3ec10aa upstream.

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE    --------- -  -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434]        down_write+0x4d/0x110
[12537.385202]        __key_link_begin+0x87/0x280
[12537.405232]        request_key_and_link+0x483/0xf70
[12537.427221]        request_key+0x3c/0x80
[12537.444839]        dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445]        dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731]        cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418]        cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263]        cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551]        cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045]        kthread+0x30c/0x3d0
[12537.617906]        ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525]        __mutex_lock+0x105/0x11f0
[12537.683734]        request_key_and_link+0x35a/0xf70
[12537.705640]        request_key+0x3c/0x80
[12537.723304]        dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773]        dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607]        cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322]        cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369]        cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262]        cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477]        kthread+0x30c/0x3d0
[12537.890281]        ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225]        __mutex_lock+0x105/0x11f0
[12537.954450]        cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250]        smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659]        cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920]        read_pages+0xf5/0x560
[12538.041583]        __do_page_cache_readahead+0x41d/0x4b0
[12538.067047]        ondemand_readahead+0x44c/0xc10
[12538.092069]        filemap_fault+0xec1/0x1830
[12538.111637]        __do_fault+0x82/0x260
[12538.129216]        do_fault+0x419/0xfb0
[12538.146390]        __handle_mm_fault+0x862/0xdf0
[12538.167408]        handle_mm_fault+0x154/0x550
[12538.187401]        __do_page_fault+0x42f/0xa60
[12538.207395]        do_page_fault+0x38/0x5e0
[12538.225777]        page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875]        lock_acquire+0x14c/0x420
[12538.286848]        __might_fault+0x119/0x1b0
[12538.306006]        keyring_read_iterator+0x7e/0x170
[12538.327936]        assoc_array_subtree_iterate+0x97/0x280
[12538.352154]        keyring_read+0xe9/0x110
[12538.370558]        keyctl_read_key+0x1b9/0x220
[12538.391470]        do_syscall_64+0xa5/0x4b0
[12538.410511]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829]   &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820]  Possible unsafe locking scenario:
[12538.524820]
[12538.551431]        CPU0                    CPU1
[12538.572654]        ----                    ----
[12538.595865]   lock(&type->lock_class);
[12538.613737]                                lock(root_key_user.cons_lock);
[12538.644234]                                lock(&type->lock_class);
[12538.672410]   lock(&mm->mmap_sem);
[12538.687758]
[12538.687758]  *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097]  #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897]  dump_stack+0x9a/0xf0
[12538.907908]  print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891]  ? save_trace+0xd6/0x250
[12538.948979]  check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643]  ? keyring_compare_object+0x104/0x190
[12538.992738]  ? check_usage+0x550/0x550
[12539.009845]  ? sched_clock+0x5/0x10
[12539.025484]  ? sched_clock_cpu+0x18/0x1e0
[12539.043555]  __lock_acquire+0x1f12/0x38d0
[12539.061551]  ? trace_hardirqs_on+0x10/0x10
[12539.080554]  lock_acquire+0x14c/0x420
[12539.100330]  ? __might_fault+0xc4/0x1b0
[12539.119079]  __might_fault+0x119/0x1b0
[12539.135869]  ? __might_fault+0xc4/0x1b0
[12539.153234]  keyring_read_iterator+0x7e/0x170
[12539.172787]  ? keyring_read+0x110/0x110
[12539.190059]  assoc_array_subtree_iterate+0x97/0x280
[12539.211526]  keyring_read+0xe9/0x110
[12539.227561]  ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076]  keyctl_read_key+0x1b9/0x220
[12539.266660]  do_syscall_64+0xa5/0x4b0
[12539.283091]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

  1) The put_user() call is replaced by a direct copy.
  2) The copy_to_user() call is replaced by memcpy().
  3) All the fault handling code is removed.
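The two-phase read can be sketched in user space, with a pthread mutex standing in for the key semaphore and a plain memcpy() standing in for copy_to_user(). struct demo_key and demo_key_read() are hypothetical. The real keyctl_read_key() also has to cope with the key growing between the size query and the copy, which this sketch sidesteps by using a fixed-size payload.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key: a payload guarded by a lock (stand-in for key->sem). */
struct demo_key {
	pthread_mutex_t lock;
	char payload[64];
	size_t len;		/* bytes of payload in use */
};

/*
 * Copy the payload to @dst in two phases and return its full length
 * (even if @dstlen is smaller), or -1 if the bounce buffer cannot be
 * allocated. Nothing that could fault on @dst runs under the lock.
 */
static long demo_key_read(struct demo_key *key, char *dst, size_t dstlen)
{
	char *tmp = malloc(sizeof(key->payload));
	size_t len;

	if (!tmp)
		return -1;

	/* Phase 1: snapshot into the private buffer while holding the lock. */
	pthread_mutex_lock(&key->lock);
	len = key->len;
	if (len > sizeof(key->payload))
		len = sizeof(key->payload);
	memcpy(tmp, key->payload, len);
	pthread_mutex_unlock(&key->lock);

	/* Phase 2: copy out with the lock dropped (copy_to_user() in the kernel). */
	memcpy(dst, tmp, len < dstlen ? len : dstlen);
	free(tmp);
	return (long)len;
}
```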

When compiled on an x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

netfilter: nf_tables: report EOPNOTSUPP on unsupported flags/object type · 50b5b15b

Pablo Neira Ayuso authored 4 years ago


stable inclusion
from linux-4.19.118
commit 79f784c999bc43c55125432b791c6f3821b5995f

--------------------------------

commit d9583cdf upstream.

EINVAL should be reserved for malformed netlink messages. A new userspace
utility running on an old kernel might easily get EINVAL when exercising
new set features, which is misleading.
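The distinction can be sketched as a validation helper that returns -EINVAL only for a structurally bad message and -EOPNOTSUPP for a well-formed request the kernel simply does not implement. The flag names, values and demo_validate_set_flags() are hypothetical; the real nf_tables code checks netlink attributes and NFT_SET_* flags.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical set flags this kernel understands. */
#define DEMO_SET_F_INTERVAL	0x1u
#define DEMO_SET_F_MAP		0x2u
#define DEMO_SET_F_KNOWN	(DEMO_SET_F_INTERVAL | DEMO_SET_F_MAP)

/* @attr_present == false models a malformed message (attribute missing). */
static int demo_validate_set_flags(bool attr_present, uint32_t flags)
{
	if (!attr_present)
		return -EINVAL;		/* malformed netlink message */
	if (flags & ~DEMO_SET_F_KNOWN)
		return -EOPNOTSUPP;	/* well-formed, but the feature is unsupported */
	return 0;
}
```

With this split, a newer tool asking for an unknown flag learns that the feature is missing rather than that its message was broken.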

Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: revert default NAPI poll timeout to 2 jiffies · 190cfd07

Konstantin Khlebnikov authored 4 years ago


stable inclusion
from linux-4.19.117
commit f7379c0050d2bfb65e44b340f1d667254dcc3058

--------------------------------

[ Upstream commit a4837980 ]

For HZ < 1000, the 2000us timeout rounds up to 1 jiffy but expires at a
random point, because the next timer interrupt could come shortly after
the softirq starts.

For commonly used CONFIG_HZ=1000 nothing changes.
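The rounding can be checked with a little arithmetic. The sketch assumes, as the message implies, that the poll deadline is the current jiffies value plus the rounded-up timeout, so a deadline one jiffy away can expire after almost no real time. Both helpers are hypothetical models, not kernel functions.

```c
/* Round microseconds up to whole jiffies, as usecs_to_jiffies() does. */
static unsigned long demo_usecs_to_jiffies(unsigned long usecs, unsigned long hz)
{
	unsigned long usec_per_jiffy = 1000000UL / hz;

	return (usecs + usec_per_jiffy - 1) / usec_per_jiffy;
}

/*
 * Worst-case real budget for a deadline of "now + j jiffies": if the next
 * tick lands right after the softirq starts, only j - 1 full jiffies remain.
 */
static unsigned long demo_min_budget_us(unsigned long j, unsigned long hz)
{
	return j ? (j - 1) * (1000000UL / hz) : 0;
}
```

At HZ=250, for example, 2000us becomes 1 jiffy and the guaranteed budget drops to 0us, while a 2-jiffy deadline guarantees at least 4000us. At HZ=1000, 2000us is already 2 jiffies.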

Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Reported-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: ipv6: do not consider routes via gateways for anycast address check · 5c17cfac

Tim Stallard authored 4 years ago


stable inclusion
from linux-4.19.117
commit 8fdf8a84ea68fff914f137169e20aba95978a7af

--------------------------------

[ Upstream commit 03e2a984 ]

The behaviour for what is considered an anycast address changed in
commit 45e4fd26 ("ipv6: Only create RTF_CACHE routes after
encountering pmtu exception"). This now considers the first
address in a subnet where there is a route via a gateway
to be an anycast address.

This breaks path MTU discovery and traceroutes when a host in a
remote network uses the address at the start of a prefix
(e.g. 2600:: advertised as 2600::/48 in the DFZ), as ICMP errors
will not be sent to anycast addresses.

This patch excludes any routes via a gateway or via point-to-point
links, matching the previous behaviour of rt6_is_gw_or_nonexthop() in
net/ipv6/route.c.
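A minimal sketch of the narrowed check, assuming a hypothetical route structure and flag values (the real code works on struct rt6_info and the RTF_* constants). An address counts as anycast if its route is explicitly marked so, or if it equals the prefix of an on-link route. Routes via a gateway or a point-to-point link no longer qualify, and the /127 and /128 exemption is included as an assumption.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical route flags. */
#define DEMO_RTF_GATEWAY	0x1u	/* route goes via a next-hop gateway */
#define DEMO_RTF_NONEXTHOP	0x2u	/* point-to-point style route */
#define DEMO_RTF_ANYCAST	0x4u	/* explicitly marked anycast route */

struct demo_rt6 {
	unsigned char dst[16];	/* route prefix, host bits cleared */
	int plen;		/* prefix length */
	unsigned int flags;
};

static bool demo_is_anycast(const struct demo_rt6 *rt,
			    const unsigned char daddr[16])
{
	if (rt->flags & DEMO_RTF_ANYCAST)
		return true;
	/*
	 * Subnet-router anycast: the all-zero host part of an on-link
	 * prefix. Gateway and point-to-point routes are excluded.
	 */
	return rt->plen < 127 &&
	       !(rt->flags & (DEMO_RTF_GATEWAY | DEMO_RTF_NONEXTHOP)) &&
	       memcmp(rt->dst, daddr, 16) == 0;
}
```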

This can be tested with:
ip link add v1 type veth peer name v2
ip netns add test
ip netns exec test ip link set lo up
ip link set v2 netns test
ip link set v1 up
ip netns exec test ip link set v2 up
ip addr add 2001:db8::1/64 dev v1 nodad
ip addr add 2001:db8:100:: dev lo nodad
ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
ip netns exec test ip route add unreachable 2001:db8:1::1
ip netns exec test ip route add 2001:db8:100::/64 via 2001:db8::1
ip netns exec test sysctl net.ipv6.conf.all.forwarding=1
ip route add 2001:db8:1::1 via 2001:db8::2
ping -I 2001:db8::1 2001:db8:1::1 -c1
ping -I 2001:db8:100:: 2001:db8:1::1 -c1
ip addr delete 2001:db8:100:: dev lo
ip netns delete test

Currently the first ping will get back a destination unreachable ICMP
error, but the second will never get a response, with "icmp6_send:
acast source" logged. After this patch, both get destination
unreachable ICMP replies.

Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Tim Stallard <code@timstallard.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: ipv4: devinet: Fix crash when add/del multicast IP with autojoin · 9f482e52

Taras Chornyi authored 4 years ago


stable inclusion
from linux-4.19.117
commit 80dd8146df680b8982b659341b8ecd3361f032ca

--------------------------------

[ Upstream commit 690cc863 ]

When CONFIG_IP_MULTICAST is not set, the kernel crashes when a multicast
IP is added to a device with the autojoin flag, or when a multicast IP is
deleted.

Steps to reproduce:

ip addr add 224.0.0.0/32 dev eth0
ip addr del 224.0.0.0/32 dev eth0

or

ip addr add 224.0.0.0/32 dev eth0 autojoin

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000088
 pc : _raw_write_lock_irqsave+0x1e0/0x2ac
 lr : lock_sock_nested+0x1c/0x60
 Call trace:
  _raw_write_lock_irqsave+0x1e0/0x2ac
  lock_sock_nested+0x1c/0x60
  ip_mc_config.isra.28+0x50/0xe0
  inet_rtm_deladdr+0x1a8/0x1f0
  rtnetlink_rcv_msg+0x120/0x350
  netlink_rcv_skb+0x58/0x120
  rtnetlink_rcv+0x14/0x20
  netlink_unicast+0x1b8/0x270
  netlink_sendmsg+0x1a0/0x3b0
  ____sys_sendmsg+0x248/0x290
  ___sys_sendmsg+0x80/0xc0
  __sys_sendmsg+0x68/0xc0
  __arm64_sys_sendmsg+0x20/0x30
  el0_svc_common.constprop.2+0x88/0x150
  do_el0_svc+0x20/0x80
  el0_sync_handler+0x118/0x190
  el0_sync+0x140/0x180

Fixes: 93a714d6 ("multicast: Extend ip address command to enable multicast group join/leave on")
Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu>
Signed-off-by: Vadym Kochan <vadym.kochan@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE · 1ead4376

Petr Machata authored 4 years ago


stable inclusion
from linux-4.19.115
commit b12448912c5e3c38f5baa58fb1f8912a1926a542

--------------------------------

[ Upstream commit ccfc5693 ]

The handler for FLOW_ACTION_VLAN_MANGLE ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.
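The bug pattern and its fix can be sketched with a hypothetical action list and offload callback (demo_* names are not the mlxsw API). Returning the callee's result directly ends the loop even on success, while checking the result and breaking out of the switch lets the remaining actions be offloaded.

```c
#include <stddef.h>

/* Hypothetical action kinds. */
enum demo_action { DEMO_ACT_DROP, DEMO_ACT_VLAN_MANGLE, DEMO_ACT_MIRROR };

static int demo_offloaded;	/* number of actions handed to "hardware" */

/* Hypothetical lower-level offload call: 0 on success, negative on error. */
static int demo_offload_one(enum demo_action act)
{
	(void)act;
	demo_offloaded++;
	return 0;
}

static int demo_offload_actions(const enum demo_action *acts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		switch (acts[i]) {
		case DEMO_ACT_VLAN_MANGLE:
			/*
			 * Buggy form: "return demo_offload_one(acts[i]);"
			 * leaves the loop even on success, so later actions
			 * are never offloaded. Bail out only on error.
			 */
			err = demo_offload_one(acts[i]);
			if (err)
				return err;
			break;
		default:
			err = demo_offload_one(acts[i]);
			if (err)
				return err;
			break;
		}
	}
	return 0;
}
```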

Fixes: a150201a ("mlxsw: spectrum: Add support for vlan modify TC action")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ipv6: don't auto-add link-local address to lag ports · 3b7059af

Jarod Wilson authored 4 years ago


stable inclusion
from linux-4.19.115
commit 7a5f4bd6868cc21ea9d4471265d662f7c487c3fc

--------------------------------

[ Upstream commit 744fdc82 ]

Bonding slave and team port devices should not have link-local addresses
automatically added to them, as it can interfere with openvswitch being
able to properly add tc ingress.

Basic reproducer, courtesy of Marcelo:

$ ip link add name bond0 type bond
$ ip link set dev ens2f0np0 master bond0
$ ip link set dev ens2f1np2 master bond0
$ ip link set dev bond0 up
$ ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
mq master bond0 state UP group default qlen 1000
    link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
mq master bond0 state DOWN group default qlen 1000
    link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
11: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UP group default qlen 1000
    link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20f:53ff:fe2f:ea40/64 scope link
       valid_lft forever preferred_lft forever

(above trimmed to relevant entries, obviously)

$ sysctl net.ipv6.conf.ens2f0np0.addr_gen_mode=0
net.ipv6.conf.ens2f0np0.addr_gen_mode = 0
$ sysctl net.ipv6.conf.ens2f1np2.addr_gen_mode=0
net.ipv6.conf.ens2f1np2.addr_gen_mode = 0

$ ip a l ens2f0np0
2: ens2f0np0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc
mq master bond0 state UP group default qlen 1000
    link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
       valid_lft forever preferred_lft forever
$ ip a l ens2f1np2
5: ens2f1np2: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc
mq master bond0 state DOWN group default qlen 1000
    link/ether 00:0f:53:2f:ea:40 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20f:53ff:fe2f:ea40/64 scope link tentative
       valid_lft forever preferred_lft forever

It looks like addrconf_sysctl_addr_gen_mode() bypasses the original "is
this a slave interface?" check added by commit c2edacf8, and results in
an address getting added, while with the proposed patch applied, no
address gets added. This simply adds the same gating check to another
code path, and thus should prevent the same devices from erroneously
obtaining an IPv6 link-local address.
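The idea of one gate shared by every path that can create the address can be sketched as follows. The device flags and all demo_* functions are hypothetical stand-ins for the addrconf notifier and the addr_gen_mode sysctl handler.

```c
#include <stdbool.h>

/* Hypothetical device flags. */
#define DEMO_IFF_SLAVE		0x1u	/* bonding slave */
#define DEMO_IFF_TEAM_PORT	0x2u	/* team port */

struct demo_dev {
	unsigned int flags;
	int ll_addrs;		/* link-local addresses configured so far */
};

/* Shared gate: LAG member ports never get an automatic link-local address. */
static bool demo_is_lag_port(const struct demo_dev *dev)
{
	return (dev->flags & (DEMO_IFF_SLAVE | DEMO_IFF_TEAM_PORT)) != 0;
}

static void demo_add_link_local(struct demo_dev *dev)
{
	dev->ll_addrs++;
}

/* Path 1: the device comes up. */
static void demo_on_dev_up(struct demo_dev *dev)
{
	if (demo_is_lag_port(dev))
		return;
	demo_add_link_local(dev);
}

/* Path 2: addr_gen_mode changed via sysctl; this path lacked the gate. */
static void demo_on_addr_gen_mode_change(struct demo_dev *dev)
{
	if (demo_is_lag_port(dev))
		return;
	demo_add_link_local(dev);
}
```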

Fixes: d35a00b8 ("net/ipv6: allow sysctl to change link-local address generation mode")
Reported-by: Moshe Levi <moshele@mellanox.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Marcelo Ricardo Leitner <mleitner@redhat.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: Fix Tx hash bound checking · 4553d0bb

Amritha Nambiar authored 4 years ago


stable inclusion
from linux-4.19.115
commit b1cb7f2bc9b4f776ae1ab9583802b1bca34d215a

--------------------------------

commit 6e11d157 upstream.

Fixes the lower and upper bounds when there are multiple TCs and
traffic is on the same TC on the same device.

The lower bound is represented by 'qoffset' and the upper limit for
hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
mapping when there are multiple TCs, as the queue indices for upper TCs
will be offset by 'qoffset'.
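The bound logic can be sketched as below, assuming a traffic class whose Tx queues are [qoffset, qoffset + qcount). The helpers are hypothetical simplifications. The flow-hash path uses the multiply-and-shift scaling that the kernel's reciprocal_scale() performs. Both helpers assume qcount > 0.

```c
#include <stdint.h>

/* Map a recorded Rx queue index onto the Tx queues of one traffic class. */
static uint16_t demo_rxq_to_txq(uint16_t rxq, uint16_t qoffset, uint16_t qcount)
{
	uint16_t hash = rxq;

	if (hash >= qoffset)
		hash -= qoffset;	/* relative to the TC's first queue */
	while (hash >= qcount)
		hash -= qcount;		/* fold into the TC's queue count */
	return hash + qoffset;		/* back to an absolute queue index */
}

/* Flow-hash path: scale a 32-bit hash into [0, qcount), then offset it. */
static uint16_t demo_hash_to_txq(uint32_t hash, uint16_t qoffset, uint16_t qcount)
{
	return (uint16_t)(((uint64_t)hash * qcount) >> 32) + qoffset;
}
```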

v2: Fixed commit description based on comments.

Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

sctp: fix possibly using a bad saddr with a given dst · 611f604d

Marcelo Ricardo Leitner authored 4 years ago


stable inclusion
from linux-4.19.115
commit e2ed7b117f3fe6aa0237568dcb69ed7f39cb4979

--------------------------------

[ Upstream commit 582eea23 ]

Under certain circumstances, depending on the order of addresses on the
interfaces, it could be that sctp_v[46]_get_dst() would return a dst
with a mismatched struct flowi.

For example, when walking through the bind addresses, if the first
one is not a match, it saves the dst as a fallback (added in
410f0383), but not the flowi. Then if the next one is also not a
match, the previous dst will be returned but with the flowi information
for the 2nd address, which is wrong.

The fix is to use a locally stored flowi that can be used for such
attempts, and copy it to the parameter only in case it is a possible
match, together with the corresponding dst entry.
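The fix can be modelled with hypothetical types (not the SCTP structures): each attempt builds its flow in a local variable, and the fallback dst is recorded together with the flow that produced it, so the returned pair always belongs to the same attempt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flow description and per-address lookup result. */
struct demo_flow { int saddr; };

struct demo_candidate {
	int saddr;		/* bind address tried as the source */
	int dst_id;		/* route found for this attempt, 0 if none */
	bool exact_match;	/* route's source matches this bind address */
};

/* Return the chosen dst id and fill @out_fl with the flow that produced it. */
static int demo_get_dst(const struct demo_candidate *c, size_t n,
			struct demo_flow *out_fl)
{
	int fallback_dst = 0;

	for (size_t i = 0; i < n; i++) {
		struct demo_flow fl = { .saddr = c[i].saddr };	/* per-attempt */

		if (!c[i].dst_id)
			continue;
		if (c[i].exact_match) {
			*out_fl = fl;
			return c[i].dst_id;
		}
		if (!fallback_dst) {
			fallback_dst = c[i].dst_id;
			*out_fl = fl;	/* keep dst and flow in step */
		}
	}
	return fallback_dst;
}
```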

The patch updates the IPv6 code mostly just to keep it in sync. Even though
the issue is also present there, the fallback is not expected to work with IPv6.

Fixes: 410f0383 ("sctp: add routing output fallback")
Reported-by: Jin Meng <meng.a.jin@nokia-sbell.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
  net/sctp/ipv6.c
[yyl: adjust context]

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Wenan Mao <maowenan@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

sctp: fix refcount bug in sctp_wfree · dff678d5

Qiujun Huang authored 4 years ago


stable inclusion
from linux-4.19.115
commit 6ce6aea362d46781d4f5f03cfda16f0a395445d2

--------------------------------

[ Upstream commit 5c3e82fe ]

We should iterate over the datamsgs to move
all chunks (skbs) to newsk.

The following case causes the bug. The troublesome SKB
was on the outq->transmitted list:

sctp_outq_sack
        sctp_check_transmitted
                SKB was moved to outq->sacked list
        then throw away the sack queue
                SKB was deleted from outq->sacked
(but it was held by datamsg at sctp_datamsg_to_asoc
So, sctp_wfree was not called here)

then migrate happened

        sctp_for_each_tx_datachunk(
        sctp_clear_owner_w);
        sctp_assoc_migrate();
        sctp_for_each_tx_datachunk(
        sctp_set_owner_w);
SKB was not in the outq, and was not changed to newsk

finally

__sctp_outq_teardown
        sctp_chunk_put (for another skb)
                sctp_datamsg_put
                        __kfree_skb(msg->frag_list)
                                sctp_wfree (for SKB)
	SKB->sk was still oldsk (skb->sk != asoc->base.sk).

Reported-and-tested-by:  <syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com>
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net, ip_tunnel: fix interface lookup with no key · 9e20922a

William Dauchy authored 4 years ago


stable inclusion
from linux-4.19.115
commit 48dee02237117c0758410fa4989ce71bdb6cf184

--------------------------------

[ Upstream commit 25629fda ]

When creating a new ipip interface with no local/remote configuration,
the lookup is done with the TUNNEL_NO_KEY flag, making it impossible to
match the new interface (the only possible matches being the fallback or
metadata-mode interface); e.g. `ip link add tunl1 type ipip dev eth0`

To fix this case, add a flag check before the key comparison so that an
interface with no local/remote config can be matched; this also avoids
breaking possible userland tools relying on the TUNNEL_NO_KEY flag and an
uninitialised key.

For context, on my side I'm creating an extra ipip interface attached
to the physical one, and moving it to a dedicated namespace.
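A minimal sketch of the flag check, assuming a hypothetical tunnel structure and flag value rather than the real struct ip_tunnel and TUNNEL_* bits. The key is compared only when the lookup actually carries one, so an uninitialised key passed together with the no-key flag cannot cause a spurious mismatch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TUNNEL_NO_KEY	0x1u	/* hypothetical: lookup carries no key */

struct demo_tunnel {
	uint32_t i_key;		/* configured input key */
	int link;		/* underlying device index */
};

/* Candidate match for a tunnel with no local/remote address. */
static bool demo_tunnel_match(const struct demo_tunnel *t, unsigned int flags,
			      uint32_t key, int link)
{
	if (!(flags & DEMO_TUNNEL_NO_KEY) && t->i_key != key)
		return false;	/* keys only matter when one was supplied */
	return t->link == link;
}
```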

Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ipv4: fix a RCU-list lock in fib_triestat_seq_show · 84989d84

Qian Cai authored 4 years ago


stable inclusion
from linux-4.19.115
commit 6f2239a1ad0c965d9faeb5d175f8c6c163b4fa57

--------------------------------

[ Upstream commit fbe4e0c1 ]

fib_triestat_seq_show() calls hlist_for_each_entry_rcu(tb, head,
tb_hlist) without holding rcu_read_lock(), which triggers a warning:

 net/ipv4/fib_trie.c:2579 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 rcu_scheduler_active = 2, debug_locks = 1
 1 lock held by proc01/115277:
  #0: c0000014507acf00 (&p->lock){+.+.}-{3:3}, at: seq_read+0x58/0x670

 Call Trace:
  dump_stack+0xf4/0x164 (unreliable)
  lockdep_rcu_suspicious+0x140/0x164
  fib_triestat_seq_show+0x750/0x880
  seq_read+0x1a0/0x670
  proc_reg_read+0x10c/0x1b0
  __vfs_read+0x3c/0x70
  vfs_read+0xac/0x170
  ksys_read+0x7c/0x140
  system_call+0x5c/0x68

Fix it by adding a pair of rcu_read_lock()/rcu_read_unlock() calls, and
use cond_resched_rcu() so that walking a large number of items does not
prevent scheduling for a long time.

Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

vti6: Fix memory leak of skb if input policy check fails · 0de43bda

Torsten Hilbrich authored 4 years ago


stable inclusion
from linux-4.19.114
commit 7df44c92854964ff5540756dd47507908e4e63cf

--------------------------------

commit 2a9de3af upstream.

The vti6_rcv function performs some tests on the retrieved tunnel
including checking the IP protocol, the XFRM input policy, the
source and destination address.

In all but one place the skb is released in the error case. When
the input policy check fails, the network packet is leaked.

Fix this by using the same goto label, discard, in this case.
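The single-exit cleanup pattern can be sketched in user space with hypothetical types and helpers (demo_* names; free() stands in for kfree_skb()). Every failed check jumps to the same label, so no error path can forget to release the packet, while the success path hands ownership to the consumer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical packet and a counter of discarded packets. */
struct demo_skb { int payload; };
static int demo_freed;

static void demo_kfree_skb(struct demo_skb *skb)
{
	free(skb);
	demo_freed++;
}

/* Hypothetical consumer that takes ownership of @skb on success. */
static int demo_deliver(struct demo_skb *skb)
{
	free(skb);
	return 0;
}

static int demo_vti6_rcv(struct demo_skb *skb, bool proto_ok, bool policy_ok)
{
	if (!proto_ok)
		goto discard;
	if (!policy_ok)
		goto discard;	/* this branch used to return without freeing */
	return demo_deliver(skb);

discard:
	demo_kfree_skb(skb);
	return 0;
}
```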

Fixes: ed1efb2a ("ipv6: Add support for IPsec virtual tunnel interfaces")
Signed-off-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

netfilter: nft_fwd_netdev: validate family and chain type · 374017ff

Pablo Neira Ayuso authored 4 years ago


stable inclusion
from linux-4.19.114
commit 24c290b811945102e2c0e51cfe4b9efea9ae49d4

--------------------------------

commit 76a109fa upstream.

Make sure the forward action is only used from ingress.
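One way to express "only from ingress" is a per-expression mask of allowed hooks that is checked when the rule is validated. The sketch assumes that shape. The hook enum and demo_validate_hooks() are hypothetical, and the -EOPNOTSUPP error code is an assumption rather than something this commit message states.

```c
#include <errno.h>

/* Hypothetical hook numbers for a netdev-family chain. */
enum demo_hook { DEMO_HOOK_INGRESS = 0, DEMO_HOOK_OTHER = 1 };

/*
 * Accept an expression only if the chain's hook is in @allowed_mask
 * (one bit per hook); otherwise report the combination as unsupported.
 */
static int demo_validate_hooks(enum demo_hook chain_hook, unsigned int allowed_mask)
{
	return (allowed_mask & (1u << chain_hook)) ? 0 : -EOPNOTSUPP;
}
```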

Fixes: 39e6dea2 ("netfilter: nf_tables: add forward expression to the netdev family")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6} · 487b06c5

Haishuang Yan authored 4 years ago


stable inclusion
from linux-4.19.114
commit 113df2c58a723b6e30b3f0b7b5bf1dee16d177db

--------------------------------

commit 41e9ec5a upstream.

Since pskb_may_pull() may change skb->data, we need to reload ip{v6}h at
the right place.
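The hazard is the usual one for any buffer that may be reallocated: pointers into it must be recomputed after the call that may move it. A user-space sketch, assuming a hypothetical demo_may_pull() that always relocates the data to make the effect visible (the real pskb_may_pull() only sometimes does):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer. */
struct demo_pkt {
	unsigned char *data;
	size_t len;
};

/*
 * Model of pskb_may_pull(): succeed if @need bytes are available, but
 * move the data to a new buffer, so any header pointer taken before the
 * call is left dangling.
 */
static bool demo_may_pull(struct demo_pkt *p, size_t need)
{
	unsigned char *copy;

	if (need > p->len)
		return false;
	copy = malloc(p->len);
	if (!copy)
		return false;
	memcpy(copy, p->data, p->len);
	free(p->data);
	p->data = copy;
	return true;
}

/* Read the IPv4 protocol field (byte 9), loading the header after the pull. */
static int demo_get_proto(struct demo_pkt *p, unsigned char *proto)
{
	const unsigned char *iph;

	if (!demo_may_pull(p, 20))
		return -1;
	iph = p->data;		/* (re)load only after the pull */
	*proto = iph[9];
	return 0;
}
```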

Fixes: a908fdec ("netfilter: nf_flow_table: move ipv6 offload hook code to nf_flow_table")
Fixes: 7d208687 ("netfilter: nf_flow_table: move ipv4 offload hook code to nf_flow_table")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

487b06c5

xfrm: policy: Fix double free in xfrm_policy_timer · 53559563

YueHaibing authored 4 years ago


stable inclusion
from linux-4.19.114
commit 7ad217a824f7fab1e8534a6dfa82899ae1900bcb

--------------------------------

commit 4c59406e upstream.

After xfrm_add_policy() adds a policy, its refcount is 2; then:

                             xfrm_policy_timer
                               read_lock
                               xp->walk.dead is 0
                               ....
                               mod_timer()
xfrm_policy_kill
  policy->walk.dead = 1
  ....
  del_timer(&policy->timer)
    xfrm_pol_put //ref is 1
  xfrm_pol_put  //ref is 0
    xfrm_policy_destroy
      call_rcu
                                 xfrm_pol_hold //ref is 1
                               read_unlock
                               xfrm_pol_put //ref is 0
                                 xfrm_policy_destroy
                                  call_rcu

xfrm_policy_destroy() is called twice, which may lead to a
double free.

Call Trace:
RIP: 0010:refcount_warn_saturate+0x161/0x210
...
 xfrm_policy_timer+0x522/0x600
 call_timer_fn+0x1b3/0x5e0
 ? __xfrm_decode_session+0x2990/0x2990
 ? msleep+0xb0/0xb0
 ? _raw_spin_unlock_irq+0x24/0x40
 ? __xfrm_decode_session+0x2990/0x2990
 ? __xfrm_decode_session+0x2990/0x2990
 run_timer_softirq+0x5c5/0x10e0

Fix this by using write_lock_bh() in xfrm_policy_kill().

Fixes: ea2dea9d ("xfrm: remove policy lock when accessing policy->walk.dead")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

53559563

xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire · 1f7e73e4

Xin Long authored 4 years ago


stable inclusion
from linux-4.19.114
commit 0a7b397c013322fec975f30012302f694efba2da

--------------------------------

commit a1a7e3a3 upstream.

Without the verify_sec_ctx_len() check in xfrm_add_acquire(), accessing
uctx->ctx_str with uctx->ctx_len may go out of bounds, as noticed by
syzkaller:

  BUG: KASAN: slab-out-of-bounds in selinux_xfrm_alloc_user+0x237/0x430
  Read of size 768 at addr ffff8880123be9b4 by task syz-executor.1/11650

  Call Trace:
   dump_stack+0xe8/0x16e
   print_address_description.cold.3+0x9/0x23b
   kasan_report.cold.4+0x64/0x95
   memcpy+0x1f/0x50
   selinux_xfrm_alloc_user+0x237/0x430
   security_xfrm_policy_alloc+0x5c/0xb0
   xfrm_policy_construct+0x2b1/0x650
   xfrm_add_acquire+0x21d/0xa10
   xfrm_user_rcv_msg+0x431/0x6f0
   netlink_rcv_skb+0x15a/0x410
   xfrm_netlink_rcv+0x6d/0x90
   netlink_unicast+0x50e/0x6a0
   netlink_sendmsg+0x8ae/0xd40
   sock_sendmsg+0x133/0x170
   ___sys_sendmsg+0x834/0x9a0
   __sys_sendmsg+0x100/0x1e0
   do_syscall_64+0xe5/0x660
   entry_SYSCALL_64_after_hwframe+0x6a/0xdf

So fix it by adding the missing verify_sec_ctx_len check there.

Fixes: 980ebd25 ("[IPSEC]: Sync series - acquire insert")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1f7e73e4

xfrm: fix uctx len check in verify_sec_ctx_len · ab80f736

Xin Long authored 4 years ago


stable inclusion
from linux-4.19.114
commit cf265c64c91957fd0f1b86b7427028d823966d74

--------------------------------

commit 171d449a upstream.

It's not sufficient to do only the 'uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
uctx->ctx_len)' check, as uctx->len may be greater than nla_len(rt),
in which case accessing uctx->ctx_str later causes a slab-out-of-bounds
access.

This patch fixes it by returning -EINVAL when uctx->len > nla_len(rt).
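The two conditions can be modelled in userspace as below. This is a sketch, not the kernel function: `struct sec_ctx_hdr` only mirrors the fixed header fields for illustration, and `attr_len` plays the role of nla_len(rt), the number of bytes the attribute really carries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the fixed header preceding the context string (layout assumed). */
struct sec_ctx_hdr {
	uint16_t len;     /* total length claimed by userspace */
	uint16_t exttype;
	uint8_t  ctx_alg;
	uint8_t  ctx_doi;
	uint16_t ctx_len; /* length of the context string that follows */
};

/*
 * Both checks are needed: the header must not claim more bytes than the
 * attribute carries, AND it must be self-consistent with ctx_len.
 */
static int check_sec_ctx_len(const struct sec_ctx_hdr *u, size_t attr_len)
{
	assert(u != NULL);
	if (u->len > attr_len)
		return -EINVAL;
	if (u->len != sizeof(*u) + u->ctx_len)
		return -EINVAL;
	return 0;
}
```

With only the second check, a header claiming `len = 12` inside a 10-byte attribute would pass, and a later copy of `ctx_len` bytes would read past the end of the attribute.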

Fixes: df71837d ("[LSM-IPSec]: Security association restriction.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ab80f736

vti[6]: fix packet tx through bpf_redirect() in XinY cases · b8d2c6e0

Nicolas Dichtel authored 4 years ago


stable inclusion
from linux-4.19.114
commit f8ee708284e1d62ecc345908b40b7f9ccca4e603

--------------------------------

commit f1ed1026 upstream.

I forgot the 4in6/6in4 cases in my previous patch. Let's fix them.

Fixes: 95224166 ("vti[6]: fix packet tx through bpf_redirect()")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b8d2c6e0

xfrm: handle NETDEV_UNREGISTER for xfrm device · 8e5cf8c0

Raed Salem authored 4 years ago


stable inclusion
from linux-4.19.114
commit cb2775c906eed8f350b8deed7d681bf285fbcb72

--------------------------------

commit 03891f82 upstream.

This patch handles the asynchronous unregister
device event so that the device's IPsec offload resources
can be cleanly released.

Fixes: e4db5b61 ("xfrm: policy: remove pcpu policy cache")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8e5cf8c0

ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL · 54a62c9f

Ilya Dryomov authored 4 years ago


stable inclusion
from linux-4.19.114
commit 1e2d0c50980c55f84035adf7e7cece8a19e6b9ec

--------------------------------

commit 76142097 upstream.

CEPH_OSDMAP_FULL/NEARFULL haven't been set since mimic, so we need to consult
per-pool flags as well.  Unfortunately the backwards compatibility here
is lacking:

- the change that deprecated OSDMAP_FULL/NEARFULL went into mimic, but
  was guarded by require_osd_release >= RELEASE_LUMINOUS
- it was subsequently backported to luminous in v12.2.2, but that makes
  no difference to clients that only check OSDMAP_FULL/NEARFULL because
  require_osd_release is not client-facing -- it is for OSDs

Since all kernels are affected, the best we can do here is just start
checking both map flags and pool flags and send that to stable.

These checks are best effort, so take osdc->lock and look up pool flags
just once.  Remove the FIXME, since filesystem quotas are checked above
and RADOS quotas are reflected in POOL_FLAG_FULL: when the pool reaches
its quota, both POOL_FLAG_FULL and POOL_FLAG_FULL_QUOTA are set.
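A minimal sketch of the "check both" logic, assuming hypothetical flag values (the real bit definitions live in the Ceph headers and may differ) and omitting the osdc->lock handling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define MAP_FLAG_FULL        (1u << 0)
#define MAP_FLAG_NEARFULL    (1u << 1)
#define POOL_FLAG_FULL       (1u << 0)
#define POOL_FLAG_NEARFULL   (1u << 1)
#define POOL_FLAG_FULL_QUOTA (1u << 2)

/*
 * Consult the legacy map-wide flag AND the per-pool flag, since newer
 * clusters set only the latter. A pool at its quota also has
 * POOL_FLAG_FULL set, so no separate quota test is needed.
 */
static bool pool_is_full(uint32_t map_flags, uint32_t pool_flags)
{
	return (map_flags & MAP_FLAG_FULL) || (pool_flags & POOL_FLAG_FULL);
}

static bool pool_is_nearfull(uint32_t map_flags, uint32_t pool_flags)
{
	return (map_flags & MAP_FLAG_NEARFULL) ||
	       (pool_flags & POOL_FLAG_NEARFULL);
}
```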

Cc: stable@vger.kernel.org
Reported-by: Yanhu Cao <gmayyyha@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

54a62c9f

vxlan: check return value of gro_cells_init() · badededb

Taehee Yoo authored 4 years ago


stable inclusion
from linux-4.19.114
commit facf9c7ecc2f0d8e8c65e4d532f690dc5e7aa659

--------------------------------

[ Upstream commit 384d91c2 ]

gro_cells_init() returns an error if memory allocation fails.
But the vxlan module doesn't check the return value of gro_cells_init().

Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

badededb

tcp: repair: fix TCP_QUEUE_SEQ implementation · d5262c62

Eric Dumazet authored 4 years ago


stable inclusion
from linux-4.19.114
commit 58b501cc08ccd688c4dd3d202cbdc4e36aeff79a

--------------------------------

[ Upstream commit 6cd6cbf5 ]

When an application uses the TCP_QUEUE_SEQ socket option to
change tp->rcv_nxt, we must also update tp->copied_seq.

Otherwise, anything relying on tcp_inq() being precise can
eventually be confused.

For example, tcp_zerocopy_receive() might crash because
it does not expect tcp_recv_skb() to return NULL.

We could add tests in various places to fix the issue,
or simply make sure tcp_inq() won't return a random value,
and leave the fast path as it is.

Note that this fixes ioctl(fd, SIOCINQ, &val) at the same
time.
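The invariant can be sketched with a two-field model. `struct rx_seq`, `rx_inq()` and `rx_set_queue_seq()` are hypothetical names; `rx_inq()` computes the same difference of sequence numbers that tcp_inq() reports for the simple case, using unsigned arithmetic so wraparound is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the receive-side sequence state (hypothetical struct). */
struct rx_seq {
	uint32_t rcv_nxt;    /* next sequence number expected from the peer */
	uint32_t copied_seq; /* first sequence number not yet read by the app */
};

/* Bytes queued but not yet read; unsigned subtraction handles wraparound. */
static uint32_t rx_inq(const struct rx_seq *s)
{
	return s->rcv_nxt - s->copied_seq;
}

/*
 * Repair-mode update: moving rcv_nxt alone would leave copied_seq stale,
 * so rx_inq() would report an arbitrary amount of queued data.
 */
static void rx_set_queue_seq(struct rx_seq *s, uint32_t val)
{
	s->rcv_nxt = val;
	s->copied_seq = val;
}
```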

Fixes: ee995283 ("tcp: Initial repair mode")
Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d5262c62

net: ip_gre: Accept IFLA_INFO_DATA-less configuration · 0e792e07

Petr Machata authored 4 years ago


stable inclusion
from linux-4.19.114
commit f5ebb2dd86777379a552acce0d635de8210a427c

--------------------------------

[ Upstream commit 32ca98fe ]

The fix referenced below causes a crash when an ERSPAN tunnel is created
without passing IFLA_INFO_DATA. Fix by validating passed-in data in the
same way as ipgre does.

Fixes: e1f8f78f ("net: ip_gre: Separate ERSPAN newlink / changelink callbacks")
Reported-by:  <syzbot+1b4ebf4dae4e510dd219@syzkaller.appspotmail.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0e792e07

net: ip_gre: Separate ERSPAN newlink / changelink callbacks · f79367cf

Petr Machata authored 4 years ago


stable inclusion
from linux-4.19.114
commit 54266b2694682e7207ec66bce59f4f5323727dd3

--------------------------------

[ Upstream commit e1f8f78f ]

ERSPAN shares most of the code path with GRE and gretap code. While that
helps keep the code compact, it is also error prone. Currently a broken
userspace can turn a gretap tunnel into a de facto ERSPAN one by passing
IFLA_GRE_ERSPAN_VER. There has been a similar issue in ip6gretap in the
past.

To prevent these problems in the future, split the newlink and changelink code
paths. Split the ERSPAN code out of ipgre_netlink_parms() into a new
function erspan_netlink_parms(). Extract a piece of common logic from
ipgre_newlink() and ipgre_changelink() into ipgre_newlink_encap_setup().
Add erspan_newlink() and erspan_changelink().

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f79367cf

net_sched: keep alloc_hash updated after hash allocation · 2f1f199b

Cong Wang authored 4 years ago


stable inclusion
from linux-4.19.114
commit 557d015ffb27b672e24e6ad141fd887783871dc2

--------------------------------

[ Upstream commit 0d1c3530 ]

In commit 599be01e ("net_sched: fix an OOB access in cls_tcindex")
I moved the cp->hash calculation before the first
tcindex_alloc_perfect_hash(), but left cp->alloc_hash untouched.
This mismatch could lead to another out-of-bounds access.

cp->alloc_hash should always be the size actually allocated, so we
should update it after this tcindex_alloc_perfect_hash().

Reported-and-tested-by:  <syzbot+dcc34d54d68ef7d2d53d@syzkaller.appspotmail.com>
Reported-and-tested-by:  <syzbot+c72da7b9ed57cde6fca2@syzkaller.appspotmail.com>
Fixes: 599be01e ("net_sched: fix an OOB access in cls_tcindex")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2f1f199b

net_sched: cls_route: remove the right filter from hashtable · c5b68284

Cong Wang authored 4 years ago


stable inclusion
from linux-4.19.114
commit ea3d6652c240978736a91b9e85fde9fee9359be4

--------------------------------

[ Upstream commit ef299cc3 ]

route4_change() allocates a new filter and copies values from
the old one. After the new filter is inserted into the hash
table, the old filter should be removed and freed, as the final
step of the update.

However, the current code mistakenly removes the new one. This
looks clearly wrong to me, and it causes a double "free" and a
use-after-free too, as reported by syzbot.
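The correct replace order can be sketched on a plain singly linked bucket. This is a simplification, not route4_change(): `struct filter`, `bucket_remove()` and `filter_change()` are hypothetical, there is no RCU, and the new filter is simply inserted at the head of the same bucket.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical filter node in one hash bucket. */
struct filter {
	unsigned int handle;
	int classid;
	struct filter *next;
};

/* Unlink 'victim' from the bucket at *head; returns 1 if it was found. */
static int bucket_remove(struct filter **head, const struct filter *victim)
{
	for (struct filter **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			return 1;
		}
	}
	return 0;
}

/*
 * Copy the old filter into a new one, publish the new one, then unlink and
 * free the OLD one. Unlinking the new filter by mistake would leave the old
 * filter reachable after it is freed (a use-after-free).
 */
static struct filter *filter_change(struct filter **head, struct filter *old,
				    int new_classid)
{
	struct filter *nf = malloc(sizeof(*nf));

	if (!nf)
		return NULL;
	*nf = *old;              /* copy values from the old filter */
	nf->classid = new_classid;
	nf->next = *head;        /* insert the new filter */
	*head = nf;
	if (bucket_remove(head, old))
		free(old);           /* free only what was actually unlinked */
	return nf;
}
```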

Reported-and-tested-by:  <syzbot+f9b32aaacd60305d9687@syzkaller.appspotmail.com>
Reported-and-tested-by:  <syzbot+2f8c233f131943d6056d@syzkaller.appspotmail.com>
Reported-and-tested-by:  <syzbot+9c2df9fd5e9445b74e01@syzkaller.appspotmail.com>
Fixes: 1109c005 ("net: sched: RCU cls_route")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c5b68284

net/packet: tpacket_rcv: avoid a producer race condition · be6d55c8

Willem de Bruijn authored 4 years ago

stable inclusion
from linux-4.19.114
commit 6fb0e4385928900ccb8697748555b3f54bba5193

--------------------------------

[ Upstream commit 61fad681 ]

PACKET_RX_RING can cause multiple writers to access the same slot if a
fast writer wraps the ring while a slow writer is still copying. This
is particularly likely with few, large, slots (e.g., GSO packets).

Synchronize kernel thread ownership of rx ring slots with a bitmap.

Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL
while holding the sk receive queue lock. They release this lock before
copying and set tp_status to TP_STATUS_USER to release to userspace
when done. During copying, another writer may take the lock, also see
TP_STATUS_KERNEL, and start writing to the same slot.

Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a
slot, test and set with the lock held. To release race-free, update
tp_status and owner bit as a transaction, so take the lock again.

This is one of a variety of discussed options (see Link below):

* instead of a shadow ring, embed the data in the slot itself, such as
in tp_padding. But any test for this field may match a value left by
userspace, causing deadlock.

* avoid the lock on release. This leaves a small race if releasing the
shadow slot before setting TP_STATUS_USER. The reproducer below showed
that this race is not academic. If releasing the slot after tp_status,
the race is more subtle. See the first link for details.

* add a new tp_status TP_KERNEL_OWNED to avoid the transactional store
of two fields. But legacy applications, such as libpcap, may interpret
any non-zero tp_status as owned by the user, so this is possible only
as an opt-in for newer processes. It can be added as an optional mode.

* embed the struct at the tail of pg_vec to avoid extra allocation.
The implementation proved no less complex than a separate field.

The additional locking cost on release adds contention, no different
from scaling on multicore or multiqueue h/w. In practice, neither the
reproducer below nor a small-packet tcpdump showed a noticeable change
in perf report in cycles spent in spinlock. Where contention is
problematic, packet sockets support mitigation through PACKET_FANOUT.
And we can consider adding opt-in state TP_KERNEL_OWNED.

Easy to reproduce by running multiple netperf or similar TCP_STREAM
flows concurrently with `tcpdump -B 129 -n greater 60000`.

Based on an earlier patchset by Jon Rosen. See links below.

I believe this issue goes back to the introduction of tpacket_rcv,
which predates git history.
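The rx_owner_map scheme described above can be sketched in userspace with a C11 atomic_flag spinlock standing in for the receive queue lock. All names here (`struct ring`, `slot_acquire()`, `slot_release()`, the STATUS_* values) are hypothetical; only the locking pattern follows the text: test status and set the owner bit under the lock, copy without the lock, then update status and clear the bit under the lock again.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS        8
#define STATUS_KERNEL 0 /* slot available to a kernel writer */
#define STATUS_USER   1 /* slot handed to userspace */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical ring: per-slot status plus a writer-ownership bitmap. */
struct ring {
	atomic_flag lock; /* stands in for the receive queue spinlock */
	int status[NSLOTS];
	unsigned long owner_map[(NSLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

static void ring_lock(struct ring *r)
{
	while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		; /* spin */
}

static void ring_unlock(struct ring *r)
{
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

static bool owner_test(const struct ring *r, int slot)
{
	return r->owner_map[slot / BITS_PER_WORD] & (1UL << (slot % BITS_PER_WORD));
}

/* Acquire: the status test and the owner-bit set happen under one lock hold. */
static bool slot_acquire(struct ring *r, int slot)
{
	bool ok = false;

	ring_lock(r);
	if (r->status[slot] == STATUS_KERNEL && !owner_test(r, slot)) {
		r->owner_map[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
		ok = true;
	}
	ring_unlock(r);
	return ok;
}

/* Release: publish to userspace and drop ownership as one transaction. */
static void slot_release(struct ring *r, int slot)
{
	ring_lock(r);
	r->status[slot] = STATUS_USER;
	r->owner_map[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
	ring_unlock(r);
}
```

A writer would call slot_acquire(), copy the packet into the slot without holding the lock, then call slot_release(). A second writer that wraps the ring in the meantime still sees STATUS_KERNEL, but the owner bit makes its acquire fail instead of letting it overwrite the slot.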

Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.html


Suggested-by: Jon Rosen <jrosen@cisco.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jon Rosen <jrosen@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

be6d55c8

net: cbs: Fix software cbs to consider packet sending time · 9163dc5a

Zh-yuan Ye authored 4 years ago


stable inclusion
from linux-4.19.114
commit c94fbe2892d523e8706dc60b714a677f20918ad6

--------------------------------

[ Upstream commit 961d0e5b ]

Currently the software CBS does not consider the packet sending time
when depleting the credits. As a result, the throughput was
Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|), whereas
Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope
is expected. To fix this, this patch takes into account the time at
which the packet sending completes, by moving the anchor time variable
"last" ahead to the send completion time upon transmission and adding
a wait when the next dequeue request comes before the send completion
time of the previous packet.
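A quick arithmetic check of the two throughput formulas above. The function names are hypothetical and rates are plain doubles in kbps; for CBS, sendslope = idleslope - port transmit rate, which is why the expected formula reduces to idleslope.

```c
#include <assert.h>

/* |x| for doubles, written out to avoid linking libm. */
static double abs_kbps(double x)
{
	return x < 0 ? -x : x;
}

/* Throughput when the packet sending time is ignored. */
static double cbs_rate_without_fix(double idleslope, double sendslope,
				   double port_rate)
{
	return idleslope * (port_rate / abs_kbps(sendslope));
}

/* Expected throughput; equals idleslope when sendslope = idleslope - port_rate. */
static double cbs_rate_expected(double idleslope, double sendslope,
				double port_rate)
{
	assert(idleslope + abs_kbps(sendslope) > 0);
	return idleslope * (port_rate / (idleslope + abs_kbps(sendslope)));
}
```

For example, with idleslope = 20000 kbps on a 100000 kbps port (sendslope = -80000 kbps), the first formula gives 25000 kbps, above the configured idleslope, while the expected formula gives 20000 kbps.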

changelog:
V2->V3:
 - remove unnecessary whitespace cleanup
 - add the checks if port_rate is 0 before division

V1->V2:
 - combine variable "send_completed" into "last"
 - add the comment for estimate of the packet sending

Fixes: 585d763a ("net/sched: Introduce Credit Based Shaper (CBS) qdisc")
Signed-off-by: Zh-yuan Ye <ye.zh-yuan@socionext.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9163dc5a

mlxsw: spectrum_mr: Fix list iteration in error path · 6991af14

Ido Schimmel authored 4 years ago


stable inclusion
from linux-4.19.114
commit b371fdcd26675e7bc583ac9449c667e2e90b4e7e

--------------------------------

[ Upstream commit f6bf1baf ]

list_for_each_entry_from_reverse() iterates backwards over the list from
the current position, but in the error path we should start from the
previous position.

Fix this by using list_for_each_entry_continue_reverse() instead.

This suppresses the following error from coccinelle:

drivers/net/ethernet/mellanox/mlxsw//spectrum_mr.c:655:34-38: ERROR:
invalid reference to the index variable of the iterator on line 636
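The difference between the two iterators can be illustrated with a plain array in place of the kernel's linked list. This is a sketch; `setup_item()` and `setup_all()` are hypothetical. When setup fails at index i, the rollback must begin at i - 1, which is what list_for_each_entry_continue_reverse() does; starting at i itself (the "from_reverse" behaviour) would undo an item that was never set up.

```c
#include <assert.h>
#include <stdbool.h>

#define NITEMS 5

/* Hypothetical per-item setup that fails at index 'fail_at'. */
static bool setup_item(bool *done, int i, int fail_at)
{
	if (i == fail_at)
		return false;
	done[i] = true;
	return true;
}

/*
 * Set up all items; on failure, undo only the ones that succeeded.
 * Returns 0 on success, -1 on failure; *undone counts rollback steps.
 */
static int setup_all(bool *done, int fail_at, int *undone)
{
	int i;

	*undone = 0;
	for (i = 0; i < NITEMS; i++)
		if (!setup_item(done, i, fail_at))
			goto err;
	return 0;
err:
	for (i = i - 1; i >= 0; i--) { /* continue in reverse from i - 1 */
		assert(done[i]);           /* never touch the failed item */
		done[i] = false;
		(*undone)++;
	}
	return -1;
}
```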

Fixes: c011ec1b ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6991af14

Revert "ipv6: Fix handling of LLA with VRF and sockets bound to VRF" · ed4bd27d

Sasha Levin authored 4 years ago


stable inclusion
from linux-4.19.113
commit a22d7fc61f931e280b77dc755c807548bd1765d9

--------------------------------

This reverts commit 2b3541ffdd05198b329d21920a0f606009a1058b.

This patch shouldn't have been backported to 4.19.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ed4bd27d

Revert "vrf: mark skb for multicast or link-local as enslaved to VRF" · 405398bb

Sasha Levin authored 4 years ago


stable inclusion
from linux-4.19.113
commit ae2f7c84371a2a4c449a92c956d0e4f83565e257

--------------------------------

This reverts commit 91c5f99d131ed3b231aaef7d4ed6799085b095a3.

This patch shouldn't have been backported to 4.19.

Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

405398bb

ipv4: ensure rcu_read_lock() in cipso_v4_error() · b1a83ac2

Matteo Croce authored 4 years ago


stable inclusion
from linux-4.19.112
commit b4176d3b1a820f792e36d7cadd5bf0eeaf71fb09

--------------------------------

commit 3e72dfdf upstream.

Similarly to commit c543cb4a ("ipv4: ensure rcu_read_lock() in
ipv4_link_failure()"), __ip_options_compile() must be called under rcu
protection.

Fixes: 3da1ed7a ("net: avoid use IPCB in cipso_v4_error")
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b1a83ac2

netfilter: nft_tunnel: add missing attribute validation for tunnels · 27123a7a

Jakub Kicinski authored 4 years ago


stable inclusion
from linux-4.19.111
commit 5ae2daf9977a1fa4f153c20e1996ba28a54a66d1

--------------------------------

commit 88a63771 upstream.

Add missing attribute validation for tunnel source and
destination ports to the netlink policy.

Fixes: af308b94 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

27123a7a

netfilter: nft_payload: add missing attribute validation for payload csum flags · 57551287

Jakub Kicinski authored 4 years ago


stable inclusion
from linux-4.19.111
commit 64d43185eba6d61467db53ca026fdeb66fe78646

--------------------------------

commit 9d6effb2 upstream.

Add missing attribute validation for NFTA_PAYLOAD_CSUM_FLAGS
to the netlink policy.

Fixes: 18140969 ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

57551287

netfilter: cthelper: add missing attribute validation for cthelper · 27318ff6

Jakub Kicinski authored 4 years ago


stable inclusion
from linux-4.19.111
commit 5b425d389ed2627aa04739a076b9da9a9adaad9e

--------------------------------

commit c049b345 upstream.

Add missing attribute validation for cthelper
to the netlink policy.

Fixes: 12f7a505 ("netfilter: add user-space connection tracking helper infrastructure")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Li Aichun <liaichun@huawei.com>
Reviewed-by: guodeqing <geffrey.guo@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

27318ff6