Commits · 6c76c5928c44c20298281f1ff161baac08c7e40d · Summer2022 / 22b970264

Oct 11, 2022

netfilter: ebtables: reject blobs that don't provide all entry points · 6c76c592

stable inclusion
from stable-v4.19.257
commit 358765beb836f5fc2ed26b5df4140d5d3548ac11
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]

Harshit Mogalapalli says:
 In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
 can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
 nf_hook_slow+0xb1/0x170
 __br_forward+0x289/0x730
 maybe_deliver+0x24b/0x380
 br_flood+0...

6c76c592

mm: Fix TLB flush for not-first PFNMAP mappings in unmap_region() · e7a18d60

Jann Horn authored 2 years ago

stable inclusion
from stable-v4.19.259
commit 56fa5f3dd44a05a5eacd75ae9d00c5415046d371
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

This is a stable-specific patch.
I botched the stable-specific rewrite of
commit b67fbebd4cf98 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"):
As Hugh pointed out, unmap_region() actually operates on a list of VMAs,
and the variable "vma" merely points to the first VMA in that list.
So if we want to check whether any of the VMAs we're operating on is
PFNMAP or MIXEDMAP, we have to iterate through the list and check each VMA.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e7a18d60

SUNRPC: use _bh spinlocking on ->transport_lock · 7cbdfd4c

NeilBrown authored 2 years ago

stable inclusion
from stable-v4.19.258
commit bcab4d551a3d15bdf8efaac7b61420219bd154b6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

Prior to Linux 5.3, ->transport_lock in sunrpc required the _bh style
spinlocks (when not called from a bottom-half handler).

When upstream 3848e96edf4788f772d83990022fa7023a233d83 was backported to
stable kernels, the spin_lock/unlock calls should have been changed to
the _bh version, but this wasn't noted in the patch and didn't happen.

So convert these lock/unlock calls to the _bh versions.

This patch is required for any stable kernel prior to 5.3 to which the
above mentioned patch was backported.  Namely 4.9.y, 4.14.y, 4.19.y.

Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Reviewed-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

7cbdfd4c

tcp: fix early ETIMEDOUT after spurious non-SACK RTO · d2189b9e

Neal Cardwell authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 4d941fdf910bd6da3a017627ef9c08e9f3f17c06
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 686dc2db2a0fdc1d34b424ec2c0a735becd8d62b ]

Fix a bug reported and analyzed by Nagaraj Arankal, where the handling
of a spurious non-SACK RTO could cause a connection to fail to clear
retrans_stamp, causing a later RTO to very prematurely time out the
connection with ETIMEDOUT.

Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent
report:

(*1) Send one data packet on a non-SACK connection

(*2) Because no ACK packet is received, the packet is retransmitted
     and we enter CA_Loss; but this retransmission is spurious.

(*3) The ACK for the original data is received. The transmitted packet
     is acknowledged.  The TCP timestamp is before the retrans_stamp,
     so tcp_may_undo() returns true, and tcp_try_undo_loss() returns
     true without changing state to Open (because tcp_is_sack() is
     false), and tcp_process_loss() returns without calling
     tcp_try_undo_recovery().  Normally after undoing a CA_Loss
     episode, tcp_fastretrans_alert() would see that the connection
     has returned to CA_Open and fall through and call
     tcp_try_to_open(), which would set retrans_stamp to 0.  However,
     for non-SACK connections we hold the connection in CA_Loss, so do
     not fall through to call tcp_try_to_open() and do not set
     retrans_stamp to 0. So retrans_stamp is (erroneously) still
     non-zero.

     At this point the first "retransmission event" has passed and
     been recovered from. Any future retransmission is a completely
     new "event". However, retrans_stamp is erroneously still
     set. (And we are still in CA_Loss, which is correct.)

(*4) After 16 minutes (to correspond with tcp_retries2=15), a new data
     packet is sent. Note: No data is transmitted between (*3) and
     (*4) and we disabled keep alives.

     The socket's timeout SHOULD be calculated from this point in
     time, but instead it's calculated from the prior "event" 16
     minutes ago (step (*2)).

(*5) Because no ACK packet is received, the packet is retransmitted.

(*6) At the time of the 2nd retransmission, the socket returns
     ETIMEDOUT, prematurely, because retrans_stamp is (erroneously)
     too far in the past (set at the time of (*2)).

This commit fixes this bug by ensuring that we reuse in
tcp_try_undo_loss() the same careful logic for non-SACK connections
that we have in tcp_try_undo_recovery(). To avoid duplicating logic,
we factor out that logic into a new
tcp_is_non_sack_preventing_reopen() helper and call that helper from
both undo functions.

Fixes: da34ac76 ("tcp: only undo on partial ACKs in CA_Loss")
Reported-by: Nagaraj Arankal <nagaraj.p.arankal@hpe.com>
Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/


Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220903121023.866900-1-ncardwell.kernel@gmail.com


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d2189b9e

netfilter: br_netfilter: Drop dst references before setting. · 858c6aad

Harsh Modi authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 89810dbbffff7efae8d83b39557a50a7dfc65b44
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d047283a7034140ea5da759a494fd2274affdd46 ]

The IPv6 path already drops dst in the daddr changed case, but the IPv4
path does not. This change makes the two code paths consistent.

Further, it is possible that there is already a metadata_dst allocated from
ingress that might already be attached to skbuff->dst while following
the bridge path. If it is not released before setting a new
metadata_dst, it will be leaked. This is similar to what is done in
bpf_set_tunnel_key() or ip6_route_input().

It is important to note that the memory being leaked is not the dst
being set in the bridge code, but rather memory allocated from some
other code path that is not being freed correctly before the skb dst is
overwritten.

An example of the leakage fixed by this commit found using kmemleak:

unreferenced object 0xffff888010112b00 (size 256):
  comm "softirq", pid 0, jiffies 4294762496 (age 32.012s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 80 16 f1 83 ff ff ff ff  ................
    e1 4e f6 82 ff ff ff ff 00 00 00 00 00 00 00 00  .N..............
  backtrace:
    [<00000000d79567ea>] metadata_dst_alloc+0x1b/0xe0
    [<00000000be113e13>] udp_tun_rx_dst+0x174/0x1f0
    [<00000000a36848f4>] geneve_udp_encap_recv+0x350/0x7b0
    [<00000000d4afb476>] udp_queue_rcv_one_skb+0x380/0x560
    [<00000000ac064aea>] udp_unicast_rcv_skb+0x75/0x90
    [<000000009a8ee8c5>] ip_protocol_deliver_rcu+0xd8/0x230
    [<00000000ef4980bb>] ip_local_deliver_finish+0x7a/0xa0
    [<00000000d7533c8c>] __netif_receive_skb_one_core+0x89/0xa0
    [<00000000a879497d>] process_backlog+0x93/0x190
    [<00000000e41ade9f>] __napi_poll+0x28/0x170
    [<00000000b4c0906b>] net_rx_action+0x14f/0x2a0
    [<00000000b20dd5d4>] __do_softirq+0xf4/0x305
    [<000000003a7d7e15>] __irq_exit_rcu+0xc3/0x140
    [<00000000968d39a2>] sysvec_apic_timer_interrupt+0x9e/0xc0
    [<000000009e920794>] asm_sysvec_apic_timer_interrupt+0x16/0x20
    [<000000008942add0>] native_safe_halt+0x13/0x20

Florian Westphal says: "Original code was likely fine because nothing
ever did set a skb->dst entry earlier than bridge in those days."

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Harsh Modi <harshmodi@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

858c6aad

debugfs: add debugfs_lookup_and_remove() · a1c25212

Greg Kroah-Hartman authored 2 years ago

stable inclusion
from stable-v4.19.258
commit ebfb744bb6036485dac2a45c46e82b911c3726d1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit dec9b2f1e0455a151a7293c367da22ab973f713e upstream.

There is a very common pattern of using
debugfs_remove(debufs_lookup(..)) which results in a dentry leak of the
dentry that was looked up.  Instead of having to open-code the correct
pattern of calling dput() on the dentry, create
debugfs_lookup_and_remove() to handle this pattern automatically and
properly without any memory leaks.

Cc: stable <stable@kernel.org>
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a1c25212

tcp: annotate data-race around challenge_timestamp · fbf064eb

Eric Dumazet authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 12f99f07a5f4d7ec8d72da6ee8ef66f048634f6a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 8c70521238b7863c2af607e20bcba20f974c969b ]

challenge_timestamp can be read an written by concurrent threads.

This was expected, but we need to annotate the race to avoid potential issues.

Following patch moves challenge_timestamp and challenge_count
to per-netns storage to provide better isolation.

Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

fbf064eb

Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" · 6afb63d8

Yee Lee authored 2 years ago

stable inclusion
from stable-v4.19.258
commit aabdbd63c46ff6fa6661dd9d3aaaebb62141db90
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

This reverts commit 23c2d497de21f25898fbea70aeb292ab8acc8c94.

Commit 23c2d497de21 ("mm: kmemleak: take a full lowmem check in
kmemleak_*_phys()") brought false leak alarms on some archs like arm64
that does not init pfn boundary in early booting. The final solution
lands on linux-6.0: commit 0c24e061196c ("mm: kmemleak: add rbtree and
store physical address for objects allocated with PA").

Revert this commit before linux-6.0. The original issue of invalid PA
can be mitigated by additional check in devicetree.

The false alarm report is as following: Kmemleak output: (Qemu/arm64)
unreferenced object 0xffff0000c0170a00 (size 128):
  comm "swapper/0", pid 1, jiffies 4294892404 (age 126.208s)
  hex dump (first 32 bytes):
 62 61 73 65 00 ...

6afb63d8

net: neigh: don't call kfree_skb() under spin_lock_irqsave() · afb3765b

Yang Yingliang authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 509e1b324aad95019ff14b8eb88fdfa500d9a489
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit d5485d9dd24e1d04e5509916515260186eb1455c upstream.

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

afb3765b

neigh: fix possible DoS due to net iface start/stop loop · 68d9a8b1

Denis V. Lunev authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 05fdce1ae744dee43c9181fd063c9c0db4f777f2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 66ba215cb51323e4e55e38fd5f250e0fae0cbc94 ]

Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
  procedure
* the request is put into the queue using pneigh_enqueue() if
  corresponding ARP record is not local (common case for container
  records on the host)
* the request is processed by timer (within 80 jiffies by default) and
  ARP reply is sent from the same arp_process() using
  NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
  pneigh_enqueue())

And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().

This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.

Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.

The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.

v2:
	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
	- rebase to net tree

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: kernel@openvz.org
Cc: devel@openvz.org
Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

68d9a8b1

mm/hugetlb: fix hugetlb not supporting softdirty tracking · 67a9ed93

David Hildenbrand authored 2 years ago

stable inclusion
from stable-v4.19.257
commit b18376f359baff4fa5132d3accd1a05cd8d526e4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.

Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.

Reproducers part of the patches.

I propose to backport both fixes to stable trees.  The first fix needs a
small adjustment.

This patch (of 2):

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping.  In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
         if (!fd) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.

Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com


Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

67a9ed93

asm-generic: sections: refactor memory_intersects · 3d014e93

Quanyang Wang authored 2 years ago

stable inclusion
from stable-v4.19.257
commit b5cc57e43f2e5c51ef6ee6d45d97ff7844e9d4ba
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.

There are two problems with the current code of memory_intersects:

First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).

The second problem is if vend is equal to begin, it will return true but
this is wrong since vend (virt + size) is not the last address of the
memory region but (virt + size -1) is.  The wrong determination will
trigger the misreporting when the function check_for_illegal_area calls
memory_intersects to check if the dma region intersects with stext region.

The misreporting is as below (stext is at 0x80100000):
 WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
 Modules linked in:
 CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
 Hardware name: Xilinx Zynq Platform
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xb0/0x198
  __warn from warn_slowpath_fmt+0x80/0xb4
  warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
  check_for_illegal_area from debug_dma_map_sg+0x94/0x368
  debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
  __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
  dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
  usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
  usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
  usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
  usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
  usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
  usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
  usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
  usb_stor_control_thread from kthread+0xf8/0x104
  kthread from ret_from_fork+0x14/0x2c

Refactor memory_intersects to fix the two problems above.

Before the 1d7db834a027e ("dma-debug: use memory_intersects()
directly"), memory_intersects is called only by printk_late_init:

printk_late_init -> init_section_intersects ->memory_intersects.

There were few places where memory_intersects was called.

When commit 1d7db834a027e ("dma-debug: use memory_intersects()
directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
subsystem uses it to check for an illegal area and the calltrace above
is triggered.

[akpm@linux-foundation.org: fix nearby comment typo]
Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com


Fixes: 97955936 ("asm/sections: add helpers to check for section data")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thierry Reding <treding@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

3d014e93

loop: Check for overflow while configuring loop · ba3f75d0

Siddh Raman Pant authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 2035c770bfdbcc82bd52e05871a7c82db9529e0f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4
CVE: NA

--------------------------------

commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typdef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
	lo->lo_...

ba3f75d0

net: Fix a data-race around sysctl_somaxconn. · c608e294

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 1e4142b95269a394229c89d132f8e226fb1b4f71
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

c608e294

net: Fix a data-race around netdev_budget_usecs. · 87b0a8b8

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 2a341830849e1ced57e40455bcaa6fc9b79662d5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

87b0a8b8

net: Fix a data-race around netdev_budget. · f9c68f46

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit c3d0b6b3c808ff7d5439fbe6a41108bbb000ab26
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bded ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f9c68f46

net: Fix a data-race around sysctl_net_busy_read. · 094eb9a6

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d908fd060927cf23bae5b2975140c7ed81bd4013
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67f ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

094eb9a6

net: Fix a data-race around sysctl_net_busy_poll. · 80413eaa

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit da89cab514b3ec7fdedb378617160c47fe3b60a9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 06021292 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

80413eaa

net: Fix a data-race around sysctl_tstamp_allow_data. · 8b6ba6cf

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 04ec1e942cff42fecdde5ee30f42fc28b822379b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]

While reading sysctl_tstamp_allow_data, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8b6ba6cf

ratelimit: Fix data-races in ___ratelimit(). · a9f40466

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 0b6dccd3077ad91ba6fd368fd77cb6b792faa1ac
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state).  Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a9f40466

net: Fix data-races around netdev_tstamp_prequeue. · 0326f227

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 8121bda0093a1e4ad517d93a5707e2a5615c6f99
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

0326f227

net: Fix data-races around weight_p and dev_weight_[rt]x_bias. · 5088437d

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d6be3137d5b9b5e9c6828d4d7eb7d73141895c7f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53f ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5088437d

net: ipvtap - add __init/__exit annotations to module init/exit funcs · 6fd5f93e

Maciej Żenczykowski authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 4b07805e34a6b2e27cc07d6f0d0c228d46130cb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 4b2e3a17e9f279325712b79fb01d1493f9e3e005 ]

Looks to have been left out in an oversight.

Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Sainath Grandhi <sainath.grandhi@intel.com>
Fixes: 235a9d89 ('ipvtap: IP-VLAN based tap driver')
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6fd5f93e

bonding: 802.3ad: fix no transmission of LACPDUs · 443b3d8b

Jonathan Toppins authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d60e70a898cbee907c686c75fb06b680ec37b94e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_addr equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d ("bonding: 3ad: apply ad_actor settings
changes immediately") which sets ad.system.sys_mac_addr if any one of
the bonding parameters whos set function calls
bond_3ad_update_ad_actor_settings(). This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond mac
address, this causes the if check to never be true.

Fixes: 5ee14e6d ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

443b3d8b

xfrm: fix refcount leak in __xfrm_policy_check() · 48c23b3c

Xin Xiong authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 0769491a8acd3e85ca4c3f65080eac2c824262df
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]

The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.

Fix it by decreasing the reference count of `pols[0]` in that path.

Fixes: 134b0fc5 ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

48c23b3c

audit: fix potential double free on error path from fsnotify_add_inode_mark · 5985e603

Gaosheng Cui authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 1133d90d9d9ff3def7fc5ba160381cd611aa51ee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream.

Audit_alloc_mark() assign pathname to audit_mark->path, on error path
from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory
of audit_mark->path, but the caller of audit_alloc_mark will free
the pathname again, so there will be double free problem.

Fix this by resetting audit_mark->path to NULL pointer on error path
from fsnotify_add_inode_mark().

Cc: stable@vger.kernel.org
Fixes: 7b129323 ("fsnotify: Add group pointer in fsnotify_init_mark()")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

5985e603

dm: return early from dm_pr_call() if DM device is suspended · 4fd23147

Mike Snitzer authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 4f040ba9f15d047f173ecc01d3bba3d1e9fbee08
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit e120a5f1e78fab6223544e425015f393d90d6f0d ]

Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.

Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4fd23147

NFSv4: Fix races in the legacy idmapper upcall · 092f8285

Trond Myklebust authored 2 years ago

stable inclusion
from stable-v4.19.256
commit bb4a4b31c29dd04d50e2b34b780acc0e1bbb8ce2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit 51fd2eb52c0ca8275a906eed81878ef50ae94eb0 upstream.

nfs_idmap_instantiate() will cause the process that is waiting in
request_key_with_auxdata() to wake up and exit. If there is a second
process waiting for the idmap->idmap_mutex, then it may wake up and
start a new call to request_key_with_auxdata(). If the call to
idmap_pipe_downcall() from the first process has not yet finished
calling nfs_idmap_complete_pipe_upcall_locked(), then we may end up
triggering the WARN_ON_ONCE() in nfs_idmap_prepare_pipe_upcall().

The fix is to ensure that we clear idmap->idmap_upcall_data before
calling nfs_idmap_instantiate().

Fixes: e9ab41b6 ("NFSv4: Clean up the legacy idmapper upcall")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-o...

092f8285

Oct 10, 2022

netfilter: nf_conntrack_irc: Fix forged IP logic · ffa9f2a5

David Leadbeater authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 3275f7804f40de3c578d2253232349b07c25f146
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OWZ7


CVE: CVE-2022-2663

---------------------------

[ Upstream commit 0efe125cfb99e6773a7434f3463f7c2fa28f3a43 ]

Ensure the match happens in the right direction, previously the
destination used was the server, not the NAT host, as the comment
shows the code intended.

Additionally nf_nat_irc uses port 0 as a signal and there's no valid way
it can appear in a DCC message, so consider port 0 also forged.

Fixes: 869f37d8 ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2210.1.0

ffa9f2a5

ext4: fix check for block being out of directory size · cc10dbac

Jan Kara authored 2 years ago

mainline inclusion
from mainline-v6.1-rc1
commit 61a1d87a324ad5e3ed27c6699dfc93218fcf3201
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58WSQ


CVE: CVE-2022-1184

--------------------------------

The check in __ext4_read_dirblock() for block being outside of directory
size was wrong because it compared block number against directory size
in bytes. Fix it.

Fixes: 65f8ea4cd57d ("ext4: check if directory block is within i_size")
CVE: CVE-2022-1184
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220822114832.1482-1-jack@suse.cz


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

cc10dbac

ext4: check if directory block is within i_size · d9dc377b

Lukas Czerner authored 2 years ago

mainline inclusion
from mainline-v6.0-rc1
commit 65f8ea4cd57dbd46ea13b41dc8bac03176b04233
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58WSQ


CVE: CVE-2022-1184

--------------------------------

Currently ext4 directory handling code implicitly assumes that the
directory blocks are always within the i_size. In fact ext4_append()
will attempt to allocate next directory block based solely on i_size and
the i_size is then appropriately increased after a successful
allocation.

However, for this to work it requires i_size to be correct. If, for any
reason, the directory inode i_size is corrupted in a way that the
directory tree refers to a valid directory block past i_size, we could
end up corrupting parts of the directory tree structure by overwriting
already used directory blocks when modifying the directory.

Fix it by catching the corruption early in __ext4_read_dirblock().

Addresses Red-Hat-Bugzilla: #2070205
CVE: CVE-2022-1184
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflicts:
	fs/ext4/namei.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d9dc377b

Oct 09, 2022

block: Fix UAF in bd_link_disk_holder() · 01b1ec1d

Luo Meng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TY3L


CVE: NA

--------------------------------

A crash as follows:

 BUG: unable to handle page fault for address: 000000011241cec7
 sd 5:0:0:1: [sdl] Synchronizing SCSI cache
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 2465367 Comm: multipath Kdump: loaded Tainted: G        W  O      5.10.0-60.18.0.50.h478.eulerosv2r11.x86_64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220525_182517-szxrtosci10000 04/01/2014
 RIP: 0010:kernfs_new_node+0x22/0x60
 Code: cc cc 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 41 89 cb 0f b7 ca 48 89 f2 53 48 8b 47 08 48 89 fb 48 89 de 48 85 c0 48 0f 44 c7 <48> 8b 78 50 41 51 45 89 c1 45 89 d8 e8 4d ee ff ff 5a 49 89 c4 48
 RSP: 0018:ffffa178419539e8 EFLAGS: 00010206
 RAX: 000000011241ce77 RBX: ffff9596828395a0 RCX: 000000000000a1ff
 RDX: ffff9595ada828b0 RSI: ffff9596828395a0 RDI: ffff9596828395a0
 RBP: ffff95959a9a2a80 R08: 0000000000000000 R09: 0000000000000004
 R10: ffff9595ca0bf930 R11: 0000000000000000 R12: ffff9595ada828b0
 R13: ffff9596828395a0 R14: 0000000000000001 R15: ffff9595948c5c80
 FS:  00007f64baa10200(0000) GS:ffff9596bad80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000011241cec7 CR3: 000000011923e003 CR4: 0000000000170ee0
 Call Trace:
  kernfs_create_link+0x31/0xa0
  sysfs_do_create_link_sd+0x61/0xc0
  bd_link_disk_holder+0x10a/0x180
  dm_get_table_device+0x10b/0x1f0 [dm_mod]
  __dm_get_device+0x1e2/0x280 [dm_mod]
  ? kmem_cache_alloc_trace+0x2fb/0x410
  parse_path+0xca/0x200 [dm_multipath]
  parse_priority_group+0x19d/0x1f0 [dm_multipath]
  multipath_ctr+0x27a/0x491 [dm_multipath]
  dm_table_add_target+0x177/0x360 [dm_mod]
  table_load+0x12b/0x380 [dm_mod]
  ctl_ioctl+0x199/0x290 [dm_mod]
  ? dev_suspend+0xd0/0xd0 [dm_mod]
  dm_ctl_ioctl+0xa/0x20 [dm_mod]
  __se_sys_ioctl+0x85/0xc0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6

This can be easy reproduce:
 Add delay before ret = add_symlink(bdev->bd_part->holder_dir...)
 in bd_link_disk_holder()
 dmsetup create xxx --tabel "0 1000 linear /dev/sda 0"
 echo 1 > /sys/block/sda/device/delete

Delete /dev/sda will release holder_dir, but add_symlink() will
use holder_dir. Therefore UAF will occur in this case.

Fix this problem by adding reference count to holder_dir.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

01b1ec1d

ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC · e4fc0e51

Sasha Levin authored 2 years ago

stable inclusion
from stable-v5.4.215
commit 4051324a6dafd7053c74c475e80b3ba10ae672b0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5T9C3


CVE: CVE-2022-3303

---------------------------

[ Upstream commit 8423f0b6d513b259fdab9c9bf4aaa6188d054c2d ]

There is a small race window at snd_pcm_oss_sync() that is called from
OSS PCM SNDCTL_DSP_SYNC ioctl; namely the function calls
snd_pcm_oss_make_ready() at first, then takes the params_lock mutex
for the rest.  When the stream is set up again by another thread
between them, it leads to inconsistency, and may result in unexpected
results such as NULL dereference of OSS buffer as a fuzzer spotted
recently.

The fix is simply to cover snd_pcm_oss_make_ready() call into the same
params_lock mutex with snd_pcm_oss_make_ready_locked() variant.

Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/CAFcO6XN7JDM4xSXGhtusQfS2mSBcx50VJKwQpCq=WeLt57aaZA@mail.gmail.com
Link: https://lore.kernel.org/r/20220905060714.22549-1-tiwai@suse.de


Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Xia Longlong <xialonglong1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e4fc0e51

block: add a new config to control dispatching bios asynchronously · 3d14cd06

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

If CONFIG_BLK_BIO_DISPATCH_ASYNC is enabled, and driver support
QUEUE_FLAG_DISPATCH_ASYNC, bios will be dispatched asynchronously to
specific CPUs to avoid across nodes memory access in driver.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

3d14cd06

block: fix kabi broken in request_queue · b6a187ae

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

request_queue_wrapper is not accessible in drivers currently,
introduce a new helper to initialize async dispatch to fix kabi broken.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b6a187ae

md: enable dispatching bio asynchronously for raid10 by default · 8934afb9

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

Try to improve performance for raid when user issues io concurrently
from multiple nodes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8934afb9

arm64/topology: getting preferred sibling's cpumask supported by platform · c59d6d53

Wang ShaoBo authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

For some architectures, masking the underlying processor topology
differences can make software unable to identify the cpu distance,
which results in performance fluctuations.

So we provide additional interface for getting preferred sibling's
cpumask supported by platform, this siblings' cpumask indicates those
CPUs which are clustered with relatively short distances, but this
hardly depends on the specific implementation of the specific platform.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

c59d6d53

block: support to dispatch bio asynchronously · f39ebff6

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

In some architecture memory access latency is very bad across nodes
compare to local node. For consequence, io performance is rather bad
while users issue io from multiple nodes if lock contention exist in
the driver.

This patch make io dispatch asynchronously to specific kthread that is
bind to cpus that are belong to the same node, so that memory access
across nodes in driver can be avoided.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f39ebff6

block: add new fields in request_queue · 4fc0fcd6

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

Add a new flag QUEUE_FLAG_DISPATCH_ASYNC and two new fields
'dispatch_cpumask' and 'last_dispatch_cpu' for request_queue, prepare
to support dispatch bio asynchronous in specified cpus. This patch also
add sysfs apis.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4fc0fcd6

md/raid10: convert resync_lock to use seqlock · f82f6d68

余快 authored 2 years ago

mainline inclusion
from md-next
commit ddc489e066cd267b383c0eed4f576f6bdb154588
category: performance
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PRMO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=ddc489e066cd267b383c0eed4f576f6bdb154588



---------------------

Currently, wait_barrier() will hold 'resync_lock' to read 'conf->barrier',
and io can't be dispatched until 'barrier' is dropped.

Since holding the 'barrier' is not common, convert 'resync_lock' to use
seqlock so that holding lock can be avoided in fast path.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

f82f6d68