Commits · ae64d846c64c62e5d68c87fc2a07ed07f6917b79 · Summer2022 / 22b970264

Jul 08, 2022

NFSv4: Don't hold the layoutget locks across multiple RPC calls · ae64d846

Trond Myklebust authored 3 years ago

stable inclusion
from stable-4.19.247
commit 6b3fc1496e7227cd6a39a80bbfb7588ef7c7a010
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 6949493884fe88500de4af182588e071cf1544ee ]

When doing layoutget as part of the open() compound, we have to be
careful to release the layout locks before we can call any further RPC
calls, such as setattr(). The reason is that those calls could trigger
a recall, which could deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

tcp: tcp_rtx_synack() can be called from process context · dbed45ae

Eric Dumazet authored 3 years ago

stable inclusion
from stable-4.19.247
commit 58bd38cbc961fd799842b7be8c5222310f04b908
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 0a375c822497ed6ad6b5da0792a12a6f1af10c0b ]

Laurent reported the enclosed report [1]

This bug triggers under the following conditions:

0) Kernel built with CONFIG_DEBUG_PREEMPT=y

1) A new passive FastOpen TCP socket is created.
   This FO socket waits for an ACK coming from client to be a complete
   ESTABLISHED one.
2) A socket operation on this socket goes through lock_sock()
   release_sock() dance.
3) While the socket is owned by the user in step 2),
   a retransmit of the SYN is received and stored in socket backlog.
4) At release_sock() time, the socket backlog is processed while
   in process context.
5) A SYNACK packet is cooked in response to the SYN retransmit.
6) -> tcp_rtx_synack() is called in process context.

Before the blamed commit, tcp_rtx_synack() was always called from a BH handler,
from a timer handler.

Fix this by using TCP_INC_STATS() & NET_INC_STATS(),
which do not assume the caller is in a non-preemptible context.

[1]
BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
caller is tcp_rtx_synack.part.0+0x36/0xc0
CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x48/0x5e
 check_preemption_disabled+0xde/0xe0
 tcp_rtx_synack.part.0+0x36/0xc0
 tcp_rtx_synack+0x8d/0xa0
 ? kmem_cache_alloc+0x2e0/0x3e0
 ? apparmor_file_alloc_security+0x3b/0x1f0
 inet_rtx_syn_ack+0x16/0x30
 tcp_check_req+0x367/0x610
 tcp_rcv_state_process+0x91/0xf60
 ? get_nohz_timer_target+0x18/0x1a0
 ? lock_timer_base+0x61/0x80
 ? preempt_count_add+0x68/0xa0
 tcp_v4_do_rcv+0xbd/0x270
 __release_sock+0x6d/0xb0
 release_sock+0x2b/0x90
 sock_setsockopt+0x138/0x1140
 ? __sys_getsockname+0x7e/0xc0
 ? aa_sk_perm+0x3e/0x1a0
 __sys_setsockopt+0x198/0x1e0
 __x64_sys_setsockopt+0x21/0x30
 do_syscall_64+0x38/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Laurent Fasnacht <laurent.fasnacht@proton.ch>
Acked-by: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.com


Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

serial: 8250_fintek: Check SER_RS485_RTS_* only with RS485 · 099165d5

Ilpo Järvinen authored 3 years ago

stable inclusion
from stable-4.19.247
commit e78a321c14ad4e96f3d215aad8bcb6cb55ddc442
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit af0179270977508df6986b51242825d7edd59caf ]

SER_RS485_RTS_ON_SEND and SER_RS485_RTS_AFTER_SEND relate to behavior
within RS485 operation. The driver checks if they have the same value
which is not possible to realize with the hardware. The check is taken
regardless of SER_RS485_ENABLED flag and -EINVAL is returned when the
check fails, which creates problems.

This check makes it unnecessarily complicated to turn RS485 mode off, as
a simple zeroed serial_rs485 struct will trigger that equal-values check.
In addition, the driver itself memsets its rs485 structure to zero when
RS485 is disabled, but if userspace tried to make a TIOCSRS485
ioctl() call with the very same struct, it would fail with
-EINVAL, which doesn't make much sense.

Resolve the problem by moving the check inside SER_RS485_ENABLED block.
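
The effect of moving the check can be sketched in plain C. This is a
minimal userspace illustration, not the driver's code: the SK_* flag bits
and rs485_validate() are hypothetical stand-ins for the SER_RS485_* flags
and the driver's configuration hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag bits (assumption); the real ones are the SER_RS485_* flags. */
#define SK_RS485_ENABLED        (1u << 0)
#define SK_RS485_RTS_ON_SEND    (1u << 1)
#define SK_RS485_RTS_AFTER_SEND (1u << 2)

/*
 * Validate a requested configuration. The "RTS on send must differ from
 * RTS after send" rule is only applied when RS485 is being enabled, so a
 * zeroed flags word (RS485 off) is always accepted.
 */
static int rs485_validate(uint32_t flags)
{
    if (flags & SK_RS485_ENABLED) {
        int on_send = !!(flags & SK_RS485_RTS_ON_SEND);
        int after_send = !!(flags & SK_RS485_RTS_AFTER_SEND);

        if (on_send == after_send)
            return -EINVAL;
    }
    return 0;
}
```

With this shape, rs485_validate(0) succeeds, while an enabled configuration
with equal RTS settings is still rejected with -EINVAL.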

Fixes: 7ecc7701 ("serial: 8250_fintek: Return -EINVAL on invalid configuration")
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/035c738-8ea5-8b17-b1d7-84a7b3aeaa51@linux.intel.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

md: fix an incorrect NULL check in md_reload_sb · f99eddb9

Xiaomeng Tong authored 3 years ago

stable inclusion
from stable-4.19.247
commit 3a18aa6faa4c3b9b84d56188ac94099ba22a4f11
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 64c54d9244a4efe9bc6e9c98e13c4bbb8bb39083 upstream.

The bug is here:
	if (!rdev || rdev->desc_nr != nr) {

The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each_rcu(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
found (In fact, it will be a bogus pointer to an invalid struct
object containing the HEAD). That bogus pointer passes the check
and leads to an invalid memory access.

To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'rdev' as a dedicated pointer to
point to the found element.
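
The pattern described above can be sketched in userspace C. This is an
illustration under assumptions, not the md code: struct rdev_sketch, the
tiny list helpers and find_by_nr() are hypothetical names, and only the
desc_nr field mirrors something named in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a circular doubly linked list with a HEAD node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical element type embedding its list node. */
struct rdev_sketch {
    int desc_nr;
    struct list_head same_set;
};

/* Recover the element from a pointer to its embedded list node. */
#define entry_of(ptr) \
    ((struct rdev_sketch *)((char *)(ptr) - offsetof(struct rdev_sketch, same_set)))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail_sketch(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/*
 * Return the element whose desc_nr equals nr, or NULL if there is none.
 * 'iter' walks the list; 'found' is assigned only on a real match, so it
 * stays NULL for an empty list or a failed search.
 */
static struct rdev_sketch *find_by_nr(struct list_head *head, int nr)
{
    struct rdev_sketch *found = NULL;
    struct list_head *pos;

    assert(head != NULL);
    for (pos = head->next; pos != head; pos = pos->next) {
        struct rdev_sketch *iter = entry_of(pos);

        if (iter->desc_nr == nr) {
            found = iter;
            break;
        }
    }
    return found;
}
```

Because the caller only ever sees 'found', a NULL test on the result is
meaningful, which is not the case when the iterator itself is tested
after the loop.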

Cc: stable@vger.kernel.org
Fixes: 70bcecdb ("md-cluster: Improve md_reload_sb to be less error prone")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

md: fix an incorrect NULL check in does_sb_need_changing · 5889d3e4

Xiaomeng Tong authored 3 years ago

stable inclusion
from stable-4.19.247
commit fcb51189503b2f5a254910b85ad1b03295e09aab
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit fc8738343eefc4ea8afb6122826dea48eacde514 upstream.

The bug is here:
	if (!rdev)

The list iterator value 'rdev' will *always* be set and non-NULL
by rdev_for_each(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element found.
The bogus pointer therefore passes the NULL check and leads to an
invalid memory access.

To fix the bug, use a new variable 'iter' as the list iterator,
while using the original variable 'rdev' as a dedicated pointer to
point to the found element.

Cc: stable@vger.kernel.org
Fixes: 2aa82191 ("md-cluster: Perform a lazy update")
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

ext4: avoid cycles in directory h-tree · d0fdd7f3

Jan Kara authored 3 years ago

stable inclusion
from stable-4.19.247
commit b3ad9ff6f06c1dc6abf7437691c88ca3d6da3ac0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 3ba733f879c2a88910744647e41edeefbc0d92b2 upstream.

A maliciously corrupted filesystem can contain cycles in the h-tree
stored inside a directory. That can easily lead to the kernel corrupting
tree nodes that were already verified under its hands while doing a node
split and consequently accessing unallocated memory. Fix the problem by
verifying traversed block numbers are unique.
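
As a rough sketch of what "verifying traversed block numbers are unique"
can look like, the following userspace C keeps the blocks visited on the
way down and rejects a repeat. The names (walk_path, block_seen,
SK_MAX_LEVELS) and the depth bound are assumptions for illustration, not
the ext4 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_LEVELS 3 /* hypothetical bound on tree depth */

/* True if 'block' already appears among the first 'depth' visited blocks. */
static bool block_seen(const uint32_t *visited, size_t depth, uint32_t block)
{
    for (size_t i = 0; i < depth; i++)
        if (visited[i] == block)
            return true;
    return false;
}

/*
 * Walk a root-to-leaf path given as block numbers. Returns the number of
 * levels accepted, or -1 if a block repeats (a cycle) or the path is
 * deeper than SK_MAX_LEVELS.
 */
static int walk_path(const uint32_t *path, size_t len)
{
    uint32_t visited[SK_MAX_LEVELS];
    size_t depth = 0;

    assert(path != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (depth == SK_MAX_LEVELS || block_seen(visited, depth, path[i]))
            return -1;
        visited[depth++] = path[i];
    }
    return (int)depth;
}
```

A path such as 0, 5, 9 is accepted, while 0, 5, 0 is rejected because
block 0 is reached twice.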

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518093332.13986-2-jack@suse.cz


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

ext4: verify dir block before splitting it · bd06f906

Jan Kara authored 3 years ago

stable inclusion
from stable-4.19.247
commit 78398c2b2cc14f9a9c8592cf6d334c5a479ed611
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 46c116b920ebec58031f0a78c5ea9599b0d2a371 upstream.

Before splitting a directory block verify its directory entries are sane
so that the splitting code does not access memory it should not.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220518093332.13986-1-jack@suse.cz


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

proc: fix dentry/inode overinstantiating under /proc/${pid}/net · 4135c2ab

Alexey Dobriyan authored 3 years ago

stable inclusion
from stable-4.19.247
commit 22b5a48ac899a138552fa05b3fc69a3a0588fdbc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

[ Upstream commit 7055197705709c59b8ab77e6a5c7d46d61edd96e ]

When a process exits, /proc/${pid}, and /proc/${pid}/net dentries are
flushed.  However some leaf dentries like /proc/${pid}/net/arp_cache
aren't.  That's because respective PDEs have proc_misc_d_revalidate() hook
which returns 1 and leaves dentries/inodes in the LRU.

Force revalidation/lookup on everything under /proc/${pid}/net by
inheriting proc_net_dentry_ops.

[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/YjdVHgildbWO7diJ@localhost.localdomain


Fixes: c6c75deda813 ("proc: fix lookup in /proc/net subdirectories after setns(2)")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: hui li <juanfengpy@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

drivers/base/node.c: fix compaction sysfs file leak · c3dc7b42

Miaohe Lin authored 3 years ago

stable inclusion
from stable-4.19.247
commit f76ddc8fcf6d81fe89bfa4d3efcbc4fe69a91d48
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

[ Upstream commit da63dc84befaa9e6079a0bc363ff0eaa975f9073 ]

Compaction sysfs file is created via compaction_register_node in
register_node.  But we forgot to remove it in unregister_node.  Thus
compaction sysfs file is leaked.  Using compaction_unregister_node to fix
this issue.

Link: https://lkml.kernel.org/r/20220401070905.43679-1-linmiaohe@huawei.com


Fixes: ed4a6d7f ("mm: compaction: add /sys trigger for per-node memory compaction")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

fsnotify: fix wrong lockdep annotations · 7df02fb0

Amir Goldstein authored 3 years ago

stable inclusion
from stable-4.19.247
commit 72632015277b56d5f8fd666ccd24cb0ed7ef1d72
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

[ Upstream commit 623af4f538b5df9b416e1b82f720af7371b4c771 ]

Commit 6960b0d9 ("fsnotify: change locking order") changed some
of the mark_mutex locks in direct reclaim path to use:
  mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);

This change is explained:
 "...It uses nested locking to avoid deadlock in case we do the final
  iput() on an inode which still holds marks and thus would take the
  mutex again when calling fsnotify_inode_delete() in destroy_inode()."

The problem is that the mutex_lock_nested() is not a nested lock at
all. In fact, it has the opposite effect of preventing lockdep from
warning about a very possible deadlock.

Due to these wrong annotations, a deadlock that was introduced with
nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two
years until it was reported recently by Khazhismel Kumykov, only to
find out that the deadlock was already fixed in kernel v5.5.

Fix the wrong lockdep annotations.

Cc: Khazhismel Kumykov <khazhy@google.com>
Fixes: 6960b0d9 ("fsnotify: change locking order")
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Link: https://lore.kernel.org/r/20220422120327.3459282-4-amir73il@gmail.com


Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

PCI: Avoid pci_dev_lock() AB/BA deadlock with sriov_numvfs_store() · c28291f4

Yicong Yang authored 3 years ago

stable inclusion
from stable-4.19.247
commit aed6d4d519210c28817948f34c53b6e058e0456c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

[ Upstream commit a91ee0e9fca9d7501286cfbced9b30a33e52740a ]

The sysfs sriov_numvfs_store() path acquires the device lock before the
config space access lock:

  sriov_numvfs_store
    device_lock                 # A (1) acquire device lock
    sriov_configure
      vfio_pci_sriov_configure  # (for example)
        vfio_pci_core_sriov_configure
          pci_disable_sriov
            sriov_disable
              pci_cfg_access_lock
                pci_wait_cfg    # B (4) wait for dev->block_cfg_access == 0

Previously, pci_dev_lock() acquired the config space access lock before the
device lock:

  pci_dev_lock
    pci_cfg_access_lock
      dev->block_cfg_access = 1 # B (2) set dev->block_cfg_access = 1
    device_lock                 # A (3) wait for device lock

Any path that uses pci_dev_lock(), e.g., pci_reset_function(), may
deadlock with sriov_numvfs_store() if the operations occur in the sequence
(1) (2) (3) (4).

Avoid the deadlock by reversing the order in pci_dev_lock() so it acquires
the device lock before the config space access lock, the same as the
sriov_numvfs_store() path.

[bhelgaas: combined and adapted commit log from Jay Zhou's independent
subsequent posting:
https://lore.kernel.org/r/20220404062539.1710-1-jianjay.zhou@huawei.com]
Link: https://lore.kernel.org/linux-pci/1583489997-17156-1-git-send-email-yangyicong@hisilicon.com/


Also-posted-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

fat: add ratelimit to fat*_ent_bread() · fc08a42b

OGAWA Hirofumi authored 3 years ago

stable inclusion
from stable-4.19.247
commit 2d03231e5cc5f1baf97090d82eac81fd53ab1b32
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

[ Upstream commit 183c3237c928109d2008c0456dff508baf692b20 ]

fat*_ent_bread() can produce too many reports on the I/O error path.
So use fat_msg_ratelimit() instead.

Link: https://lkml.kernel.org/r/87bkxogfeq.fsf@mail.parknet.co.jp


Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: qianfan <qianfanguijin@163.com>
Tested-by: qianfan <qianfanguijin@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags · a13ced4e

Smith, Kyle Miller (Nimble Kernel) authored 3 years ago

stable inclusion
from stable-4.19.247
commit 8da2b7bdb47e94bbc4062a3978c708926bcb022c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit da42761181627e9bdc37d18368b827948a583929 ]

In nvme_alloc_admin_tags, the admin_q can be set to an error (typically
-ENOMEM) if the blk_mq_init_queue call fails to set up the queue, which
is checked immediately after the call. However, when we return the error
up the stack to nvme_reset_work, the error takes us to
nvme_remove_dead_ctrl()
  nvme_dev_disable()
   nvme_suspend_queue(&dev->queues[0]).

Here, we only check that the admin_q is non-NULL, rather than checking
that it is neither an error pointer nor NULL, and begin quiescing a queue
that never existed, leading to a bad / NULL pointer dereference.
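
Why a non-NULL test is not enough can be shown with a userspace imitation
of error-encoded pointers. The sk_* helpers, SK_MAX_ERRNO and the two
queue_check_* functions are hypothetical names for illustration only, and
the message above does not say which form the actual patch takes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the error range at the top of the address space. */
#define SK_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sk_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the error range. */
static bool sk_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO;
}

/* The insufficient test: an error-encoded pointer is not NULL. */
static bool queue_check_null_only(const void *q)
{
    return q != NULL;
}

/* The stricter test: usable only if neither NULL nor an encoded error. */
static bool queue_check_err_or_null(const void *q)
{
    return q != NULL && !sk_is_err(q);
}
```

An sk_err_ptr(-ENOMEM) value passes queue_check_null_only() but fails
queue_check_err_or_null(), which is the gap the commit describes.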

Signed-off-by: Kyle Smith <kyles@hpe.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes · cffd8a45

Liu Jian authored 3 years ago

stable inclusion
from stable-4.19.246
commit 22771d3095deeca0d2aefd55999fa8a50caea3cd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 45969b4152c1752089351cd6836a42a566d49bcf upstream.

The data length of skb frags + frag_list may be greater than 0xffff, and
skb_header_pointer can not handle negative offset. So, here INT_MAX is used
to check the validity of offset. Add the same change to the related function
skb_store_bytes.
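
A minimal sketch of the bound change, assuming the offset arrives as an
unsigned 32-bit value and the previous limit was 0xffff as the text
implies. The function names are hypothetical; this is not the helper's
actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Old-style bound: rejects any offset above 0xffff. */
static bool offset_valid_old(uint32_t offset)
{
    return offset <= 0xffff;
}

/*
 * Enlarged bound: accepts offsets up to INT_MAX, and still rejects values
 * that would be negative if reinterpreted as a signed int.
 */
static bool offset_valid_new(uint32_t offset)
{
    return offset <= (uint32_t)INT_MAX;
}
```

An offset of 0x10000 is rejected by the old bound and accepted by the new
one, while (uint32_t)-1 is rejected by both.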

Fixes: 05c74e5e ("bpf: add bpf_skb_load_bytes helper")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220416105801.88708-2-liujian56@huawei.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

dm stats: add cond_resched when looping over entries · 8c25a2b9

Mikulas Patocka authored 3 years ago

stable inclusion
from stable-4.19.246
commit fdf6803caf6ebd94f9ead27f888073c4def13715
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit bfe2b0146c4d0230b68f5c71a64380ff8d361f8b upstream.

dm-stats can be used with a very large number of entries (it is only
limited by 1/4 of total system memory), so add rescheduling points to
the loops that iterate over the entries.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

zsmalloc: fix races between asynchronous zspage free and page migration · 16beb196

Sultan Alsawaf authored 3 years ago

stable inclusion
from stable-4.19.246
commit 645996efc2ae391246d595832aaa6f9d3cc338c7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

commit 2505a981114dcb715f8977b8433f7540854851d8 upstream.

The asynchronous zspage free worker tries to lock a zspage's entire page
list without defending against page migration.  Since pages which haven't
yet been locked can concurrently migrate off the zspage page list while
lock_zspage() churns away, lock_zspage() can suffer from a few different
lethal races.

It can lock a page which no longer belongs to the zspage and unsafely
dereference page_private(), it can unsafely dereference a torn pointer to
the next page (since there's a data race), and it can observe a spurious
NULL pointer to the next page and thus not lock all of the zspage's pages
(since a single page migration will reconstruct the entire page list, and
create_page_chain() unconditionally zeroes out each list pointer in the
process).

Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
with page migration.

Link: https://lkml.kernel.org/r/20220509024703.243847-1-sultan@kerneltoast.com


Fixes: 77ff4657 ("zsmalloc: zs_page_migrate: skip unnecessary loops but not return -EBUSY if zspage is not inuse")
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

netfilter: conntrack: re-fetch conntrack after insertion · 75b72e66

Florian Westphal authored 3 years ago

stable inclusion
from stable-4.19.246
commit 92a999d1963eed0df666284e20055136ceabd12f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 56b14ecec97f39118bf85c9ac2438c5a949509ed upstream.

In case the conntrack is clashing, insertion can free skb->_nfct and
set skb->_nfct to the already-confirmed entry.

This wasn't found before because the conntrack entry and the extension
space used to be freed after an RCU grace period, plus the race needs
events enabled to trigger.

Reported-by:  <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com>
Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
Fixes: 2ad9d774 ("netfilter: conntrack: free extension area immediately")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

assoc_array: Fix BUG_ON during garbage collect · 541279b2

Stephen Brennan authored 3 years ago

stable inclusion
from stable-4.19.246
commit 5b18856296423473cf0d8a6af8aef5df66ae1075
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY
CVE: NA

--------------------------------

commit d1dc87763f406d4e67caf16dbe438a5647692395 upstream.

A rare BUG_ON triggered in assoc_array_gc:

    [3430308.818153] kernel BUG at lib/assoc_array.c:1609!

Which corresponded to the statement currently at line 1593 upstream:

    BUG_ON(assoc_array_ptr_is_meta(p));

Using the data from the core dump, I was able to generate a userspace
reproducer[1] and determine the cause of the bug.

[1]: https://github.com/brenns10/kernel_stuff/tree/master/assoc_array_gc



After running the iterator on the entire branch, an internal tree node
looked like the following:

    NODE (nr_leaves_on_branch: 3)
      SLOT [0] NODE (2 leaves)
      SLOT [1] NODE (1 leaf)
      SLOT [2..f] NODE (empty)

In the userspace reproducer, the pr_devel output when compressing this
node was:

    -- compress node 0x5607cc089380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    after: 3

At slot 0, an internal node with 2 leaves could not be folded into the
node, because there was only one available slot (slot 0). Thus, the
internal node was retained. At slot 1, the node had one leaf, and was
able to be folded in successfully. The remaining nodes had no leaves,
and so were removed. By the end of the compression stage, there were 14
free slots, and only 3 leaf nodes. The tree was ascended and then its
parent node was compressed. When this node was seen, it could not be
folded, due to the internal node it contained.

The invariant for compression in this function is: whenever
nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT, the node should contain all
leaf nodes. The compression step currently cannot guarantee this, given
the corner case shown above.

To fix this issue, retry compression whenever we have retained a node,
and yet nr_leaves_on_branch < ASSOC_ARRAY_FAN_OUT. This second
compression will then allow the node in slot 1 to be folded in,
satisfying the invariant. Below is the output of the reproducer once the
fix is applied:

    -- compress node 0x560e9c562380 --
    free=0, leaves=0
    [0] retain node 2/1 [nx 0]
    [1] fold node 1/1 [nx 0]
    [2] fold node 0/1 [nx 2]
    [3] fold node 0/2 [nx 2]
    [4] fold node 0/3 [nx 2]
    [5] fold node 0/4 [nx 2]
    [6] fold node 0/5 [nx 2]
    [7] fold node 0/6 [nx 2]
    [8] fold node 0/7 [nx 2]
    [9] fold node 0/8 [nx 2]
    [10] fold node 0/9 [nx 2]
    [11] fold node 0/10 [nx 2]
    [12] fold node 0/11 [nx 2]
    [13] fold node 0/12 [nx 2]
    [14] fold node 0/13 [nx 2]
    [15] fold node 0/14 [nx 2]
    internal nodes remain despite enough space, retrying
    -- compress node 0x560e9c562380 --
    free=14, leaves=1
    [0] fold node 2/15 [nx 0]
    after: 3

Changes
=======
DH:
 - Use false instead of 0.
 - Reorder the inserted lines in a couple of places to put retained before
   next_slot.

ver #2)
 - Fix typo in pr_devel, correct comparison to "<="

Fixes: 3cb98950 ("Add a generic associative array implementation.")
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: keyrings@vger.kernel.org
Link: https://lore.kernel.org/r/20220511225517.407935-1-stephen.s.brennan@oracle.com/ # v1
Link: https://lore.kernel.org/r/20220512215045.489140-1-stephen.s.brennan@oracle.com/ # v2
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

net: af_key: check encryption module availability consistency · 596113f2

Thomas Bartschies authored 3 years ago

stable inclusion
from stable-4.19.246
commit 539d5deba06e0700807840e8d77507fcb6e4be3c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 015c44d7bff3f44d569716117becd570c179ca32 ]

Since the recent introduction of support for the SM3 and SM4 hash algos for
IPsec, the kernel produces invalid pfkey acquire messages when these
encryption modules are disabled. This happens because the availability of
the algos wasn't checked in all necessary functions. This patch adds these
checks.

Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests · 3d3df885

Thomas Gleixner authored 3 years ago

stable inclusion
from stable-4.19.246
commit 06293f7d7b7c1dcb2b744d0ac53571b6ff53010a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

commit 7e0815b3e09986d2fe651199363e135b9358132a upstream.

When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.

This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence, interrupt delivery on the
affected device never happens.

Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.

Fixes: 809f9267 ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx


[nmeyerha@amazon.com: backported to 4.19]
Signed-off-by: Noah Meyerhans <nmeyerha@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

net: bridge: Clear offload_fwd_mark when passing frame up bridge interface. · 087d6f90

Andrew Lunn authored 3 years ago

stable inclusion
from stable-4.19.245
commit b1f86c34b2720efc2d3899da572309e515db5190
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit fbb3abdf2223cd0dfc07de85fe5a43ba7f435bdf ]

It is possible to stack bridges on top of each other. Consider the
following which makes use of an Ethernet switch:

       br1
     /    \
    /      \
   /        \
 br0.11    wlan0
   |
   br0
 /  |  \
p1  p2  p3

br0 is offloaded to the switch. Above br0 is a vlan interface, for
vlan 11. This vlan interface is then a slave of br1. br1 also has a
wireless interface as a slave. This setup trunks wireless lan traffic
over the copper network inside a VLAN.

A frame received on p1 which is passed up to the bridge has the
skb->offload_fwd_mark flag set to true, indicating that the switch has
dealt with forwarding the frame out ports p2 and p3 as needed. This
flag instructs the software bridge it does not need to pass the frame
back down again. However, the flag is not getting reset when the frame
is passed upwards. As a result br1 sees the flag, wrongly interprets
it, and fails to forward the frame to wlan0.

When passing a frame upwards, clear the flag. This is the Rx
equivalent of br_switchdev_frame_unmark() in br_dev_xmit().

Fixes: f1c2eddf ("bridge: switchdev: Use an helper to clear forward mark")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220518005840.771575-1-andrew@lunn.ch


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

087d6f90

ARM: 9197/1: spectre-bhb: fix loop8 sequence for Thumb2 · dfc4e75b

Ard Biesheuvel authored 3 years ago

stable inclusion
from stable-4.19.245
commit 047794b3cfaff4313f379d0cb0509f1c0b972e26
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 3cfb3019979666bdf33a1010147363cf05e0f17b ]

In Thumb2, 'b . + 4' produces a branch instruction that uses a narrow
encoding, and so it does not jump to the following instruction as
expected. So use W(b) instead.

Fixes: 6c7cb60bff7a ("ARM: fix Thumb2 regression with Spectre BHB")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

dfc4e75b

ARM: 9196/1: spectre-bhb: enable for Cortex-A15 · 21c8b698

Ard Biesheuvel authored 3 years ago

stable inclusion
from stable-4.19.245
commit 0569702c238290924fc1c7c6954258aa4a5fd649
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5FNPY


CVE: NA

--------------------------------

[ Upstream commit 0dc14aa94ccd8ba35eb17a0f9b123d1566efd39e ]

The Spectre-BHB mitigations were inadvertently left disabled for
Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in
that case. So fix that.

Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

21c8b698

block:Fix kabi broken · 64ba823f

Luo Meng authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB


CVE: NA

--------------------------------

Fix the kABI change caused by adding state to gendisk and removing
flags from block_device.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

64ba823f

block: Fix warning in bd_link_disk_holder() · f20a726b

Luo Meng authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB


CVE: NA

--------------------------------

The following warning is reported:

 WARNING: CPU: 3 PID: 674 at fs/block_dev.c:1272 bd_link_disk_holder+0xcd/0x270
 Modules linked in: null_blk(+)
 CPU: 3 PID: 674 Comm: dmsetup Not tainted 5.10.0-16691-gf6076432827d-dirty #158
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
 RIP: 0010:bd_link_disk_holder+0xcd/0x270
 Code: 69 73 ee 00 44 89 e8 5b 48 83 05 c5 bf 6d 0c 01 5d 41 5c 41 5d 41 5e 41 8
 RSP: 0018:ffffc9000049bbb8 EFLAGS: 00010202
 RAX: ffff888104e39038 RBX: ffff888104185000 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: ffffffffaa085692 RDI: 0000000000000000
 RBP: ffff88810cc2ae00 R08: ffffffffa853659b R09: 0000000000000000
 R10: ffffc9000049bbb0 R11: 720030626c6c756e R12: ffff88810e800000
 R13: ffff88810e800090 R14: ffff888103570c98 R15: ffff888103570c80
 FS:  00007fb49dc13dc0(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ff994ebde70 CR3: 000000010d54a000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  dm_get_table_device+0x175/0x300
  dm_get_device+0x238/0x360
  linear_ctr+0xee/0x170
  dm_table_add_target+0x199/0x4b0
  table_load+0x18c/0x480
  ? table_clear+0x190/0x190
  ctl_ioctl+0x21d/0x640
  ? check_preemption_disabled+0x140/0x150
  dm_ctl_ioctl+0x12/0x20
  __se_sys_ioctl+0xb1/0x100
  __x64_sys_ioctl+0x1e/0x30
  do_syscall_64+0x45/0x70
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by concurrent operations:
	1. modprobe null_blk
	2. echo -e "0 10000 linear /dev/nullb0 0" > table
	   dmsetup create xxx table

t1: create disk a                   |     t2: dm setup
                                    |
device_add_disk                     |
 dev->devt = devt                   |
                                    | dm_get_table_device
                                    | open_table_device
                                    | blkdev_get_by_dev -> succeed
                                    | bd_link_disk_holder
                                    |  -> holder_dir is still NULL
 register_disk -> create holder_dir |
  kobject_create_and_add            |

device_add_disk() sets devt before creating holder_dir, which
leaves a window in which dm_get_table_device() can find the disk by
devt while its holder_dir is still NULL.

So move GENHD_FL_UP in blk_register_queue() to avoid this warning and
fix a NULL pointer dereference in __blk_mq_sched_bio_merge().

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

f20a726b

block: move the NEED_PART_SCAN flag to struct gendisk · b2f0e44f

Christoph Hellwig authored 3 years ago

mainline inclusion
from mainline-v5.10-rc1
commit 38430f08
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB


CVE: NA

-------------------------------------------------

We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/genhd.c
	drivers/ide/ide-gd.c
	fs/block_dev.c
	include/linux/blk_types.h

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

b2f0e44f

block: rename bd_invalidated · b6113052

Christoph Hellwig authored 3 years ago

mainline inclusion
from mainline-v5.10-rc1
commit f4ad06f2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETAB


CVE: NA

-------------------------------------------------

Replace bd_invalidated with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	fs/block_dev.c
	include/linux/blk_types.h

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

b6113052

Jul 07, 2022

scsi: hisi_sas: Modify v3 HW I/O processing when SATA_DISK_ERR bit is set and NCQ Error occurs · fc811070

Xingui Yang authored 3 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5CG2F


CVE: NA

-----------------------------------------------------------------------

SATA_DISK_ERR is bit 16 of CQ dw3. When it is set to 1, the SATA disk is
in an error state and the IPTT is invalid, for example after an NCQ error.
In this scenario, new I/O issued to this disk is rejected by the SAS
controller, and all I/O remaining in the disk should be aborted.

To ensure that the SAS controller does not access memory before all I/O is
aborted, all I/O remaining in the disk should be set to the aborted state
through a register and completed with state SAS_ABORTED_TASK through
task_done(). The SCSI error handling thread is then woken up immediately to
analyze the cause of the error, for example by reading the log page for
error details.
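
The bit test described above can be sketched in a few lines. This is an illustrative user-space Python snippet, not driver code: the constant and function names are hypothetical, and only the bit position (bit 16 of CQ dw3) is taken from this commit message.

```python
SATA_DISK_ERR_BIT = 16  # bit position stated in the commit message


def sata_disk_err(cq_dw3: int) -> bool:
    """Return True if the SATA_DISK_ERR bit is set in completion dw3."""
    return bool((cq_dw3 >> SATA_DISK_ERR_BIT) & 1)


print(sata_disk_err(0x0001_0000))  # True: bit 16 set
print(sata_disk_err(0x0000_FFFF))  # False: only bits 0..15 set
```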

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

fc811070

scsi: hisi_sas: enable use_clustering · 5f32f8b7

Xingui Yang authored 3 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5B468


CVE: NA

------------------------------------------------

Enable "clustering", that is, merging of segments so that they may span
more than a single page. This fixes the issue where a 520 KB I/O request
is split.

fio test with --filename=/dev/sdb --bs=520k --iodepth=32

before:
[root@localhost ~]# cat /sys/block/sdb/queue/max_segment_size
4096

[root@localhost ~]# iostat -x
Device ... r_await rareq-sz ... aqu-sz  %util
sdb    ... 29.78   259.89   ... 5.87    9.92

after:
[root@localhost ~]# cat /sys/block/sdb/queue/max_segment_size
65536

[root@localhost ~]# iostat -x
Device ... r_await rareq-sz ... aqu-sz  %util
sdb    ... 29.80   516.03   ... 1.34    4.50
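
As a rough worked example of why the segment size matters, the minimum number of segments needed for one 520 KB request can be computed from the two max_segment_size values shown above. This is a back-of-the-envelope Python sketch: `min_segments` is a hypothetical helper, the result is only a lower bound that assumes the buffer is physically contiguous, and the actual split point also depends on other queue limits that are not shown in this message.

```python
def min_segments(request_bytes: int, max_segment_size: int) -> int:
    """Lower bound on segments for one request (ceiling division)."""
    return -(-request_bytes // max_segment_size)


REQUEST = 520 * 1024  # fio --bs=520k

print(min_segments(REQUEST, 4096))   # 130 segments before the change
print(min_segments(REQUEST, 65536))  # 9 segments after the change
```

If the host limits the number of segments per request to fewer than 130 (an assumption here), the request has to be split with 4096-byte segments, which would be consistent with rareq-sz rising from 259.89 to 516.03 in the iostat output.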

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

5f32f8b7

scsi: hisi_sas: Change DMA setup lock timeout to 2.5s · 510ebd8e

Xingui Yang authored 3 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5BXH1


CVE: NA

-------------------------------------------------------------------

A DMA setup lock timeout is applied when DMA setup frames are received.
It is a mechanism outside the protocol, used to prevent SATA disk I/Os
from going undelivered for a long time. The default value of 100ms is too
strict and easily triggers a timeout when the disk is overloaded or
faulty. Based on the average I/O latency of 300 disks, adjust the value
to 2.5s.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

510ebd8e

Jul 06, 2022

x86/speculation/mmio: Print SMT warning · 6d8e8ddf

Josh Poimboeuf authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 0255c936bfaa1887f7043b995f1c9e1049bb25f1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 1dc6ff02c8bf77d71b9b5d11cbc9df77cfb28626 upstream

Similar to MDS and TAA, print a warning if SMT is enabled for the MMIO
Stale Data vulnerability.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

6d8e8ddf

KVM: x86/speculation: Disable Fill buffer clear within guests · 4c9a908c

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit e0d1437042f0b491bf2cb7880628b0bd7783f80d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 027bbb884be006b05d9c577d6401686053aa789e upstream

The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an
accurate indicator on all CPUs of whether the VERW instruction will
overwrite fill buffers. FB_CLEAR enumeration in
IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not
vulnerable to MDS/TAA, indicating that microcode does overwrite fill
buffers.

Guests running in VMM environments may not be aware of all the
capabilities/vulnerabilities of the host CPU. Specifically, a guest may
apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable
to MDS/TAA even when the physical CPU is not. On CPUs that enumerate
FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill
buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS
during VMENTER and resetting on VMEXIT. For guests that enumerate
FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM
will not use FB_CLEAR_DIS.
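
The decision described in this paragraph can be sketched as follows. This is a simplified Python model, not the KVM implementation: the function names are hypothetical, the host capability is passed in as a plain boolean, and the real patch may take further conditions into account. The two bit positions are the ones given in the text above.

```python
MD_CLEAR_BIT = 10  # CPUID(EAX=7,ECX=0).EDX, per the text above
FB_CLEAR_BIT = 17  # IA32_ARCH_CAPABILITIES, per the text above


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def vmm_sets_fb_clear_dis(host_has_fb_clear_ctrl: bool,
                          guest_arch_caps: int) -> bool:
    """Should the VMM set FB_CLEAR_DIS while this guest runs?"""
    if not host_has_fb_clear_ctrl:
        # Host cannot control fill buffer clearing at all.
        return False
    # A guest that enumerates FB_CLEAR explicitly asked for the
    # fill buffer clear capability, so leave VERW behaviour alone.
    return not bit(guest_arch_caps, FB_CLEAR_BIT)


print(vmm_sets_fb_clear_dis(True, 0))                  # True
print(vmm_sets_fb_clear_dis(True, 1 << FB_CLEAR_BIT))  # False
print(vmm_sets_fb_clear_dis(False, 0))                 # False
```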

Irrespective of guest state, host overwrites CPU buffers before VMENTER
to protect itself from an MMIO capable guest, as part of mitigation for
MMIO Stale Data vulnerabilities.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: arch/x86/kvm/vmx.c has been split and context adjustment at
 vmx_vcpu_run]
[cascardo: moved functions so they are after struct vcpu_vmx definition]
[cascardo: fb_clear is disabled/enabled around __vmx_vcpu_run]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
            arch/x86/kvm/vmx.c
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4c9a908c

x86/speculation/mmio: Reuse SRBDS mitigation for SBDS · a7b8a41c

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 0e94464009ee37217a7e450c96ea1f8d42d3a6b5
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit a992b8a4682f119ae035a01b40d4d0665c4a2875 upstream

The Shared Buffers Data Sampling (SBDS) variant of Processor MMIO Stale
Data vulnerabilities may expose RDRAND, RDSEED and SGX EGETKEY data.
Mitigation for this is added by a microcode update.

As some of the implications of SBDS are similar to SRBDS, SRBDS mitigation
infrastructure can be leveraged by SBDS. Set X86_BUG_SRBDS and use SRBDS
mitigation.

Mitigation is enabled by default; use srbds=off to opt-out. Mitigation
status can be checked in the following file:

  /sys/devices/system/cpu/vulnerabilities/srbds
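
A minimal way to read that file from user space is sketched below in Python. `mitigation_status` is a hypothetical helper name; the path is the one given above. The function returns None when the file does not exist, for example on a kernel without this reporting or on a non-Linux system.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(name="srbds", base=VULN_DIR):
    """Return the kernel's one-line status string, or None if unavailable."""
    try:
        return (Path(base) / name).read_text().strip()
    except OSError:
        return None


print(mitigation_status())
```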

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: adjust for processor model names]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

a7b8a41c

x86/speculation/srbds: Update SRBDS mitigation selection · 42d1c023

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 3ecb6dbad25b448ed8240f0ec2c7a8ff5155b7ea
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 22cac9c677c95f3ac5c9244f8ca0afdc7c8afb19 upstream

Currently, Linux disables SRBDS mitigation on CPUs that are not affected
by MDS and have the TSX feature disabled. On such CPUs, secrets cannot
be extracted from CPU fill buffers using MDS or TAA. Without SRBDS
mitigation, Processor MMIO Stale Data vulnerabilities can be used to
extract RDRAND, RDSEED, and EGETKEY data.

Do not disable SRBDS mitigation by default when CPU is also affected by
Processor MMIO Stale Data vulnerabilities.
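
The selection rule before and after this change can be written as a small predicate. This is a Python sketch of the rule as described in this message only; the function and parameter names are hypothetical and the real code reads CPU feature and bug flags rather than booleans.

```python
def srbds_disabled_by_default(mds_affected: bool,
                              tsx_enabled: bool,
                              mmio_affected: bool) -> bool:
    """True when SRBDS mitigation is skipped by default (after this change)."""
    # Old rule: skip when not affected by MDS and TSX is disabled.
    old_rule = (not mds_affected) and (not tsx_enabled)
    # New rule: additionally require that the CPU is not affected by
    # Processor MMIO Stale Data vulnerabilities.
    return old_rule and not mmio_affected


print(srbds_disabled_by_default(False, False, False))  # True
print(srbds_disabled_by_default(False, False, True))   # False
```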

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

42d1c023

x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data · 602cbd94

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit f2983fbba1cccac611d4966277f0336374fad0be
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 8d50cdf8b8341770bc6367bce40c0c1bb0e1d5b3 upstream

Add the sysfs reporting file for Processor MMIO Stale Data
vulnerability. It exposes the vulnerability and mitigation state similar
to the existing files for the other hardware vulnerabilities.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

602cbd94

x86/speculation/mmio: Enable CPU Fill buffer clearing on idle · 8d31bc35

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 8b42145e8c9903d4805651e08f4fca628e166642
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 99a83db5a605137424e1efe29dc0573d6a5b6316 upstream

When the CPU is affected by Processor MMIO Stale Data vulnerabilities,
Fill Buffer Stale Data Propagator (FBSDP) can propagate stale data out
of Fill buffer to uncore buffer when CPU goes idle. Stale data can then
be exploited with other variants using MMIO operations.

Mitigate it by clearing the Fill buffer before entering idle state.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8d31bc35

x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations · 8e8722cd

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 54974c8714283feb5bf64df3bfe0f44267db5a3c
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit e5925fb867290ee924fcf2fe3ca887b792714366 upstream

MDS, TAA and Processor MMIO Stale Data mitigations rely on clearing CPU
buffers. Moreover, status of these mitigations affects each other.
During boot, it is important to maintain the order in which these
mitigations are selected. This is especially true for
md_clear_update_mitigation() that needs to be called after MDS, TAA and
Processor MMIO Stale Data mitigation selection is done.

Introduce md_clear_select_mitigation(), and select all these mitigations
from there. This reflects relationships between these mitigations and
ensures proper ordering.
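
The ordering constraint can be illustrated with a simplified Python model. It is a sketch under stated assumptions, not the kernel logic: mitigation states are reduced to "off" and "full", the default is assumed to be "full" for affected CPUs, and the common update step is modelled as "if buffer clearing is on for any bug, it is on for every bug the CPU has". Only the function name md_clear_select_mitigation() and the requirement that the common update runs last come from the text above.

```python
def md_clear_select_mitigation(requested, affected):
    """Model of grouped selection.

    requested: dict mapping bug name to 'off' or 'full' (command line).
    affected:  set of bug names the CPU is affected by.
    """
    state = {}
    # Step 1: individual selection for each bug.
    for bug in ("mds", "taa", "mmio"):
        if bug in affected:
            state[bug] = requested.get(bug, "full")
        else:
            state[bug] = "off"
    # Step 2: common update, which must run after all selections above.
    # CPU buffer clearing is shared, so one active mitigation implies
    # the others are effectively active too.
    if any(v == "full" for v in state.values()):
        for bug in affected:
            state[bug] = "full"
    return state


# mds and taa are switched off, but mmio keeps its default, so all stay on.
print(md_clear_select_mitigation({"mds": "off", "taa": "off"},
                                 {"mds", "taa", "mmio"}))
```

Running the update step before one of the selections would miss that selection's result, which is why the grouping function fixes the call order.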

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8e8722cd

x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data · 4d71d4bc

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 9f2ce43ebc33713ba02a89a66bd5f93c2f3a82cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 8cb861e9e3c9a55099ad3d08e1a3b653d29c33ca upstream

Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.

These vulnerabilities are broadly categorized as:

Device Register Partial Write (DRPW):
  Some endpoint MMIO registers incorrectly handle writes that are
  smaller than the register size. Instead of aborting the write or only
  copying the correct subset of bytes (for example, 2 bytes for a 2-byte
  write), more bytes than specified by the write transaction may be
  written to the register. On some processors, this may expose stale
  data from the fill buffers of the core that created the write
  transaction.

Shared Buffers Data Sampling (SBDS):
  After propagators may have moved data around the uncore and copied
  stale data into client core fill buffers, processors affected by MFBDS
  can leak data from the fill buffer.

Shared Buffers Data Read (SBDR):
  It is similar to Shared Buffers Data Sampling (SBDS) except that the
  data is directly read into the architectural software-visible state.

An attacker can use these vulnerabilities to extract data from CPU fill
buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill
buffers using the VERW instruction before returning to a user or a
guest.

On CPUs not affected by MDS and TAA, user application cannot sample data
from CPU fill buffers using MDS or TAA. A guest with MMIO access can
still use DRPW or SBDR to extract data architecturally. Mitigate it with
VERW instruction to clear fill buffers before VMENTER for MMIO capable
guests.

Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control
the mitigation.
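
The three accepted values can be illustrated with a small parser over a kernel command line string. This Python sketch is for illustration only: `parse_mmio_stale_data` is a hypothetical helper, the default of "full" when the parameter is absent is an assumption, and unrecognized values are simply ignored here.

```python
def parse_mmio_stale_data(cmdline: str):
    """Return (mitigation, nosmt) for mmio_stale_data={off|full|full,nosmt}."""
    mitigation, nosmt = "full", False  # assumed default when not specified
    for token in cmdline.split():
        if not token.startswith("mmio_stale_data="):
            continue
        value = token.split("=", 1)[1]
        if value == "off":
            mitigation, nosmt = "off", False
        elif value == "full":
            mitigation, nosmt = "full", False
        elif value == "full,nosmt":
            mitigation, nosmt = "full", True
        # Any other value is ignored in this sketch.
    return mitigation, nosmt


print(parse_mmio_stale_data("ro quiet mmio_stale_data=full,nosmt"))
print(parse_mmio_stale_data("ro quiet mmio_stale_data=off"))
```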

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: arch/x86/kvm/vmx.c has been moved]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
            arch/x86/kvm/vmx.c
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4d71d4bc

x86/speculation: Add a common function for MD_CLEAR mitigation update · 440f08e3

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit d03de576a604899741a0ebadcfe2a4a19ee53ba3
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit f52ea6c26953fed339aa4eae717ee5c2133c7ff2 upstream

Processor MMIO Stale Data mitigation uses similar mitigation as MDS and
TAA. In preparation for adding its mitigation, add a common function to
update all mitigations that depend on MD_CLEAR.

  [ bp: Add a newline in md_clear_update_mitigation() to separate
    statements better. ]

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

440f08e3

x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug · dc0038cd

Pawan Gupta authored 3 years ago

stable inclusion
from stable-v4.19.248
commit 9277b11cafd0472db9e7d634de52d7c5d8d25462
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS


CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

--------------------------------

commit 51802186158c74a0304f51ab963e7c2b3a2b046f upstream

Processor MMIO Stale Data is a class of vulnerabilities that may
expose data after an MMIO operation. For more details please refer to
Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst

Add the Processor MMIO Stale Data bug enumeration. A microcode update
adds new bits to the MSR IA32_ARCH_CAPABILITIES; define them.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cascardo: adapted family names to the ones in v4.19]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

dc0038cd