Commits · 80413eaa8949654be763eeb5f725a4552f2b1df0 · Summer2022 / 22b970497

Oct 11, 2022

net: Fix a data-race around sysctl_net_busy_poll. · 80413eaa

stable inclusion
from stable-v4.19.257
commit da89cab514b3ec7fdedb378617160c47fe3b60a9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.
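
The pattern described above can be sketched in userspace C. This is a minimal illustration, not the kernel patch itself: the READ_ONCE() macro below is a simplified stand-in for the kernel's (it relies on the GCC/Clang `__typeof__` extension), and busy_poll_enabled() is a hypothetical reader invented for the example.

```c
/* Simplified stand-in for the kernel's READ_ONCE(): the volatile access
 * makes the compiler emit exactly one load and forbids re-reading. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stand-in for the sysctl knob; in the kernel it can be rewritten by the
 * sysctl handler at any time while readers are running. */
static unsigned int sysctl_net_busy_poll = 50;

/* Hypothetical reader: take one snapshot and base every decision in this
 * call on that snapshot instead of re-reading the global. */
static int busy_poll_enabled(void)
{
	unsigned int usecs = READ_ONCE(sysctl_net_busy_poll);

	return usecs != 0;
}
```

The point of the snapshot is that a concurrent write cannot make two uses of the value within one call disagree.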

Fixes: 06021292 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


net: Fix a data-race around sysctl_tstamp_allow_data. · 8b6ba6cf

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 04ec1e942cff42fecdde5ee30f42fc28b822379b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]

While reading sysctl_tstamp_allow_data, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ratelimit: Fix data-races in ___ratelimit(). · a9f40466

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 0b6dccd3077ad91ba6fd368fd77cb6b792faa1ac
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state).  Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


net: Fix data-races around netdev_tstamp_prequeue. · 0326f227

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 8121bda0093a1e4ad517d93a5707e2a5615c6f99
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


net: Fix data-races around weight_p and dev_weight_[rt]x_bias. · 5088437d

Kuniyuki Iwashima authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d6be3137d5b9b5e9c6828d4d7eb7d73141895c7f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().
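
A userspace sketch of the idea, assuming a pthread mutex in place of the kernel mutex and simplified READ_ONCE()/WRITE_ONCE() stand-ins (GCC/Clang `__typeof__` extension). The function names update_dev_weights() and rx_budget() are hypothetical; the real handler also goes through proc_dointvec().

```c
#include <pthread.h>

/* Simplified stand-ins for the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int weight_p = 64;
static int dev_weight_rx_bias = 1;
static int dev_weight_tx_bias = 1;
static int dev_rx_weight = 64;
static int dev_tx_weight = 64;

/* Serialises writers so both derived weights use the same weight_p. */
static pthread_mutex_t dev_weight_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical writer, modelled on what a sysctl handler would do. */
static void update_dev_weights(int new_weight)
{
	int weight;

	pthread_mutex_lock(&dev_weight_mutex);
	WRITE_ONCE(weight_p, new_weight);
	weight = READ_ONCE(weight_p);	/* one snapshot for both products */
	WRITE_ONCE(dev_rx_weight, weight * dev_weight_rx_bias);
	WRITE_ONCE(dev_tx_weight, weight * dev_weight_tx_bias);
	pthread_mutex_unlock(&dev_weight_mutex);
}

/* Hypothetical lockless reader on the packet-processing side. */
static int rx_budget(void)
{
	return READ_ONCE(dev_rx_weight);
}
```

Readers stay lock-free; only the rarely executed writer pays for the mutex.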

Fixes: 3d48b53f ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


net: ipvtap - add __init/__exit annotations to module init/exit funcs · 6fd5f93e

Maciej Żenczykowski authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 4b07805e34a6b2e27cc07d6f0d0c228d46130cb1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 4b2e3a17e9f279325712b79fb01d1493f9e3e005 ]

Looks to have been left out in an oversight.

Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Sainath Grandhi <sainath.grandhi@intel.com>
Fixes: 235a9d89 ('ipvtap: IP-VLAN based tap driver')
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com


Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


bonding: 802.3ad: fix no transmission of LACPDUs · 443b3d8b

Jonathan Toppins authored 2 years ago

stable inclusion
from stable-v4.19.257
commit d60e70a898cbee907c686c75fb06b680ec37b94e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]

The failure is caused by the global variable ad_ticks_per_sec being zero,
as demonstrated by the reproducer script discussed below. This makes
every timer value computed in __ad_timer_to_ticks() zero, so the
periodic timer never fires.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_addr equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

The if check around the contents of bond_3ad_initialize() is no longer
needed due to commit 5ee14e6d ("bonding: 3ad: apply ad_actor settings
changes immediately"), which sets ad.system.sys_mac_addr whenever any
one of the bonding parameters whose set function calls
bond_3ad_update_ad_actor_settings() is set. This is because if
ad.system.sys_mac_addr is zero it will be set to the current bond MAC
address, which causes the if check to never be true.
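
The effect of the skipped initialization can be illustrated with a small sketch. The names timer_to_ticks() and initialize_ticks() are hypothetical simplifications; only the global ad_ticks_per_sec comes from the commit message.

```c
/* Global that stays zero until the initialisation routine runs. */
static int ad_ticks_per_sec;

/* Hypothetical simplification of a timer conversion: with a zero
 * ad_ticks_per_sec every result is zero, so no timer ever expires later. */
static int timer_to_ticks(int seconds)
{
	return seconds * ad_ticks_per_sec;
}

/* Hypothetical simplification of the initialisation; the fix described
 * above makes sure this part is no longer skipped by the if check. */
static void initialize_ticks(int tick_resolution)
{
	ad_ticks_per_sec = tick_resolution;
}
```

Before initialize_ticks() is called, timer_to_ticks() returns 0 for any input, which mirrors the state the reproducer puts the bond into.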

Fixes: 5ee14e6d ("bonding: 3ad: apply ad_actor settings changes immediately")
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


xfrm: fix refcount leak in __xfrm_policy_check() · 48c23b3c

Xin Xiong authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 0769491a8acd3e85ca4c3f65080eac2c824262df
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]

The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.

Fix it by decreasing the reference count of `pols[0]` in that path.
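
A minimal sketch of this kind of error-path fix, using a toy reference counter. The struct and function names are hypothetical and only mirror the shape of the problem, not the xfrm code.

```c
/* Toy refcounted object standing in for a policy. */
struct policy_sketch {
	int refcnt;
};

static struct policy_sketch *policy_lookup(struct policy_sketch *p)
{
	if (p)
		p->refcnt++;	/* lookup hands out a reference */
	return p;
}

static void policy_put(struct policy_sketch *p)
{
	p->refcnt--;
}

/* Returns 1 on success, 0 on failure. Every exit path must drop the
 * reference taken on the first policy. */
static int policy_check_sketch(struct policy_sketch *first,
			       int second_lookup_fails)
{
	struct policy_sketch *pol0 = policy_lookup(first);

	if (second_lookup_fails) {
		policy_put(pol0);	/* the fix: no early return without put */
		return 0;
	}

	/* ... the policies would be evaluated here ... */
	policy_put(pol0);
	return 1;
}
```

After either outcome the reference count is back at its starting value, which is the property the leaking error path violated.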

Fixes: 134b0fc5 ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


audit: fix potential double free on error path from fsnotify_add_inode_mark · 5985e603

Gaosheng Cui authored 2 years ago

stable inclusion
from stable-v4.19.257
commit 1133d90d9d9ff3def7fc5ba160381cd611aa51ee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream.

audit_alloc_mark() assigns pathname to audit_mark->path. On the error
path from fsnotify_add_inode_mark(), fsnotify_put_mark() frees the memory
of audit_mark->path, but the caller of audit_alloc_mark() frees the
pathname again, resulting in a double free.

Fix this by resetting audit_mark->path to NULL pointer on error path
from fsnotify_add_inode_mark().
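
The ownership problem can be sketched in plain C. Everything below (mark_sketch, put_mark(), alloc_mark()) is a hypothetical userspace model, assuming the caller keeps ownership of pathname when allocation fails.

```c
#include <stdlib.h>

struct mark_sketch {
	char *path;
};

/* Releases the mark and whatever path it still owns. */
static void put_mark(struct mark_sketch *m)
{
	free(m->path);	/* free(NULL) is a no-op */
	free(m);
}

/* Returns NULL on failure; in that case the caller still owns pathname
 * and is expected to free it. */
static struct mark_sketch *alloc_mark(char *pathname, int add_fails)
{
	struct mark_sketch *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->path = pathname;

	if (add_fails) {
		m->path = NULL;	/* the fix: give ownership back first */
		put_mark(m);
		return NULL;
	}
	return m;
}
```

Without the `m->path = NULL` line, put_mark() and the caller would both free the same buffer.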

Cc: stable@vger.kernel.org
Fixes: 7b129323 ("fsnotify: Add group pointer in fsnotify_init_mark()")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


dm: return early from dm_pr_call() if DM device is suspended · 4fd23147

Mike Snitzer authored 2 years ago

stable inclusion
from stable-v4.19.256
commit 4f040ba9f15d047f173ecc01d3bba3d1e9fbee08
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

[ Upstream commit e120a5f1e78fab6223544e425015f393d90d6f0d ]

Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.

Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


NFSv4: Fix races in the legacy idmapper upcall · 092f8285

Trond Myklebust authored 2 years ago

stable inclusion
from stable-v4.19.256
commit bb4a4b31c29dd04d50e2b34b780acc0e1bbb8ce2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5UQH4


CVE: NA

--------------------------------

commit 51fd2eb52c0ca8275a906eed81878ef50ae94eb0 upstream.

nfs_idmap_instantiate() will cause the process that is waiting in
request_key_with_auxdata() to wake up and exit. If there is a second
process waiting for the idmap->idmap_mutex, then it may wake up and
start a new call to request_key_with_auxdata(). If the call to
idmap_pipe_downcall() from the first process has not yet finished
calling nfs_idmap_complete_pipe_upcall_locked(), then we may end up
triggering the WARN_ON_ONCE() in nfs_idmap_prepare_pipe_upcall().

The fix is to ensure that we clear idmap->idmap_upcall_data before
calling nfs_idmap_instantiate().

Fixes: e9ab41b6 ("NFSv4: Clean up the legacy idmapper upcall")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


Oct 10, 2022

netfilter: nf_conntrack_irc: Fix forged IP logic · ffa9f2a5

David Leadbeater authored 2 years ago

stable inclusion
from stable-v4.19.258
commit 3275f7804f40de3c578d2253232349b07c25f146
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5OWZ7


CVE: CVE-2022-2663

---------------------------

[ Upstream commit 0efe125cfb99e6773a7434f3463f7c2fa28f3a43 ]

Ensure the match happens in the right direction. Previously the
destination used was the server, not the NAT host, as the comment
shows the code intended.

Additionally nf_nat_irc uses port 0 as a signal and there's no valid way
it can appear in a DCC message, so consider port 0 also forged.
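
The resulting rule can be sketched as a small predicate. The function name and the flat IPv4 parameters are assumptions made for illustration; the kernel works on conntrack tuples instead.

```c
#include <stdint.h>

/* Hypothetical predicate: a DCC request is treated as forged when the
 * advertised address is neither the client's internal address nor the
 * NAT host's address, or when the advertised port is 0. */
static int dcc_is_forged(uint32_t dcc_ip, uint16_t dcc_port,
			 uint32_t client_ip, uint32_t nat_ip)
{
	if (dcc_port == 0)
		return 1;	/* port 0 is reserved as a signal, never valid */

	return dcc_ip != client_ip && dcc_ip != nat_ip;
}
```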

Fixes: 869f37d8 ("[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port")
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4.19.90-2210.1.0


ext4: fix check for block being out of directory size · cc10dbac

Jan Kara authored 2 years ago

mainline inclusion
from mainline-v6.1-rc1
commit 61a1d87a324ad5e3ed27c6699dfc93218fcf3201
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58WSQ


CVE: CVE-2022-1184

--------------------------------

The check in __ext4_read_dirblock() for block being outside of directory
size was wrong because it compared block number against directory size
in bytes. Fix it.
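
The unit mismatch is easy to see in a sketch. The helper name and parameters are hypothetical; the idea is only that the directory size in bytes must be converted to a block count before it is compared with a block number.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the logical block lies within the
 * directory, 0 otherwise. i_size is in bytes and blkbits is log2 of the
 * block size, so i_size >> blkbits is the number of blocks. */
static int dir_block_in_range(uint32_t block, uint64_t i_size,
			      unsigned int blkbits)
{
	return block < (i_size >> blkbits);
}
```

For an 8192-byte directory with 4096-byte blocks, block 2 is out of range, yet a comparison against the raw byte size (2 < 8192) would have accepted it.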

Fixes: 65f8ea4cd57d ("ext4: check if directory block is within i_size")
CVE: CVE-2022-1184
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220822114832.1482-1-jack@suse.cz


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ext4: check if directory block is within i_size · d9dc377b

Lukas Czerner authored 2 years ago

mainline inclusion
from mainline-v6.0-rc1
commit 65f8ea4cd57dbd46ea13b41dc8bac03176b04233
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58WSQ


CVE: CVE-2022-1184

--------------------------------

Currently ext4 directory handling code implicitly assumes that the
directory blocks are always within the i_size. In fact ext4_append()
will attempt to allocate next directory block based solely on i_size and
the i_size is then appropriately increased after a successful
allocation.

However, for this to work it requires i_size to be correct. If, for any
reason, the directory inode i_size is corrupted in a way that the
directory tree refers to a valid directory block past i_size, we could
end up corrupting parts of the directory tree structure by overwriting
already used directory blocks when modifying the directory.

Fix it by catching the corruption early in __ext4_read_dirblock().

Addresses Red-Hat-Bugzilla: #2070205
CVE: CVE-2022-1184
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220704142721.157985-1-lczerner@redhat.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflicts:
	fs/ext4/namei.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


Oct 09, 2022

block: Fix UAF in bd_link_disk_holder() · 01b1ec1d

Luo Meng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TY3L


CVE: NA

--------------------------------

A crash as follows:

 BUG: unable to handle page fault for address: 000000011241cec7
 sd 5:0:0:1: [sdl] Synchronizing SCSI cache
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 3 PID: 2465367 Comm: multipath Kdump: loaded Tainted: G        W  O      5.10.0-60.18.0.50.h478.eulerosv2r11.x86_64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220525_182517-szxrtosci10000 04/01/2014
 RIP: 0010:kernfs_new_node+0x22/0x60
 Code: cc cc 66 0f 1f 44 00 00 0f 1f 44 00 00 41 54 41 89 cb 0f b7 ca 48 89 f2 53 48 8b 47 08 48 89 fb 48 89 de 48 85 c0 48 0f 44 c7 <48> 8b 78 50 41 51 45 89 c1 45 89 d8 e8 4d ee ff ff 5a 49 89 c4 48
 RSP: 0018:ffffa178419539e8 EFLAGS: 00010206
 RAX: 000000011241ce77 RBX: ffff9596828395a0 RCX: 000000000000a1ff
 RDX: ffff9595ada828b0 RSI: ffff9596828395a0 RDI: ffff9596828395a0
 RBP: ffff95959a9a2a80 R08: 0000000000000000 R09: 0000000000000004
 R10: ffff9595ca0bf930 R11: 0000000000000000 R12: ffff9595ada828b0
 R13: ffff9596828395a0 R14: 0000000000000001 R15: ffff9595948c5c80
 FS:  00007f64baa10200(0000) GS:ffff9596bad80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000000011241cec7 CR3: 000000011923e003 CR4: 0000000000170ee0
 Call Trace:
  kernfs_create_link+0x31/0xa0
  sysfs_do_create_link_sd+0x61/0xc0
  bd_link_disk_holder+0x10a/0x180
  dm_get_table_device+0x10b/0x1f0 [dm_mod]
  __dm_get_device+0x1e2/0x280 [dm_mod]
  ? kmem_cache_alloc_trace+0x2fb/0x410
  parse_path+0xca/0x200 [dm_multipath]
  parse_priority_group+0x19d/0x1f0 [dm_multipath]
  multipath_ctr+0x27a/0x491 [dm_multipath]
  dm_table_add_target+0x177/0x360 [dm_mod]
  table_load+0x12b/0x380 [dm_mod]
  ctl_ioctl+0x199/0x290 [dm_mod]
  ? dev_suspend+0xd0/0xd0 [dm_mod]
  dm_ctl_ioctl+0xa/0x20 [dm_mod]
  __se_sys_ioctl+0x85/0xc0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6

This can be easily reproduced:
 Add a delay before ret = add_symlink(bdev->bd_part->holder_dir...)
 in bd_link_disk_holder()
 dmsetup create xxx --table "0 1000 linear /dev/sda 0"
 echo 1 > /sys/block/sda/device/delete

Deleting /dev/sda releases holder_dir, but add_symlink() still uses
holder_dir. Therefore a UAF occurs in this case.

Fix this problem by adding a reference count to holder_dir.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC · e4fc0e51

Sasha Levin authored 2 years ago

stable inclusion
from stable-v5.4.215
commit 4051324a6dafd7053c74c475e80b3ba10ae672b0
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5T9C3


CVE: CVE-2022-3303

---------------------------

[ Upstream commit 8423f0b6d513b259fdab9c9bf4aaa6188d054c2d ]

There is a small race window at snd_pcm_oss_sync() that is called from
OSS PCM SNDCTL_DSP_SYNC ioctl; namely the function calls
snd_pcm_oss_make_ready() at first, then takes the params_lock mutex
for the rest.  When the stream is set up again by another thread
between them, it leads to inconsistency, and may result in unexpected
results such as NULL dereference of OSS buffer as a fuzzer spotted
recently.

The fix is simply to move the snd_pcm_oss_make_ready() call under the
same params_lock mutex, using the snd_pcm_oss_make_ready_locked() variant.
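
The shape of such a fix can be sketched with a pthread mutex. The struct and the function bodies are hypothetical; only the idea of a "_locked" variant called with params_lock already held follows the commit message.

```c
#include <pthread.h>

struct oss_stream_sketch {
	pthread_mutex_t params_lock;
	int ready;
	int synced;
};

/* "_locked" variant: the caller must already hold params_lock. */
static int make_ready_locked(struct oss_stream_sketch *s)
{
	s->ready = 1;
	return 0;
}

/* Preparation and the rest of the sync run in one critical section, so
 * another thread cannot reconfigure the stream between the two steps. */
static int sync_sketch(struct oss_stream_sketch *s)
{
	int err;

	pthread_mutex_lock(&s->params_lock);
	err = make_ready_locked(s);
	if (!err)
		s->synced = s->ready;
	pthread_mutex_unlock(&s->params_lock);
	return err;
}
```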

Reported-and-tested-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/CAFcO6XN7JDM4xSXGhtusQfS2mSBcx50VJKwQpCq=WeLt57aaZA@mail.gmail.com
Link: https://lore.kernel.org/r/20220905060714.22549-1-tiwai@suse.de


Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Xia Longlong <xialonglong1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


block: add a new config to control dispatching bios asynchronously · 3d14cd06

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

If CONFIG_BLK_BIO_DISPATCH_ASYNC is enabled and the driver supports
QUEUE_FLAG_DISPATCH_ASYNC, bios will be dispatched asynchronously to
specific CPUs to avoid cross-node memory access in the driver.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


block: fix kabi broken in request_queue · b6a187ae

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

request_queue_wrapper is currently not accessible to drivers, so
introduce a new helper to initialize async dispatch and fix the kABI
breakage.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


md: enable dispatching bio asynchronously for raid10 by default · 8934afb9

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

Try to improve performance for raid when users issue I/O concurrently
from multiple nodes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


arm64/topology: getting preferred sibling's cpumask supported by platform · c59d6d53

Wang ShaoBo authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

On some architectures, hiding the underlying processor topology
differences can leave software unable to identify the distance between
CPUs, which results in performance fluctuations.

So provide an additional interface for getting the preferred sibling's
cpumask supported by the platform. This siblings' cpumask indicates the
CPUs that are clustered at relatively short distances, but it depends
heavily on the specific implementation of the specific platform.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


block: support to dispatch bio asynchronously · f39ebff6

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

On some architectures, memory access latency across nodes is much worse
than within the local node. As a consequence, I/O performance is rather
poor when users issue I/O from multiple nodes and lock contention exists
in the driver.

This patch dispatches I/O asynchronously to specific kthreads that are
bound to CPUs belonging to the same node, so that cross-node memory
access in the driver can be avoided.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


block: add new fields in request_queue · 4fc0fcd6

余快 authored 2 years ago

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M


CVE: NA

--------------------------------

Add a new flag QUEUE_FLAG_DISPATCH_ASYNC and two new fields,
'dispatch_cpumask' and 'last_dispatch_cpu', to request_queue, in
preparation for dispatching bios asynchronously on specified CPUs. This
patch also adds sysfs APIs.
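
One way such fields could be used is a round-robin pick over the allowed CPUs. This is purely an assumption for illustration: the commit message does not say how the target CPU is selected, and next_dispatch_cpu() with its 64-bit mask is a hypothetical stand-in for the kernel's cpumask API.

```c
#include <stdint.h>

/* Hypothetical round-robin selection: return the next CPU after 'last'
 * whose bit is set in 'mask', wrapping around, or -1 if the mask is empty.
 * 'last' is expected to be in the range -1..63. */
static int next_dispatch_cpu(uint64_t mask, int last)
{
	for (int i = 1; i <= 64; i++) {
		int cpu = (last + i) % 64;

		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;
}
```

Remembering the last choice spreads consecutive bios over all CPUs in the mask instead of always hitting the first one.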

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


md/raid10: convert resync_lock to use seqlock · f82f6d68

余快 authored 2 years ago

mainline inclusion
from md-next
commit ddc489e066cd267b383c0eed4f576f6bdb154588
category: performance
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PRMO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=ddc489e066cd267b383c0eed4f576f6bdb154588



---------------------

Currently, wait_barrier() will hold 'resync_lock' to read 'conf->barrier',
and io can't be dispatched until 'barrier' is dropped.

Since holding the 'barrier' is not common, convert 'resync_lock' to use
seqlock so that taking the lock can be avoided in the fast path.
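
The fast-path idea can be sketched with C11 atomics. This is a simplified model under stated assumptions, not the raid10 code: barrier_sketch, raise_barrier() and wait_barrier_fast() are hypothetical names, a single writer is assumed, and the real implementation uses the kernel seqlock API and also tracks pending I/O.

```c
#include <stdatomic.h>

struct barrier_sketch {
	atomic_uint seq;	/* odd while a writer is updating */
	atomic_int barrier;	/* non-zero while the barrier is raised */
};

/* Writer side (single writer assumed): bump the sequence around the
 * update so concurrent readers can detect it. */
static void raise_barrier(struct barrier_sketch *c)
{
	atomic_fetch_add(&c->seq, 1);		/* sequence becomes odd */
	atomic_fetch_add(&c->barrier, 1);
	atomic_fetch_add(&c->seq, 1);		/* sequence becomes even */
}

/* Reader fast path: returns 1 if I/O may proceed without taking a lock,
 * 0 if the caller has to fall back to the locked slow path. */
static int wait_barrier_fast(struct barrier_sketch *c)
{
	unsigned int start = atomic_load(&c->seq);

	if (start & 1)
		return 0;			/* writer in progress */
	if (atomic_load(&c->barrier))
		return 0;			/* barrier is raised */
	return atomic_load(&c->seq) == start;	/* nothing changed meanwhile */
}
```

In the common case where no barrier is raised, the reader performs only loads and never contends on a lock.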

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-and-tested-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


md/raid10: prevent unnecessary calls to wake_up() in fast path · 1668533d

余快 authored 2 years ago

mainline inclusion
from md-next
commit 7fdc91928ac109d3d1468ad7f951deb29a375e3d
category: performance
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PRMO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?h=md-next&id=7fdc91928ac109d3d1468ad7f951deb29a375e3d



--------------------------------

Currently, wake_up() is called unconditionally in fast path such as
raid10_make_request(), which will cause lock contention under high
concurrency:

raid10_make_request
 wake_up
  __wake_up_common_lock
   spin_lock_irqsave

Improve performance by only calling wake_up() if the waitqueue is not
empty.
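
A toy model of the optimisation, with hypothetical names throughout. The counter lock_acquisitions stands in for the spinlock taken inside wake_up(); a real kernel check for waiters also needs a memory barrier, which this single-threaded sketch leaves out.

```c
struct waitqueue_sketch {
	int waiters;		/* number of sleeping tasks */
	int lock_acquisitions;	/* how often the queue lock was taken */
};

/* Unconditional wake-up: always takes the queue lock. */
static void wake_up_sketch(struct waitqueue_sketch *wq)
{
	wq->lock_acquisitions++;
	wq->waiters = 0;
}

/* Fast-path variant: skip the lock entirely when nobody is waiting. */
static void wake_up_if_waiters(struct waitqueue_sketch *wq)
{
	if (wq->waiters)
		wake_up_sketch(wq);
}
```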

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


!122 【kernel-openEuler-1.0-LTS】kernel：fix some issues with 4.19 kernel on openEuler 22.03 system · f25ff47b

openeuler-ci-bot authored 2 years ago

Merge Pull Request from: @tangbinzy 
 
This PR adapts the 4.19 kernel to the openEuler 22.03 system; step one is just for initial kernel use.

Kernel issues:
1) Problems of the 4.19 kernel on the openEuler 22.03 system:
https://gitee.com/openeuler/kernel/issues/I5Q0UG
2) Common problems of the 4.19 kernel on 22.03/20.03:
2.1 https://gitee.com/openeuler/kernel/issues/I5QR5E
2.2 https://gitee.com/openeuler/kernel/issues/I5QSAP
2.3 https://gitee.com/openeuler/kernel/issues/I5RTF5
2.4 https://gitee.com/openeuler/kernel/issues/I5RZPX

Default config change
N/A 
 
Link:https://gitee.com/openeuler/kernel/pulls/122

 
Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>


Sep 29, 2022

mm: sharepool: fix potential AA deadlock · b9f6a788

Guo Mengqi authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5R0X9


CVE: NA

--------------------------------

Fix an AA deadlock caused by nested locking in mg_sp_group_add_task().

Deadlock path:

mg_sp_group_add_task()

    down_write(sp_group_sem)
    find_or_alloc_sp_group()
	!spg_valid()
	sp_group_drop()
	    free_sp_group() -> down_write(sp_group_sem)
    ---> AA deadlock
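
The deadlock path above can be modelled with a toy non-recursive lock. All names are hypothetical, and the "deferred" variant is only one possible way to avoid the nesting; the commit message does not state how the actual patch resolves it.

```c
/* Toy non-recursive write lock. */
struct rwsem_sketch {
	int write_held;
};

/* Returns -1 where a real rwsem would block forever (already held). */
static int down_write_sketch(struct rwsem_sketch *s)
{
	if (s->write_held)
		return -1;
	s->write_held = 1;
	return 0;
}

static void up_write_sketch(struct rwsem_sketch *s)
{
	s->write_held = 0;
}

/* Takes the lock itself, like free_sp_group() in the path above. */
static int free_group_sketch(struct rwsem_sketch *s)
{
	if (down_write_sketch(s))
		return -1;
	up_write_sketch(s);
	return 0;
}

/* Buggy shape: drops the group while still holding the lock. */
static int add_task_nested(struct rwsem_sketch *s)
{
	int ret;

	down_write_sketch(s);
	ret = free_group_sketch(s);	/* second acquisition: AA deadlock */
	up_write_sketch(s);
	return ret;
}

/* Assumed alternative: release the lock first, then drop the group. */
static int add_task_deferred(struct rwsem_sketch *s)
{
	down_write_sketch(s);
	up_write_sketch(s);
	return free_group_sketch(s);
}
```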

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


mm: sharepool: check size=0 in mg_sp_make_share_k2u() · a541bd47

Guo Mengqi authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QQPG


CVE: NA

--------------------------------

Add a size-0-check in mg_sp_make_share_k2u() to avoid passing 0-size spa
to __insert_sp_area().

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


mm: sharepool: delete redundant check in __sp_remap_get_pfn · c6b3415a

Guo Mengqi authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QETC


CVE: NA

--------------------------------

sp_make_share_k2u() now supports only vmalloc addresses. Therefore,
delete the fallback handling case.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


Revert "cifs: fix double free race when mount fails in cifs_get_root()" · 589b2a6c

Luo Meng authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TMYD


CVE: NA

--------------------------------

This reverts commit 7959a470.

Commit 2fe0e281f7ad (cifs: fix double free race when mount fails
in cifs_get_root()) fixes a double free. However there is no such
issue on 4.19 because it will return after cifs_cleanup_volume_info().

Since this patch was merged, cifs_cleanup_volume_info() is skipped,
leading to a memory leak.

Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


Sep 28, 2022

scsi: hisi_sas: Release resource directly in hisi_sas_abort_task() when NCQ error · 64d37f3f

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SXSB


CVE: NA

------------------------------------------------

When the port is detached, EH clears ATA_EH_RESET in ehc->i.action when
calling ata_eh_reset(), so the device reset is not executed.

The disk does not return other I/Os normally after an NCQ error. In
addition, now that the abort operation has been added, releasing the
resource is safe. So release the NCQ command's lldd resource directly in
hisi_sas_abort_task() on an NCQ error, without a soft reset, to make
sure the read log command can be executed successfully later. Soft
reset still needs to be used in other scenarios.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


scsi: hisi_sas: Enable force phy when SATA disk directly connected · f32bc74e

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QDH7


CVE: NA

----------------------------------

The SAS controller determines the disk to which I/Os are delivered based
on the port id in the DQ entry when a SATA disk is directly connected.

When the link is intermittently disconnected while I/O is being sent, and
the port id changes and is used by another link, data inconsistency on
the SATA disk may occur during I/O retry. So enable force phy, which
forces the command to be executed on a certain phy; if the port's phy
does not match the phy configured in the command, the chip stops
delivering I/Os to the disk.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


scsi: hisi_sas: Modify v3 HW ATA completion process when SATA disk is in error status · 63c0c05a

Xingui Yang authored 2 years ago

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q63H


CVE: NA

-------------------------------------

When an NCQ error occurs, the SAS controller abnormally completes the
I/Os newly delivered to the disk, and bit8 in CQ dw3 is set to 1 to
indicate that the SATA disk is in error status. The current processing
flow sets ts->stat to SAS_OPEN_REJECT, and then sas_ata_task_done() sets
the fis stat to ATA_ERR. After analysis by ata_eh_analyze_tf(), err_mask
is set to AC_ERR_HSM. If a media error occurs four times within 10
minutes and the chip rejects new I/Os four times, NCQ is disabled due to
excessive errors.

However, NCQ mode shouldn't be disabled just because media errors occur
multiple times. Therefore, use sas_task_abort() to handle abnormally
completed I/Os when the SATA disk is in error status.

[10253.397429] hisi_sas_v3_hw 0000:b4:02.0: erroneous completion disk err dev id=2 sas_addr=0x5000000000000605 CQ hdr: 0x400903 0x2007f 0x0 0x80470000
[10253.397430] hisi_sas_v3_hw 0000:b4:02.0: erroneous completion iptt=135 task= pK-error dev id=2 sas_addr=0x5000000000000605 CQ hdr: 0x203 0x20087 0x0 0x100 Error info: 0x0 0x0 0x0 0x0
[10253.397432] hisi_sas_v3_hw 0000:b4:02.0: erroneous completion iptt=136 task= pK-error dev id=2 sas_addr=0x5000000000000605 CQ hdr: 0x203 0x20088 0x0 0x100 Error info: 0x0 0x0 0x0 0x0
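
As a rough illustration of the bit8 check described above, the sketch
below tests bit 8 of CQ dw3. The macro and function names are hypothetical
and are not taken from the driver. It also assumes that the fourth value
printed after "CQ hdr:" in the log lines above is dw3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: bit 8 of CQ dw3 marks "SATA disk is in error status". */
#define CQ_DW3_SATA_DISK_ERR (UINT32_C(1) << 8)

/* Returns true when dw3 reports that the SATA disk is in error status. */
static bool cq_sata_disk_in_error(uint32_t dw3)
{
	return (dw3 & CQ_DW3_SATA_DISK_ERR) != 0;
}
```

Under that assumption, the value 0x100 from the two iptt lines has the bit
set, while 0x80470000 from the first line does not.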

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: kang fenglong <kangfenglong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

63c0c05a

sched: Fix invalid free for tsk->se.dyn_affi_stats · 46212545

Hui Tang authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5TIOZ


CVE: NA

--------------------------------

BUG: KASAN: double-free or invalid-free in sched_prefer_cpus_free[...]

Freed by task 0:
 save_stack mm/kasan/kasan.c:448 [inline]
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x120/0x228 mm/kasan/kasan.c:521
 kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
 slab_free_hook mm/slub.c:1397 [inline]
 slab_free_freelist_hook mm/slub.c:1425 [inline]
 slab_free mm/slub.c:3004 [inline]
 kfree+0x84/0x250 mm/slub.c:3965
 sched_prefer_cpus_free+0x58/0x78 kernel/sched/core.c:7219
 free_task+0xb0/0xe8 kernel/fork.c:463
 __delayed_free_task+0x24/0x30 kernel/fork.c:1716
 __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch+0x200/0x5e0 kernel/rcu/tree.c:2584
 invoke_rcu_callbacks kernel/rcu/tree.c:2897 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2864 [inline]
 rcu_process_callbacks+0x470/0xb60 kernel/rcu/tree.c:2881
 __do_softirq+0x2d0/0xba0 kernel/softirq.c:292

Initialize 'tsk->se.dyn_affi_stats = NULL' in dup_task_struct().
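
A minimal user-space sketch of why such an init helps, assuming the child
task starts as a byte-for-byte copy of its parent and therefore inherits
the parent's pointer. The struct and function names below are hypothetical
stand-ins, not the kernel's.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the scheduler structures named in the commit. */
struct affi_stats { unsigned long nr_wakeups; };
struct task { struct affi_stats *dyn_affi_stats; };

/* Copies the parent wholesale, then clears the inherited pointer. */
static struct task *dup_task(const struct task *orig)
{
	struct task *tsk = malloc(sizeof(*tsk));

	if (!tsk)
		return NULL;
	*tsk = *orig;                 /* child now aliases the parent's stats */
	tsk->dyn_affi_stats = NULL;   /* drop the alias before any free path runs */
	return tsk;
}

/* free(NULL) is a no-op, so a child without its own stats is freed safely. */
static void free_task_model(struct task *tsk)
{
	free(tsk->dyn_affi_stats);
	free(tsk);
}
```

Without the NULL assignment, freeing a child that never allocated its own
stats would free the parent's buffer, and the parent would free it again
later.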

Fixes: ebca52ab ("sched: Add statistics for scheduler dynamic affinity")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

46212545

scsi: target: tcmu: Fix warning: 'page' may be used uninitialized · e71f2087

John Donnelly authored 2 years ago

mainline inclusion
from mainline-v5.10-rc1
commit 8c4e0f21
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SXLB
CVE: NA

--------------------------------

Corrects drivers/target/target_core_user.c:688:6: warning: 'page' may be
used uninitialized.

Link: https://lore.kernel.org/r/20200924001920.43594-1-john.p.donnelly@oracle.com


Fixes: 3c58f737 ("scsi: target: tcmu: Optimize use of flush_dcache_page")
Cc: Mike Christie <michael.christie@oracle.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: lijinlin <lijinlin3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

e71f2087

scsi: target: tcmu: Fix crash on ARM during cmd completion · d0b7519e

Bodo Stroesser authored 2 years ago

mainline inclusion
from mainline-v5.9-rc1
commit 5a0c256d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SXLB
CVE: NA

--------------------------------

If tcmu_handle_completions() has to process a padding shorter than
sizeof(struct tcmu_cmd_entry), the current call to
tcmu_flush_dcache_range() with sizeof(struct tcmu_cmd_entry) as length
param is wrong and causes crashes on e.g. ARM, because
tcmu_flush_dcache_range() in this case calls
flush_dcache_page(vmalloc_to_page(start)); with start being an invalid
address above the end of the vmalloc'ed area.

The fix is to use the minimum of remaining ring space and sizeof(struct
tcmu_cmd_entry) as the length param.
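
The length calculation can be sketched in plain C as below. The entry
size, the function name and its parameters are assumptions made for
illustration. The real code works on the tcmu ring state.

```c
#include <stddef.h>

/* Hypothetical size standing in for sizeof(struct tcmu_cmd_entry). */
#define CMD_ENTRY_SIZE 128u

/*
 * Length to flush for an entry that starts at 'offset' in a ring of
 * 'ring_size' bytes: never more than the bytes left before the ring end.
 */
static size_t completion_flush_len(size_t ring_size, size_t offset)
{
	size_t remaining = ring_size - offset;

	return remaining < CMD_ENTRY_SIZE ? remaining : CMD_ENTRY_SIZE;
}
```

With a 4096-byte ring, an entry starting 32 bytes before the end is
flushed with length 32, so the range no longer reaches past the ring.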

The patch was tested on kernel 4.19.118.

See https://bugzilla.kernel.org/show_bug.cgi?id=208045#c10

Link: https://lore.kernel.org/r/20200629093756.8947-1-bstroesser@ts.fujitsu.com


Tested-by: JiangYu <lnsyyj@hotmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: lijinlin <lijinlin3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

d0b7519e

scsi: target: tcmu: Optimize use of flush_dcache_page · 75a8f1b4

Bodo Stroesser authored 2 years ago

mainline inclusion
from mainline-v5.9-rc1
commit 3c58f737
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SXLB
CVE: NA

--------------------------------

(scatter|gather)_data_area() need to flush dcache after writing data to or
before reading data from a page in uio data area.  The two routines are
able to handle data transfer to/from such a page in fragments and flush the
cache after each fragment was copied by calling the wrapper
tcmu_flush_dcache_range().

That means:

1) flush_dcache_page() can be called multiple times for the same page.

2) Calling flush_dcache_page() indirectly using the wrapper does not make
   sense, because each call of the wrapper is for one single page only and
   the calling routine already has the correct page pointer.

Change (scatter|gather)_data_area() such that, instead of calling
tcmu_flush_dcache_range() before/after each memcpy, it now calls
flush_dcache_page() before unmapping a page (when writing is complete for
that page) or after mapping a page (when starting to read the page).
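
The effect on the number of flushes can be modelled in user space as
below. This is only a counting model of the write direction under assumed
names and an assumed 4096-byte page. It does not map pages or touch any
cache.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

static unsigned int flush_calls;   /* counts calls to the flush stand-in */

/* Stand-in for flush_dcache_page(); only counts invocations. */
static void flush_page_model(size_t page_index)
{
	(void)page_index;
	flush_calls++;
}

/*
 * Writes len bytes in fragments of at most frag bytes (frag must be
 * non-zero) into consecutive pages, flushing each page once, after the
 * last fragment written to it.
 */
static void scatter_model(size_t len, size_t frag)
{
	size_t off = 0;

	while (off < len) {
		size_t page_end = (off / MODEL_PAGE_SIZE + 1) * MODEL_PAGE_SIZE;
		size_t n = frag;

		if (n > len - off)
			n = len - off;
		if (n > page_end - off)
			n = page_end - off;
		/* a memcpy() of n bytes into the mapped page would go here */
		off += n;
		if (off == page_end || off == len)
			flush_page_model((off - 1) / MODEL_PAGE_SIZE);
	}
}
```

Writing 8192 bytes in 512-byte fragments takes 16 copies but only 2
flushes in this model, one per page.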

After this change only calls to tcmu_flush_dcache_range() for addresses in
vmalloc'ed command ring are left over.

The patch was tested on ARM with kernel 4.19.118 and 5.7.2

Link: https://lore.kernel.org/r/20200618131632.32748-2-bstroesser@ts.fujitsu.com


Tested-by: JiangYu <lnsyyj@hotmail.com>
Tested-by: Daniel Meyerholt <dxm523@gmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: lijinlin <lijinlin3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

75a8f1b4

scsi: target: tcmu: Fix size in calls to tcmu_flush_dcache_range · 1981211d

Bodo Stroesser authored 2 years ago

mainline inclusion
from mainline-v5.8-rc1
commit 8c4e0f21
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SXLB
CVE: NA

--------------------------------

1) If remaining ring space before the end of the ring is smaller than the
   next cmd to write, tcmu writes a padding entry which fills the remaining
   space at the end of the ring.

   Then tcmu calls tcmu_flush_dcache_range() with the size of struct
   tcmu_cmd_entry as data length to flush.  If the space filled by the
   padding was smaller than tcmu_cmd_entry, tcmu_flush_dcache_range() is
   called for an address range reaching beyond the end of the vmalloc'ed
   ring.

   tcmu_flush_dcache_range() in a loop calls
   flush_dcache_page(virt_to_page(start)); for every page being part of the
   range. On x86 the line is optimized out by the compiler, as
   flush_dcache_page() is empty on x86.

   But I assume the above can cause trouble on other architectures that
   really have a flush_dcache_page().  For paddings only the header part of
   an entry is relevant.  Due to alignment rules the header always fits in
   the remaining space, if padding is needed.  So tcmu_flush_dcache_range()
   can safely be called with sizeof(entry->hdr) as the length here.

2) After it has written a command to cmd ring, tcmu calls
   tcmu_flush_dcache_range() using the size of a struct tcmu_cmd_entry as
   data length to flush.  But if a command needs many iovecs, the real size
   of the command may be bigger than tcmu_cmd_entry, so a part of the
   written command is not flushed then.
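
Both cases can be sketched with a simplified entry layout. The structs
below are hypothetical and much smaller than the real tcmu_cmd_entry. They
only show that a padding needs the header length, while a command needs
its real size, which grows with the number of iovecs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout; not the real tcmu structures. */
struct model_iovec { uint64_t base; uint64_t len; };
struct model_hdr { uint32_t len_op; uint16_t cmd_id; uint8_t kflags; uint8_t uflags; };
struct model_entry {
	struct model_hdr hdr;
	uint32_t iov_cnt;
	struct model_iovec iov[4];   /* the fixed-size struct holds a few iovecs */
};

/* Real size of a command: at least the fixed struct, more with many iovecs. */
static size_t model_command_size(size_t iov_cnt)
{
	size_t need = offsetof(struct model_entry, iov) +
		      iov_cnt * sizeof(struct model_iovec);

	return need > sizeof(struct model_entry) ? need : sizeof(struct model_entry);
}

/* Length to flush: header only for a padding, full command size otherwise. */
static size_t model_flush_len(int is_padding, size_t iov_cnt)
{
	return is_padding ? sizeof(struct model_hdr) : model_command_size(iov_cnt);
}
```

In this model a command with 64 iovecs is larger than the fixed struct,
which is the part that was left unflushed in case 2.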

Link: https://lore.kernel.org/r/20200528193108.9085-1-bstroesser@ts.fujitsu.com


Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: lijinlin <lijinlin3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

1981211d

signal: fix deadlock caused by calling printk() under sighand->siglock · ab099ea9

Ye Weihua authored 2 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5T8FD


CVE: NA

--------------------------------

__send_signal_locked() invokes __sigqueue_alloc(), which may invoke a
normal printk() to print a failure message. This can cause a deadlock in
the scenario reported by syzbot below (tested on 5.10):

	CPU0				CPU1
	----				----
	lock(&sighand->siglock);
					lock(&tty->read_wait);
					lock(&sighand->siglock);
	lock(console_owner);

This patch specifies __GFP_NOWARN to __sigqueue_alloc(), so that printk
will not be called, and this deadlock problem can be avoided.
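
The idea can be modelled in user space as below. The flag, the counter
and the function are hypothetical stand-ins. The counter takes the place
of the printk() that would need the console locks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_GFP_NOWARN 0x1u   /* hypothetical stand-in for __GFP_NOWARN */

static unsigned int warnings_printed;   /* stands in for printk() calls */

/* Allocation that can be forced to fail; it warns unless the caller opted out. */
static void *alloc_model(size_t size, unsigned int flags, bool inject_failure)
{
	if (inject_failure) {
		if (!(flags & MODEL_GFP_NOWARN))
			warnings_printed++;   /* the message that would take console locks */
		return NULL;
	}
	return malloc(size);
}
```

A caller that holds a lock which must not nest under the console locks
passes the flag, so the failure path returns NULL without printing.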

Syzbot reported the following lockdep error:

======================================================
WARNING: possible circular locking dependency detected
5.10.0-04424-ga472e3c833d3 #1 Not tainted
------------------------------------------------------
syz-executor.2/31970 is trying to acquire lock:
ffffa00014066a60 (console_owner){-.-.}-{0:0}, at: console_trylock_spinning+0xf0/0x2e0 kernel/printk/printk.c:1854

but task is already holding lock:
ffff0000ddb38a98 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x60/0x260 kernel/signal.c:1322

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&sighand->siglock){-.-.}-{2:2}:
       validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
       __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
       lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159
       __lock_task_sighand+0xf0/0x370 kernel/signal.c:1396
       lock_task_sighand include/linux/sched/signal.h:699 [inline]
       task_work_add+0x1f8/0x2a0 kernel/task_work.c:58
       io_req_task_work_add+0x98/0x10c fs/io_uring.c:2115
       __io_async_wake+0x338/0x780 fs/io_uring.c:4984
       io_poll_wake+0x40/0x50 fs/io_uring.c:5461
       __wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93
       __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123
       __wake_up+0x1c/0x24 kernel/sched/wait.c:142
       pty_set_termios+0x1ac/0x2d0 drivers/tty/pty.c:286
       tty_set_termios+0x310/0x46c drivers/tty/tty_ioctl.c:334
       set_termios.part.0+0x2dc/0xa50 drivers/tty/tty_ioctl.c:414
       set_termios drivers/tty/tty_ioctl.c:368 [inline]
       tty_mode_ioctl+0x4f4/0xbec drivers/tty/tty_ioctl.c:736
       n_tty_ioctl_helper+0x74/0x260 drivers/tty/tty_ioctl.c:883
       n_tty_ioctl+0x80/0x3d0 drivers/tty/n_tty.c:2516
       tty_ioctl+0x508/0x1100 drivers/tty/tty_io.c:2751
       vfs_ioctl fs/ioctl.c:48 [inline]
       __do_sys_ioctl fs/ioctl.c:753 [inline]
       __se_sys_ioctl fs/ioctl.c:739 [inline]
       __arm64_sys_ioctl+0x12c/0x18c fs/ioctl.c:739
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155
       do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217
       el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
       el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
       el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

-> #3 (&tty->read_wait){....}-{2:2}:
       validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
       __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
       lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       io_poll_double_wake+0x158/0x30c fs/io_uring.c:5093
       __wake_up_common+0xcc/0x2a0 kernel/sched/wait.c:93
       __wake_up_common_lock+0xd0/0x130 kernel/sched/wait.c:123
       __wake_up+0x1c/0x24 kernel/sched/wait.c:142
       pty_close+0x1bc/0x330 drivers/tty/pty.c:68
       tty_release+0x1e0/0x88c drivers/tty/tty_io.c:1761
       __fput+0x1dc/0x500 fs/file_table.c:281
       ____fput+0x24/0x30 fs/file_table.c:314
       task_work_run+0xf4/0x1ec kernel/task_work.c:151
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       do_notify_resume+0x378/0x410 arch/arm64/kernel/signal.c:718
       work_pending+0xc/0x198

-> #2 (&tty->write_wait){....}-{2:2}:
       validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
       __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
       lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0xc0/0x15c kernel/locking/spinlock.c:159
       __wake_up_common_lock+0xb0/0x130 kernel/sched/wait.c:122
       __wake_up+0x1c/0x24 kernel/sched/wait.c:142
       tty_wakeup+0x54/0xbc drivers/tty/tty_io.c:539
       tty_port_default_wakeup+0x38/0x50 drivers/tty/tty_port.c:50
       tty_port_tty_wakeup+0x3c/0x50 drivers/tty/tty_port.c:388
       uart_write_wakeup+0x38/0x60 drivers/tty/serial/serial_core.c:106
       pl011_tx_chars+0x530/0x5c0 drivers/tty/serial/amba-pl011.c:1418
       pl011_start_tx_pio drivers/tty/serial/amba-pl011.c:1303 [inline]
       pl011_start_tx+0x1b4/0x430 drivers/tty/serial/amba-pl011.c:1315
       __uart_start.isra.0+0xb4/0xcc drivers/tty/serial/serial_core.c:127
       uart_write+0x21c/0x460 drivers/tty/serial/serial_core.c:613
       process_output_block+0x120/0x3ac drivers/tty/n_tty.c:590
       n_tty_write+0x2c8/0x650 drivers/tty/n_tty.c:2383
       do_tty_write drivers/tty/tty_io.c:1028 [inline]
       file_tty_write.constprop.0+0x2d0/0x520 drivers/tty/tty_io.c:1118
       tty_write drivers/tty/tty_io.c:1125 [inline]
       redirected_tty_write+0xe4/0x104 drivers/tty/tty_io.c:1147
       call_write_iter include/linux/fs.h:1960 [inline]
       new_sync_write+0x264/0x37c fs/read_write.c:515
       vfs_write+0x694/0x9d0 fs/read_write.c:602
       ksys_write+0xfc/0x200 fs/read_write.c:655
       __do_sys_write fs/read_write.c:667 [inline]
       __se_sys_write fs/read_write.c:664 [inline]
       __arm64_sys_write+0x50/0x60 fs/read_write.c:664
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common.constprop.0+0xf8/0x420 arch/arm64/kernel/syscall.c:155
       do_el0_svc+0x50/0x120 arch/arm64/kernel/syscall.c:217
       el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
       el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
       el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

-> #1 (&port_lock_key){-.-.}-{2:2}:
       validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
       __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
       lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0xa0/0x120 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:354 [inline]
       pl011_console_write+0x2f0/0x410 drivers/tty/serial/amba-pl011.c:2263
       call_console_drivers.constprop.0+0x1f8/0x3b0 kernel/printk/printk.c:1932
       console_unlock+0x36c/0x9ec kernel/printk/printk.c:2553
       vprintk_emit+0x40c/0x4b0 kernel/printk/printk.c:2075
       vprintk_default+0x48/0x54 kernel/printk/printk.c:2092
       vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404
       printk+0xbc/0xf0 kernel/printk/printk.c:2123
       register_console+0x580/0x790 kernel/printk/printk.c:2905
       uart_configure_port.constprop.0+0x4a0/0x4e0 drivers/tty/serial/serial_core.c:2431
       uart_add_one_port+0x378/0x550 drivers/tty/serial/serial_core.c:2944
       pl011_register_port+0xb4/0x210 drivers/tty/serial/amba-pl011.c:2686
       pl011_probe+0x334/0x3ec drivers/tty/serial/amba-pl011.c:2736
       amba_probe+0x14c/0x2f0 drivers/amba/bus.c:283
       really_probe+0x210/0xa5c drivers/base/dd.c:562
       driver_probe_device+0x1c8/0x280 drivers/base/dd.c:747
       __device_attach_driver+0x18c/0x260 drivers/base/dd.c:853
       bus_for_each_drv+0x120/0x1a0 drivers/base/bus.c:431
       __device_attach+0x16c/0x3b4 drivers/base/dd.c:922
       device_initial_probe+0x28/0x34 drivers/base/dd.c:971
       bus_probe_device+0x124/0x13c drivers/base/bus.c:491
       fw_devlink_resume+0x164/0x270 drivers/base/core.c:1601
       of_platform_default_populate_init+0xf4/0x114 drivers/of/platform.c:543
       do_one_initcall+0x11c/0x770 init/main.c:1217
       do_initcall_level+0x364/0x388 init/main.c:1290
       do_initcalls+0x90/0xc0 init/main.c:1306
       do_basic_setup init/main.c:1326 [inline]
       kernel_init_freeable+0x57c/0x63c init/main.c:1529
       kernel_init+0x1c/0x20c init/main.c:1417
       ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034

-> #0 (console_owner){-.-.}-{0:0}:
       check_prev_add+0xe0/0x105c kernel/locking/lockdep.c:2988
       check_prevs_add+0x1c8/0x3d4 kernel/locking/lockdep.c:3113
       validate_chain+0x6dc/0xb0c kernel/locking/lockdep.c:3728
       __lock_acquire+0x498/0x940 kernel/locking/lockdep.c:4954
       lock_acquire+0x228/0x580 kernel/locking/lockdep.c:5564
       console_trylock_spinning+0x130/0x2e0 kernel/printk/printk.c:1875
       vprintk_emit+0x268/0x4b0 kernel/printk/printk.c:2074
       vprintk_default+0x48/0x54 kernel/printk/printk.c:2092
       vprintk_func+0x1f0/0x40c kernel/printk/printk_safe.c:404
       printk+0xbc/0xf0 kernel/printk/printk.c:2123
       fail_dump lib/fault-inject.c:45 [inline]
       should_fail+0x2a0/0x370 lib/fault-inject.c:146
       __should_failslab+0x8c/0xe0 mm/failslab.c:33
       should_failslab+0x14/0x2c mm/slab_common.c:1181
       slab_pre_alloc_hook mm/slab.h:495 [inline]
       slab_alloc_node mm/slub.c:2842 [inline]
       slab_alloc mm/slub.c:2931 [inline]
       kmem_cache_alloc+0x8c/0xe64 mm/slub.c:2936
       __sigqueue_alloc+0x224/0x5a4 kernel/signal.c:437
       __send_signal+0x700/0xeac kernel/signal.c:1121
       send_signal+0x348/0x6a0 kernel/signal.c:1247
       force_sig_info_to_task+0x184/0x260 kernel/signal.c:1339
       force_sig_fault_to_task kernel/signal.c:1678 [inline]
       force_sig_fault+0xb0/0xf0 kernel/signal.c:1685
       arm64_force_sig_fault arch/arm64/kernel/traps.c:182 [inline]
       arm64_notify_die arch/arm64/kernel/traps.c:208 [inline]
       arm64_notify_die+0xdc/0x160 arch/arm64/kernel/traps.c:199
       do_sp_pc_abort+0x4c/0x60 arch/arm64/mm/fault.c:794
       el0_pc+0xd8/0x19c arch/arm64/kernel/entry-common.c:309
       el0_sync_handler+0x12c/0x1e0 arch/arm64/kernel/entry-common.c:394
       el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683

other info that might help us debug this:

Chain exists of:
	console_owner --> &tty->read_wait --> &sighand->siglock

Signed-off-by: Ye Weihua <yeweihua4@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

ab099ea9

mm: fix missing handler for __GFP_NOWARN · 027e2638

Qi Zheng authored 2 years ago

mainline inclusion
from mainline-v5.19-rc1
commit 3f913fc5f9745613088d3c569778c9813ab9c129
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5T8FD
CVE: NA

--------------------------------

We expect no warnings to be issued when we specify __GFP_NOWARN, but
currently in paths like alloc_pages() and kmalloc(), there are still some
warnings printed. Fix it.

Warnings that report usage problems are left alone. If such a warning is
printed, the usage problem itself should be fixed, as in the following
case:

	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

[zhengqi.arch@bytedance.com: v2]
 Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com


Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflict:
	mm/internal.h
	mm/page_alloc.c

Signed-off-by: Ye Weihua <yeweihua4@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

027e2638

Sep 27, 2022

KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog · 7a3eccfa

Like Xu authored 2 years ago

mainline inclusion
from mainline-v5.18
commit 75189d1de1b377e580ebd2d2c55914631eac9c64
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5SDUS


CVE: NA

-------------

NMI-watchdog is one of the favorite features of kernel developers,
but it does not work in AMD guest even with vPMU enabled and worse,
the system misrepresents this capability via /proc.

This is a PMC emulation error. KVM does not pass the latest valid
value to perf_event in time when guest NMI-watchdog is running, thus
the perf_event corresponding to the watchdog counter will enter the
old state at some point after the first guest NMI injection, forcing
the hardware register PMC0 to be constantly written to 0x800000000001.

Meanwhile, the running counter should accurately reflect its new value
based on the latest coordinated pmc->counter (from vPMC's point of view)
rather than the value written directly by the guest.

Fixes: 168d918f ("KVM: x86: Adjust counter sample period after a wrmsr")
Reported-by: Dongli Cao <caodongli@kingsoft.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220409015226.38619-1-likexu@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
[Yanan Wang: Adapt the code to linux v4.19]
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>

4.19.90-2209.6.0

7a3eccfa