Commits · 2666666a66b4085dbfbbf476d4f3e0f604f8b736 · Summer2022 / 22b970264

Nov 29, 2021

sched: Introduce qos scheduler for co-location · 2666666a

Zheng Zucheng authored 3 years ago

hulk inclusion
category: feature
bugzilla: 51828, https://gitee.com/openeuler/kernel/issues/I4K96G


CVE: NA

--------------------------------

We introduce the concept of a qos level to the scheduler, with each
level backed by a different scheduling policy. When the qos level of
a task group is modified through the cpu.qos_level cpu cgroup file,
the qos scheduler changes the policy of the tasks in that group.
In this way we are able to satisfy the different needs of tasks at
different qos levels.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2666666a

Nov 27, 2021

io_uring: return back safer resurrect · 973d2c13

Pavel Begunkov authored 3 years ago


mainline inclusion
from mainline-v5.13-rc1
commit f70865db5ff35f5ed0c7e9ef63e7cca3d4947f04
category: bugfix
bugzilla: 185739
CVE: NA

-----------------------------------------------

Revert of revert of "io_uring: wait potential ->release() on resurrect",
which adds a helper so that resurrect does not race with completion
reinit. It was removed because of a strange bug with no clear root cause
or link to the patch.

It is improved: instead of synchronize_rcu(), just wait_for_completion(),
because we're at 0 refs and it will happen very shortly. Specifically,
use the non-interruptible version to ignore all pending signals that may
have ended the prior interruptible wait.

This reverts commit cb5e1b81304e089ee3ca948db4d29f71902eb575.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7a080c20f686d026efade810b116b72f88abaff9.1618101759.git.asml.silence@gmail.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflicts:
fs/io_uring.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

973d2c13

cpufreq: Fix get_cpu_device() failed in add_cpu_dev_symlink() · 97e7efbb

Xiongfeng Wang authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4HYY4?from=project-issue


CVE: NA

-------------------------------------------------

When I hot-added a CPU, I found that the 'cpufreq' directory was not
created under /sys/devices/system/cpu/cpuX/. This is because
get_cpu_device() failed in add_cpu_dev_symlink().

cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface. It
is called when the CPU device is registered with the system. The stack
is as follows.
  register_cpu()
  ->device_register()
   ->device_add()
    ->bus_probe_device()
     ->cpufreq_add_dev()

But get_cpu_device() can return the CPU device only after the device has
been registered; before that it returns NULL. Since we already have the
CPU device in cpufreq_add_dev(), pass it to add_cpu_dev_symlink().
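
The ordering problem described above can be modelled in plain C. This is a minimal userspace sketch, not the kernel code: `cpu_registry`, `lookup_cpu_device()`, `register_cpu_model()` and both `add_symlink_*` helpers are hypothetical stand-ins, and the assumption is simply that the lookup table is filled only after the callback has run.

```c
#include <stddef.h>

#define MAX_CPUS 8

struct device { int id; };

/* Hypothetical registry: a slot is filled only once registration has finished. */
static struct device *cpu_registry[MAX_CPUS];

/* Stand-in for get_cpu_device(): NULL until the device is fully registered. */
static struct device *lookup_cpu_device(unsigned int cpu)
{
    return cpu < MAX_CPUS ? cpu_registry[cpu] : NULL;
}

/* Old shape: looks the device up by CPU number, so it fails mid-registration. */
static int add_symlink_by_lookup(unsigned int cpu)
{
    return lookup_cpu_device(cpu) ? 0 : -1;
}

/* New shape: the caller hands over the device it already holds. */
static int add_symlink_with_dev(const struct device *dev)
{
    return dev ? 0 : -1;
}

/* Models registration: the add callback runs before the registry is updated. */
static int register_cpu_model(struct device *dev, unsigned int cpu, int use_lookup)
{
    int ret;

    if (cpu >= MAX_CPUS)
        return -1;
    ret = use_lookup ? add_symlink_by_lookup(cpu) : add_symlink_with_dev(dev);
    cpu_registry[cpu] = dev;    /* registration completes only here */
    return ret;
}
```

With `use_lookup` set, the callback fails because the registry slot is still empty; passing the device pointer directly avoids the lookup altogether.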

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

97e7efbb

ACPI: CPPC: Fix cppc_cpufreq_init failed in CPU Hotplug situation · b8815fbb

Xiongfeng Wang authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4HYY4?from=project-issue


CVE: NA

-------------------------------------------------

Per-CPU variables cpc_desc_ptr are initialized in
acpi_cppc_processor_probe() when the processor devices are present and
added into the system. But when cpu_possible_mask and cpu_present_mask
are not equal, only the cpc_desc_ptr of CPUs in cpu_present_mask are
initialized, which causes acpi_get_psd_map() to fail in
cppc_cpufreq_init().

To fix this issue, we parse the _PSD method for all possible CPUs to get
the P-State topology and modify acpi_get_psd_map() to rely on this
information.

Signed-off-by: Xiongfeng Wang <wangxiongfeng@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b8815fbb

Nov 24, 2021

lib/clear_user: ensure loop in __arch_clear_user cache-aligned v2 · ed39950b

Cheng Jian authored 3 years ago

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3OX0C


CVE: NA

--------------------------------

We must ensure that the following four instructions are cache-aligned.
Otherwise, it will cause problems with the performance of libMicro
pread.

1:
        # uao_user_alternative 9f, str, sttr, xzr, x0, 8
        str     xzr, [x0], #8
        nop
        subs    x1, x1, #8
        b.pl    1b

with this patch:

             prc thr   usecs/call      samples   errors cnt/samp     size
pread_z100     1   1      5.88400          807        0 1            102400

The pread result can range from 5 to 9 usecs/call depending on how
this function is aligned.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ed39950b

Nov 23, 2021

drm/ioctl: Ditch DRM_UNLOCKED except for the legacy vblank ioctl · 9a20655d

Daniel Vetter authored 3 years ago

mainline inclusion
from mainline-v5.4-rc1
commit 75426367
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JI7R


CVE: NA

--------------------------------

This completes Emil's series of removing DRM_UNLOCKED from modern
drivers. It's entirely cargo-culted since we ignore it on
non-DRIVER_LEGACY drivers since:

commit ea487835
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Sep 28 21:42:40 2015 +0200

    drm: Enforce unlocked ioctl operation for kms driver ioctls

Now justifying why we can do this for legacy drivers too (and hence
close the source of all the bogus copypasting) is a bit more involved.
DRM_UNLOCKED was introduced in:

commit ed8b6704
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Dec 16 22:17:09 2009 +0000

    drm: convert drm_ioctl to unlocked_ioctl

As an immediate hack to keep i810 happy, which would have deadlocked
without this trickery. The old BKL is automatically dropped in
schedule(), and hence the i810 vs. mmap_sem deadlock didn't actually
cause a real deadlock. But with a mutex it would. The solution was to
annotate these as DRM_UNLOCKED and mark i810 unsafe on SMP machines.

This conversion caused a regression, because unlike the BKL a mutex
isn't dropped over schedule (that thing again), which caused a vblank
wait in one thread to block the entire desktop and all its apps. Back
then we did vblank scheduling by blocking in the client, awesome isn't
it. This was fixed quickly in (ok not so quickly, took 2 years):

commit 8f4ff2b0
Author: Ilija Hadzic <ihadzic@research.bell-labs.com>
Date:   Mon Oct 31 17:46:18 2011 -0400

    drm: do not sleep on vblank while holding a mutex

All the other DRM_UNLOCKED annotations for the core ioctls were
work to reach finer-grained locking for modern drivers. This took
years, and culminated in:

commit fdd5b877
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Dec 10 22:52:54 2016 +0100

    drm: Enforce BKL-less ioctls for modern drivers

DRM_UNLOCKED was never required by any legacy drivers, except for the
vblank_wait IOCTL. Therefore we will not regress these old drivers by
going back to where we've been in 2011. For all modern drivers nothing
will change.

To make this perfectly clear, also add a comment to DRM_UNLOCKED.

v2: Don't forget about drm_ioc32.c (Michel).

Cc: Michel Dänzer <michel@daenzer.net>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190605120835.2798-1-daniel.vetter@ffwll.ch


Signed-off-by: Liu ZiXian <liuzixian4@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: wangxiongfeng 00379786 <wangxiongfeng2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4.19.90-2111.6.0

9a20655d

config: Enable some configs for test · c7e0fb5b

Yang Yingliang authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------------------------

Enable some configs for test.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c7e0fb5b

share_pool: add mm address check when access the process's sp_group file · f72709d7

Zhang Jian authored 3 years ago

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JICC


CVE: NA

-------------------------------------------------

When we access a process's sp_group file and the process is a kernel
process, its task_struct->mm value will be 0, so we must check it and
make sure the process is not a kernel process.

v1->v2: The kernel-process path is rarely triggered, so wrap the check
in unlikely() to speed up execution.
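
A rough userspace sketch of the check described above. The struct and function names (`mm_model`, `task_model`, `sp_group_show_model`) are hypothetical, and `unlikely()` is re-created here with the GCC/Clang `__builtin_expect` builtin rather than taken from kernel headers.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely() branch hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct mm_model { int users; };
struct task_model { struct mm_model *mm; };    /* mm is NULL for kernel threads */

/* Returns 0 when the task has a user address space, -1 for kernel threads. */
static int sp_group_show_model(const struct task_model *task)
{
    if (unlikely(!task->mm))
        return -1;    /* rare path: nothing to report for a kernel thread */
    return 0;
}
```

The hint only affects code layout; the NULL check itself is what prevents the bad dereference.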

Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f72709d7

Nov 22, 2021

rq-qos: fix missed wake-ups in rq_qos_throttle try two · fbf4e285

Jan Kara authored 3 years ago

stable inclusion
from stable-5.10.51
commit 8cc58a6e2c394aa48aa05f600be7d279efbafcd7
bugzilla: 175263

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8cc58a6e2c394aa48aa05f600be7d279efbafcd7



--------------------------------

commit 11c7aa0ddea8611007768d3e6b58d45dc60a19e1 upstream.

Commit 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
tried to fix a problem that a process could be sleeping in rq_qos_wait()
without anyone to wake it up. However the fix is not complete and the
following can still happen:

CPU1 (waiter1)		CPU2 (waiter2)		CPU3 (waker)
rq_qos_wait()		rq_qos_wait()
  acquire_inflight_cb() -> fails
			  acquire_inflight_cb() -> fails

						completes IOs, inflight
						  decreased
  prepare_to_wait_exclusive()
			  prepare_to_wait_exclusive()
  has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
			  has_sleeper = !wq_has_single_sleeper() -> true
  io_schedule()		  io_schedule()

Deadlock as now there's nobody to wakeup the two waiters. The logic
automatically blocking when there are already sleepers is really subtle
and the only way to make it work reliably is that we check whether there
are some waiters in the queue when adding ourselves there. That way, we
are guaranteed that at least the first process to enter the wait queue
will recheck the waiting condition before going to sleep and thus
guarantee forward progress.
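
The difference between the two orderings can be seen in a single-threaded toy model. Everything here is an illustrative assumption: `waitq_model` is just a counter, and the real code makes the "add and check" step atomic under the wait-queue lock, which this sketch does not attempt to reproduce.

```c
#include <stdbool.h>

struct waitq_model { unsigned int nr_waiters; };

/* Old shape: enqueue first, inspect the queue later as a separate step. */
static void enqueue_only(struct waitq_model *q)
{
    q->nr_waiters++;
}

/* Mirrors the late !wq_has_single_sleeper() check from the trace above. */
static bool has_sleeper_late_check(const struct waitq_model *q)
{
    return q->nr_waiters != 1;
}

/* New shape: enqueue and report, in one step, whether anyone was already waiting. */
static bool enqueue_and_check(struct waitq_model *q)
{
    bool had_sleepers = q->nr_waiters > 0;

    q->nr_waiters++;
    return had_sleepers;
}
```

In the old shape, two waiters that both enqueue before checking each see a sleeper and both go to sleep. In the new shape the first waiter always sees an empty queue, so it rechecks the condition instead of blocking.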

Fixes: 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4.19.90-2111.5.0

fbf4e285

atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait · 060f78dd

Zekun Shen authored 3 years ago


mainline inclusion
from mainline-v5.16-rc2
commit b922f622592af76b57cbc566eaeccda0b31a3496
category: bugfix
bugzilla: NA
CVE: CVE-2021-43975

-------------------------------------------------

This bug report showed up when running our research tools. The
report is an SOOB read, but it seems an SOOB write is also possible
a few lines below.

In detail, fw.len and sw.len are inputs coming from IO. A len
over the size of self->rpc triggers the SOOB. The patch fixes the
bugs by adding sanity checks.
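
As an illustration of the kind of sanity check meant here, the following sketch rejects a device-reported length that exceeds the destination buffer. The 64-byte `rpc_model` buffer and the `rpc_copy_checked()` helper are hypothetical; the real structure layout and error handling in the driver may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rpc_model { uint8_t data[64]; };    /* hypothetical fixed-size RPC buffer */

/* Copy len bytes reported by the device into rpc, rejecting oversized lengths. */
static int rpc_copy_checked(struct rpc_model *rpc, const uint8_t *src, uint16_t len)
{
    if (len > sizeof(rpc->data))
        return -EINVAL;    /* device-supplied length exceeds the buffer */
    memcpy(rpc->data, src, len);
    return 0;
}
```

Because `len` is a 16-bit value coming from the device, an unchecked copy could touch up to 0xffff bytes, which matches the leak size mentioned in the message.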

The bugs are triggerable with compromised/malfunctioning devices.
They are potentially exploitable, given that they first leak up to
0xffff bytes and are able to overwrite the region later.

The patch is tested with the QEMU emulator.
It is NOT tested with a real device.

Attached is the log we found by fuzzing.

BUG: KASAN: slab-out-of-bounds in
	hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
Read of size 4 at addr ffff888016260b08 by task modprobe/213
CPU: 0 PID: 213 Comm: modprobe Not tainted 5.6.0 #1
Call Trace:
 dump_stack+0x76/0xa0
 print_address_description.constprop.0+0x16/0x200
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 __kasan_report.cold+0x37/0x7c
 ? aq_hw_read_reg_bit+0x60/0x70 [atlantic]
 ? hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 kasan_report+0xe/0x20
 hw_atl_utils_fw_upload_dwords+0x393/0x3c0 [atlantic]
 hw_atl_utils_fw_rpc_call+0x95/0x130 [atlantic]
 hw_atl_utils_fw_rpc_wait+0x176/0x210 [atlantic]
 hw_atl_utils_mpi_create+0x229/0x2e0 [atlantic]
 ? hw_atl_utils_fw_rpc_wait+0x210/0x210 [atlantic]
 ? hw_atl_utils_initfw+0x9f/0x1c8 [atlantic]
 hw_atl_utils_initfw+0x12a/0x1c8 [atlantic]
 aq_nic_ndev_register+0x88/0x650 [atlantic]
 ? aq_nic_ndev_init+0x235/0x3c0 [atlantic]
 aq_pci_probe+0x731/0x9b0 [atlantic]
 ? aq_pci_func_init+0xc0/0xc0 [atlantic]
 local_pci_probe+0xd3/0x160
 pci_device_probe+0x23f/0x3e0

Reported-by: Brendan Dolan-Gavitt <brendandg@nyu.edu>
Signed-off-by: Zekun Shen <bruceshenzk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

060f78dd

drivers : remove drivers/mtd/hisilicon/sfc · 5fd27bfc

fengsheng authored 3 years ago

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IYX7?from=project-issue


CVE: NA

------------------------------------------------------------

This driver is not in use. Remove it.

Signed-off-by: fengsheng <fengsheng5@huawei.com>
Reviewed-by: lidongming <lidongming5@huawei.com>
Reviewed-by: ouyang delong <ouyangdelong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5fd27bfc

drivers : remove drivers/soc/hisilicon/sysctl · bc3771fa

fengsheng authored 3 years ago

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IYWW?from=project-issue


CVE: NA

------------------------------------------------------------

This driver is not in use. Remove it.

Signed-off-by: fengsheng <fengsheng5@huawei.com>
Reviewed-by: lidongming <lidongming5@huawei.com>
Reviewed-by: ouyang delong <ouyangdelong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bc3771fa

drivers : remove drivers/soc/hisilicon/lbc · 463a23d0

fengsheng authored 3 years ago

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4IYWE?from=project-issue


CVE: NA

------------------------------------------------------------

This driver is not in use. Remove it.

Signed-off-by: fengsheng <fengsheng5@huawei.com>
Reviewed-by: lidongming <lidongming5@huawei.com>
Reviewed-by: ouyang delong <ouyangdelong@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

463a23d0

Nov 18, 2021

ipv4: fix uninitialized warnings in fnhe_remove_oldest() · f9072e26

Xu Jia authored 3 years ago


hulk inclusion
category: bugfix
bugzilla: 177871
CVE: NA

-------------------------------------------------

The following warning is falsely reported since commit
e2eea86c (ipv4: make exception cache less predictible):

  error: ‘oldest_p’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    *oldest_p = oldest->fnhe_next;
    ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
  net/ipv4/route.c:602:44: note: ‘oldest_p’ was declared here
    struct fib_nh_exception __rcu **fnhe_p, **oldest_p;

Fix it to avoid the false alarm.
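
The message does not say how the warning is silenced. One common approach, shown here purely as an assumption, is to initialise the pointer at its declaration. The `node` type and `remove_oldest()` are hypothetical simplifications of the list walk quoted in the warning.

```c
#include <stddef.h>

struct node { unsigned long stamp; struct node *next; };

/* Unlink and return the node with the smallest stamp, or NULL for an empty list. */
static struct node *remove_oldest(struct node **head)
{
    struct node **p, **oldest_p = NULL;    /* initialised: no maybe-uninitialized warning */
    struct node *oldest = NULL;

    for (p = head; *p; p = &(*p)->next) {
        if (!oldest || (*p)->stamp < oldest->stamp) {
            oldest = *p;
            oldest_p = p;
        }
    }
    if (oldest_p)
        *oldest_p = oldest->next;    /* splice the oldest node out of the chain */
    return oldest;
}
```

The compiler cannot prove that the loop body runs at least once, which is why it reports the pointer as possibly uninitialised even when the caller guarantees a non-empty list.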

Signed-off-by: Xu Jia <xujia39@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f9072e26

crypto: public_key: fix overflow during implicit conversion · d0562f6c

zhenwei pi authored 3 years ago


stable inclusion
from linux-4.19.207
commit aab312696d37de80502ca633b40184de24f22917

--------------------------------

commit f985911b7bc75d5c98ed24d8aaa8b94c590f7c6a upstream.

The kernel warning below can be reproduced by verifying a 256-byte
data file with the keyctl command. Run this script:
RAWDATA=rawdata
SIGDATA=sigdata

modprobe pkcs8_key_parser

rm -rf *.der *.pem *.pfx
rm -rf $RAWDATA
dd if=/dev/random of=$RAWDATA bs=256 count=1

openssl req -nodes -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem \
  -subj "/C=CN/ST=GD/L=SZ/O=vihoo/OU=dev/CN=xx.com/emailAddress=yy@xx.com"

KEY_ID=`openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER | keyctl \
  padd asymmetric 123 @s`

keyctl pkey_sign $KEY_ID 0 $RAWDATA enc=pkcs1 hash=sha1 > $SIGDATA
keyctl pkey_verify $KEY_ID 0 $RAWDATA $SIGDATA enc=pkcs1 hash=sha1

Then the kernel reports:
 WARNING: CPU: 5 PID: 344556 at crypto/rsa-pkcs1pad.c:540
   pkcs1pad_verify+0x160/0x190
 ...
 Call Trace:
  public_key_verify_signature+0x282/0x380
  ? software_key_query+0x12d/0x180
  ? keyctl_pkey_params_get+0xd6/0x130
  asymmetric_key_verify_signature+0x66/0x80
  keyctl_pkey_verify+0xa5/0x100
  do_syscall_64+0x35/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The cause of this issue is in function 'asymmetric_key_verify_signature':
'.digest_size(u8) = params->in_len(u32)' overflows a u8 value,
so use u32 instead of u8 for the digest_size field. Also reorder struct
public_key_signature, which saves 8 bytes on a 64-bit machine.
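
The truncation is easy to reproduce outside the kernel. In this sketch `sig_narrow` and `sig_wide` are hypothetical one-field structs standing in for the old and new field widths.

```c
#include <stdint.h>

struct sig_narrow { uint8_t digest_size; };    /* old field width */
struct sig_wide { uint32_t digest_size; };     /* widened field */

/* Stores a 32-bit length in an 8-bit field; only the low 8 bits survive. */
static uint32_t store_narrow(uint32_t in_len)
{
    struct sig_narrow sig = { .digest_size = (uint8_t)in_len };

    return sig.digest_size;
}

/* Stores the same length in a 32-bit field; the value is preserved. */
static uint32_t store_wide(uint32_t in_len)
{
    struct sig_wide sig = { .digest_size = in_len };

    return sig.digest_size;
}
```

A 256-byte input, as in the reproducer script, becomes a digest size of 0 in the narrow field.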

Cc: stable@vger.kernel.org
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d0562f6c

net: bridge: fix stale eth hdr pointer in br_dev_xmit · c7cc377d

Nikolay Aleksandrov authored 3 years ago


mainline inclusion
from mainline-v5.6-rc4
commit 823d81b0
category: bugfix
bugzilla: 185773
CVE: NA

-------------------------------------------------

In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
if the packet has the vlan header inside (e.g. bridge with disabled
tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
to extract the vid before filtering which in turn calls pskb_may_pull()
and we may end up with a stale eth pointer. Moreover the cached eth header
pointer will generally be wrong after that operation. Remove the eth header
caching and just use eth_hdr() directly, the compiler does the right thing
and calculates it only once so we don't lose anything.
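
The stale-pointer hazard can be sketched with an ordinary heap buffer. `pkt_model`, `pkt_pull_model()` and `pkt_hdr()` are hypothetical stand-ins; the model assumes the pull always relocates the data, which is the worst case the message describes.

```c
#include <stdlib.h>
#include <string.h>

struct pkt_model { unsigned char *data; size_t len; };

static int pkt_init(struct pkt_model *p, const void *src, size_t len)
{
    p->data = malloc(len);
    if (!p->data)
        return -1;
    memcpy(p->data, src, len);
    p->len = len;
    return 0;
}

/* Stand-in for a pull that relocates the buffer; old pointers into it go stale. */
static int pkt_pull_model(struct pkt_model *p)
{
    unsigned char *moved = malloc(p->len);

    if (!moved)
        return -1;
    memcpy(moved, p->data, p->len);
    free(p->data);
    p->data = moved;
    return 0;
}

/* Stand-in for eth_hdr(): always derived from the packet's current buffer. */
static const unsigned char *pkt_hdr(const struct pkt_model *p)
{
    return p->data;
}
```

A header pointer cached before `pkt_pull_model()` would point into freed memory afterwards, so callers should call `pkt_hdr()` again at each use instead of caching the result.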

Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c7cc377d

Nov 17, 2021

x86/entry: Make entry_64_compat.S objtool clean · 427284b2

Peter Zijlstra authored 3 years ago


mainline inclusion
from mainline-v5.8-rc1
commit 1c3e5d3f
category: feature
bugzilla: 175666
CVE: NA

---------------------------

Currently entry_64_compat is exempt from objtool, but with vmlinux
mode there is no hiding it.

Make the following changes to make it pass:

 - change entry_SYSENTER_compat to STT_NOTYPE; it's not a function
   and doesn't have function type stack setup.

 - mark all STT_NOTYPE symbols with UNWIND_HINT_EMPTY; so we do
   validate them and don't treat them as unreachable.

 - don't abuse RSP as a temp register, this confuses objtool
   mightily as it (rightfully) thinks we're doing unspeakable
   things to the stack.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505134341.272248024@linutronix.de


Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Conflicts:
	arch/x86/entry/entry_64_compat.S
[wangshaobo: change ENDPROC to END, avoid objtool skipping STT_FUNC type check]
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

427284b2

Nov 15, 2021

io_uring: fix ltout double free on completion race · 89f74ab3

Pavel Begunkov authored 3 years ago


mainline inclusion
from mainline-v5.13-rc2
commit 447c19f3b5074409c794b350b10306e1da1ef4ba
category: bugfix
bugzilla: 185736
CVE: NA

-----------------------------------------------

Always remove linked timeout on io_link_timeout_fn() from the master
request link list, otherwise we may get use-after-free when first
io_link_timeout_fn() puts linked timeout in the fail path, and then
will be found and put on master's free.

Cc: stable@vger.kernel.org # 5.10+
Fixes: 90cd7e424969d ("io_uring: track link timeout's master explicitly")
Reported-and-tested-by:  <syzbot+5a864149dd970b546223@syzkaller.appspotmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com


Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflicts:
fs/io_uring.c

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4.19.90-2111.4.0

89f74ab3

iommu: smmuv2: fix compile error when CONFIG_ARCH_PHYTIUM is off · d77bc737

Zheng Zengkai authored 3 years ago

phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I41AUQ



--------------------------------------

Disabling CONFIG_ARCH_PHYTIUM results in the following compile errors:

drivers/iommu/arm-smmu.c: In function ‘phytium_smmu_def_domain_type’:
drivers/iommu/arm-smmu.c:1641:6: error: implicit declaration of function ‘typeof_ft2000plus’ [-Werror=implicit-function-declaration]
 1641 |  if (typeof_ft2000plus() || typeof_s2500()) {
      |      ^~~~~~~~~~~~~~~~~
drivers/iommu/arm-smmu.c:1641:29: error: implicit declaration of function ‘typeof_s2500’ [-Werror=implicit-function-declaration]
 1641 |  if (typeof_ft2000plus() || typeof_s2500()) {
      |                             ^~~~~~~~~~~~
cc1: some warnings being treated as errors

Fix it by using CONFIG_ARCH_PHYTIUM to control phytium related code.
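
A minimal sketch of guarding platform-specific calls behind a config symbol. `CONFIG_ARCH_PHYTIUM_MODEL` is a hypothetical stand-in for the Kconfig symbol, `def_domain_type_model()` is an invented name, and the return values are placeholders rather than the driver's real domain types.

```c
#include <stdbool.h>

#ifdef CONFIG_ARCH_PHYTIUM_MODEL
/* Declared only when the platform is enabled; defined by platform code. */
bool typeof_ft2000plus(void);
bool typeof_s2500(void);
#endif

static int def_domain_type_model(void)
{
#ifdef CONFIG_ARCH_PHYTIUM_MODEL
    if (typeof_ft2000plus() || typeof_s2500())
        return 1;    /* placeholder for a platform-specific domain type */
#endif
    return 0;        /* default: no platform-specific override */
}
```

With the symbol undefined, the calls are never compiled, so no implicit-declaration error can occur.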

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d77bc737

Nov 12, 2021

crypto: hisilicon delete invalid api and config · c89f49b5

Yu'an Wang authored 3 years ago


driver inclusion
category: Feature
bugzilla: NA
CVE: NA

This patch deletes several unused definitions and APIs:
1. sq_head in hisi_qp_status is not used in any check, and neither
   is qm_sq_head_update.
2. The crypto hisilicon kernel driver supports only async logic, so
   the hisi_qp_wait logic is dropped.
3. CONFIG_CRYPTO_QM_UACCE appears redundant, so it is deleted.

Signed-off-by: Yu'an Wang <wangyuan46@huawei.com>
Reviewed-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c89f49b5

crypto: hisilicon - add CRYPTO_TFM_REQ_MAY_BACKLOG flag judge in sec_process() · d0a2029c

Yu'an Wang authored 3 years ago


driver inclusion
category: Bugfix
bugzilla: NA
CVE: NA

Set the flag CRYPTO_TFM_REQ_MAY_BACKLOG in the crypto driver, which can
limit task processing.

Signed-off-by: Yu'an Wang <wangyuan46@huawei.com>
Reviewed-by: Longfang Liu <liulongfang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d0a2029c

tcp: adjust rto_base in retransmits_timed_out() · 03b293dc

Eric Dumazet authored 3 years ago

mainline inclusion
from mainline-v5.4-rc2
commit 3256a2d6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue


CVE: NA

------------------------------------------------------------

The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.

But flows in SYN_SENT or SYN_RECV state use a different
RTO base (1 sec instead of 200 ms, unless BPF chooses
another value).

This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.
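
The effect of the RTO base can be checked with a small model of the exponential backoff budget. This is a sketch in milliseconds rather than jiffies; `model_timeout_ms()` and `ilog2_u32()` are hypothetical userspace helpers written in the spirit of tcp_model_timeout(), not a copy of it.

```c
#include <stdint.h>

#define RTO_MAX_MS  120000u    /* TCP_RTO_MAX: 120 s */
#define RTO_MIN_MS  200u       /* TCP_RTO_MIN: 200 ms */
#define RTO_INIT_MS 1000u      /* initial RTO used in SYN states: 1 s */

/* Integer base-2 logarithm, rounded down. */
static unsigned int ilog2_u32(uint32_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Total time covered by `boundary` exponentially backed-off retransmits. */
static uint32_t model_timeout_ms(unsigned int boundary, uint32_t rto_base)
{
    unsigned int thresh = ilog2_u32(RTO_MAX_MS / rto_base);

    if (boundary <= thresh)
        return ((2u << boundary) - 1) * rto_base;
    return ((2u << thresh) - 1) * rto_base + (boundary - thresh) * RTO_MAX_MS;
}
```

For 6 retries the budget is 127000 ms with a 1 s base but only 25400 ms with a 200 ms base. Since SYN retransmit timers actually fire about 1, 3, 7, 15 and 31 s after the first SYN, the smaller budget is already exceeded when the fifth timer fires, which is consistent with the drop from 6 to 4 retransmits.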

Fixes: a41e8a88 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
        net/ipv4/tcp_timer.c
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

03b293dc

tcp: create a helper to model exponential backoff · bc2bc11d

Yuchung Cheng authored 3 years ago

mainline inclusion
from mainline-v5.1-rc1
commit 01a523b0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue


CVE: NA

------------------------------------------------------------

Create a helper to model TCP exponential backoff for the next patch.
This is a pure refactor with no behavior change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
        net/ipv4/tcp_timer.c
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech.com> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bc2bc11d

tcp: always set retrans_stamp on recovery · 3fd35106

Yuchung Cheng authored 3 years ago

mainline inclusion
from mainline-v5.1-rc1
commit 7ae18975
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4AFRJ?from=project-issue


CVE: NA

------------------------------------------------------------

Previously a TCP socket's retrans_stamp was not set if the
retransmission failed to send. As a result, if a socket is
experiencing local issues retransmitting packets, determining when
to abort the socket is complicated without knowing the starting time of
the recovery, since retrans_stamp may remain zero.

This complication causes sub-optimal behavior: TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a connection stalled by local issues. TCP may then
disregard the TCP retries settings and keep retrying until it finally
succeeds: not a good idea when the local host is already strained.

The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
        net/ipv4/tcp_timer.c
Signed-off-by: Jiazhenyuan <jiazhenyuan@uniontech> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3fd35106

profiling: fix shift-out-of-bounds bugs · cc9ad3eb

Pavel Skripkin authored 3 years ago

stable inclusion
from linux-4.19.208
commit a94a60ef75d6cc2ac4c0d07cd043316e7e8b5b3b

--------------------------------

commit 2d186afd04d669fe9c48b994c41a7405a3c9f16d upstream.

Syzbot reported a shift-out-of-bounds bug in profile_init().
The problem was an incorrect prof_shift. Since the prof_shift value comes
from userspace, we need to clamp it into the [0, BITS_PER_LONG - 1]
range.
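
A userspace sketch of the clamping idea. `BITS_PER_LONG_MODEL`, `clamp_prof_shift()` and `sample_step_model()` are hypothetical names; the only assumption is that the shift is later applied to an unsigned long.

```c
#include <limits.h>

#define BITS_PER_LONG_MODEL ((int)(sizeof(long) * CHAR_BIT))

/* Clamp a user-supplied shift so that `1UL << shift` stays defined. */
static unsigned short clamp_prof_shift(int requested)
{
    if (requested < 0)
        return 0;
    if (requested > BITS_PER_LONG_MODEL - 1)
        return (unsigned short)(BITS_PER_LONG_MODEL - 1);
    return (unsigned short)requested;
}

/* The step is an unsigned long, so every clamped shift fits without overflow. */
static unsigned long sample_step_model(int requested)
{
    return 1UL << clamp_prof_shift(requested);
}
```

Shifting by the full width of the type or more is undefined behavior in C, which is why the upper bound is the width minus one.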

A second possible shift-out-of-bounds was found by Tetsuo:
the sample_step local variable in read_profile() had type "unsigned int",
but prof_shift allows a BITS_PER_LONG shift. So, to prevent a
possible shift-out-of-bounds, the type of sample_step was changed to
"unsigned long".

Also, "unsigned short int" is sufficient for storing a
[0, BITS_PER_LONG] value, so there is no need for an
"unsigned long" prof_shift.

Link: https://lkml.kernel.org/r/20210813140022.5011-1-paskripkin@gmail.com


Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-and-tested-by:  <syzbot+e68c89a9510c159d9684@syzkaller.appspotmail.com>
Suggested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cc9ad3eb

prctl: allow to setup brk for et_dyn executables · a45c2334

Cyrill Gorcunov authored 3 years ago

stable inclusion
from linux-4.19.208
commit 6a96bac8ba0a5ab9c9af1ed1c77478e19ae1f0f6

--------------------------------

commit e1fbbd073137a9d63279f6bf363151a938347640 upstream.

Keno Fischer reported that when a binary is loaded via ld-linux-x,
prctl(PR_SET_MM_MAP) doesn't allow setting up the brk value because it
lies before mm:end_data.

For example a test program shows

 | # ~/t
 |
 | start_code      401000
 | end_code        401a15
 | start_stack     7ffce4577dd0
 | start_data      403e10
 | end_data        40408c
 | start_brk       b5b000
 | sbrk(0)         b5b000

and when executed via ld-linux

 | # /lib64/ld-linux-x86-64.so.2 ~/t
 |
 | start_code      7fc25b0a4000
 | end_code        7fc25b0c4524
 | start_stack     7fffcc6b2400
 | start_data      7fc25b0ce4c0
 | end_data        7fc25b0cff98
 | start_brk       55555710c000
 | sbrk(0)         55555710c000

This of course prevents criu from restoring such programs.  Looking into
how the kernel operates with brk/start_brk inside the brk() syscall, I
don't see any problem if we allow setting up brk/start_brk without
checking for end_data.  Even if someone passes some weird address here on
purpose, the worst possible result will be an unexpected unmapping of an
existing vma (own vma, since prctl works with the caller's memory), but
the test for RLIMIT_DATA is still valid and a user won't be able to gain
more memory when expanding VMAs via new values shipped with the prctl call.

Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain


Fixes: bbdc6076 ("binfmt_elf: move brk out of mmap when doing direct loader exec")
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Andrey Vagin <avagin@gmail.com>
Tested-by: Andrey Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

dmaengine: acpi: Avoid comparison GSI with Linux vIRQ · 5b8c120f

Andy Shevchenko authored 3 years ago


stable inclusion
from linux-4.19.208
commit 523559507138ca4abcf4c2522c0061071c1d60a0

--------------------------------

commit 67db87dc8284070adb15b3c02c1c31d5cf51c5d6 upstream.

Currently the CSRT parsing relies on the fact that on most x86 devices
the IRQ mapping is 1:1 with Linux vIRQ. However, this may not be true for
some. Fix this by converting GSI to Linux vIRQ before checking it.
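
The difference can be pictured with a small user-space model. Everything here is an assumption for illustration: the lookup table, the struct, and the function names are hypothetical and stand in for whatever ACPI helper the kernel uses to translate a GSI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical GSI -> Linux vIRQ table used only for this sketch. */
struct gsi_map_sketch {
    unsigned int gsi;
    unsigned int virq;
};

/* Returns the vIRQ for a GSI, or -1 when the GSI is unknown. */
static int gsi_to_virq(const struct gsi_map_sketch *map, size_t n,
                       unsigned int gsi)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].gsi == gsi)
            return (int)map[i].virq;
    return -1;
}

/* Pre-fix behaviour: compares the raw GSI with the device's vIRQ. */
static bool irq_match_raw(unsigned int csrt_gsi, int dev_virq)
{
    return (int)csrt_gsi == dev_virq;
}

/* Post-fix behaviour: converts the GSI first, then compares. */
static bool irq_match_converted(const struct gsi_map_sketch *map, size_t n,
                                unsigned int csrt_gsi, int dev_virq)
{
    int virq = gsi_to_virq(map, n, csrt_gsi);

    return virq >= 0 && virq == dev_virq;
}
```

When the mapping is 1:1 both variants agree; they diverge only on platforms where a GSI maps to a different vIRQ number.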

Fixes: ee8209fd ("dma: acpi-dma: parse CSRT to extract additional resources")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20210730202715.24375-1-andriy.shevchenko@linux.intel.com


Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

tracing/kprobe: Fix kprobe_on_func_entry() modification · 9d797514

Li Huafei authored 3 years ago


stable inclusion
from linux-4.19.208
commit 6cfbbb961bb94de85455fe35140b1350c7ccb76c

--------------------------------

Commit 960434acef37 ("tracing/kprobe: Fix to support kretprobe
events on unloaded modules") was backported from v5.11 and modifies the
return value of kprobe_on_func_entry(). However, create_trace_kprobe()
was not adapted accordingly, resulting in the exact opposite
behavior. Now we need to return an error immediately only if
kprobe_on_func_entry() returns -EINVAL.
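
A minimal sketch of the intended decision, written as a stand-alone function rather than the real create_trace_kprobe(). It assumes kprobe_on_func_entry() returns 0 for a function entry, -EINVAL when the address is not an entry, and another error such as -ENOENT when the symbol cannot be found yet; the function name below is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * entry_ret stands in for the result of kprobe_on_func_entry().
 * Only -EINVAL (definitely not a function entry) is treated as fatal
 * for a kretprobe; other results let registration proceed, for
 * example when the target module is not loaded yet.
 */
static int check_kretprobe_target(bool is_return, int entry_ret)
{
    if (is_return && entry_ret == -EINVAL)
        return -EINVAL;
    return 0;
}
```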

Fixes: 960434acef37 ("tracing/kprobe: Fix to support kretprobe events on unloaded modules")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

rcu: Fix missed wakeup of exp_wq waiters · a12fd75f

Neeraj Upadhyay authored 3 years ago


stable inclusion
from linux-4.19.208
commit 3226fb90cf5dc89611f742f122a33d4598076ad5

--------------------------------

commit fd6bc19d upstream.

Tasks waiting within exp_funnel_lock() for an expedited grace period to
elapse can be starved due to the following sequence of events:

1.	Tasks A and B both attempt to start an expedited grace
	period at about the same time.	This grace period will have
	completed when the lower four bits of the rcu_state structure's
	->expedited_sequence field are 0b'0100', for example, when the
	initial value of this counter is zero.	Task A wins, and thus
	does the actual work of starting the grace period, including
	acquiring the rcu_state structure's ->exp_mutex and setting the
	counter to 0b'0001'.

2.	Because task B lost the race to start the grace period, it
	waits on ->expedited_sequence to reach 0b'0100' inside of
	exp_funnel_lock(). This task therefore blocks on the rcu_node
	structure's ->exp_wq[1] field, keeping in mind that the
	end-of-grace-period value of ->expedited_sequence (0b'0100')
	is shifted down two bits before indexing the ->exp_wq[] field.

3.	Task C attempts to start another expedited grace period,
	but blocks on ->exp_mutex, which is still held by Task A.

4.	The aforementioned expedited grace period completes, so that
	->expedited_sequence now has the value 0b'0100'.  A kworker task
	therefore acquires the rcu_state structure's ->exp_wake_mutex
	and starts awakening any tasks waiting for this grace period.

5.	One of the first tasks awakened happens to be Task A.  Task A
	therefore releases the rcu_state structure's ->exp_mutex,
	which allows Task C to start the next expedited grace period,
	which causes the lower four bits of the rcu_state structure's
	->expedited_sequence field to become 0b'0101'.

6.	Task C's expedited grace period completes, so that the lower four
	bits of the rcu_state structure's ->expedited_sequence field now
	become 0b'1000'.

7.	The kworker task from step 4 above continues its wakeups.
	Unfortunately, the wake_up_all() refetches the rcu_state
	structure's .expedited_sequence field:

	wake_up_all(&rnp->exp_wq[rcu_seq_ctr(rcu_state.expedited_sequence) & 0x3]);

	This results in the wakeup being applied to the rcu_node
	structure's ->exp_wq[2] field, which is unfortunate given that
	Task B is instead waiting on ->exp_wq[1].

On a busy system, no harm is done (or at least no permanent harm is done).
Some later expedited grace period will redo the wakeup.  But on a quiet
system, such as many embedded systems, it might be a good long time before
there was another expedited grace period.  On such embedded systems,
this situation could therefore result in a system hang.

This issue manifested as DPM device timeout during suspend (which
usually qualifies as a quiet time) due to a SCSI device being stuck in
_synchronize_rcu_expedited(), with the following stack trace:

	schedule()
	synchronize_rcu_expedited()
	synchronize_rcu()
	scsi_device_quiesce()
	scsi_bus_suspend()
	dpm_run_callback()
	__device_suspend()

This commit therefore prevents such delays, timeouts, and hangs by
making rcu_exp_wait_wake() use its "s" argument consistently instead of
refetching from rcu_state.expedited_sequence.
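
The index arithmetic from the sequence above can be checked in isolation. This is a user-space sketch, assuming the two-bit shift described in step 2; the macro and function names are made up for the example.

```c
/* The low two bits carry grace-period state, per step 2 above. */
#define SEQ_CTR_SHIFT_SKETCH 2

/* Counter portion of a sequence value. */
static unsigned long seq_ctr(unsigned long s)
{
    return s >> SEQ_CTR_SHIFT_SKETCH;
}

/* Index into the four-entry exp_wq[] array for sequence value s. */
static unsigned int exp_wq_index(unsigned long s)
{
    return (unsigned int)(seq_ctr(s) & 0x3);
}
```

Task B waits with s = 0b'0100', which selects index 1. A wakeup that refetches the sequence after it has advanced to 0b'1000' selects index 2 instead, which is why using the caller's own "s" matters.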

Fixes: 3b5f668e ("rcu: Overlap wakeups with next expedited grace period")
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: David Chen <david.chen@nutanix.com>
Acked-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

netfilter: socket: icmp6: fix use-after-scope · 91bc0cc5

Benjamin Hesmans authored 3 years ago


stable inclusion
from linux-4.19.207
commit d6efada330af09253b0f81a0d836cee02192bd4f

--------------------------------

[ Upstream commit 730affed24bffcd1eebd5903171960f5ff9f1f22 ]

Bug reported by KASAN:

BUG: KASAN: use-after-scope in inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
Call Trace:
(...)
inet6_ehashfn (net/ipv6/inet6_hashtables.c:40)
(...)
nf_sk_lookup_slow_v6 (net/ipv6/netfilter/nf_socket_ipv6.c:91
net/ipv6/netfilter/nf_socket_ipv6.c:146)

It seems that this bug has already been fixed by Eric Dumazet in the
past in:
commit 78296c97 ("netfilter: xt_socket: fix a stack corruption bug")

But a variant of the same issue has been introduced in
commit d64d80a2 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")

`daddr` and `saddr` potentially hold a reference to ipv6_var that is no
longer in scope when the call to `nf_socket_get_sock_v6` is made.
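
The general shape of such a fix can be sketched in plain C. This is not the netfilter code: the types and functions are hypothetical, and the sketch only shows the pattern of keeping the pointed-to variable alive for as long as the pointers are used.

```c
#include <stddef.h>
#include <string.h>

struct addr6_sketch {
    unsigned char b[16];
};

/* Hypothetical lookup that only reads the address through the pointer. */
static unsigned int lookup_sum(const struct addr6_sketch *daddr)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < sizeof(daddr->b); i++)
        sum += daddr->b[i];
    return sum;
}

static unsigned int lookup_fixed(const struct addr6_sketch *pkt_daddr,
                                 int is_icmp)
{
    const struct addr6_sketch *daddr = pkt_daddr;
    /* Declared at function scope so it outlives the block below. */
    struct addr6_sketch inner_copy;

    if (is_icmp) {
        /*
         * A buggy variant would declare inner_copy inside this block,
         * leaving daddr dangling once the block ends.
         */
        memset(&inner_copy, 1, sizeof(inner_copy));
        daddr = &inner_copy;
    }
    return lookup_sum(daddr);
}
```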

Fixes: d64d80a2 ("netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match")
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

PCI: Sync __pci_register_driver() stub for CONFIG_PCI=n · f6b3082b

Andy Shevchenko authored 3 years ago

stable inclusion
from linux-4.19.207
commit 6bdadfff347e42b6da70a9c77bb443479781c1f3

--------------------------------

[ Upstream commit 817f9916a6e96ae43acdd4e75459ef4f92d96eb1 ]

The CONFIG_PCI=y case got a new parameter a long time ago.  Sync the stub as
well.

[bhelgaas: add parameter names]
Fixes: 725522b5 ("PCI: add the sysfs driver name to all modules")
Link: https://lore.kernel.org/r/20210813153619.89574-1-andriy.shevchenko@linux.intel.com


Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

PCI: Fix pci_dev_str_match_path() alloc while atomic bug · e19ade47

Dan Carpenter authored 3 years ago

stable inclusion
from linux-4.19.207
commit 1a091bfd11e61032b6192cf2a1ebb259889f28b3

--------------------------------

[ Upstream commit 7eb6ea4148579b85540a41d57bcec315b8af8ff8 ]

pci_dev_str_match_path() is often called with a spinlock held so the
allocation has to be atomic.  The call tree is:

  pci_specified_resource_alignment() <-- takes spin_lock();
    pci_dev_str_match()
      pci_dev_str_match_path()

Fixes: 45db3370 ("PCI: Allow specifying devices using a base bus and path of devfns")
Link: https://lore.kernel.org/r/20210812070004.GC31863@kili


Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

block, bfq: honor already-setup queue merges · 86b9692f

Paolo Valente authored 3 years ago


stable inclusion
from linux-4.19.207
commit 2c1b1848357dc69f62ce3630b850e6680b87854b

--------------------------------

[ Upstream commit 2d52c58b9c9bdae0ca3df6a1eab5745ab3f7d80b ]

The function bfq_setup_merge prepares the merging between two
bfq_queues, say bfqq and new_bfqq. To this goal, it assigns
bfqq->new_bfqq = new_bfqq. Then, each time some I/O for bfqq arrives,
the process that generated that I/O is disassociated from bfqq and
associated with new_bfqq (merging is actually a redirection). In this
respect, bfq_setup_merge increases new_bfqq->ref in advance, adding
the number of processes that are expected to be associated with
new_bfqq.

Unfortunately, the stable-merging mechanism interferes with this
setup. After bfqq->new_bfqq has been set by bfq_setup_merge, and
before all the expected processes have been associated with
bfqq->new_bfqq, bfqq may happen to be stably merged with a different
queue than the current bfqq->new_bfqq. In this case, bfqq->new_bfqq
gets changed. So, some of the processes that have been already
accounted for in the ref counter of the previous new_bfqq will not be
associated with that queue.  This creates an imbalance, because those
references will never be decremented.

This commit fixes this issue by reestablishing the previous, natural
behaviour: once bfqq->new_bfqq has been set, it will not be changed
until all expected redirections have occurred.
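
A toy model of the restored rule, assuming nothing about the real bfq data structures beyond what is described above: the struct, its fields, and the helper are hypothetical simplifications.

```c
#include <stddef.h>

struct queue_sketch {
    int ref;                     /* reference count */
    struct queue_sketch *new_q;  /* pending merge target, if any */
};

/*
 * Sets up a merge of q into target, taking one reference per process
 * expected to be redirected. If a merge is already pending, it is
 * honored and no new references are taken.
 */
static struct queue_sketch *setup_merge(struct queue_sketch *q,
                                        struct queue_sketch *target,
                                        int procs)
{
    if (q->new_q)
        return q->new_q;

    q->new_q = target;
    target->ref += procs;
    return target;
}
```

Because a second call cannot replace the pending target, the references taken on the first target stay matched to the processes that will eventually be redirected to it.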

Signed-off-by: Davide Zini <davidezini2@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20210802141352.74353-2-paolo.valente@linaro.org


Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range() · df25445e

David Hildenbrand authored 3 years ago

stable inclusion
from linux-4.19.207
commit c48402015e02901ff8b1fac5c112b02546364bb6

--------------------------------

commit 7cf209ba8a86410939a24cb1aeb279479a7e0ca6 upstream.

Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"

These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.

These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.

[1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com

This patch (of 4):

Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.

As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.

Use "unsigned long" instead, just as we do in other places when handling
PFNs.  This can bite us once we have physical addresses in the range of
multiple TB.
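
The truncation is easy to reproduce outside the kernel. The sketch below assumes 4 KiB pages and uses fixed-width types so that it behaves the same everywhere: uint64_t plays the role of a 64-bit "unsigned long" and uint32_t the role of "unsigned int". The names are illustrative only.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12  /* assumes 4 KiB pages */

/* Page frame number of a physical address. */
static uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> PAGE_SHIFT_SKETCH;
}

/* What a PFN becomes when passed through a 32-bit unsigned parameter. */
static uint32_t pfn_as_unsigned_int(uint64_t pfn)
{
    return (uint32_t)pfn;
}
```

For a physical address of 16 TiB (1 << 44) the PFN is 1 << 32, which no longer fits in 32 bits and truncates to 0, so a zone lookup based on it would examine the wrong range.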

Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com


Fixes: e5e68930 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Michel Lespinasse <michel@lespinasse.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Sergei Trofimovich <slyfox@gentoo.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() · ef77249e

zhenggy authored 3 years ago


stable inclusion
from linux-4.19.207
commit dfefcc46354530c3ec1d12db0e16c740548c6229

--------------------------------

commit 4f884f3962767877d7aabbc1ec124d2c307a4257 upstream.

Commit 10d3be56 ("tcp-tso: do not split TSO packets at retransmit
time") may directly retransmit a multi-segment TSO/GSO packet without
splitting it. Since this commit, we can no longer assume that a
retransmitted packet is a single segment.

This patch fixes the tp->undo_retrans accounting in tcp_sacktag_one()
by using the actual segment count (pcount) of the retransmitted packet.

Before that commit (10d3be56), the assumption underlying the
tp->undo_retrans-- seems correct.
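
The accounting change can be sketched with plain integers. The helper names are hypothetical, and clamping the result at zero is an assumption of this sketch rather than something stated in the message above.

```c
/* Old accounting: assumes every retransmitted packet is one segment. */
static int undo_retrans_old(int undo_retrans)
{
    return undo_retrans - 1;
}

/* New accounting: subtract the packet's segment count, not below zero. */
static int undo_retrans_new(int undo_retrans, int pcount)
{
    int left = undo_retrans - pcount;

    return left > 0 ? left : 0;
}
```

For a retransmitted packet carrying four segments, the old form leaves three segments unaccounted for, while the new form brings the counter to zero.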

Fixes: 10d3be56 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: zhenggy <zhenggy@chinatelecom.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net/af_unix: fix a data-race in unix_dgram_poll · bc96ec33

Eric Dumazet authored 3 years ago


stable inclusion
from linux-4.19.207
commit 44ba281510190e2915506016407ba5b374a3add2

--------------------------------

commit 04f08eb44b5011493d77b602fdec29ff0f5c6cd5 upstream.

syzbot reported another data-race in af_unix [1]

Let's change __skb_insert() to use WRITE_ONCE() when changing
skb head qlen.

Also, change unix_dgram_poll() to use the lockless version
of unix_recvq_full().

It is very possible that we can switch all/most unix_recvq_full() callers
to the lockless version; this will be done in a future kernel version.
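
As a user-space analogy only, the pairing of a marked write with a marked lockless read can be modelled with C11 relaxed atomics. WRITE_ONCE()/READ_ONCE() are kernel macros and are not the same thing as C11 atomics; the struct and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct recvq_sketch {
    atomic_uint qlen;          /* queue length, read without the lock */
    unsigned int max_backlog;
};

/*
 * Writer side. Callers are assumed to hold the queue lock, so the
 * load/store pair is not racing with other writers; the marked store
 * only makes the update safe to observe from lockless readers.
 */
static void recvq_insert(struct recvq_sketch *q)
{
    unsigned int n = atomic_load_explicit(&q->qlen, memory_order_relaxed);

    atomic_store_explicit(&q->qlen, n + 1, memory_order_relaxed);
}

/* Reader side: lockless "is the receive queue full?" check. */
static bool recvq_full_lockless(struct recvq_sketch *q)
{
    return atomic_load_explicit(&q->qlen, memory_order_relaxed) >
           q->max_backlog;
}
```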

[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1

BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll

write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0:
 __skb_insert include/linux/skbuff.h:1938 [inline]
 __skb_queue_before include/linux/skbuff.h:2043 [inline]
 __skb_queue_tail include/linux/skbuff.h:2076 [inline]
 skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264
 unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
 ___sys_sendmsg net/socket.c:2446 [inline]
 __sys_sendmmsg+0x315/0x4b0 net/socket.c:2532
 __do_sys_sendmmsg net/socket.c:2561 [inline]
 __se_sys_sendmmsg net/socket.c:2558 [inline]
 __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1:
 skb_queue_len include/linux/skbuff.h:1869 [inline]
 unix_recvq_full net/unix/af_unix.c:194 [inline]
 unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777
 sock_poll+0x23e/0x260 net/socket.c:1288
 vfs_poll include/linux/poll.h:90 [inline]
 ep_item_poll fs/eventpoll.c:846 [inline]
 ep_send_events fs/eventpoll.c:1683 [inline]
 ep_poll fs/eventpoll.c:1798 [inline]
 do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226
 __do_sys_epoll_wait fs/eventpoll.c:2238 [inline]
 __se_sys_epoll_wait fs/eventpoll.c:2233 [inline]
 __x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000001b -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G        W         5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 86b18aaa ("skbuff: fix a data race in skb_queue_len()")
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

events: Reuse value read using READ_ONCE instead of re-reading it · 7bf309a4

Baptiste Lepers authored 3 years ago


stable inclusion
from linux-4.19.207
commit c09a84aea0d3902f955cc6504e1c25cddb7c48c2

--------------------------------

commit b89a05b21f46150ac10a962aa50109250b56b03b upstream.

In perf_event_addr_filters_apply, the task associated with
the event (event->ctx->task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event->ctx->task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.
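
The read-once-then-reuse pattern can be shown with a small user-space model. C11 atomics stand in for READ_ONCE() here, and the types, the tombstone sentinel, and the function are all hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct task_sketch {
    int pid;
};

struct ctx_sketch {
    _Atomic(struct task_sketch *) task;  /* may change concurrently */
};

/* Hypothetical sentinel meaning "the context no longer has a task". */
static struct task_sketch tombstone_sketch;

/* Returns the pid of the context's task, or -1 if there is none. */
static int filter_target_pid(struct ctx_sketch *ctx)
{
    /* Take one snapshot of the pointer. */
    struct task_sketch *task =
        atomic_load_explicit(&ctx->task, memory_order_relaxed);

    if (task == NULL || task == &tombstone_sketch)
        return -1;

    /* Reuse the checked snapshot; reloading ctx->task would void the check. */
    return task->pid;
}
```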

Fixes: 375637bc ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

x86/mm: Fix kern_addr_valid() to cope with existing but not present entries · ef472070

Mike Rapoport authored 3 years ago


stable inclusion
from linux-4.19.207
commit 2717db72f74c8e51068801b6327559241c54b86e

--------------------------------

commit 34b1999da935a33be6239226bfa6cd4f704c5c88 upstream.

Jiri Olsa reported a fault when running:

  # cat /proc/kallsyms | grep ksys_read
  ffffffff8136d580 T ksys_read
  # objdump -d --start-address=0xffffffff8136d580 --stop-address=0xffffffff8136d590 /proc/kcore

  /proc/kcore:     file format elf64-x86-64

  Segmentation fault

  general protection fault, probably for non-canonical address 0xf887ffcbff000: 0000 [#1] SMP PTI
  CPU: 12 PID: 1079 Comm: objdump Not tainted 5.14.0-rc5qemu+ #508
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
  RIP: 0010:kern_addr_valid
  Call Trace:
   read_kcore
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? trace_hardirqs_on
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? rcu_read_lock_sched_held
   ? lock_release
   ? _raw_spin_unlock
   ? __handle_mm_fault
   ? rcu_read_lock_sched_held
   ? lock_acquire
   ? rcu_read_lock_sched_held
   ? lock_release
   proc_reg_read
   ? vfs_read
   vfs_read
   ksys_read
   do_syscall_64
   entry_SYSCALL_64_after_hwframe

The fault happens because kern_addr_valid() dereferences an existent but not
present PMD in the high kernel mappings.

Such PMDs are created when free_kernel_image_pages() frees regions larger
than 2MB. In this case, a part of the freed memory is mapped with PMDs and
the set_memory_np_noalias() -> ... -> __change_page_attr() sequence will
mark the PMD as not present rather than wipe it completely.

Have kern_addr_valid() check whether higher level page table entries are
present before trying to dereference them to fix this issue and to avoid
similar issues in the future.
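
The distinction between "not none" and "present" can be illustrated with a simplified entry format. The single present bit and the helper names are assumptions for this sketch; the real x86 accessors test more bits than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_PRESENT_SKETCH 0x1ULL  /* hypothetical present bit */

/* True when the entry is completely empty. */
static bool entry_none(uint64_t e)
{
    return e == 0;
}

/* True when the entry is marked present. */
static bool entry_present(uint64_t e)
{
    return (e & ENTRY_PRESENT_SKETCH) != 0;
}

/* Pre-fix walk: descends whenever the entry is not empty. */
static bool may_descend_old(uint64_t e)
{
    return !entry_none(e);
}

/* Post-fix walk: descends only when the entry is present. */
static bool may_descend_new(uint64_t e)
{
    return entry_present(e);
}
```

An entry such as 0x1000 exists but has the present bit clear: the old walk would follow it, while the new walk stops.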

Stable backporting note:
------------------------

Note that the stable marking is for all active stable branches because
there could be cases where pagetable entries exist but are not valid -
see 9a14aefc ("x86: cpa, fix lookup_address"), for example. So make
sure to be on the safe side here and use pXY_present() accessors rather
than pXY_none() which could #GP when accessing pages in the direct map.

Also see:

  c40a56a7 ("x86/mm/init: Remove freed kernel image areas from alias mapping")

for more info.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Cc: <stable@vger.kernel.org>	# 4.4+
Link: https://lkml.kernel.org/r/20210819132717.19358-1-rppt@kernel.org


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64/sve: Use correct size when reinitialising SVE state · ef8baf83

Mark Brown authored 3 years ago


stable inclusion
from linux-4.19.207
commit aadf3115f86c6df4308e2734b5c39eadbe9573e0

--------------------------------

commit e35ac9d0b56e9efefaeeb84b635ea26c2839ea86 upstream.

When we need a buffer for SVE register state we call sve_alloc() to make
sure that one is there. In order to avoid repeated allocations and frees
we keep the buffer around unless we change vector length and just memset()
it to ensure a clean register state. The function that deals with this
takes the task to operate on as an argument. However, in the case where we
do a memset() we initialise using the SVE state size for the current task
rather than the task passed as an argument.

This is only an issue in the case where we are setting the register state
for a task via ptrace and the task being configured has a different vector
length to the task tracing it. In the case where the buffer is larger in
the traced process we will leak old state from the traced process to
itself; in the case where the buffer is smaller in the traced process we
will overflow the buffer and corrupt memory.
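
A stripped-down model of the corrected behaviour, with a hypothetical task structure that carries its own state buffer and size. The point is only that the size used for the memset() comes from the task passed in, never from the caller.

```c
#include <stddef.h>
#include <string.h>

struct task_sve_sketch {
    unsigned char *state;  /* register state buffer */
    size_t state_size;     /* size implied by this task's vector length */
};

/*
 * Clears the state buffer of the given task using that task's own
 * size. The buggy form would have used the size of the calling
 * (tracing) task, clearing too little or writing past the buffer.
 */
static void sve_reinit(struct task_sve_sketch *task)
{
    memset(task->state, 0, task->state_size);
}
```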

Fixes: bc0ee476 ("arm64/sve: Core task context handling")
Cc: <stable@vger.kernel.org> # 4.15.x
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210909165356.10675-1-broonie@kernel.org


Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm/hugetlb: initialize hugetlb_usage in mm_init · 06f9c353

Liu Zixian authored 3 years ago

stable inclusion
from linux-4.19.207
commit 2fed7f8eda3211190a27caddd6ba8fd728f7b17b

--------------------------------

commit 13db8c50477d83ad3e3b9b0ae247e5cd833a7ae4 upstream.

After fork, the child process will get incorrect (2x) hugetlb_usage.  If
a process uses 5 2MB hugetlb pages in an anonymous mapping,

	HugetlbPages:	   10240 kB

and then forks, the child will show,

	HugetlbPages:	   20480 kB

The amount is doubled because hugetlb_usage is copied from the parent and
then increased when we copy page tables from parent to child.  The child
will have 2x the actual usage.

Fix this by adding hugetlb_count_init in mm_init.
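
The double counting can be reproduced with a toy model of fork. The struct and function are hypothetical; the counter is kept in pages, and the page-table copy is reduced to re-adding the parent's count.

```c
struct mm_sketch {
    long hugetlb_usage;  /* in pages */
};

/*
 * Models the child's mm setup. The struct copy carries over the
 * parent's counter; copying page tables then counts every huge page
 * again. With init_counter set, the counter is reset in between,
 * which is the effect of the fix.
 */
static struct mm_sketch fork_mm(const struct mm_sketch *parent,
                                int init_counter)
{
    struct mm_sketch child = *parent;

    if (init_counter)
        child.hugetlb_usage = 0;

    child.hugetlb_usage += parent->hugetlb_usage;
    return child;
}
```

With a parent using 5 pages, the unfixed path yields 10 in the child and the fixed path yields 5, matching the 10240 kB versus 20480 kB figures above for 2MB pages.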

Link: https://lkml.kernel.org/r/20210826071742.877-1-liuzixian4@huawei.com


Fixes: 5d317b2b ("mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status")
Signed-off-by: Liu Zixian <liuzixian4@huawei.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
