  1. Jul 14, 2022
    • rcu/tree: Mark functions as notrace · 9b5e728e
      Zheng Yejian authored
      hulk inclusion
      category: bugfix
      bugzilla: 187209, https://gitee.com/openeuler/kernel/issues/I5GWFT
      CVE: NA
      
      --------------------------------
      
      Syzkaller reported a softlockup problem; see the following log:
        [   41.463870] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!  [ksoftirqd/0:9]
        [   41.509763] Modules linked in:
        [   41.512295] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.90 #13
        [   41.516134] Hardware name: linux,dummy-virt (DT)
        [   41.519182] pstate: 80c00005 (Nzcv daif +PAN +UAO)
        [   41.522415] pc : perf_trace_buf_alloc+0x138/0x238
        [   41.525583] lr : perf_trace_buf_alloc+0x138/0x238
        [   41.528656] sp : ffff8000c137e880
        [   41.531050] x29: ffff8000c137e880 x28: ffff20000850ced0
        [   41.534759] x27: 0000000000000000 x26: ffff8000c137e9c0
        [   41.538456] x25: ffff8000ce5c2ae0 x24: ffff200008358b08
        [   41.542151] x23: 0000000000000000 x22: ffff2000084a50ac
        [   41.545834] x21: ffff8000c137e880 x20: 000000000000001c
        [   41.549516] x19: ffff7dffbfdf88e8 x18: 0000000000000000
        [   41.553202] x17: 0000000000000000 x16: 0000000000000000
        [   41.556892] x15: 1ffff00036e07805 x14: 0000000000000000
        [   41.560592] x13: 0000000000000004 x12: 0000000000000000
        [   41.564315] x11: 1fffefbff7fbf120 x10: ffff0fbff7fbf120
        [   41.568003] x9 : dfff200000000000 x8 : ffff7dffbfdf8904
        [   41.571699] x7 : 0000000000000000 x6 : ffff0fbff7fbf121
        [   41.575398] x5 : ffff0fbff7fbf121 x4 : ffff0fbff7fbf121
        [   41.579086] x3 : ffff20000850cdc8 x2 : 0000000000000008
        [   41.582773] x1 : ffff8000c1376000 x0 : 0000000000000100
        [   41.586495] Call trace:
        [   41.588922]  perf_trace_buf_alloc+0x138/0x238
        [   41.591912]  perf_ftrace_function_call+0x1ac/0x248
        [   41.595123]  ftrace_ops_no_ops+0x3a4/0x488
        [   41.597998]  ftrace_graph_call+0x0/0xc
        [   41.600715]  rcu_dynticks_curr_cpu_in_eqs+0x14/0x70
        [   41.603962]  rcu_is_watching+0xc/0x20
        [   41.606635]  ftrace_ops_no_ops+0x240/0x488
        [   41.609530]  ftrace_graph_call+0x0/0xc
        [   41.612249]  __read_once_size_nocheck.constprop.0+0x1c/0x38
        [   41.615905]  unwind_frame+0x140/0x358
        [   41.618597]  walk_stackframe+0x34/0x60
        [   41.621359]  __save_stack_trace+0x204/0x3b8
        [   41.624328]  save_stack_trace+0x2c/0x38
        [   41.627112]  __kasan_slab_free+0x120/0x228
        [   41.630018]  kasan_slab_free+0x10/0x18
        [   41.632752]  kfree+0x84/0x250
        [   41.635107]  skb_free_head+0x70/0xb0
        [   41.637772]  skb_release_data+0x3f8/0x730
        [   41.640626]  skb_release_all+0x50/0x68
        [   41.643350]  kfree_skb+0x84/0x278
        [   41.645890]  kfree_skb_list+0x4c/0x78
        [   41.648595]  __dev_queue_xmit+0x1a4c/0x23a0
        [   41.651541]  dev_queue_xmit+0x28/0x38
        [   41.654254]  ip6_finish_output2+0xeb0/0x1630
        [   41.657261]  ip6_finish_output+0x2d8/0x7f8
        [   41.660174]  ip6_output+0x19c/0x348
        [   41.663850]  mld_sendpack+0x560/0x9e0
        [   41.666564]  mld_ifc_timer_expire+0x484/0x8a8
        [   41.669624]  call_timer_fn+0x68/0x4b0
        [   41.672355]  expire_timers+0x168/0x498
        [   41.675126]  run_timer_softirq+0x230/0x7a8
        [   41.678052]  __do_softirq+0x2d0/0xba0
        [   41.680763]  run_ksoftirqd+0x110/0x1a0
        [   41.683512]  smpboot_thread_fn+0x31c/0x620
        [   41.686429]  kthread+0x2c8/0x348
        [   41.688927]  ret_from_fork+0x10/0x18
      
      Looking into the above call stack, we found a recursive call through
      'ftrace_graph_call'; see this snippet:
          __read_once_size_nocheck.constprop.0
            ftrace_graph_call
              ......
                rcu_dynticks_curr_cpu_in_eqs
                  ftrace_graph_call
      
      Our analysis shows that 'rcu_dynticks_curr_cpu_in_eqs' should not be
      traceable, and we verified that marking the related functions as
      'notrace' avoids the problem.
      
      Comparing with the mainline kernel, we find that commit ff5c4f5c ("rcu/tree:
      Mark the idle relevant functions noinstr") marks the related functions as
      'noinstr', which implies notrace and noinline and places them in the
      .noinstr.text section.
      Link: https://lore.kernel.org/all/20200416114706.625340212@infradead.org/
      
      The 'noinstr' mechanism has not been introduced in this kernel yet, so we
      do not directly backport that commit (which would pull in more changes).
      Instead, we mark as 'notrace' the functions that are marked 'noinstr' in
      that commit.
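
      As an illustration only (this is not the exact hunk of this patch, and the
      function body is a simplified placeholder), marking a function 'notrace'
      in kernel C looks like this:

        /* 'notrace' keeps ftrace from instrumenting this function, so the
         * tracer can no longer re-enter itself via rcu_is_watching() as in
         * the call stack above. */
        static notrace bool rcu_dynticks_curr_cpu_in_eqs(void)
        {
                /* read the per-CPU dynticks state; details elided here */
                return false;
        }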
      
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      9b5e728e
  2. Jul 01, 2022
    • tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing · 48c3a5e2
      Steven Rostedt (VMware) authored
      stable inclusion
      from stable-5.10.50
      commit 0531e84bc8ac750b7a15c18d478804c0d98f0a86
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETW1
      CVE: NA
      
      --------------------------------
      
      commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream.
      
      All internal use cases for tracepoint_probe_register() are set up to never
      be called with the same function and data. If they are, it is considered a
      bug, as it means the accounting of tracepoint handling is corrupted.
      If the function and data for a tracepoint are already registered when
      tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
      return with -EEXIST.
      
      The BPF system call can end up calling tracepoint_probe_register() with
      the same data, which means that a user-space process can now trigger the
      warning. As WARN_ON_ONCE() should not be called just because user space
      passed bad data to a system call, there needs to be a way to
      register a tracepoint without tri...
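
      A hedged sketch of how a BPF attach path might use the helper named in
      the title (the variable names here are illustrative, and the exact
      upstream signature may differ):

        /* tolerate an already-registered (probe, data) pair instead of
         * triggering WARN_ON_ONCE(); a duplicate from user space is not a bug */
        err = tracepoint_probe_register_may_exist(tp, (void *)bpf_func, prog);
        /* -EEXIST is now reported to the caller without a kernel warning */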
      48c3a5e2
    • swiotlb: skip swiotlb_bounce when orig_addr is zero · 8213da67
      Liu Shixin authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5EZK8
      CVE: NA
      
      --------------------------------
      
      After commit ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE"),
      swiotlb_bounce is called unconditionally in swiotlb_tbl_map_single.
      This requires the physical address to be valid, which is not always
      true on stable-4.19 or earlier versions.
      On stable-4.19, swiotlb_alloc_buffer calls swiotlb_tbl_map_single with
      orig_addr equal to zero, which causes a panic like this:
      
      Unable to handle kernel paging request at virtual address ffffb77a40000000
      ...
      pc : __memcpy+0x100/0x180
      lr : swiotlb_bounce+0x74/0x88
      ...
      Call trace:
       __memcpy+0x100/0x180
       swiotlb_tbl_map_single+0x2c8/0x338
       swiotlb_alloc+0xb4/0x198
       __dma_alloc+0x84/0x1d8
       ...
      
      On stable-4.9 and stable-4.14, swiotlb_alloc_coherent will call map_single
      with orig_addr equal to zero, which can cause the same panic.
      
      Fix this by skipping swiotlb_bounce when orig...
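
      A minimal sketch of the kind of guard the title describes (placement and
      exact condition are assumed, not taken from the actual hunk):

        /* nothing to copy when the mapping has no original buffer, e.g. when
         * swiotlb_alloc_buffer() maps with orig_addr == 0 */
        if (orig_addr == 0)
                return;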
      8213da67
  3. Apr 02, 2022
    • sched/fair: Add qos_throttle_list node in struct cfs_rq · fb59563c
      Zhang Qiao authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I50PPU
      CVE: NA
      
      -----------------------------------------------------------------
      
      When unthrottling a cfs_rq in distribute_cfs_runtime(), another CPU
      may re-throttle this cfs_rq in qos_throttle_cfs_rq() before
      cfs_rq->throttle_list.next is accessed. In that case, qos throttle
      attaches the cfs_rq's throttle_list node to the per-CPU
      qos_throttled_cfs_rq list, which changes cfs_rq->throttle_list.next and
      causes a panic or hard lockup in distribute_cfs_runtime().
      
      Fix it by adding a qos_throttle_list node to struct cfs_rq, so that qos
      throttle no longer uses cfs_rq->throttle_list.
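
      A sketch of the data-structure change described above (field placement
      and the qos usage shown are illustrative, not the exact hunk):

        struct cfs_rq {
                ...
                struct list_head        throttle_list;      /* generic CFS bandwidth */
                struct list_head        qos_throttle_list;  /* used only by qos throttle */
                ...
        };

        /* qos_throttle_cfs_rq() then links the dedicated node, leaving
         * throttle_list untouched for distribute_cfs_runtime(), e.g.:
         *   list_add(&cfs_rq->qos_throttle_list,
         *            &per_cpu(qos_throttled_cfs_rq, cpu));
         */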
      
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      fb59563c
    • Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · 3e109690
      Linus Torvalds authored
      mainline inclusion
      from mainline-v5.18-rc1
      commit 901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      Halil Pasic points out [1] that instead of the full revert of that commit
      (the revert in bddac7c1e02b), a partial revert that only reverts the
      problematic case but still keeps some of the cleanups is probably better.

      That partial revert [2] had already been verified by Oleksandr
      Natalenko to also fix the issue; I had just missed that in the long
      discussion.
      
      So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb:
      rework "fix info leak with DMA_FROM_DEVICE""), and effectively only
      revert the part that caused problems.
      
      Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1]
      Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2]
      Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3]
      Suggested-by: Halil Pasic <pasic@linux.ibm.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3e109690
    • Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · a5e62d73
      Linus Torvalds authored
      mainline inclusion
      from mainline-v5.18-rc1
      commit bddac7c1e02ba47f0570e494c9289acea3062cc1
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.
      
      It turns out this breaks at least the ath9k wireless driver, and
      possibly others.
      
      What the ath9k driver does on packet receive is to set up the DMA
      transfer with:
      
        int ath_rx_init(..)
        ..
                      bf->bf_buf_addr = dma_map_single(sc->dev, skb->data,
                                                       common->rx_bufsize,
                                                       DMA_FROM_DEVICE);
      
      and then the receive logic (through ath_rx_tasklet()) will fetch
      incoming packets
      
        static bool ath_edma_get_buffers(..)
        ..
              dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
      
              ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data);
              if (ret == -EINPROGRESS) {
                      /*let device gain the buffer again*/
                      dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
                      return false;
              }
      
      and it's worth noting how that first DMA sync:
      
          dma_sync_single_for_cpu(..DMA_FROM_DEVICE);
      
      is there to make sure the CPU can read the DMA buffer (possibly by
      copying it from the bounce buffer area, or by doing some cache flush).
      The iommu correctly turns that into a "copy from bounce buffer" so that
      the driver can look at the state of the packets.
      
      In the meantime, the device may continue to write to the DMA buffer, but
      we at least have a snapshot of the state due to that first DMA sync.
      
      But that _second_ DMA sync:
      
          dma_sync_single_for_device(..DMA_FROM_DEVICE);
      
      is telling the DMA mapping that the CPU wasn't interested in the area
      because the packet wasn't there.  In the case of a DMA bounce buffer,
      that is a no-op.
      
      Note how it's not a sync for the CPU (the "for_device()" part), and it's
      not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part).
      
      Or rather, it _should_ be a no-op.  That's what commit aa6f8dcbab47
      broke: it made the code bounce the buffer unconditionally, and changed
      the DMA_FROM_DEVICE to just unconditionally and illogically be
      DMA_TO_DEVICE.
      
      [ Side note: purely within the confines of the swiotlb driver it wasn't
        entirely illogical: The reason it did that odd DMA_FROM_DEVICE ->
        DMA_TO_DEVICE conversion thing is because inside the swiotlb driver,
        it uses just a swiotlb_bounce() helper that doesn't care about the
        whole distinction of who the sync is for - only which direction to
        bounce.
      
        So it took the "sync for device" to mean that the CPU must have been
        the one writing, and thought it meant DMA_TO_DEVICE. ]
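
      For reference, a rough sketch of the pre-aa6f8dcbab47 sync-for-device
      behaviour that this revert restores (simplified, not the exact code):

        /* sync for device: only push CPU-side writes out to the bounce buffer */
        if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
        /* DMA_FROM_DEVICE: the CPU wrote nothing, so there is nothing to bounce */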
      
      Also note how the commentary in that commit was wrong, probably due to
      that whole confusion, claiming that the commit makes the swiotlb code
      
                                        "bounce unconditionally (that is, also
          when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
          data from the swiotlb buffer"
      
      which is nonsensical for two reasons:
      
       - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was
         exactly when it always did - and should do - the bounce.
      
       - since this is a sync for the device (not for the CPU), we're clearly
         fundamentally not copying back stale data from the bounce buffers at
         all, because we'd be copying *to* the bounce buffers.
      
      So that commit was just very confused.  It confused the direction of the
      synchronization (to the device, not the cpu) with the direction of the
      DMA (from the device).
      
      Reported-and-bisected-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Reported-by: Olha Cherevyk <olha.cherevyk@gmail.com>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kalle Valo <kvalo@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Toke Høiland-Jørgensen <toke@toke.dk>
      Cc: Maxime Bizon <mbizon@freebox.fr>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      a5e62d73
  4. Mar 23, 2022
    • swiotlb: rework "fix info leak with DMA_FROM_DEVICE" · 3f80e186
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc8
      commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      Unfortunately, we ended up merging an old version of the patch "fix info
      leak with DMA_FROM_DEVICE" instead of the latest one. After I pointed out
      the mix-up and asked for guidance, Christoph (the swiotlb maintainer)
      asked me to create an incremental fix. So here we go.
      
      The main differences between what we got and what was agreed are:
      * swiotlb_sync_single_for_device is also required to do an extra bounce
      * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
      * The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
        must take precedence over DMA_ATTR_SKIP_CPU_SYNC
      
      Thus this patch removes DMA_ATTR_OVERWRITE, and makes
      swiotlb_sync_single_for_device() bounce unconditionally (that is, also
      when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
      data from the swiotlb buffer.
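
      A rough sketch of the sync-for-device behaviour described above
      (simplified; the real function takes a struct device and does more):

        /* always copy the CPU-side buffer into the bounce slot, so that a
         * later sync-for-cpu cannot copy stale bounce-buffer bytes back */
        swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);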
      
      Let me note that if the size used with the dma_sync_* API is less than the
      size used with dma_[un]map_*, under certain circumstances we may still
      end up with swiotlb not being transparent. In that sense, this is not a
      perfect fix either.
      
      To make this bulletproof, we would have to bounce the entire
      mapping/bounce buffer. For that we would have to figure out the starting
      address and the size of the mapping in
      swiotlb_sync_single_for_device(). While this does seem possible, there
      seems to be no firm consensus on how things are supposed to work.
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3f80e186
    • swiotlb: fix info leak with DMA_FROM_DEVICE · 04c20fc8
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc6
      commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      The problem I'm addressing was discovered by the LTP test covering
      cve-2018-1000204.
      
      A short description of what happens follows:
      1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
         interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
         and a corresponding dxferp. The peculiar thing about this is that TUR
         is not reading from the device.
      2) In sg_start_req() the invocation of blk_rq_map_user() effectively
         bounces the user-space buffer. As if the device was to transfer into
         it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in
         sg_build_indirect()") we make sure this first bounce buffer is
         allocated with GFP_ZERO.
      3) For the rest of the story we keep ignoring that we have a TUR, so the
         device won't touch the buffer we prepare as if we had a
         DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
         and the  buffer allocated by SG is mapped by the function
         virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
         scatter-gather and not scsi generics). This mapping involves bouncing
         via the swiotlb (we need swiotlb to do virtio in protected guest like
         s390 Secure Execution, or AMD SEV).
      4) When the SCSI TUR is done, we first copy back the content of the second
         (that is swiotlb) bounce buffer (which most likely contains some
         previous IO data), to the first bounce buffer, which contains all
         zeros.  Then we copy back the content of the first bounce buffer to
         the user-space buffer.
      5) The test case detects that the buffer, which it zero-initialized,
         isn't all zeros and fails.
      
      One can argue that this is a swiotlb problem, because without swiotlb
      we leak all zeros, and swiotlb should be transparent in the sense that
      it does not affect the outcome (if all other participants are well
      behaved).
      
      Copying the content of the original buffer into the swiotlb buffer is
      the only way I can think of to make swiotlb transparent in such
      scenarios. So let's do just that if in doubt, but allow the driver
      to tell us that the whole mapped buffer is going to be overwritten,
      in which case we can preserve the old behavior and avoid the performance
      impact of the extra bounce.
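
      A rough sketch of the map-time behaviour described above (the condition
      is simplified and the exact hunk may differ):

        /* in swiotlb_tbl_map_single(): pre-fill the bounce slot from the
         * original buffer unless the caller promises to overwrite it all */
        if (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
            dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);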
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      04c20fc8