Commits · 3a488a413cffcad8fe58dc5a97974126d41fa4f8 · Summer2022 / 22b970497

Jun 30, 2021

x86/kprobes: Fix to check non boostable prefixes correctly · 3a488a41


stable inclusion
from linux-4.19.191
commit 609a2d6557e8a6d5dbb6bfcfc5b42185526a0c0b

--------------------------------

[ Upstream commit 6dd3b8c9f58816a1354be39559f630cd1bd12159 ]

There are 2 bugs in the can_boost() function because of using
x86 insn decoder. Since the insn->opcode never has a prefix byte,
it can not find CS override prefix in it. And the insn->attr is
the attribute of the opcode, thus inat_is_address_size_prefix(
insn->attr) always returns false.

Fix those by checking each prefix bytes with for_each_insn_prefix
loop and getting the correct attribute for each prefix byte.
Also, this removes unlikely, because this is a slow path.

Fixes: a8d11cd0 ("kprobes/x86: Consolidate insn decoder users for copying code")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/161666691162.1120877.2808435205294352583.stgit@devnote2


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3a488a41

ACPI: CPPC: Replace cppc_attr with kobj_attribute · 901ee46b

Nathan Chancellor authored 3 years ago

stable inclusion
from linux-4.19.191
commit 443a6fdcd74be7c023e48c8fd5b589a661de7fe3

--------------------------------

[ Upstream commit 2bc6262c6117dd18106d5aa50d53e945b5d99c51 ]

All of the CPPC sysfs show functions are called via indirect call in
kobj_attr_show(), where they should be of type

ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr, char *buf);

because that is the type of the ->show() member in
'struct kobj_attribute' but they are actually of type

ssize_t (*show)(struct kobject *kobj, struct attribute *attr, char *buf);

because of the ->show() member in 'struct cppc_attr', resulting in a
Control Flow Integrity violation [1].

$ cat /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
3400

$ dmesg | grep "CFI failure"
[  175.970559] CFI failure (target: show_highest_perf+0x0/0x8):

As far as I can tell, the only difference between 'struct cppc_attr'
and 'struct kobj_attribute' aside from the type of the attr parameter
is the type of the count parameter in the ->store() member (ssize_t vs.
size_t), which does not actually matter because all of these nodes are
read-only.

Eliminate 'struct cppc_attr' in favor of 'struct kobj_attribute' to fix
the violation.

[1]: https://lore.kernel.org/r/20210401233216.2540591-1-samitolvanen@google.com/

Fixes: 158c998e ("ACPI / CPPC: add sysfs support to compute delivered performance")
Link: https://github.com/ClangBuiltLinux/linux/issues/1343


Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

901ee46b

irqchip/gic-v3: Fix OF_BAD_ADDR error handling · 3279167f

Arnd Bergmann authored 3 years ago


stable inclusion
from linux-4.19.191
commit f1dfa84c8f26d6bf372df85837ae0d7fbbc127a9

--------------------------------

[ Upstream commit 8e13d96670a4c050d4883e6743a9e9858e5cfe10 ]

When building with extra warnings enabled, clang points out a
mistake in the error handling:

drivers/irqchip/irq-gic-v3-mbi.c:306:21: error: result of comparison of constant 18446744073709551615 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
                if (mbi_phys_base == OF_BAD_ADDR) {

Truncate the constant to the same type as the variable it gets compared
to, to shut make the check work and void the warning.

Fixes: 50528752 ("irqchip/gic-v3: Add support for Message Based Interrupts as an MSI controller")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210323131842.2773094-1-arnd@kernel.org


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3279167f

x86/microcode: Check for offline CPUs before requesting new microcode · 400e8d37

Otavio Pontes authored 3 years ago


stable inclusion
from linux-4.19.191
commit af6d4954f9d43def013d3621d944ea2770bbd08b

--------------------------------

[ Upstream commit 7189b3c11903667808029ec9766a6e96de5012a5 ]

Currently, the late microcode loading mechanism checks whether any CPUs
are offlined, and, in such a case, aborts the load attempt.

However, this must be done before the kernel caches new microcode from
the filesystem. Otherwise, when offlined CPUs are onlined later, those
cores are going to be updated through the CPU hotplug notifier callback
with the new microcode, while CPUs previously onine will continue to run
with the older microcode.

For example:

Turn off one core (2 threads):

  echo 0 > /sys/devices/system/cpu/cpu3/online
  echo 0 > /sys/devices/system/cpu/cpu1/online

Install the ucode fails because a primary SMT thread is offline:

  cp intel-ucode/06-8e-09 /lib/firmware/intel-ucode/
  echo 1 > /sys/devices/system/cpu/microcode/reload
  bash: echo: write error: Invalid argument

Turn the core back on

  echo 1 > /sys/devices/system/cpu/cpu3/online
  echo 1 > /sys/devices/system/cpu/cpu1/online
  cat /proc/cpuinfo |grep microcode
  microcode : 0x30
  microcode : 0xde
  microcode : 0x30
  microcode : 0xde

The rationale for why the update is aborted when at least one primary
thread is offline is because even if that thread is soft-offlined
and idle, it will still have to participate in broadcasted MCE's
synchronization dance or enter SMM, and in both examples it will execute
instructions so it better have the same microcode revision as the other
cores.

 [ bp: Heavily edit and extend commit message with the reasoning behind all
   this. ]

Fixes: 30ec26da ("x86/microcode: Do not upload microcode if CPUs are offline")
Signed-off-by: Otavio Pontes <otavio.pontes@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lkml.kernel.org/r/20210319165515.9240-2-otavio.pontes@intel.com


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

400e8d37

ovl: fix missing revert_creds() on error path · 49e48a7c

Dan Carpenter authored 3 years ago


stable inclusion
from linux-4.19.191
commit a42e385f0a4160709882b9435b874a7cdab24d46

--------------------------------

commit 7b279bbfd2b230c7a210ff8f405799c7e46bbf48 upstream.

Smatch complains about missing that the ovl_override_creds() doesn't
have a matching revert_creds() if the dentry is disconnected.  Fix this
by moving the ovl_override_creds() until after the disconnected check.

Fixes: aa3ff3c1 ("ovl: copy up of disconnected dentries")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

49e48a7c

x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported · dfb4f495

Sean Christopherson authored 3 years ago


stable inclusion
from linux-4.19.191
commit bd1b3b2bd4dae57e5a2e06c340531137af633368

--------------------------------

commit b6b4fbd90b155a0025223df2c137af8a701d53b3 upstream.

Initialize MSR_TSC_AUX with CPU node information if RDTSCP or RDPID is
supported.  This fixes a bug where vdso_read_cpunode() will read garbage
via RDPID if RDPID is supported but RDTSCP is not.  While no known CPU
supports RDPID but not RDTSCP, both Intel's SDM and AMD's APM allow for
RDPID to exist without RDTSCP, e.g. it's technically a legal CPU model
for a virtual machine.

Note, technically MSR_TSC_AUX could be initialized if and only if RDPID
is supported since RDTSCP is currently not used to retrieve the CPU node.
But, the cost of the superfluous WRMSR is negigible, whereas leaving
MSR_TSC_AUX uninitialized is just asking for future breakage if someone
decides to utilize RDTSCP.

Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210504225632.1532621-2-seanjc@google.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

dfb4f495

md: Fix missing unused status line of /proc/mdstat · 0266dcd5

Jan Glauber authored 3 years ago


stable inclusion
from linux-4.19.191
commit bc1355f10b2acddee56680f567edaa0e4a18db44

--------------------------------

commit 7abfabaf5f805f5171d133ce6af9b65ab766e76a upstream.

Reading /proc/mdstat with a read buffer size that would not
fit the unused status line in the first read will skip this
line from the output.

So 'dd if=/proc/mdstat bs=64 2>/dev/null' will not print something
like: unused devices: <none>

Don't return NULL immediately in start() for v=2 but call
show() once to print the status line also for multiple reads.

Cc: stable@vger.kernel.org
Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: Jan Glauber <jglauber@digitalocean.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0266dcd5

md: md_open returns -EBUSY when entering racing area · be08a107

Zhao Heming authored 3 years ago


stable inclusion
from linux-4.19.191
commit 5db6e4c5c1284cec6720afe27f202a3dca697def

--------------------------------

commit 6a4db2a60306eb65bfb14ccc9fde035b74a4b4e7 upstream.

commit d3374825 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

This patch changes md_open returning from -ERESTARTSYS to -EBUSY, which
will break the infinitely retry when md_open enter racing area.

This patch is partly fix soft lockup issue, full fix needs mddev_find
is split into two functions: mddev_find & mddev_find_or_alloc. And
md_open should call new mddev_find (it only does searching job).

For more detail, please refer with Christoph's "split mddev_find" patch
in later commits.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

be08a107

md: factor out a mddev_find_locked helper from mddev_find · 69d5e5ca

Christoph Hellwig authored 3 years ago


stable inclusion
from linux-4.19.191
commit 587241f748424a7532b1cd9536b2128a47ee54a1

--------------------------------

commit 8b57251f9a91f5e5a599de7549915d2d226cc3af upstream.

Factor out a self-contained helper to just lookup a mddev by the dev_t
"unit".

Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

69d5e5ca

md: split mddev_find · 5d2fc79a

Christoph Hellwig authored 3 years ago


stable inclusion
from linux-4.19.191
commit a61234356fcce492179427e5540bdce2abaa0191

--------------------------------

commit 65aa97c4d2bfd76677c211b9d03ef05a98c6d68e upstream.

Split mddev_find into a simple mddev_find that just finds an existing
mddev by the unit number, and a more complicated mddev_find that deals
with find or allocating a mddev.

This turns out to fix this bug reported by Zhao Heming.

----------------------------- snip ------------------------------
commit d3374825 ("md: make devices disappear when they are no longer
needed.") introduced protection between mddev creating & removing. The
md_open shouldn't create mddev when all_mddevs list doesn't contain
mddev. With currently code logic, there will be very easy to trigger
soft lockup in non-preempt env.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5d2fc79a

md-cluster: fix use-after-free issue when removing rdev · b729adef

Heming Zhao authored 3 years ago


stable inclusion
from linux-4.19.191
commit 121243301effabf451294a4a77a582f3925dafe3

--------------------------------

commit f7c7a2f9a23e5b6e0f5251f29648d0238bb7757e upstream.

md_kick_rdev_from_array will remove rdev, so we should
use rdev_for_each_safe to search list.

How to trigger:

env: Two nodes on kvm-qemu x86_64 VMs (2C2G with 2 iscsi luns).

```
node2=192.168.0.3

for i in {1..20}; do
    echo ==== $i `date` ====;

    mdadm -Ss && ssh ${node2} "mdadm -Ss"
    wipefs -a /dev/sda /dev/sdb

    mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l 1 /dev/sda \
       /dev/sdb --assume-clean
    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
    mdadm --wait /dev/md0
    ssh ${node2} "mdadm --wait /dev/md0"

    mdadm --manage /dev/md0 --fail /dev/sda --remove /dev/sda
    sleep 1
done
```

Crash stack:

```
stack segment: 0000 [#1] SMP
... ...
RIP: 0010:md_check_recovery+0x1e8/0x570 [md_mod]
... ...
RSP: 0018:ffffb149807a7d68 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffff9d494c180800 RCX: ffff9d490fc01e50
RDX: fffff047c0ed8308 RSI: 0000000000000246 RDI: 0000000000000246
RBP: 6b6b6b6b6b6b6b6b R08: ffff9d490fc01e40 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: ffff9d494c180818 R14: ffff9d493399ef38 R15: ffff9d4933a1d800
FS:  0000000000000000(0000) GS:ffff9d494f700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe68cab9010 CR3: 000000004c6be001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 raid1d+0x5c/0xd40 [raid1]
 ? finish_task_switch+0x75/0x2a0
 ? lock_timer_base+0x67/0x80
 ? try_to_del_timer_sync+0x4d/0x80
 ? del_timer_sync+0x41/0x50
 ? schedule_timeout+0x254/0x2d0
 ? md_start_sync+0xe0/0xe0 [md_mod]
 ? md_thread+0x127/0x160 [md_mod]
 md_thread+0x127/0x160 [md_mod]
 ? wait_woken+0x80/0x80
 kthread+0x10d/0x130
 ? kthread_park+0xa0/0xa0
 ret_from_fork+0x1f/0x40
```

Fixes: dbb64f86 ("md-cluster: Fix adding of new disk with new reload code")
Fixes: 659b254f ("md-cluster: remove a disk asynchronously from cluster environment")
Cc: stable@vger.kernel.org
Reviewed-by: Gang He <ghe@suse.com>
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b729adef

md/bitmap: wait for external bitmap writes to complete during tear down · 584a4f1c

Sudhakar Panneerselvam authored 3 years ago


stable inclusion
from linux-4.19.191
commit 945eec9f2aa1510f625b599cae66000f8b091211

--------------------------------

commit 404a8ef512587b2460107d3272c17a89aef75edf upstream.

NULL pointer dereference was observed in super_written() when it tries
to access the mddev structure.

[The below stack trace is from an older kernel, but the problem described
in this patch applies to the mainline kernel.]

[ 1194.474861] task: ffff8fdd20858000 task.stack: ffffb99d40790000
[ 1194.488000] RIP: 0010:super_written+0x29/0xe1
[ 1194.499688] RSP: 0018:ffff8ffb7fcc3c78 EFLAGS: 00010046
[ 1194.512477] RAX: 0000000000000000 RBX: ffff8ffb7bf4a000 RCX: ffff8ffb78991048
[ 1194.527325] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ffb56b8a200
[ 1194.542576] RBP: ffff8ffb7fcc3c90 R08: 000000000000000b R09: 0000000000000000
[ 1194.558001] R10: ffff8ffb56b8a298 R11: 0000000000000000 R12: ffff8ffb56b8a200
[ 1194.573070] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1194.588117] FS:  0000000000000000(0000) GS:ffff8ffb7fcc0000(0000) knlGS:0000000000000000
[ 1194.604264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1194.617375] CR2: 00000000000002b8 CR3: 00000021e040a002 CR4: 00000000007606e0
[ 1194.632327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1194.647865] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1194.663316] PKRU: 55555554
[ 1194.674090] Call Trace:
[ 1194.683735]  <IRQ>
[ 1194.692948]  bio_endio+0xae/0x135
[ 1194.703580]  blk_update_request+0xad/0x2fa
[ 1194.714990]  blk_update_bidi_request+0x20/0x72
[ 1194.726578]  __blk_end_bidi_request+0x2c/0x4d
[ 1194.738373]  __blk_end_request_all+0x31/0x49
[ 1194.749344]  blk_flush_complete_seq+0x377/0x383
[ 1194.761550]  flush_end_io+0x1dd/0x2a7
[ 1194.772910]  blk_finish_request+0x9f/0x13c
[ 1194.784544]  scsi_end_request+0x180/0x25c
[ 1194.796149]  scsi_io_completion+0xc8/0x610
[ 1194.807503]  scsi_finish_command+0xdc/0x125
[ 1194.818897]  scsi_softirq_done+0x81/0xde
[ 1194.830062]  blk_done_softirq+0xa4/0xcc
[ 1194.841008]  __do_softirq+0xd9/0x29f
[ 1194.851257]  irq_exit+0xe6/0xeb
[ 1194.861290]  do_IRQ+0x59/0xe3
[ 1194.871060]  common_interrupt+0x1c6/0x382
[ 1194.881988]  </IRQ>
[ 1194.890646] RIP: 0010:cpuidle_enter_state+0xdd/0x2a5
[ 1194.902532] RSP: 0018:ffffb99d40793e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff43
[ 1194.917317] RAX: ffff8ffb7fce27c0 RBX: ffff8ffb7fced800 RCX: 000000000000001f
[ 1194.932056] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
[ 1194.946428] RBP: ffffb99d40793ea0 R08: 0000000000000004 R09: 0000000000002ed2
[ 1194.960508] R10: 0000000000002664 R11: 0000000000000018 R12: 0000000000000003
[ 1194.974454] R13: 000000000000000b R14: ffffffff925715a0 R15: 0000011610120d5a
[ 1194.988607]  ? cpuidle_enter_state+0xcc/0x2a5
[ 1194.999077]  cpuidle_enter+0x17/0x19
[ 1195.008395]  call_cpuidle+0x23/0x3a
[ 1195.017718]  do_idle+0x172/0x1d5
[ 1195.026358]  cpu_startup_entry+0x73/0x75
[ 1195.035769]  start_secondary+0x1b9/0x20b
[ 1195.044894]  secondary_startup_64+0xa5/0xa5
[ 1195.084921] RIP: super_written+0x29/0xe1 RSP: ffff8ffb7fcc3c78
[ 1195.096354] CR2: 00000000000002b8

bio in the above stack is a bitmap write whose completion is invoked after
the tear down sequence sets the mddev structure to NULL in rdev.

During tear down, there is an attempt to flush the bitmap writes, but for
external bitmaps, there is no explicit wait for all the bitmap writes to
complete. For instance, md_bitmap_flush() is called to flush the bitmap
writes, but the last call to md_bitmap_daemon_work() in md_bitmap_flush()
could generate new bitmap writes for which there is no explicit wait to
complete those writes. The call to md_bitmap_update_sb() will return
simply for external bitmaps and the follow-up call to md_update_sb() is
conditional and may not get called for external bitmaps. This results in a
kernel panic when the completion routine, super_written() is called which
tries to reference mddev in the rdev that has been set to
NULL(in unbind_rdev_from_array() by tear down sequence).

The solution is to call md_super_wait() for external bitmaps after the
last call to md_bitmap_daemon_work() in md_bitmap_flush() to ensure there
are no pending bitmap writes before proceeding with the tear down.

Cc: stable@vger.kernel.org
Signed-off-by: Sudhakar Panneerselvam <sudhakar.panneerselvam@oracle.com>
Reviewed-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

584a4f1c

dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails · 055972fc

Benjamin Block authored 3 years ago


stable inclusion
from linux-4.19.191
commit 772b9f59657665af3b68d24d12b9d172d31f0dfb

--------------------------------

commit 8e947c8f4a5620df77e43c9c75310dc510250166 upstream.

When loading a device-mapper table for a request-based mapped device,
and the allocation/initialization of the blk_mq_tag_set for the device
fails, a following device remove will cause a double free.

E.g. (dmesg):
  device-mapper: core: Cannot initialize queue for request-based dm-mq mapped device
  device-mapper: ioctl: unable to set up device queue for new table.
  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0305e098835de000 TEID: 0305e098835de803
  Fault in home space mode while using kernel ASCE.
  AS:000000025efe0007 R3:0000000000000024
  Oops: 0038 ilc:3 [#1] SMP
  Modules linked in: ... lots of modules ...
  Supported: Yes, External
  CPU: 0 PID: 7348 Comm: multipathd Kdump: loaded Tainted: G        W      X    5.3.18-53-default #1 SLE15-SP3
  Hardware name: IBM 8561 T01 7I2 (LPAR)
  Krnl PSW : 0704e00180000000 000000025e368eca (kfree+0x42/0x330)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
  Krnl GPRS: 000000000000004a 000000025efe5230 c1773200d779968d 0000000000000000
             000000025e520270 000000025e8d1b40 0000000000000003 00000007aae10000
             000000025e5202a2 0000000000000001 c1773200d779968d 0305e098835de640
             00000007a8170000 000003ff80138650 000000025e5202a2 000003e00396faa8
  Krnl Code: 000000025e368eb8: c4180041e100       lgrl    %r1,25eba50b8
             000000025e368ebe: ecba06b93a55       risbg   %r11,%r10,6,185,58
            #000000025e368ec4: e3b010000008       ag      %r11,0(%r1)
            >000000025e368eca: e310b0080004       lg      %r1,8(%r11)
             000000025e368ed0: a7110001           tmll    %r1,1
             000000025e368ed4: a7740129           brc     7,25e369126
             000000025e368ed8: e320b0080004       lg      %r2,8(%r11)
             000000025e368ede: b904001b           lgr     %r1,%r11
  Call Trace:
   [<000000025e368eca>] kfree+0x42/0x330
   [<000000025e5202a2>] blk_mq_free_tag_set+0x72/0xb8
   [<000003ff801316a8>] dm_mq_cleanup_mapped_device+0x38/0x50 [dm_mod]
   [<000003ff80120082>] free_dev+0x52/0xd0 [dm_mod]
   [<000003ff801233f0>] __dm_destroy+0x150/0x1d0 [dm_mod]
   [<000003ff8012bb9a>] dev_remove+0x162/0x1c0 [dm_mod]
   [<000003ff8012a988>] ctl_ioctl+0x198/0x478 [dm_mod]
   [<000003ff8012ac8a>] dm_ctl_ioctl+0x22/0x38 [dm_mod]
   [<000000025e3b11ee>] ksys_ioctl+0xbe/0xe0
   [<000000025e3b127a>] __s390x_sys_ioctl+0x2a/0x40
   [<000000025e8c15ac>] system_call+0xd8/0x2c8
  Last Breaking-Event-Address:
   [<000000025e52029c>] blk_mq_free_tag_set+0x6c/0xb8
  Kernel panic - not syncing: Fatal exception: panic_on_oops

When allocation/initialization of the blk_mq_tag_set fails in
dm_mq_init_request_queue(), it is uninitialized/freed, but the pointer
is not reset to NULL; so when dev_remove() later gets into
dm_mq_cleanup_mapped_device() it sees the pointer and tries to
uninitialize and free it again.

Fix this by setting the pointer to NULL in dm_mq_init_request_queue()
error-handling. Also set it to NULL in dm_mq_cleanup_mapped_device().

Cc: <stable@vger.kernel.org> # 4.6+
Fixes: 1c357a1e ("dm: allocate blk_mq_tag_set rather than embed in mapped_device")
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

055972fc

dm space map common: fix division bug in sm_ll_find_free_block() · 0535d9a3

Joe Thornber authored 3 years ago


stable inclusion
from linux-4.19.191
commit 7cba7ebfd905cae6a50f548477a0e17ccd176ddf

--------------------------------

commit 5208692e80a1f3c8ce2063a22b675dd5589d1d80 upstream.

This division bug meant the search for free metadata space could skip
the final allocation bitmap's worth of entries. Fix affects DM thinp,
cache and era targets.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Tested-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0535d9a3

dm persistent data: packed struct should have an aligned() attribute too · 8f19789f

Joe Thornber authored 3 years ago


stable inclusion
from linux-4.19.191
commit 77aab16c6185f15fedccf72094d6d9a59dadadba

--------------------------------

commit a88b2358f1da2c9f9fcc432f2e0a79617fea397c upstream.

Otherwise most non-x86 architectures (e.g. riscv, arm) will resort to
byte-by-byte access.

Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8f19789f

tracing: Restructure trace_clock_global() to never block · 99ef33b3

Steven Rostedt (VMware) authored 3 years ago

stable inclusion
from linux-4.19.191
commit d43d56dbf452ccecc1ec735cd4b6840118005d7c

--------------------------------

commit aafe104aa9096827a429bc1358f8260ee565b7cc upstream.

It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:

Call Trace:
 trace_clock_global+0x91/0xa0
 __rb_reserve_next+0x237/0x460
 ring_buffer_lock_reserve+0x12a/0x3f0
 trace_buffer_lock_reserve+0x10/0x50
 __trace_graph_return+0x1f/0x80
 trace_graph_return+0xb7/0xf0
 ? trace_clock_global+0x91/0xa0
 ftrace_return_to_handler+0x8b/0xf0
 ? pv_hash+0xa0/0xa0
 return_to_handler+0x15/0x30
 ? ftrace_graph_caller+0xa0/0xa0
 ? trace_clock_global+0x91/0xa0
 ? __rb_reserve_next+0x237/0x460
 ? ring_buffer_lock_reserve+0x12a/0x3f0
 ? trace_event_buffer_lock_reserve+0x3c/0x120
 ? trace_event_buffer_reserve+0x6b/0xc0
 ? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
 ? dpm_run_callback+0x3b/0xc0
 ? pm_ops_is_empty+0x50/0x50
 ? platform_get_irq_byname_optional+0x90/0x90
 ? trace_device_pm_callback_start+0x82/0xd0
 ? dpm_run_callback+0x49/0xc0

With the following RIP:

RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200

Since the fix to the recursion detection would allow a single recursion to
happen while tracing, this lead to the trace_clock_global() taking a spin
lock and then trying to take it again:

ring_buffer_lock_reserve() {
  trace_clock_global() {
    arch_spin_lock() {
      queued_spin_lock_slowpath() {
        /* lock taken */
        (something else gets traced by function graph tracer)
          ring_buffer_lock_reserve() {
            trace_clock_global() {
              arch_spin_lock() {
                queued_spin_lock_slowpath() {
                /* DEAD LOCK! */

Tracing should *never* block, as it can lead to strange lockups like the
above.

Restructure the trace_clock_global() code to instead of simply taking a
lock to update the recorded "prev_time" simply use it, as two events
happening on two different CPUs that calls this at the same time, really
doesn't matter which one goes first. Use a trylock to grab the lock for
updating the prev_time, and if it fails, simply try again the next time.
If it failed to be taken, that means something else is already updating
it.

Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home



Cc: stable@vger.kernel.org
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Fixes: b02414c8 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
Fixes: 14131f2f ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761


Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

99ef33b3

tracing: Map all PIDs to command lines · acdff4a0

Steven Rostedt (VMware) authored 3 years ago

stable inclusion
from linux-4.19.191
commit 92c52762c295602b2606181ff46e0de67a6e1417

--------------------------------

commit 785e3c0a3a870e72dc530856136ab4c8dd207128 upstream.

The default max PID is set by PID_MAX_DEFAULT, and the tracing
infrastructure uses this number to map PIDs to the comm names of the
tasks, such output of the trace can show names from the recorded PIDs in
the ring buffer. This mapping is also exported to user space via the
"saved_cmdlines" file in the tracefs directory.

But currently the mapping expects the PIDs to be less than
PID_MAX_DEFAULT, which is the default maximum and not the real maximum.
Recently, systemd will increases the maximum value of a PID on the system,
and when tasks are traced that have a PID higher than PID_MAX_DEFAULT, its
comm is not recorded. This leads to the entire trace to have "<...>" as
the comm name, which is pretty useless.

Instead, keep the array mapping the size of PID_MAX_DEFAULT, but instead
of just mapping the index to the comm, map a mask of the PID
(PID_MAX_DEFAULT - 1) to the comm, and find the full PID from the
map_cmdline_to_pid array (that already exists).

This bug goes back to the beginning of ftrace, but hasn't been an issue
until user space started increasing the maximum value of PIDs.

Link: https://lkml.kernel.org/r/20210427113207.3c601884@gandalf.local.home



Cc: stable@vger.kernel.org
Fixes: bc0c38d1 ("ftrace: latency tracer infrastructure")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

acdff4a0

tty: fix memory leak in vc_deallocate · 67cf6190

Pavel Skripkin authored 3 years ago


stable inclusion
from linux-4.19.191
commit 3ce3062bcf01de94815605f1b6fd916cb84ff08b

--------------------------------

commit 211b4d42b70f1c1660feaa968dac0efc2a96ac4d upstream.

syzbot reported memory leak in tty/vt.
The problem was in VT_DISALLOCATE ioctl cmd.
After allocating unimap with PIO_UNIMAP it wasn't
freed via VT_DISALLOCATE, but vc_cons[currcons].d was
zeroed.

Reported-by:  <syzbot+bcc922b19ccc64240b42@syzkaller.appspotmail.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20210327214443.21548-1-paskripkin@gmail.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

67cf6190

ext4: fix error code in ext4_commit_super · 13b15778

Fengnan Chang authored 3 years ago


stable inclusion
from linux-4.19.191
commit 353b8d3bd324ecc3097d09d8c405d25e0f75ec7f

--------------------------------

commit f88f1466e2a2e5ca17dfada436d3efa1b03a3972 upstream.

We should set the error code when ext4_commit_super check argument failed.
Found in code review.
Fixes: c4be0c1d ("filesystem freeze: add error handling of write_super_lockfs/unlockfs").

Cc: stable@kernel.org
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20210402101631.561-1-changfengnan@vivo.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

13b15778

posix-timers: Preserve return value in clock_adjtime32() · c89fdf52

Chen Jun authored 3 years ago


stable inclusion
from linux-4.19.191
commit 1a852780d9d93378ae4f7d7d57213bc114c6873f

--------------------------------

commit 2d036dfa5f10df9782f5278fc591d79d283c1fad upstream.

The return value on success (>= 0) is overwritten by the return value of
put_old_timex32(). That works correct in the fault case, but is wrong for
the success case where put_old_timex32() returns 0.

Just check the return value of put_old_timex32() and return -EFAULT in case
it is not zero.

[ tglx: Massage changelog ]

Fixes: 3a4d44b6 ("ntp: Move adjtimex related compat syscalls to native counterparts")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210414030449.90692-1-chenjun102@huawei.com


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c89fdf52

Revert ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op") · 3ebb34e5

Thomas Gleixner authored 3 years ago


stable inclusion
from linux-4.19.191
commit f724a7453bbfb8bb917735e315d5d902c0b67c33

--------------------------------

commit 4fbf5d6837bf81fd7a27d771358f4ee6c4f243f8 upstream.

The FUTEX_WAIT operand has historically a relative timeout which means that
the clock id is irrelevant as relative timeouts on CLOCK_REALTIME are not
subject to wall clock changes and therefore are mapped by the kernel to
CLOCK_MONOTONIC for simplicity.

If a caller would set FUTEX_CLOCK_REALTIME for FUTEX_WAIT the timeout is
still treated relative vs. CLOCK_MONOTONIC and then the wait arms that
timeout based on CLOCK_REALTIME which is broken and obviously has never
been used or even tested.

Reject any attempt to use FUTEX_CLOCK_REALTIME with FUTEX_WAIT again.

The desired functionality can be achieved with FUTEX_WAIT_BITSET and a
FUTEX_BITSET_MATCH_ANY argument.

Fixes: 337f1304 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210422194704.834797921@linutronix.de


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3ebb34e5

dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences · 35d2f378

Heinz Mauelshagen authored 3 years ago


stable inclusion
from linux-4.19.191
commit 514666bb548d39b7a63814858f1be17becb49461

--------------------------------

commit f99a8e4373eeacb279bc9696937a55adbff7a28a upstream.

If fast table reloads occur during an ongoing reshape of raid4/5/6
devices the target may race reading a superblock vs the the MD resync
thread; causing an inconclusive reshape state to be read in its
constructor.

lvm2 test lvconvert-raid-reshape-stripes-load-reload.sh can cause
BUG_ON() to trigger in md_run(), e.g.:
"kernel BUG at drivers/md/raid5.c:7567!".

Scenario triggering the bug:

1. the MD sync thread calls end_reshape() from raid5_sync_request()
   when done reshaping. However end_reshape() _only_ updates the
   reshape position to MaxSector keeping the changed layout
   configuration though (i.e. any delta disks, chunk sector or RAID
   algorithm changes). That inconclusive configuration is stored in
   the superblock.

2. dm-raid constructs a mapping, loading named inconsistent superblock
   as of step 1 before step 3 is able to finish resetting the reshape
   state completely, and calls md_run() which leads to mentioned bug
   in raid5.c.

3. the MD RAID personality's finish_reshape() is called; which resets
   the reshape information on chunk sectors, delta disks, etc. This
   explains why the bug is rarely seen on multi-core machines, as MD's
   finish_reshape() superblock update races with the dm-raid
   constructor's superblock load in step 2.

Fix identifies inconclusive superblock content in the dm-raid
constructor and resets it before calling md_run(), factoring out
identifying checks into rs_is_layout_change() to share in existing
rs_reshape_requested() and new rs_reset_inclonclusive_reshape(). Also
enhance a comment and remove an empty line.

Cc: stable@vger.kernel.org
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

35d2f378

md/raid1: properly indicate failure when ending a failed write request · e62a93fc

Paul Clements authored 3 years ago


stable inclusion
from linux-4.19.191
commit a6e17cab00fc5bf85472434c52ac751426257c6f

--------------------------------

commit 2417b9869b81882ab90fd5ed1081a1cb2d4db1dd upstream.

This patch addresses a data corruption bug in raid1 arrays using bitmaps.
Without this fix, the bitmap bits for the failed I/O end up being cleared.

Since we are in the failure leg of raid1_end_write_request, the request
either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).

Fixes: eeba6809 ("md/raid1: end bio when the device faulty")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Paul Clements <paul.clements@us.sios.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e62a93fc

NFSv4: Don't discard segments marked for return in _pnfs_return_layout() · 88d42bc7

Trond Myklebust authored 3 years ago


stable inclusion
from linux-4.19.191
commit 9ffa7967f9379a0a1b924e9ffeda709d72237da7

--------------------------------

commit de144ff4234f935bd2150108019b5d87a90a8a96 upstream.

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: 6d597e17 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

88d42bc7

NFS: Don't discard pNFS layout segments that are marked for return · b0a47210

Trond Myklebust authored 3 years ago


stable inclusion
from linux-4.19.191
commit d51316b13dd4f35aa737d20822a18c131eed32c3

--------------------------------

commit 39fd01863616964f009599e50ca5c6ea9ebf88d6 upstream.

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: e0b7d420 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b0a47210

ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure · e02b6f34

Marc Zyngier authored 3 years ago


stable inclusion
from linux-4.19.191
commit 7b2162db1498c71962a4bb2f776fa4e76d4d305b

--------------------------------

commit 1ecd5b129252249b9bc03d7645a7bda512747277 upstream.

When failing the driver probe because of invalid firmware properties,
the GTDT driver unmaps the interrupt that it mapped earlier.

However, it never checks whether the mapping of the interrupt actially
succeeded. Even more, should the firmware report an illegal interrupt
number that overlaps with the GIC SGI range, this can result in an
IPI being unmapped, and subsequent fireworks (as reported by Dann
Frazier).

Rework the driver to have a slightly saner behaviour and actually
check whether the interrupt has been mapped before unmapping things.

Reported-by: dann frazier <dann.frazier@canonical.com>
Fixes: ca9ae5ec ("acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/YH87dtTfwYgavusz@xps13.dannf


Cc: <stable@vger.kernel.org>
Cc: Fu Wei <wefu@redhat.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20210421164317.1718831-2-maz@kernel.org


Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e02b6f34

arm64/vdso: Discard .note.gnu.property sections in vDSO · ac9c4031

Bill Wendling authored 3 years ago


stable inclusion
from linux-4.19.191
commit 44149b3e106e4c632d4c5b80650580f823f72c45

--------------------------------

[ Upstream commit 388708028e6937f3fc5fc19aeeb847f8970f489c ]

The arm64 assembler in binutils 2.32 and above generates a program
property note in a note section, .note.gnu.property, to encode used x86
ISAs and features. But the kernel linker script only contains a single
NOTE segment:

  PHDRS
  {
    text    PT_LOAD    FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
    dynamic PT_DYNAMIC FLAGS(4);               /* PF_R */
    note    PT_NOTE    FLAGS(4);               /* PF_R */
  }

The NOTE segment generated by the vDSO linker script is aligned to 4 bytes.
But the .note.gnu.property section must be aligned to 8 bytes on arm64.

  $ readelf -n vdso64.so

  Displaying notes found in: .note
    Owner                Data size      Description
    Linux                0x00000004     Unknown note type: (0x00000000)
     description data: 06 00 00 00
  readelf: Warning: note with invalid namesz and/or descsz found at offset 0x20
  readelf: Warning:  type: 0x78, namesize: 0x00000100, descsize: 0x756e694c, alignment: 8

Since the note.gnu.property section in the vDSO is not checked by the
dynamic linker, discard the .note.gnu.property sections in the vDSO.

Similar to commit 4caffe6a ("x86/vdso: Discard .note.gnu.property
sections in vDSO"), but for arm64.

Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210423205159.830854-1-morbo@google.com


Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ac9c4031

perf/arm_pmu_platform: Fix error handling · 2a8388f4

Robin Murphy authored 3 years ago


stable inclusion
from linux-4.19.191
commit 213a9a13f0735212410312f61902a40ff065c477

--------------------------------

[ Upstream commit e338cb6bef254821a8c095018fd27254d74bfd6a ]

If we're aborting after failing to register the PMU device,
we probably don't want to leak the IRQs that we've claimed.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/53031a607fc8412a60024bfb3bb8cd7141f998f5.1616774562.git.robin.murphy@arm.com


Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2a8388f4

genirq/matrix: Prevent allocation counter corruption · d7b48b8c

Vitaly Kuznetsov authored 3 years ago

stable inclusion
from linux-4.19.191
commit 873792c84237953385d99ff3c56ed9e467c15300

--------------------------------

[ Upstream commit c93a5e20c3c2dabef8ea360a3d3f18c6f68233ab ]

When irq_matrix_free() is called for an unallocated vector the
managed_allocated and total_allocated counters get out of sync with the
real state of the matrix. Later, when the last interrupt is freed, these
counters will underflow resulting in UINTMAX because the counters are
unsigned.

While this is certainly a problem of the calling code, this can be catched
in the allocator by checking the allocation bit for the to be freed vector
which simplifies debugging.

An example of the problem described above:
https://lore.kernel.org/lkml/20210318192819.636943062@linutronix.de/



Add the missing sanity check and emit a warning when it triggers.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210319111823.1105248-1-vkuznets@redhat.com


Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d7b48b8c

crypto: api - check for ERR pointers in crypto_destroy_tfm() · 6bbf4a5a

Ard Biesheuvel authored 3 years ago

stable inclusion
from linux-4.19.191
commit 67afaf1a0a3b2e23645c86636d887a7ca84a09ed

--------------------------------

[ Upstream commit 83681f2bebb34dbb3f03fecd8f570308ab8b7c2c ]

Given that crypto_alloc_tfm() may return ERR pointers, and to avoid
crashes on obscure error paths where such pointers are presented to
crypto_destroy_tfm() (such as [0]), add an ERR_PTR check there
before dereferencing the second argument as a struct crypto_tfm
pointer.

[0] https://lore.kernel.org/linux-crypto/000000000000de949705bc59e0f6@google.com/



Reported-by:  <syzbot+12cf5fbfdeba210a89dd@syzkaller.appspotmail.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6bbf4a5a

cifs: Return correct error code from smb2_get_enc_key · 9d9b8c95

Paul Aurich authored 3 years ago


stable inclusion
from linux-4.19.191
commit e486f8397f3f14a7cadc166138141fdb14379a54

--------------------------------

commit 83728cbf366e334301091d5b808add468ab46b27 upstream.

Avoid a warning if the error percolates back up:

[440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
[440700.386947] ------------[ cut here ]------------
[440700.386948] err = 1
[440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
...
[440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G           OE     5.4.0-70-generic #78~18.04.1-Ubuntu
...
[440700.397334] Call Trace:
[440700.397346]  __filemap_set_wb_err+0x1a/0x70
[440700.397419]  cifs_writepages+0x9c7/0xb30 [cifs]
[440700.397426]  do_writepages+0x4b/0xe0
[440700.397444]  __filemap_fdatawrite_range+0xcb/0x100
[440700.397455]  filemap_write_and_wait+0x42/0xa0
[440700.397486]  cifs_setattr+0x68b/0xf30 [cifs]
[440700.397493]  notify_change+0x358/0x4a0
[440700.397500]  utimes_common+0xe9/0x1c0
[440700.397510]  do_utimes+0xc5/0x150
[440700.397520]  __x64_sys_utimensat+0x88/0xd0

Fixes: 61cfac6f ("CIFS: Fix possible use after free in demultiplex thread")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9d9b8c95

ftrace: Handle commands when closing set_ftrace_filter file · 560da86d

Steven Rostedt (VMware) authored 3 years ago


stable inclusion
from linux-4.19.191
commit 3d9281a4ac7171c808f9507f0937eb236b353905

--------------------------------

commit 8c9af478c06bb1ab1422f90d8ecbc53defd44bc3 upstream.

 # echo switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter

will cause switch_mm to stop tracing by the traceoff command.

 # echo -n switch_mm:traceoff > /sys/kernel/tracing/set_ftrace_filter

does nothing.

The reason is that the parsing in the write function only processes
commands if it finished parsing (there is white space written after the
command). That's to handle:

 write(fd, "switch_mm:", 10);
 write(fd, "traceoff", 8);

cases, where the command is broken over multiple writes.

The problem is if the file descriptor is closed, then the write call is
not processed, and the command needs to be processed in the release code.
The release code can handle matching of functions, but does not handle
commands.

Cc: stable@vger.kernel.org
Fixes: eda1e328 ("tracing: handle broken names in ftrace filter")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

560da86d

ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map() · b6580f3a

Hanjun Guo authored 3 years ago

mainline inclusion
from mainline-v5.6-rc1
commit 3c23b83a
category: bugfix
bugzilla: 28747
CVE: NA

-------------------------------------------------

The IORT specification [0] (Section 3, table 4, page 9) defines the
'Number of IDs' as 'The number of IDs in the range minus one'.

However, the IORT ID mapping function iort_id_map() treats the 'Number
of IDs' field as if it were the full IDs mapping count, with the
following check in place to detect out of boundary input IDs:

InputID >= Input base + Number of IDs

This check is flawed in that it considers the 'Number of IDs' field as
the full number of IDs mapping and disregards the 'minus one' from
the IDs count.

The correct check in iort_id_map() should be implemented as:

InputID > Input base + Number of IDs

this implements the specification correctly but unfortunately it breaks
existing firmwares that erroneously set the 'Number of IDs' as the full
IDs mapping count rather than IDs mapping count minus one.

e.g.

PCI hostbridge mapping entry 1:
Input base:  0x1000
ID Count:    0x100
Output base: 0x1000
Output reference: 0xC4  //ITS reference

PCI hostbridge mapping entry 2:
Input base:  0x1100
ID Count:    0x100
Output base: 0x2000
Output reference: 0xD4  //ITS reference

Two mapping entries which the second entry's Input base = the first
entry's Input base + ID count, so for InputID 0x1100 and with the
correct InputID check in place in iort_id_map() the kernel would map
the InputID to ITS 0xC4 not 0xD4 as it would be expected.

Therefore, to keep supporting existing flawed firmwares, introduce a
workaround that instructs the kernel to use the old InputID range check
logic in iort_id_map(), so that we can support both firmwares written
with the flawed 'Number of IDs' logic and the correct one as defined in
the specifications.

[0]: http://infocenter.arm.com/help/topic/com.arm.doc.den0049d/DEN0049D_IO_Remapping_Table.pdf



Reported-by: Pankaj Bansal <pankaj.bansal@nxp.com>
Link: https://lore.kernel.org/linux-acpi/20191215203303.29811-1-pankaj.bansal@nxp.com/


Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Pankaj Bansal <pankaj.bansal@nxp.com>
Cc: Will Deacon <will@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>

Conflicts:
	drivers/acpi/arm64/iort.c
[wangxiongfeng: fix a small conflict in acpi_iort_init()]
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b6580f3a

ext4: do not use extent after put_bh · f1e754ef

yangerkun authored 3 years ago


mainline inclusion
from mainline-v5.10-rc2
commit d7dce9e0
category: bugfix
bugzilla: 45516
CVE: NA

-------------------------------------------------

ext4_ext_search_right() will read more extent blocks and call put_bh
after we get the information we need.  However, ret_ex will break this
and may cause use-after-free once pagecache has been freed.  Fix it by
copying the extent structure if needed.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20201028055617.2569255-1-yangerkun@huawei.com


Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f1e754ef

modpost: explain why we can't use strsep · ce1949f9

Wolfram Sang authored 3 years ago


mainline inclusion
from mainline-5.8
commit 6020db50
category: bugfix
bugzilla: 46847
CVE: NA

-----------------------------------------------

Mention why we open-code strsep, so it is clear that it is intentional.

Fixes: 736bb118 ("modpost: remove use of non-standard strsep() in HOSTCC code")
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ce1949f9

modpost: remove use of non-standard strsep() in HOSTCC code · 247a52d0

H. Nikolaus Schaller authored 3 years ago

mainline inclusion
from mainline-5.8
commit 736bb118
category: bugfix
bugzilla: 46847
CVE: NA

-----------------------------------------------

strsep() is neither standard C nor POSIX and used outside
the kernel code here. Using it here requires that the
build host supports it out of the box which is e.g.
not true for a Darwin build host and using a cross-compiler.
This leads to:

scripts/mod/modpost.c:145:2: warning: implicit declaration of function 'strsep' [-Wimplicit-function-declaration]
  return strsep(stringp, "\n");
  ^

and a segfault when running MODPOST.

See also: https://stackoverflow.com/a/7219504



So let's replace this by strchr() instead of using strsep().
It does not hurt kernel size or speed since this code is run
on the build host.

Fixes: ac5100f5 ("modpost: add read_text_file() and get_line() helpers")
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

247a52d0

modpost: use read_text_file() and get_line() for reading text files · eba07d7d

Masahiro Yamada authored 3 years ago


mainline inclusion
from mainline-5.8
commit 70f30cfe
category: bugfix
bugzilla: 46847
CVE: NA

------------------------------------------------

grab_file() mmaps a file, but it is not so efficient here because
get_next_line() copies every line to the temporary buffer anyway.

read_text_file() and get_line() are simpler. get_line() exploits the
library function strchr().

Going forward, the missing *.symvers or *.cmd is a fatal error.
This should not happen because scripts/Makefile.modpost guards the
-i option files with $(wildcard $(input-symdump)).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Drop the changes in modpost.c since the missing *.symvers or *.cmd is
a fatal error and we don't check it in scripts/Makefile.modpost
currently, on which curimistance the building process for modules would
be interrupted with an error. This isn't what we expect.

Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

eba07d7d

modpost: add read_text_file() and get_line() helpers · 2c3c5b69

Masahiro Yamada authored 3 years ago


mainline inclusion
from mainline-5.8
commit ac5100f5
category: bugfix
bugzilla: 46847
CVE: NA

-----------------------------------------------

modpost uses grab_file() to open a file, but it is not suitable for
a text file because the mmap'ed file is not terminated by null byte.
Actually, I see some issues for the use of grab_file().

The new helper, read_text_file() loads the whole file content into a
malloc'ed buffer, and appends a null byte. Then, get_line() reads
each line.

To handle text files, I intend to replace as follows:

  grab_file()    -> read_text_file()
  get_new_line() -> get_line()

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Jian Cheng <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2c3c5b69

arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003 · d4fcfb2e

Suzuki K Poulose authored 3 years ago

mainline inclusion
from mainline-v5.0-rc1
commit a3dcea2c
category: bugfix
bugzilla: 46773
CVE: NA
References: https://gitee.com/src-openeuler/kernel/issues/I235Y8



---------------------------

Remove duplicate entries for Qualcomm erratum 1003. Since the entries
are not purely based on generic MIDR checks, use the multi_cap_entry
type to merge the entries.

Cc: Christopher Covington <cov@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

Conflicts:
	arch/arm64/kernel/cpu_errata.c
[Zheng Zengkai: replace multi_entry_cap_matches with
cpucap_multi_entry_cap_matches skipped in the following commit:
arm64: cpufeature: Rework ptr auth hwcaps using multi_entry_cap_matches
]

Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d4fcfb2e

arm64: capabilities: Merge duplicate Cavium erratum entries · 9c4ec8f5

Suzuki K Poulose authored 3 years ago

mainline inclusion
from mainline-v5.0-rc1
commit f58cdf7e
category: bugfix
bugzilla: 46773
CVE: NA
References: https://gitee.com/src-openeuler/kernel/issues/I235Y8



---------------------------

Merge duplicate entries for a single capability using the midr
range list for Cavium errata 30115 and 27456.

Cc: Andrew Pinski <apinski@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9c4ec8f5