Commits · 0380474221530db9147a001034794a95fb4c46c1 · Summer2022 / 22b970264

Dec 27, 2019

perf: Paper over the hw.target problems · 03804742

Alexander Shishkin authored 6 years ago and

谢秀奇 committed 5 years ago

euler inclusion
category: bugfix
bugzilla: 9513/11006/11050
CVE: NA
--------------------------------------------------

[ Cheng Jian:
HULK-Syzkaller reported a problem that syzbot had already reported to
mainline (LKML); this patch comes from the reply on LKML.
v1	https://lkml.org/lkml/2019/2/28/529
v2	https://lkml.org/lkml/2019/3/8/206
We merged v1 first, but it caused bugzilla #11050 because
perf_event_open() also uses perf_remove_from_context() when it moves
events from a SW context to a HW context, so the event cannot be
destroyed there.
v2 does not exhibit that warning. It is equivalent to another patch at
https://lkml.org/lkml/2019/3/8/536, but clearer.]

First, we have a race between perf_event_release_kernel() and
perf_free_event(), which happens when the parent's event is released while
the child's fork fails (because of a fatal signal, for example). It looks
like this:

cpu X                            cpu Y
-----                            -----
                                 copy_process() error path
perf_release(parent)             +->perf_event_free_task()
+-> lock(child_ctx->mutex)       |  |
+-> remove_from_context(child)   |  |
+-> unlock(child_ctx->mutex)     |  |
|                                |  +-> lock(child_ctx->mutex)
|                                |  +-> unlock(child_ctx->mutex)
|                                +-> free_task(child_task)
+-> put_task_struct(child_task)

Technically, we're still holding a reference to the task via
parent->hw.target, but that does not stop free_task(), so we end up poking
at freed memory, as KASAN points out in the syzkaller report (see Link
below). The straightforward fix is to drop the hw.target reference while
the task is still around.

Therein lies the second problem: the users of hw.target (uprobe) assume
that it's around at ->destroy() callback time, where they use it for
context. So, in order to not break the uprobe teardown and avoid leaking
stuff, we need to call ->destroy() at the same time.

This patch fixes the race and the subsequent fallout by doing both these
things at remove_from_context time.
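The ordering the fix relies on can be sketched as a small userspace model. This is not the kernel code: `struct task_model`, `struct event_model`, `detach_from_task()` and `example_destroy()` are hypothetical names, and `target` only plays the role of `hw.target`. The sketch shows `->destroy()` running while the target is still valid, with the reference dropped at the same point, so a later release has nothing left to touch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a task and a perf event. */
struct task_model { int usage; };

struct event_model {
    struct task_model *target;             /* plays the role of hw.target */
    void (*destroy)(struct event_model *); /* may still need target (e.g. uprobe) */
    bool destroyed;
};

static void put_task(struct task_model *t) { t->usage--; }

/* Called at remove-from-context time, while the task is known to be alive:
 * run ->destroy() first (it may look at target), then drop the target ref. */
static void detach_from_task(struct event_model *event)
{
    if (event->destroy && !event->destroyed) {
        event->destroy(event);
        event->destroyed = true;
    }
    if (event->target) {
        put_task(event->target);
        event->target = NULL;  /* a later release has nothing left to poke at */
    }
}

/* Example ->destroy() callback that records whether target was still set. */
static int destroy_saw_target;
static void example_destroy(struct event_model *event)
{
    destroy_saw_target = (event->target != NULL);
}
```

Calling `detach_from_task()` a second time is a no-op, which models the later release path finding the work already done.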

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98



Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

03804742

aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error · 40bdfdc7

Al Viro authored 6 years ago and

谢秀奇 committed 5 years ago

euler inclusion
category: bugfix
bugzilla: 10679
CVE: NA

---------------------------

We want iocb_put() happening on errors, to balance the extra reference
we'd taken.  As it is, we end up with a leak.  The rules should be
	* error: iocb_put() to deal with the extra ref, return error,
let the caller do another iocb_put().
	* async: iocb_put() to deal with the extra ref, return 0.
	* no error, event present immediately: aio_poll_complete() to
report it, iocb_put() to deal with the extra ref, return 0.
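The three rules above can be modelled with a single reference counter. This is a hypothetical userspace sketch, not the fs/aio.c code: `struct iocb_model`, `iocb_put_model()`, `poll_complete_model()` and `aio_poll_model()` are invented names. The sketch assumes that the caller owns one reference on entry and that reporting a completion drops one reference.

```c
/* Hypothetical model of an aio poll request's reference count. */
struct iocb_model {
    int refs;       /* outstanding references */
    int completed;  /* set once an event has been reported */
    unsigned mask;  /* event mask reported on completion */
};

static void iocb_put_model(struct iocb_model *req) { req->refs--; }

/* Reporting the event drops the reference owned by the request itself
 * (an assumption of this sketch). */
static void poll_complete_model(struct iocb_model *req, unsigned mask)
{
    req->completed = 1;
    req->mask = mask;
    iocb_put_model(req);
}

/* poll_err: error from setting up the poll; mask: events ready now (0 = none). */
static int aio_poll_model(struct iocb_model *req, int poll_err, unsigned mask)
{
    req->refs++;                        /* extra ref held while setting up */
    if (poll_err) {
        iocb_put_model(req);            /* error: drop the extra ref ...     */
        return poll_err;                /* ... the caller drops its own ref  */
    }
    if (mask)
        poll_complete_model(req, mask); /* event present immediately */
    iocb_put_model(req);                /* async or immediate: drop the extra ref */
    return 0;
}
```

In the async case one reference remains after the call; it belongs to the pending request and would be dropped when the wakeup completes it.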

Link: https://patchwork.kernel.org/patch/10842103/


Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

40bdfdc7

aio_poll_wake(): don't set ->woken if we ignore the wakeup · 2d9e350d

Al Viro authored 6 years ago and

谢秀奇 committed 5 years ago

euler inclusion
category: bugfix
bugzilla: 10679
CVE: NA

---------------------------

In case of early wakeups, aio_poll() assumes that aio_poll_complete()
has either already happened or is imminent.  In that case we do not
want to put iocb on the list of cancellables.  However, ignored
wakeups need to be treated as if wakeup has not happened at all.
Trivially fixed by having aio_poll_wake() set ->woken only after
it's committed to taking iocb out of the waitqueue.
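The ordering change can be illustrated with a minimal model of a wake callback. This is a hypothetical sketch, not aio_poll_wake() itself: `struct poll_req_model` and `poll_wake_model()` are invented names, and the mask test is an assumption about how an "ignored" wakeup is detected. The point is that `woken` is only set after the early return for uninteresting events.

```c
#include <stdbool.h>

/* Hypothetical model of a poll request sitting on a waitqueue. */
struct poll_req_model {
    unsigned events;    /* events the request is interested in */
    bool woken;         /* "wakeup has happened", as seen by the submitter */
    bool queued;        /* still on the waitqueue */
};

/* Returns 1 if the wakeup was consumed, 0 if it was ignored. */
static int poll_wake_model(struct poll_req_model *req, unsigned mask)
{
    if (mask && !(mask & req->events))
        return 0;           /* ignored: must look as if no wakeup happened */
    req->woken = true;      /* set only once we commit to dequeuing */
    req->queued = false;
    return 1;
}
```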

Link: https://patchwork.kernel.org/patch/10842107/


Suggested-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2d9e350d

ext4: brelse all indirect buffers in ext4_ind_remove_space() · 9b15cfc8

zhangyi (F) authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: bugfix
bugzilla: 11043
CVE: NA
---------------------------

All indirect buffers obtained by ext4_find_shared() should be released
regardless of whether the branch is freed. Currently, however, we forget
to release the lower-depth indirect buffers when removing space from the
same higher-depth indirect block. This leaks buffers and, furthermore, can
corrupt quota information when the old quota format is used, as in the
following case.

 - Create and mount an empty ext4 filesystem without the extent and quota
   features.
 - Run quotacheck and enable user and group quotas.
 - Create some files, write some data to them, and then punch holes in
   some of them; this may trigger the buffer leak described above.
 - Disable quotas and run quotacheck again. It creates two new aquota
   files and writes the checked quota information to them, which may
   reuse the freed indirect blocks (whose buffers and page cache were not
   freed) as data blocks.
 - Enable quotas again. This invokes
   vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
   buffers and page cache. Unfortunately, because the buffer of the quota
   data block is still referenced, the quota code cannot read the
   up-to-date quota info from the device, which leads to quota
   information corruption.

This problem can be reproduced by xfstests generic/231 on an ext3
filesystem, or on an ext4 filesystem without the extent and quota features.

This patch fixes the problem by calling brelse() on all indirect buffers,
and also cleans up the brelse code in ext4_ind_remove_space().
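The rule "release every indirect buffer, whether or not its branch is freed" amounts to walking the whole chain. This is a hypothetical sketch, not ext4_ind_remove_space(): `struct buf_model`, `brelse_model()` and `release_chain()` are invented names, and only the reference count of a buffer_head is modelled. Like brelse(), the helper tolerates NULL entries.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer_head: only the reference count matters. */
struct buf_model { int b_count; };

static void brelse_model(struct buf_model *bh)
{
    if (bh != NULL)          /* like brelse(), tolerate NULL */
        bh->b_count--;
}

/* Release every indirect buffer in the chain, from the deepest level up,
 * whether or not the corresponding branch ended up being freed. */
static void release_chain(struct buf_model **chain, int depth)
{
    for (int i = depth - 1; i >= 0; i--)
        brelse_model(chain[i]);
}
```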

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9b15cfc8

kasan: remove use after scope bugs detection. · 363eeef5

Andrey Ryabinin authored 6 years ago and

谢秀奇 committed 5 years ago

mainline inclusion
from mainline-v5.0
commit 7771bdbb
category: bugfix
bugzilla: 10979
CVE: NA

------------------------------------------------

The use-after-scope bug detector seems to be almost entirely useless for
the linux kernel.  It has existed for over two years, but I've seen only
one valid bug so far [1].  And that bug was fixed before it had been
reported.  There were some other use-after-scope reports, but they were
false positives due to various reasons, such as incompatibility with the
structleak plugin.

This feature significantly increases stack usage, especially with GCC
versions < 9, and causes a 32K stack overflow.  It probably adds a
performance penalty too.

Given all that, let's remove use-after-scope detector entirely.

While preparing this patch, I noticed that we mistakenly enable
use-after-scope detection for the clang compiler regardless of the
CONFIG_KASAN_EXTRA setting.  This is also fixed now.

[1] http://lkml.kernel.org/r/<2...

363eeef5

userfaultfd: use RCU to free the task struct when fork fails if MEMCG · 8eb04a7a

Andrea Arcangeli authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: bugfix
bugzilla: 10989
CVE: NA

------------------------------------------------

MEMCG depends on the task structure not being freed under
rcu_read_lock() in get_mem_cgroup_from_mm() after it dereferences
mm->owner.

A better fix would be to avoid registering forked vmas in userfaultfd
contexts reported to the monitor, in case fork ends up failing.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8eb04a7a

keys: Fix dependency loop between construction record and auth key · 4877d0fd

David Howells authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0-rc8
commit 822ad64d
category: bugfix
bugzilla: 10783
CVE: NA

---------------------------

In the request_key() upcall mechanism there's a dependency loop by which if
a key type driver overrides the ->request_key hook and the userspace side
manages to lose the authorisation key, the auth key and the internal
construction record (struct key_construction) can keep each other pinned.

Fix this by the following changes:

 (1) Killing off the construction record and using the auth key instead.

 (2) Including the operation name in the auth key payload and making the
     payload available outside of security/keys/.

 (3) The ->request_key hook is given the authkey instead of the cons
     record and operation name.

Changes (2) and (3) allow the auth key to naturally be cleaned up if the
keyring it is in is destroyed or cleared or the auth key is unlinked.

Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4877d0fd

keys: Timestamp new keys · 0013e500

David Howells authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0-rc8
commit 7c1857bd
category: bugfix
bugzilla: 10783
CVE: NA

---------------------------

Set the timestamp on new keys rather than leaving it unset.

Fixes: 31d5a79d ("KEYS: Do LRU discard in full keyrings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0013e500

Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt · 8282e42e

Marcel Holtmann authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0
commit af3d5d1c
category: bugfix
bugzilla: NA
CVE: CVE-2019-3460

-------------------------------------------------

When doing option parsing for standard type values of 1, 2 or 4 octets,
the value is converted directly into a variable instead of a pointer. To
avoid being tricked into using a pointer as the value, check that the
sizes actually match for these option types. In L2CAP every option is
fixed size, so it is prudent anyway to ensure that the remote side sends
us the right option size along with the option parameters.

If the option size does not match the option type, that option is
silently ignored. It is a protocol violation, and instead of giving the
remote attacker any further hints, we just pretend that the option is not
present and proceed with the default values. Implementations that follow
the specification and its qualification procedures always use the
correct size and are therefore not impacted.
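The size check described above can be sketched as a switch that drops any option whose length does not match its type. This is a hypothetical model, not l2cap_parse_conf_req(): `struct conf_model`, `apply_opt()` and the `OPT_*` constants are invented for illustration, and both modelled options are assumed to be 2 octets.

```c
#include <stdint.h>

/* Hypothetical option types; both are assumed to carry 2-octet values. */
enum { OPT_MTU = 0x01, OPT_FLUSH_TO = 0x02 };

struct conf_model { uint16_t mtu; uint16_t flush_to; };

/* Apply one option; silently ignore it if its size does not match its type,
 * so the default value stays in place. */
static void apply_opt(struct conf_model *c, uint8_t type, uint8_t olen,
                      unsigned long val)
{
    switch (type) {
    case OPT_MTU:
        if (olen != 2)      /* protocol violation: keep the default */
            break;
        c->mtu = (uint16_t)val;
        break;
    case OPT_FLUSH_TO:
        if (olen != 2)
            break;
        c->flush_to = (uint16_t)val;
        break;
    default:
        break;              /* unknown options are handled elsewhere */
    }
}
```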

To keep the code readable and consistent across all options, a few
cosmetic changes were also required.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8282e42e

Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer · a3dbdb59

Marcel Holtmann authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0
commit 7c9cbd0b
category: bugfix
bugzilla: NA
CVE: CVE-2019-3459

-------------------------------------------------

The function l2cap_get_conf_opt returns L2CAP_CONF_OPT_SIZE + opt->len as
the length value. However, opt->len is controlled by the remote user and
can be used by an attacker to gain access beyond the bounds of the actual
packet.

To prevent any potential leak of heap memory, it is enough to check that
the resulting len after calling l2cap_get_conf_opt is not below zero. A
well-formed packet always returns >= 0 here and ends with the length
value at zero after the last option has been parsed. For malformed
packets that tamper with the opt->len field, the length value becomes
negative. If that is the case, just abort and ignore the option.

If an attacker uses an opt->len value that is too short, garbage will be
parsed, but that is covered by the unknown-option handling and the option
parameter size checks.
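The "len must not go negative" rule can be sketched as a walk over a type-length-value buffer. This is a hypothetical model, not the net/bluetooth code: `count_opts()` and `OPT_HDR_SIZE` are invented names, with `OPT_HDR_SIZE` standing in for L2CAP_CONF_OPT_SIZE (one type byte plus one length byte). Unlike the kernel loop, which simply breaks out, the sketch returns -1 so the malformed case is visible to the caller.

```c
#include <stdint.h>

#define OPT_HDR_SIZE 2  /* type byte + length byte, like L2CAP_CONF_OPT_SIZE */

/* Walk a TLV option buffer; returns the number of options parsed,
 * or -1 if an option claims more bytes than remain in the packet. */
static int count_opts(const uint8_t *buf, int len)
{
    int n = 0;

    while (len >= OPT_HDR_SIZE) {
        uint8_t olen = buf[1];
        len -= OPT_HDR_SIZE + olen;   /* same bookkeeping as len -= get_conf_opt() */
        if (len < 0)
            return -1;                /* malformed: opt->len points past the packet */
        buf += OPT_HDR_SIZE + olen;
        n++;
    }
    return n;
}
```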

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a3dbdb59

Revert "perf: Paper over the hw.target problems" · db93c085

Cheng Jian authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: bugfix
bugzilla: 9513/11006
CVE: NA
--------------------------------------------------

This reverts commit b772baf9a14ab4975e8884a399a4e0bab2fb6bf9.

We merged the patch b772baf9a14a ("perf: Paper over the
hw.target problems") to resolve a use-after-free issue
(bugzilla #9513/#11006), but it caused some new problems
(bugzilla #11050/#11049) in this version.

So just revert it.

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

db93c085

net: hsr: fix memory leak in hsr_dev_finalize() · c91d7b3a

Mao Wenan authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-master~13
commit 6caabe7f
category: bugfix
bugzilla: 11026
CVE: NA

-------------------------------------------------

If hsr_add_port(hsr, hsr_dev, HSR_PT_MASTER) fails to add the port,
hsr_dev_finalize() returns res directly, forgetting to free the node
allocated in hsr_create_self_node() and to delete node->mac_list from
hsr->self_node_db.

BUG: memory leak
unreferenced object 0xffff8881cfa0c780 (size 64):
  comm "syz-executor.0", pid 2077, jiffies 4294717969 (age 2415.377s)
  hex dump (first 32 bytes):
    e0 c7 a0 cf 81 88 ff ff 00 02 00 00 00 00 ad de  ................
    00 e6 49 cd 81 88 ff ff c0 9b 87 d0 81 88 ff ff  ..I.............
  backtrace:
    [<00000000e2ff5070>] hsr_dev_finalize+0x736/0x960 [hsr]
    [<000000003ed2e597>] hsr_newlink+0x2b2/0x3e0 [hsr]
    [<000000003fa8c6b6>] __rtnl_newlink+0xf1f/0x1600 net/core/rtnetlink.c:3182
    [<000000001247a7ad>] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3240
    [<00000000e7d1b61d>] rtnetlink_rcv_msg+0x54e/0xb90 net/core/rtnetlink.c:5130
    [<000000005556bd3a>] netlink_rcv_skb+0x129/0x340 net/netlink/af_netlink.c:2477
    [<00000000741d5ee6>] netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
    [<00000000741d5ee6>] netlink_unicast+0x49a/0x650 net/netlink/af_netlink.c:1336
    [<000000009d56f9b7>] netlink_sendmsg+0x88b/0xdf0 net/netlink/af_netlink.c:1917
    [<0000000046b35c59>] sock_sendmsg_nosec net/socket.c:621 [inline]
    [<0000000046b35c59>] sock_sendmsg+0xc3/0x100 net/socket.c:631
    [<00000000d208adc9>] __sys_sendto+0x33e/0x560 net/socket.c:1786
    [<00000000b582837a>] __do_sys_sendto net/socket.c:1798 [inline]
    [<00000000b582837a>] __se_sys_sendto net/socket.c:1794 [inline]
    [<00000000b582837a>] __x64_sys_sendto+0xdd/0x1b0 net/socket.c:1794
    [<00000000c866801d>] do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
    [<00000000fea382d9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<00000000e01dacb3>] 0xffffffffffffffff
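The missing cleanup follows the usual goto-unwind pattern. This is a hypothetical userspace sketch, not hsr_dev_finalize(): `struct hsr_model`, `create_self_node()`, `del_self_nodes()` and `dev_finalize()` are invented names, and the `add_port_fails` flag simulates hsr_add_port() failing. The point is that the error path frees the node allocated earlier instead of returning directly.

```c
#include <stdlib.h>

/* Hypothetical models of a node and of the self-node database. */
struct node_model { struct node_model *next; };
struct hsr_model { struct node_model *self_node_db; };

static int create_self_node(struct hsr_model *h)
{
    struct node_model *n = malloc(sizeof(*n));
    if (!n)
        return -12;                   /* -ENOMEM */
    n->next = h->self_node_db;        /* link into the database */
    h->self_node_db = n;
    return 0;
}

static void del_self_nodes(struct hsr_model *h)
{
    while (h->self_node_db) {
        struct node_model *n = h->self_node_db;
        h->self_node_db = n->next;    /* unlink before freeing */
        free(n);
    }
}

/* add_port_fails simulates hsr_add_port() failing. */
static int dev_finalize(struct hsr_model *h, int add_port_fails)
{
    int res = create_self_node(h);
    if (res)
        return res;

    res = add_port_fails ? -22 : 0;   /* stand-in for hsr_add_port() */
    if (res)
        goto err_add_port;
    return 0;

err_add_port:
    del_self_nodes(h);                /* without this, the node leaks */
    return res;
}
```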

Fixes: c5a75911 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c91d7b3a

posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns() · 2a3c5819

Xiongfeng Wang authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: feature
Bugzilla: 10876
CVE: N/A

----------------------------------------

When I ran the Syzkaller test suite, I got the following call trace.
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>

================================================================================
UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27
signed integer overflow:
8243129037239968815 * 1000000000 cannot be represented in type 'long long int'
CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
 handle_overflow+0x193/0x1e2 lib/ubsan.c:190
 timespec64_to_ns include/linux/time64.h:120 [inline]
 posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687
 do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892
 __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline]
 __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline]
 __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462eb9
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f14e4127c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000df
RAX: ffffffffffffffda RBX: 000000000073bfa0 RCX: 0000000000462eb9
RDX: 0000000020000080 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f14e41286bc
R13: 00000000004c54cc R14: 0000000000704278 R15: 00000000ffffffff
================================================================================

This happens because 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX',
so 'it_interval.tv_sec * NSEC_PER_SEC' overflows in 'timespec64_to_ns()'.

This patch checks whether 'it_interval.tv_sec' is larger than
'KTIME_SEC_MAX' and saturates the value if so.
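The saturation can be sketched as a conversion that clamps before multiplying. This is a hypothetical helper, not the patched posix_cpu_timer_set(): `struct ts_model` and `ts_to_ns_saturating()` are invented names, and the constants are redefined locally (the sketch assumes KTIME_MAX is INT64_MAX). Clamping at or above `KTIME_SEC_MAX` is enough, because any smaller second count times 10^9 plus a sub-second nanosecond value stays below INT64_MAX.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define KTIME_MAX     INT64_MAX                    /* assumed, as in ktime.h */
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)

struct ts_model { int64_t tv_sec; long tv_nsec; };

/* Convert to nanoseconds, saturating at KTIME_MAX instead of overflowing. */
static int64_t ts_to_ns_saturating(const struct ts_model *ts)
{
    if (ts->tv_sec >= KTIME_SEC_MAX)
        return KTIME_MAX;
    return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}
```

With the tv_sec from the UBSAN report (8243129037239968815), this returns KTIME_MAX instead of triggering signed overflow.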

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2a3c5819

ntp: Avoid undefined behaviour in second_overflow() · d8df4fe5

Xiongfeng Wang authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: feature
Bugzilla: 11009
CVE: N/A

----------------------------------------

When I ran the Syzkaller test suite, I got the following call trace.
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>

================================================================================
UBSAN: Undefined behaviour in kernel/time/ntp.c:457:16
signed integer overflow:
9223372036854775807 + 500 cannot be represented in type 'long int'
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.25-dirty #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
 handle_overflow+0x193/0x1e2 lib/ubsan.c:190
 second_overflow+0x403/0x540 kernel/time/ntp.c:457
 accumulate_nsecs_to_secs kernel/time/timekeeping.c:2002 [inline]
 logarithmic_accumulation kernel/time/timekeeping.c:2046 [inline]
 timekeeping_advance+0x2bb/0xec0 kernel/time/timekeeping.c:2114
 tick_do_update_jiffies64.part.2+0x1a0/0x350 kernel/time/tick-sched.c:97
 tick_do_update_jiffies64 kernel/time/tick-sched.c:1229 [inline]
 tick_nohz_update_jiffies kernel/time/tick-sched.c:499 [inline]
 tick_nohz_irq_enter kernel/time/tick-sched.c:1232 [inline]
 tick_irq_enter+0x1fd/0x240 kernel/time/tick-sched.c:1249
 irq_enter+0xc4/0x100 kernel/softirq.c:353
 entering_irq arch/x86/include/asm/apic.h:517 [inline]
 entering_ack_irq arch/x86/include/asm/apic.h:523 [inline]
 smp_apic_timer_interrupt+0x20/0x480 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
 </IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 c0 21 b1 83 e8 a1 0a 02 ff e9 ab fd ff ff 4c 89 e7 e8 77 b6 a5 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff888106307d20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881062e4f1c
RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848c78a0
R13: 0000000000000003 R14: 1ffff11020c60fae R15: 0000000000000000
 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
 default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x2ca/0x420 kernel/sched/idle.c:262
 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
 start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
================================================================================

This happens because the user sets time_maxerror to 0x7FFFFFFFFFFFFFFF.
It overflows when 'MAXFREQ / NSEC_PER_USEC' is added to it in
'second_overflow()'.

This patch adds a limit check and saturates the value when the user sets
'time_maxerror'.
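Clamping at set time keeps the per-second addition in range. This is a hypothetical sketch: `set_maxerror()` and `second_tick()` are invented names. The message does not say which limit the patch uses, so the sketch assumes NTP_PHASE_LIMIT as the ceiling and redefines MAXFREQ, MAXPHASE and NTP_PHASE_LIMIT locally to mirror the kernel's definitions. It also assumes a 64-bit `long`.

```c
#define NSEC_PER_USEC   1000L
#define MAXFREQ         500000L                 /* 500 ppm, in ns/s */
#define MAXPHASE        500000000L              /* 0.5 s, in ns */
#define NTP_PHASE_LIMIT ((MAXPHASE / NSEC_PER_USEC) << 5)

static long time_maxerror;

/* Clamp the user-supplied value when it is set (assumed limit). */
static void set_maxerror(long user_val)
{
    if (user_val < 0)
        user_val = 0;
    else if (user_val > NTP_PHASE_LIMIT)
        user_val = NTP_PHASE_LIMIT;
    time_maxerror = user_val;
}

/* The per-second update that overflowed in the UBSAN report. */
static void second_tick(void)
{
    time_maxerror += MAXFREQ / NSEC_PER_USEC;   /* +500 per second */
    if (time_maxerror > NTP_PHASE_LIMIT)
        time_maxerror = NTP_PHASE_LIMIT;
}
```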

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d8df4fe5

scsi: hisi_sas: add softreset behind abort device at I_T_nexus_reset() to... · 55f77c08

Luo Jiaxing authored 6 years ago and

谢秀奇 committed 5 years ago

scsi: hisi_sas: add softreset behind abort device at I_T_nexus_reset() to ensure decoupling of SATA device

driver inclusion
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

We found that, after the system comes up, the SATA disk can be read but
not written. No abnormal I/O is returned during initialization, but when
we try to write to the SATA disk, the I/O never returns and times out.

We noticed that an if-check was removed from sas_I_T_nexus(), which
allows internal_task_abort() to run apart from error handling. In that
case softreset_ata() is not run afterwards, so the SATA disk is clearly
not decoupled.

Fixes: 0de2941 ("scsi: hisi_sas: remove the check of sas_dev status in function hisi_sas_I_T_nexus_reset()")

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

55f77c08

ext4: add mask of ext4 flags to swap · 3d274170

yangerkun authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-next
commit abdc644e
category: bugfix
bugzilla: 5355
CVE: NA
--------------------------------------------------

The reason is that while swapping two inodes, we swap the flags too.
Some flags such as EXT4_JOURNAL_DATA_FL can really confuse the things
since we're not resetting the address operations structure.  The
simplest way to keep things sane is to restrict the flags that can be
swapped.
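Restricting which flags are swapped comes down to exchanging only the bits selected by a mask. This is a hypothetical sketch: `swap_masked_flags()` is an invented name, and the mask value used in the usage check is arbitrary, not the actual set of ext4 flags chosen by the patch.

```c
#include <stdint.h>

/* Exchange only the bits selected by mask; all other bits stay with their inode. */
static void swap_masked_flags(uint32_t *a, uint32_t *b, uint32_t mask)
{
    uint32_t ta = *a, tb = *b;

    *a = (ta & ~mask) | (tb & mask);
    *b = (tb & ~mask) | (ta & mask);
}
```

For example, with mask 0xF, flags 0xF1 and 0xA00E become 0xFE and 0xA001: the low nibble moves, everything else stays put.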

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3d274170

ext4: update quota information while swapping boot loader inode · 8641b4d6

yangerkun authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-next
commit aa507b5f
category: bugfix
bugzilla: 5355
CVE: NA
--------------------------------------------------

When swapping two inodes, i_data is swapped without updating the quota
information. Also, swap_inode_boot_loader can sometimes "revert" the swap,
so the quota is updated only after all operations have finished.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8641b4d6

ext4: cleanup pagecache before swap i_data · 14048703

yangerkun authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-next
commit a46c68a3
category: bugfix
bugzilla: 5355
CVE: NA
--------------------------------------------------

Before swapping, we must make sure there are no new dirty pages, since
i_data is swapped between the two inodes:
1. Take i_mmap_sem for write to prevent new page cache from mmap
read/write;
2. Change filemap_flush to filemap_write_and_wait and move both calls into
the section protected by the inode lock to prevent new page cache from
buffered read/write.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

14048703

ext4: fix check of inode in swap_inode_boot_loader · b8d0a408

yangerkun authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-next
commit 67a11611
category: bugfix
bugzilla: 5355
CVE: NA
--------------------------------------------------

Before actually swapping the inode and the boot loader inode, some
conditions need to be checked to avoid invalid or unpermitted operations,
such as whether the inode has inline data. These checks should be done
under the inode lock so that the conditions cannot change during the swap.
Some other conditions cannot change during the swap anyway, but checking
them under the inode lock does no harm.

Fixes: ee3c859409("ext4: disallow files with EXT4_JOURNAL_DATA_FL ...")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b8d0a408

assoc_array: Fix shortcut creation · 46bf099f

David Howells authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0-rc8
commit bb2ba2d7
category: bugfix
bugzilla: 10759
CVE: NA

-------------------------------------------------

Fix the creation of shortcuts for which the length of the index key value
is an exact multiple of the machine word size.  The problem is that the
code that blanks off the unused bits of the shortcut value malfunctions if
the number of bits in the last word equals machine word size.  This is due
to the "<<" operator being given a shift of zero in this case, and so the
mask that should be all zeros is all ones instead.  This causes the
subsequent masking operation to clear everything rather than clearing
nothing.

Ordinarily, the presence of the hash at the beginning of the tree index key
makes the issue very hard to test for, but in this case, it was encountered
due to a development mistake that caused the hash output to be either 0
(keyring) or 1 (non-keyring) only.  This made it susceptible to the
keyctl/unlink/valid test in the keyutils package.

The fix is simply to skip the blanking if the shift would be 0.  For
example, an index key that is 64 bits long would produce a 0 shift and thus
a 'blank' of all 1s.  This would then be inverted and AND'd onto the
index_key, incorrectly clearing the entire last word.
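The zero-shift problem and its fix can be reproduced in a few lines. This is a hypothetical helper, not the lib/assoc_array.c code: `blank_last_word()` is an invented name, `level` is taken to be the key length in bits, and the machine word is assumed to be 64 bits. With a shift of 0, `UINT64_MAX << 0` is all ones, so `~blank` would be 0 and would wipe the whole word. The guard skips the masking in that case.

```c
#include <stdint.h>

#define WORD_BITS 64   /* assumed machine word size */

/* Clear the bits of the last key word that lie beyond the key length. */
static uint64_t blank_last_word(uint64_t word, unsigned level)
{
    unsigned shift = level & (WORD_BITS - 1);

    if (shift != 0) {                          /* the fix: skip when shift == 0 */
        uint64_t blank = UINT64_MAX << shift;  /* bits beyond the key */
        word &= ~blank;
    }
    return word;
}
```

A 64-bit key now leaves the last word untouched, while an 8-bit or 72-bit key keeps only the low 8 bits of the last word.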

Fixes: 3cb98950 ("Add a generic associative array implementation.")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

46bf099f

bpf: fix lockdep false positive in stackmap · ec9d64f0

Alexei Starovoitov authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.0-rc8
commit 3defaf2f
category: bugfix
bugzilla: 10760
CVE: NA

-------------------------------------------------

Lockdep warns about false positive:
[   11.211460] ------------[ cut here ]------------
[   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
[   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
[   11.213134] Modules linked in:
[   11.214954] RIP: 0010:lock_release+0x1ad/0x280
[   11.223508] Call Trace:
[   11.223705]  <IRQ>
[   11.223874]  ? __local_bh_enable+0x7a/0x80
[   11.224199]  up_read+0x1c/0xa0
[   11.224446]  do_up_read+0x12/0x20
[   11.224713]  irq_work_run_list+0x43/0x70
[   11.225030]  irq_work_run+0x26/0x50
[   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
[   11.225662]  irq_work_interrupt+0xf/0x20

since the rw_semaphore is released in a different task from the one that
locked it. This is expected behavior.
Fix the warning with up_read_non_owner() and rwsem_release() annotation.

Fixes: bae77c5e ("bpf: enable stackmap with build_id in nmi context")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ec9d64f0

perf: Paper over the hw.target problems · e597bc6a

Alexander Shishkin authored 6 years ago and

谢秀奇 committed 5 years ago

euler inclusion
category: bugfix
bugzilla: 9513/11006
CVE: NA
--------------------------------------------------

[ Cheng Jian:
HULK-Syzkaller reported a problem that syzbot had already reported
to mainline (LKML); this patch comes from the reply on LKML.
https://lkml.org/lkml/2019/2/28/529 ]

First, we have a race between perf_event_release_kernel() and
perf_free_event(), which happens when the parent's event is released while
the child's fork fails (because of a fatal signal, for example). It looks
like this:

cpu X                            cpu Y
-----                            -----
                                 copy_process() error path
perf_release(parent)             +->perf_event_free_task()
+-> lock(child_ctx->mutex)       |  |
+-> remove_from_context(child)   |  |
+-> unlock(child_ctx->mutex)     |  |
|                                |  +-> lock(child_ctx->mutex)
|                                |  +-> unlock(child_ctx->mutex)
|                                +-> free_task(child_task)
+-> put_task_struct(child_task)

Technically, we're still holding a reference to the task via
parent->hw.target, but that does not stop free_task(), so we end up poking
at freed memory, as KASAN points out in the syzkaller report (see Link
below). The straightforward fix is to drop the hw.target reference while
the task is still around.

Therein lies the second problem: the users of hw.target (uprobe) assume
that it's around at ->destroy() callback time, where they use it for
context. So, in order to not break the uprobe teardown and avoid leaking
stuff, we need to call ->destroy() at the same time.

This patch fixes the race and the subsequent fallout by doing both these
things at remove_from_context time.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: https://syzkaller.appspot.com/bug?extid=a24c397a29ad22d86c98


Reported-by:  <syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Li Bin <huawei.libin@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e597bc6a

mm: hwpoison: fix thp split handing in soft_offline_in_use_page() · 41c46e00

zhongjiang authored 6 years ago and

谢秀奇 committed 5 years ago


mainline inclusion
from mainline-5.x
commit: <not-yet-available>
category: bugfix
bugzilla: 10883
CVE: NA

------------------------------------------------

When soft_offline_in_use_page() runs on a thp tail page after pmd is split,
we trigger the following VM_BUG_ON_PAGE():

Memory failure: 0x3755ff: non anonymous thp
__get_any_page: 0x3755ff: unknown zero refcount page type 2fffff80000000
Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
page:ffffea000d360140 count:0 mapcount:0 mapping:0000000000000000 index:0x1
flags: 0x2fffff80000000()
raw: 002fffff80000000 ffffea000d360108 ffffea000d360188 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
------------[ cut here ]------------
kernel BUG at ./include/linux/mm.h:519!

soft_offline_in_use_page() passes the refcount and page lock from the tail
page to the head page, which is unnecessary because any subpage can be
passed to split_huge_page().

Naoya fixed a similar issue in commit c3901e72 ("mm: hwpoison: fix thp
split handling in memory_failure()"), but missed fixing soft offline.

Fixes: 61f5d698 ("mm: re-enable THP")
Cc: <stable@vger.kernel.org>        [4.5+]
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

hugetlbfs: fix memory leak for resv_map · bd575e60

Yufen Yu authored 6 years ago and

谢秀奇 committed 5 years ago


euler inclusion
category: bugfix
bugzilla: 10984
CVE: NA
---------------------------

When .mknod creates a block device file in hugetlbfs, it allocates
an inode and kmallocs a 'struct resv_map' in resv_map_alloc().
Currently, inode->i_mapping->private_data is used to point to the
resv_map. However, when the device is opened, bd_acquire() sets
i_mapping to bd_inode->i_mapping, which leaks the resv_map.

We fix the leak by adding a new resv_map field to hugetlbfs_inode_info
to store the resv_map pointer.

Programs to reproduce:
	mount -t hugetlbfs nodev hugetlbfs
	mknod hugetlbfs/dev b 0 0
	exec 30<> hugetlbfs/dev
	umount hugetlbfs/
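A simplified userspace model of the leak and the fix, assuming hypothetical structures (struct hugetlbfs_inode_info_model, model_mknod(), model_bd_acquire(), model_evict()) rather than the real VFS types. It shows that once i_mapping is redirected, the resv_map is only reachable through the new per-inode field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct resv_map { int dummy; };

struct address_space {
	void *private_data;
};

struct inode {
	struct address_space i_data;      /* the inode's own mapping */
	struct address_space *i_mapping;  /* may be redirected later */
};

/* Models hugetlbfs_inode_info after the fix: the pointer lives here too. */
struct hugetlbfs_inode_info_model {
	struct inode vfs_inode;
	struct resv_map *resv_map;
};

static void model_mknod(struct hugetlbfs_inode_info_model *info)
{
	info->vfs_inode.i_mapping = &info->vfs_inode.i_data;
	info->resv_map = malloc(sizeof(*info->resv_map));
	assert(info->resv_map != NULL);
	/* Before the fix, this was the only place the pointer was kept. */
	info->vfs_inode.i_mapping->private_data = info->resv_map;
}

/* Models bd_acquire() pointing i_mapping at the block device's mapping. */
static void model_bd_acquire(struct inode *inode, struct address_space *bdev_mapping)
{
	inode->i_mapping = bdev_mapping;
}

/* Eviction frees through the field that survives the redirection.
 * Returns 1 if a resv_map was freed, 0 otherwise. */
static int model_evict(struct hugetlbfs_inode_info_model *info)
{
	if (!info->resv_map)
		return 0;
	free(info->resv_map);
	info->resv_map = NULL;
	return 1;
}
```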

Fixes: 9119a41e ("mm, hugetlb: unify region structure handling")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Linux 4.19.27 · d95806c9

Greg Kroah-Hartman authored 6 years ago and

谢秀奇 committed 5 years ago


Merge 75 patches from the 4.19.27 stable branch (79 total),
excluding 4 already merged patches:

0655618 irqchip/gic-v3-mbi: Fix uninitialized mbi_lock
5024f0a sched/wait: Fix rcuwait_wake_up() ordering
2368e6d futex: Fix (possible) missed wakeup
9ad6216 locking/rwsem: Fix (possible) missed wakeup

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

x86/uaccess: Don't leak the AC flag into __put_user() value evaluation · 9975334e

Andy Lutomirski authored 6 years ago and

谢秀奇 committed 5 years ago


commit 2a418cf3 upstream.

When calling __put_user(foo(), ptr), the __put_user() macro would call
foo() in between __uaccess_begin() and __uaccess_end().  If that code
were buggy, then those bugs would be run without SMAP protection.

Fortunately, there seem to be few instances of the problem in the
kernel. Nevertheless, __put_user() should be fixed to avoid doing this.
Therefore, evaluate __put_user()'s argument before setting AC.
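To illustrate the ordering problem, here is a hypothetical userspace model in GNU C (it relies on __typeof__, as kernel code does). A global flag stands in for EFLAGS.AC, and uaccess_begin()/uaccess_end(), PUT_USER_OLD and PUT_USER_NEW are invented names, not the real uaccess macros. The fixed shape evaluates the argument into a temporary before opening the access window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: this flag plays the role of EFLAGS.AC. */
static bool ac_set;
static bool arg_ran_with_ac;

static void uaccess_begin(void) { ac_set = true; }
static void uaccess_end(void) { ac_set = false; }

/* Stand-in for the raw user store; here just a plain store. */
#define RAW_STORE(val, ptr) (*(ptr) = (val))

/* Old shape: the argument expression is evaluated inside the window. */
#define PUT_USER_OLD(x, ptr)			\
	do {					\
		uaccess_begin();		\
		RAW_STORE((x), (ptr));		\
		uaccess_end();			\
	} while (0)

/* Fixed shape: evaluate the argument into a temporary first. */
#define PUT_USER_NEW(x, ptr)				\
	do {						\
		__typeof__(*(ptr)) __pu_val = (x);	\
		uaccess_begin();			\
		RAW_STORE(__pu_val, (ptr));		\
		uaccess_end();				\
	} while (0)

/* An argument expression that records whether it ran with AC set. */
static int foo(void)
{
	arg_ran_with_ac = ac_set;
	return 42;
}
```

With PUT_USER_OLD(foo(), &dst), foo() observes the flag set; with PUT_USER_NEW(foo(), &dst) it does not, while the stored value is the same.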

This issue was noticed when an objtool hack by Peter Zijlstra complained
about genregs_get() and I compared the assembly output to the C source.

 [ bp: Massage commit message and fixed up whitespace. ]

Fixes: 11f1a4b9 ("x86: reorganize SMAP handling in user space accesses")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20190225125231.845656645@infradead.org


Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

MIPS: eBPF: Fix icache flush end address · 466a5894

Paul Burton authored 6 years ago and

谢秀奇 committed 5 years ago


commit d1a2930d upstream.

The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.

The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.

Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.
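The pointer arithmetic can be checked in isolation. This sketch assumes only that the JIT buffer is an array of u32 instructions, as stated above; icache_end_buggy() and icache_end_fixed() are hypothetical helpers, and &target[n] is one way to write the corrected end address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Buggy: target is already a u32 *, so "+ n" advances 4*n bytes;
 * multiplying n by 4 again overshoots the end by a factor of four. */
static uintptr_t icache_end_buggy(const u32 *target, unsigned int n)
{
	return (uintptr_t)(target + n * 4);
}

/* Fixed: &target[n] is the address just past the n-th instruction. */
static uintptr_t icache_end_fixed(const u32 *target, unsigned int n)
{
	return (uintptr_t)&target[n];
}
```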

Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6bd53f9 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

MIPS: BCM63XX: provide DMA masks for ethernet devices · 1dac92b6

Jonas Gorski authored 6 years ago and

谢秀奇 committed 5 years ago


commit 18836b48 upstream.

The switch to the generic dma ops made dma masks mandatory, breaking
devices that do not set them. In the case of bcm63xx, it broke ethernet
with the following warning when trying to bring the device up:

[    2.633123] ------------[ cut here ]------------
[    2.637949] WARNING: CPU: 0 PID: 325 at ./include/linux/dma-mapping.h:516 bcm_enetsw_open+0x160/0xbbc
[    2.647423] Modules linked in: gpio_button_hotplug
[    2.652361] CPU: 0 PID: 325 Comm: ip Not tainted 4.19.16 #0
[    2.658080] Stack : 80520000 804cd3ec 00000000 00000000 804ccc00 87085bdc 87d3f9d4 804f9a17
[    2.666707]         8049cf18 00000145 80a942a0 00000204 80ac0000 10008400 87085b90 eb3d5ab7
[    2.675325]         00000000 00000000 80ac0000 000022b0 00000000 00000000 00000007 00000000
[    2.683954]         0000007a 80500000 0013b381 00000000 80000000 00000000 804a1664 80289878
[    2.692572]         00000009 00000204 80ac0000 00000200 00000002 00000000 00000000 80a90000
[    2.701191]         ...
[    2.703701] Call Trace:
[    2.706244] [<8001f3c8>] show_stack+0x58/0x100
[    2.710840] [<800336e4>] __warn+0xe4/0x118
[    2.715049] [<800337d4>] warn_slowpath_null+0x48/0x64
[    2.720237] [<80289878>] bcm_enetsw_open+0x160/0xbbc
[    2.725347] [<802d1d4c>] __dev_open+0xf8/0x16c
[    2.729913] [<802d20cc>] __dev_change_flags+0x100/0x1c4
[    2.735290] [<802d21b8>] dev_change_flags+0x28/0x70
[    2.740326] [<803539e0>] devinet_ioctl+0x310/0x7b0
[    2.745250] [<80355fd8>] inet_ioctl+0x1f8/0x224
[    2.749939] [<802af290>] sock_ioctl+0x30c/0x488
[    2.754632] [<80112b34>] do_vfs_ioctl+0x740/0x7dc
[    2.759459] [<80112c20>] ksys_ioctl+0x50/0x94
[    2.763955] [<800240b8>] syscall_common+0x34/0x58
[    2.768782] ---[ end trace fb1a6b14d74e28b6 ]---
[    2.773544] bcm63xx_enetsw bcm63xx_enetsw.0: cannot allocate rx ring 512

Fix this by adding appropriate DMA masks for the platform devices.

Fixes: f8c55dc6 ("MIPS: use generic dma noncoherent ops for simple noncoherent platforms")
Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

MIPS: fix truncation in __cmpxchg_small for short values · fed715e6

Michael Clark authored 6 years ago and

谢秀奇 committed 5 years ago


commit 94ee12b5 upstream.

__cmpxchg_small erroneously uses u8 for the loaded value used in the
comparison, even though the value can be either a char or a short. This
patch changes the local variable to u32, which is sufficiently sized, as
the loaded value is already
masked and shifted appropriately. Using an integer size avoids
any unnecessary canonicalization from use of non native widths.
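The truncation is easy to reproduce with a halfword stored in the upper 16 bits of its 32-bit word. In this sketch extract(), matches_buggy() and matches_fixed() are hypothetical helpers that mimic the masked-and-shifted load described above; they are not the MIPS cmpxchg code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Extract a 1- or 2-byte value from its naturally aligned 32-bit word. */
static u32 extract(u32 word, unsigned int shift, u32 mask)
{
	return (word & mask) >> shift;
}

/* Buggy: storing the extracted value in a u8 drops bits 8..15 of a short. */
static int matches_buggy(u32 word, unsigned int shift, u32 mask, u32 old)
{
	u8 load = (u8)extract(word, shift, mask);
	return load == old;
}

/* Fixed: a u32 holds any masked-and-shifted 1- or 2-byte value. */
static int matches_fixed(u32 word, unsigned int shift, u32 mask, u32 old)
{
	u32 load = extract(word, shift, mask);
	return load == old;
}
```

For a short 0x1234 in the upper halfword, the buggy comparison sees 0x34 and never matches, so the cmpxchg would fail even when the value is unchanged; byte-sized values are unaffected.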

This patch is part of a series that adapts the MIPS small word
atomics code for xchg and cmpxchg on short and char to RISC-V.

Cc: RISC-V Patches <patches@groups.riscv.org>
Cc: Linux RISC-V <linux-riscv@lists.infradead.org>
Cc: Linux MIPS <linux-mips@linux-mips.org>
Signed-off-by: Michael Clark <michaeljclark@mac.com>
[paul.burton@mips.com:
  - Fix variable typo per Jonas Gorski.
  - Consolidate load variable with other declarations.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: 3ba7f44d ("MIPS: cmpxchg: Implement 1 byte & 2 byte cmpxchg()")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

hugetlbfs: fix races and page leaks during migration · 58d3ea0e

Mike Kravetz authored 6 years ago and

谢秀奇 committed 5 years ago

commit cb6acd01 upstream.

hugetlb pages should only be migrated if they are 'active'.  The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked.  Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code.  This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
then migrates the pages associated with the file from one node to
another.  When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

That is as expected.  2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem.  If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there
are actually no huge pages in use.  The only way to 'fix' the filesystem
accounting is to unmount the filesystem.

If a hugetlb page is associated with an explicitly mounted filesystem,
this information is contained in the page_private field.  At migration
time, this information is not preserved.  To fix, simply transfer
page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and
migration.  When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page().  A page could be migrated
while in this state.  However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page.  If
the condition is detected, return EBUSY for the page.
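The two migration-side fixes can be sketched as follows. struct hpage and model_migrate_huge_page() are hypothetical simplifications: mapping stands for page_mapping() and private for page_private(); locking, the active flag and the actual page copy are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page. */
struct hpage {
	void *mapping;          /* NULL once removed from the page cache */
	unsigned long private;  /* subpool info for an explicit mount */
};

static int model_migrate_huge_page(struct hpage *old, struct hpage *new)
{
	/* Removed from the page cache but not yet freed: refuse, since
	 * the private data would not be transferred and would leak. */
	if (old->mapping == NULL && old->private != 0)
		return -EBUSY;

	new->mapping = old->mapping;
	/* Carry the filesystem accounting over to the new page. */
	new->private = old->private;
	old->private = 0;
	return 0;
}
```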

Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com


Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
  Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
  Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com


Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

drm: Block fb changes for async plane updates · 466ab537

Nicholas Kazlauskas authored 6 years ago and

谢秀奇 committed 5 years ago


commit 22163229 upstream.

The prepare_fb call always happens on new_plane_state.

drm_atomic_helper_cleanup_planes checks whether the plane state
pointer has changed when deciding whether to call cleanup_fb on
the new_plane_state or the old_plane_state.

For a non-async atomic commit the state pointer is swapped, so this
helper calls prepare_fb on the new_plane_state and cleanup_fb on the
old_plane_state. This makes sense, since we want to prepare the
framebuffer we are going to use and clean up the framebuffer we are
no longer using.

For the async atomic update helpers this differs. The async atomic
update helpers perform in-place updates on the existing state. They call
drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
This means that prepare_fb is called on the new_plane_state and
cleanup_fb is called on the new_plane_state (not the old).

In the case where old_plane_state->fb == new_plane_state->fb then
there should be no behavioral difference between an async update
and a non-async commit. But there are issues that arise when
old_plane_state->fb != new_plane_state->fb.

The first is that the new_plane_state->fb is immediately cleaned up
after it has been prepared, so we're using a fb that we shouldn't
be.

The second occurs during a sequence of async atomic updates and
non-async regular atomic commits. Suppose there are two framebuffers
being interleaved in a double-buffering scenario, fb1 and fb2:

- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2

We call cleanup_fb on fb2 twice in this example scenario, and any
further use will result in use-after-free.

The simple fix to this problem is to block framebuffer changes
in the drm_atomic_helper_async_check function for now.
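A small counting model reproduces the double cleanup from the sequence above. It assumes a hypothetical pins counter per framebuffer (prepare_fb() increments, cleanup_fb() decrements), so a negative value means a buffer was cleaned up more often than it was prepared; async_check() only sketches the idea of rejecting fb changes and is not the DRM helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical framebuffer with a prepare/cleanup balance counter. */
struct fb { int pins; };

struct plane_state { struct fb *fb; };

static void prepare_fb(struct fb *fb) { if (fb) fb->pins++; }
static void cleanup_fb(struct fb *fb) { if (fb) fb->pins--; }

/* Buggy async update: state is updated in place and not swapped,
 * so the cleanup hits the new fb that was just prepared. */
static void async_update_buggy(struct plane_state *st, struct fb *newfb)
{
	prepare_fb(newfb);
	st->fb = newfb;
	cleanup_fb(st->fb);
}

/* Regular commit: state is swapped, so the cleanup hits the old fb. */
static void sync_commit(struct plane_state *st, struct fb *newfb)
{
	struct fb *oldfb = st->fb;

	prepare_fb(newfb);
	st->fb = newfb;
	cleanup_fb(oldfb);
}

/* The fix, in spirit: async updates may not change the fb. */
static bool async_check(const struct plane_state *st, const struct fb *newfb)
{
	return st->fb == newfb;
}
```

Running the three steps from the list above leaves fb2 with one more cleanup than prepare, which is the use-after-free described.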

v2: Move check by itself, add a FIXME (Daniel)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: fef9df8b ("drm/atomic: initial support for asynchronous plane update")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/275364/


Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm: enforce min addr even if capable() in expand_downwards() · 811c0f72

Jann Horn authored 6 years ago and

谢秀奇 committed 5 years ago


commit 0a1d5299 upstream.

security_mmap_addr() does a capability check with current_cred(), but
we can reach this code from contexts like a VFS write handler where
current_cred() must not be used.

This can be abused on systems without SMAP to make NULL pointer
dereferences exploitable again.
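A sketch of the check before and after, in userspace C. The function names and the example mmap_min_addr value are hypothetical, and the capability test is reduced to a boolean; the point is only that the fixed check no longer consults the caller's credentials at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical example value for the minimum mappable address. */
static const unsigned long mmap_min_addr = 65536;

/* Before: a privileged caller (as judged from current_cred()) could
 * expand the stack below the limit. */
static int check_old(unsigned long address, bool privileged)
{
	if (address < mmap_min_addr && !privileged)
		return -EPERM;
	return 0;
}

/* After: the limit is enforced unconditionally. */
static int check_new(unsigned long address)
{
	if (address < mmap_min_addr)
		return -EPERM;
	return 0;
}
```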

Fixes: 8869477a ("security: protect from stack expansion into low vm addresses")
Cc: stable@kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: sdhci-esdhc-imx: correct the fix of ERR004536 · e5e6aa1f

BOUGH CHEN authored 6 years ago and

谢秀奇 committed 5 years ago


commit e30be063 upstream.

Commit 18094430 ("mmc: sdhci-esdhc-imx: add ADMA Length
Mismatch errata fix") included a fix for ERR004536, but that
fix is incorrect. As double-checked with the IC team, bit 7 of
register 0x6c needs to be cleared rather than set.
Here is the definition of bit 7 of 0x6c:
    0: enable the new IC fix for ERR004536
    1: do not use the IC fix, keep the same as before
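In terms of register bits, the correction turns a set into a clear. The macro name ERR004536_IC_FIX_DISABLE and both helpers are hypothetical; only the bit position and its meaning come from the definition above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical name for bit 7 of register 0x6c (1 = keep old behaviour). */
#define ERR004536_IC_FIX_DISABLE (1u << 7)

/* Old, incorrect handling: setting bit 7 disables the IC fix. */
static u32 apply_errata_old(u32 reg)
{
	return reg | ERR004536_IC_FIX_DISABLE;
}

/* Corrected handling: clearing bit 7 enables the IC fix. */
static u32 apply_errata_new(u32 reg)
{
	return reg & ~ERR004536_IC_FIX_DISABLE;
}
```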

This issue was found on the i.MX845s-evk board with CMDQ enabled
and the system under heavy load.

root@imx8mmevk:~# dd if=/dev/mmcblk2 of=/dev/null bs=1M &
root@imx8mmevk:~# memtester 1000M > /dev/zero &
root@imx8mmevk:~# [  139.897220] mmc2: cqhci: timeout for tag 16
[  139.901417] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
[  139.907862] mmc2: cqhci: Caps:      0x0000310a | Version:  0x00000510
[  139.914311] mmc2: cqhci: Config:    0x00001001 | Control:  0x00000000
[  139.920753] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
[  139.927193] mmc2: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
[  139.933634] mmc2: cqhci: TDL base:  0x7809c000 | TDL up32: 0x00000000
[  139.940073] mmc2: cqhci: Doorbell:  0x00030000 | TCN:      0x00000000
[  139.946518] mmc2: cqhci: Dev queue: 0x00010000 | Dev Pend: 0x00010000
[  139.952967] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
[  139.959411] mmc2: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000000
[  139.965857] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
[  139.972308] mmc2: cqhci: Resp idx:  0x0000002e | Resp arg: 0x00000900
[  139.978761] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[  139.985214] mmc2: sdhci: Sys addr:  0xb2c19000 | Version:  0x00000002
[  139.991669] mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
[  139.998127] mmc2: sdhci: Argument:  0x40110400 | Trn mode: 0x00000033
[  140.004618] mmc2: sdhci: Present:   0x01088a8f | Host ctl: 0x00000030
[  140.011113] mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
[  140.017583] mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
[  140.024039] mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
[  140.030497] mmc2: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
[  140.036972] mmc2: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000502
[  140.043426] mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
[  140.049867] mmc2: sdhci: Cmd:       0x00002c1a | Max curr: 0x00ffffff
[  140.056314] mmc2: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0xffffffff
[  140.062755] mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d00f00
[  140.069195] mmc2: sdhci: Host ctl2: 0x00000008
[  140.073640] mmc2: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x7809c108
[  140.080079] mmc2: sdhci: ============================================
[  140.086662] mmc2: running CQE recovery

Fixes: 18094430 ("mmc: sdhci-esdhc-imx: add ADMA Length Mismatch errata fix")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: cqhci: Fix a tiny potential memory leak on error condition · 865d5f28

Alamy Liu authored 6 years ago and

谢秀奇 committed 5 years ago


commit d07e9fad upstream.

Free up the allocated memory in the case of an error return.

The value of mmc_host->cqe_enabled stays 'false'. Thus, cqhci_disable
(mmc_cqe_ops->cqe_disable) won't be called to free the memory.  Also,
cqhci_disable() seems to be designed to disable and free all resources,
which makes it unsuitable for handling this corner case.
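A hypothetical sketch of the error-path pattern: if either descriptor buffer fails to allocate, free whichever one succeeded before returning. model_alloc(), model_free() and model_alloc_tdl() are invented for illustration, with a counter-based failure hook so the leak can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int fail_on_alloc;  /* 0 = never fail; n = fail the n-th call */
static int alloc_calls;
static int live_allocs;    /* outstanding allocations, to detect leaks */

static void *model_alloc(size_t size)
{
	void *p;

	alloc_calls++;
	if (fail_on_alloc != 0 && alloc_calls == fail_on_alloc)
		return NULL;
	p = malloc(size);
	if (p)
		live_allocs++;
	return p;
}

static void model_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Hypothetical host with two descriptor buffers. */
struct model_cq_host {
	void *desc_base;
	void *trans_desc_base;
};

static int model_alloc_tdl(struct model_cq_host *h)
{
	h->desc_base = model_alloc(64);
	h->trans_desc_base = model_alloc(256);
	if (!h->desc_base || !h->trans_desc_base) {
		/* cqe_enabled stays false, so no disable path frees these. */
		model_free(h->desc_base);
		model_free(h->trans_desc_base);
		h->desc_base = NULL;
		h->trans_desc_base = NULL;
		return -ENOMEM;
	}
	return 0;
}
```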

Fixes: a4080225 ("mmc: cqhci: support for command queue enabled host")
Signed-off-by: Alamy Liu <alamy.liu@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: cqhci: fix space allocated for transfer descriptor · 18595dde

Alamy Liu authored 6 years ago and

谢秀奇 committed 5 years ago


commit 27ec9dc1 upstream.

There is not enough space being allocated when DCMD is disabled.

CQE_DCMD does not need to be enabled when CQE is enabled
(software can halt CQE to send a command).

In the case that CQE_DCMD is not enabled, it still needs to allocate
space for data transfer. For instance:
  CQE_DCMD is enabled:  31 slots space (one slot used by DCMD)
  CQE_DCMD is disabled: 32 slots space
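The slot arithmetic from the example above, as a hypothetical sizing helper. The function names, and the descriptor length and segment count used in the checks, are assumptions for illustration rather than values taken from cqhci.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Buggy: always reserves one slot for DCMD, even when it is disabled. */
static size_t data_slots_buggy(int num_slots, bool dcmd_enabled)
{
	(void)dcmd_enabled;
	return (size_t)num_slots - 1;
}

/* Fixed: only reserve the DCMD slot when DCMD is actually enabled. */
static size_t data_slots_fixed(int num_slots, bool dcmd_enabled)
{
	return dcmd_enabled ? (size_t)num_slots - 1 : (size_t)num_slots;
}

/* Bytes of transfer descriptors needed for the given number of slots. */
static size_t trans_desc_bytes(size_t slots, size_t desc_len, size_t max_segs)
{
	return slots * desc_len * max_segs;
}
```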

Fixes: a4080225 ("mmc: cqhci: support for command queue enabled host")
Signed-off-by: Alamy Liu <alamy.liu@gmail.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: core: Fix NULL ptr crash from mmc_should_fail_request · 658988db

Ritesh Harjani authored 6 years ago and

谢秀奇 committed 5 years ago


commit e5723f95 upstream.

In the case of CQHCI, mrq->cmd may be NULL for data requests (non-DCMD).
In such a case, mmc_should_fail_request() dereferences mrq->cmd even
though it is NULL.
Fix this by checking the mrq->cmd pointer.
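A reduced userspace model of the guard. struct model_request and model_should_fail() are hypothetical, and the real function's fault-injection logic is omitted; the sketch only shows bailing out before dereferencing a NULL cmd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced request structures. */
struct model_cmd { int error; };
struct model_data { int error; };

struct model_request {
	struct model_cmd *cmd;    /* may be NULL for CQE data requests */
	struct model_data *data;
};

/* Returns true if an error may be injected into this request. */
static bool model_should_fail(const struct model_request *mrq)
{
	/* Bail out before touching either pointer if it is missing. */
	if (!mrq->data || !mrq->cmd)
		return false;
	if (mrq->cmd->error || mrq->data->error)
		return false;
	return true;
}
```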

Fixes: 72a5af55 ("mmc: core: Add support for handling CQE requests")
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: tmio: fix access width of Block Count Register · ec95b697

Takeshi Saito authored 6 years ago and

谢秀奇 committed 5 years ago


commit 5603731a upstream.

In R-Car Gen2 or later, the maximum number of transfer blocks is
changed from 0xFFFF to 0xFFFFFFFF. Therefore, the Block Count Register
should be written with iowrite32().

If another system (U-Boot, a hypervisor OS, etc.) uses bits [31:16],
that value will not be cleared, so SD/MMC card initialization fails.

So, check for the bigger register and use the appropriate write. Also,
mark the register as extended on Gen2.
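A model of why the access width matters, assuming a 32-bit register whose upper half was left non-zero by earlier firmware. blkcnt_reg, model_iowrite16(), model_iowrite32() and model_set_blkcnt() are hypothetical names; the 0xffff threshold follows from the block-count limits stated above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Hypothetical model of a 32-bit Block Count Register. */
static u32 blkcnt_reg;

/* In this model a 16-bit write only touches bits [15:0]. */
static void model_iowrite16(u16 val)
{
	blkcnt_reg = (blkcnt_reg & 0xffff0000u) | val;
}

static void model_iowrite32(u32 val)
{
	blkcnt_reg = val;
}

/* The shape of the fix: pick the access width from max_blk_count. */
static void model_set_blkcnt(u32 max_blk_count, u32 count)
{
	if (max_blk_count > 0xffff)
		model_iowrite32(count);
	else
		model_iowrite16((u16)count);
}
```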

Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com>
[wsa: use max_blk_count in if(), add Gen2, update commit message]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: stable@kernel.org
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
[Ulf: Fixed build error]
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: tmio_mmc_core: don't claim spurious interrupts · 389cf99a

Sergei Shtylyov authored 6 years ago and

谢秀奇 committed 5 years ago


commit 5c27ff5d upstream.

I have encountered an interrupt storm during the eMMC chip probing (and
the chip finally didn't get detected).  It turned out that U-Boot left
the DMAC interrupts enabled while the Linux driver didn't use those.
The SDHI driver's interrupt handler somehow assumes that, even if an
SDIO interrupt didn't happen, it should return IRQ_HANDLED.  I think
that if none of the enabled interrupts happened and got handled, we
should return IRQ_NONE -- that way the kernel IRQ code recognizes
a spurious interrupt and masks it off pretty quickly...
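The handler shape argued for above can be sketched in userspace C. The enum mirrors the kernel's IRQ_NONE/IRQ_HANDLED values, while the sub-handlers and their status bits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring the kernel's irqreturn_t values. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } model_irqreturn_t;

/* Hypothetical sub-handlers: each reports whether it found work. */
static bool handle_card_irq(unsigned int status) { return status & 0x1u; }
static bool handle_sdio_irq(unsigned int status) { return status & 0x2u; }

static model_irqreturn_t model_irq(unsigned int status)
{
	if (handle_card_irq(status))
		return IRQ_HANDLED;
	if (handle_sdio_irq(status))
		return IRQ_HANDLED;
	/* Nothing we enabled fired: let the IRQ core detect the storm. */
	return IRQ_NONE;
}
```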

Fixes: 7729c7a2 ("mmc: tmio: Provide separate interrupt handlers")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mmc: spi: Fix card detection during probe · a37f8380

Jonathan Neuschäfer authored 6 years ago and

谢秀奇 committed 5 years ago


commit c9bd505d upstream.

When using the mmc_spi driver with a card-detect pin, I noticed that the
card was not detected immediately after probe, but only after it was
unplugged and plugged back in (and the CD IRQ fired).

The call tree looks something like this:

mmc_spi_probe
  mmc_add_host
    mmc_start_host
      _mmc_detect_change
        mmc_schedule_delayed_work(&host->detect, 0)
          mmc_rescan
            host->bus_ops->detect(host)
              mmc_detect
                _mmc_detect_card_removed
                  host->ops->get_cd(host)
                    mmc_gpio_get_cd -> -ENOSYS (ctx->cd_gpio not set)
  mmc_gpiod_request_cd
    ctx->cd_gpio = desc

To fix this issue, call mmc_detect_change after the card-detect GPIO/IRQ
is registered.
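The ordering issue from the call tree can be reduced to a small sequential model. struct model_host and the model_* functions are hypothetical; model_get_cd() returns -ENOSYS until the card-detect line is set up, as the call tree shows for mmc_gpio_get_cd().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical host: just enough state to show the ordering problem. */
struct model_host {
	bool cd_gpio_set;    /* models ctx->cd_gpio being assigned */
	bool card_inserted;  /* physical state of the slot */
	bool card_detected;  /* what the core believes after a rescan */
};

static int model_get_cd(const struct model_host *h)
{
	if (!h->cd_gpio_set)
		return -ENOSYS;
	return h->card_inserted ? 1 : 0;
}

/* Models the rescan triggered by mmc_detect_change(). */
static void model_detect_change(struct model_host *h)
{
	h->card_detected = model_get_cd(h) == 1;
}

static void model_probe(struct model_host *h, bool fixed)
{
	model_detect_change(h);      /* mmc_add_host(): rescan runs too early */
	h->cd_gpio_set = true;       /* mmc_gpiod_request_cd() */
	if (fixed)
		model_detect_change(h);  /* rescan once card detect is wired up */
}
```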

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

kvm: selftests: Fix region overlap check in kvm_util · 8b835f79

Ben Gardon authored 6 years ago and

谢秀奇 committed 5 years ago


[ Upstream commit 94a980c3 ]

Fix a call to userspace_mem_region_find to conform to its spec of
taking an inclusive, inclusive range. It was previously being called
with an inclusive, exclusive range. Also remove a redundant region bounds
check in vm_userspace_mem_region_add. Region overlap checking is already
performed by the call to userspace_mem_region_find.
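The inclusive-versus-exclusive mix-up is easiest to see with two adjacent regions. ranges_overlap() is a hypothetical closed-interval check standing in for userspace_mem_region_find(), and overlap_buggy()/overlap_fixed() show the two ways of passing the end of a (start, size) region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Closed-interval overlap: [a_lo, a_hi] and [b_lo, b_hi], both inclusive. */
static bool ranges_overlap(uint64_t a_lo, uint64_t a_hi,
			   uint64_t b_lo, uint64_t b_hi)
{
	return a_lo <= b_hi && b_lo <= a_hi;
}

/* Buggy: passes the exclusive end start + size as if it were inclusive. */
static bool overlap_buggy(uint64_t start, uint64_t size,
			  uint64_t r_lo, uint64_t r_hi)
{
	return ranges_overlap(start, start + size, r_lo, r_hi);
}

/* Fixed: the last byte of the region is start + size - 1. */
static bool overlap_fixed(uint64_t start, uint64_t size,
			  uint64_t r_lo, uint64_t r_hi)
{
	return ranges_overlap(start, start + size - 1, r_lo, r_hi);
}
```

With an existing region [0x2000, 0x2fff], adding [0x1000, 0x1fff] is wrongly rejected by the buggy form but accepted by the fixed one.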

Tested: Compiled tools/testing/selftests/kvm with -static
	Ran all resulting test binaries on an Intel Haswell test machine
	All tests passed

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
