Mar 11, 2021
Steven Rostedt (VMware) authored
stable inclusion from linux-4.19.176 commit a19749a5fbd97f2147a8768813d084d584d9e045
--------------------------------
commit 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 upstream.

On some archs, the idle task can call into cpu_suspend(). cpu_suspend() will disable or pause function graph tracing, as there are some paths in bringing down the CPU that can have issues with its return address being modified. The task_struct structure has a "tracing_graph_pause" atomic counter which, when set to something other than zero, makes the function graph tracer not modify the return address.

The problem is that the tracing_graph_pause counter is initialized when the function graph tracer is enabled. This can corrupt the counter for the idle task if it is suspended on these architectures:

  CPU 1                            CPU 2
  -----                            -----
  do_idle()
    cpu_suspend()
      pause_graph_tracing()
        task_struct->tracing_graph_pause++ (0 -> 1)

                                   start_graph_tracing()
                                     for_each_online_cpu(cpu) {
                                       ftrace_graph_init_idle_task(cpu)
                                         task_struct->tracing_graph_pause = 0 (1 -> 0)

      unpause_graph_tracing()
        task_struct->tracing_graph_pause-- (0 -> -1)

The counter should have gone from 1 back to zero, re-enabling function graph tracing. Instead, it ends up at -1, which keeps tracing disabled. There is no reason the tracing_graph_pause field on the task_struct cannot be initialized at boot up.

Cc: stable@vger.kernel.org
Fixes: 380c4b14 ("tracing/function-graph-tracer: append the tracing_graph_flag")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
Reported-by:
<pierre.gondois@arm.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
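
To make the underflow concrete, here is a minimal user-space C model of the sequence above. The kernel's real counter lives in task_struct and is manipulated by pause_graph_tracing()/unpause_graph_tracing(); the names below are simplified stand-ins, not the kernel API.

  #include <stdio.h>

  /* simplified stand-in for task_struct->tracing_graph_pause */
  static int tracing_graph_pause;

  static void pause_graph_tracing(void)   { tracing_graph_pause++; }
  static void unpause_graph_tracing(void) { tracing_graph_pause--; }

  /* the buggy behaviour: (re)initialize the counter when the tracer is enabled */
  static void ftrace_graph_init_idle_task(void) { tracing_graph_pause = 0; }

  int main(void)
  {
      pause_graph_tracing();         /* idle task enters cpu_suspend(): 0 -> 1  */
      ftrace_graph_init_idle_task(); /* tracer enabled concurrently:    1 -> 0  */
      unpause_graph_tracing();       /* idle task resumes:              0 -> -1 */

      /* non-zero means the function graph tracer stays disabled for this task */
      printf("tracing_graph_pause = %d\n", tracing_graph_pause);

      /* the fix moves the "= 0" initialization to boot/task setup, where it
       * can never race with a pause/unpause pair like the one above */
      return 0;
  }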
-
Masami Hiramatsu authored
stable inclusion from linux-4.19.176 commit 960434acef375b2a73168b467fdfc7fa0a11a68e
--------------------------------
commit 97c753e62e6c31a404183898d950d8c08d752dbd upstream.

Make kprobe_on_func_entry() return an error code instead of false, so that register_kretprobe() can return an appropriate error code. append_trace_kprobe() expects the kprobe registration to return -ENOENT when the target symbol is not found, and it then checks whether the target module is unloaded. If the target module doesn't exist, it defers probing the target symbol until the module is loaded. However, since register_kretprobe() returns -EINVAL instead of -ENOENT in that case, putting a kretprobe event on an unloaded module always fails. e.g.

Kprobe event:
  /sys/kernel/debug/tracing # echo p xfs:xfs_end_io >> kprobe_events
  [   16.515574] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

Kretprobe event: (p -> r)
  /sys/kernel/debug/tracing # echo r xfs:xfs_end_io >> kprobe_events
  sh: write error: Invalid argument
  /sys/kernel/debug/tracing # cat error_log
  [   41.122514] trace_kprobe: error: Failed to register probe event
    Command: r xfs:xfs_end_io
               ^

To fix this bug, change kprobe_on_func_entry() to detect symbol lookup failure and return -ENOENT in that case. Otherwise it returns -EINVAL or 0 (success: the given address is on the function entry).

Link: https://lkml.kernel.org/r/161176187132.1067016.8118042342894378981.stgit@devnote2
Cc: stable@vger.kernel.org
Fixes: 59158ec4 ("tracing/kprobes: Check the probe on unloaded module correctly")
Reported-by:
Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: kernel/kprobes.c [yyl: adjust context] Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
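
A minimal sketch of the error-code split described above, in plain user-space C. lookup_symbol(), probe_on_func_entry() and register_probe() are hypothetical stand-ins for the kernel helpers; only the -ENOENT/-EINVAL distinction mirrors the patch.

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>

  /* hypothetical lookup: pretend the xfs module is not loaded, so its
   * symbols resolve to nothing; everything else gets a fake address */
  static unsigned long lookup_symbol(const char *name)
  {
      if (strncmp(name, "xfs:", 4) == 0)
          return 0;
      return 0xc0001000UL;
  }

  /* before the fix this returned plain true/false; now it distinguishes
   * "symbol not found" (-ENOENT) from "not on the function entry" (-EINVAL) */
  static int probe_on_func_entry(const char *symbol, unsigned long offset)
  {
      unsigned long addr = lookup_symbol(symbol);

      if (!addr)
          return -ENOENT;   /* target (module) symbol is not loaded yet */
      if (offset != 0)
          return -EINVAL;   /* loaded, but not pointing at the entry */
      return 0;
  }

  static int register_probe(const char *symbol, unsigned long offset)
  {
      int ret = probe_on_func_entry(symbol, offset);

      if (ret == -ENOENT) {
          /* caller (like append_trace_kprobe()) can defer until module load */
          printf("%s: not found, deferring until the module is loaded\n", symbol);
          return 0;
      }
      return ret;
  }

  int main(void)
  {
      register_probe("xfs:xfs_end_io", 0);   /* deferred instead of -EINVAL */
      return 0;
  }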
-
Xiao Ni authored
stable inclusion from linux-4.19.175 commit ada82f9940e92650221b677a46afd086632ed1c7
--------------------------------
commit dc5d17a3c39b06aef866afca19245a9cfb533a79 upstream.

A customer reports a crash caused by a flush request. It triggers a warning before the crash:

  /* new request after previous flush is completed */
  if (ktime_after(req_start, mddev->prev_flush_start)) {
          WARN_ON(mddev->flush_bio);
          mddev->flush_bio = bio;
          bio = NULL;
  }

The WARN_ON is triggered. We use a spin lock to protect prev_flush_start and flush_bio in md_flush_request, but there is no lock protection in md_submit_flush_data. It can set flush_bio to NULL first because the compiler may reorder the write instructions. For example, flush bio1 sets flush_bio to NULL first in md_submit_flush_data, and an interrupt or a VMware-induced extended stall happens between updating flush_bio and prev_flush_start. Because flush_bio is NULL, flush bio2 can take the lock and submit to the underlying disks. Then flush bio1 updates prev_flush_start after the interrupt or extended stall. Next, flush bio3 enters md_flush_request: its start time req_start is behind prev_flush_start, and flush_bio is not NULL (flush bio2 hasn't finished). So the WARN_ON triggers, and INIT_WORK is called again. INIT_WORK() re-initializes the list pointers in the work_struct, which can result in a corrupted work list and the work_struct being queued a second time. With the work list corrupted, invalid work items can be used and cause a crash in process_one_work.

We need to make sure only one flush bio is handled at a time, so add a spin lock in md_submit_flush_data to protect prev_flush_start and flush_bio in an atomic way.

Reviewed-by:
David Jeffery <djeffery@redhat.com> Signed-off-by:
Xiao Ni <xni@redhat.com> Signed-off-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
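
A small user-space model of the fix, assuming nothing about the md driver beyond what the message above says: the two fields are updated together under one lock, so no other flush can observe them half-updated. A mutex stands in for the kernel spinlock; the field names mirror the commit, the rest is illustrative.

  #include <pthread.h>
  #include <stddef.h>

  struct bio;                           /* opaque stand-in */

  static pthread_mutex_t md_lock = PTHREAD_MUTEX_INITIALIZER;
  static struct bio *flush_bio;         /* in-flight flush, NULL when done */
  static long prev_flush_start;         /* timestamp of the previous flush */

  /* completion path: before the fix these two stores were done without the
   * lock, so another CPU could see flush_bio == NULL while prev_flush_start
   * was still stale, and a later request could re-run INIT_WORK() */
  static void md_submit_flush_data_done(long req_start)
  {
      pthread_mutex_lock(&md_lock);
      prev_flush_start = req_start;
      flush_bio = NULL;
      pthread_mutex_unlock(&md_lock);
  }

  /* request path: decides under the same lock whether a new flush is needed */
  static int md_flush_request(struct bio *bio, long req_start)
  {
      int start_new_flush = 0;

      pthread_mutex_lock(&md_lock);
      if (req_start > prev_flush_start && !flush_bio) {
          flush_bio = bio;              /* claim the single flush slot */
          start_new_flush = 1;
      }
      pthread_mutex_unlock(&md_lock);
      return start_new_flush;
  }

  int main(void)
  {
      struct bio *b = (struct bio *)1;  /* dummy pointer, never dereferenced */
      if (md_flush_request(b, 2))
          md_submit_flush_data_done(2);
      return 0;
  }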
-
Hugh Dickins authored
stable inclusion from linux-4.19.175 commit 081438440a6e0787d0e4c933bc9e447ac5d217bb -------------------------------- commit 1c2f67308af4c102b4e1e6cd6f69819ae59408e0 upstream. Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page). This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it. __split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split. Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem. Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb56 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by:
Hugh Dickins <hughd@google.com> Reported-by:
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Reviewed-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Muchun Song authored
stable inclusion from linux-4.19.175 commit 6bf5461ae968b870f81c813a880e0e3a2684dfc1
--------------------------------
commit ecbf4724e6061b4b01be20f6d797d64d462b2bc8 upstream.

page_huge_active() can be called from scan_movable_pages(), which does not hold a reference count on the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed in parallel, and we then trigger the BUG_ON inside page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE.

Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com
Fixes: 7e1f049e ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Muchun Song authored
stable inclusion from linux-4.19.175 commit 532574ae2586940729419253fd2defd9c9880490
--------------------------------
commit 0eb2df2b5629794020f75e94655e1994af63f0d4 upstream.

There is a race between isolate_huge_page() and __free_huge_page():

  CPU0:                            CPU1:

  if (PageHuge(page))
                                   put_page(page)
                                     __free_huge_page(page)
                                       spin_lock(&hugetlb_lock)
                                       update_and_free_page(page)
                                         set_compound_page_dtor(page, NULL_COMPOUND_DTOR)
                                       spin_unlock(&hugetlb_lock)
    isolate_huge_page(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHead(page), page)
      spin_lock(&hugetlb_lock)
      page_huge_active(page)
        // trigger BUG_ON
        VM_BUG_ON_PAGE(!PageHuge(page), page)
      spin_unlock(&hugetlb_lock)

We isolate a HugeTLB page on CPU0 while, meanwhile, CPU1 frees it to the buddy allocator. CPU0 can then trigger a BUG_ON, because the page has already been freed to the buddy allocator.

Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Muchun Song authored
stable inclusion from linux-4.19.175 commit db510d8f98c38a953ae5d3b51b01f435740729b4
--------------------------------
commit 7ffddd499ba6122b1a07828f023d1d67629aa017 upstream.

There is a race condition between __free_huge_page() and dissolve_free_huge_page():

  CPU0:                            CPU1:

  // page_count(page) == 1
  put_page(page)
    __free_huge_page(page)
                                   dissolve_free_huge_page(page)
                                     spin_lock(&hugetlb_lock)
                                     // PageHuge(page) && !page_count(page)
                                     update_and_free_page(page)
                                     // page is freed to the buddy
                                     spin_unlock(&hugetlb_lock)
      spin_lock(&hugetlb_lock)
      clear_page_huge_active(page)
      enqueue_huge_page(page)
      // It is wrong, the page is already freed
      spin_unlock(&hugetlb_lock)

The race window is between put_page() and dissolve_free_huge_page(); when it is hit, __free_huge_page() corrupts page(s) already handed back to the buddy allocator. We should make sure that the page is already on the free list when it is dissolved.

Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Muchun Song authored
stable inclusion from linux-4.19.175 commit b6e04c19c5b2060c91b07acec5d650a1beb6855f -------------------------------- commit 585fc0d2871c9318c949fbf45b1f081edd489e96 upstream. If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by:
Muchun Song <songmuchun@bytedance.com> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by:
Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Gustavo A. R. Silva authored
stable inclusion from linux-4.19.175 commit 8c323163303d9923927c176977abdbe998f217ff
--------------------------------
commit 8d8d1dbefc423d42d626cf5b81aac214870ebaab upstream.

While addressing some warnings generated by -Warray-bounds, I found this bug, which was introduced back in 2017:

  CC [M]  fs/cifs/smb2pdu.o
  fs/cifs/smb2pdu.c: In function ‘SMB2_negotiate’:
  fs/cifs/smb2pdu.c:822:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
    822 |  req->Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
        |  ~~~~~~~~~~~~~^~~
  fs/cifs/smb2pdu.c:823:16: warning: array subscript 2 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
    823 |  req->Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
        |  ~~~~~~~~~~~~~^~~
  fs/cifs/smb2pdu.c:824:16: warning: array subscript 3 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
    824 |  req->Dialects[3] = cpu_to_le16(SMB311_PROT_ID);
        |  ~~~~~~~~~~~~~^~~
  fs/cifs/smb2pdu.c:816:16: warning: array subscript 1 is above array bounds of ‘__le16[1]’ {aka ‘short unsigned int[1]’} [-Warray-bounds]
    816 |  req->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
        |  ~~~~~~~~~~~~~^~~

At the time, the size of the _Dialects_ array was changed from 1 to 3 in struct validate_negotiate_info_req, and in 2019 it was changed from 3 to 4, but those changes were never made in struct smb2_negotiate_req. This led to a three-and-a-half-year-old out-of-bounds bug in SMB2_negotiate() (fs/cifs/smb2pdu.c). Fix this by increasing the size of the _Dialects_ array in struct smb2_negotiate_req to 4.

Fixes: 9764c02f ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b ("smb3: add smb3.1.1 to default dialect list")
Cc: stable@vger.kernel.org
Signed-off-by:
Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
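
A stripped-down, compilable illustration of the mismatch. The structures here are reduced to just the Dialects member (the real SMB2 negotiate request carries many more fields); the dialect IDs are the standard SMB2 protocol identifiers.

  #include <stdint.h>
  #include <stdio.h>

  typedef uint16_t le16;     /* stand-in for the kernel's __le16 */

  /* before the fix: room for a single dialect only */
  struct smb2_negotiate_req_old {
      le16 DialectCount;
      le16 Dialects[1];      /* writing Dialects[1..3] runs out of bounds */
  };

  /* after the fix: sized like validate_negotiate_info_req already was */
  struct smb2_negotiate_req_new {
      le16 DialectCount;
      le16 Dialects[4];      /* SMB 2.1, 3.0, 3.0.2, 3.1.1 all fit */
  };

  int main(void)
  {
      struct smb2_negotiate_req_new req = { .DialectCount = 4 };

      /* with the old one-element array these stores would run past the
       * struct, which is exactly what -Warray-bounds flagged */
      req.Dialects[0] = 0x0210;  /* SMB 2.1   */
      req.Dialects[1] = 0x0300;  /* SMB 3.0   */
      req.Dialects[2] = 0x0302;  /* SMB 3.0.2 */
      req.Dialects[3] = 0x0311;  /* SMB 3.1.1 */

      printf("sizeof old = %zu, sizeof new = %zu\n",
             sizeof(struct smb2_negotiate_req_old),
             sizeof(struct smb2_negotiate_req_new));
      return 0;
  }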
-
Aurelien Aptel authored
stable inclusion from linux-4.19.175 commit 613fa1fb62c82e6b9c170b7de6f8c8c850624f5a
--------------------------------
commit 21b200d091826a83aafc95d847139b2b0582f6d1 upstream.

Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b

On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() return -ERESTARTSYS. This triggers the following chain of events:
=> the dentry revalidation fails
=> the dentry is put and released
=> the superblock associated with the dentry is put
=> /mnt/b is unmounted

This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated).

Signed-off-by:
Aurelien Aptel <aaptel@suse.com> Suggested-by:
Shyam Prasad N <nspmangalore@gmail.com> Reviewed-by:
Shyam Prasad N <nspmangalore@gmail.com> CC: stable@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
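
A sketch of the return-value policy described above, assuming only what the message states. The helper names echo the CIFS ones, but this is an illustrative user-space model, not the driver code.

  #include <errno.h>
  #include <stdio.h>

  /* stub stand-in for cifs_revalidate_dentry(): 0 on success, negative errno
   * on failure; here it just returns whatever error we want to simulate */
  static int cifs_revalidate_dentry(int simulated_error)
  {
      return simulated_error;
  }

  /* d_revalidate convention: 1 = dentry still valid, 0 = invalid (drop it),
   * negative = propagate the error without invalidating the dentry */
  static int cifs_d_revalidate(int simulated_error)
  {
      int rc = cifs_revalidate_dentry(simulated_error);

      if (rc == 0)
          return 1;                    /* still valid */
      if (rc == -ENOENT || rc == -ESTALE)
          return 0;                    /* deleted or recreated: invalidate */
      return rc;                       /* e.g. interrupted: report, keep dentry */
  }

  int main(void)
  {
      printf("ok:          %d\n", cifs_d_revalidate(0));
      printf("ENOENT:      %d\n", cifs_d_revalidate(-ENOENT));
      printf("interrupted: %d\n", cifs_d_revalidate(-EINTR)); /* -ERESTARTSYS is kernel-only */
      return 0;
  }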
-
Marc Zyngier authored
stable inclusion from linux-4.19.175 commit 2bc9ccf2cdfdbd1e2bcab4114d44a09b61d59a7a -------------------------------- commit 4c457e8cb75eda91906a4f89fc39bde3f9a43922 upstream. When MSI_FLAG_ACTIVATE_EARLY is set (which is the case for PCI), __msi_domain_alloc_irqs() performs the activation of the interrupt (which in the case of PCI results in the endpoint being programmed) as soon as the interrupt is allocated. But it appears that this is only done for the first vector, introducing an inconsistent behaviour for PCI Multi-MSI. Fix it by iterating over the number of vectors allocated to each MSI descriptor. This is easily achieved by introducing a new "for_each_msi_vector" iterator, together with a tiny bit of refactoring. Fixes: f3b0946d ("genirq/msi: Make sure PCI MSIs are activated early") Reported-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by:
Marc Zyngier <maz@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123122759.1781359-1-maz@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
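
The shape of the fix described above is to walk every vector of every descriptor rather than only the first one. A simplified, self-contained C model of that iteration; the data structures are invented for illustration, while the real kernel adds a for_each_msi_vector() helper over struct msi_desc.

  #include <stdio.h>

  /* toy descriptor: a base virtual irq plus how many vectors were allocated */
  struct msi_desc {
      unsigned int virq;
      unsigned int nvec_used;
  };

  static void irq_activate(unsigned int virq)
  {
      printf("activating irq %u\n", virq);
  }

  int main(void)
  {
      /* e.g. one device using Multi-MSI with 4 vectors, another with 2 */
      struct msi_desc descs[] = { { 32, 4 }, { 40, 2 } };
      unsigned int i, j;

      /* the buggy variant activated only descs[i].virq + 0 for each descriptor;
       * the fix iterates over every allocated vector of every descriptor */
      for (i = 0; i < sizeof(descs) / sizeof(descs[0]); i++)
          for (j = 0; j < descs[i].nvec_used; j++)
              irq_activate(descs[i].virq + j);

      return 0;
  }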
-
Wang ShaoBo authored
stable inclusion from linux-4.19.175 commit e0d7f2f9d11c441f060da4d7509f42a8ce6bf5bc
--------------------------------
commit 0188b87899ffc4a1d36a0badbe77d56c92fd91dc upstream.

Our system encountered a re-init error when re-registering the same kretprobe, where the kretprobe_instance in rp->free_instances is illegally accessed after re-init. A check to avoid re-registration was introduced for kprobes earlier, but is missing from register_kretprobe(). We must check whether the kprobe has been re-registered before re-initializing the kretprobe; otherwise the re-init destroys the data structures of the already-registered kretprobe, which can lead to a memory leak, a system crash, and other unexpected behaviour. Use check_kprobe_rereg() to check for re-registration before running register_kretprobe()'s body, so we can emit a warning message and terminate the registration process.

Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 1f0ab409 ("kprobes: Prevent re-registration of the same kprobe")
[ The above commit should have been done for kretprobes too ]
Acked-by:
Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by:
Ananth N Mavinakayanahalli <ananth@linux.ibm.com> Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by:
Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
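
A compact model of the guard the patch adds: look the probe up in the registry first and refuse to re-initialize one that is already registered. The table and helpers below are invented for illustration; in the kernel this is check_kprobe_rereg() called from register_kretprobe().

  #include <errno.h>
  #include <stdio.h>

  #define MAX_PROBES 16

  static void *registered[MAX_PROBES];   /* toy registry of live probes */

  static int check_rereg(void *probe)
  {
      for (int i = 0; i < MAX_PROBES; i++)
          if (registered[i] == probe)
              return -EINVAL;            /* already registered: refuse */
      return 0;
  }

  static int register_retprobe(void *probe)
  {
      int ret = check_rereg(probe);

      if (ret)
          return ret;                    /* do NOT re-init free_instances etc. */

      /* ... only now is it safe to initialize the probe's internal state ... */
      for (int i = 0; i < MAX_PROBES; i++) {
          if (!registered[i]) {
              registered[i] = probe;
              return 0;
          }
      }
      return -ENOSPC;
  }

  int main(void)
  {
      int dummy;
      printf("first:  %d\n", register_retprobe(&dummy));   /* 0 */
      printf("second: %d\n", register_retprobe(&dummy));   /* -EINVAL, not a re-init */
      return 0;
  }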
-
Liangyan authored
stable inclusion from linux-4.19.175 commit 307aa80243fd0e5e3197f89c9e464baedc01d377 -------------------------------- commit e04527fefba6e4e66492f122cf8cc6314f3cf3bf upstream. We need to lock d_parent->d_lock before dget_dlock, or this may have d_lockref updated parallelly like calltrace below which will cause dentry->d_lockref leak and risk a crash. CPU 0 CPU 1 ovl_set_redirect lookup_fast ovl_get_redirect __d_lookup dget_dlock //no lock protection here spin_lock(&dentry->d_lock) dentry->d_lockref.count++ dentry->d_lockref.count++ [ 49.799059] PGD 800000061fed7067 P4D 800000061fed7067 PUD 61fec5067 PMD 0 [ 49.799689] Oops: 0002 [#1] SMP PTI [ 49.800019] CPU: 2 PID: 2332 Comm: node Not tainted 4.19.24-7.20.al7.x86_64 #1 [ 49.800678] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8a46cfe 04/01/2014 [ 49.801380] RIP: 0010:_raw_spin_lock+0xc/0x20 [ 49.803470] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [ 49.803949] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [ 49.804600] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [ 49.805252] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [ 49.805898] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [ 49.806548] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [ 49.807200] FS: 00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [ 49.807935] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.808461] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [ 49.809113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.809758] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 49.810410] Call Trace: [ 49.810653] d_delete+0x2c/0xb0 [ 49.810951] vfs_rmdir+0xfd/0x120 [ 49.811264] do_rmdir+0x14f/0x1a0 [ 49.811573] do_syscall_64+0x5b/0x190 [ 49.811917] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 49.812385] RIP: 0033:0x7ffbf505ffd7 [ 49.814404] RSP: 002b:00007ffbedffada8 EFLAGS: 00000297 ORIG_RAX: 0000000000000054 [ 49.815098] RAX: ffffffffffffffda RBX: 00007ffbedffb640 RCX: 00007ffbf505ffd7 [ 49.815744] RDX: 0000000004449700 RSI: 0000000000000000 RDI: 0000000006c8cd50 [ 49.816394] RBP: 00007ffbedffaea0 R08: 0000000000000000 R09: 0000000000017d0b [ 49.817038] R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000012 [ 49.817687] R13: 00000000072823d8 R14: 00007ffbedffb700 R15: 00000000072823d8 [ 49.818338] Modules linked in: pvpanic cirrusfb button qemu_fw_cfg atkbd libps2 i8042 [ 49.819052] CR2: 0000000000000088 [ 49.819368] ---[ end trace 4e652b8aa299aa2d ]--- [ 49.819796] RIP: 0010:_raw_spin_lock+0xc/0x20 [ 49.821880] RSP: 0018:ffffac6fc5417e98 EFLAGS: 00010246 [ 49.822363] RAX: 0000000000000000 RBX: ffff93b8da3446c0 RCX: 0000000a00000000 [ 49.823008] RDX: 0000000000000001 RSI: 000000000000000a RDI: 0000000000000088 [ 49.823658] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff993cf040 [ 49.825404] R10: ffff93b92292e580 R11: ffffd27f188a4b80 R12: 0000000000000000 [ 49.827147] R13: 00000000ffffff9c R14: 00000000fffffffe R15: ffff93b8da3446c0 [ 49.828890] FS: 00007ffbedffb700(0000) GS:ffff93b927880000(0000) knlGS:0000000000000000 [ 49.830725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.832359] CR2: 0000000000000088 CR3: 00000005e3f74006 CR4: 00000000003606a0 [ 49.834085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.835792] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Cc: <stable@vger.kernel.org> Fixes: a6c60655 ("ovl: 
redirect on rename-dir") Signed-off-by:
Liangyan <liangyan.peng@linux.alibaba.com> Reviewed-by:
Joseph Qi <joseph.qi@linux.alibaba.com> Suggested-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Roman Gushchin authored
stable inclusion from linux-4.19.175 commit 50b2181099c0b66e80bbb940c22881f2552ce7be -------------------------------- [ Upstream commit 2dcb3964544177c51853a210b6ad400de78ef17d ] With kaslr the kernel image is placed at a random place, so starting the bottom-up allocation with the kernel_end can result in an allocation failure and a warning like this one: hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node ------------[ cut here ]------------ memblock: bottom-up allocation failed, memory hotremove may be affected WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:memblock_find_in_range_node+0x178/0x25a Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000 RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046 RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000 R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000 FS: 0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0 Call Trace: memblock_alloc_range_nid+0x8d/0x11e cma_declare_contiguous_nid+0x2c4/0x38c hugetlb_cma_reserve+0xdc/0x128 flush_tlb_one_kernel+0xc/0x20 native_set_fixmap+0x82/0xd0 flat_get_apic_id+0x5/0x10 register_lapic_address+0x8e/0x97 setup_arch+0x8a5/0xc3f start_kernel+0x66/0x547 load_ucode_bsp+0x4c/0xcd secondary_startup_64_no_verify+0xb0/0xbb random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0 ---[ end trace f151227d0b39be70 ]--- At the same time, the kernel image is protected with memblock_reserve(), so we can just start searching at PAGE_SIZE. In this case the bottom-up allocation has the same chances to success as a top-down allocation, so there is no reason to fallback in the case of a failure. All together it simplifies the logic. Link: https://lkml.kernel.org/r/20201217201214.3414100-2-guro@fb.com Fixes: 8fabc623 ("powerpc: Ensure that swiotlb buffer is allocated from low memory") Signed-off-by:
Roman Gushchin <guro@fb.com> Reviewed-by:
Mike Rapoport <rppt@linux.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Wonhyuk Yang <vvghjk1234@gmail.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Peter Zijlstra authored
stable inclusion from linux-4.19.174 commit 1746d1dcae9a960052b7a956523cc1719f71513c
--------------------------------
[ Upstream commit 640f17c82460e9724fd256f0a1f5d99e7ff0bda4 ]

create_worker() will already set the right affinity using kthread_bind_mask(); this means only the rescuer needs to change its affinity. However, during cpu-hot-unplug a regular task is not allowed to run on a CPU that is online && !active, as it would be pushed away quite aggressively. We need KTHREAD_IS_PER_CPU to survive in that environment. Therefore set the affinity after getting that magic flag.

Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Valentin Schneider <valentin.schneider@arm.com> Tested-by:
Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Peter Zijlstra authored
stable inclusion from linux-4.19.174 commit fbad32181af9c8c76698b16115c28a493c26a2ee
--------------------------------
[ Upstream commit ac687e6e8c26181a33270efd1a2e2241377924b0 ]

There is a need to distinguish genuine per-cpu kthreads from kthreads that merely happen to have a single-CPU affinity. Genuine per-cpu kthreads are CPU-affine for correctness; these will obviously have PF_KTHREAD set, but must also have PF_NO_SETAFFINITY set, lest userspace modify their affinity and ruin things. However, those two flags are not sufficient: PF_NO_SETAFFINITY is also set on other tasks that have their affinities controlled through other means, for instance workqueues. Therefore another bit is needed; it turns out kthread_create_per_cpu() already has such a bit: KTHREAD_IS_PER_CPU, which is used to make kthread_park()/kthread_unpark() work correctly. Expose this flag and remove the implicit setting of it from kthread_create_on_cpu(); the io_uring usage of it seems dubious at best.

Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Valentin Schneider <valentin.schneider@arm.com> Tested-by:
Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
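
The distinction above boils down to a three-flag test. A minimal illustration using invented flag values; the real PF_* and KTHREAD_IS_PER_CPU constants live in the kernel headers, and only the logic of combining them comes from the commit message.

  #include <stdbool.h>
  #include <stdio.h>

  /* invented bit values, for illustration only */
  #define PF_KTHREAD          0x1
  #define PF_NO_SETAFFINITY   0x2
  #define KTHREAD_IS_PER_CPU  0x4

  struct task {
      unsigned int flags;          /* PF_* style flags */
      unsigned int kthread_flags;  /* per-kthread flags */
  };

  /* a task is a genuine per-cpu kthread only if all three conditions hold;
   * PF_KTHREAD + PF_NO_SETAFFINITY alone also match e.g. workqueue workers */
  static bool is_per_cpu_kthread(const struct task *t)
  {
      return (t->flags & PF_KTHREAD) &&
             (t->flags & PF_NO_SETAFFINITY) &&
             (t->kthread_flags & KTHREAD_IS_PER_CPU);
  }

  int main(void)
  {
      struct task ksoftirqd = { PF_KTHREAD | PF_NO_SETAFFINITY, KTHREAD_IS_PER_CPU };
      struct task wq_worker = { PF_KTHREAD | PF_NO_SETAFFINITY, 0 };

      printf("ksoftirqd per-cpu: %d\n", is_per_cpu_kthread(&ksoftirqd)); /* 1 */
      printf("wq worker per-cpu: %d\n", is_per_cpu_kthread(&wq_worker)); /* 0 */
      return 0;
  }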
-
Christian Brauner authored
stable inclusion from linux-4.19.174 commit 5cbe06fe63af46c74ec80c39bacd79546d89ea9c -------------------------------- commit 7f2923c4 upstream. proc_get_long() is a funny function. It uses simple_strtoul() and for a good reason. proc_get_long() wants to always succeed the parse and return the maybe incorrect value and the trailing characters to check against a pre-defined list of acceptable trailing values. However, simple_strtoul() explicitly ignores overflows which can cause funny things like the following to happen: echo 18446744073709551616 > /proc/sys/fs/file-max cat /proc/sys/fs/file-max 0 (Which will cause your system to silently die behind your back.) On the other hand kstrtoul() does do overflow detection but does not return the trailing characters, and also fails the parse when anything other than '\n' is a trailing character whereas proc_get_long() wants to be more lenient. Now, before adding another kstrtoul() function let's simply add a static parse strtoul_lenient() which: - fails on overflow with -ERANGE - returns the trailing characters to the caller The reason why we should fail on ERANGE is that we already do a partial fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we already reject values such as 184467440737095516160 (21 chars) but accept values such as 18446744073709551616 (20 chars) but both are overflows. So we should just always reject 64bit overflows and not special-case this based on the number of chars. Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.io Signed-off-by:
Christian Brauner <christian@brauner.io> Acked-by:
Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Cc: Joerg Vehlow <lkml@jv-coder.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
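
A user-space approximation of the strtoul_lenient() behaviour described above: reject overflow with -ERANGE but hand the trailing characters back to the caller so it can apply its own, more lenient checks. The name and signature are illustrative; the real helper is a static function in the sysctl code operating on kernel buffers.

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* parse an unsigned long; fail hard on overflow, but let the caller decide
   * which trailing characters (returned via *endp) are acceptable */
  static int strtoul_lenient(const char *s, char **endp, int base,
                             unsigned long *result)
  {
      unsigned long val;

      errno = 0;
      val = strtoul(s, endp, base);
      if (errno == ERANGE)
          return -ERANGE;      /* e.g. 18446744073709551616 on 64-bit */

      *result = val;
      return 0;
  }

  int main(void)
  {
      unsigned long v = 0;
      char *end;
      int ret;

      ret = strtoul_lenient("1048576\n", &end, 10, &v);
      printf("in-range: ret=%d val=%lu trailing='%s'\n", ret, v, end);

      ret = strtoul_lenient("18446744073709551616\n", &end, 10, &v);
      printf("overflow: ret=%d (expected %d)\n", ret, -ERANGE);
      return 0;
  }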
-
Eric Biggers authored
stable inclusion from linux-4.19.172 commit b4da738055acf42e00535c52b60d9bef808a7464 -------------------------------- commit 1e249cb5b7fc09ff216aa5a12f6c302e434e88f9 upstream. When lazytime is enabled and an inode is being written due to its in-memory updated timestamps having expired, either due to a sync() or syncfs() system call or due to dirtytime_expire_interval having elapsed, the VFS needs to inform the filesystem so that the filesystem can copy the inode's timestamps out to the on-disk data structures. This is done by __writeback_single_inode() calling mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC). However, this occurs after __writeback_single_inode() has already cleared the dirty flags from ->i_state. This causes two bugs: - mark_inode_dirty_sync() redirties the inode, causing it to remain dirty. This wastefully causes the inode to be written twice. But more importantly, it breaks cases where sync_filesystem() is expected to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY ioctl (as reported at https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well as possibly filesystem freezing (freeze_super()). - Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is called from __writeback_single_inode() for lazytime expiration, xfs_fs_dirty_inode() ignores the notification. (XFS only cares about lazytime expirations, and it assumes that i_state will contain I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS. Fix this by moving the call to mark_inode_dirty_sync() to earlier in __writeback_single_inode(), before the dirty flags are cleared from i_state. This makes filesystems be properly notified of the timestamp expiration, and it avoids incorrectly redirtying the inode. This fixes xfstest generic/580 (which tests FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime enabled. It also fixes the new lazytime xfstest I've proposed, which reproduces the above-mentioned XFS bug (https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org). Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the right thing to do because mark_inode_dirty_sync() now knows not to move the inode to a writeback list if it is currently queued for sync. Fixes: 0ae45f63 ("vfs: add support for a lazytime mount option") Cc: stable@vger.kernel.org Depends-on: 5afced3b ("writeback: Avoid skipping inode writeback") Link: https://lore.kernel.org/r/20210112190253.64307-2-ebiggers@kernel.org Suggested-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Jan Kara authored
stable inclusion from linux-4.19.172 commit 9f9623fc9340af731c3f3a09e6e9af0756b38a46 -------------------------------- commit 5fcd5750 upstream. The only use of I_DIRTY_TIME_EXPIRE is to detect in __writeback_single_inode() that inode got there because flush worker decided it's time to writeback the dirty inode time stamps (either because we are syncing or because of age). However we can detect this directly in __writeback_single_inode() and there's no need for the strange propagation with I_DIRTY_TIME_EXPIRE flag. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Mikulas Patocka authored
stable inclusion from linux-4.19.172 commit 26b30d36cb5cff5e075114d05e5309043514770c -------------------------------- commit 5c02406428d5219c367c5f53457698c58bc5f917 upstream. Otherwise a malicious user could (ab)use the "recalculate" feature that makes dm-integrity calculate the checksums in the background while the device is already usable. When the system restarts before all checksums have been calculated, the calculation continues where it was interrupted even if the recalculate feature is not requested the next time the dm device is set up. Disable recalculating if we use internal_hash or journal_hash with a key (e.g. HMAC) and we don't have the "legacy_recalculate" flag. This may break activation of a volume, created by an older kernel, that is not yet fully recalculated -- if this happens, the user should add the "legacy_recalculate" flag to constructor parameters. Cc: stable@vger.kernel.org Signed-off-by:
Mikulas Patocka <mpatocka@red...>
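
The gate being added is a simple predicate. A hedged sketch of that logic in plain C; the field and flag names follow the message above, but this is an illustrative model, not the dm-integrity source.

  #include <stdbool.h>
  #include <stdio.h>

  struct integrity_cfg {
      bool recalculate_requested;   /* "recalculate" feature asked for / resumed */
      bool internal_hash_has_key;   /* internal_hash is keyed, e.g. HMAC */
      bool journal_hash_has_key;    /* journal hash is keyed */
      bool legacy_recalculate;      /* explicit opt-in flag */
  };

  /* refuse background recalculation on keyed-hash devices unless the user
   * explicitly passed "legacy_recalculate" */
  static bool recalculation_allowed(const struct integrity_cfg *c)
  {
      if (!c->recalculate_requested)
          return true;
      if ((c->internal_hash_has_key || c->journal_hash_has_key) &&
          !c->legacy_recalculate)
          return false;
      return true;
  }

  int main(void)
  {
      struct integrity_cfg hmac_dev  = { true, true, false, false };
      struct integrity_cfg optin_dev = { true, true, false, true };

      printf("keyed, no flag:   %s\n", recalculation_allowed(&hmac_dev)  ? "allowed" : "rejected");
      printf("keyed, legacy ok: %s\n", recalculation_allowed(&optin_dev) ? "allowed" : "rejected");
      return 0;
  }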
-
Gaurav Kohli authored
stable inclusion from linux-4.19.172 commit acfa7ad7b7f6489e2bed20880ce090fdabdbb841
--------------------------------
commit bbeb9746 upstream.

The race below can occur if trace_open and a resize of the cpu buffer run in parallel on different cpus:

  CPUX                                     CPUY
  ring_buffer_resize
  atomic_read(&buffer->resize_disabled)
                                           tracing_open
                                           tracing_reset_online_cpus
                                           ring_buffer_reset_cpu
                                           rb_reset_cpu
  rb_update_pages
  remove/insert pages
  resetting pointer

This race can cause a data abort or sometimes an infinite loop in rb_remove_pages and rb_insert_pages while checking pages for sanity. Take the buffer lock to fix this.

Link: https://lkml.kernel.org/r/1601976833-24377-1-git-send-email-gkohli@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by:
Denis Efremov <efremov@linux.com> Signed-off-by:
Gaurav Kohli <gkohli@codeaurora.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Wang Hai authored
stable inclusion from linux-4.19.172 commit 08b227d9f380aff229fe7eb3e43c0d90ac14ec38
--------------------------------
commit 757fed1d0898b893d7daa84183947c70f27632f3 upstream.

This reverts commit dde3c6b7. syzbot reported a double-free bug, which the following case can cause:

- mm/slab_common.c: create_cache(): if __kmem_cache_create() fails, it does:
    out_free_cache:
      kmem_cache_free(kmem_cache, s);

- but __kmem_cache_create() - at least for slub() - will have done
    sysfs_slab_add(s)
      -> sysfs_create_group() .. fails ..
      -> kobject_del(&s->kobj); .. which frees s ...

We can't remove the kmem_cache_free() in create_cache(), because other error cases of __kmem_cache_create() do not free this. So, revert commit dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") to fix this.

Reported-by:
<syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com> Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()") Acked-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Wang Hai <wanghai38@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Sabyrzhan Tasbolatov authored
stable inclusion from linux-4.19.177 commit 335bacb523e6207dac471e68488c7804923f0514 -------------------------------- commit a11148e6fcce2ae53f47f0a442d098d860b4f7db upstream. syzbot found WARNING in rds_rdma_extra_size [1] when RDS_CMSG_RDMA_ARGS control message is passed with user-controlled 0x40001 bytes of args->nr_local, causing order >= MAX_ORDER condition. The exact value 0x40001 can be checked with UIO_MAXIOV which is 0x400. So for kcalloc() 0x400 iovecs with sizeof(struct rds_iovec) = 0x10 is the closest limit, with 0x10 leftover. Same condition is currently done in rds_cmsg_rdma_args(). [1] WARNING: mm/page_alloc.c:5011 [..] Call Trace: alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267 alloc_pages include/linux/gfp.h:547 [inline] kmalloc_order+0x2e/0xb0 mm/slab_common.c:837 kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853 kmalloc_array include/linux/slab.h:592 [inline] kcalloc include/linux/slab.h:621 [inline] rds_rdma_extra_size+0xb2/0x3b0 net/rds/rdma.c:568 rds_rm_size net/rds/send.c:928 [inline] Reported-by:
<syzbot+1bd2b07f93745fa38425@syzkaller.appspotmail.com> Signed-off-by:
Sabyrzhan Tasbolatov <snovitoll@gmail.com> Acked-by:
Santosh Shilimkar <santosh.shilimkar@oracle.com> Link: https://lore.kernel.org/r/20210201203233.1324704-1-snovitoll@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
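
The fix is a bounds check on the user-supplied iovec count before the allocation whose size it drives. A small stand-alone C sketch of that pattern: UIO_MAXIOV and the 16-byte iovec size come from the message above, while the function, struct and error-code choice are simplified stand-ins for rds_rdma_extra_size().

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define UIO_MAXIOV 1024          /* 0x400, the kernel-wide iovec limit */

  struct rds_iovec {               /* 16 bytes, as in the commit message */
      uint64_t addr;
      uint64_t bytes;
  };

  static int rdma_extra_size(uint64_t nr_local)
  {
      struct rds_iovec *iovs;

      /* a user-controlled 0x40001 entries used to reach the allocation
       * directly and trip the order >= MAX_ORDER warning; cap it first
       * (the exact error code here is illustrative) */
      if (nr_local == 0 || nr_local > UIO_MAXIOV)
          return -EMSGSIZE;

      iovs = calloc(nr_local, sizeof(*iovs));
      if (!iovs)
          return -ENOMEM;

      /* ... walk the iovecs and accumulate the extra size ... */
      free(iovs);
      return 0;
  }

  int main(void)
  {
      printf("nr_local=0x400:   %d\n", rdma_extra_size(0x400));
      printf("nr_local=0x40001: %d\n", rdma_extra_size(0x40001)); /* rejected */
      return 0;
  }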
-
NeilBrown authored
stable inclusion from linux-4.19.177 commit 7436ceb2b78f334e292d72bb3d2ca4d5deb367df -------------------------------- commit af8085f3a4712c57d0dd415ad543bac85780375c upstream. The sctp transport seq_file iterators take a reference to the transport in the ->start and ->next functions and releases the reference in the ->show function. The preferred handling for such resources is to release them in the subsequent ->next or ->stop function call. Since Commit 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak references. So move the sctp_transport_put() call to ->next and ->stop. Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface") Reported-by:
Xin Long <lucien.xin@gmail.com> Signed-off-by:
NeilBrown <neilb@suse.de> Acked-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Florian Westphal authored
stable inclusion from linux-4.19.177 commit 9f7c3c838aed35c1aa54e1eaf3430ed018e578d5 -------------------------------- [ Upstream commit 07998281c268592963e1cd623fe6ab0270b65ae4 ] The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones' where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided. Fixes: 4e35c1cb ("netfilter: nf_nat: skip nat clash resolution for same-origin entries") Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Sven Auhagen authored
stable inclusion from linux-4.19.177 commit 99109999f7c63ef833deaa8ebaa8e5a7bc6de15c
--------------------------------
[ Upstream commit 8d6bca156e47d68551750a384b3ff49384c67be3 ]

When updating the tcp or udp header checksum on port nat, the function inet_proto_csum_replace2 was called with the last parameter, pseudohdr, as true. This leads to an error when GRO is used and packets are split up again in GSO: the tcp or udp checksum of all packets is incorrect. The error is probably masked by the fact that most network drivers implement tcp/udp checksum offloading, and it only happens when GRO is applied, not on single packets. The error is most visible when using a pppoe connection, which does not trigger the tcp/udp checksum offload.

Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
Signed-off-by:
Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Jozsef Kadlecsik authored
stable inclusion from linux-4.19.177 commit 28198be7e5f50d66a60ccf27fe518f55813e1b32 -------------------------------- [ Upstream commit b1bdde33b72366da20d10770ab7a49fe87b5e190 ] When both --reap and --update flag are specified, there's a code path at which the entry to be updated is reaped beforehand, which then leads to kernel crash. Reap only entries which won't be updated. Fixes kernel bugzilla #207773. Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773 Reported-by:
Reindl Harald <h.reindl@thelounge.net> Fixes: 0079c5ae ("netfilter: xt_recent: add an entry reaper") Signed-off-by:
Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Cong Wang authored
stable inclusion from linux-4.19.176 commit fac13c1dd417f7d117c28d4929175c3b05820234 -------------------------------- [ Upstream commit afbc293add6466f8f3f0c3d944d85f53709c170f ] xfrm_probe_algs() probes kernel crypto modules and changes the availability of struct xfrm_algo_desc. But there is a small window where ealg->available and aalg->available get changed between count_ah_combs()/count_esp_combs() and dump_ah_combs()/dump_esp_combs(), in this case we may allocate a smaller skb but later put a larger amount of data and trigger the panic in skb_put(). Fix this by relaxing the checks when counting the size, that is, skipping the test of ->available. We may waste some memory for a few of sizeof(struct sadb_comb), but it is still much better than a panic. Reported-by:
<syzbot+b2bf2652983d23734c5c@syzkaller.appspotmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Cong Wang <cong.wang@bytedance.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Vadim Fedorenko authored
stable inclusion from linux-4.19.175 commit 9c7b01fc3e22c49466c1fd0a9024bfda6b76440e
--------------------------------
commit 28e104d00281ade30250b24e098bf50887671ea4 upstream.

dev->hard_header_len for a tunnel interface is set only when header_ops are set too, and it already contains the full overhead of any tunnel encapsulation. That's why there is no need to count this overhead twice in the mtu calculation.

Fixes: fdafed45 ("ip_gre: set dev->hard_header_len and dev->needed_headroom properly")
Reported-by:
Slava Bacherikov <mail@slava.cc> Signed-off-by:
Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1611959267-20536-1-git-send-email-vfedorenko@novek.ru Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Eric Dumazet authored
stable inclusion from linux-4.19.174 commit 69874c3152adac06d04360d40e1ea775788a2c12 -------------------------------- commit dd5e073381f2ada3630f36be42833c6e9c78b75e upstream syzbot report reminded us that very big ewma_log were supported in the past, even if they made litle sense. tc qdisc replace dev xxx root est 1sec 131072sec ... While fixing the bug, also add boundary checks for ewma_log, in line with range supported by iproute2. UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 Fixes: 1c0d32fd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by:
Eric Dumazet <edumazet@google.com> Reported-by:
syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> [sudip: adjust context] Signed-off-by:
Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
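
The bounds check added by the patch is simple parameter validation before the estimator is armed. A hedged sketch of that check; the exact accepted range used below (1..30) follows the "in line with range supported by iproute2" remark above but should be read as illustrative.

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>

  struct tc_estimator {        /* simplified: the real struct lives in the uapi headers */
      int8_t  interval;
      uint8_t ewma_log;
  };

  static int validate_estimator(const struct tc_estimator *parm)
  {
      /* the averaging code shifts by amounts derived from ewma_log; 0 or
       * anything >= 31 produces the negative/oversized shift exponent that
       * UBSAN reported */
      if (parm->ewma_log == 0 || parm->ewma_log >= 31)
          return -EINVAL;
      return 0;
  }

  int main(void)
  {
      struct tc_estimator sane = { .interval = 0, .ewma_log = 5 };
      struct tc_estimator huge = { .interval = 0, .ewma_log = 31 };

      printf("ewma_log=5:  %d\n", validate_estimator(&sane));  /* 0 */
      printf("ewma_log=31: %d\n", validate_estimator(&huge));  /* -EINVAL */
      return 0;
  }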
-
Pengcheng Yang authored
stable inclusion from linux-4.19.173 commit dc7bc439a8d248686437f710a6eeb864a6f3d588 -------------------------------- commit 62d9f1a6945ba69c125e548e72a36d203b30596e upstream. Upon receiving a cumulative ACK that changes the congestion state from Disorder to Open, the TLP timer is not set. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this is that the TLP timer is set before the congestion state changes in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. This commit has two additional benefits: 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to avoid spurious RTO caused by RTO timer early expires. 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder timer is set. Fixes: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.com Signed-off-by:
Pengcheng Yang <yangpc@wangsu.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Acked-by:
Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Roi Dayan authored
stable inclusion from linux-4.19.173 commit 8cbe9e3567ab4dea3e05e8cf791b5f6e6fdf8c9c -------------------------------- [ Upstream commit 487c6ef81eb98d0a43cb08be91b1fcc9b4250626 ] When we create the ft object we also init rhltable in ft->fgs_hash. So in error flow before kfree of ft we need to destroy that rhltable. Fixes: 693c6883 ("net/mlx5: Add hash table for flow groups in flow table") Signed-off-by:
Roi Dayan <roid@nvidia.com> Reviewed-by:
Maor Dickman <maord@nvidia.com> Signed-off-by:
Saeed Mahameed <saeedm@nvidia.com> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
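
The bug is a classic missed teardown on an error path: the object embeds a secondary structure that is initialized after allocation, so the error path must undo that init before freeing. A generic, self-contained C sketch of the pattern; the names are invented, and in the driver the secondary structure is the rhltable in ft->fgs_hash.

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct hash_tbl { int initialized; };

  static int  hash_tbl_init(struct hash_tbl *h)    { h->initialized = 1; return 0; }
  static void hash_tbl_destroy(struct hash_tbl *h) { h->initialized = 0; }

  struct flow_table {
      struct hash_tbl fgs_hash;    /* initialized as part of table creation */
  };

  static struct flow_table *create_flow_table(int fail_later_step)
  {
      struct flow_table *ft;
      int err;

      ft = calloc(1, sizeof(*ft));
      if (!ft)
          return NULL;

      err = hash_tbl_init(&ft->fgs_hash);
      if (err)
          goto free_ft;

      if (fail_later_step) {       /* e.g. a later setup step fails */
          err = -EIO;
          goto destroy_hash;       /* the missing unwind step the patch adds */
      }
      return ft;

  destroy_hash:
      hash_tbl_destroy(&ft->fgs_hash);
  free_ft:
      free(ft);
      return NULL;
  }

  int main(void)
  {
      struct flow_table *ft = create_flow_table(1);
      printf("creation %s\n", ft ? "succeeded" : "failed and fully unwound");
      free(ft);
      return 0;
  }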
-
Eyal Birger authored
stable inclusion from linux-4.19.173 commit 5292ab517275e62042e479aebec9755196643001 -------------------------------- [ Upstream commit 9f8550e4bd9d78a8436c2061ad2530215f875376 ] The disable_xfrm flag signals that xfrm should not be performed during routing towards a device before reaching device xmit. For xfrm interfaces this is usually desired as they perform the outbound policy lookup as part of their xmit using their if_id. Before this change enabling this flag on xfrm interfaces prevented them from xmitting as xfrm_lookup_with_ifid() would not perform a policy lookup in case the original dst had the DST_NOXFRM flag. This optimization is incorrect when the lookup is done by the xfrm interface xmit logic. Fix by performing policy lookup when invoked by xfrmi as if_id != 0. Similarly it's unlikely for the 'no policy exists on net' check to yield any performance benefits when invoked from xfrmi. Fixes: f203b76d ("xfrm: Add virtual xfrm interfaces") Signed-off-by:
Eyal Birger <eyal.birger@gmail.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
-
Shmulik Ladkani authored
stable inclusion from linux-4.19.173 commit 6790d6f3cb46d6212e9cabfc82d8be0a34cb08bd -------------------------------- [ Upstream commit 56ce7c25ae1525d83cf80a880cf506ead1914250 ] When setting xfrm replay_window to values higher than 32, a rare page-fault occurs in xfrm_replay_advance_bmp: BUG: unable to handle page fault for address: ffff8af350ad7920 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD ad001067 P4D ad001067 PUD 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 30 Comm: ksoftirqd/3 Kdump: loaded Not tainted 5.4.52-050452-generic #202007160732 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:xfrm_replay_advance_bmp+0xbb/0x130 RSP: 0018:ffffa1304013ba40 EFLAGS: 00010206 RAX: 000000000000010d RBX: 0000000000000002 RCX: 00000000ffffff4b RDX: 0000000000000018 RSI: 00000000004c234c RDI: 00000000ffb3dbff RBP: ffffa1304013ba50 R08: ffff8af330ad7920 R09: 0000000007fffffa R10: 0000000000000800 R11: 0000000000000010 R12: ffff8af29d6258c0 R13: ffff8af28b95c700 R14: 0000000000000000 R15: ffff8af29d6258fc FS: 0000000000000000(0000) GS:ffff8af339ac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8af350ad7920 CR3: 0000000015ee4000 CR4: 00000000001406e0 Call Trace: xfrm_input+0x4e5/0xa10 xfrm4_rcv_encap+0xb5/0xe0 xfrm4_udp_encap_rcv+0x140/0x1c0 Analysis revealed offending code is when accessing: replay_esn->bmp[nr] |= (1U << bitnr); with 'nr' being 0x07fffffa. This happened in an SMP system when reordering of packets was present; A packet arrived with a "too old" sequence number (outside the window, i.e 'diff > replay_window'), and therefore the following calculation: bitnr = replay_esn->replay_window - (diff - pos); yields a negative result, but since bitnr is u32 we get a large unsigned quantity (in crash dump above: 0xffffff4b seen in ecx). This was supposed to be protected by xfrm_input()'s former call to: if (x->repl->check(x, skb, seq)) { However, the state's spinlock x->lock is *released* after '->check()' is performed, and gets re-acquired before '->advance()' - which gives a chance for a different core to update the xfrm state, e.g. by advancing 'replay_esn->seq' when it encounters more packets - leading to a 'diff > replay_window' situation when original core continues to xfrm_replay_advance_bmp(). An attempt to fix this issue was suggested in commit bcf66bf5 ("xfrm: Perform a replay check after return from async codepaths"), by calling 'x->repl->recheck()' after lock is re-acquired, but fix applied only to asyncronous crypto algorithms. Augment the fix, by *always* calling 'recheck()' - irrespective if we're using async crypto. Fixes: 0ebea8ef ("[IPSEC]: Move state lock into x->type->input") Signed-off-by:
Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Sasha Levin <sashal@kernel.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
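
To make the unsigned wraparound concrete, here is a small self-contained userspace demonstration; the input values are illustrative, chosen only so the result matches the 0xffffff4b / 0x07fffffa values seen in the dump, not values taken from the actual incident.

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
            /* Illustrative values: another core advanced replay_esn->seq after
             * check() passed, so by the time advance() ran, diff exceeded the
             * replay window.
             */
            uint32_t replay_window = 128;
            uint32_t diff = 310;            /* diff > replay_window: "too old" */
            uint32_t pos = 1;

            /* Same expression as in xfrm_replay_advance_bmp(): mathematically
             * negative, but in u32 arithmetic it wraps to a huge value.
             */
            uint32_t bitnr = replay_window - (diff - pos);
            uint32_t nr = bitnr >> 5;       /* 32-bit word index into bmp[] */

            printf("bitnr = 0x%08" PRIx32 ", nr = 0x%08" PRIx32 "\n", bitnr, nr);
            /* prints bitnr = 0xffffff4b, nr = 0x07fffffa: far outside bmp[] */
            return 0;
    }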
-
Pablo Neira Ayuso authored
stable inclusion from linux-4.19.173
commit 95b1e6e1ffe99f4a69e2decf4b37ab63c3a06372
--------------------------------

commit 0c5b7a501e7400869ee905b4f7af3d6717802bcb upstream.

Otherwise, the newly created element shows no timeout when listing the ruleset. If the set definition does not specify a default timeout, then the set element only shows the expiration time, but not the timeout. This is a problem when restoring a stateful ruleset listing, since it skips the timeout policy entirely.

Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
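
A minimal sketch of the intended behaviour, using purely hypothetical names (struct elem and dynset_new_elem() are illustrations, not the nf_tables API): when the set definition carries no default timeout, the per-rule timeout must still be stored on the newly added element; otherwise a listing only shows the expiration, and a dump/restore cycle loses the timeout.

    #include <stdint.h>

    struct elem {
            uint64_t timeout_ms;    /* what the ruleset listing prints as "timeout" */
            uint64_t expires_ms;    /* absolute expiration time */
    };

    static void dynset_new_elem(struct elem *e, uint64_t set_default_timeout_ms,
                                uint64_t rule_timeout_ms, uint64_t now_ms)
    {
            uint64_t timeout = rule_timeout_ms ? rule_timeout_ms
                                               : set_default_timeout_ms;

            e->expires_ms = now_ms + timeout;
            /* The fix amounts to this assignment also happening when the set
             * itself has no default timeout, so the element keeps its own
             * timeout value and survives a save/restore of the listing. */
            e->timeout_ms = timeout;
    }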
-
Jakub Kicinski authored
stable inclusion from linux-4.19.170
commit 11e36dcef44e6c7ab269f79657ff3d2db9a82c15
--------------------------------

[ Upstream commit 47e4bb147a96f1c9b4e7691e7e994e53838bfff8 ]

We need to unregister the netdevice if config failed; .ndo_uninit takes care of most of the heavy lifting.

This was uncovered by recent commit c269a24ce057 ("net: make free_netdev() more lenient with unregistering devices"). Previously the partially-initialized device would be left in the system.

Reported-and-tested-by:
<syzbot+2393580080a2da190f04@syzkaller.appspotmail.com> Fixes: e2f1f072 ("sit: allow to configure 6rd tunnels via netlink") Acked-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20210114012947.2515313-1-kuba@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
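
A minimal sketch of the error-handling pattern, with the tunnel-specific details elided (example_newlink() and configure_tunnel_params() are hypothetical names; register_netdevice() and unregister_netdevice_queue() are the real kernel primitives): once the device is registered, a later configuration failure must unregister it again rather than just returning the error.

    static int example_newlink(struct net_device *dev)
    {
            int err;

            err = register_netdevice(dev);
            if (err)
                    return err;

            err = configure_tunnel_params(dev);     /* hypothetical 6rd/FOU setup */
            if (err) {
                    /* Do not leave a half-initialised device in the system;
                     * ->ndo_uninit does most of the teardown work. */
                    unregister_netdevice_queue(dev, NULL);
                    return err;
            }

            return 0;
    }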
-
Willem de Bruijn authored
stable inclusion from linux-4.19.170
commit d4ede0a453cb2658a72dfed7572415c5366cdf4d
--------------------------------

[ Upstream commit 9bd6b629c39e3fa9e14243a6d8820492be1a5b2e ]

esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for the esp trailer. It accesses the page with kmap_atomic to handle highmem. But skb_page_frag_refill can return compound pages, of which kmap_atomic only maps the first underlying page.

skb_page_frag_refill does not return highmem, because the flag __GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP. That also does not call kmap_atomic, but directly uses page_address, in skb_copy_to_page_nocache. Do the same for ESP.

This issue has become easier to trigger with the recent kmap local debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.

Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by:
Willem de Bruijn <willemb@google.com> Acked-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
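
A minimal sketch of the idea (the helper name esp_trailer_vaddr() is hypothetical): because the fragment is allocated without __GFP_HIGHMEM, its page is always in the linear mapping, so page_address() is valid for every subpage of a compound page, whereas kmap_atomic() would only map the first one.

    static void *esp_trailer_vaddr(struct page_frag *pfrag)
    {
            /* Valid for any lowmem page, including compound pages;
             * kmap_atomic() on a compound page maps only its first subpage. */
            return (char *)page_address(pfrag->page) + pfrag->offset;
    }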
-
Baptiste Lepers authored
stable inclusion from linux-4.19.170
commit 4669452b4c66268a184a51c543b525533013c14e
--------------------------------

[ Upstream commit fd2ddef043592e7de80af53f47fa46fd3573086e ]

reuse->socks[] is modified concurrently by reuseport_add_sock. To prevent reading values that have not been fully initialized, only read the array up until the last known safe index, instead of incorrectly re-reading the last index of the array.

Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
Signed-off-by:
Baptiste Lepers <baptiste.lepers@gmail.com> Acked-by:
Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
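
A simplified sketch of the pattern, condensed from the shape of reuseport_select_sock() and not the verbatim kernel code: snapshot the published count once and bound the whole scan by that snapshot, so a slot that a concurrent reuseport_add_sock() has already counted but not yet populated is never dereferenced.

    static struct sock *select_by_hash(struct sock_reuseport *reuse, u32 hash)
    {
            unsigned int socks = READ_ONCE(reuse->num_socks);       /* snapshot */
            unsigned int i, j;

            if (!socks)
                    return NULL;

            smp_rmb();                      /* pairs with smp_wmb() in the writer */

            i = j = reciprocal_scale(hash, socks);
            while (reuse->socks[i]->sk_state == TCP_ESTABLISHED) {
                    if (++i >= socks)       /* bound by the snapshot, not num_socks */
                            i = 0;
                    if (i == j)
                            return NULL;    /* wrapped around without a candidate */
            }
            return reuse->socks[i];
    }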
-
Martin Willi authored
stable inclusion from linux-4.19.158
commit 200c2f68407601b1bed3dba5e027f34cd284ec92
--------------------------------

[ Upstream commit 9e2b7fa2 ]

VRF devices use an optimized direct path on output if a default qdisc is involved, calling Netfilter hooks directly. This path, however, does not consider Netfilter rules completing asynchronously, such as with NFQUEUE. The Netfilter okfn() is called for asynchronously accepted packets, but the VRF never passes that packet down the stack to send it out over the slave device.

Using the slower redirect path for this does not seem feasible, as we do not know beforehand whether a Netfilter hook has asynchronously completing rules.

Fix the use of asynchronously completing Netfilter rules in OUTPUT and POSTROUTING by using a special completion function that additionally calls dst_output() to pass the packet down the stack. Also, slightly adjust the use of nf_reset_ct() so that it is called in the asynchronous case, too.

Fixes: dcdd43c4 ("net: vrf: performance improvements for IPv4")
Fixes: a9ec54d1 ("net: vrf: performance improvements for IPv6")
Signed-off-by:
Martin Willi <martin@strongswan.org> Link: https://lore.kernel.org/r/20201106073030.3974927-1-martin@strongswan.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Aichun Li <liaichun@huawei.com> Reviewed-by:
wangxiaopeng <wangxiaopeng7@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
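
A simplified sketch of the approach (the function names approximate the vrf helpers rather than naming the exact upstream symbols): the okfn handed to NF_HOOK() now finishes the direct path by calling dst_output(), so the packet is sent down the stack whether the verdict is returned synchronously or delivered later by an asynchronously completing target such as NFQUEUE.

    static int vrf_out_direct_finish(struct net *net, struct sock *sk,
                                     struct sk_buff *skb)
    {
            /* Runs both when the hook accepts synchronously and when an
             * asynchronously completing rule reinjects the packet: either
             * way, keep passing it down the stack. */
            return dst_output(net, sk, skb);
    }

    static int vrf_out_direct(struct net *net, struct sock *sk,
                              struct sk_buff *skb, struct net_device *dev)
    {
            nf_reset_ct(skb);       /* nf_reset() in 4.19; now done on the async path too */

            return NF_HOOK(NFPROTO_IPV4, NF_INET_POST_ROUTING,
                           net, sk, skb, NULL, dev,
                           vrf_out_direct_finish);
    }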
-
Dong Kai authored
hulk inclusion
category: bugfix
bugzilla: 50425
CVE: NA

---------------------------

jump_label_apply_nops() is used to replace a module's default nops with ideal nops. It must be called only once per module. The current logic is incorrect, as the call is made per klp_object: if a klp patch contains multiple klp_objects to be patched, jump_label_apply_nops() is called more than once, which leads to the following crash:

  livepatch_scsi_test: loading out-of-tree module taints kernel.
  livepatch_scsi_test: tainting kernel with TAINT_LIVEPATCH
  jump_label: Fatal kernel bug, unexpected op at patch_exit+0x0/0x38 [livepatch_scsi_test] [(____ptrval____)] (66 66 66 66 90) 80
  ------------[ cut here ]------------
  kernel BUG at arch/x86/kernel/jump_label.c:37!
  invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 116 Comm: insmod Tainted: G OE K --------- - - 4.18.0+ #13
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:bug_at+0x1d/0x20
  Code: 02 ba d1 04 00 00 ee c3 90 90 90 90 90 66 66 66 66 90 41 89 f0 48 89 f9 48 89 fa 48 89 fe 48 c7 c7 98 79 c6 9d e8 01 79 0f 00 <0f> 0b 90 66 66 66 66 90 48 83 ec 18 48 8b 3f 65 48 8b 04 25 28
  RSP: 0018:ffffb074c0563bb0 EFLAGS: 00000282
  RAX: 000000000000007f RBX: ffffffffc036f018 RCX: ffffffff9de5abc8
  RDX: 0000000000000000 RSI: 0000000000000082 RDI: 0000000000000246
  RBP: ffffffffc036f018 R08: 0000000000000175 R09: 20363628205d295f
  R10: ffffa00743f8b580 R11: 3038202930392036 R12: ffffa00743f7fb40
  R13: ffffffffc036f100 R14: ffffa00743f22160 R15: 0000000000000001
  FS:  0000000001a088c0(0000) GS:ffffa00747800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000001a0f000 CR3: 0000000003fb6000 CR4: 00000000000006f0
  Call Trace:
   __jump_label_transform.isra.0+0x50/0x130
   jump_label_apply_nops+0x64/0x70
   klp_register_patch+0x55b/0x6f0
   ? _cond_resched+0x15/0x40
   ? kmem_cache_alloc_trace+0x1a6/0x1c0
   ? patch_free_scaffold+0xa7/0xbf [livepatch_scsi_test]
   patch_init+0x43e/0x1000 [livepatch_scsi_test]
   ? 0xffffffffc037f000
   do_one_initcall+0x46/0x1c8
   ? free_unref_page_commit+0x95/0x120
   ? _cond_resched+0x15/0x40
   ? kmem_cache_alloc_trace+0x3e/0x1c0
   do_init_module+0x5b/0x1fc
   load_module+0x15b1/0x1e50
   ? vmap_page_range_noflush+0x33f/0x4a0
   ? __do_sys_init_module+0x167/0x1a0
   __do_sys_init_module+0x167/0x1a0
   do_syscall_64+0x5b/0x1b0
   entry_SYSCALL_64_after_hwframe+0x65/0xca
  RIP: 0033:0x4d63a9

To solve this, move the jump_label_apply_nops() call into klp_init_patch(), so it runs once per patch module.

Fixes: 61ddd2126320 ("livepatch/core: support jump_label")
Signed-off-by:
Dong Kai <dongkai11@huawei.com> Reviewed-by:
Yang Jihong <yangjihong1@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Cheng Jian <cj.chengjian@huawei.com>
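
A simplified sketch of the fix (the body of klp_init_patch() is heavily elided and the hulk-specific scaffolding is omitted): the nop replacement moves out of the per-object path and is performed exactly once for the patch module.

    static int klp_init_patch(struct klp_patch *patch)
    {
            struct klp_object *obj;
            int ret;

            /* ... existing patch setup ... */

            klp_for_each_object(patch, obj) {
                    ret = klp_init_object(patch, obj);      /* no longer applies nops */
                    if (ret)
                            return ret;
            }

            /* Exactly once per patch module; rewriting the same jump entry a
             * second time triggers the "unexpected op" BUG shown above. */
            jump_label_apply_nops(patch->mod);

            return 0;
    }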
-