Skip to content
Snippets Groups Projects
  1. Sep 22, 2022
    • Borislav Petkov's avatar
      x86: Fix early boot crash on gcc-10, third try · 99b180a1
      Borislav Petkov authored
      mainline inclusion
      from mainline-v5.7
      commit a9a3ed1e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5Q0UG?from=project-issue
      
      
      CVE: NA
      
      ---------------------------
      
      ... or the odyssey of trying to disable the stack protector for the
      function which generates the stack canary value.
      
      The whole story started with Sergei reporting a boot crash with a kernel
      built with gcc-10:
      
        Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 #139
        Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
        Call Trace:
          dump_stack
          panic
          ? start_secondary
          __stack_chk_fail
          start_secondary
          secondary_startup_64
        -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary
      
      This happens because gcc-10 tail-call optimizes the last function call
      in start_secondary() - cpu_startup_entry() - and thus emits a stack
      canary check which fails because the canary value changes after the
      boot_init_stack_canary() call.
      
      To fix that, the initial attempt was to mark the one function which
      generates the stack canary with:
      
        __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)
      
      however, using the optimize attribute doesn't work cumulatively
      as the attribute does not add to but rather replaces previously
      supplied optimization options - roughly all -fxxx options.
      
      The key one among them being -fno-omit-frame-pointer and thus leading to
      not present frame pointer - frame pointer which the kernel needs.
      
      The next attempt to prevent compilers from tail-call optimizing
      the last function call cpu_startup_entry(), shy of carving out
      start_secondary() into a separate compilation unit and building it with
      -fno-stack-protector, was to add an empty asm("").
      
      This current solution was short and sweet, and reportedly, is supported
      by both compilers but we didn't get very far this time: future (LTO?)
      optimization passes could potentially eliminate this, which leads us
      to the third attempt: having an actual memory barrier there which the
      compiler cannot ignore or move around etc.
      
      That should hold for a long time, but hey we said that about the other
      two solutions too so...
      
      Reported-by: default avatarSergei Trofimovich <slyfox@gentoo.org>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Tested-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
      
      
      
      Signed-off-by: default avatartangbin <tangbin_yewu@cmss.chinamobile.com>
      99b180a1
  2. Jul 21, 2022
  3. May 23, 2022
  4. Apr 12, 2022
    • Daniel Borkmann's avatar
      bpf: Add kconfig knob for disabling unpriv bpf by default · 25c90cb2
      Daniel Borkmann authored
      
      stable inclusion
      from linux-4.19.230
      commit 07e7f7cc619d15645e45d04b1c99550c6d292e9c
      
      --------------------------------
      
      commit 08389d888287c3823f80b0216766b71e17f0aba5 upstream.
      
      Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
      If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
      
      This still allows a transition of 2 -> {0,1} through an admin. Similarly,
      this also still keeps 1 -> {1} behavior intact, so that once set to permanently
      disabled, it cannot be undone aside from a reboot.
      
      We've also added extra2 with max of 2 for the procfs handler, so that an admin
      still has a chance to toggle between 0 <-> 2.
      
      Either way, as an additional alternative, applications can make use of CAP_BPF
      that we added a while ago.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.162...
      25c90cb2
  5. Feb 08, 2022
  6. Nov 29, 2021
  7. Oct 30, 2021
  8. Oct 29, 2021
  9. Jul 19, 2021
    • Mark Rutland's avatar
      pid: take a reference when initializing `cad_pid` · 79663d99
      Mark Rutland authored
      stable inclusion
      from linux-4.19.194
      commit d106f05432e60f9f62d456ef017687f5c73cb414
      
      --------------------------------
      
      commit 0711f0d7050b9e07c44bc159bbc64ac0a1022c7f upstream.
      
      During boot, kernel_init_freeable() initializes `cad_pid` to the init
      task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
      when this happens proc_do_cad_pid() will increment the refcount on the
      new pid via get_pid(), and will decrement the refcount on the old pid
      via put_pid().  As we never called get_pid() when we initialized
      `cad_pid`, we decrement a reference we never incremented, can therefore
      free the init task's struct pid early.  As there can be dangling
      references to the struct pid, we can later encounter a use-after-free
      (e.g.  when delivering signals).
      
      This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
      have been around since the conversion of `cad_pid` to struct pid in
      commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the
      pre-KASAN stone age of v2.6.19.
      
      Fix this by getting a reference to the init task's struct pid when we
      assign it to `cad_pid`.
      
      Full KASAN splat below.
      
         ==================================================================
         BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
         BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
         Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
      
         CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
         Hardware name: linux,dummy-virt (DT)
         Call trace:
          ns_of_pid include/linux/pid.h:153 [inline]
          task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
          do_notify_parent+0x308/0xe60 kernel/signal.c:1950
          exit_notify kernel/exit.c:682 [inline]
          do_exit+0x2334/0x2bd0 kernel/exit.c:845
          do_group_exit+0x108/0x2c8 kernel/exit.c:922
          get_signal+0x4e4/0x2a88 kernel/signal.c:2781
          do_signal arch/arm64/kernel/signal.c:882 [inline]
          do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
          work_pending+0xc/0x2dc
      
         Allocated by task 0:
          slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
          slab_alloc_node mm/slub.c:2907 [inline]
          slab_alloc mm/slub.c:2915 [inline]
          kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
          alloc_pid+0xdc/0xc00 kernel/pid.c:180
          copy_process+0x2794/0x5e18 kernel/fork.c:2129
          kernel_clone+0x194/0x13c8 kernel/fork.c:2500
          kernel_thread+0xd4/0x110 kernel/fork.c:2552
          rest_init+0x44/0x4a0 init/main.c:687
          arch_call_rest_init+0x1c/0x28
          start_kernel+0x520/0x554 init/main.c:1064
          0x0
      
         Freed by task 270:
          slab_free_hook mm/slub.c:1562 [inline]
          slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
          slab_free mm/slub.c:3161 [inline]
          kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
          put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
          put_pid+0x30/0x48 kernel/pid.c:109
          proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
          proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
          proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
          call_write_iter include/linux/fs.h:1977 [inline]
          new_sync_write+0x3ac/0x510 fs/read_write.c:518
          vfs_write fs/read_write.c:605 [inline]
          vfs_write+0x9c4/0x1018 fs/read_write.c:585
          ksys_write+0x124/0x240 fs/read_write.c:658
          __do_sys_write fs/read_write.c:670 [inline]
          __se_sys_write fs/read_write.c:667 [inline]
          __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
          __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
          invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
          el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
          do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
          el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
          el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
          el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
      
         The buggy address belongs to the object at ffff23794dda0000
          which belongs to the cache pid of size 224
         The buggy address is located 4 bytes inside of
          224-byte region [ffff23794dda0000, ffff23794dda00e0)
         The buggy address belongs to the page:
         page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
         head:(____ptrval____) order:1 compound_mapcount:0
         flags: 0x3fffc0000010200(slab|head)
         raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
         raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
          ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
          ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
         ==================================================================
      
      Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
      
      
      Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid")
      Signed-off-by: default avatarMark Rutland <mark.rutland@arm.com>
      Acked-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      79663d99
  10. May 22, 2021
  11. May 08, 2021
  12. May 07, 2021
  13. Apr 30, 2021
  14. Apr 15, 2021
    • Jens Axboe's avatar
      io_uring: replace workqueue usage with io-wq · 2e064c50
      Jens Axboe authored
      mainline inclusion
      from mainline-5.5-rc1
      commit 561fb04a
      category: feature
      bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
      
      
      CVE: NA
      ---------------------------
      
      Drop various work-arounds we have for workqueues:
      
      - We no longer need the async_list for tracking sequential IO.
      
      - We don't have to maintain our own mm tracking/setting.
      
      - We don't need a separate workqueue for buffered writes. This didn't
        even work that well to begin with, as it was suboptimal for multiple
        buffered writers on multiple files.
      
      - We can properly cancel pending interruptible work. This fixes
        deadlocks with particularly socket IO, where we cannot cancel them
        when the io_uring is closed. Hence the ring will wait forever for
        these requests to complete, which may never happen. This is different
        from disk IO where we know requests will complete in a finite amount
        of time.
      
      - Due to being able to cancel work interruptible work that is already
        running, we can implement file table support for work. We need that
        for supporting system calls that add to a process file table.
      
      - It gets us one step closer to adding async support for any system
        call.
      
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      
      Conflicts:
      	fs/io_uring.c
      	[ Patch b5420237("mm: refactor readahead defines in mm.h")
      	  is not applied. ]
      
      Signed-off-by: default avatarZhihao Cheng <chengzhihao1@huawei.com>
      Signed-off-by: default avataryangerkun <yangerkun@huawei.com>
      Reviewed-by: default avatarzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: default avatarCheng Jian <cj.chengjian@huawei.com>
      2e064c50
    • Jens Axboe's avatar
      Add io_uring IO interface · a2491968
      Jens Axboe authored
      mainline inclusion
      from mainline-5.1-rc1
      commit 2b188cc1
      category: feature
      bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=27
      CVE: NA
      ---------------------------
      
      The submission queue (SQ) and completion queue (CQ) rings are shared
      between the application and the kernel. This eliminates the need to
      copy data back and forth to submit and complete IO.
      
      IO submissions use the io_uring_sqe data structure, and completions
      are generated in the form of io_uring_cqe data structures. The SQ
      ring is an index into the io_uring_sqe array, which makes it possible
      to submit a batch of IOs without them being contiguous in the ring.
      The CQ ring is always contiguous, as completion events are inherently
      unordered, and hence any io_uring_cqe entry can point back to an
      arbitrary submission.
      
      Two new system calls are added for this:
      
      io_uring_setup(entries, params)
      	Sets up an io_uring instance for doing async IO. On success,
      	returns a file descriptor that th...
      a2491968
  15. Apr 14, 2021
  16. Mar 11, 2021
    • Steven Rostedt (VMware)'s avatar
      fgraph: Initialize tracing_graph_pause at task creation · 9cc40bc5
      Steven Rostedt (VMware) authored
      stable inclusion
      from linux-4.19.176
      commit a19749a5fbd97f2147a8768813d084d584d9e045
      
      --------------------------------
      
      commit 7e0a9220467dbcfdc5bc62825724f3e52e50ab31 upstream.
      
      On some archs, the idle task can call into cpu_suspend(). The cpu_suspend()
      will disable or pause function graph tracing, as there's some paths in
      bringing down the CPU that can have issues with its return address being
      modified. The task_struct structure has a "tracing_graph_pause" atomic
      counter, that when set to something other than zero, the function graph
      tracer will not modify the return address.
      
      The problem is that the tracing_graph_pause counter is initialized when the
      function graph tracer is enabled. This can corrupt the counter for the idle
      task if it is suspended in these architectures.
      
         CPU 1				CPU 2
         -----				-----
        do_idle()
          cpu_suspend()
            pause_graph_tracing()
                task_struct->tracing_graph_pause++ (0 -> 1)
      
      				start_graph_tracing()
      				  for_each_online_cpu(cpu) {
      				    ftrace_graph_init_idle_task(cpu)
      				      task-struct->tracing_graph_pause = 0 (1 -> 0)
      
            unpause_graph_tracing()
                task_struct->tracing_graph_pause-- (0 -> -1)
      
      The above should have gone from 1 to zero, and enabled function graph
      tracing again. But instead, it is set to -1, which keeps it disabled.
      
      There's no reason that the field tracing_graph_pause on the task_struct can
      not be initialized at boot up.
      
      Cc: stable@vger.kernel.org
      Fixes: 380c4b14 ("tracing/function-graph-tracer: append the tracing_graph_flag")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211339
      
      
      Reported-by: default avatar <pierre.gondois@arm.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: default avatarCheng Jian <cj.chengjian@huawei.com>
      9cc40bc5
  17. Feb 22, 2021
  18. Sep 22, 2020
  19. Dec 27, 2019
  20. Dec 20, 2018
  21. Dec 17, 2018
    • Li Zhijian's avatar
      initramfs: clean old path before creating a hardlink · 2a5d5f5f
      Li Zhijian authored
      [ Upstream commit 7c0950d4 ]
      
      sys_link() can fail due to the new path already existing.  This case
      ofen occurs when we use a concated initrd, for example:
      
      1) prepare a basic rootfs, it contains a regular files rc.local
      lizhijian@:~/yocto-tiny-i386-2016-04-22$ cat etc/rc.local
       #!/bin/sh
       echo "Running /etc/rc.local..."
      yocto-tiny-i386-2016-04-22$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rootfs.cgz
      
      2) create a extra initrd which also includes a etc/rc.local
      lizhijian@:~/lkp-x86_64/etc$ echo "append initrd" >rc.local
      lizhijian@:~/lkp/lkp-x86_64/etc$ cat rc.local
      append initrd
      lizhijian@:~/lkp/lkp-x86_64/etc$ ln rc.local rc.local.hardlink
      append initrd
      lizhijian@:~/lkp/lkp-x86_64/etc$ stat rc.local rc.local.hardlink
        File: 'rc.local'
        Size: 14        	Blocks: 8          IO Block: 4096   regular file
      Device: 801h/2049d	Inode: 11296086    Links: 2
      Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
      Access: 2018-11-15 16:08:28.654464815 +0800
      Modify: 2018-11-15 16:07:57.514903210 +0800
      Change: 2018-11-15 16:08:24.180228872 +0800
       Birth: -
        File: 'rc.local.hardlink'
        Size: 14        	Blocks: 8          IO Block: 4096   regular file
      Device: 801h/2049d	Inode: 11296086    Links: 2
      Access: (0664/-rw-rw-r--)  Uid: ( 1002/lizhijian)   Gid: ( 1002/lizhijian)
      Access: 2018-11-15 16:08:28.654464815 +0800
      Modify: 2018-11-15 16:07:57.514903210 +0800
      Change: 2018-11-15 16:08:24.180228872 +0800
       Birth: -
      
      lizhijian@:~/lkp/lkp-x86_64$ find . | sed 's,^\./,,' | cpio -o -H newc | gzip -n -9 >../rc-local.cgz
      lizhijian@:~/lkp/lkp-x86_64$ gzip -dc ../rc-local.cgz | cpio -t
      .
      etc
      etc/rc.local.hardlink <<< it will be extracted first at this initrd
      etc/rc.local
      
      3) concate 2 initrds and boot
      lizhijian@:~/lkp$ cat rootfs.cgz rc-local.cgz >concate-initrd.cgz
      lizhijian@:~/lkp$ qemu-system-x86_64 -nographic -enable-kvm -cpu host -smp 1 -m 1024 -kernel ~/lkp/linux/arch/x86/boot/bzImage -append "console=ttyS0 earlyprint=ttyS0 ignore_loglevel" -initrd ./concate-initr.cgz -serial stdio -nodefaults
      
      In this case, sys_link(2) will fail and return -EEXIST, so we can only get
      the rc.local at rootfs.cgz instead of rc-local.cgz
      
      [akpm@linux-foundation.org: move code to avoid forward declaration]
      Link: http://lkml.kernel.org/r/1542352368-13299-1-git-send-email-lizhijian@cn.fujitsu.com
      
      
      Signed-off-by: default avatarLi Zhijian <lizhijian@cn.fujitsu.com>
      Cc: Philip Li <philip.li@intel.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Li Zhijian <zhijianx.li@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      2a5d5f5f
  22. Aug 24, 2018
  23. Aug 23, 2018
  24. Aug 18, 2018