  1. Jul 14, 2022
    • rcu/tree: Mark functions as notrace · 9b5e728e
      Zheng Yejian authored
      hulk inclusion
      category: bugfix
      bugzilla: 187209, https://gitee.com/openeuler/kernel/issues/I5GWFT
      CVE: NA
      
      --------------------------------
      
      Syzkaller reported a softlockup problem; see the following log:
        [   41.463870] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!  [ksoftirqd/0:9]
        [   41.509763] Modules linked in:
        [   41.512295] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.19.90 #13
        [   41.516134] Hardware name: linux,dummy-virt (DT)
        [   41.519182] pstate: 80c00005 (Nzcv daif +PAN +UAO)
        [   41.522415] pc : perf_trace_buf_alloc+0x138/0x238
        [   41.525583] lr : perf_trace_buf_alloc+0x138/0x238
        [   41.528656] sp : ffff8000c137e880
        [   41.531050] x29: ffff8000c137e880 x28: ffff20000850ced0
        [   41.534759] x27: 0000000000000000 x26: ffff8000c137e9c0
        [   41.538456] x25: ffff8000ce5c2ae0 x24: ffff200008358b08
        [   41.542151] x23: 0000000000000000 x22: ffff2000084a50ac
        [   41.545834] x21: ffff8000c137e880 x20: 000000000000001c
        [   41.549516] x19: ffff7dffbfdf88e8 x18: 0000000000000000
        [   41.553202] x17: 0000000000000000 x16: 0000000000000000
        [   41.556892] x15: 1ffff00036e07805 x14: 0000000000000000
        [   41.560592] x13: 0000000000000004 x12: 0000000000000000
        [   41.564315] x11: 1fffefbff7fbf120 x10: ffff0fbff7fbf120
        [   41.568003] x9 : dfff200000000000 x8 : ffff7dffbfdf8904
        [   41.571699] x7 : 0000000000000000 x6 : ffff0fbff7fbf121
        [   41.575398] x5 : ffff0fbff7fbf121 x4 : ffff0fbff7fbf121
        [   41.579086] x3 : ffff20000850cdc8 x2 : 0000000000000008
        [   41.582773] x1 : ffff8000c1376000 x0 : 0000000000000100
        [   41.586495] Call trace:
        [   41.588922]  perf_trace_buf_alloc+0x138/0x238
        [   41.591912]  perf_ftrace_function_call+0x1ac/0x248
        [   41.595123]  ftrace_ops_no_ops+0x3a4/0x488
        [   41.597998]  ftrace_graph_call+0x0/0xc
        [   41.600715]  rcu_dynticks_curr_cpu_in_eqs+0x14/0x70
        [   41.603962]  rcu_is_watching+0xc/0x20
        [   41.606635]  ftrace_ops_no_ops+0x240/0x488
        [   41.609530]  ftrace_graph_call+0x0/0xc
        [   41.612249]  __read_once_size_nocheck.constprop.0+0x1c/0x38
        [   41.615905]  unwind_frame+0x140/0x358
        [   41.618597]  walk_stackframe+0x34/0x60
        [   41.621359]  __save_stack_trace+0x204/0x3b8
        [   41.624328]  save_stack_trace+0x2c/0x38
        [   41.627112]  __kasan_slab_free+0x120/0x228
        [   41.630018]  kasan_slab_free+0x10/0x18
        [   41.632752]  kfree+0x84/0x250
        [   41.635107]  skb_free_head+0x70/0xb0
        [   41.637772]  skb_release_data+0x3f8/0x730
        [   41.640626]  skb_release_all+0x50/0x68
        [   41.643350]  kfree_skb+0x84/0x278
        [   41.645890]  kfree_skb_list+0x4c/0x78
        [   41.648595]  __dev_queue_xmit+0x1a4c/0x23a0
        [   41.651541]  dev_queue_xmit+0x28/0x38
        [   41.654254]  ip6_finish_output2+0xeb0/0x1630
        [   41.657261]  ip6_finish_output+0x2d8/0x7f8
        [   41.660174]  ip6_output+0x19c/0x348
        [   41.663850]  mld_sendpack+0x560/0x9e0
        [   41.666564]  mld_ifc_timer_expire+0x484/0x8a8
        [   41.669624]  call_timer_fn+0x68/0x4b0
        [   41.672355]  expire_timers+0x168/0x498
        [   41.675126]  run_timer_softirq+0x230/0x7a8
        [   41.678052]  __do_softirq+0x2d0/0xba0
        [   41.680763]  run_ksoftirqd+0x110/0x1a0
        [   41.683512]  smpboot_thread_fn+0x31c/0x620
        [   41.686429]  kthread+0x2c8/0x348
        [   41.688927]  ret_from_fork+0x10/0x18
      
      Looking into the above call stack, we found a recursive call through
      'ftrace_graph_call'; see this snippet:
          __read_once_size_nocheck.constprop.0
            ftrace_graph_call
              ......
                rcu_dynticks_curr_cpu_in_eqs
                  ftrace_graph_call
      
      Our analysis shows that 'rcu_dynticks_curr_cpu_in_eqs' should not be
      traceable, and we verified that marking the related functions as
      'notrace' avoids the problem.
      
      Comparing with the mainline kernel, we find that commit ff5c4f5c ("rcu/tree:
      Mark the idle relevant functions noinstr") marks the related functions as
      'noinstr', which implies notrace and noinline and places them in the
      .noinstr.text section.
      Link: https://lore.kernel.org/all/20200416114706.625340212@infradead.org/
      
      The 'noinstr' mechanism has not been introduced in this kernel yet, so we
      do not directly backport that commit (which would pull in more changes).
      Instead, we mark as 'notrace' the functions that are marked 'noinstr' in
      that commit.
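
      As an illustration only (this is not the exact hunk of this patch, and the
      function body is a simplified placeholder), marking a function 'notrace'
      in kernel C looks like this:

        /* 'notrace' keeps ftrace from instrumenting this function, so the
         * tracer can no longer re-enter itself via rcu_is_watching() as in
         * the call stack above. */
        static notrace bool rcu_dynticks_curr_cpu_in_eqs(void)
        {
                /* read the per-CPU dynticks state; details elided here */
                return false;
        }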
      
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      9b5e728e
  2. Jul 01, 2022
    • tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing · 48c3a5e2
      Steven Rostedt (VMware) authored
      stable inclusion
      from stable-5.10.50
      commit 0531e84bc8ac750b7a15c18d478804c0d98f0a86
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ETW1
      CVE: NA
      
      --------------------------------
      
      commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream.
      
      All internal use cases for tracepoint_probe_register() are set up to never
      be called with the same function and data. If they are, it is considered a
      bug, as it means the accounting of tracepoint handling is corrupted.
      If the function and data for a tracepoint are already registered when
      tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
      return with -EEXIST.
      
      The BPF system call can end up calling tracepoint_probe_register() with
      the same data, which means that a user-space process can now trigger the
      warning. As WARN_ON_ONCE() should not be called just because user space
      passed bad data to a system call, there needs to be a way to
      register a tracepoint without tri...
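
      A hedged sketch of how a BPF attach path might use the helper named in
      the title (the variable names here are illustrative, and the exact
      upstream signature may differ):

        /* tolerate an already-registered (probe, data) pair instead of
         * triggering WARN_ON_ONCE(); a duplicate from user space is not a bug */
        err = tracepoint_probe_register_may_exist(tp, (void *)bpf_func, prog);
        /* -EEXIST is now reported to the caller without a kernel warning */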
      48c3a5e2
    • swiotlb: skip swiotlb_bounce when orig_addr is zero · 8213da67
      Liu Shixin authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5EZK8
      CVE: NA
      
      --------------------------------
      
      After commit ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE"),
      swiotlb_bounce is called unconditionally in swiotlb_tbl_map_single.
      This requires the physical address to be valid, which is not always
      true on stable-4.19 or earlier versions.
      On stable-4.19, swiotlb_alloc_buffer calls swiotlb_tbl_map_single with
      orig_addr equal to zero, which causes a panic like this:
      
      Unable to handle kernel paging request at virtual address ffffb77a40000000
      ...
      pc : __memcpy+0x100/0x180
      lr : swiotlb_bounce+0x74/0x88
      ...
      Call trace:
       __memcpy+0x100/0x180
       swiotlb_tbl_map_single+0x2c8/0x338
       swiotlb_alloc+0xb4/0x198
       __dma_alloc+0x84/0x1d8
       ...
      
      On stable-4.9 and stable-4.14, swiotlb_alloc_coherent will call map_single
      with orig_addr equal to zero, which can cause the same panic.
      
      Fix this by skipping swiotlb_bounce when orig...
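
      A minimal sketch of the kind of guard the title describes (placement and
      exact condition are assumed, not taken from the actual hunk):

        /* nothing to copy when the mapping has no original buffer, e.g. when
         * swiotlb_alloc_buffer() maps with orig_addr == 0 */
        if (orig_addr == 0)
                return;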
      8213da67
  3. Apr 02, 2022
    • sched/fair: Add qos_throttle_list node in struct cfs_rq · fb59563c
      Zhang Qiao authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I50PPU
      CVE: NA
      
      -----------------------------------------------------------------
      
      When unthrottling a cfs_rq in distribute_cfs_runtime(), another CPU
      may re-throttle this cfs_rq in qos_throttle_cfs_rq() before
      cfs_rq->throttle_list.next is accessed. In that case, qos throttle
      attaches the cfs_rq's throttle_list node to the per-CPU
      qos_throttled_cfs_rq list, which changes cfs_rq->throttle_list.next and
      causes a panic or hard lockup in distribute_cfs_runtime().
      
      Fix it by adding a qos_throttle_list node to struct cfs_rq, so that qos
      throttle no longer uses cfs_rq->throttle_list.
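
      A sketch of the data-structure change described above (field placement
      and the qos usage shown are illustrative, not the exact hunk):

        struct cfs_rq {
                ...
                struct list_head        throttle_list;      /* generic CFS bandwidth */
                struct list_head        qos_throttle_list;  /* used only by qos throttle */
                ...
        };

        /* qos_throttle_cfs_rq() then links the dedicated node, leaving
         * throttle_list untouched for distribute_cfs_runtime(), e.g.:
         *   list_add(&cfs_rq->qos_throttle_list,
         *            &per_cpu(qos_throttled_cfs_rq, cpu));
         */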
      
      Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
      Reviewed-by: zheng zucheng <zhengzucheng@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      fb59563c
    • Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · 3e109690
      Linus Torvalds authored
      mainline inclusion
      from mainline-v5.18-rc1
      commit 901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      Halil Pasic points out [1] that instead of the full revert of that commit
      (the revert in bddac7c1e02b), a partial revert that only reverts the
      problematic case but still keeps some of the cleanups is probably better.

      That partial revert [2] had already been verified by Oleksandr
      Natalenko to also fix the issue; I had just missed that in the long
      discussion.
      
      So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb:
      rework "fix info leak with DMA_FROM_DEVICE""), and effectively only
      revert the part that caused problems.
      
      Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1]
      Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2]
      Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3]
      Suggested-by: Halil Pasic <pasic@linux.ibm.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3e109690
    • Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" · a5e62d73
      Linus Torvalds authored
      mainline inclusion
      from mainline-v5.18-rc1
      commit bddac7c1e02ba47f0570e494c9289acea3062cc1
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.
      
      It turns out this breaks at least the ath9k wireless driver, and
      possibly others.
      
      What the ath9k driver does on packet receive is to set up the DMA
      transfer with:
      
        int ath_rx_init(..)
        ..
                      bf->bf_buf_addr = dma_map_single(sc->dev, skb->data,
                                                       common->rx_bufsize,
                                                       DMA_FROM_DEVICE);
      
      and then the receive logic (through ath_rx_tasklet()) will fetch
      incoming packets
      
        static bool ath_edma_get_buffers(..)
        ..
              dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
      
              ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data);
              if (ret == -EINPROGRESS) {
                      /*let device gain the buffer again*/
                      dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
                                      common->rx_bufsize, DMA_FROM_DEVICE);
                      return false;
              }
      
      and it's worth noting how that first DMA sync:
      
          dma_sync_single_for_cpu(..DMA_FROM_DEVICE);
      
      is there to make sure the CPU can read the DMA buffer (possibly by
      copying it from the bounce buffer area, or by doing some cache flush).
      The iommu correctly turns that into a "copy from bounce buffer" so that
      the driver can look at the state of the packets.
      
      In the meantime, the device may continue to write to the DMA buffer, but
      we at least have a snapshot of the state due to that first DMA sync.
      
      But that _second_ DMA sync:
      
          dma_sync_single_for_device(..DMA_FROM_DEVICE);
      
      is telling the DMA mapping that the CPU wasn't interested in the area
      because the packet wasn't there.  In the case of a DMA bounce buffer,
      that is a no-op.
      
      Note how it's not a sync for the CPU (the "for_device()" part), and it's
      not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part).
      
      Or rather, it _should_ be a no-op.  That's what commit aa6f8dcbab47
      broke: it made the code bounce the buffer unconditionally, and changed
      the DMA_FROM_DEVICE to just unconditionally and illogically be
      DMA_TO_DEVICE.
      
      [ Side note: purely within the confines of the swiotlb driver it wasn't
        entirely illogical: The reason it did that odd DMA_FROM_DEVICE ->
        DMA_TO_DEVICE conversion thing is because inside the swiotlb driver,
        it uses just a swiotlb_bounce() helper that doesn't care about the
        whole distinction of who the sync is for - only which direction to
        bounce.
      
        So it took the "sync for device" to mean that the CPU must have been
        the one writing, and thought it meant DMA_TO_DEVICE. ]
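
      For reference, a rough sketch of the pre-aa6f8dcbab47 sync-for-device
      behaviour that this revert restores (simplified, not the exact code):

        /* sync for device: only push CPU-side writes out to the bounce buffer */
        if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
        /* DMA_FROM_DEVICE: the CPU wrote nothing, so there is nothing to bounce */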
      
      Also note how the commentary in that commit was wrong, probably due to
      that whole confusion, claiming that the commit makes the swiotlb code
      
                                        "bounce unconditionally (that is, also
          when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale
          data from the swiotlb buffer"
      
      which is nonsensical for two reasons:
      
       - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was
         exactly when it always did - and should do - the bounce.
      
       - since this is a sync for the device (not for the CPU), we're clearly
         fundamentally not copying back stale data from the bounce buffers at
         all, because we'd be copying *to* the bounce buffers.
      
      So that commit was just very confused.  It confused the direction of the
      synchronization (to the device, not the cpu) with the direction of the
      DMA (from the device).
      
      Reported-and-bisected-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Reported-by: Olha Cherevyk <olha.cherevyk@gmail.com>
      Cc: Halil Pasic <pasic@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kalle Valo <kvalo@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Toke Høiland-Jørgensen <toke@toke.dk>
      Cc: Maxime Bizon <mbizon@freebox.fr>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      a5e62d73
  4. Mar 23, 2022
    • swiotlb: rework "fix info leak with DMA_FROM_DEVICE" · 3f80e186
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc8
      commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      Unfortunately, we ended up merging an old version of the patch "fix info
      leak with DMA_FROM_DEVICE" instead of the latest one. After I pointed out
      the mix-up and asked for guidance, Christoph (the swiotlb maintainer)
      asked me to create an incremental fix. So here we go.
      
      The main differences between what we got and what was agreed are:
      * swiotlb_sync_single_for_device is also required to do an extra bounce
      * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
      * The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
        must take precedence over DMA_ATTR_SKIP_CPU_SYNC
      
      Thus this patch removes DMA_ATTR_OVERWRITE, and makes
      swiotlb_sync_single_for_device() bounce unconditionally (that is, also
      when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
      data from the swiotlb buffer.
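
      A rough sketch of the sync-for-device behaviour described above
      (simplified; the real function takes a struct device and does more):

        /* always copy the CPU-side buffer into the bounce slot, so that a
         * later sync-for-cpu cannot copy stale bounce-buffer bytes back */
        swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);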
      
      Let me note that if the size used with the dma_sync_* API is less than the
      size used with dma_[un]map_*, under certain circumstances we may still
      end up with swiotlb not being transparent. In that sense, this is not a
      perfect fix either.
      
      To make this bulletproof, we would have to bounce the entire
      mapping/bounce buffer. For that we would have to figure out the starting
      address and the size of the mapping in
      swiotlb_sync_single_for_device(). While this does seem possible, there
      seems to be no firm consensus on how things are supposed to work.
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3f80e186
    • swiotlb: fix info leak with DMA_FROM_DEVICE · 04c20fc8
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc6
      commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      The problem I'm addressing was discovered by the LTP test covering
      cve-2018-1000204.
      
      A short description of what happens follows:
      1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
         interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
         and a corresponding dxferp. The peculiar thing about this is that TUR
         is not reading from the device.
      2) In sg_start_req() the invocation of blk_rq_map_user() effectively
         bounces the user-space buffer. As if the device was to transfer into
         it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in
         sg_build_indirect()") we make sure this first bounce buffer is
         allocated with GFP_ZERO.
      3) For the rest of the story we keep ignoring that we have a TUR, so the
         device won't touch the buffer we prepare as if we had a
         DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
         and the  buffer allocated by SG is mapped by the function
         virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
         scatter-gather and not scsi generics). This mapping involves bouncing
         via the swiotlb (we need swiotlb to do virtio in protected guest like
         s390 Secure Execution, or AMD SEV).
      4) When the SCSI TUR is done, we first copy back the content of the second
         (that is swiotlb) bounce buffer (which most likely contains some
         previous IO data), to the first bounce buffer, which contains all
         zeros.  Then we copy back the content of the first bounce buffer to
         the user-space buffer.
      5) The test case detects that the buffer, which it zero-initialized,
         isn't all zeros and fails.
      
      One can argue that this is a swiotlb problem, because without swiotlb
      we leak all zeros, and swiotlb should be transparent in the sense that
      it does not affect the outcome (if all other participants are well
      behaved).
      
      Copying the content of the original buffer into the swiotlb buffer is
      the only way I can think of to make swiotlb transparent in such
      scenarios. So let's do just that if in doubt, but allow the driver
      to tell us that the whole mapped buffer is going to be overwritten,
      in which case we can preserve the old behavior and avoid the performance
      impact of the extra bounce.
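
      A rough sketch of the map-time behaviour described above (the condition
      is simplified and the exact hunk may differ):

        /* in swiotlb_tbl_map_single(): pre-fill the bounce slot from the
         * original buffer unless the caller promises to overwrite it all */
        if (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
            dir == DMA_BIDIRECTIONAL)
                swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);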
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      04c20fc8