  1. Apr 02, 2022
  2. Mar 30, 2022
  3. Mar 26, 2022
    • livepatch/arm64: Fix incorrect endian conversion when long jump · 2b491a2e
      Zheng Yejian authored
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4ZIB2
      
      
      CVE: NA
      
      --------------------------------
      
      A kernel panic occurs on an arm64 big-endian board after calling a function
      that has been live-patched. It can be reproduced as follows:
        1. Insert 'livepatch-sample.ko' to patch the kernel function 'cmdline_proc_show';
        2. Enable the patch by executing:
           echo 1 > /sys/kernel/livepatch/livepatch-sample/enabled
        3. Call 'cmdline_proc_show' by executing:
           cat /proc/cmdline
        4. The following panic log is then produced:
           > kernel BUG at arch/arm64/kernel/traps.c:408!
           > Internal error: Oops - BUG: 0 [#1] SMP
           > Modules linked in: dump_mem(OE) livepatch_cmdline1(OEK) hi1382_gmac(OE)
           > [last unloaded: dump_mem]
           > CPU: 3 PID: 1752 Comm: cat Session: 0 Tainted: G           OE K
           > 5.10.0+ #2
           > Hardware name: Hisilicon PhosphorHi1382 (DT)
           > pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
           > pc : do_undefinstr+0x23c/0x2b4
           > lr : do_undefinstr+0x5c/0x2b4
           > sp : ffffffc010ac3a80
           > x29: ffffffc010ac3a80 x28: ffffff82eb0a8000
           > x27: 0000000000000000 x26: 0000000000000001
           > x25: 0000000000000000 x24: 0000000000001000
           > x23: 0000000000000000 x22: ffffffd0e0f16000
           > x21: ffffffd0e0ae7000 x20: ffffffc010ac3b00
           > x19: 0000000000021fd6 x18: ffffffd0e04aad94
           > x17: 0000000000000000 x16: 0000000000000000
           > x15: ffffffd0e04b519c x14: 0000000000000000
           > x13: 0000000000000000 x12: 0000000000000000
           > x11: 0000000000000000 x10: 0000000000000000
           > x9 : 0000000000000000 x8 : 0000000000000000
           > x7 : 0000000000000000 x6 : ffffffd0e0f16100
           > x5 : 0000000000000000 x4 : 00000000d5300000
           > x3 : 0000000000000000 x2 : ffffffd0e0f160f0
           > x1 : ffffffd0e0f16103 x0 : 0000000000000005
           > Call trace:
           >  do_undefinstr+0x23c/0x2b4
           >  el1_undef+0x2c/0x44
           >  el1_sync_handler+0xa4/0xb0
           >  el1_sync+0x74/0x100
           >  cmdline_proc_show+0xc/0x44
           >  proc_reg_read_iter+0xb0/0xc4
           >  new_sync_read+0x10c/0x15c
           >  vfs_read+0x144/0x18c
           >  ksys_read+0x78/0xe8
           >  __arm64_sys_read+0x24/0x30
      
      We compare the first 6 instructions of 'cmdline_proc_show' before and after
      patching (see below). Four instructions are modified, so this is the case
      where the offset between the old and new function exceeds 128M. The
      instruction at 'cmdline_proc_show+0xc' appears incorrect (it is expected
      to be '00021fd6').
        origin:     patched:
        --------    --------
        fd7bbea9    929ff7f0
        21d500f0    f2a91b30
        fd030091    f2d00010
        211040f9    d61f0200 <-- cmdline_proc_show+0xc (expected: '00021fd6')
        f30b00f9    f30b00f9
        f30300aa    f30300aa
      
      This is caused by an incorrect big-to-little endian conversion, which
      this patch corrects.
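
      As a minimal sketch of the byte-order point (not the code changed by this
      patch): A64 instructions are stored little-endian in memory even when the
      kernel runs big-endian, so the word 0xd61f0200 (br x16) must land in
      memory as the bytes 00 02 1f d6. The helper name put_insn_le is
      hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 32-bit A64 instruction word as
 * little-endian bytes, independent of the byte order of the CPU
 * that executes this code.
 */
void put_insn_le(uint8_t *dst, uint32_t insn)
{
	assert(dst != NULL);
	dst[0] = (uint8_t)(insn & 0xff);         /* least significant byte first */
	dst[1] = (uint8_t)((insn >> 8) & 0xff);
	dst[2] = (uint8_t)((insn >> 16) & 0xff);
	dst[3] = (uint8_t)((insn >> 24) & 0xff); /* most significant byte last */
}
```

      Calling put_insn_le(buf, 0xd61f0200) fills buf with 00 02 1f d6 on either
      host byte order, which matches the expected bytes in the dump above.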
      
      Fixes: 5aa9a1a3 ("livepatch/arm64: support livepatch without ftrace")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • arm64/mpam: realign step entry when traversing rmid_transform · e79aa56b
      Wang ShaoBo authored
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4SE03
      
      
      CVE: NA
      
      ---------------------------------------------------
      
      This aligns the step entry with step_size*step_cnt rather than step_size,
      and checks for alignment before traversing rmid_transform.
      
      When rmid is modified with a value that is not aligned with
      step_size*step_cnt, for_each_rmid_transform_point_step_from might miss
      the next step point if it is already occupied and step_cnt or step_size
      is not equal to 1, which causes the actually allocated rmid to be
      inconsistent with the expected one.
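
      The difference between the two alignments can be sketched as follows.
      This is an illustration under assumptions, not the patched code: the
      function name rmid_step_aligned is hypothetical, and only the
      divisibility test described above is modeled.

```c
#include <assert.h>

/*
 * Hypothetical check: an rmid is a valid step entry only if it is a
 * multiple of step_size * step_cnt, not merely of step_size.
 */
int rmid_step_aligned(unsigned int rmid, unsigned int step_size,
		      unsigned int step_cnt)
{
	unsigned int stride = step_size * step_cnt;

	assert(stride > 0);
	return rmid % stride == 0;
}
```

      For example, with step_size = 2 and step_cnt = 4, rmid 6 is a multiple of
      step_size but not of the stride 8, so it is rejected, while rmid 8 is
      accepted.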
      
      Fixes: b47b9a81 ("arm64/mpam: rmid: refine allocation and release process")
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • dt-bindings: mpam: refactor device tree node structure · 880b1947
      Xingang Wang authored
      arm64/mpam: refactor device tree structure to support multiple
      devices
      
      ascend inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
      
      
      CVE: NA
      
      ---------------------------------------------------
      
      To support multiple mpam device nodes, all nodes should be organized
      as children of the same parent node. This ensures that the mpam
      discovery start and complete procedures run in the right order.
      Update the devicetree documentation to record this.
      
      Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • arm64/mpam: refactor device tree structure to support multiple devices · 33dea90e
      Xingang Wang authored
      ascend inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
      
      
      CVE: NA
      
      ---------------------------------------------------
      
      The process of MPAM device tree initialization is like this:
      arm_mpam_device_probe() 	// driver probe
        mpam_discovery_start()	// start discover mpam devices
          [...] 			// find and add mpam devices
        mpam_discovery_complete()   	// trigger mpam_enable
      
      When there are multiple mpam device nodes, the driver probe procedure
      executes more than once. However, mpam_discovery_start() and
      mpam_discovery_complete() should each run only once. In addition, start
      must run first, and complete must run after all devices have been added.
      
      So we reorganize the device tree structure so that there is only one
      mpam device parent node and the probe procedure runs only once. Child
      nodes represent the mpam devices, and all of them are traversed and
      added in the middle of the driver probe.
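
      The intended ordering can be modeled with a toy sketch. Everything in it
      is hypothetical (the struct, the function name and the counters); it only
      mirrors the sequence described above, where one probe of the parent node
      starts discovery once, adds one device per child, then completes once.

```c
#include <assert.h>
#include <stddef.h>

/* Counters standing in for the real discovery calls. */
struct probe_stats {
	int starts;        /* models mpam_discovery_start() */
	int devices_added; /* models adding one mpam device */
	int completes;     /* models mpam_discovery_complete() */
};

/* Hypothetical single probe over a parent node with nr_children children. */
void probe_parent(size_t nr_children, struct probe_stats *st)
{
	assert(st != NULL);
	st->starts++;                        /* start runs first, once */
	for (size_t i = 0; i < nr_children; i++)
		st->devices_added++;         /* one device per child node */
	st->completes++;                     /* complete runs last, once */
}
```

      With three child nodes, a single call leaves starts == 1,
      devices_added == 3 and completes == 1, whereas probing three separate
      top-level nodes would have run start and complete three times each.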
      
      Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • arm64/mpam: fix __mpam_device_create() section mismatch error · 11e2bfdd
      Xingang Wang authored
      ascend inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
      
      
      CVE: NA
      
      ---------------------------------------------------
      
      Fix modpost section mismatch warnings in __mpam_device_create() and others.
      These warnings occur with newer gcc versions, for example 10.1.0.
      
        [...]
        WARNING: vmlinux.o(.text+0x2ed88): Section mismatch in reference from the
        function __mpam_device_create() to the function .init.text:mpam_device_alloc()
        The function __mpam_device_create() references
        the function __init mpam_device_alloc().
        This is often because __mpam_device_create lacks a __init
        annotation or the annotation of mpam_device_alloc is wrong.
      
        WARNING: vmlinux.o(.text.unlikely+0xa5c): Section mismatch in reference from
        the function mpam_resctrl_init() to the function .init.text:mpam_init_padding()
        The function mpam_resctrl_init() references
        the function __init mpam_init_padding().
        This is often because mpam_resctrl_init lacks a __init
        annotation or the annotation of mpam_init_padding is wrong.
      
        WARNING: vmlinux.o(.text.unlikely+0x5a9c): Section mismatch in reference from
        the function resctrl_group_init() to the function .init.text:resctrl_group_setup_root()
        The function resctrl_group_init() references
        the function __init resctrl_group_setup_root().
        This is often because resctrl_group_init lacks a __init
        annotation or the annotation of resctrl_group_setup_root is wrong.
        [...]
      
      Fixes: c5e27c39 ("arm64/mpam: remove __init macro to support driver probe")
      Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
  4. Mar 24, 2022
  5. Mar 23, 2022
    • swiotlb: rework "fix info leak with DMA_FROM_DEVICE" · 3f80e186
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc8
      commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      
      
      CVE: CVE-2022-0854
      
      --------------------------------
      
      Unfortunately, we ended up merging an old version of the patch "fix info
      leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
      (the swiotlb maintainer) asked me to create an incremental fix (after I
      pointed out the mix-up and asked him for guidance).
      So here we go.
      
      The main differences between what we got and what was agreed are:
      * swiotlb_sync_single_for_device is also required to do an extra bounce
      * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
      * The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
        must take precedence over DMA_ATTR_SKIP_CPU_SYNC
      
      Thus this patch removes DMA_ATTR_OVERWRITE, and makes
      swiotlb_sync_single_for_device() bounce unconditionally (that is, also
      when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
      data from the swiotlb buffer.
      
      Let me note that if the size used with dma_sync_* API is less than the
      size used with dma_[un]map_*, under certain circumstances we may still
      end up with swiotlb not being transparent. In that sense, this is no
      perfect fix either.
      
      To get this bulletproof, we would have to bounce the entire
      mapping/bounce buffer. For that we would have to figure out the starting
      address, and the size of the mapping in
      swiotlb_sync_single_for_device(). While this does seem possible, there
      seems to be no firm consensus on how things are supposed to work.
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
      Cc: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • swiotlb: fix info leak with DMA_FROM_DEVICE · 04c20fc8
      Halil Pasic authored
      mainline inclusion
      from mainline-v5.17-rc6
      commit ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      
      
      CVE: CVE-2022-0854
      
      --------------------------------
      
      The problem I'm addressing was discovered by the LTP test covering
      cve-2018-1000204.
      
      A short description of what happens follows:
      1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
         interface with: dxfer_len == 524288, dxfer_direction == SG_DXFER_FROM_DEV
         and a corresponding dxferp. The peculiar thing about this is that TUR
         is not reading from the device.
      2) In sg_start_req() the invocation of blk_rq_map_user() effectively
         bounces the user-space buffer, as if the device were to transfer into
         it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in
         sg_build_indirect()") we make sure this first bounce buffer is
         allocated with __GFP_ZERO.
      3) For the rest of the story we keep ignoring that we have a TUR, so the
         device won't touch the buffer we prepare as if we had a
         DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
         and the buffer allocated by SG is mapped by the function
         virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
         scatter-gather and not scsi generics). This mapping involves bouncing
         via the swiotlb (we need swiotlb to do virtio in protected guest like
         s390 Secure Execution, or AMD SEV).
      4) When the SCSI TUR is done, we first copy back the content of the second
         (that is swiotlb) bounce buffer (which most likely contains some
         previous IO data), to the first bounce buffer, which contains all
         zeros.  Then we copy back the content of the first bounce buffer to
         the user-space buffer.
      5) The test case detects that the buffer, which it zero-initialized,
         is not all zeros and fails.
      
      One can argue that this is an swiotlb problem, because without swiotlb
      we leak all zeros, and the swiotlb should be transparent in a sense that
      it does not affect the outcome (if all other participants are well
      behaved).
      
      Copying the content of the original buffer into the swiotlb buffer is
      the only way I can think of to make swiotlb transparent in such
      scenarios. So let's do just that if in doubt, but allow the driver
      to tell us that the whole mapped buffer is going to be overwritten,
      in which case we can preserve the old behavior and avoid the performance
      impact of the extra bounce.
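
      The leak and the fix can be modeled with a toy bounce slot. This is a
      simplified sketch, not the swiotlb code: the names bounce_slot,
      toy_map_from_device and toy_unmap_from_device are hypothetical, and the
      device is assumed to write nothing, as with the TUR above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SLOT_SIZE = 16 };

/* Toy bounce slot; it may still hold data from an earlier, unrelated I/O. */
struct bounce_slot {
	unsigned char data[SLOT_SIZE];
};

/*
 * Map for a device-to-memory transfer. With copy_in set, the original
 * buffer is first bounced into the slot, which models the extra bounce
 * described above. With copy_in clear, the slot keeps its stale content.
 */
void toy_map_from_device(struct bounce_slot *slot, const unsigned char *orig,
			 size_t len, int copy_in)
{
	assert(len <= SLOT_SIZE);
	if (copy_in)
		memcpy(slot->data, orig, len);
}

/* Unmap: copy the slot content back to the original buffer. */
void toy_unmap_from_device(const struct bounce_slot *slot,
			   unsigned char *orig, size_t len)
{
	assert(len <= SLOT_SIZE);
	memcpy(orig, slot->data, len);
}
```

      If the slot holds stale bytes and the device never writes, mapping with
      copy_in == 0 hands those stale bytes back to the zeroed buffer on unmap,
      while copy_in == 1 returns the buffer unchanged.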
      
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  6. Mar 22, 2022
  7. Mar 21, 2022
  8. Mar 18, 2022
  9. Mar 17, 2022
    • audit: improve audit queue handling when "audit=1" on cmdline · 1518b80b
      Paul Moore authored
      mainline inclusion
      from mainline-v5.17-rc3
      commit f26d04331360d42dbd6b58448bd98e4edbfbe1c5
      category: bugfix
      bugzilla: 186384 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
      
      
      CVE: NA
      
      --------------------------------
      
      When an admin enables audit at early boot via the "audit=1" kernel
      command line the audit queue behavior is slightly different; the
      audit subsystem goes to greater lengths to avoid dropping records,
      which unfortunately can result in problems when the audit daemon is
      forcibly stopped for an extended period of time.
      
      This patch makes a number of changes designed to improve the audit
      queuing behavior so that leaving the audit daemon in a stopped state
      for an extended period does not cause a significant impact to the
      system.
      
      - kauditd_send_queue() is now limited to looping through the
        passed queue only once per call.  This not only prevents the
        function from looping indefinitely when records are returned
        to the current queue, it also allows any recovery handling in
        kauditd_thread() to take place when kauditd_send_queue()
        returns.
      
      - Transient netlink send errors seen as -EAGAIN now cause the
        record to be returned to the retry queue instead of going to
        the hold queue.  The intention of the hold queue is to store,
        perhaps for an extended period of time, the events which led
        up to the audit daemon going offline.  The retry queue remains
        a temporary queue intended to protect against transient issues
        between the kernel and the audit daemon.
      
      - The retry queue is now limited by the audit_backlog_limit
        setting, the same as the other queues.  This allows admins
        to bound the size of all of the audit queues on the system.
      
      - kauditd_rehold_skb() now returns records to the end of the
        hold queue to ensure ordering is preserved in the face of
        recent changes to kauditd_send_queue().
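
      The single-pass behavior from the first bullet can be sketched with a toy
      FIFO. This is an illustration only, not the kauditd code: all names are
      hypothetical, and a failed send simply puts the record back at the tail
      of the same queue.

```c
#include <assert.h>
#include <stddef.h>

enum { QCAP = 8 };

/* Toy ring-buffer FIFO of record ids. */
struct toy_queue {
	int items[QCAP];
	size_t head;
	size_t len;
};

void toy_enqueue(struct toy_queue *q, int rec)
{
	assert(q->len < QCAP);
	q->items[(q->head + q->len) % QCAP] = rec;
	q->len++;
}

int toy_dequeue(struct toy_queue *q)
{
	int rec;

	assert(q->len > 0);
	rec = q->items[q->head];
	q->head = (q->head + 1) % QCAP;
	q->len--;
	return rec;
}

/*
 * Visit each record that was queued at entry exactly once. Records whose
 * send fails go back to the tail but are not retried within this call,
 * so the loop terminates even if every send fails.
 */
size_t toy_send_queue_once(struct toy_queue *q, int (*send)(int rec))
{
	size_t budget = q->len; /* snapshot of the length bounds the loop */
	size_t sent = 0;

	for (size_t i = 0; i < budget; i++) {
		int rec = toy_dequeue(q);

		if (send(rec) == 0)
			sent++;
		else
			toy_enqueue(q, rec);
	}
	return sent;
}

/* Example sender that always fails, as if the daemon were stopped. */
int toy_send_fail(int rec)
{
	(void)rec;
	return -1;
}
```

      With three queued records and toy_send_fail as the sender, the call
      returns 0 after three iterations and leaves the records queued in their
      original order, so the caller gets a chance to run its recovery handling.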
      
      Cc: stable@vger.kernel.org
      Fixes: 5b52330b ("audit: fix auditd/kernel connection state tracking")
      Fixes: f4b3ee3c85551 ("audit: improve robustness of the audit queue handling")
      Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • Revert "audit: bugfix for infinite loop when flush the hold queue" · a9140c08
      Cui GaoSheng authored
      hulk inclusion
      category: bugfix
      bugzilla: 186384 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
      
      
      CVE: NA
      
      --------------------------------
      
      This reverts commit 67ab712f.
      
      Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • veth: Do not record rx queue hint in veth_xmit · 2632c58b
      Daniel Borkmann authored
      
      stable inclusion
      from linux-4.19.226
      commit bd6e97e2b6f59a19894c7032a83f03ad38ede28e
      
      --------------------------------
      
      commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream.
      
      Laurent reported that they have seen a significant amount of TCP retransmissions
      at high throughput from applications residing in network namespaces talking to
      the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
      as per systemd default) of the phys device such as ena or virtio_net due to all
      traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
      setup was _not_ using XDP on veths as the issue is generic.)
      
      More specifically, after edbea9220251 ("veth: Store queue_mapping independently
      of XDP prog presence") which made it all the way back to v4.19.184+,
      skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
      queue by default for veths) instead of leaving it at 0.
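
      Why recording RX queue 0 yields a queue_mapping of 1 can be shown with a
      toy model. The struct and function names below are hypothetical; the
      model only mirrors the kernel convention that the stored value is the RX
      queue index plus one, so that 0 means "nothing recorded".

```c
#include <assert.h>
#include <stdint.h>

/* Toy packet carrying only the field discussed here. */
struct toy_skb {
	uint16_t queue_mapping;
};

/* Store rx_queue + 1 so that 0 keeps the meaning "no RX queue recorded". */
void toy_record_rx_queue(struct toy_skb *skb, uint16_t rx_queue)
{
	assert(rx_queue < UINT16_MAX);
	skb->queue_mapping = (uint16_t)(rx_queue + 1);
}

/* Non-zero once an RX queue has been recorded. */
int toy_rx_queue_recorded(const struct toy_skb *skb)
{
	return skb->queue_mapping != 0;
}
```

      A fresh toy_skb reports nothing recorded. After toy_record_rx_queue(&skb, 0)
      its queue_mapping is 1, and that non-zero value is what later TX queue
      selection then sees.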
      
      This is eventually retained and callbacks like ena_select_queue() will also pick
      single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
      is forwarded to that device via upper stack or other means. Similarly, for others
      not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
      call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.
      
      In general, it is a _bad_ idea for virtual devices like veth to mess around with
      queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
      the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
      netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.
      
      Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
      from veth_xmit() altogether.
      
      If the veth peer has an XDP program attached, then it would return the first RX
      queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
      However, this is still better than breaking the generic case.
      
      Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
      Fixes: 638264dc ("veth: Support per queue XDP ring")
      Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Conflicts:
      	drivers/net/veth.c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
  10. Mar 15, 2022