Skip to content
Snippets Groups Projects
  1. Apr 16, 2022
  2. Apr 15, 2022
  3. Apr 14, 2022
  4. Apr 12, 2022
    • Igor Pylypiv's avatar
      Revert "module, async: async_synchronize_full() on module init iff async is used" · dc8da50c
      Igor Pylypiv authored
      
      stable inclusion
      from linux-4.19.231
      commit a0c66ac8b72f816d5631fde0ca0b39af602dce48
      
      --------------------------------
      
      [ Upstream commit 67d6212afda218d564890d1674bab28e8612170f ]
      
      This reverts commit 774a1221.
      
      We need to finish all async code before the module init sequence is
      done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
      thread that called async_schedule().  Then the PF_USED_ASYNC flag was
      used to determine whether or not async_synchronize_full() needs to be
      invoked.  This works when modprobe thread is calling async_schedule(),
      but it does not work if module dispatches init code to a worker thread
      which then calls async_schedule().
      
      For example, PCI driver probing is invoked from a worker thread based on
      a node where device is attached:
      
      	if (cpu < nr_cpu_ids)
      		error = work_on_cpu(cpu, local_pci_probe, &ddi);
      	else
      		error = local_pci_probe(&ddi);
      
      We end up in a situation where a worker thread gets the PF_USED_ASYNC
      flag set instead of the modprobe thread.  As a result,
      async_synchronize_full() is not invoked and modprobe completes without
      waiting for the async code to finish.
      
      The issue was discovered while loading the pm80xx driver:
      (scsi_mod.scan=async)
      
      modprobe pm80xx                      worker
      ...
        do_init_module()
        ...
          pci_call_probe()
            work_on_cpu(local_pci_probe)
                                           local_pci_probe()
                                             pm8001_pci_probe()
                                               scsi_scan_host()
                                                 async_schedule()
                                                 worker->flags |= PF_USED_ASYNC;
                                           ...
            < return from worker >
        ...
        if (current->flags & PF_USED_ASYNC) <--- false
        	async_synchronize_full();
      
      Commit 21c3c5d2 ("block: don't request module during elevator init")
      fixed the deadlock issue which the reverted commit 774a1221
      ("module, async: async_synchronize_full() on module init iff async is
      used") tried to fix.
      
      Since commit 0fdff3ec ("async, kmod: warn on synchronous
      request_module() from async workers") synchronous module loading from
      async is not allowed.
      
      Given that the original deadlock issue is fixed and it is no longer
      allowed to call synchronous request_module() from async we can remove
      PF_USED_ASYNC flag to make module init consistently invoke
      async_synchronize_full() unless async module probe is requested.
      
      Signed-off-by: default avatarIgor Pylypiv <ipylypiv@google.com>
      Reviewed-by: default avatarChangyuan Lyu <changyuanl@google.com>
      Reviewed-by: default avatarLuis Chamberlain <mcgrof@kernel.org>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
    • daniel.starke@siemens.com's avatar
      tty: n_gsm: fix encoding of control signal octet bit DV · be439c06
      daniel.starke@siemens.com authored
      stable inclusion
      from linux-4.19.232
      commit 28ca082153794cf5c98e7bb93d7f30f8ba46bec4
      
      --------------------------------
      
      commit 737b0ef3be6b319d6c1fd64193d1603311969326 upstream.
      
      n_gsm is based on the 3GPP 07.010 and its newer version is the 3GPP 27.010.
      See https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1516
      
      
      The changes from 07.010 to 27.010 are non-functional. Therefore, I refer to
      the newer 27.010 here. Chapter 5.4.6.3.7 describes the encoding of the
      control signal octet used by the MSC (modem status command). The same
      encoding is also used in convergence layer type 2 as described in chapter
      5.5.2. Table 7 and 24 both require the DV (data valid) bit to be set 1 for
      outgoing control signal octets sent by the DTE (data terminal equipment),
      i.e. for the initiator side.
      Currently, the DV bit is only set if CD (carrier detect) is on, regardless
      of the side.
      
      This patch fixes this behavior by setting the DV bit on the initiator side
      unconditionally.
      
      Fixes: e1eaea46 ("tty: n_gsm line discipline")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDaniel Starke <daniel.starke@siemens.com>
      Link: https://lore.kernel.org/r/20220218073123.2121-1-daniel.starke@siemens.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
      be439c06
    • Linus Torvalds's avatar
      fget: clarify and improve __fget_files() implementation · 8e227868
      Linus Torvalds authored
      stable inclusion
      from linux-4.19.232
      commit 400c2f361c25bc092d0636cfa32d0549a181e653
      
      --------------------------------
      
      commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.
      
      Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
      a ref to it") fixed a race with getting a reference to a file just as it
      was being closed.  It was a fairly minimal patch, and I didn't think
      re-checking the file pointer lookup would be a measurable overhead,
      since it was all right there and cached.
      
      But I was wrong, as pointed out by the kernel test robot.
      
      The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
      quite noticeably.  Admittedly it seems to be a very artificial test:
      doing "poll()" system calls on regular files in a very tight loop in
      multiple threads.
      
      That means that basically all the time is spent just looking up file
      descriptors without ever doing anything useful with them (not that doing
      'poll()' on a regular file is useful to begin with).  And as a result it
      shows the extra "re-check fd" cost as a sore thumb.
      
      Happily, the regression is fixable by just writing the code to loook up
      the fd to be better and clearer.  There's still a cost to verify the
      file pointer, but now it's basically in the noise even for that
      benchmark that does nothing else - and the code is more understandable
      and has better comments too.
      
      [ Side note: this patch is also a classic case of one that looks very
        messy with the default greedy Myers diff - it's much more legible with
        either the patience of histogram diff algorithm ]
      
      Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
      Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
      
      
      Reported-by: default avatarkernel test robot <oliver.sang@intel.com>
      Tested-by: default avatarCarel Si <beibei.si@intel.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBaokun Li <libaokun1@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
      8e227868
    • Miaohe Lin's avatar
      memblock: use kfree() to release kmalloced memblock regions · 7287cbc1
      Miaohe Lin authored
      
      stable inclusion
      from linux-4.19.232
      commit a6fcced73d15ab57cd97813999edf6f19d3032f8
      
      --------------------------------
      
      commit c94afc46cae7ad41b2ad6a99368147879f4b0e56 upstream.
      
      memblock.{reserved,memory}.regions may be allocated using kmalloc() in
      memblock_double_array(). Use kfree() to release these kmalloced regions
      indicated by memblock_{reserved,memory}_in_slab.
      
      Signed-off-by: default avatarMiaohe Lin <linmiaohe@huawei.com>
      Fixes: 3010f876 ("mm: discard memblock data later")
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
      7287cbc1
    • daniel.starke@siemens.com's avatar
      tty: n_gsm: fix proper link termination after failed open · 99c367d7
      daniel.starke@siemens.com authored
      
      stable inclusion
      from linux-4.19.232
      commit 337e49675ce55c23a50b92aae889ec5d910d6dc7
      
      --------------------------------
      
      commit e3b7468f082d106459e86e8dc6fb9bdd65553433 upstream.
      
      Trying to open a DLCI by sending a SABM frame may fail with a timeout.
      The link is closed on the initiator side without informing the responder
      about this event. The responder assumes the link is open after sending a
      UA frame to answer the SABM frame. The link gets stuck in a half open
      state.
      
      This patch fixes this by initiating the proper link termination procedure
      after link setup timeout instead of silently closing it down.
      
      Fixes: e1eaea46 ("tty: n_gsm line discipline")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDaniel Starke <daniel.starke@siemens.com>
      Link: https://lore.kernel.org/r/20220218073123.2121-3-daniel.starke@siemens.com
      
      
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
      99c367d7
    • Tao Liu's avatar
      gso: do not skip outer ip header in case of ipip and net_failover · df2f0cf3
      Tao Liu authored
      
      stable inclusion
      from linux-4.19.232
      commit e9ffbe63f6f32f526a461756309b61c395168d73
      
      --------------------------------
      
      commit cc20cced0598d9a5ff91ae4ab147b3b5e99ee819 upstream.
      
      We encounter a tcp drop issue in our cloud environment. Packet GROed in
      host forwards to a VM virtio_net nic with net_failover enabled. VM acts
      as a IPVS LB with ipip encapsulation. The full path like:
      host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
       -> ipip encap -> net_failover tx -> virtio_net tx
      
      When net_failover transmits a ipip pkt (gso_type = 0x0103, which means
      SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), there is no gso
      did because it supports TSO and GSO_IPXIP4. But network_header points to
      inner ip header.
      
      Call Trace:
       tcp4_gso_segment        ------> return NULL
       inet_gso_segment        ------> inner iph, network_header points to
       ipip_gso_segment
       inet_gso_segment        ------> outer iph
       skb_mac_gso_segment
      
      Afterwards virtio_net transmits the pkt, only inner ip header is modified.
      And the outer one just keeps unchanged. The pkt will be dropped in remote
      host.
      
      Call Trace:
       inet_gso_segment        ------> inner iph, outer iph is skipped
       skb_mac_gso_segment
       __skb_gso_segment
       validate_xmit_skb
       validate_xmit_skb_list
       sch_direct_xmit
       __qdisc_run
       __dev_queue_xmit        ------> virtio_net
       dev_hard_start_xmit
       __dev_queue_xmit        ------> net_failover
       ip_finish_output2
       ip_output
       iptunnel_xmit
       ip_tunnel_xmit
       ipip_tunnel_xmit        ------> ipip
       dev_hard_start_xmit
       __dev_queue_xmit
       ip_finish_output2
       ip_output
       ip_forward
       ip_rcv
       __netif_receive_skb_one_core
       netif_receive_skb_internal
       napi_gro_receive
       receive_buf
       virtnet_poll
       net_rx_action
      
      The root cause of this issue is specific with the rare combination of
      SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
      SKB_GSO_DODGY is set from external virtio_net. We need to reset network
      header when callbacks.gso_segment() returns NULL.
      
      This patch also includes ipv6_gso_segment(), considering SIT, etc.
      
      Fixes: cb32f511 ("ipip: add GSO/TSO support")
      Signed-off-by: default avatarTao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarYongqiang Liu <liuyongqiang13@huawei.com>
      Signed-off-by: default avatarLaibin Qiu <qiulaibin@huawei.com>
      df2f0cf3