  1. Jan 18, 2017
  2. Jan 17, 2017
    • Merge branch 'mvneta-xmit_more-bql' · b8128c42
      David S. Miller authored
      
      Marcin Wojtas says:
      
      ====================
      mvneta xmit_more and bql support
      
      This is a delayed v2 of a short patchset that introduces xmit_more and BQL
      support to the mvneta driver. The only change is in the xmit_more support:
      a condition check that prevents concatenating too many descriptors before
      flushing them to HW.
      
      Any comments or feedback would be welcome.
      
      Changelog:
      v1 -> v2:
      
      * Add a condition check that ensures too many descriptors are not
        concatenated before flushing to HW.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: add BQL support · a29b6235
      Marcin Wojtas authored
      
      Tests showed that when the whole bandwidth is consumed, latency for
      various kinds of traffic can reach high values. With a saturated
      link (e.g. with iperf from target to host), a simple ping could take
      a significant amount of time. BQL proved to improve this situation
      when implemented in the mvneta driver. Measurements of ping latency
      for 3 link speeds:

      Speed (Mbps) | Latency w/o BQL | Latency with BQL
      10           | 7-14 ms         | 3.5 ms
      100          | 2-12 ms         | 0.6 ms
      1000         | often timeout   | up to 2 ms

      Decreasing latency as above results in a slight performance cost: 4 kpps
      (-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
      For 1500B packets in the same setup, the mpstat tool showed +8% of
      CPU occupation (default affinity, second CPU idle). Even so, this
      cost seems reasonable to take, considering the other improvements.

      This commit adds the byte queue limit mechanism to the mvneta driver.
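
      To illustrate the accounting that BQL relies on, here is a minimal
      user-space sketch in C. It is not the mvneta code and does not reproduce
      the kernel's dynamic limit calculation: the names bql_model, bql_sent and
      bql_completed, and the fixed limit, are assumptions made for illustration.
      In the kernel, the corresponding driver hooks are netdev_tx_sent_queue()
      and netdev_tx_completed_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one TX queue with a fixed byte limit. */
struct bql_model {
    size_t limit;     /* max bytes allowed in flight (fixed here; the kernel adapts it) */
    size_t inflight;  /* bytes handed to hardware and not yet completed */
    bool stopped;     /* true while the stack must not queue more packets */
};

/* Called when a packet of `bytes` is handed to the hardware. */
static void bql_sent(struct bql_model *q, size_t bytes)
{
    q->inflight += bytes;
    if (q->inflight >= q->limit)
        q->stopped = true;   /* enough data is queued; hold back the stack */
}

/* Called from TX completion handling with the number of bytes finished. */
static void bql_completed(struct bql_model *q, size_t bytes)
{
    assert(bytes <= q->inflight);  /* cannot complete more than was sent */
    q->inflight -= bytes;
    if (q->stopped && q->inflight < q->limit)
        q->stopped = false;  /* room again; let the stack resume */
}
```

      Keeping the in-flight byte count short is what lets a latency-sensitive
      packet such as a ping avoid waiting behind a deep hardware queue.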
      
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: add xmit_more support · 2a90f7e1
      Simon Guinot authored
      
      Based on the xmit_more flag of the skb, TX descriptors can be concatenated
      before flushing. This commit delays the Tx descriptor flush if the queue is
      running and if there are more skbs to send.

      Due to a limitation of the MVNETA_TXQ_UPDATE_REG(q) registers, at most 255
      descriptors can be flushed at once. A new macro (MVNETA_TXQ_DEC_SENT_MASK)
      was therefore added to ensure that the number of concatenated descriptors
      does not exceed that value.
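
      The batching rule described above can be modelled in user space roughly
      as follows. This is a sketch under stated assumptions, not the driver
      code: txq_model, txq_flush and txq_xmit are hypothetical names, and the
      flush is reduced to counter updates instead of a register write. Only
      the 255-descriptor limit comes from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

#define TXQ_MAX_PENDING 255u  /* per the commit text: at most 255 descriptors per flush */

/* Hypothetical model of a TX queue; not the mvneta data structure. */
struct txq_model {
    unsigned int pending;  /* descriptors queued but not yet announced to HW */
    unsigned int flushed;  /* descriptors announced so far */
    unsigned int flushes;  /* number of (modelled) register writes */
};

/* Announce all pending descriptors to the hardware in one write. */
static void txq_flush(struct txq_model *q)
{
    assert(q->pending <= TXQ_MAX_PENDING);
    if (q->pending == 0)
        return;
    q->flushed += q->pending;
    q->flushes++;
    q->pending = 0;
}

/* Queue `descs` descriptors for one packet and decide whether to flush. */
static void txq_xmit(struct txq_model *q, unsigned int descs,
                     bool xmit_more, bool queue_stopped)
{
    assert(descs <= TXQ_MAX_PENDING);

    /* Never let the batch grow past what one flush can express. */
    if (q->pending + descs > TXQ_MAX_PENDING)
        txq_flush(q);

    q->pending += descs;

    /* Flush now unless the stack promised more packets and the queue still runs. */
    if (!xmit_more || queue_stopped)
        txq_flush(q);
}
```

      With this rule, a burst of packets sent with xmit_more set costs a
      single flush, while the last packet of the burst always triggers one.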
      
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net sched actions: fix refcnt when GETing of action after bind · 0faa9cb5
      Jamal Hadi Salim authored
      
      Demonstrating the issue:
      
      .. add a drop action
      $ sudo $TC actions add action drop index 10
      
      .. retrieve it
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... bug 1 above: reference is two.
          Reference is actually 1 but we forget to subtract 1.
      
      ... do a GET again and we see the same issue
          try a few times and nothing changes
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... let's try to bind the action to a filter..
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      ... and now a few GETs:
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      .... as can be observed the reference count keeps going up.
      
      After the fix
      
      $ sudo $TC actions add action drop index 10
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
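
      The general pattern behind this kind of bug can be sketched in user
      space as below. This is an illustrative model only: action_model and the
      helper names are hypothetical, and it does not reproduce the exact
      kernel behaviour (in the transcript above, the count only starts
      climbing once the action is bound). It shows a lookup that takes a
      temporary reference and must drop it again before returning.

```c
#include <assert.h>

/* Hypothetical stand-in for a tc action; not the kernel's struct tc_action. */
struct action_model {
    int refcnt;   /* total references */
    int bindcnt;  /* references held by filters */
};

static void action_hold(struct action_model *a)
{
    a->refcnt++;
}

static void action_put(struct action_model *a)
{
    assert(a->refcnt > 0);
    a->refcnt--;
}

/* Leaky GET: takes a reference for the lookup and never drops it.
 * Returns the reference count that would be reported to user space. */
static int action_get_leaky(struct action_model *a)
{
    action_hold(a);
    return a->refcnt;
}

/* Balanced GET: reports the count without the temporary reference,
 * then releases that reference. */
static int action_get_balanced(struct action_model *a)
{
    int reported;

    action_hold(a);
    reported = a->refcnt - 1;
    action_put(a);
    return reported;
}
```

      In the balanced variant, repeated GETs leave the stored count unchanged,
      which matches the "after the fix" output above.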
      
      Fixes: aecc5cef ("net sched actions: fix GETing actions")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 5cf7a0f3
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
      
       - fix invalid fget()/fput() calls when doing file locking
      
       - fix multiple directory cache invalidation issues due to the client
         failing to recognise that the directory wasn't changed
      
       - fix client recovery when server reboots multiple times
      
      * tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4: Fix client recovery when server reboots multiple times
        NFSv4: update_changeattr should update the attribute timestamp
        NFSv4: Don't call update_changeattr() unless the unlink is successful
        NFSv4: Don't apply change_info4 twice on rename within a directory
        NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created
        nfs: Don't take a reference on fl->fl_file for LOCK operation
    • Merge branch 'mlx4-core-fixes' · 617125e7
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx4 core fixes
      
      This patchset contains bug fixes from Jack to the mlx4 Core driver.
      
      Patch 1 solves a race in the flow of CQ free.
      Patch 2 moves some qp context flags update to the correct qp transition.
      Patch 3 eliminates warnings from the path of SRQ_LIMIT that flood the message log,
      and keeps them only in the path of SRQ_CATAS_ERROR.
      
      Series generated against net commit:
      1a717fcf Merge tag 'mac80211-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV · 9577b174
      Jack Morgenstein authored
      
      When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's
      message log when (correct, normally operating) apps use SRQ LIMIT events
      as a trigger to post WQEs to SRQs.
      
      Add more information to the existing debug printout for SRQ_LIMIT, and
      output the warning messages only for the SRQ CATAS ERROR event.
      
      Fixes: acba2420 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs")
      Fixes: e0debf9c ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions · 7c3945bc
      Jack Morgenstein authored
      
      Save the qp context flags byte containing the flag disabling vlan stripping
      in the RESET to INIT qp transition, rather than in the INIT to RTR
      transition. Per the firmware spec, the flags in this byte are active
      in the RESET to INIT transition.
      
      As a result of saving the flags in the incorrect qp transition, when
      switching dynamically from VGT to VST and back to VGT, the vlan
      remained stripped (as is required for VST) and did not return to
      not-stripped (as is required for VGT).
      
      Fixes: f0f829bf ("net/mlx4_core: Add immediate activate for VGT->VST->VGT")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Fix racy CQ (Completion Queue) free · 291c566a
      Jack Morgenstein authored
      
      In functions mlx4_cq_completion() and mlx4_cq_event(), the
      radix_tree_lookup requires an rcu_read_lock.
      This is mandatory: if another core frees the CQ, it could
      run the radix_tree_node_rcu_free() call_rcu() callback while
      it's being used by the radix tree lookup function.
      
      Additionally, in function mlx4_cq_event(), since we are adding
      the rcu lock around the radix-tree lookup, we no longer need to take
      the spinlock. Also, the synchronize_irq() call for the async event
      eliminates the need for incrementing the cq reference count in
      mlx4_cq_event().
      
      Other changes:
      1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
         we no longer take this spinlock in the interrupt context.
         The spinlock here, therefore, simply protects against different
         threads simultaneously invoking mlx4_cq_free() for different cq's.
      
      2. In function mlx4_cq_free(), we move the radix tree delete to before
         the synchronize_irq() calls. This guarantees that we will not
         access this cq during any subsequent interrupts, and therefore can
         safely free the CQ after the synchronize_irq calls. The rcu_read_lock
         in the interrupt handlers only needs to protect against corrupting the
         radix tree; the interrupt handlers may access the cq outside the
         rcu_read_lock due to the synchronize_irq calls which protect against
         premature freeing of the cq.
      
      3. In function mlx4_cq_event(), we change the mlx4_warn message to mlx4_dbg.
      
      4. We leave the cq reference count mechanism in place, because it is
         still needed for the cq completion tasklet mechanism.
      
      Fixes: 6d90aa5c ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
      Fixes: 225c7b1f ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_flower: Disallow duplicate internal elements · a3308d8f
      Paul Blakey authored
      
      Flower currently allows having the same filter twice with the same
      priority. Actions (and statistics update) will always execute on the
      first inserted rule leaving the second rule unused.
      This patch disallows that.
      
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered · b618ab45
      Heiner Kallweit authored
      
      Don't use netdev_info and friends before the net_device is registered.
      This avoids ugly messages like
      "meson8b-dwmac c9410000.ethernet (unnamed net_device) (uninitialized):
      Enable RX Mitigation via HW Watchdog Timer"
      
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Fix a -Wmaybe-uninitialized warning · abeffce9
      Arnd Bergmann authored
      As found by Olof's build bot, we gain a harmless warning about a
      potential uninitialized variable reference in mlx5:
      
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here
      
      This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair
      that gcc is unfortunately unable to completely figure out.
      
      The problem is that gcc cannot tell that if(IS_ERR()) in
      mlx5e_route_lookup_ipv4() is equivalent to checking if(err) later,
      so it assumes that 'out_dev' is used after the 'return PTR_ERR(rt)'.

      The PTR_ERR_OR_ZERO() case, by comparison, is fairly easy for gcc
      to analyze, so it gets that right and no longer warns.
      
      Hadar Hen Zion already attempted to fix the warning earlier by adding fake
      initializations, but that ended up not fully addressing all warnings, so
      I'm reverting it now that it is no longer needed.
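
      The pattern can be sketched in user space as follows. The helpers below
      are minimal stand-ins for the kernel's ERR_PTR(), IS_ERR(), PTR_ERR()
      and PTR_ERR_OR_ZERO(); struct route, route_lookup() and route_get_dev()
      are hypothetical and only illustrate the shape of the change, not the
      mlx5 code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal user-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline int PTR_ERR_OR_ZERO(const void *ptr)
{
    return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}

/* Hypothetical route object and lookup, for illustration only. */
struct route { int dev; };

static struct route good_route = { .dev = 7 };

static struct route *route_lookup(int dst)
{
    return dst > 0 ? &good_route : ERR_PTR(-ENETUNREACH);
}

/* One integer carries the outcome: whenever it is 0, *out_dev was written. */
static int route_get_dev(int dst, int *out_dev)
{
    struct route *rt = route_lookup(dst);
    int ret = PTR_ERR_OR_ZERO(rt);

    assert(out_dev != NULL);
    if (ret)
        return ret;

    *out_dev = rt->dev;
    return 0;
}
```

      Because the error check and the return value are the same variable, the
      compiler's flow analysis can connect "returned 0" with "out_dev was set".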
      
      Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.10-rc3-98-gcff3b2c/
      
      
      Fixes: a42485eb ("net/mlx5e: TC ipv4 tunnel encap offload error flow fixes")
      Fixes: a757d108 ("net/mlx5e: Fix kbuild warnings for uninitialized parameters")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add missing Kbuild export for header files · a50a05f4
      David Lebrun authored
      
      Add missing IPv6-SR header files in include/uapi/linux/Kbuild.
      
      Also, prevent seg6_lwt_headroom() from being exported and add
      missing linux/types.h include.
      
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, trace: make ctx access checks more robust · 2d071c64
      Daniel Borkmann authored
      
      Make sure that ctx cannot potentially be accessed oob by asserting
      explicitly that ctx access size into pt_regs for BPF_PROG_TYPE_KPROBE
      programs must be within limits. In case some 32bit archs have pt_regs
      not being a multiple of 8, then BPF_DW access could cause such access.
      
      BPF_PROG_TYPE_KPROBE progs don't have a ctx conversion function since
      there's no extra mapping needed. kprobe_prog_is_valid_access() didn't
      enforce sizeof(long) as the only allowed access size, since LLVM can
      generate non BPF_W/BPF_DW access to regs from time to time.
      
      For BPF_PROG_TYPE_TRACEPOINT we don't have a ctx conversion either, so
      add a BUILD_BUG_ON() check to make sure that BPF_DW access will not be
      a similar issue in future (ctx works on event buffer as opposed to
      pt_regs there).
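
      A bounds check of the kind described above might look like the
      following user-space sketch. The struct regs_model (20 bytes, so not a
      multiple of 8) and the function ctx_access_ok() are assumptions for
      illustration; they are not the kernel's pt_regs or
      kprobe_prog_is_valid_access().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register context whose size is not a multiple of 8. */
struct regs_model {
    uint32_t r[5];
};

/* Accept an access of `size` bytes at byte offset `off` only if it is
 * aligned to its own size and lies entirely inside the context. */
static bool ctx_access_ok(int off, int size)
{
    assert(size > 0);

    if (off < 0 || (size_t)off >= sizeof(struct regs_model))
        return false;
    if (off % size != 0)
        return false;
    /* Rejects e.g. an 8-byte read that starts at the last 4-byte member. */
    if ((size_t)off + (size_t)size > sizeof(struct regs_model))
        return false;
    return true;
}
```

      The last test is the one that matters for the 32-bit case: an aligned
      8-byte access at offset 16 starts in bounds but ends past the structure.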
      
      Fixes: 2541517c ("tracing, perf: Implement BPF programs attached to kprobes")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: Fix segfault after sock connection timeout · 8a367e74
      Basil Gunn authored
      
      The ax.25 socket connection timed out and the sock struct has
      already been taken down, i.e. the sock struct pointer is now NULL.
      Checking the sock_flag then causes the segfault. Check whether the
      socket struct pointer is NULL before checking sock_flag. This
      segfault is seen in timed-out netrom connections.
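
      As a rough user-space illustration of the ordering issue, the sketch
      below tests the pointer before reading any flag. The names sock_model,
      SOCK_FLAG_DESTROY, sock_model_flag() and should_destroy() are
      hypothetical; the real handler and its full condition are not shown in
      this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SOCK_FLAG_DESTROY 0x1u  /* hypothetical flag bit */

/* Hypothetical stand-in for the socket structure. */
struct sock_model {
    unsigned int flags;
};

/* Reads a flag; dereferences sk, so sk must not be NULL here. */
static bool sock_model_flag(const struct sock_model *sk, unsigned int flag)
{
    assert(sk != NULL);
    return (sk->flags & flag) != 0;
}

/* Returns true when the timer handler should tear the connection down.
 * The NULL test comes first, so the flag is never read through NULL. */
static bool should_destroy(const struct sock_model *sk)
{
    return sk == NULL || sock_model_flag(sk, SOCK_FLAG_DESTROY);
}
```

      The short-circuit evaluation of || is what makes the order of the two
      checks significant.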
      
      Please submit to -stable.
      
      Signed-off-by: Basil Gunn <basil@pacabunga.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipvlan: fix dev_id creation corner case. · 019ec003
      Mahesh Bandewar authored
      
      In the last patch da36e13c ("ipvlan: improvise dev_id generation
      logic in IPvlan") I missed some part of Dave's suggestion, and because
      of that the dev_id creation could fail in a corner case. This
      happens when roughly 64k devices have already been created and
      several have been deleted, so that the devices still around
      occupy the last n bits of the bitmap. In this scenario, even if lower
      bits are available, the dev_id search is so narrow that it always fails.
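
      One way to avoid this kind of narrow search is to retry from the bottom
      of the range when the search from the current starting point fails. The
      sketch below models that idea with a small boolean array; id_pool,
      find_free() and id_alloc() are hypothetical names, the range is shrunk
      to 1..15 for readability, and this is not the ipvlan implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define ID_MIN 1
#define ID_MAX 15  /* small range for illustration only */

/* Hypothetical id pool: used[id] marks allocated ids. */
struct id_pool {
    bool used[ID_MAX + 1];
    int next_start;  /* where the next search begins */
};

/* First free id in [lo, hi], or -1 if none. */
static int find_free(const struct id_pool *p, int lo, int hi)
{
    for (int id = lo; id <= hi; id++)
        if (!p->used[id])
            return id;
    return -1;
}

/* Search upward from next_start; if that fails, retry the lower part. */
static int id_alloc(struct id_pool *p)
{
    int id;

    assert(p->next_start >= ID_MIN && p->next_start <= ID_MAX);

    id = find_free(p, p->next_start, ID_MAX);
    if (id < 0)
        id = find_free(p, ID_MIN, p->next_start - 1);
    if (id < 0)
        return -1;  /* the pool really is full */

    p->used[id] = true;
    p->next_start = (id == ID_MAX) ? ID_MIN : id + 1;
    return id;
}
```

      Without the second find_free() call, a pool whose only used ids sit at
      the top of the range would report failure even though lower ids are free.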
      
      Fixes: da36e13c ("ipvlan: improvise dev_id generation logic in IPvlan")
      CC: David Miller <davem@davemloft.org>
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: rework prog_digest into prog_tag · f1f7714e
      Daniel Borkmann authored
      
      Commit 7bd509e3 ("bpf: add prog_digest and expose it via
      fdinfo/netlink") was recently discussed, partially due to the
      admittedly suboptimal name "prog_digest" in combination
      with sha1 hash usage. Concerns about its security in terms of
      collision resistance were thus inevitably and rightfully
      raised with regard to use cases.

      The intended use cases are debugging and introspection
      only, providing a stable "tag" over the instruction sequence
      that both kernel and user space can calculate independently.
      It's not usable at all for making a security relevant decision.
      So collisions where two different instruction sequences generate
      the same tag can happen, but ideally at a rather low rate. The
      "tag" will be dumped in hex and is short enough to introspect
      in tracepoints or kallsyms output along with other data such
      as stack trace, etc. Thus, this patch performs a rename into
      prog_tag and truncates the tag to a short output (64 bits) to
      make it obvious it's not collision-free.
      
      Should in future a hash or facility be needed with a security
      relevant focus, then we can think about requirements, constraints,
      etc that would fit to that situation. For now, rework the exposed
      parts for the current use cases as long as nothing has been
      released yet. Tested on x86_64 and s390x.
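
      The truncation step can be illustrated with the short user-space sketch
      below. It assumes the tag is simply the leading 8 bytes of a 20-byte
      SHA-1 digest, which is an assumption made here for illustration; the
      function names tag_from_digest() and tag_to_hex() are hypothetical, and
      computing the digest itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DIGEST_LEN 20  /* SHA-1 output size in bytes */
#define TAG_LEN 8      /* 64-bit tag, as described in the commit */

/* Keep only the first TAG_LEN bytes of the digest (assumed choice of bytes). */
static void tag_from_digest(const uint8_t digest[DIGEST_LEN], uint8_t tag[TAG_LEN])
{
    memcpy(tag, digest, TAG_LEN);
}

/* Render the tag as lowercase hex; `hex` must hold 2 * TAG_LEN + 1 bytes. */
static void tag_to_hex(const uint8_t tag[TAG_LEN], char *hex, size_t hex_len)
{
    assert(hex_len >= 2 * TAG_LEN + 1);
    for (int i = 0; i < TAG_LEN; i++)
        snprintf(hex + 2 * i, 3, "%02x", tag[i]);
}
```

      A 16-character hex string is short enough to print next to other
      introspection data, and its length signals that it is an identifier
      rather than a collision-resistant hash.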
      
      Fixes: 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: get PIO buffer size from the NIC · c634700f
      Edward Cree authored
      
      The 8000 series SFC NICs have 4K PIO buffers, rather than the 2K of
      the 7000 series. Rather than having a hard-coded PIO buffer size
      (ER_DZ_TX_PIOBUF_SIZE), read it from the GET_CAPABILITIES_V2 MCDI
      response.
      
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>