- Dec 27, 2019
-
-
driver inclusion category: bugfix bugzilla: NA CVE: NA Add the sgl code for sec/zip module. Reviewed-by:
fanghao <fanghao11@huawei.com> Reviewed-by:
wangzhou <wangzhou1@hisilicon.com> Reviewed-by:
hucheng.hu <hucheng.hu@huawei.com> Signed-off-by:
lingmingqiang <lingmingqiang@huawei.com> Signed-off-by:
Mingqiang Ling <lingmingqiang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA The flush file operation is called by the OS by default and, in some scenarios, can cause dangerous behavior. So a user-space ioctl for putting the queue is introduced, to make sure all kernel resources are freed immediately when user space calls the wd_release_queue API. Signed-off-by:
xuzaibo <xuzaibo@huawei.com> Reviewed-by:
lingmingqiang <lingmingqiang@huawei.com> Reviewed-by:
wangzhou <wangzhou1@hisilicon.com> Signed-off-by:
lingmingqiang <lingmingqiang@huawei.com> Signed-off-by:
Mingqiang Ling <lingmingqiang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA Description:uacce: Fix vunmap in uacce_start_queue Signed-off-by:
Zhou Wang <wangzhou1@hisilicon.com> Reviewed-by:
xuzaibo <xuzaibo@huawei.com> Signed-off-by:
lingmingqiang <lingmingqiang@huawei.com> Signed-off-by:
Mingqiang Ling <lingmingqiang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA [ 867.662129] Call trace: [ 867.664565] dump_backtrace+0x0/0x1c0 [ 867.668213] show_stack+0x24/0x30 [ 867.671516] dump_stack+0xa8/0xcc [ 867.674818] oom_show_debug_info+0x20/0xe0 [ 867.678900] out_of_memory+0x1f0/0x520 [ 867.682634] __alloc_pages_nodemask+0xca8/0xd38 [ 867.687151] iommu_dma_alloc+0x178/0x428 [ 867.691060] __iommu_alloc_attrs+0x280/0x3f8 [ 867.695318] uacce_create_region+0x42c/0x5e8 [uacce] [ 867.700269] uacce_fops_mmap+0x298/0x338 [uacce] [ 867.704871] mmap_region+0x3e8/0x5b8 [ 867.708432] do_mmap+0x304/0x470 [ 867.711646] vm_mmap_pgoff+0xf4/0x128 [ 867.715294] ksys_mmap_pgoff+0xb4/0x258 [ 867.719115] __arm64_sys_mmap+0x34/0x48 [ 867.722937] el0_svc_common+0xa0/0x180 [ 867.726672] el0_svc_handler+0x38/0x78 [ 867.730407] el0_svc+0x8/0xc [ 867.742023] error, MEM_PrintAllMemory is NULL! [ 867.746455] [ 867.746455] slab info: [ 867.750287] slabinfo - version: 2.1 Add munmap as wd_release_queue in UACCE Signed-off-by:
xuzaibo <xuzaibo@huawei.com> Reviewed-by:
wangzhou <wangzhou1@hisilicon.com> Signed-off-by:
lingmingqiang <lingmingqiang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA This patch modifies some comments on debugfs centralized review. Feature or Bugfix:Bugfix Signed-off-by:
Yufeng Mo <moyufeng@huawei.com> Reviewed-by:
lipeng <lipeng321@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA In some scenarios, upper application can ensure that there is no concurrency when processing io, thus lock free can be used to improve performance. Feature or Bugfix:Bugfix Signed-off-by:
Yixian Liu <liuyixian@huawei.com> Reviewed-by:
wangxi <wangxi11@huawei.com> Reviewed-by:
oulijun <oulijun@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA If msg start sge idx in sq wqe is not initialized, data inconsistency might occur during retransmission. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA Reduce the eqe specification in order to reduce memory usage when loading the driver. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA A user application may set the page shift of the umem and expect this value to be used when creating the mr. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA In the add_gid and delete gid processes, the driver sends a cmd to IMP and uses jiffies to wait for the result of the cmd. Jiffies cannot work in irq, so spin_lock_irqsave should be deleted from this process. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
wangxi <wangxi11@huawei.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA The format specifier "%p" can leak kernel addresses. Use "%pK" instead. Signed-off-by:
Xiang Chen <chenxiang66@hisilicon.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category:bugfix bugzilla:4472 CVE:NA ------------------------------------------------------------------------ This patch fixes a bug in api_csr_write. Reviewed-by:
chiqijun <chiqijun@huawei.com> Signed-off-by:
Wu Like <wulike1@huawei.com> Signed-off-by:
Xue <xuechaojing@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA The format specifier "%p" can leak kernel addresses. Use "%pK" instead. Signed-off-by:
Lang Cheng <chenglang@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA Use delayed work instead of timers to trigger the hclge_serive. This simplifies the code by removing one intermediate function and prepares for supporting misc irq affinity. Feature or Bugfix:Bugfix Signed-off-by:
Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by:
lipeng <lipeng321@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA The unused_count variable indicates how many rx BDs need a new buffer attached in hns3_clean_rx_ring, and the clean_count variable has a similar meaning. This patch removes the clean_count variable and uses unused_count uniformly to indicate the rx BDs that need a new buffer attached. This patch also cleans up some coding style related to variable assignment in hns3_clean_rx_ring. Feature or Bugfix:Bugfix Signed-off-by:
Yunsheng Lin <linyunsheng@huawei.com> Reviewed-by:
lipeng <lipeng321@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
谢秀奇 authored
hulk inclusion category: bugfix bugzilla: 16633 CVE: NA snprintf is safer than sprintf, so it is recommended to use snprintf. Signed-off-by:
Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
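The difference can be illustrated with a minimal user-space sketch. The helper name `format_label` and the "name-id" format are hypothetical and not taken from the patch; the sketch only shows that snprintf bounds the write and reports truncation through its return value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "name-id" into dst without writing
 * more than cap bytes. Returns 0 on success, -1 if the output did
 * not fit (or snprintf reported an error).
 */
static int format_label(char *dst, size_t cap, const char *name, int id)
{
    int needed;

    assert(dst != NULL && name != NULL && cap > 0);

    /*
     * snprintf writes at most cap bytes, including the terminating
     * NUL, and returns the length the full string would have needed.
     * sprintf has no such bound and would overrun a short buffer.
     */
    needed = snprintf(dst, cap, "%s-%d", name, id);
    if (needed < 0 || (size_t)needed >= cap)
        return -1;

    return 0;
}
```

A caller that gets -1 can treat the buffer as truncated instead of silently corrupting adjacent memory.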
-
euler inclusion category: bugfix bugzilla: 4411 CVE: NA ------------------------------------------------- In order to avoid integer overflow, we should limit the ranges of loop_qlen value. Fixes: 997518dea253 ("ipvlan: Introduce local xmit queue for l2e mode") Signed-off-by:
Keefe Liu <liuqifa@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
谢秀奇 authored
hulk inclusion category: bugfix bugzilla: 5289,16633 CVE: NA User-mode address printing is not mandatory. To be on the safe side, changed it to %pK. Signed-off-by:
Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by:
Jason Yan <yanaijie@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
hulk inclusion category: bugfix bugzilla: 4390 CVE: NA ------------------- We use 'bir' as the index of array resource[DEVICE_COUNT_RESOURCE]. A wrong 'bir' will cause an out-of-range access. This patch adds a check for 'bir'. Signed-off-by:
Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
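The idea of the check can be sketched in user-space C as follows. `RESOURCE_COUNT` stands in for DEVICE_COUNT_RESOURCE with an assumed value, and `struct resource_stub` and `get_resource` are hypothetical names, not the code touched by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for DEVICE_COUNT_RESOURCE; the value here is assumed. */
#define RESOURCE_COUNT 6

/* Hypothetical, simplified resource descriptor. */
struct resource_stub {
    unsigned long start;
    unsigned long end;
};

/*
 * Return the entry selected by bir, or NULL when bir would index
 * past the end of the table.
 */
static const struct resource_stub *
get_resource(const struct resource_stub *table, unsigned int bir)
{
    assert(table != NULL);

    if (bir >= RESOURCE_COUNT)
        return NULL; /* reject instead of reading out of range */

    return &table[bir];
}
```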
-
driver inclusion category: bugfix bugzilla: NA CVE: NA When perftest is run many times, the system reports a BUG as follows: [ 2312.559759] BUG: Bad rss-counter state mm:(____ptrval____) idx:0 val:-1 [ 2312.574803] BUG: Bad rss-counter state mm:(____ptrval____) idx:1 val:1 This patch fixes it by correcting the incorrect scatter list usage in the hns_roce_db_map_user() function. Feature or Bugfix:Bugfix Signed-off-by:
Xi Wang <wangxi11@huawei.com> Reviewed-by:
chenglang <chenglang@huawei.com> Reviewed-by:
oulijun <oulijun@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA If alw_lcl_lpbk is enabled in promiscuous mode, packets whose source and destination mac addresses are equal will be handled in both inner loopback and outer loopback. This halves the performance of roce in promiscuous mode. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
liyangyang20 <liyangyang20@huawei.com> Reviewed-by:
chenglang <chenglang@huawei.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA There is no need to tell users when eq->cons_index overflows; we just set it back to zero. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
liyangyang20 <liyangyang20@huawei.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
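The silent wrap-around can be sketched as below. The struct `eq_stub`, the function `eq_advance`, and the wrap limit of twice the queue depth are assumptions for illustration; the log does not show the driver's actual condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified event queue state. */
struct eq_stub {
    unsigned int cons_index; /* consumer index */
    unsigned int entries;    /* queue depth */
};

/*
 * Advance the consumer index. When it passes the assumed limit
 * (2 * entries - 1), reset it to zero without logging anything,
 * since the wrap is expected behavior rather than an error.
 */
static void eq_advance(struct eq_stub *eq)
{
    assert(eq != NULL && eq->entries > 0);

    ++eq->cons_index;
    if (eq->cons_index > 2 * eq->entries - 1)
        eq->cons_index = 0;
}
```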
-
driver inclusion category: bugfix bugzilla: NA CVE: NA We should set IB_WC_WITH_VLAN only when VLAN is enabled. In addition, this patch moves the setting of IB_WC_WITH_SMAC below the setting of wc->smac. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
oulijun <oulijun@huawei.com> Reviewed-by:
liyangyang20 <liyangyang20@huawei.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
chenglang <chenglang@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA In some cases, the number of available queues of a VF is less than hdev->max_rss_size. But the VF initializes the indir table with max_rss_size, so the max queue id shown by ethtool -x is larger than the one shown by ethtool -l. As the HNS3 driver needs to alloc irqs for ROCE, the param num_msi_left needs to include the ROCE irqs, or ROCE cannot get the right irq number and insmod fails. Feature or Bugfix:Bugfix Signed-off-by:
shenjian (K) <shenjian15@huawei.com> Reviewed-by:
lipeng <lipeng321@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: bugfix bugzilla: NA CVE: NA The format specifier "%p" can leak kernel addresses, so we use "%pK" instead. The behavior of %pK depends on the kptr_restrict sysctl. Feature or Bugfix: Bugfix Signed-off-by:
Weihang Li <liweihang@hisilicon.com> Reviewed-by:
oulijun <oulijun@huawei.com> Reviewed-by:
liuyixian <liuyixian@huawei.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category: feature bugzilla: NA CVE: NA This patch adds the flow that sends the NOTIFY spinup primitive when the disk responds with a sense key indicating that spinup is needed. Signed-off-by:
Yupeng Zhou <zhouyupeng1@huawei.com> Reviewed-by:
luojian <luojian5@huawei.com> Reviewed-by:
chenxiang <chenxiang66@hisilicon.com> Reviewed-by:
Yang Yingliang <yangyingliang@huawei.com> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
driver inclusion category:bugfix bugzilla:4472 CVE:NA ------------------------------------------------------------------------ This patch adds a security check in hinic. Reviewed-by:
chiqijun <chiqijun@huawei.com> Signed-off-by:
Wu Like <wulike1@huawei.com> Signed-off-by:
Xue <xuechaojing@huawei.com> Reviewed-by:
Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by:
Xie XiuQi <xiexiuqi@huawei.com>
-
branch (75 total) beside 2 already merged patches: 6461a45 perf header: Fix unchecked usage of strncpy() 49e9b49 mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit c5e2edeb upstream. GCC 8.1.0 reports that the ldadd instruction encoding, recently added to insn.c, doesn't match the mask and couldn't possibly be identified: linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd': linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare] Bits [31:30] normally encode the size of the instruction (1 to 8 bytes) and the current instruction value only encodes the 4- and 8-byte variants. At the moment only the BPF JIT needs this instruction, and doesn't require the 1- and 2-byte variants, but to be consistent with our other ldr and str instruction encodings, clear the size field in the insn value. Fixes: 34b8ab09 ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd") Acked-by:
Daniel Borkmann <daniel@iogearbox.net> Reported-by:
Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by:
Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by:
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
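Why the compiler flags such a comparison can be shown with a small sketch. The mask and value constants below are illustrative assumptions chosen to reproduce the situation described (a value with a bit set in [31:30] while the mask clears those bits); they should not be read as the exact kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, assumed for this sketch. */
#define DEMO_MASK      UINT32_C(0x3F20FC00) /* bits [31:30] not in mask */
#define DEMO_VAL_BAD   UINT32_C(0xB8200000) /* bit 31 set: can never match */
#define DEMO_VAL_FIXED UINT32_C(0x38200000) /* size field cleared */

/* A (mask, value) pattern can only match if value has no bits outside mask. */
static bool pattern_is_satisfiable(uint32_t mask, uint32_t value)
{
    return (value & ~mask) == 0;
}

/* The usual instruction-class test: compare only the masked bits. */
static bool insn_matches(uint32_t code, uint32_t mask, uint32_t value)
{
    assert(mask != 0);
    return (code & mask) == value;
}

/* Compile-time version of the check the compiler warning hints at. */
_Static_assert((DEMO_VAL_FIXED & ~DEMO_MASK) == 0,
               "pattern value must lie inside the mask");
```

With the fixed value, encodings that differ only in the size bits all match the same pattern, which is the consistency the commit message asks for.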
-
commit c7152763 upstream. Currently req->num_trbs is not reset after the TRBs are skipped and processed from the cancelled list. The gadget driver may reuse the request with an invalid req->num_trbs, and DWC3 will incorrectly skip trbs. To fix this, simply reset req->num_trbs to 0 after skipping through all of them. Fixes: c3acd590 ("usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()") Signed-off-by:
Thinh Nguyen <thinhn@synopsys.com> Signed-off-by:
Felipe Balbi <felipe.balbi@linux.intel.com> Cc: Sasha Levin <sashal@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit c3bcde02 upstream. udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device to count packets on dev->tstats, a percpu variable. However, TIPC is using udp tunnel with no tunnel device, and passes the lower dev, like a veth device that only initializes dev->lstats (a percpu variable) when creating it. Later iptunnel_xmit_stats() called by ip(6)tunnel_xmit() treats the dev as a tunnel device, and uses dev->tstats instead of dev->lstats. Each tstats pointer points to a bigger struct than lstats, so when tstats->tx_bytes is increased, other percpu variables' members could be overwritten. syzbot has reported quite a few crashes due to fib_nh_common percpu member 'nhc_pcpu_rth_output' overwritten, call traces are like: BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556 __mkroute_output net/ipv4/route.c:2332 [inline] ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564 ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393 __ip_route_output_key include/net/route.h:125 [inline] ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651 ip_route_output_key include/net/route.h:135 [inline] ... or: kasan: GPF could be caused by NULL-ptr deref or user memory access RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168 <IRQ> rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline] free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217 __rcu_reclaim kernel/rcu/rcu.h:240 [inline] rcu_do_batch kernel/rcu/tree.c:2437 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline] rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697 ... The issue has existed since the tunnel stats update was moved to iptunnel_xmit by commit 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()"). Fix it by passing a NULL tunnel dev to udp_tunnel(6)_xmit_skb so that the packet counting won't happen on dev->tstats. Reported-by:
<syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com> Reported-by:
<syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com> Reported-by:
<syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com> Reported-by:
<syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com> Reported-by:
<syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com> Reported-by:
<syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com> Fixes: 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()") Signed-off-by:
Xin Long <lucien.xin@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 641114d2 upstream. gcc 9 now does allocation size tracking and thinks that passing the member of a union and then accessing beyond that member's bounds is an overflow. Instead of using the union member, use the entire union with a cast to get to the sockaddr. gcc will now know that the memory extends the full size of the union. Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
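The pattern described above can be sketched in user-space C. The union `addr_storage` and the helper `fill_ipv6` are hypothetical names for illustration; the point is only that the pointer handed on is derived from the whole union, so the object it refers to is as large as the largest member.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical union of address types, similar in shape to the case described. */
union addr_storage {
    struct sockaddr     sa;
    struct sockaddr_in  sin;
    struct sockaddr_in6 sin6;
};

/*
 * Copy an IPv6 address into the union through a struct sockaddr pointer.
 * Passing &dst->sa would hand the compiler an object of only
 * sizeof(struct sockaddr) bytes, and writing a larger sockaddr_in6
 * through it looks like an overflow to size-tracking diagnostics.
 * Casting the union itself keeps the full union size visible.
 */
static void fill_ipv6(union addr_storage *dst, const struct sockaddr_in6 *src)
{
    struct sockaddr *sa;

    assert(dst != NULL && src != NULL);

    sa = (struct sockaddr *)dst;
    memcpy(sa, src, sizeof(*src));
}
```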
-
commit 42750351 upstream. The architecture implementations of 'arch_futex_atomic_op_inuser()' and 'futex_atomic_cmpxchg_inatomic()' are permitted to return only -EFAULT, -EAGAIN or -ENOSYS in the case of failure. Update the comments in the asm-generic/ implementation and also a stray reference in the robust futex documentation. Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 34b8ab09 upstream. Since the ARMv8.1 supplement introduced LSE atomic instructions back in 2016, let's add support for STADD and use it in favor of the LDXR / STXR loop for the XADD mapping if available. STADD is encoded as an alias for LDADD with XZR as the destination register, therefore add LDADD to the instruction encoder along with STADD as a special case and use it in the JIT for CPUs that advertise LSE atomics in the CPUID register. If the immediate offset in the BPF XADD insn is 0, then use the dst register directly instead of a temporary one. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Acked-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 8e4e0ac0 upstream. Returning an error code from futex_atomic_cmpxchg_inatomic() indicates that the caller should not make any use of *uval, and should instead act upon the value of the error code. Although this is implemented correctly in our futex code, we needlessly copy uninitialised stack to *uval in the error case, which can easily be avoided. Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
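The contract can be sketched with a non-atomic user-space stand-in. The function `cmpxchg_checked` is hypothetical, and a NULL address is used here only to simulate a faulting user access; the real implementation is architecture-specific assembly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, non-atomic stand-in for a compare-and-exchange helper.
 * On failure it returns a negative error code and leaves *uval alone,
 * so no uninitialised local value is ever copied out.
 */
static int cmpxchg_checked(uint32_t *uval, uint32_t *addr,
                           uint32_t oldval, uint32_t newval)
{
    uint32_t val;

    assert(uval != NULL);

    if (addr == NULL)
        return -EFAULT; /* simulated fault: *uval is not written */

    val = *addr;
    if (val == oldval)
        *addr = newval;

    *uval = val; /* written only on the success path */
    return 0;
}
```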
-
commit 4ac30c4b upstream. __udp6_lib_err() may be called when handling icmpv6 message. For example, the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called which may call reuseport_select_sock(). reuseport_select_sock() will call into a bpf_prog (if there is one). reuseport_select_sock() is expecting the skb->data pointing to the transport header (udphdr in this case). For example, run_bpf_filter() is pulling the transport header. However, in the __udp6_lib_err() path, the skb->data is pointing to the ipv6hdr instead of the udphdr. One option is to pull and push the ipv6hdr in __udp6_lib_err(). Instead of doing this, this patch follows how the original commit 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") was done in IPv4, which has passed a NULL skb pointer to reuseport_select_sock(). Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Cc: Craig Gallek <kraig@google.com> Signed-off-by:
Martin KaFai Lau <kafai@fb.com> Acked-by:
Song Liu <songliubraving@fb.com> Acked-by:
Craig Gallek <kraig@google.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 257a525f upstream. When the commit a6024562 ("udp: Add GRO functions to UDP socket") added udp[46]_lib_lookup_skb to the udp_gro code path, it broke the reuseport_select_sock() assumption that skb->data is pointing to the transport header. This patch follows an earlier __udp6_lib_err() fix by passing a NULL skb to avoid calling the reuseport's bpf_prog. Fixes: a6024562 ("udp: Add GRO functions to UDP socket") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by:
Martin KaFai Lau <kafai@fb.com> Acked-by:
Song Liu <songliubraving@fb.com> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 983695fa upstream. Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] 
; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 
1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg") Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
Andrey Ignatov <rdna@fb.com> Acked-by:
Martin KaFai Lau <kafai@fb.com> Acked-by:
Martynas Pumputis <m@lambda.lt> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-
commit 9594dc3c upstream. BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as they do not increment bpf_prog_active while executing. This enables three levels of nesting, to support - a kprobe or raw tp or perf event, - another one of the above that irq context happens to call, and - another one in nmi context (at most one of which may be a kprobe or perf event). Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data") Signed-off-by:
Matt Mullins <mmullins@fb.com> Acked-by:
Andrii Nakryiko <andriin@fb.com> Acked-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
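The three-level nesting scheme can be sketched as a small per-CPU buffer stack. The names `cpu_state`, `scratch_get`, and `scratch_put` are hypothetical, and a plain struct stands in for real per-CPU storage; this is an illustration of the idea, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 3 /* three levels of nesting, as described above */

/* Hypothetical scratch buffer used by one program invocation. */
struct scratch {
    unsigned char data[64];
};

/* Stand-in for per-CPU state: one buffer per nesting level. */
struct cpu_state {
    struct scratch bufs[MAX_NEST];
    int nest_level;
};

/*
 * Claim the buffer for the next nesting level, or return NULL when
 * all levels are in use, so a deeper invocation never shares a buffer
 * with the invocation it interrupted.
 */
static struct scratch *scratch_get(struct cpu_state *cpu)
{
    int level;

    assert(cpu != NULL);

    level = ++cpu->nest_level;
    if (level > MAX_NEST) {
        cpu->nest_level--;
        return NULL;
    }
    return &cpu->bufs[level - 1];
}

/* Release the most recently claimed level. */
static void scratch_put(struct cpu_state *cpu)
{
    assert(cpu != NULL && cpu->nest_level > 0);
    cpu->nest_level--;
}
```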
-
commit da2577fd upstream. If the leftmost parent node of the tree does not have a child on the left side, then trie_get_next_key (and bpftool map dump) will not look at the child on the right. This leads to the traversal missing elements. Lookup is not affected. Update selftest to handle this case. Reproducer: bpftool map create /sys/fs/bpf/lpm type lpm_trie key 6 \ value 1 entries 256 name test_lpm flags 1 bpftool map update pinned /sys/fs/bpf/lpm key 8 0 0 0 0 0 value 1 bpftool map update pinned /sys/fs/bpf/lpm key 16 0 0 0 0 128 value 2 bpftool map dump pinned /sys/fs/bpf/lpm Returns only 1 element. (2 expected) Fixes: b471f2f1 ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE") Signed-off-by:
Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by:
Martin KaFai Lau <kafai@fb.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Yang Yingliang <yangyingliang@huawei.com>
-