user/sven/linux.git/net/core, branch v4.4.266

can: dev: Move device back to init netns on owning netns delete

2021-03-30T12:45:01Z

commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream. When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [] (unwind_backtrace) from [] (show_stack+0x10/0x14) [] (show_stack) from [] (dump_stack+0x94/0xa8) [] (dump_stack) from [] (__warn+0xb8/0x114) [] (__warn) from [] (warn_slowpath_fmt+0x7c/0xac) [] (warn_slowpath_fmt) from [] (ops_exit_list+0x38/0x60) [] (ops_exit_list) from [] (cleanup_net+0x230/0x380) [] (cleanup_net) from [] (process_one_work+0x1d8/0x438) [] (process_one_work) from [] (worker_thread+0x64/0x5a8) [] (worker_thread) from [] (kthread+0x148/0x14c) [] (kthread) from [] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org Signed-off-by: Martin Willi Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman

pktgen: fix misuse of BUG_ON() in pktgen_thread_worker()

2021-03-07T10:24:21Z

[ Upstream commit 275b1e88cabb34dbcbe99756b67e9939d34a99b6 ] pktgen create threads for all online cpus and bond these threads to relevant cpu repecivtily. when this thread firstly be woken up, it will compare cpu currently running with the cpu specified at the time of creation and if the two cpus are not equal, BUG_ON() will take effect causing panic on the system. Notice that these threads could be migrated to other cpus before start running because of the cpu hotplug after these threads have created. so the BUG_ON() used here seems unreasonable and we can replace it with WARN_ON() to just printf a warning other than panic the system. Signed-off-by: Di Zhu Link: https://lore.kernel.org/r/20210125124229.19334-1-zhudi21@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin

net: fix up truesize of cloned skb in skb_prepare_for_shift()

2021-03-07T10:24:20Z

commit 097b9146c0e26aabaa6ff3e5ea536a53f5254a79 upstream. Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when cloning an skb, save and restore truesize after pskb_expand_head(). This can occur if the allocator decides to service an allocation of the same size differently (e.g. use a different size class, or pass the allocation on to KFENCE). Because truesize is used for bookkeeping (such as sk_wmem_queued), a modified truesize of a cloned skb may result in corrupt bookkeeping and relevant warnings (such as in sk_stream_kill_queues()). Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com Suggested-by: Eric Dumazet Signed-off-by: Marco Elver Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too

2021-01-30T12:25:57Z

commit 66c556025d687dbdd0f748c5e1df89c977b6c02a upstream. Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") ensured that skbs with data size lower than 1025 bytes will be kmalloc'ed to avoid excessive page cache fragmentation and memory consumption. However, the fix adressed only __napi_alloc_skb() (primarily for virtio_net and napi_get_frags()), but the issue can still be achieved through __netdev_alloc_skb(), which is still used by several drivers. Drivers often allocate a tiny skb for headers and place the rest of the frame to frags (so-called copybreak). Mirror the condition to __netdev_alloc_skb() to handle this case too. Since v1 [0]: - fix "Fixes:" tag; - refine commit message (mention copybreak usecase). [0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me Fixes: a1c7fff7e18f ("net: netdev_alloc_skb() use build_skb()") Signed-off-by: Alexander Lobakin Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

net: avoid 32 x truesize under-estimation for tiny skbs

2021-01-23T14:36:56Z

[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ] Both virtio net and napi_get_frags() allocate skbs with a very small skb->head While using page fragments instead of a kmalloc backed skb->head might give a small performance improvement in some cases, there is a huge risk of under estimating memory usage. For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations per page (order-3 page in x86), or even 64 on PowerPC We have been tracking OOM issues on GKE hosts hitting tcp_mem limits but consuming far more memory for TCP buffers than instructed in tcp_mem[2] Even if we force napi_alloc_skb() to only use order-0 pages, the issue would still be there on arches with PAGE_SIZE >= 32768 This patch makes sure that small skb head are kmalloc backed, so that other objects in the slab page can be reused instead of being held as long as skbs are sitting in socket queues. Note that we might in the future use the sk_buff napi cache, instead of going through a more expensive __alloc_skb() Another idea would be to use separate page sizes depending on the allocated length (to never have more than 4 frags per page) I would like to thank Greg Thelen for his precious help on this matter, analysing crash dumps is always a time consuming task. Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb") Signed-off-by: Eric Dumazet Cc: Paolo Abeni Cc: Greg Thelen Reviewed-by: Alexander Duyck Acked-by: Michael S. Tsirkin Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet

2021-01-17T12:55:14Z

commit 54970a2fbb673f090b7f02d7f57b10b2e0707155 upstream. syzbot reproduces BUG_ON in skb_checksum_help(): tun creates (bogus) skb with huge partial-checksummed area and small ip packet inside. Then ip_rcv trims the skb based on size of internal ip packet, after that csum offset points beyond of trimmed skb. Then checksum_tg() called via netfilter hook triggers BUG_ON: offset = skb_checksum_start_offset(skb); BUG_ON(offset >= skb_headlen(skb)); To work around the problem this patch forces pskb_trim_rcsum_slow() to return -EINVAL in described scenario. It allows its callers to drop such kind of packets. Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0 Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com Signed-off-by: Vasily Averin Acked-by: Willem de Bruijn Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

net: Have netpoll bring-up DSA management interface

2020-11-24T11:48:10Z

[ Upstream commit 1532b9778478577152201adbafa7738b1e844868 ] DSA network devices rely on having their DSA management interface up and running otherwise their ndo_open() will return -ENETDOWN. Without doing this it would not be possible to use DSA devices as netconsole when configured on the command line. These devices also do not utilize the upper/lower linking so the check about the netpoll device having upper is not going to be a problem. The solution adopted here is identical to the one done for net/ipv4/ipconfig.c with 728c02089a0e ("net: ipv4: handle DSA enabled master network devices"), with the network namespace scope being restricted to that of the process configuring netpoll. Fixes: 04ff53f96a93 ("net: dsa: Add netconsole support") Tested-by: Vladimir Oltean Signed-off-by: Florian Fainelli Link: https://lore.kernel.org/r/20201117035236.22658-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman

neigh_stat_seq_next() should increase position index

2020-10-01T09:11:51Z

[ Upstream commit 1e3f9f073c47bee7c23e77316b07bc12338c5bba ] if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin

net: disable netpoll on fresh napis

2020-09-12T09:45:33Z

[ Upstream commit 96e97bc07e90f175a8980a22827faf702ca4cb30 ] napi_disable() makes sure to set the NAPI_STATE_NPSVC bit to prevent netpoll from accessing rings before init is complete. However, the same is not done for fresh napi instances in netif_napi_add(), even though we expect NAPI instances to be added as disabled. This causes crashes during driver reconfiguration (enabling XDP, changing the channel count) - if there is any printk() after netif_napi_add() but before napi_enable(). To ensure memory ordering is correct we need to use RCU accessors. Reported-by: Rob Sherwood Fixes: 2d8bff12699a ("netpoll: Close race condition between poll_one_napi and napi_disable") Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman

net: Fix potential wrong skb->protocol in skb_vlan_untag()

2020-09-03T09:19:22Z

[ Upstream commit 55eff0eb7460c3d50716ed9eccf22257b046ca92 ] We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data. Fixes: 0d5501c1c828 ("net: Always untag vlan-tagged traffic on input.") Signed-off-by: Miaohe Lin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman