<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/core, branch v5.4.113</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.113</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.113'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-04-14T06:24:12Z</updated>
<entry>
<title>net: let skb_orphan_partial wake-up waiters.</title>
<updated>2021-04-14T06:24:12Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2021-03-30T16:43:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b02bded94b91ca27888d9663e690e1c968d06c45'/>
<id>urn:sha1:b02bded94b91ca27888d9663e690e1c968d06c45</id>
<content type='text'>
commit 9adc89af724f12a03b47099cd943ed54e877cd59 upstream.

Currently the mentioned helper can end-up freeing the socket wmem
without waking-up any processes waiting for more write memory.

If the partially orphaned skb is attached to an UDP (or raw) socket,
the lack of wake-up can hang the user-space.

Even for TCP sockets not calling the sk destructor could have bad
effects on TSQ.

Address the issue using skb_orphan to release the sk wmem before
setting the new sock_efree destructor. Additionally bundle the
whole ownership update in a new helper, so that later other
potential users could avoid duplicate code.

v1 -&gt; v2:
 - use skb_orphan() instead of sort of open coding it (Eric)
 - provide an helper for the ownership change (Eric)

Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Remove MTU check in __bpf_skb_max_len</title>
<updated>2021-04-07T12:47:41Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2021-02-09T13:38:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42c83e3bca434d9f63c58f9cbf2881e635679fee'/>
<id>urn:sha1:42c83e3bca434d9f63c58f9cbf2881e635679fee</id>
<content type='text'>
commit 6306c1189e77a513bf02720450bb43bd4ba5d8ae upstream.

Multiple BPF-helpers that can manipulate/increase the size of the SKB uses
__bpf_skb_max_len() as the max-length. This function limit size against
the current net_device MTU (skb-&gt;dev-&gt;mtu).

When a BPF-prog grow the packet size, then it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Further more, current MTU check in
__bpf_skb_max_len uses the MTU from ingress/current net_device, which in
case of redirects uses the wrong net_device.

This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit
is elsewhere in the system. Jesper's testing[1] showed it was not possible
to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting
factor is the define KMALLOC_MAX_CACHE_SIZE which is 8192 for
SLUB-allocator (CONFIG_SLUB) in-case PAGE_SIZE is 4096. This define is
in-effect due to this being called from softirq context see code
__gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
that frames above 16KiB can cause NICs to reset (but not crash). Keep this
sanity limit at this level as memory layer can differ based on kernel
config.

[1] https://github.com/xdp-project/bpf-examples/tree/master/MTU-tests

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/161287788936.790810.2937823995775097177.stgit@firesoul
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>flow_dissector: fix TTL and TOS dissection on IPv4 fragments</title>
<updated>2021-04-07T12:47:40Z</updated>
<author>
<name>Davide Caratti</name>
<email>dcaratti@redhat.com</email>
</author>
<published>2021-02-12T15:12:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ca4feb37e9ef0c9a044f5437b4d483c69313186'/>
<id>urn:sha1:7ca4feb37e9ef0c9a044f5437b4d483c69313186</id>
<content type='text'>
[ Upstream commit d2126838050ccd1dadf310ffb78b2204f3b032b9 ]

the following command:

 # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
   $tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop

doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have non
zero value in the IPv4 header, setting of FLOW_DISSECTOR_KEY_IP is simply
ignored. Fix this dissecting IPv4 TTL and TOS before fragment info; while
at it, add a selftest for tc flower's match on 'ip_ttl' that verifies the
correct behavior.

Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields")
Reported-by: Shuang Li &lt;shuali@redhat.com&gt;
Signed-off-by: Davide Caratti &lt;dcaratti@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>can: dev: Move device back to init netns on owning netns delete</title>
<updated>2021-03-30T12:35:29Z</updated>
<author>
<name>Martin Willi</name>
<email>martin@strongswan.org</email>
</author>
<published>2021-03-02T12:24:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f866d1fa48e40af651d9c78b7f3c5a9a6ccf1815'/>
<id>urn:sha1:f866d1fa48e40af651d9c78b7f3c5a9a6ccf1815</id>
<content type='text'>
commit 3a5ca857079ea022e0b1b17fc154f7ad7dbc150f upstream.

When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[&lt;c010e700&gt;] (unwind_backtrace) from [&lt;c010a1d8&gt;] (show_stack+0x10/0x14)
[&lt;c010a1d8&gt;] (show_stack) from [&lt;c086dc10&gt;] (dump_stack+0x94/0xa8)
[&lt;c086dc10&gt;] (dump_stack) from [&lt;c086b938&gt;] (__warn+0xb8/0x114)
[&lt;c086b938&gt;] (__warn) from [&lt;c086ba10&gt;] (warn_slowpath_fmt+0x7c/0xac)
[&lt;c086ba10&gt;] (warn_slowpath_fmt) from [&lt;c0629f20&gt;] (ops_exit_list+0x38/0x60)
[&lt;c0629f20&gt;] (ops_exit_list) from [&lt;c062a5c4&gt;] (cleanup_net+0x230/0x380)
[&lt;c062a5c4&gt;] (cleanup_net) from [&lt;c0142c20&gt;] (process_one_work+0x1d8/0x438)
[&lt;c0142c20&gt;] (process_one_work) from [&lt;c0142ee4&gt;] (worker_thread+0x64/0x5a8)
[&lt;c0142ee4&gt;] (worker_thread) from [&lt;c0148a98&gt;] (kthread+0x148/0x14c)
[&lt;c0148a98&gt;] (kthread) from [&lt;c0100148&gt;] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5fc8dc7 ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
Signed-off-by: Martin Willi &lt;martin@strongswan.org&gt;
Signed-off-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>pktgen: fix misuse of BUG_ON() in pktgen_thread_worker()</title>
<updated>2021-03-07T11:20:44Z</updated>
<author>
<name>Di Zhu</name>
<email>zhudi21@huawei.com</email>
</author>
<published>2021-01-25T12:42:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=46813e4a7caa280f6a0e478b8882e2c56aa8b9be'/>
<id>urn:sha1:46813e4a7caa280f6a0e478b8882e2c56aa8b9be</id>
<content type='text'>
[ Upstream commit 275b1e88cabb34dbcbe99756b67e9939d34a99b6 ]

pktgen create threads for all online cpus and bond these threads to
relevant cpu repecivtily. when this thread firstly be woken up, it
will compare cpu currently running with the cpu specified at the time
of creation and if the two cpus are not equal, BUG_ON() will take effect
causing panic on the system.
Notice that these threads could be migrated to other cpus before start
running because of the cpu hotplug after these threads have created. so the
BUG_ON() used here seems unreasonable and we can replace it with WARN_ON()
to just printf a warning other than panic the system.

Signed-off-by: Di Zhu &lt;zhudi21@huawei.com&gt;
Link: https://lore.kernel.org/r/20210125124229.19334-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: fix dev_ifsioc_locked() race condition</title>
<updated>2021-03-07T11:20:43Z</updated>
<author>
<name>Cong Wang</name>
<email>cong.wang@bytedance.com</email>
</author>
<published>2021-02-11T19:34:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=176188cff67ec1aa55103647b61d02315cc38e98'/>
<id>urn:sha1:176188cff67ec1aa55103647b61d02315cc38e98</id>
<content type='text'>
commit 3b23a32a63219f51a5298bc55a65ecee866e79d0 upstream.

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev-&gt;dev_addr, addr-&gt;sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr-&gt;ifr_hwaddr.sa_data,
					dev-&gt;dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf8a58 ("net: RCU locking for simple ioctl()")
Reported-by: "Gong, Sishuai" &lt;sishuai@purdue.edu&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Cong Wang &lt;cong.wang@bytedance.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: fix up truesize of cloned skb in skb_prepare_for_shift()</title>
<updated>2021-03-07T11:20:42Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2021-02-01T16:04:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6af7cb64b7b37b8a371acdbe33a636496ffb5a4'/>
<id>urn:sha1:e6af7cb64b7b37b8a371acdbe33a636496ffb5a4</id>
<content type='text'>
commit 097b9146c0e26aabaa6ff3e5ea536a53f5254a79 upstream.

Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when
cloning an skb, save and restore truesize after pskb_expand_head(). This
can occur if the allocator decides to service an allocation of the same
size differently (e.g. use a different size class, or pass the
allocation on to KFENCE).

Because truesize is used for bookkeeping (such as sk_wmem_queued), a
modified truesize of a cloned skb may result in corrupt bookkeeping and
relevant warnings (such as in sk_stream_kill_queues()).

Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com
Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix bpf_fib_lookup helper MTU check for SKB ctx</title>
<updated>2021-03-04T09:26:17Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2021-02-09T13:38:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=88f8f40c901c3c8b7739be3636a7368d1f74b441'/>
<id>urn:sha1:88f8f40c901c3c8b7739be3636a7368d1f74b441</id>
<content type='text'>
[ Upstream commit 2c0a10af688c02adcf127aad29e923e0056c6b69 ]

BPF end-user on Cilium slack-channel (Carlo Carraro) wants to use
bpf_fib_lookup for doing MTU-check, but *prior* to extending packet size,
by adjusting fib_params 'tot_len' with the packet length plus the expected
encap size. (Just like the bpf_check_mtu helper supports). He discovered
that for SKB ctx the param-&gt;tot_len was not used, instead skb-&gt;len was used
(via MTU check in is_skb_forwardable() that checks against netdev MTU).

Fix this by using fib_params 'tot_len' for MTU check. If not provided (e.g.
zero) then keep existing TC behaviour intact. Notice that 'tot_len' for MTU
check is done like XDP code-path, which checks against FIB-dst MTU.

V16:
- Revert V13 optimization, 2nd lookup is against egress/resulting netdev

V13:
- Only do ifindex lookup one time, calling dev_get_by_index_rcu().

V10:
- Use same method as XDP for 'tot_len' MTU check

Fixes: 4c79579b44b1 ("bpf: Change bpf_fib_lookup to return lookup status")
Reported-by: Carlo Carraro &lt;colrack@gmail.com&gt;
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/161287789444.790810.15247494756551413508.stgit@firesoul
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: gro: do not keep too many GRO packets in napi-&gt;rx_list</title>
<updated>2021-02-17T09:35:19Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2021-02-04T21:31:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2085d886630a5efab216e677d03b51916cd77ff1'/>
<id>urn:sha1:2085d886630a5efab216e677d03b51916cd77ff1</id>
<content type='text'>
commit 8dc1c444df193701910f5e80b5d4caaf705a8fb0 upstream.

Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depdending on NAPI budget).

My fix is counting in napi-&gt;rx_count number of segments
instead of number of logical packets.

Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Bisected-by: John Sperbeck &lt;jsperbeck@google.com&gt;
Tested-by: Jian Yang &lt;jianyang@google.com&gt;
Cc: Maxim Mikityanskiy &lt;maximmi@mellanox.com&gt;
Reviewed-by: Saeed Mahameed &lt;saeedm@nvidia.com&gt;
Reviewed-by: Edward Cree &lt;ecree.xilinx@gmail.com&gt;
Reviewed-by: Alexander Lobakin &lt;alobakin@pm.me&gt;
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>udp: fix skb_copy_and_csum_datagram with odd segment sizes</title>
<updated>2021-02-17T09:35:19Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2021-02-03T19:29:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b42ab078369318a99c925508f3532a7d5288b4f'/>
<id>urn:sha1:0b42ab078369318a99c925508f3532a7d5288b4f</id>
<content type='text'>
commit 52cbd23a119c6ebf40a527e53f3402d2ea38eccb upstream.

When iteratively computing a checksum with csum_block_add, track the
offset "pos" to correctly rotate in csum_block_add when offset is odd.

The open coded implementation of skb_copy_and_csum_datagram did this.
With the switch to __skb_datagram_iter calling csum_and_copy_to_iter,
pos was reinitialized to 0 on each call.

Bring back the pos by passing it along with the csum to the callback.

Changes v1-&gt;v2
  - pass csum value, instead of csump pointer (Alexander Duyck)

Link: https://lore.kernel.org/netdev/20210128152353.GB27281@optiplex/
Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers")
Reported-by: Oliver Graute &lt;oliver.graute@gmail.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://lore.kernel.org/r/20210203192952.1849843-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
