<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/netdevice.h, branch v6.12.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-10-21T10:54:25Z</updated>
<entry>
<title>net: fix races in netdev_tx_sent_queue()/dev_watchdog()</title>
<updated>2024-10-21T10:54:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-15T19:41:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95ecba62e2fd201bcdcca636f5d774f1cd4f1458'/>
<id>urn:sha1:95ecba62e2fd201bcdcca636f5d774f1cd4f1458</id>
<content type='text'>
Some workloads hit the infamous dev_watchdog() message:

"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"

It seems possible to hit this even for perfectly normal
BQL enabled drivers:

1) Assume a TX queue was idle for more than dev-&gt;watchdog_timeo
   (5 seconds unless changed by the driver)

2) Assume a big packet is sent, exceeding current BQL limit.

3) Driver ndo_start_xmit() puts the packet in TX ring,
   and netdev_tx_sent_queue() is called.

4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
   before txq-&gt;trans_start has been written.

5) txq-&gt;trans_start is written later, from netdev_start_xmit()

    if (rc == NETDEV_TX_OK)
          txq_trans_update(txq)

dev_watchdog() running on another cpu could read the old
txq-&gt;trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.

To solve the issue, write txq-&gt;trans_start right before one XOFF bit
is set:

- __QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()

From dev_watchdog(), we have to read txq-&gt;state before txq-&gt;trans_start.

Add memory barriers to enforce correct ordering.

In the future, we could avoid writing over txq-&gt;trans_start for normal
operations, and rename this field to txq-&gt;xoff_start_time.
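
The ordering described above can be sketched in user space. This is a
minimal model, not the kernel code: it uses C11 release/acquire atomics
as a stand-in for the kernel's own barrier primitives, and txq_model,
model_stop_queue(), model_timed_out(), XOFF_BIT and WATCHDOG_TIMEO are
hypothetical names chosen for illustration.

<![CDATA[
```c
#include <stdatomic.h>
#include <stdbool.h>

#define XOFF_BIT 1u          /* stand-in for one XOFF state bit */
#define WATCHDOG_TIMEO 5ul   /* stand-in for the watchdog timeout, in ticks */

/* Hypothetical model of the two TX queue fields involved in the race. */
struct txq_model {
	atomic_ulong trans_start;  /* tick of the last transmit */
	atomic_uint state;         /* XOFF bits */
};

/* Writer side: publish trans_start first, then set the XOFF bit. */
static void model_stop_queue(struct txq_model *q, unsigned long now)
{
	atomic_store_explicit(&q->trans_start, now, memory_order_relaxed);
	/* Release ordering keeps the trans_start store ahead of the bit. */
	atomic_fetch_or_explicit(&q->state, XOFF_BIT, memory_order_release);
}

/* Watchdog side: read state first, then trans_start. */
static bool model_timed_out(struct txq_model *q, unsigned long now)
{
	unsigned int state;
	unsigned long start;

	/* Acquire ordering keeps the trans_start load after the state load. */
	state = atomic_load_explicit(&q->state, memory_order_acquire);
	if (!(state & XOFF_BIT))
		return false;

	start = atomic_load_explicit(&q->trans_start, memory_order_relaxed);
	return now - start >= WATCHDOG_TIMEO;
}
```
]]>

In this model, a queue that was idle since tick 0 and is stopped at
tick 100 is not reported as timed out at tick 101, because the watchdog
can only observe the XOFF bit together with the fresh trans_start.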

Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: Fix gso_features_check to check for both dev-&gt;gso_{ipv4_,}max_size</title>
<updated>2024-10-01T08:48:52Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-09-23T21:22:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e609c959a939660c7519895f853dfa5624c6827a'/>
<id>urn:sha1:e609c959a939660c7519895f853dfa5624c6827a</id>
<content type='text'>
Commit 24ab059d2ebd ("net: check dev-&gt;gso_max_size in gso_features_check()")
added a dev-&gt;gso_max_size test to gso_features_check() in order to fall
back to GSO when needed.

This was added as it was noticed that some drivers could misbehave if TSO
packets get too big. However, the check doesn't respect the dev-&gt;gso_ipv4_max_size
limit. For instance, a device could be configured with BIG TCP for IPv4,
but not IPv6.

Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size()
and use the helper to respect both limits before falling back to GSO engine.
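
A rough user-space sketch of the idea follows. It is an illustration
under stated assumptions, not the kernel helper: model_dev, model_skb,
model_get_gso_max_size() and model_needs_sw_gso() are hypothetical
names, the protocol field is kept in host byte order for simplicity,
and the "length at or above the limit" comparison is assumed.

<![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_ETH_P_IP   0x0800u  /* IPv4 EtherType */
#define MODEL_ETH_P_IPV6 0x86DDu  /* IPv6 EtherType */

/* Hypothetical stand-ins for the relevant device and packet fields. */
struct model_dev {
	unsigned int gso_max_size;       /* limit used for IPv6 packets */
	unsigned int gso_ipv4_max_size;  /* limit used for IPv4 packets */
};

struct model_skb {
	uint16_t protocol;  /* EtherType, host byte order in this model */
	unsigned int len;   /* packet length in bytes */
};

/* Pick the limit that matches the packet's protocol. */
static unsigned int model_get_gso_max_size(const struct model_dev *dev,
					   const struct model_skb *skb)
{
	return skb->protocol == MODEL_ETH_P_IPV6 ? dev->gso_max_size
						 : dev->gso_ipv4_max_size;
}

/* True when the packet is too big for the device and software GSO is needed. */
static bool model_needs_sw_gso(const struct model_dev *dev,
			       const struct model_skb *skb)
{
	return skb->len >= model_get_gso_max_size(dev, skb);
}
```
]]>

With BIG TCP enabled for IPv4 only (say a 200000 byte IPv4 limit and a
65536 byte IPv6 limit), a 100000 byte IPv4 packet stays on the hardware
path in this model, while an IPv6 packet of the same size falls back.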

Fixes: 24ab059d2ebd ("net: check dev-&gt;gso_max_size in gso_features_check()")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>net: Add netif_get_gro_max_size helper for GRO</title>
<updated>2024-10-01T08:48:51Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-09-23T21:22:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8d4d34df715133c319fabcf63fdec684be75ff8'/>
<id>urn:sha1:e8d4d34df715133c319fabcf63fdec684be75ff8</id>
<content type='text'>
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4
or IPv6 GRO size of the netdevice.

We later add a netif_get_gso_max_size() equivalent as well for GSO, so that
these helpers can be used consistently instead of open-coded checks.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>netdev: support binding dma-buf to netdevice</title>
<updated>2024-09-12T03:44:31Z</updated>
<author>
<name>Mina Almasry</name>
<email>almasrymina@google.com</email>
</author>
<published>2024-09-10T17:14:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=170aafe35cb98e0f3fbacb446ea86389fbce22ea'/>
<id>urn:sha1:170aafe35cb98e0f3fbacb446ea86389fbce22ea</id>
<content type='text'>
Add a netdev_dmabuf_binding struct which represents the
dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
rx queues on the netdevice. On the binding, the dma_buf_attach
&amp; dma_buf_map_attachment will occur. The entries in the sg_table from
mapping will be inserted into a genpool to make it ready
for allocation.

The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
holds the dma-buf offset of the base of the chunk and the dma_addr of
the chunk. Both are needed to use allocations that come from this chunk.

We create a new type that represents an allocation from the genpool:
net_iov. We set up the net_iov allocation size in the
genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
allocated by the page pool and given to the drivers.

The user can unbind the dmabuf from the netdevice by closing the netlink
socket that established the binding. We do this so that the binding is
automatically unbound even if the userspace process crashes.

The binding and unbinding leave an indicator in struct netdev_rx_queue
that the given queue is bound, and the binding is actuated by resetting
the rx queue using the queue API.

The netdev_dmabuf_binding struct is refcounted, and releases its
resources only when all the refs are released.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Kaiyuan Zhang &lt;kaiyuanz@google.com&gt;
Signed-off-by: Mina Almasry &lt;almasrymina@google.com&gt;
Reviewed-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt; # excluding netlink
Acked-by: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://patch.msgid.link/20240910171458.219195-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue</title>
<updated>2024-09-12T03:24:43Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2024-09-12T03:24:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24b8c19314fa92baf03f2cea19d017789889a5b3'/>
<id>urn:sha1:24b8c19314fa92baf03f2cea19d017789889a5b3</id>
<content type='text'>
Tony Nguyen says:

====================
idpf: XDP chapter II: convert Tx completion to libeth

Alexander Lobakin says:

XDP for idpf is currently 5 chapters:
* convert Rx to libeth;
* convert Tx completion to libeth (this);
* generic XDP and XSk code changes;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).

Part II does the following:
* adds generic libeth Tx completion routines;
* converts idpf to use generic libeth Tx comp routines;
* fixes Tx queue timeouts and robustifies Tx completion in general;
* fixes Tx event/descriptor flushes (writebacks).

Most idpf patches again remove more lines than they add.
Generic Tx completion helpers and structs are needed as libeth_xdp
(Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't
want to work without it at all. Tx queue timeouts fixes are needed
since without them, it's way easier to catch a Tx timeout event when
WB_ON_ITR is enabled.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  idpf: enable WB_ON_ITR
  idpf: fix netdev Tx queue stop/wake
  idpf: refactor Tx completion routines
  netdevice: add netdev_tx_reset_subqueue() shorthand
  idpf: convert to libeth Tx buffer completion
  libeth: add Tx buffer completion helpers
====================

Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: remove dev_pick_tx_cpu_id()</title>
<updated>2024-09-09T23:53:38Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2024-09-06T16:10:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17245a195df4c86b1fd38718d8cdc532c040c08e'/>
<id>urn:sha1:17245a195df4c86b1fd38718d8cdc532c040c08e</id>
<content type='text'>
dev_pick_tx_cpu_id() has been introduced with two users by
commit a4ea8a3dacc3 ("net: Add generic ndo_select_queue functions").
The use in AF_PACKET has been removed in 2019 by
commit b71b5837f871 ("packet: rework packet_pick_tx_queue() to use common code selection").
The other user was a Netlogic XLP driver, removed in 2021 by
commit 47ac6f567c28 ("staging: Remove Netlogic XLP network driver").

It's relatively unlikely that any modern driver will need an
.ndo_select_queue implementation which picks purely based on CPU ID
and skips XPS, so delete dev_pick_tx_cpu_id().

Found by code inspection.

Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20240906161059.715546-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netdevice: add netdev_tx_reset_subqueue() shorthand</title>
<updated>2024-09-09T20:15:37Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-09-04T15:47:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3dc95a3edd0a86b4a59670b3fafcc64c7d83e2e7'/>
<id>urn:sha1:3dc95a3edd0a86b4a59670b3fafcc64c7d83e2e7</id>
<content type='text'>
Add a shorthand similar to other net*_subqueue() helpers for resetting
the queue by its index without manually obtaining &amp;netdev_tx_queue
beforehand.
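
The shape of such a shorthand can be sketched in user space as follows.
This is only a model under assumptions: model_dev, model_txq and the
model_* functions are hypothetical stand-ins, and the two counters are
illustrative rather than the real per-queue state.

<![CDATA[
```c
#define MODEL_NUM_TXQ 4

/* Hypothetical stand-in for a TX queue: just some resettable state. */
struct model_txq {
	unsigned int bytes_in_flight;
	unsigned int xoff;
};

struct model_dev {
	struct model_txq tx[MODEL_NUM_TXQ];
};

/* Look up a queue by index (the step callers previously did by hand). */
static struct model_txq *model_get_tx_queue(struct model_dev *dev,
					    unsigned int qid)
{
	return &dev->tx[qid];
}

/* Reset one queue's accounting state. */
static void model_tx_reset_queue(struct model_txq *q)
{
	q->bytes_in_flight = 0;
	q->xoff = 0;
}

/* The shorthand: look up the queue by index and reset it in one call. */
static void model_tx_reset_subqueue(struct model_dev *dev, unsigned int qid)
{
	model_tx_reset_queue(model_get_tx_queue(dev, qid));
}
```
]]>

A caller that only knows the queue index can then write
model_tx_reset_subqueue(dev, 2) instead of the two-step lookup and reset.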

Reviewed-by: Przemek Kitszel &lt;przemyslaw.kitszel@intel.com&gt;
Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Tony Nguyen &lt;anthony.l.nguyen@intel.com&gt;
</content>
</entry>
<entry>
<title>net: napi: Prevent overflow of napi_defer_hard_irqs</title>
<updated>2024-09-06T01:42:56Z</updated>
<author>
<name>Joe Damato</name>
<email>jdamato@fastly.com</email>
</author>
<published>2024-09-04T15:34:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08062af0a52107a243f7608fd972edb54ca5b7f8'/>
<id>urn:sha1:08062af0a52107a243f7608fd972edb54ca5b7f8</id>
<content type='text'>
In commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature")
napi_defer_hard_irqs was added to net_device and defer_hard_irqs_count was
added to napi_struct, both as type int.

This value never goes below zero, so there is no reason for it to be a
signed int. Change the type for both from int to u32, and add an
overflow check to sysfs to limit the value to S32_MAX.

The limit of S32_MAX was chosen because the practical limit before this
patch was S32_MAX (anything larger was an overflow) and thus there are
no behavioral changes introduced. If the extra bit is needed in the
future, the limit can be raised.

Before this patch:

$ sudo bash -c 'echo 2147483649 &gt; /sys/class/net/eth4/napi_defer_hard_irqs'
$ cat /sys/class/net/eth4/napi_defer_hard_irqs
-2147483647

After this patch:

$ sudo bash -c 'echo 2147483649 &gt; /sys/class/net/eth4/napi_defer_hard_irqs'
bash: line 0: echo: write error: Numerical result out of range

Similarly, /sys/class/net/XXXXX/tx_queue_len is defined as unsigned:

include/linux/netdevice.h:      unsigned int            tx_queue_len;

And has an overflow check:

dev_change_tx_queue_len(..., unsigned long new_len):

  if (new_len != (unsigned int)new_len)
          return -ERANGE;
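
Both range checks can be sketched in user space as below. This is a
minimal model under assumptions, not the sysfs code: the model_* names
are hypothetical, MODEL_S32_MAX stands in for S32_MAX, and the input is
taken as unsigned long long so the sketch does not depend on the width
of unsigned long.

<![CDATA[
```c
#include <errno.h>
#include <stdint.h>

#define MODEL_S32_MAX INT32_MAX  /* stand-in for the kernel's S32_MAX */

/* Accept only values in [0, S32_MAX]; anything larger is out of range. */
static int model_set_defer_hard_irqs(uint32_t *field, unsigned long long val)
{
	if (val > MODEL_S32_MAX)
		return -ERANGE;

	*field = (uint32_t)val;
	return 0;
}

/* Same idea as the tx_queue_len check: a value that does not survive
 * the round trip through unsigned int would have been truncated.
 */
static int model_check_tx_queue_len(unsigned long long new_len)
{
	if (new_len != (unsigned int)new_len)
		return -ERANGE;

	return 0;
}
```
]]>

In this model, 2147483649 is rejected with -ERANGE instead of being
stored and read back as a negative number, matching the "after this
patch" behaviour shown above.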

Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Joe Damato &lt;jdamato@fastly.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20240904153431.307932-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>netdev_features: convert NETIF_F_FCOE_MTU to dev-&gt;fcoe_mtu</title>
<updated>2024-09-03T09:36:43Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-08-29T12:33:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=782dbbf589cd9082effaec522e3f1b4ce1594803'/>
<id>urn:sha1:782dbbf589cd9082effaec522e3f1b4ce1594803</id>
<content type='text'>
The ability to handle maximum FCoE frames of 2158 bytes can never be changed
and is thus more of an attribute than a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.

Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
<entry>
<title>netdev_features: convert NETIF_F_NETNS_LOCAL to dev-&gt;netns_local</title>
<updated>2024-09-03T09:36:43Z</updated>
<author>
<name>Alexander Lobakin</name>
<email>aleksander.lobakin@intel.com</email>
</author>
<published>2024-08-29T12:33:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=05c1280a2bcfca187fe7fa90bb240602cf54af0a'/>
<id>urn:sha1:05c1280a2bcfca187fe7fa90bb240602cf54af0a</id>
<content type='text'>
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.

Signed-off-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
</entry>
</feed>
