<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/netdevice.h, branch v6.1.16</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-12-31T12:33:02Z</updated>
<entry>
<title>net: add atomic_long_t to net_device_stats fields</title>
<updated>2022-12-31T12:33:02Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-11-15T08:53:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9921d1b68c46ae1926e634d21dc3d2103f4310fd'/>
<id>urn:sha1:9921d1b68c46ae1926e634d21dc3d2103f4310fd</id>
<content type='text'>
[ Upstream commit 6c1c5097781f563b70a81683ea6fdac21637573b ]

Long-standing KCSAN issues are caused by data races around
some dev-&gt;stats changes.

Most performance critical paths already use per-cpu
variables, or per-queue ones.

It is reasonable (and more correct) to use atomic operations
for the slow paths.

This patch adds a union for each field of net_device_stats,
so that we can convert paths that are not yet protected
by a spinlock or a mutex.

netdev_stats_to_stats64() no longer needs an #if BITS_PER_LONG==64
special case.

Note that the memcpy() we were using on 64-bit arches
had no provision to avoid load tearing,
while atomic_long_read() provides the needed protection
at no cost.
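
For illustration, the change has roughly this shape (a sketch, not
the exact upstream hunk):

	#define NET_DEV_STAT(FIELD)			\
		union {					\
			unsigned long FIELD;		\
			atomic_long_t __##FIELD;	\
		}

	struct net_device_stats {
		NET_DEV_STAT(rx_packets);
		/* ... one such union per field ... */
	};

	/* slow paths can then update without tearing: */
	#define DEV_STATS_INC(DEV, FIELD) \
		atomic_long_inc(&amp;(DEV)-&gt;stats.__##FIELD)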

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}"</title>
<updated>2022-10-15T10:17:12Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-10-14T16:07:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fc8695eb11f07d936a4a9dbd15d7797986bc8b89'/>
<id>urn:sha1:fc8695eb11f07d936a4a9dbd15d7797986bc8b89</id>
<content type='text'>
This reverts commit 854701ba4c39afae2362ba19a580c461cb183e4f.

We have more violations around, which lead to:

  WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770

Let's back this out and retry with a larger cleanup in -next.

Fixes: 854701ba4c39 ("net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}")
Link: https://lore.kernel.org/all/20221014030459.3272206-2-guoren@kernel.org/
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux</title>
<updated>2022-10-10T19:49:34Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-10T19:49:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d4013bc4d49f6da8178a340348369bb9920225c9'/>
<id>urn:sha1:d4013bc4d49f6da8178a340348369bb9920225c9</id>
<content type='text'>
Pull bitmap updates from Yury Norov:

 - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld)

 - clean up the nr_cpu_ids vs nr_cpumask_bits mess (me)

   This series cleans that mess up and adds a new config option,
   FORCE_NR_CPUS, that allows optimizing the cpumask subsystem when
   the number of CPUs is known at compile time.

 - optimize find_bit() functions (me)

   Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT()
   macros.

 - add find_nth_bit() (me)

   Adds find_nth_bit(), which is ~70 times faster than bitcounting
   with a for_each() loop:

	for_each_set_bit(bit, mask, size)
		if (n-- == 0)
			return bit;
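
   With the new helper, that loop collapses to a single call:

	bit = find_nth_bit(mask, size, n);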

   Also adds bitmap_weight_and() to let people replace this pattern:

	tmp = bitmap_alloc(nbits);
	bitmap_and(tmp, map1, map2, nbits);
	weight = bitmap_weight(tmp, nbits);
	bitmap_free(tmp);

   with a single bitmap_weight_and() call.
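
   That is:

	weight = bitmap_weight_and(map1, map2, nbits);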

 - repair cpumask_check() (me)

   After switching cpumask to use nr_cpu_ids, cpumask_check() started
   generating many false-positive warnings. This series fixes it.

 - Add for_each_cpu_andnot() (Valentin Schneider)

   Extends the API with one more function and applies it in sched/core.

* tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits)
  sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot()
  lib/test_cpumask: Add for_each_cpu_and(not) tests
  cpumask: Introduce for_each_cpu_andnot()
  lib/find_bit: Introduce find_next_andnot_bit()
  cpumask: fix checking valid cpu range
  lib/bitmap: add tests for for_each() loops
  lib/find: optimize for_each() macros
  lib/bitmap: introduce for_each_set_bit_wrap() macro
  lib/find_bit: add find_next{,_and}_bit_wrap
  cpumask: switch for_each_cpu{,_not} to use for_each_bit()
  net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}
  cpumask: add cpumask_nth_{,and,andnot}
  lib/bitmap: remove bitmap_ord_to_pos
  lib/bitmap: add tests for find_nth_bit()
  lib: add find_nth{,_and,_andnot}_bit()
  lib/bitmap: add bitmap_weight_and()
  lib/bitmap: don't call __bitmap_weight() in kernel code
  tools: sync find_bit() implementation
  lib/find_bit: optimize find_next_bit() functions
  lib/find_bit: create find_first_zero_bit_le()
  ...
</content>
</entry>
<entry>
<title>net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}</title>
<updated>2022-10-01T17:22:57Z</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2022-09-19T21:05:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=854701ba4c39afae2362ba19a580c461cb183e4f'/>
<id>urn:sha1:854701ba4c39afae2362ba19a580c461cb183e4f</id>
<content type='text'>
The functions must be passed a cpu index prior to the first one to
be searched, so the valid input range is [-1, nr_cpu_ids-1).
However, the code checks against [-1, nr_cpu_ids).
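
A sketch of the intended check (illustrative, not necessarily the
exact hunk): since the search starts at n + 1, that is the value the
range check must be applied to:

	/* -1 is a legal arg here. */
	if (n != -1)
		cpu_max_bits_warn(n + 1, nr_bits);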

Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
</content>
</entry>
<entry>
<title>net/sched: query offload capabilities through ndo_setup_tc()</title>
<updated>2022-09-30T01:52:01Z</updated>
<author>
<name>Vladimir Oltean</name>
<email>vladimir.oltean@nxp.com</email>
</author>
<published>2022-09-28T09:51:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aac4daa8941ea6566563ac001e9e5d4e54a674e2'/>
<id>urn:sha1:aac4daa8941ea6566563ac001e9e5d4e54a674e2</id>
<content type='text'>
When adding optional new features to Qdisc offloads, existing drivers
must reject the new configuration until they are coded up to act on it.

Since modifying all drivers in lockstep with the changes in the Qdisc
can create problems of its own, it would be nice if there existed an
automatic opt-in mechanism for offloading optional features.

Jakub proposes that we multiplex one more kind of call through
ndo_setup_tc(): one where the driver populates a Qdisc-specific
capability structure.

The first user will be taprio, in further changes. Here we introduce
the definitions for the base functionality.
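
A sketch of the shape this takes (the structure layout and the driver
helper here are illustrative, not the exact upstream definitions):

	struct tc_query_caps_base {
		enum tc_setup_type type;	/* which Qdisc is asking */
		void *caps;			/* Qdisc-specific caps struct */
	};

	/* in a driver's ndo_setup_tc(); my_driver_query_caps() is made up: */
	case TC_QUERY_CAPS:
		return my_driver_query_caps(dev, type_data);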

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/
Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Co-developed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Vladimir Oltean &lt;vladimir.oltean@nxp.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: skb: introduce and use a single page frag cache</title>
<updated>2022-09-30T01:48:15Z</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2022-09-28T08:43:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dbae2b062824fc2d35ae2d5df2f500626c758e80'/>
<id>urn:sha1:dbae2b062824fc2d35ae2d5df2f500626c758e80</id>
<content type='text'>
After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation
for tiny skbs") we are observing 10-20% regressions in performance
tests with small packets. The perf trace points to high pressure on
the slab allocator.

This change tries to improve the allocation scheme for small packets
using an idea originally suggested by Eric: a new per-CPU page frag is
introduced and used in __napi_alloc_skb to cope with small allocation
requests.

To ensure that the above does not lead to excessive truesize
underestimation, the frag size for small allocations is inflated to 1K,
and all the above is restricted to builds with a 4K page size.
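
Conceptually (a sketch with made-up names, not the exact upstream
code):

	static DEFINE_PER_CPU(struct page_frag_cache, napi_small_cache);

	/* in __napi_alloc_skb(), for sufficiently small requests: */
	struct page_frag_cache *nc = this_cpu_ptr(&amp;napi_small_cache);
	void *data = page_frag_alloc(nc, SZ_1K, gfp_mask);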

Note that we need to update the run-time check introduced by
commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper")
accordingly.

Alex suggested a smart page refcount scheme to reduce the number
of atomic operations and deal properly with pfmemalloc pages.

Under a small-packet UDP flood, I measure a 15% peak tput increase.

Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Suggested-by: Alexander H Duyck &lt;alexanderduyck@fb.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Alexander Duyck &lt;alexanderduyck@fb.com&gt;
Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: drop the weight argument from netif_napi_add</title>
<updated>2022-09-29T01:57:14Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-09-27T13:27:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b48b89f9c189d24eb5e2b4a0ac067da5a24ee86d'/>
<id>urn:sha1:b48b89f9c189d24eb5e2b4a0ac067da5a24ee86d</id>
<content type='text'>
We tell driver developers to always pass NAPI_POLL_WEIGHT
as the weight to netif_napi_add(). This may be confusing to
newcomers, so drop the weight argument; those who really need
to tweak the weight can use netif_napi_add_weight().
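
I.e., drivers now write:

	netif_napi_add(dev, &amp;priv-&gt;napi, my_poll);

instead of:

	netif_napi_add(dev, &amp;priv-&gt;napi, my_poll, NAPI_POLL_WEIGHT);

(priv and my_poll are placeholder driver names.)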

Acked-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt; # for CAN
Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: introduce iterators over synced hw addresses</title>
<updated>2022-09-20T08:32:35Z</updated>
<author>
<name>Vladimir Oltean</name>
<email>vladimir.oltean@nxp.com</email>
</author>
<published>2022-09-11T01:06:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=db01868bf2e919a10551066d54cf4bef5dd5a01e'/>
<id>urn:sha1:db01868bf2e919a10551066d54cf4bef5dd5a01e</id>
<content type='text'>
Some network drivers use __dev_mc_sync()/__dev_uc_sync() and therefore
program the hardware only with addresses that have a non-zero sync_cnt.

Some of the above drivers also need to save/restore the address
filtering lists when certain events happen, and they need to walk
through the struct net_device :: uc and struct net_device :: mc lists.
But these lists contain unsynced addresses too.

To preserve an elementary form of data encapsulation, provide
iterators over these lists that only look at entries with a non-zero
sync_cnt, instead of filtering entries out in each device driver.
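
A sketch of a driver walking only the synced entries (the iterator
and helper names here are assumptions for illustration):

	struct netdev_hw_addr *ha;

	netdev_for_each_synced_uc_addr(ha, dev)
		my_driver_save_addr(priv, ha-&gt;addr);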

Signed-off-by: Vladimir Oltean &lt;vladimir.oltean@nxp.com&gt;
Reviewed-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>net: remove netif_tx_napi_add()</title>
<updated>2022-09-02T11:41:43Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-09-01T00:00:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c3f760ef128789252e7c4f10d3c1721422dceba9'/>
<id>urn:sha1:c3f760ef128789252e7c4f10d3c1721422dceba9</id>
<content type='text'>
All callers are now gone.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: bql: add more documentation</title>
<updated>2022-09-02T11:27:46Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2022-08-31T18:44:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=977f1aa5e4d1942bc8012bf3f0e695008979ff4e'/>
<id>urn:sha1:977f1aa5e4d1942bc8012bf3f0e695008979ff4e</id>
<content type='text'>
Add some documentation for netdev_tx_sent_queue() and
netdev_tx_completed_queue().

Stating that netdev_tx_completed_queue() must be called once
per TX completion round is apparently not obvious to everybody.
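
A sketch of the expected pairing (queue and counter names are
illustrative):

	/* TX path, after queuing an skb to the hardware ring: */
	netdev_tx_sent_queue(txq, skb-&gt;len);

	/* TX completion handler, once per completion round: */
	netdev_tx_completed_queue(txq, pkts_completed, bytes_completed);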

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
