<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/rtnetlink.h, branch v6.1.152</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.152</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.152'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-04-25T08:43:25Z</updated>
<entry>
<title>rtnl: add helper to check if a notification is needed</title>
<updated>2025-04-25T08:43:25Z</updated>
<author>
<name>Victor Nogueira</name>
<email>victor@mojatatu.com</email>
</author>
<published>2023-12-08T19:28:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=db5d2b27e4efa97989a9be4a3fb141c9dad79ebf'/>
<id>urn:sha1:db5d2b27e4efa97989a9be4a3fb141c9dad79ebf</id>
<content type='text'>
[ Upstream commit 8439109b76a3c405808383bf9dd532fc4b9c2dbd ]

Building on the rtnl_has_listeners helper, add the rtnl_notify_needed
helper to check if we can bail out early in the notification routines.

Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Signed-off-by: Pedro Tammela &lt;pctammela@mojatatu.com&gt;
Link: https://lore.kernel.org/r/20231208192847.714940-3-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Stable-dep-of: 369609fc6272 ("tc: Ensure we have enough buffer space when sending filter netlink notifications")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>rtnl: add helper to check if rtnl group has listeners</title>
<updated>2025-04-25T08:43:25Z</updated>
<author>
<name>Jamal Hadi Salim</name>
<email>jhs@mojatatu.com</email>
</author>
<published>2023-12-08T19:28:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ab0c35a9569296249b5fd04b90759c18de41ee0'/>
<id>urn:sha1:8ab0c35a9569296249b5fd04b90759c18de41ee0</id>
<content type='text'>
[ Upstream commit c5e2a973448d958feb7881e4d875eac59fdeff3d ]

As of today, rtnl code creates a new skb and unconditionally fills and
broadcasts it to the relevant group. For most operations this is okay
and doesn't waste resources in general.

When operations are done without the rtnl_lock, as in tc-flower, such
skb allocation, message fill and no-op broadcasting can happen in all
cores of the system, which contributes to system pressure and wastes
precious cpu cycles when no one will receive the built message.

Introduce this helper so rtnetlink operations can simply check if someone
is listening and then proceed if necessary.

Reviewed-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: Victor Nogueira &lt;victor@mojatatu.com&gt;
Signed-off-by: Pedro Tammela &lt;pctammela@mojatatu.com&gt;
Link: https://lore.kernel.org/r/20231208192847.714940-2-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Stable-dep-of: 369609fc6272 ("tc: Ensure we have enough buffer space when sending filter netlink notifications")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: sched: use queue_mapping to pick tx queue</title>
<updated>2022-04-19T10:20:45Z</updated>
<author>
<name>Tonghao Zhang</name>
<email>xiangxia.m.yue@gmail.com</email>
</author>
<published>2022-04-15T16:40:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2f1e85b1aee459b7d0fd981839042c6a38ffaf0c'/>
<id>urn:sha1:2f1e85b1aee459b7d0fd981839042c6a38ffaf0c</id>
<content type='text'>
This patch fixes issue:
* If we install tc filters with act_skbedit in clsact hook.
  It doesn't work, because netdev_core_pick_tx() overwrites
  queue_mapping.

  $ tc filter ... action skbedit queue_mapping 1

And this patch is useful:
* We can use FQ + EDT to implement efficient policies. Tx queues
  are picked by xps, ndo_select_queue of netdev driver, or skb hash
  in netdev_core_pick_tx(). In fact, the netdev driver, and skb
  hash are _not_ under control. xps uses the CPUs map to select Tx
  queues, but we can't figure out which task_struct of pod/containter
  running on this cpu in most case. We can use clsact filters to classify
  one pod/container traffic to one Tx queue. Why ?

  In containter networking environment, there are two kinds of pod/
  containter/net-namespace. One kind (e.g. P1, P2), the high throughput
  is key in these applications. But avoid running out of network resource,
  the outbound traffic of these pods is limited, using or sharing one
  dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods
  (e.g. Pn), the low latency of data access is key. And the traffic is not
  limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc.
  This choice provides two benefits. First, contention on the HTB/FQ Qdisc
  lock is significantly reduced since fewer CPUs contend for the same queue.
  More importantly, Qdisc contention can be eliminated completely if each
  CPU has its own FIFO Qdisc for the second kind of pods.

  There must be a mechanism in place to support classifying traffic based on
  pods/container to different Tx queues. Note that clsact is outside of Qdisc
  while Qdisc can run a classifier to select a sub-queue under the lock.

  In general recording the decision in the skb seems a little heavy handed.
  This patch introduces a per-CPU variable, suggested by Eric.

  The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit().
  - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag
    is set in qdisc-&gt;enqueue() though tx queue has been selected in
    netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared
    firstly in __dev_queue_xmit(), is useful:
  - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev
    in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy:
    For example, eth0, macvlan in pod, which root Qdisc install skbedit
    queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of
    eth0.3, clear the flag, does not select tx queue according to skb-&gt;queue_mapping
    because there is no filters in clsact or tx Qdisc of this netdev.
    Same action taked in eth0, ixgbe in Host.
  - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue
    in tx Qdisc (qdisc-&gt;enqueue()), the proper way to clear it is clearing it
    in __dev_queue_xmit when processing next packets.

  For performance reasons, use the static key. If user does not config the NET_EGRESS,
  the patch will not be compiled.

  +----+      +----+      +----+
  | P1 |      | P2 |      | Pn |
  +----+      +----+      +----+
    |           |           |
    +-----------+-----------+
                |
                | clsact/skbedit
                |      MQ
                v
    +-----------+-----------+
    | q0        | q1        | qn
    v           v           v
  HTB/FQ      HTB/FQ  ...  FIFO

Cc: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Cc: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Jiri Pirko &lt;jiri@resnulli.us&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Cc: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Alexander Lobakin &lt;alobakin@pm.me&gt;
Cc: Paolo Abeni &lt;pabeni@redhat.com&gt;
Cc: Talal Ahmad &lt;talalahmad@google.com&gt;
Cc: Kevin Hao &lt;haokexin@gmail.com&gt;
Cc: Ilias Apalodimas &lt;ilias.apalodimas@linaro.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Cc: Antoine Tenart &lt;atenart@kernel.org&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
</entry>
<entry>
<title>net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS</title>
<updated>2022-03-03T10:37:23Z</updated>
<author>
<name>Petr Machata</name>
<email>petrm@nvidia.com</email>
</author>
<published>2022-03-02T16:31:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5fd0b838efac16046509f7fb100455d0463b9687'/>
<id>urn:sha1:5fd0b838efac16046509f7fb100455d0463b9687</id>
<content type='text'>
The offloaded HW stats are designed to allow per-netdevice enablement and
disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS,
which should be carried by the RTM_SETSTATS message, and expresses a desire
to toggle L3 offload xstats on or off.

As part of the above, add an exported function rtnl_offload_xstats_notify()
that drivers can use when they have installed or deinstalled the counters
backing the HW stats.

At this point, it is possible to enable, disable and query L3 offload
xstats on netdevices. (However there is no driver actually implementing
these.)

Signed-off-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sched: extend Qdisc with rcu</title>
<updated>2018-09-26T03:17:35Z</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-09-24T16:22:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a7d0d07a386716b459b00783b11a8211cefcc0f'/>
<id>urn:sha1:3a7d0d07a386716b459b00783b11a8211cefcc0f</id>
<content type='text'>
Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: core: netlink: add helper refcount dec and lock function</title>
<updated>2018-09-26T03:17:35Z</updated>
<author>
<name>Vlad Buslov</name>
<email>vladbu@mellanox.com</email>
</author>
<published>2018-09-24T16:22:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f99528e9797794b91b43321fbbc93fe772b0803'/>
<id>urn:sha1:6f99528e9797794b91b43321fbbc93fe772b0803</id>
<content type='text'>
Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use it with refcounter helper function that atomically
releases decrements reference and obtains mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov &lt;vladbu@mellanox.com&gt;
Acked-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Introduce net_rwsem to protect net_namespace_list</title>
<updated>2018-03-29T17:47:53Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-03-29T16:20:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f0b07bb151b098d291fd1fd71ef7a2df56fb124a'/>
<id>urn:sha1:f0b07bb151b098d291fd1fd71ef7a2df56fb124a</id>
<content type='text'>
rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.

This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.

Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Rename net_sem to pernet_ops_rwsem</title>
<updated>2018-03-27T17:18:09Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-03-27T15:02:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4420bf21fb6c0306e36ad58ade1e741fba57ce65'/>
<id>urn:sha1:4420bf21fb6c0306e36ad58ade1e741fba57ce65</id>
<content type='text'>
net_sem is some undefined area name, so it will be better
to make the area more defined.

Rename it to pernet_ops_rwsem for better readability and
better intelligibility.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Add rtnl_lock_killable()</title>
<updated>2018-03-16T16:31:19Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-03-14T19:17:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=79ffdfc6522ae33d8a33e971070c08ee5f27439b'/>
<id>urn:sha1:79ffdfc6522ae33d8a33e971070c08ee5f27439b</id>
<content type='text'>
rtnl_lock() is widely used mutex in kernel. Some of kernel code
does memory allocations under it. In case of memory deficit this
may invoke OOM killer, but the problem is a killed task can't
exit if it's waiting for the mutex. This may be a reason of deadlock
and panic.

This patch adds a new primitive, which responds on SIGKILL, and
it allows to use it in the places, where we don't want to sleep
forever.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: Kill net_mutex</title>
<updated>2018-02-20T18:23:13Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-02-19T09:58:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=19efbd93e6fb05eab81856b4fc8d64211dd37088'/>
<id>urn:sha1:19efbd93e6fb05eab81856b4fc8d64211dd37088</id>
<content type='text'>
We take net_mutex, when there are !async pernet_operations
registered, and read locking of net_sem is not enough. But
we may get rid of taking the mutex, and just change the logic
to write lock net_sem in such cases. This obviously reduces
the number of lock operations, we do.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
