<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/bpf/devmap.c, branch v5.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-01-27T10:16:25Z</updated>
<entry>
<title>bpf, xdp: Remove no longer required rcu_read_{un}lock()</title>
<updated>2020-01-27T10:16:25Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-01-27T00:14:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b23bfa5633b19bf1db87b36a76b2225c734f794c'/>
<id>urn:sha1:b23bfa5633b19bf1db87b36a76b2225c734f794c</id>
<content type='text'>
Now that we depend on rcu_call() and synchronize_rcu() to also wait
for preempt_disabled region to complete the rcu read critical section
in __dev_map_flush() is no longer required. Except in a few special
cases in drivers that need it for other reasons.

These originally ensured the map reference was safe while a map was
also being free'd. And additionally that bpf program updates via
ndo_bpf did not happen while flush updates were in flight. But flush
by new rules can only be called from preempt-disabled NAPI context.
The synchronize_rcu from the map free path and the rcu_call from the
delete path will ensure the reference there is safe. So lets remove
the rcu_read_lock and rcu_read_unlock pair to avoid any confusion
around how this is being protected.

If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.

Now that we have done above we put the rcu_read_lock in the driver
code where it is needed in a driver dependent way. I think this
helps readability of the code so we know where and why we are
taking read locks. Most drivers will not need rcu_read_locks here
and further XDP drivers already have rcu_read_locks in their code
paths for reading xdp programs on RX side so this makes it symmetric
where we don't have half of rcu critical sections define in driver
and the other half in devmap.

Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>bpf, xdp: Update devmap comments to reflect napi/rcu usage</title>
<updated>2020-01-27T10:16:20Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-01-27T00:14:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42a84a8cd0ff0cbff5a4595e1304c4567a30267d'/>
<id>urn:sha1:42a84a8cd0ff0cbff5a4595e1304c4567a30267d</id>
<content type='text'>
Now that we rely on synchronize_rcu and call_rcu waiting to
exit perempt-disable regions (NAPI) lets update the comments
to reflect this.

Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/1580084042-11598-2-git-send-email-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>bpf, devmap: Pass lockdep expression to RCU lists</title>
<updated>2020-01-23T22:01:16Z</updated>
<author>
<name>Amol Grover</name>
<email>frextrite@gmail.com</email>
</author>
<published>2020-01-23T12:04:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=485ec2ea9cf556e9c120e07961b7b459d776a115'/>
<id>urn:sha1:485ec2ea9cf556e9c120e07961b7b459d776a115</id>
<content type='text'>
head is traversed using hlist_for_each_entry_rcu outside an RCU
read-side critical section but under the protection of dtab-&gt;index_lock.

Hence, add corresponding lockdep expression to silence false-positive
lockdep warnings, and harden RCU lists.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Signed-off-by: Amol Grover &lt;frextrite@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/20200123120437.26506-1-frextrite@gmail.com
</content>
</entry>
<entry>
<title>devmap: Adjust tracepoint for map-less queue flush</title>
<updated>2020-01-17T04:03:34Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2020-01-16T15:14:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=58aa94f922c1b44e0919d1814d2eab5b9e8bf945'/>
<id>urn:sha1:58aa94f922c1b44e0919d1814d2eab5b9e8bf945</id>
<content type='text'>
Now that we don't have a reference to a devmap when flushing the device
bulk queue, let's change the the devmap_xmit tracepoint to remote the
map_id and map_index fields entirely. Rearrange the fields so 'drops' and
'sent' stay in the same position in the tracepoint struct, to make it
possible for the xdp_monitor utility to read both the old and the new
format.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/157918768613.1458396.9165902403373826572.stgit@toke.dk
</content>
</entry>
<entry>
<title>xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths</title>
<updated>2020-01-17T04:03:34Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2020-01-16T15:14:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d233886dd904edbf239eeffe435c3308ae97625'/>
<id>urn:sha1:1d233886dd904edbf239eeffe435c3308ae97625</id>
<content type='text'>
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exit (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a
separate function, so we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present. This
means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush()
functions to drop _map from their names. One of the renamed functions is
the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
keep from having to update all drivers, use a #define to keep the old name
working, and only update the virtual drivers in this patch.

Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
</content>
</entry>
<entry>
<title>xdp: Move devmap bulk queue into struct net_device</title>
<updated>2020-01-17T04:03:34Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2020-01-16T15:14:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=75ccae62cb8d42a619323a85c577107b8b37d797'/>
<id>urn:sha1:75ccae62cb8d42a619323a85c577107b8b37d797</id>
<content type='text'>
Commit 96360004b862 ("xdp: Make devmap flush_list common for all map
instances"), changed devmap flushing to be a global operation instead of a
per-map operation. However, the queue structure used for bulking was still
allocated as part of the containing map.

This patch moves the devmap bulk queue into struct net_device. The
motivation for this is reusing it for the non-map variant of XDP_REDIRECT,
which will be changed in a subsequent commit.  To avoid other fields of
struct net_device moving to different cache lines, we also move a couple of
other members around.

We defer the actual allocation of the bulk queue structure until the
NETDEV_REGISTER notification devmap.c. This makes it possible to check for
ndo_xdp_xmit support before allocating the structure, which is not possible
at the time struct net_device is allocated. However, we keep the freeing in
free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER.

Because of this change, we lose the reference back to the map that
originated the redirect, so change the tracepoint to always return 0 as the
map ID and index. Otherwise no functional change is intended with this
patch.

After this patch, the relevant part of struct net_device looks like this,
according to pahole:

	/* --- cacheline 14 boundary (896 bytes) --- */
	struct netdev_queue *      _tx __attribute__((__aligned__(64))); /*   896     8 */
	unsigned int               num_tx_queues;        /*   904     4 */
	unsigned int               real_num_tx_queues;   /*   908     4 */
	struct Qdisc *             qdisc;                /*   912     8 */
	unsigned int               tx_queue_len;         /*   920     4 */
	spinlock_t                 tx_global_lock;       /*   924     4 */
	struct xdp_dev_bulk_queue * xdp_bulkq;           /*   928     8 */
	struct xps_dev_maps *      xps_cpus_map;         /*   936     8 */
	struct xps_dev_maps *      xps_rxqs_map;         /*   944     8 */
	struct mini_Qdisc *        miniq_egress;         /*   952     8 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct hlist_head  qdisc_hash[16];               /*   960   128 */
	/* --- cacheline 17 boundary (1088 bytes) --- */
	struct timer_list  watchdog_timer;               /*  1088    40 */

	/* XXX last struct has 4 bytes of padding */

	int                        watchdog_timeo;       /*  1128     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head   todo_list;                    /*  1136    16 */
	/* --- cacheline 18 boundary (1152 bytes) --- */

Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/157918768397.1458396.12673224324627072349.stgit@toke.dk
</content>
</entry>
<entry>
<title>xdp: Make devmap flush_list common for all map instances</title>
<updated>2019-12-20T05:09:43Z</updated>
<author>
<name>Björn Töpel</name>
<email>bjorn.topel@intel.com</email>
</author>
<published>2019-12-19T06:10:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96360004b8628541f5d05a845ea213267db0b1a2'/>
<id>urn:sha1:96360004b8628541f5d05a845ea213267db0b1a2</id>
<content type='text'>
The devmap flush list is used to track entries that need to flushed
from via the xdp_do_flush_map() function. This list used to be
per-map, but there is really no reason for that. Instead make the
flush list global for all devmaps, which simplifies __dev_map_flush()
and dev_map_init_map().

Signed-off-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/20191219061006.21980-6-bjorn.topel@gmail.com
</content>
</entry>
<entry>
<title>xdp: Simplify devmap cleanup</title>
<updated>2019-12-20T05:09:43Z</updated>
<author>
<name>Björn Töpel</name>
<email>bjorn.topel@intel.com</email>
</author>
<published>2019-12-19T06:09:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0536b85239b8440735cdd910aae0eb076ebbb439'/>
<id>urn:sha1:0536b85239b8440735cdd910aae0eb076ebbb439</id>
<content type='text'>
After the RCU flavor consolidation [1], call_rcu() and
synchronize_rcu() waits for preempt-disable regions (NAPI) in addition
to the read-side critical sections. As a result of this, the cleanup
code in devmap can be simplified

* There is no longer a need to flush in __dev_map_entry_free, since we
  know that this has been done when the call_rcu() callback is
  triggered.

* When freeing the map, there is no need to explicitly wait for a
  flush. It's guaranteed to be done after the synchronize_rcu() call
  in dev_map_free(). The rcu_barrier() is still needed, so that the
  map is not freed prior the elements.

[1] https://lwn.net/Articles/777036/

Signed-off-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Link: https://lore.kernel.org/bpf/20191219061006.21980-2-bjorn.topel@gmail.com
</content>
</entry>
<entry>
<title>xdp: Fix cleanup on map free for devmap_hash map type</title>
<updated>2019-11-25T00:58:46Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2019-11-21T13:36:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=071cdecec57fb5d5df78e6a12114ad7bccea5b0e'/>
<id>urn:sha1:071cdecec57fb5d5df78e6a12114ad7bccea5b0e</id>
<content type='text'>
Tetsuo pointed out that it was not only the device unregister hook that was
broken for devmap_hash types, it was also cleanup on map free. So better
fix this as well.

While we're at it, there's no reason to allocate the netdev_map array for
DEVMAP_HASH, so skip that and adjust the cost accordingly.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20191121133612.430414-1-toke@redhat.com
</content>
</entry>
<entry>
<title>xdp: Handle device unregister for devmap_hash map type</title>
<updated>2019-10-21T22:51:41Z</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2019-10-19T11:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ce197d83a9fc42795c248c90983bf05faf0f013b'/>
<id>urn:sha1:ce197d83a9fc42795c248c90983bf05faf0f013b</id>
<content type='text'>
It seems I forgot to add handling of devmap_hash type maps to the device
unregister hook for devmaps. This omission causes devices to not be
properly released, which causes hangs.

Fix this by adding the missing handler.

Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index")
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20191019111931.2981954-1-toke@redhat.com
</content>
</entry>
</feed>
