<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/skmsg.h, branch v5.5.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.5.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.5.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-01-15T22:26:13Z</updated>
<entry>
<title>bpf: Sockmap/tls, push write_space updates through ulp updates</title>
<updated>2020-01-15T22:26:13Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-01-11T06:12:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=33bfe20dd7117dd81fd896a53f743a233e1ad64f'/>
<id>urn:sha1:33bfe20dd7117dd81fd896a53f743a233e1ad64f</id>
<content type='text'>
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state
and call tcp_update_ulp() to push updates to TLS ULP on top. However, we
don't push the write_space callback up and instead simply overwrite the
op with the psock stored previous op. This may or may not be correct so
to ensure we don't overwrite the TLS write space hook pass this field to
the ULP and have it fixup the ctx.

This completes a previous fix that pushed the ops through to the ULP
but at the time missed doing this for write_space, presumably because
write_space TLS hook was added around the same time.

Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Jonathan Lemon &lt;jonathan.lemon@gmail.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop</title>
<updated>2020-01-15T22:26:13Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-01-11T06:11:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4da6a196f93b1af7612340e8c1ad8ce71e18f955'/>
<id>urn:sha1:4da6a196f93b1af7612340e8c1ad8ce71e18f955</id>
<content type='text'>
When a sockmap is free'd and a socket in the map is enabled with tls
we tear down the bpf context on the socket, the psock struct and state,
and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform
the tls stack it needs to update its saved sock ops so that when the tls
socket is later destroyed it doesn't try to call the now destroyed psock
hooks.

This is about keeping stacked ULPs in good shape so they always have
the right set of stacked ops.

However, recently unhash() hook was removed from TLS side. But, the
sockmap/bpf side is not doing any extra work to update the unhash op
when is torn down instead expecting TLS side to manage it. So both
TLS and sockmap believe the other side is managing the op and instead
no one updates the hook so it continues to point at tcp_bpf_unhash().
When unhash hook is called we call tcp_bpf_unhash() which detects the
psock has already been destroyed and calls sk-&gt;sk_prot_unhash() which
calls tcp_bpf_unhash() yet again and so on looping and hanging the core.

To fix have sockmap tear down logic fixup the stale pointer.

Fixes: 5d92e631b8be ("net/tls: partially revert fix transition through disconnect with close")
Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
</content>
</entry>
<entry>
<title>net: skmsg: fix TLS 1.3 crash with full sk_msg</title>
<updated>2019-11-29T06:40:29Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2019-11-27T20:16:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=031097d9e079e40dce401031d1012e83d80eaf01'/>
<id>urn:sha1:031097d9e079e40dce401031d1012e83d80eaf01</id>
<content type='text'>
TLS 1.3 started using the entry at the end of the SG array
for chaining-in the single byte content type entry. This mostly
works:

[ E E E E E E . . ]
  ^           ^
   start       end

                 E &lt; content type
               /
[ E E E E E E C . ]
  ^           ^
   start       end

(Where E denotes a populated SG entry; C denotes a chaining entry.)

If the array is full, however, the end will point to the start:

[ E E E E E E E E ]
  ^
   start
   end

And we end up overwriting the start:

    E &lt; content type
   /
[ C E E E E E E E ]
  ^
   start
   end

The sg array is supposed to be a circular buffer with start and
end markers pointing anywhere. In case where start &gt; end
(i.e. the circular buffer has "wrapped") there is an extra entry
reserved at the end to chain the two halves together.

[ E E E E E E . . l ]

(Where l is the reserved entry for "looping" back to front.

As suggested by John, let's reserve another entry for chaining
SG entries after the main circular buffer. Note that this entry
has to be pointed to by the end entry so its position is not fixed.

Examples of full messages:

[ E E E E E E E E . l ]
  ^               ^
   start           end

   &lt;---------------.
[ E E . E E E E E E l ]
      ^ ^
   end   start

Now the end will always point to an unused entry, so TLS 1.3
can always use it.

Fixes: 130b392c6cd6 ("net: tls: Add tls 1.3 support")
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: Simon Horman &lt;simon.horman@netronome.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2019-11-09T19:04:37Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2019-11-09T19:04:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14684b93019a2d2ece0df5acaf921924541b928d'/>
<id>urn:sha1:14684b93019a2d2ece0df5acaf921924541b928d</id>
<content type='text'>
One conflict in the BPF samples Makefile, some fixes in 'net' whilst
we were converting over to Makefile.target rules in 'net-next'.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net/tls: fix sk_msg trim on fallback to copy mode</title>
<updated>2019-11-06T02:07:47Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2019-11-04T23:36:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=683916f6a84023407761d843048f1aea486b2612'/>
<id>urn:sha1:683916f6a84023407761d843048f1aea486b2612</id>
<content type='text'>
sk_msg_trim() tries to only update curr pointer if it falls into
the trimmed region. The logic, however, does not take into the
account pointer wrapping that sk_msg_iter_var_prev() does nor
(as John points out) the fact that msg-&gt;sg is a ring buffer.

This means that when the message was trimmed completely, the new
curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
neither smaller than any other value, nor would it actually be
correct.

Special case the trimming to 0 length a little bit and rework
the comparison between curr and end to take into account wrapping.

This bug caused the TLS code to not copy all of the message, if
zero copy filled in fewer sg entries than memcopy would need.

Big thanks to Alexander Potapenko for the non-KMSAN reproducer.

v2:
 - take into account that msg-&gt;sg is a ring buffer (John).

Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1)

Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com
Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com
Co-developed-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: sockmap: use bitmap for copy info</title>
<updated>2019-10-07T13:58:27Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2019-10-07T04:09:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=163ab96b52ae2bb2d8f188cd29f0b570610f9007'/>
<id>urn:sha1:163ab96b52ae2bb2d8f188cd29f0b570610f9007</id>
<content type='text'>
Don't use bool array in struct sk_msg_sg, save 12 bytes.

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: Dirk van der Merwe &lt;dirk.vandermerwe@netronome.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: sockmap/tls, close can race with map free</title>
<updated>2019-07-22T14:04:17Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2019-07-19T17:29:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95fa145479fbc0a0c1fd3274ceb42ec03c042a4a'/>
<id>urn:sha1:95fa145479fbc0a0c1fd3274ceb42ec03c042a4a</id>
<content type='text'>
When a map free is called and in parallel a socket is closed we
have two paths that can potentially reset the socket prot ops, the
bpf close() path and the map free path. This creates a problem
with which prot ops should be used from the socket closed side.

If the map_free side completes first then we want to call the
original lowest level ops. However, if the tls path runs first
we want to call the sockmap ops. Additionally there was no locking
around prot updates in TLS code paths so the prot ops could
be changed multiple times once from TLS path and again from sockmap
side potentially leaving ops pointed at either TLS or sockmap
when psock and/or tls context have already been destroyed.

To fix this race first only update ops inside callback lock
so that TLS, sockmap and lowest level all agree on prot state.
Second and a ULP callback update() so that lower layers can
inform the upper layer when they are being removed allowing the
upper layer to reset prot ops.

This gets us close to allowing sockmap and tls to be stacked
in arbitrary order but will save that patch for *next trees.

v4:
 - make sure we don't free things for device;
 - remove the checks which swap the callbacks back
   only if TLS is at the top.

Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Reviewed-by: Dirk van der Merwe &lt;dirk.vandermerwe@netronome.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: sockmap, restore sk_write_space when psock gets dropped</title>
<updated>2019-05-23T14:13:29Z</updated>
<author>
<name>Jakub Sitnicki</name>
<email>jakub@cloudflare.com</email>
</author>
<published>2019-05-22T10:01:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=186bcc3dcd106dc52d706117f912054b616666e1'/>
<id>urn:sha1:186bcc3dcd106dc52d706117f912054b616666e1</id>
<content type='text'>
Once psock gets unlinked from its sock (sk_psock_drop), user-space can
still trigger a call to sk-&gt;sk_write_space by setting TCP_NOTSENT_LOWAT
socket option. This causes a null-ptr-deref because we try to read
psock-&gt;saved_write_space from sk_psock_write_space:

==================================================================
BUG: KASAN: null-ptr-deref in sk_psock_write_space+0x69/0x80
Read of size 8 at addr 00000000000001a0 by task sockmap-echo/131

CPU: 0 PID: 131 Comm: sockmap-echo Not tainted 5.2.0-rc1-00094-gf49aa1de9836 #81
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
Call Trace:
 ? sk_psock_write_space+0x69/0x80
 __kasan_report.cold.2+0x5/0x3f
 ? sk_psock_write_space+0x69/0x80
 kasan_report+0xe/0x20
 sk_psock_write_space+0x69/0x80
 tcp_setsockopt+0x69a/0xfc0
 ? tcp_shutdown+0x70/0x70
 ? fsnotify+0x5b0/0x5f0
 ? remove_wait_queue+0x90/0x90
 ? __fget_light+0xa5/0xf0
 __sys_setsockopt+0xe6/0x180
 ? sockfd_lookup_light+0xb0/0xb0
 ? vfs_write+0x195/0x210
 ? ksys_write+0xc9/0x150
 ? __x64_sys_read+0x50/0x50
 ? __bpf_trace_x86_fpu+0x10/0x10
 __x64_sys_setsockopt+0x61/0x70
 do_syscall_64+0xc5/0x520
 ? vmacache_find+0xc0/0x110
 ? syscall_return_slowpath+0x110/0x110
 ? handle_mm_fault+0xb4/0x110
 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
 ? trace_hardirqs_off_caller+0x4b/0x120
 ? trace_hardirqs_off_thunk+0x1a/0x3a
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f2e5e7cdcce
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 66 2e 0f 1f 84 00 00 00 00 00
0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 &lt;48&gt; 3d 01 f0 ff
ff 73 01 c3 48 8b 0d 8a 11 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffed011b778 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2e5e7cdcce
RDX: 0000000000000019 RSI: 0000000000000006 RDI: 0000000000000007
RBP: 00007ffed011b790 R08: 0000000000000004 R09: 00007f2e5e84ee80
R10: 00007ffed011b788 R11: 0000000000000206 R12: 00007ffed011b78c
R13: 00007ffed011b788 R14: 0000000000000007 R15: 0000000000000068
==================================================================

Restore the saved sk_write_space callback when psock is being dropped to
fix the crash.

Signed-off-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: sk_msg, fix socket data_ready events</title>
<updated>2018-12-20T22:47:09Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-12-20T19:35:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=552de91068828daef50a227a665068cf8dde835e'/>
<id>urn:sha1:552de91068828daef50a227a665068cf8dde835e</id>
<content type='text'>
When a skb verdict program is in-use and either another BPF program
redirects to that socket or the new SK_PASS support is used the
data_ready callback does not wake up application. Instead because
the stream parser/verdict is using the sk data_ready callback we wake
up the stream parser/verdict block.

Fix this by adding a helper to check if the stream parser block is
enabled on the sk and if so call the saved pointer which is the
upper layers wake up function.

This fixes application stalls observed when an application is waiting
for data in a blocking read().

Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf: skmsg, replace comments with BUILD bug</title>
<updated>2018-12-20T22:47:09Z</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2018-12-20T19:35:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a69c0f250568e6ab72f401b2c69aa0e666c94f2'/>
<id>urn:sha1:7a69c0f250568e6ab72f401b2c69aa0e666c94f2</id>
<content type='text'>
Enforce comment on structure layout dependency with a BUILD_BUG_ON
to ensure the condition is maintained.

Suggested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
</feed>
