<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/uapi/linux/tcp.h, branch v5.14.9</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.14.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.14.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-02-12T02:25:05Z</updated>
<entry>
<title>tcp: Sanitize CMSG flags and reserved args in tcp_zerocopy_receive.</title>
<updated>2021-02-12T02:25:05Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2021-02-11T21:21:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c5a2fd042d0bfac71a2dfb99515723d318df47b'/>
<id>urn:sha1:3c5a2fd042d0bfac71a2dfb99515723d318df47b</id>
<content type='text'>
Explicitly define reserved field and require it and any subsequent
fields to be zero-valued for now. Additionally, limit the valid CMSG
flags that tcp_zerocopy_receive accepts.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Suggested-by: David Ahern &lt;dsahern@gmail.com&gt;
Suggested-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Suggested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Acked-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Add receive timestamp support for receive zerocopy.</title>
<updated>2021-01-23T04:05:56Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2021-01-21T00:41:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7eeba1706eba6def15f6cb2fc7b3c3b9a2651edc'/>
<id>urn:sha1:7eeba1706eba6def15f6cb2fc7b3c3b9a2651edc</id>
<content type='text'>
tcp_recvmsg() uses the CMSG mechanism to receive control information
like packet receive timestamps. This patch adds CMSG fields to
struct tcp_zerocopy_receive, and provides receive timestamps
if available to the user.

Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: add TTL to SCM_TIMESTAMPING_OPT_STATS</title>
<updated>2021-01-23T02:20:52Z</updated>
<author>
<name>Yousuk Seung</name>
<email>ysseung@google.com</email>
</author>
<published>2021-01-20T20:41:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e7ed11ee945438b737e2ae2370e35591e16ec371'/>
<id>urn:sha1:e7ed11ee945438b737e2ae2370e35591e16ec371</id>
<content type='text'>
This patch adds TCP_NLA_TTL to SCM_TIMESTAMPING_OPT_STATS that exports
the time-to-live or hop limit of the latest incoming packet with
SCM_TSTAMP_ACK. The value exported may not be from the packet that acks
the sequence when incoming packets are aggregated. Exporting the
time-to-live or hop limit value of incoming packets helps to estimate
the hop count of the path of the flow that may change over time.

Signed-off-by: Yousuk Seung &lt;ysseung@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Link: https://lore.kernel.org/r/20210120204155.552275-1-ysseung@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Fix whitespace in uapi/linux/tcp.h.</title>
<updated>2021-01-12T01:23:26Z</updated>
<author>
<name>Danilo Carvalho</name>
<email>doak@google.com</email>
</author>
<published>2021-01-08T22:21:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad0bfc233ae2e7ee3bcb9a6089e4aa54e2b44fa1'/>
<id>urn:sha1:ad0bfc233ae2e7ee3bcb9a6089e4aa54e2b44fa1</id>
<content type='text'>
List of things fixed:
  - Two of the socket options were idented with spaces instead of tabs.
  - Trailing whitespace in some lines.
  - Improper spacing around parenthesis caught by checkpatch.pl.
  - Mix of space and tabs in tcp_word_hdr union.

Signed-off-by: Danilo Carvalho &lt;doak@google.com&gt;
Link: https://lore.kernel.org/r/20210108222104.2079472-1-doak@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net-zerocopy: Defer vm zap unless actually needed.</title>
<updated>2020-12-04T21:40:53Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2020-12-02T22:53:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94ab9eb9b234ddf23af04a4bc7e8db68e67b8778'/>
<id>urn:sha1:94ab9eb9b234ddf23af04a4bc7e8db68e67b8778</id>
<content type='text'>
Zapping pages is required only if we are calling vm_insert_page into a
region where pages had previously been mapped. Receive zerocopy allows
reusing such regions, and hitherto called zap_page_range() before
calling vm_insert_page() in that range.

zap_page_range() can also be triggered from userspace with
madvise(MADV_DONTNEED). If userspace is configured to call this before
reusing a segment, or if there was nothing mapped at this virtual
address to begin with, we can avoid calling zap_page_range() under the
socket lock. That said, if userspace does not do that, then we are
still responsible for calling zap_page_range().

This patch adds a flag that the user can use to hint to the kernel
that a zap is not required. If the flag is not set, or if an older
user application does not have a flags field at all, then the kernel
calls zap_page_range as before. Also, if the flag is set but a zap is
still required, the kernel performs that zap as necessary. Thus
incorrectly indicating that a zap can be avoided does not change the
correctness of operation. It also increases the batchsize for
vm_insert_pages and prefetches the page struct for the batch since
we're about to bump the refcount.

An alternative mechanism could be to not have a flag, assume by
default a zap is not needed, and fall back to zapping if needed.
However, this would harm performance for older applications for which
a zap is necessary, and thus we implement it with an explicit flag
so newer applications can opt in.

When using RPC-style traffic with medium sized (tens of KB) RPCs, this
change yields an efficency improvement of about 30% for QPS/CPU usage.

Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy.</title>
<updated>2020-12-04T21:40:52Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2020-12-02T22:53:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=18fb76ed53865c1b5d5f0157b1b825704590beb5'/>
<id>urn:sha1:18fb76ed53865c1b5d5f0157b1b825704590beb5</id>
<content type='text'>
When TCP receive zerocopy does not successfully map the entire
requested space, it outputs a 'hint' that the caller should recvmsg().

Augment zerocopy to accept a user buffer that it tries to copy this
hint into - if it is possible to copy the entire hint, it will do so.
This elides a recvmsg() call for received traffic that isn't exactly
page-aligned in size.

This was tested with RPC-style traffic of arbitrary sizes. Normally,
each received message required at least one getsockopt() call, and one
recvmsg() call for the remaining unaligned data.

With this change, almost all of the recvmsg() calls are eliminated,
leading to a savings of about 25%-50% in number of system calls
for RPC-style workloads.

Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS</title>
<updated>2020-08-01T00:00:44Z</updated>
<author>
<name>Yousuk Seung</name>
<email>ysseung@google.com</email>
</author>
<published>2020-07-30T22:44:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48040793fa6003d211f021c6ad273477bcd90d91'/>
<id>urn:sha1:48040793fa6003d211f021c6ad273477bcd90d91</id>
<content type='text'>
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time(EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows to measure the precise delay
injected on the sender host e.g. by a bpf-base throttler.

Signed-off-by: Yousuk Seung &lt;ysseung@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS</title>
<updated>2020-03-10T00:56:33Z</updated>
<author>
<name>Yousuk Seung</name>
<email>ysseung@google.com</email>
</author>
<published>2020-03-09T20:16:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e08ab0b377a1489760533424437c5f4be7f484a4'/>
<id>urn:sha1:e08ab0b377a1489760533424437c5f4be7f484a4</id>
<content type='text'>
Add TCP_NLA_BYTES_NOTSENT to SCM_TIMESTAMPING_OPT_STATS that reports
bytes in the write queue but not sent. This is the same metric as
what is exported with tcp_info.tcpi_notsent_bytes.

Signed-off-by: Yousuk Seung &lt;ysseung@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.</title>
<updated>2020-02-17T03:25:02Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2020-02-14T23:30:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=33946518d493cdf10aedb4a483f1aa41948a3dab'/>
<id>urn:sha1:33946518d493cdf10aedb4a483f1aa41948a3dab</id>
<content type='text'>
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using epoll, returning sk_err along with the result
of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a spurious wakeup.

Consider a multi-threaded application using epoll. A thread may awaken
with EPOLLIN but another thread may already be reading. The
spuriously-awoken thread does not necessarily know that another thread
'won'; rather, it may be possible that it was woken up due to the
presence of an error if there is no data. A zerocopy read receiving 0
bytes thus would need to be followed up by recvmsg to be sure.

Instead, we return sk_err directly with zerocopy, so the application
can avoid this extra system call.

Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp-zerocopy: Return inq along with tcp receive zerocopy.</title>
<updated>2020-02-17T03:25:02Z</updated>
<author>
<name>Arjun Roy</name>
<email>arjunroy@google.com</email>
</author>
<published>2020-02-14T23:30:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c8856c051454909e5059df4e81c77b9c366c5515'/>
<id>urn:sha1:c8856c051454909e5059df4e81c77b9c366c5515</id>
<content type='text'>
This patchset is intended to reduce the number of extra system calls
imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
this patchset has demonstrated a system call reduction of about 30%
when coupled with userspace changes.

For applications using edge-triggered epoll, returning inq along with
the result of tcp receive zerocopy could remove the need to call
recvmsg()=-EAGAIN after a successful zerocopy. Generally speaking,
since normally we would need to perform a recvmsg() call for every
successful small RPC read via TCP receive zerocopy, returning inq can
reduce the number of system calls performed by approximately half.

Signed-off-by: Arjun Roy &lt;arjunroy@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
