<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include, branch v4.9.135</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.135</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.135'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-10-20T07:51:31Z</updated>
<entry>
<title>mremap: properly flush TLB before releasing the page</title>
<updated>2018-10-20T07:51:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-10-12T22:22:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e34bd9a96704f7089ccad61b6e01ea985fa54dd6'/>
<id>urn:sha1:e34bd9a96704f7089ccad61b6e01ea985fa54dd6</id>
<content type='text'>
commit eb66ae030829605d61fbef1909ce310e29f78821 upstream.

Jann Horn points out that our TLB flushing was subtly wrong for the
mremap() case.  What makes mremap() special is that we don't follow the
usual "add page to list of pages to be freed, then flush tlb, and then
free pages".  No, mremap() obviously just _moves_ the page from one page
table location to another.

That matters, because mremap() thus doesn't directly control the
lifetime of the moved page with a freelist: instead, the lifetime of the
page is controlled by the page table locking, which serializes access to
the entry.

As a result, we need to flush the TLB not just before releasing the lock
for the source location (to avoid any concurrent accesses to the entry),
but also before we release the destination page table lock (to avoid the
TLB being flushed after somebody else has already done something to that
page).

This also makes the whole "need_flush" logic unnecessary, since we now
always end up flushing the TLB for every valid entry.
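
The resulting ordering in move_ptes() looks roughly like the condensed
sketch below (illustrative only, not the literal diff; the force_flush
flag and the flushed range are shown schematically):

	old_pte = pte_offset_map_lock(mm, old_pmd, old_addr, &amp;old_ptl);
	for (; old_addr &lt; old_end; old_pte++, old_addr += PAGE_SIZE,
				   new_pte++, new_addr += PAGE_SIZE) {
		if (pte_none(*old_pte))
			continue;
		pte = ptep_get_and_clear(mm, old_addr, old_pte);
		/* a valid entry may still be cached in some TLB */
		if (pte_present(pte))
			force_flush = true;
		pte = move_pte(pte, new_vma-&gt;vm_page_prot, old_addr, new_addr);
		set_pte_at(mm, new_addr, new_pte, pte);
	}
	if (force_flush)
		flush_tlb_range(vma, old_end - len, old_end);
	/* only now is it safe to drop the page table locks */
	pte_unmap_unlock(old_pte - 1, old_ptl);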

Reported-and-tested-by: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Tested-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ip: add helpers to process in-order fragments faster.</title>
<updated>2018-10-18T07:13:26Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e9e4ac488c017739b2832177550ba2569fffc709'/>
<id>urn:sha1:e9e4ac488c017739b2832177550ba2569fffc709</id>
<content type='text'>
This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent
  to the previous tail fragment, are added to this tail run without
  triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
  it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, and is inserted into the rb-tree
  at the end as the head of the new run.

skb-&gt;cb is used to store additional information
needed here (suggested by Eric Dumazet).
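
A rough sketch of the bookkeeping kept in skb-&gt;cb (the struct, field and
helper names below are illustrative; the follow-up patch may spell them
differently):

	struct ipfrag_skb_cb {
		struct inet_skb_parm	h;
		struct sk_buff		*next_frag;	/* next skb in this run */
		int			frag_run_len;	/* total length of the run */
	};

	#define FRAG_CB(skb)	((struct ipfrag_skb_cb *)((skb)-&gt;cb))

	/* Fragment arrived adjacent to the tail: extend the current run
	 * without touching the rb-tree.
	 */
	static void fragrun_append_to_last(struct inet_frag_queue *q,
					   struct sk_buff *skb)
	{
		FRAG_CB(q-&gt;last_run_head)-&gt;frag_run_len += skb-&gt;len;
		FRAG_CB(q-&gt;fragments_tail)-&gt;next_frag = skb;
		q-&gt;fragments_tail = skb;
	}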

Reported-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 353c9cb360874e737fb000545f783df756c06f9a)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ip: use rb trees for IP frag queue.</title>
<updated>2018-10-18T07:13:26Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10043954eadac2d8f8c1886190f7a7ee584ff939'/>
<id>urn:sha1:10043954eadac2d8f8c1886190f7a7ee584ff939</id>
<content type='text'>
(commit fa0f527358bd900ef92f925878ed6bfbd51305cc upstream)

Similar to TCP OOO RX queue, it makes sense to use rb trees to store
IP fragments, so that OOO fragments are inserted faster.

Tested:

- a follow-up patch contains a rather comprehensive ip defrag
  self-test (functional)
- ran neper `udp_stream -c -H &lt;host&gt; -F 100 -l 300 -T 20`:
    netstat --statistics
    Ip:
        282078937 total packets received
        0 forwarded
        0 incoming packets discarded
        946760 incoming packets delivered
        18743456 requests sent out
        101 fragments dropped after timeout
        282077129 reassemblies required
        944952 packets reassembled ok
        262734239 packet reassembles failed
   (The numbers/stats above are somewhat better re:
    reassemblies vs a kernel without this patchset. More
    comprehensive performance testing TBD).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Reported-by: Juha-Matti Tilli &lt;juha-matti.tilli@iki.fi&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: add rb_to_skb() and other rb tree helpers</title>
<updated>2018-10-18T07:13:26Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b475cf3bf1e8212b0287c6d15249e2c942693ae5'/>
<id>urn:sha1:b475cf3bf1e8212b0287c6d15249e2c942693ae5</id>
<content type='text'>
Generalize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.
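
The new helpers are thin wrappers around the generic rbtree accessors,
roughly along these lines (a sketch, assuming the rbnode member of
struct sk_buff):

	#define rb_to_skb(rb)		rb_entry_safe(rb, struct sk_buff, rbnode)
	#define skb_rb_first(root)	rb_to_skb(rb_first(root))
	#define skb_rb_last(root)	rb_to_skb(rb_last(root))
	#define skb_rb_next(skb)	rb_to_skb(rb_next(&amp;(skb)-&gt;rbnode))
	#define skb_rb_prev(skb)	rb_to_skb(rb_prev(&amp;(skb)-&gt;rbnode))

	#define skb_rbtree_walk(skb, root) \
		for (skb = skb_rb_first(root); skb != NULL; skb = skb_rb_next(skb))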

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends</title>
<updated>2018-10-18T07:13:26Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=791521e2e377f66ef5ee6e5002dec758234d8d32'/>
<id>urn:sha1:791521e2e377f66ef5ee6e5002dec758234d8d32</id>
<content type='text'>
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb-&gt;ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.
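
A sketch of the idea (the exact csum helper used in the final patch may
differ; the point is subtracting the trimmed bytes instead of giving up
on the checksum):

	int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
	{
		if (skb-&gt;ip_summed == CHECKSUM_COMPLETE) {
			int delta = skb-&gt;len - len;

			/* checksum of the padding we are about to drop */
			skb-&gt;csum = csum_block_sub(skb-&gt;csum,
						   skb_checksum(skb, len, delta, 0),
						   len);
		}
		return __pskb_trim(skb, len);
	}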

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>net: modify skb_rbtree_purge to return the truesize of all purged skbs.</title>
<updated>2018-10-18T07:13:25Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=871695951ec6f6b0b1a258c9bb5336bfeffab409'/>
<id>urn:sha1:871695951ec6f6b0b1a258c9bb5336bfeffab409</id>
<content type='text'>
Tested: see the next patch in the series.
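
The change itself is small: the purge loop sums skb-&gt;truesize while it
frees, so callers can account the released memory in one go; roughly:

	unsigned int skb_rbtree_purge(struct rb_root *root)
	{
		struct rb_node *p = rb_first(root);
		unsigned int sum = 0;

		while (p) {
			struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);

			p = rb_next(p);
			rb_erase(&amp;skb-&gt;rbnode, root);
			sum += skb-&gt;truesize;
			kfree_skb(skb);
		}
		return sum;
	}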

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ip: discard IPv4 datagrams with overlapping segments.</title>
<updated>2018-10-18T07:13:25Z</updated>
<author>
<name>Peter Oskolkov</name>
<email>posk@google.com</email>
</author>
<published>2018-10-10T19:30:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82f36cbc74595f06900f478d4eaf7217a4f06e13'/>
<id>urn:sha1:82f36cbc74595f06900f478d4eaf7217a4f06e13</id>
<content type='text'>
This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available upstream).

Suggested-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Peter Oskolkov &lt;posk@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Acked-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>inet: frags: get rid of ipfrag_skb_cb/FRAG_CB</title>
<updated>2018-10-18T07:13:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=316986fe4dcac011b4f85d9bbef1edf4953c0219'/>
<id>urn:sha1:316986fe4dcac011b4f85d9bbef1edf4953c0219</id>
<content type='text'>
ip_defrag uses skb-&gt;cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb-&gt;next,
meaning that we use two cache lines per skb when finding the insertion point.

By aliasing skb-&gt;ip_defrag_offset and skb-&gt;dev, we pack all the fields
in a single cache line and save precious memory bandwidth.

Note that after the fast path added by Changli Gao in commit
d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change won't help the fast path, since we still need
to access prev-&gt;len (2nd cache line), but will show great
benefits when the slow path is entered, since we perform
a linear scan of a potentially long list.

Also, note that this potentially long list is an attack vector;
we might consider also using an rb-tree there eventually.
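
A sketch of the aliasing in struct sk_buff: ip_defrag_offset joins the
union that already holds the device pointer, so no extra space is used
and no second cache line is touched (exact union contents vary between
kernel versions):

	union {
		struct net_device	*dev;
		/* Some protocols might use this space to store
		 * information, while the device pointer is NULL.
		 */
		unsigned long		dev_scratch;
		int			ip_defrag_offset;
	};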

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>inet: frags: reorganize struct netns_frags</title>
<updated>2018-10-18T07:13:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b68fda0a455be7f48fdf97407de1aa09d046fdd'/>
<id>urn:sha1:5b68fda0a455be7f48fdf97407de1aa09d046fdd</id>
<content type='text'>
Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce
false sharing noticed in inet_frag_kill()
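
A sketch of the resulting layout (member set abbreviated):

	struct netns_frags {
		/* sysctls: read-mostly, shared cleanly between cpus */
		long			high_thresh;
		long			low_thresh;
		int			timeout;
		int			max_dist;
		struct inet_frags	*f;

		/* frequently written state gets its own cache lines */
		struct rhashtable	rhashtable ____cacheline_aligned_in_smp;
		atomic_long_t		mem ____cacheline_aligned_in_smp;
	};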

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit c2615cf5a761b32bf74e85bddc223dfff3d9b9f0)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>rhashtable: reorganize struct rhashtable layout</title>
<updated>2018-10-18T07:13:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-10-10T19:30:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6060bcdcffaba68c3ff158a88faab6df27210ffc'/>
<id>urn:sha1:6060bcdcffaba68c3ff158a88faab6df27210ffc</id>
<content type='text'>
While under frags DDOS I noticed unfortunate false sharing between
@nelems and @params.automatic_shrinking

Move @nelems to the end of struct rhashtable so that the first cache line
is shared between all cpus, because it is almost never dirtied.
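
A sketch of the resulting layout; the often-dirtied element counter goes
last, leaving the first cache line read-mostly (member set approximate):

	struct rhashtable {
		struct bucket_table __rcu	*tbl;
		unsigned int			key_len;
		unsigned int			max_elems;
		struct rhashtable_params	p;
		bool				rhlist;
		struct work_struct		run_work;
		struct mutex			mutex;
		spinlock_t			lock;
		atomic_t			nelems;	/* dirtied on every insert/delete */
	};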

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
(cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d)
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
