<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/net/inetpeer.h, branch v3.2.63</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.63</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.63'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-09-13T22:41:48Z</updated>
<entry>
<title>inetpeer: get rid of ip_id_count</title>
<updated>2014-09-13T22:41:48Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-02T12:26:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=64b5c251d5b2cee4a0f697bfb90d79263f6dd517'/>
<id>urn:sha1:64b5c251d5b2cee4a0f697bfb90d79263f6dd517</id>
<content type='text'>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>net: fix inet_getid() and ipv6_select_ident() bugs</title>
<updated>2014-07-11T12:33:57Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-05-29T15:45:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd89ed9b75277f5a16f91a2331539dd34a8cf852'/>
<id>urn:sha1:bd89ed9b75277f5a16f91a2331539dd34a8cf852</id>
<content type='text'>
[ Upstream commit 39c36094d78c39e038c1e499b2364e13bce36f54 ]

I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer-&gt;ip_id_count,
not the new one.

Lets revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>inetpeer: fix a race in inetpeer_gc_worker()</title>
<updated>2013-10-26T20:05:57Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2012-06-05T03:00:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f9b44910e7456fd898fe5da1f45e2d463cbfb31'/>
<id>urn:sha1:8f9b44910e7456fd898fe5da1f45e2d463cbfb31</id>
<content type='text'>
[ Upstream commit 55432d2b543a4b6dfae54f5c432a566877a85d90 ]

commit 5faa5df1fa2024 (inetpeer: Invalidate the inetpeer tree along with
the routing cache) added a race :

Before freeing an inetpeer, we must respect a RCU grace period, and make
sure no user will attempt to increase refcnt.

inetpeer_invalidate_tree() waits for a RCU grace period before inserting
inetpeer tree into gc_list and waking the worker. At that time, no
concurrent lookup can find a inetpeer in this tree.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Acked-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>inetpeer: Invalidate the inetpeer tree along with the routing cache</title>
<updated>2013-10-26T20:05:57Z</updated>
<author>
<name>Steffen Klassert</name>
<email>steffen.klassert@secunet.com</email>
</author>
<published>2012-03-06T21:20:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c25e82c03fed5bf21418963cebc92de7797f9b3b'/>
<id>urn:sha1:c25e82c03fed5bf21418963cebc92de7797f9b3b</id>
<content type='text'>
[ Upstream commit 5faa5df1fa2024bd750089ff21dcc4191798263d ]

We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.

Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>inet: add a redirect generation id in inetpeer</title>
<updated>2011-11-27T00:16:37Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-11-26T12:13:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=de68dca1816660b0d3ac89fa59ffb410007a143f'/>
<id>urn:sha1:de68dca1816660b0d3ac89fa59ffb410007a143f</id>
<content type='text'>
Now inetpeer is the place where we cache redirect information for ipv4
destinations, we must be able to invalidate informations when a route is
added/removed on host.

As inetpeer is not yet namespace aware, this patch adds a shared
redirect_genid, and a per inetpeer redirect_genid. This might be changed
later if inetpeer becomes ns aware.

Cache information for one inerpeer is valid as long as its
redirect_genid has the same value than global redirect_genid.

Reported-by: Arkadiusz Miśkiewicz &lt;a.miskiewicz@gmail.com&gt;
Tested-by: Arkadiusz Miśkiewicz &lt;a.miskiewicz@gmail.com&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>atomic: use &lt;linux/atomic.h&gt;</title>
<updated>2011-07-26T23:49:47Z</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2011-07-26T23:09:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=60063497a95e716c9a689af3be2687d261f115b4'/>
<id>urn:sha1:60063497a95e716c9a689af3be2687d261f115b4</id>
<content type='text'>
This allows us to move duplicated code in &lt;asm/atomic.h&gt;
(atomic_inc_not_zero() for now) to &lt;linux/atomic.h&gt;

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipv6: make fragment identifications less predictable</title>
<updated>2011-07-22T04:25:58Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-07-22T04:25:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=87c48fa3b4630905f98268dde838ee43626a060c'/>
<id>urn:sha1:87c48fa3b4630905f98268dde838ee43626a060c</id>
<content type='text'>
IPv6 fragment identification generation is way beyond what we use for
IPv4 : It uses a single generator. Its not scalable and allows DOS
attacks.

Now inetpeer is IPv6 aware, we can use it to provide a more secure and
scalable frag ident generator (per destination, instead of system wide)

This patch :
1) defines a new secure_ipv6_id() helper
2) extends inet_getid() to provide 32bit results
3) extends ipv6_select_ident() with a new dest parameter

Reported-by: Fernando Gont &lt;fernando@gont.com.ar&gt;
Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: lower false sharing effect</title>
<updated>2011-06-09T06:31:27Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-06-09T06:31:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2b77bdde97ae8241dcc23110a4e837acfbc83438'/>
<id>urn:sha1:2b77bdde97ae8241dcc23110a4e837acfbc83438</id>
<content type='text'>
Profiles show false sharing in addr_compare() because refcnt/dtime
changes dirty the first inet_peer cache line, where are lying the keys
used at lookup time. If many cpus are calling inet_getpeer() and
inet_putpeer(), or need frag ids, addr_compare() is in 2nd position in
"perf top".

Before patch, my udpflood bench (16 threads) on my 2x4x2 machine :

             5784.00  9.7% csum_partial_copy_generic [kernel]
             3356.00  5.6% addr_compare              [kernel]
             2638.00  4.4% fib_table_lookup          [kernel]
             2625.00  4.4% ip_fragment               [kernel]
             1934.00  3.2% neigh_lookup              [kernel]
             1617.00  2.7% udp_sendmsg               [kernel]
             1608.00  2.7% __ip_route_output_key     [kernel]
             1480.00  2.5% __ip_append_data          [kernel]
             1396.00  2.3% kfree                     [kernel]
             1195.00  2.0% kmem_cache_free           [kernel]
             1157.00  1.9% inet_getpeer              [kernel]
             1121.00  1.9% neigh_resolve_output      [kernel]
             1012.00  1.7% dev_queue_xmit            [kernel]
# time ./udpflood.sh

real	0m44.511s
user	0m20.020s
sys	11m22.780s

# time ./udpflood.sh

real	0m44.099s
user	0m20.140s
sys	11m15.870s

After patch, no more addr_compare() in profiles :

             4171.00 10.7% csum_partial_copy_generic   [kernel]
             1787.00  4.6% fib_table_lookup            [kernel]
             1756.00  4.5% ip_fragment                 [kernel]
             1234.00  3.2% udp_sendmsg                 [kernel]
             1191.00  3.0% neigh_lookup                [kernel]
             1118.00  2.9% __ip_append_data            [kernel]
             1022.00  2.6% kfree                       [kernel]
              993.00  2.5% __ip_route_output_key       [kernel]
              841.00  2.2% neigh_resolve_output        [kernel]
              816.00  2.1% kmem_cache_free             [kernel]
              658.00  1.7% ia32_sysenter_target        [kernel]
              632.00  1.6% kmem_cache_alloc_node       [kernel]

# time ./udpflood.sh

real	0m41.587s
user	0m19.190s
sys	10m36.370s

# time ./udpflood.sh

real	0m41.486s
user	0m19.290s
sys	10m33.650s

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inetpeer: remove unused list</title>
<updated>2011-06-09T00:05:30Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-06-08T13:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b9d9be839fdb7dcd7ce7619a623fd9015a50cda'/>
<id>urn:sha1:4b9d9be839fdb7dcd7ce7619a623fd9015a50cda</id>
<content type='text'>
Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in first cache
line of object [were the links and keys used by lookups are located], we
might move it at the end of inet_peer structure to let this first cache
line mostly read by cpus.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
CC: Andi Kleen &lt;andi@firstfloor.org&gt;
CC: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>inet: constify ip headers and in6_addr</title>
<updated>2011-04-22T18:04:14Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-22T04:53:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b71d1d426d263b0b6cb5760322efebbfc89d4463'/>
<id>urn:sha1:b71d1d426d263b0b6cb5760322efebbfc89d4463</id>
<content type='text'>
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
