<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/net/packet, branch leds/master</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=leds%2Fmaster</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=leds%2Fmaster'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-06-24T09:58:51Z</updated>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-06-24T09:58:51Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-06-24T09:58:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a07bd6fead4f00f67b1bf5f551e686661c4f52c'/>
<id>urn:sha1:3a07bd6fead4f00f67b1bf5f551e686661c4f52c</id>
<content type='text'>
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: remove handling of tx_ring</title>
<updated>2015-06-23T13:53:29Z</updated>
<author>
<name>Maninder Singh</name>
<email>maninder1.s@samsung.com</email>
</author>
<published>2015-06-22T07:09:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8e85cc5eb5701b935a06b5b3a03a8532946f969'/>
<id>urn:sha1:e8e85cc5eb5701b935a06b5b3a03a8532946f969</id>
<content type='text'>
Remove handling of tx_ring in prb_setup_retire_blk_timer
for TPACKET_V3 because init_prb_bdqc is called only for zero tx_ring
and thus prb_setup_retire_blk_timer for zero tx_ring only.

And also in functon init_prb_bdqc there is no usage of tx_ring.
Thus removing tx_ring from init_prb_bdqc.

Signed-off-by: Maninder Singh &lt;maninder1.s@samsung.com&gt;
Suggested-by: Frans Klaver &lt;fransklaver@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: avoid out of bounds read in round robin fanout</title>
<updated>2015-06-21T17:24:37Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-06-17T19:59:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=468479e6043c84f5a65299cc07cb08a22a28c2b1'/>
<id>urn:sha1:468479e6043c84f5a65299cc07cb08a22a28c2b1</id>
<content type='text'>
PACKET_FANOUT_LB computes f-&gt;rr_cur such that it is modulo
f-&gt;num_members. It returns the old value unconditionally, but
f-&gt;num_members may have changed since the last store. Ensure
that the return value is always &lt; num.

When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.

Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: free packet_rollover after synchronize_net</title>
<updated>2015-06-21T16:30:42Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-06-16T16:51:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=59f211181b5b10d4a95e1b9226febb0c0b6497c6'/>
<id>urn:sha1:59f211181b5b10d4a95e1b9226febb0c0b6497c6</id>
<content type='text'>
Destruction of the po-&gt;rollover must be delayed until there are no
more packets in flight that can access it. The field is destroyed in
packet_release, before synchronize_net. Delay using rcu.

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")

Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: read num_members once in packet_rcv_fanout()</title>
<updated>2015-06-21T16:23:22Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-06-16T14:59:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f98f4514d07871da7a113dd9e3e330743fd70ae4'/>
<id>urn:sha1:f98f4514d07871da7a113dd9e3e330743fd70ae4</id>
<content type='text'>
We need to tell compiler it must not read f-&gt;num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()

Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")

Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net-packet: fix null pointer exception in rollover mode</title>
<updated>2015-05-18T02:41:38Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-05-17T23:44:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4633c9e07b3b7d7fc262a5f59ff635c1f702af6f'/>
<id>urn:sha1:4633c9e07b3b7d7fc262a5f59ff635c1f702af6f</id>
<content type='text'>
Rollover can be enabled as flag or mode. Allocate state in both cases.
This solves a NULL pointer exception in fanout_demux_rollover on
referencing po-&gt;rollover if using mode rollover.

Also make sure that in rollover mode each silo is tried (contrary
to rollover flag, where the main socket is excluded after an initial
try_self).

Tested:
  Passes tools/testing/net/psock_fanout.c, which tests both modes and
  flag. My previous tests were limited to bench_rollover, which only
  stresses the flag. The test now completes safely. it still gives an
  error for mode rollover, because it does not expect the new headroom
  (ROOM_NORMAL) requirement. I will send a separate patch to the test.

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;

----

I should have run this test and caught this before submission, of
course. Apologies for the oversight.
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: fix warnings in rollover lock contention</title>
<updated>2015-05-14T21:40:54Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-05-14T19:25:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=54d7c01d3ed699cfc213115eaecfe1175cfaff8f'/>
<id>urn:sha1:54d7c01d3ed699cfc213115eaecfe1175cfaff8f</id>
<content type='text'>
Avoid two xchg calls whose return values were unused, causing a
warning on some architectures.

The relevant variable is a hint and read without mutual exclusion.
This fix makes all writers hold the receive_queue lock.

Suggested-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: rollover statistics</title>
<updated>2015-05-13T19:43:00Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-05-12T15:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a9b6391814d5d6b8668fca2dace86949b7244e2e'/>
<id>urn:sha1:a9b6391814d5d6b8668fca2dace86949b7244e2e</id>
<content type='text'>
Rollover indicates exceptional conditions. Export a counter to inform
socket owners of this state.

If no socket with sufficient room is found, rollover fails. Also count
these events.

Finally, also count when flows are rolled over early thanks to huge
flow detection, to validate its correctness.

Tested:
  Read counters in bench_rollover on all other tests in the patchset

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: rollover huge flows before small flows</title>
<updated>2015-05-13T19:43:00Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-05-12T15:56:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3b3a5b0aab5b9ad345d4beb9a364a7dd02c23d40'/>
<id>urn:sha1:3b3a5b0aab5b9ad345d4beb9a364a7dd02c23d40</id>
<content type='text'>
Migrate flows from a socket to another socket in the fanout group not
only when the socket is full. Start migrating huge flows early, to
divert possible 4-tuple attacks without affecting normal traffic.

Introduce fanout_flow_is_huge(). This detects huge flows, which are
defined as taking up more than half the load. It does so cheaply, by
storing the rxhashes of the N most recent packets. If over half of
these are the same rxhash as the current packet, then drop it. This
only protects against 4-tuple attacks. N is chosen to fit all data in
a single cache line.

Tested:
  Ran bench_rollover for 10 sec with 1.5 Mpps of single flow input.

    lpbb5:/export/hda3/willemb# ./bench_rollover -l 1000 -r -s
    cpu         rx       rx.k     drop.k   rollover     r.huge   r.failed
      0         14         14          0          0          0          0
      1         20         20          0          0          0          0
      2         16         16          0          0          0          0
      3    6168824    6168824          0    4867721    4867721          0
      4    4867741    4867741          0          0          0          0
      5         12         12          0          0          0          0
      6         15         15          0          0          0          0
      7         17         17          0          0          0          0

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>packet: rollover lock contention avoidance</title>
<updated>2015-05-13T19:43:00Z</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2015-05-12T15:56:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ccdbaa6d55b0656244ba57c4b56765a0af76c0a'/>
<id>urn:sha1:2ccdbaa6d55b0656244ba57c4b56765a0af76c0a</id>
<content type='text'>
Rollover has to call packet_rcv_has_room on sockets in the fanout
group to find a socket to migrate to. This operation is expensive
especially if the packet sockets use rings, when a lock has to be
acquired.

Avoid pounding on the lock by all sockets by temporarily marking a
socket as "under memory pressure" when such pressure is detected.
While set, only the socket owner may call packet_rcv_has_room on the
socket. Once it detects normal conditions, it clears the flag. The
socket is not used as a victim by any other socket in the meantime.

Under reasonably balanced load, each socket writer frequently calls
packet_rcv_has_room and clears its own pressure field. As a backup
for when the socket is rarely written to, also clear the flag on
reading (packet_recvmsg, packet_poll) if this can be done cheaply
(i.e., without calling packet_rcv_has_room). This is only for
edge cases.

Tested:
  Ran bench_rollover: a process with 8 sockets in a single fanout
  group, each pinned to a single cpu that receives one nic recv
  interrupt. RPS and RFS are disabled. The benchmark uses packet
  rx_ring, which has to take a lock when determining whether a
  socket has room.

  Sent 3.5 Mpps of UDP traffic with sufficient entropy to spread
  uniformly across the packet sockets (and inserted an iptables
  rule to drop in PREROUTING to avoid protocol stack processing).

  Without this patch, all sockets try to migrate traffic to
  neighbors, causing lock contention when searching for a non-
  empty neighbor. The lock is the top 9 entries.

    perf record -a -g sleep 5

    -  17.82%   bench_rollover  [kernel.kallsyms]    [k] _raw_spin_lock
       - _raw_spin_lock
          - 99.00% spin_lock
    	 + 81.77% packet_rcv_has_room.isra.41
    	 + 18.23% tpacket_rcv
          + 0.84% packet_rcv_has_room.isra.41
    +   5.20%      ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock
    +   5.15%      ksoftirqd/1  [kernel.kallsyms]    [k] _raw_spin_lock
    +   5.14%      ksoftirqd/2  [kernel.kallsyms]    [k] _raw_spin_lock
    +   5.12%      ksoftirqd/7  [kernel.kallsyms]    [k] _raw_spin_lock
    +   5.12%      ksoftirqd/5  [kernel.kallsyms]    [k] _raw_spin_lock
    +   5.10%      ksoftirqd/4  [kernel.kallsyms]    [k] _raw_spin_lock
    +   4.66%      ksoftirqd/0  [kernel.kallsyms]    [k] _raw_spin_lock
    +   4.45%      ksoftirqd/3  [kernel.kallsyms]    [k] _raw_spin_lock
    +   1.55%   bench_rollover  [kernel.kallsyms]    [k] packet_rcv_has_room.isra.41

  On net-next with this patch, this lock contention is no longer a
  top entry. Most time is spent in the actual read function. Next up
  are other locks:

    +  15.52%  bench_rollover  bench_rollover     [.] reader
    +   4.68%         swapper  [kernel.kallsyms]  [k] memcpy_erms
    +   2.77%         swapper  [kernel.kallsyms]  [k] packet_lookup_frame.isra.51
    +   2.56%     ksoftirqd/1  [kernel.kallsyms]  [k] memcpy_erms
    +   2.16%         swapper  [kernel.kallsyms]  [k] tpacket_rcv
    +   1.93%         swapper  [kernel.kallsyms]  [k] mlx4_en_process_rx_cq

  Looking closer at the remaining _raw_spin_lock, the cost of probing
  in rollover is now comparable to the cost of taking the lock later
  in tpacket_rcv.

    -   1.51%         swapper  [kernel.kallsyms]  [k] _raw_spin_lock
       - _raw_spin_lock
          + 33.41% packet_rcv_has_room
          + 28.15% tpacket_rcv
          + 19.54% enqueue_to_backlog
          + 6.45% __free_pages_ok
          + 2.78% packet_rcv_fanout
          + 2.13% fanout_demux_rollover
          + 2.01% netif_receive_skb_internal

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
