<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/samples, branch v4.2.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.2.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.2.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-07-17T18:15:13Z</updated>
<entry>
<title>tracing: Fix sample output of dynamic arrays</title>
<updated>2015-07-17T18:15:13Z</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2015-07-17T18:03:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d6726c8145290bef950ae2538ea6ae1d96a1944b'/>
<id>urn:sha1:d6726c8145290bef950ae2538ea6ae1d96a1944b</id>
<content type='text'>
He Kuang noticed that the trace event samples for arrays were broken:

"The output result of trace_foo_bar event in traceevent samples is
 wrong. This problem can be reproduced as following:

  (Build kernel with SAMPLE_TRACE_EVENTS=m)

  $ insmod trace-events-sample.ko

  $ echo 1 &gt; /sys/kernel/debug/tracing/events/sample-trace/foo_bar/enable

  $ cat /sys/kernel/debug/tracing/trace

  event-sample-980 [000] ....  43.649559: foo_bar: foo hello 21 0x15
  BIT1|BIT3|0x10 {0x1,0x6f6f6e53,0xff007970,0xffffffff} Snoopy
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 The array length is not right, should be {0x1}.
  (ffffffff,ffffffff)

  event-sample-980 [000] ....  44.653827: foo_bar: foo hello 22 0x16
  BIT2|BIT3|0x10
  {0x1,0x2,0x646e6147,0x666c61,0xffffffff,0xffffffff,0x750aeffe,0x7}
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 The array length is not right, should be {0x1,0x2}.
  Gandalf (ffffffff,ffffffff)"

This was caused by an update to have __print_array()'s second parameter
be the count of items in the array and not the size of the array.
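
A minimal sketch of the mismatch, written in Python rather than kernel C and
assuming 4-byte array elements. The function names are illustrative only and
are not part of the tracing API. Passing the byte size where an element count
is expected makes the printer walk four times too far:

```python
ELEMENT_SIZE = 4  # assumption: each array element is a 4-byte int


def element_count(size_in_bytes, element_size=ELEMENT_SIZE):
    """Convert a buffer size in bytes to the number of whole elements."""
    return size_in_bytes // element_size


def print_array(values, count):
    """Format the first `count` items, loosely mimicking the trace output."""
    return "{" + ",".join(hex(v) for v in values[:count]) + "}"


# Memory as the printer sees it: one valid element followed by unrelated data.
memory = [0x1, 0x6f6f6e53, 0xff007970, 0xffffffff]
size_in_bytes = 4  # the dynamic array really holds a single 4-byte element

wrong = print_array(memory, size_in_bytes)                 # byte size used as count
right = print_array(memory, element_count(size_in_bytes))  # element count
```

Here `wrong` reproduces the four-item output from the report, while `right`
contains only the single valid element.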

As there are already users of __print_array(), it cannot change. But
the sample code can, and we can also improve the documentation of
__print_array() and __get_dynamic_array_len().

Link: http://lkml.kernel.org/r/1436839171-31527-2-git-send-email-hekuang@huawei.com

Fixes: ac01ce1410fc2 ("tracing: Make ftrace_print_array_seq compute buf_len")
Reported-by: He Kuang &lt;hekuang@huawei.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>bpf: BPF based latency tracing</title>
<updated>2015-06-23T13:09:58Z</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2015-06-19T14:00:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0fb1170ee68a6aa14eca0666e02c4b62cbf1251d'/>
<id>urn:sha1:0fb1170ee68a6aa14eca0666e02c4b62cbf1251d</id>
<content type='text'>
BPF offers another way to generate latency histograms. We attach
kprobes at trace_preempt_off and trace_preempt_on and calculate the
time that elapses between the off and on transitions.

The first array is used to store the start time stamp. The key is the
CPU id. The second array stores the log2(time diff). We need to use
static allocation here (arrays and not hash tables). The kprobes
hooking into trace_preempt_on|off should not call into any dynamic
memory allocation or free path, because we need to avoid getting
called recursively. Besides that, it reduces jitter in the measurement.

CPU 0
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 166723   |*************************************** |
    4096 -&gt; 8191     : 19870    |***                                     |
    8192 -&gt; 16383    : 6324     |                                        |
   16384 -&gt; 32767    : 1098     |                                        |
   32768 -&gt; 65535    : 190      |                                        |
   65536 -&gt; 131071   : 179      |                                        |
  131072 -&gt; 262143   : 18       |                                        |
  262144 -&gt; 524287   : 4        |                                        |
  524288 -&gt; 1048575  : 1363     |                                        |
CPU 1
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 114042   |*************************************** |
    4096 -&gt; 8191     : 9587     |**                                      |
    8192 -&gt; 16383    : 4140     |                                        |
   16384 -&gt; 32767    : 673      |                                        |
   32768 -&gt; 65535    : 179      |                                        |
   65536 -&gt; 131071   : 29       |                                        |
  131072 -&gt; 262143   : 4        |                                        |
  262144 -&gt; 524287   : 1        |                                        |
  524288 -&gt; 1048575  : 364      |                                        |
CPU 2
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 40147    |*************************************** |
    4096 -&gt; 8191     : 2300     |*                                       |
    8192 -&gt; 16383    : 828      |                                        |
   16384 -&gt; 32767    : 178      |                                        |
   32768 -&gt; 65535    : 59       |                                        |
   65536 -&gt; 131071   : 2        |                                        |
  131072 -&gt; 262143   : 0        |                                        |
  262144 -&gt; 524287   : 1        |                                        |
  524288 -&gt; 1048575  : 174      |                                        |
CPU 3
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 29626    |*************************************** |
    4096 -&gt; 8191     : 2704     |**                                      |
    8192 -&gt; 16383    : 1090     |                                        |
   16384 -&gt; 32767    : 160      |                                        |
   32768 -&gt; 65535    : 72       |                                        |
   65536 -&gt; 131071   : 32       |                                        |
  131072 -&gt; 262143   : 26       |                                        |
  262144 -&gt; 524287   : 12       |                                        |
  524288 -&gt; 1048575  : 298      |                                        |
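
The slot ranges printed above follow from taking log2 of the time difference.
A minimal Python sketch of that mapping, for illustration only (this is not
the BPF program itself, and the function names are made up here):

```python
def log2_bucket(delta):
    """Return the histogram slot for a time difference.

    Slot i covers the range 2**i .. 2**(i + 1) - 1, so the slot is the
    position of the highest set bit. Values below 1 are put into slot 0.
    """
    return max(delta, 1).bit_length() - 1


def bucket_range(slot):
    """Return the (low, high) bounds printed for a given slot."""
    return 2 ** slot, 2 ** (slot + 1) - 1
```

For example, a difference of 3000 lands in slot 11, which is the
2048 to 4095 row of the output.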

All this is based on the trace3 examples written by
Alexei Starovoitov &lt;ast@plumgrid.com&gt;.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: introduce current-&gt;pid, tgid, uid, gid, comm accessors</title>
<updated>2015-06-15T22:53:50Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-13T02:39:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89'/>
<id>urn:sha1:ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89</id>
<content type='text'>
eBPF programs attached to kprobes need to filter based on
current-&gt;pid, uid and other fields, so introduce helper functions:

u64 bpf_get_current_pid_tgid(void)
Return: current-&gt;tgid &lt;&lt; 32 | current-&gt;pid

u64 bpf_get_current_uid_gid(void)
Return: current_gid &lt;&lt; 32 | current_uid

bpf_get_current_comm(char *buf, int size_of_buf)
stores current-&gt;comm into buf
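
As a sketch of how a consumer might split the packed 64-bit return values,
here is the layout expressed in Python. The function names are hypothetical,
and multiplication by 2**32 stands in for the 32-bit left shift:

```python
HIGH_HALF = 2 ** 32  # multiplying by 2**32 equals a left shift by 32 bits


def pack_halves(high, low):
    """Pack two 32-bit values into one 64-bit value (high half first)."""
    return high * HIGH_HALF + low


def unpack_halves(value):
    """Split a packed 64-bit value back into (high, low)."""
    return divmod(value, HIGH_HALF)


# pid_tgid layout: tgid in the upper 32 bits, pid in the lower 32 bits.
packed = pack_halves(1234, 1240)
tgid, pid = unpack_halves(packed)
```

The same split applies to the uid_gid value, with the gid in the upper half
and the uid in the lower half.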

They can be used from the programs attached to TC as well to classify packets
based on current task fields.

Update the tracex2 example to print a histogram of write syscalls for each
process instead of one aggregated over all processes.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: allow programs to write to certain skb fields</title>
<updated>2015-06-07T09:01:33Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-04T17:11:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d691f9e8d4405c334aa10d556e73c8bf44cb0e01'/>
<id>urn:sha1:d691f9e8d4405c334aa10d556e73c8bf44cb0e01</id>
<content type='text'>
Allow programs to read and write the skb-&gt;mark and tc_index fields and
((struct qdisc_skb_cb *)cb)-&gt;data.

mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call(), as can
be seen in the sockex3_kern.c example.

All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
mark, tc_index are writeable from tc_cls_act only.
cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
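
The access rules above can be summarised as a small lookup. This Python
sketch only restates the rules from this message; the table and function
names are illustrative and do not correspond to verifier code:

```python
# Fields each program type may write; every field is readable by both.
WRITABLE_FIELDS = {
    "socket": {"cb"},
    "tc_cls_act": {"mark", "tc_index", "cb"},
}


def can_write(prog_type, field):
    """Return True if the given program type may write the given field."""
    return field in WRITABLE_FIELDS.get(prog_type, set())
```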

Add verifier tests and improve sample code.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: make programs see skb-&gt;data == L2 for ingress and egress</title>
<updated>2015-06-07T09:01:33Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-04T17:11:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3431205e03977aaf32bce6d4b16fb8244b510056'/>
<id>urn:sha1:3431205e03977aaf32bce6d4b16fb8244b510056</id>
<content type='text'>
eBPF programs attached to ingress and egress qdiscs see inconsistent skb-&gt;data.
For ingress the L2 header is already pulled, whereas for egress it is
still present.
This is known to program writers, who are currently forced to use the
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers, it is safe to do
pull/push right around invocation of the program; earlier taps and
later pt-&gt;func() will not be affected.
Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
around run_filter/BPF_PROG_RUN even if skb_shared.

This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
       w/o JIT   w/JIT
before  20.5     23.6 Mpps
after   21.8     26.6 Mpps

Old programs with BPF_LL_OFF will still work as-is.

We can now undo most of the earlier workaround commit:
a166151cbe33 ("bpf: fix bpf helpers to use skb-&gt;mac_header relative offsets")

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pktgen: add benchmark script pktgen_bench_xmit_mode_netif_receive.sh</title>
<updated>2015-05-23T03:59:17Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2015-05-21T10:18:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=05a14d5e17c21817946b6a50140c4a45257ad592'/>
<id>urn:sha1:05a14d5e17c21817946b6a50140c4a45257ad592</id>
<content type='text'>
The script pktgen_bench_xmit_mode_netif_receive.sh is a benchmark
script that can be used for benchmarking part of the network stack.
It can be used for improving performance or catching regressions in
that area.

The script was developed for benchmarking the ingress qdisc path; the
original idea is by Alexei Starovoitov.  The script does not really need
any hardware.  This is achieved via the recently introduced stack inject
feature "xmit_mode netif_receive". See commit 62f64aed622b6 ("pktgen:
introduce xmit_mode '&lt;start_xmit|netif_receive&gt;'").

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pktgen: add sample script pktgen_sample03_burst_single_flow.sh</title>
<updated>2015-05-23T03:59:17Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2015-05-21T10:18:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1d73ba16ad3975a2eb27e47e52fc215c0c5b8d83'/>
<id>urn:sha1:1d73ba16ad3975a2eb27e47e52fc215c0c5b8d83</id>
<content type='text'>
Add the pktgen sample script pktgen_sample03_burst_single_flow.sh
that demonstrates how to achieve maximum performance.

If correctly tuned[1], a single CPU can send small packets at 10Gbit/s
wirespeed[2], which is 14.88Mpps.  The trick is to take advantage of the
"burst" feature introduced in commit 38b2cf2982dc73 ("net: pktgen:
packet bursting via skb-&gt;xmit_more").
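
For reference, the 14.88Mpps figure follows from standard Ethernet framing.
The sketch below assumes minimum-size 64 byte frames plus the usual 8 bytes
of preamble and 12 bytes of inter-frame gap on the wire; the names are
chosen here for illustration:

```python
LINK_BITS_PER_SECOND = 10 * 10 ** 9
MIN_FRAME_BYTES = 64       # smallest Ethernet frame, including FCS
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames


def packets_per_second(link_bps, frame_bytes):
    """Return the maximum packet rate for a link speed and frame size."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


mpps = packets_per_second(LINK_BITS_PER_SECOND, MIN_FRAME_BYTES) / 10 ** 6
```

Each minimum-size frame occupies 84 bytes (672 bits) on the wire, so `mpps`
rounds to 14.88.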

[1] http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
[2] http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pktgen: add sample script pktgen_sample02_multiqueue.sh</title>
<updated>2015-05-23T03:59:17Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2015-05-21T10:17:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=282fb58947e129dea8badf48972ea89d027a76dc'/>
<id>urn:sha1:282fb58947e129dea8badf48972ea89d027a76dc</id>
<content type='text'>
Add the pktgen sample script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.

Specifically notice the option "-t", which specifies how many
kernel threads to activate.  Also notice the flag QUEUE_MAP_CPU,
which causes the SKB TX queue to be mapped to the CPU running the
kernel thread.  For best scalability, people are also encouraged to
map NIC IRQs via /proc/irq/*/smp_affinity to CPU numbers.
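
As a rough sketch of the IRQ pinning mentioned above (not part of the sample
script): smp_affinity takes a hexadecimal CPU bitmask, so the value for a
single CPU can be computed as below. The helper name is hypothetical, and
the sketch assumes CPU numbers below 32 so that the mask fits in one 32-bit
group:

```python
def affinity_mask(cpu):
    """Return the hex bitmask string that selects only the given CPU."""
    if cpu not in range(32):
        raise ValueError("sketch only handles CPU numbers 0..31")
    return format(2 ** cpu, "x")
```

The resulting string is what would be written to the smp_affinity file of
the IRQ in question.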

Usage example with "-t" 4 threads and help:
 ./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4

Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -t : ($THREADS)   threads to start
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -b : ($BURST)     HW level bursting of SKBs
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
should be covered now.

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pktgen: add sample script pktgen_sample01_simple.sh</title>
<updated>2015-05-23T03:59:17Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2015-05-21T10:17:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f09479758be247fef02188a275383ebaddbe291'/>
<id>urn:sha1:6f09479758be247fef02188a275383ebaddbe291</id>
<content type='text'>
Add the first basic pktgen sample script pktgen_sample01_simple.sh,
which demonstrates a simple use of the helper functions.
Removing pktgen.conf-1-1 as that example should be covered now.

The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.

The script makes pktgen send with a single thread and a single interface,
and introduces flow variation via a random UDP source port.

Usage example and help:
 ./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2

Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>pktgen: new pktgen helper functions for samples scripts</title>
<updated>2015-05-23T03:59:16Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2015-05-21T10:17:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b64b0d1e64959691c1f4067a05fdb541d453ed6a'/>
<id>urn:sha1:b64b0d1e64959691c1f4067a05fdb541d453ed6a</id>
<content type='text'>
Preparing for removing existing samples/pktgen/ scripts, and
replacing these with easier to use samples.

This commit provides two helper shell files, that can
be "included" by shell source'ing. Namely "functions.sh"
and "parameters.sh".

The parameters.sh file supports easy and consistent parameter
parsing across the sample scripts.  A usage example is printed on
errors.

The functions.sh file provides three new shell functions for
configuring the different components of pktgen: pg_ctrl(),
pg_thread() and pg_set().  A slightly improved version of the old
pgset() function is also provided for backwards compat.

The new functions correspond to pktgen's different components.
 * pg_ctrl()   control "pgctrl" (/proc/net/pktgen/pgctrl)
 * pg_thread() control the kernel threads and binding to devices
 * pg_set()    control setup of individual devices

These changes are borrowed from:
 https://github.com/netoptimizer/network-testing/tree/master/pktgen

Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
