<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/bpf, branch v4.11</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-04-17T19:51:57Z</updated>
<entry>
<title>bpf: fix checking xdp_adjust_head on tail calls</title>
<updated>2017-04-17T19:51:57Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-04-17T01:12:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c2002f983767ea0a53acbb3e21f771e7a7e2ed28'/>
<id>urn:sha1:c2002f983767ea0a53acbb3e21f771e7a7e2ed28</id>
<content type='text'>
Commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
added the xdp_adjust_head bit to the BPF prog in order to tell drivers
that the program that is to be attached requires support for the XDP
bpf_xdp_adjust_head() helper such that drivers not supporting this
helper can reject the program. There are also drivers that do support
the helper, but need to check for xdp_adjust_head bit in order to move
packet metadata prepended by the firmware away for making headroom.

For these cases, the current check for xdp_adjust_head bit is insufficient
since there can be cases where the program itself does not use the
bpf_xdp_adjust_head() helper, but tail calls into another program that
uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still
set to 0. Since the first program has no control over which program it
calls into, we need to assume that bpf_xdp_adjust_head() helper is used
upon tail calls. Thus, for the very same reasons in cb_access, set the
xdp_adjust_head bit to 1 when the main program uses tail calls.

Fixes: 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix cb access in socket filter programs on tail calls</title>
<updated>2017-04-17T19:51:57Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-04-17T01:12:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6b1bb01bcc5be91452bd3b5bda4300f179fd4053'/>
<id>urn:sha1:6b1bb01bcc5be91452bd3b5bda4300f179fd4053</id>
<content type='text'>
Commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
added a fix for socket filter programs such that in i) AF_PACKET the
20 bytes of skb-&gt;cb[] area gets zeroed before use in order to not leak
data, and ii) socket filter programs attached to TCP/UDP sockets need
to save/restore these 20 bytes since they are also used by protocol
layers at that time.

The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb()
only look at the actual attached program to determine whether to zero
or save/restore the skb-&gt;cb[] parts. There can be cases where the
actual attached program does not access the skb-&gt;cb[], but the program
tail calls into another program which does access this area. In such
a case, the zero or save/restore is currently not performed.

Since the programs we tail call into are unknown at verification time
and can dynamically change, we need to assume that whenever the attached
program performs a tail call, that later programs could access the
skb-&gt;cb[], and therefore we need to always set cb_access to 1.

Fixes: ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: reference may_access_skb() from __bpf_prog_run()</title>
<updated>2017-04-11T14:54:27Z</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2017-04-11T10:10:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96a94cc5158859943b7e4e72ae69e572815f5413'/>
<id>urn:sha1:96a94cc5158859943b7e4e72ae69e572815f5413</id>
<content type='text'>
It took me quite some time to figure out how this was linked,
so in order to save the next person the effort of finding it
add a comment in __bpf_prog_run() that indicates what exactly
determines that a program can access the ctx == skb.

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, verifier: fix rejection of unaligned access checks for map_value_adj</title>
<updated>2017-04-01T19:36:37Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-03-31T00:24:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=79adffcd6489ef43bda2dfded3d637d7fb4fac80'/>
<id>urn:sha1:79adffcd6489ef43bda2dfded3d637d7fb4fac80</id>
<content type='text'>
Currently, the verifier doesn't reject unaligned access for map_value_adj
register types. Commit 484611357c19 ("bpf: allow access into map value
arrays") added logic to check_ptr_alignment() extending it from PTR_TO_PACKET
to also PTR_TO_MAP_VALUE_ADJ, but for PTR_TO_MAP_VALUE_ADJ no enforcement
is in place, because reg-&gt;id for PTR_TO_MAP_VALUE_ADJ reg types is never
non-zero, meaning, we can cause BPF_H/_W/_DW-based unaligned access for
architectures not supporting efficient unaligned access, and thus worst
case could raise exceptions on some archs that are unable to correct the
unaligned access or perform a different memory access to the actual
requested one and such.

i) Unaligned load with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
   on r0 (map_value_adj):

   0: (bf) r2 = r10
   1: (07) r2 += -8
   2: (7a) *(u64 *)(r2 +0) = 0
   3: (18) r1 = 0x42533a00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+11
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
   7: (61) r1 = *(u32 *)(r0 +0)
   8: (35) if r1 &gt;= 0xb goto pc+9
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=10 R10=fp
   9: (07) r0 += 3
  10: (79) r7 = *(u64 *)(r0 +0)
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R10=fp
  11: (79) r7 = *(u64 *)(r0 +2)
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R1=inv,min_value=0,max_value=10 R7=inv R10=fp
  [...]

ii) Unaligned store with !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
    on r0 (map_value_adj):

   0: (bf) r2 = r10
   1: (07) r2 += -8
   2: (7a) *(u64 *)(r2 +0) = 0
   3: (18) r1 = 0x4df16a00
   5: (85) call bpf_map_lookup_elem#1
   6: (15) if r0 == 0x0 goto pc+19
    R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
   7: (07) r0 += 3
   8: (7a) *(u64 *)(r0 +0) = 42
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
   9: (7a) *(u64 *)(r0 +2) = 43
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
  10: (7a) *(u64 *)(r0 -2) = 44
    R0=map_value_adj(ks=8,vs=48,id=0),min_value=3,max_value=3 R10=fp
  [...]

For the PTR_TO_PACKET type, reg-&gt;id is initially zero when skb-&gt;data
was fetched, it later receives a reg-&gt;id from env-&gt;id_gen generator
once another register with UNKNOWN_VALUE type was added to it via
check_packet_ptr_add(). The purpose of this reg-&gt;id is twofold: i) it
is used in find_good_pkt_pointers() for setting the allowed access
range for regs with PTR_TO_PACKET of same id once verifier matched
on data/data_end tests, and ii) for check_ptr_alignment() to determine
that when not having efficient unaligned access and register with
UNKNOWN_VALUE was added to PTR_TO_PACKET, that we're only allowed
to access the content bytewise due to unknown unalignment. reg-&gt;id
was never intended for PTR_TO_MAP_VALUE{,_ADJ} types and thus is
always zero, the only marking is in PTR_TO_MAP_VALUE_OR_NULL that
was added after 484611357c19 via 57a09bf0a416 ("bpf: Detect identical
PTR_TO_MAP_VALUE_OR_NULL registers"). Above tests will fail for
non-root environment due to prohibited pointer arithmetic.

The fix splits register-type specific checks into their own helper
instead of keeping them combined, so we don't run into a similar
issue in future once we extend check_ptr_alignment() further and
forget to add reg-&gt;type checks for some of the checks.

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, verifier: fix alu ops against map_value{, _adj} register types</title>
<updated>2017-04-01T19:36:37Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-03-31T00:24:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fce366a9dd0ddc47e7ce05611c266e8574a45116'/>
<id>urn:sha1:fce366a9dd0ddc47e7ce05611c266e8574a45116</id>
<content type='text'>
While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:

 i) BPF_AND on r0 (map_value_adj):

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (7a) *(u64 *)(r2 +0) = 0
  3: (18) r1 = 0xbf842a00
  5: (85) call bpf_map_lookup_elem#1
  6: (15) if r0 == 0x0 goto pc+2
   R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  7: (57) r0 &amp;= 8
  8: (7a) *(u64 *)(r0 +0) = 22
   R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
  9: (95) exit

  from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
  9: (95) exit
  processed 10 insns

ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (7a) *(u64 *)(r2 +0) = 0
  3: (18) r1 = 0xc24eee00
  5: (85) call bpf_map_lookup_elem#1
  6: (15) if r0 == 0x0 goto pc+2
   R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  7: (04) (u32) r0 += (u32) 0
  8: (7a) *(u64 *)(r0 +0) = 22
   R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
  9: (95) exit

  from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
  9: (95) exit
  processed 10 insns

Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
all the test cases coming with 484611357c19 ("bpf: allow access
into map value arrays") perform BPF_ADD only on the destination
register that is PTR_TO_MAP_VALUE_ADJ.

Only for UNKNOWN_VALUE register types such operations make sense,
f.e. with unknown memory content fetched initially from a constant
offset from the map value memory into a register. That register is
then later tested against lower / upper bounds, so that the verifier
can then do the tracking of min_value / max_value, and properly
check once that UNKNOWN_VALUE register is added to the destination
register with type PTR_TO_MAP_VALUE{,_ADJ}. This is also what the
original use-case is solving. Note, tracking on what is being
added is done through adjust_reg_min_max_vals() and later access
to the map value enforced with these boundaries and the given offset
from the insn through check_map_access_adj().

Tests will fail for non-root environment due to prohibited pointer
arithmetic, in particular in check_alu_op(), we bail out on the
is_pointer_value() check on the dst_reg (which is false in root
case as we allow for pointer arithmetic via env-&gt;allow_ptr_leaks).

Similarly to PTR_TO_PACKET, one way to fix it is to restrict the
allowed operations on PTR_TO_MAP_VALUE{,_ADJ} registers to 64 bit
mode BPF_ADD. The test_verifier suite runs fine after the patch
and it also rejects mentioned test cases.

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: improve verifier packet range checks</title>
<updated>2017-03-25T03:51:28Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-24T22:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b1977682a3858b5584ffea7cfb7bd863f68db18d'/>
<id>urn:sha1:b1977682a3858b5584ffea7cfb7bd863f68db18d</id>
<content type='text'>
llvm can optimize the 'if (ptr &gt; data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 &gt; data_end)
  return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 &gt; data_end)
  return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.

Reported-by: Huapeng Zhou &lt;hzhou@fb.com&gt;
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix hashmap extra_elems logic</title>
<updated>2017-03-22T21:12:18Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-22T02:05:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c290e60fa2a51806159522331c9ed41252a8fb3'/>
<id>urn:sha1:8c290e60fa2a51806159522331c9ed41252a8fb3</id>
<content type='text'>
In both kmalloc and prealloc mode the bpf_map_update_elem() is using
per-cpu extra_elems to do atomic update when the map is full.
There are two issues with it. The logic can be misused, since it allows
max_entries+num_cpus elements to be present in the map. And alloc_extra_elems()
at map creation time can fail percpu alloc for large map values with a warn:
WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60
illegal size (32824) or align (8) for percpu allocation

The fixes for both of these issues are different for kmalloc and prealloc modes.
For prealloc mode allocate extra num_possible_cpus elements and store
their pointers into extra_elems array instead of actual elements.
Hence we can use these hidden(spare) elements not only when the map is full
but during bpf_map_update_elem() that replaces existing element too.
That also improves performance, since pcpu_freelist_pop/push is avoided.
Unfortunately this approach cannot be used for kmalloc mode which needs
to kfree elements after rcu grace period. Therefore switch it back to normal
kmalloc even when full and old element exists like it was prior to
commit 6c9059817432 ("bpf: pre-allocate hash map elements").

Add tests to check for over max_entries and large map values.

Reported-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: convert htab map to hlist_nulls</title>
<updated>2017-03-09T21:27:17Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-08T04:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4fe8435909fddc97b81472026aa954e06dd192a5'/>
<id>urn:sha1:4fe8435909fddc97b81472026aa954e06dd192a5</id>
<content type='text'>
when all map elements are pre-allocated one cpu can delete and reuse htab_elem
while another cpu is still walking the hlist. In such case the lookup may
miss the element. Convert hlist to hlist_nulls to avoid such scenario.
When bucket lock is taken there is no need to take such precautions,
so only convert map_lookup and map_get_next to nulls.
The race window is extremely small and only reproducible with explicit
udelay() inside lookup_nulls_elem_raw()

Similar to hlist add hlist_nulls_for_each_entry_safe() and
hlist_nulls_entry_safe() helpers.

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry &lt;jonperry@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix struct htab_elem layout</title>
<updated>2017-03-09T21:27:17Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-08T04:00:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9f691549f76d488a0c74397b3e51e943865ea01f'/>
<id>urn:sha1:9f691549f76d488a0c74397b3e51e943865ea01f</id>
<content type='text'>
when htab_elem is removed from the bucket list the htab_elem.hash_node.next
field should not be overridden too early otherwise we have a tiny race window
between lookup and delete.
The bug was discovered by manual code analysis and reproducible
only with explicit udelay() in lookup_elem_raw().

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Reported-by: Jonathan Perry &lt;jonperry@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: add get_next_key callback to LPM map</title>
<updated>2017-03-06T01:55:29Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-05T17:41:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f38837b08d23e66de17d46d030e0d9ac5172ad1f'/>
<id>urn:sha1:f38837b08d23e66de17d46d030e0d9ac5172ad1f</id>
<content type='text'>
map_get_next_key callback is mandatory. Supply dummy handler.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
