<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/bpf, branch v4.10.3</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.10.3</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.10.3'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-02-13T02:52:19Z</updated>
<entry>
<title>bpf: introduce BPF_F_ALLOW_OVERRIDE flag</title>
<updated>2017-02-13T02:52:19Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-02-11T04:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f677633379b4abb3281cdbe7e7006f049305c03'/>
<id>urn:sha1:7f677633379b4abb3281cdbe7e7006f049305c03</id>
<content type='text'>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Daniel Mack &lt;daniel@zonque.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: don't trigger OOM killer under pressure with map alloc</title>
<updated>2017-01-18T22:12:26Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-01-18T14:14:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d407bd25a204bd66b7346dde24bd3d37ef0e0b05'/>
<id>urn:sha1:d407bd25a204bd66b7346dde24bd3d37ef0e0b05</id>
<content type='text'>
This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
that are to be used for map allocations. Using kmalloc() for very large
allocations can cause excessive work within the page allocator, so i) fall
back earlier to vmalloc() when the attempt is considered costly anyway,
and even more importantly ii) don't trigger OOM killer with any of the
allocators.

Since this is based on a user space request, for example, when creating
maps with element pre-allocation, we really want such requests to fail
instead of killing other user space processes.

Also, don't spam the kernel log with warnings should any of the allocations
fail under pressure. Given that, we can make backend selection in
bpf_map_area_alloc() generic, and convert all maps over to use this API
for spots with potentially large allocation requests.

Note, replacing the one kmalloc_array() is fine as overflow checks happen
earlier in htab_map_alloc(), since it must also protect the multiplication
for vmalloc() should kmalloc_array() fail.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: rework prog_digest into prog_tag</title>
<updated>2017-01-16T19:03:31Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-01-13T22:38:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f1f7714ea51c56b7163fb1a5acf39c6a204dd758'/>
<id>urn:sha1:f1f7714ea51c56b7163fb1a5acf39c6a204dd758</id>
<content type='text'>
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via
fdinfo/netlink") was recently discussed, partially due to
admittedly suboptimal name of "prog_digest" in combination
with sha1 hash usage, thus inevitably and rightfully concerns
about its security in terms of collision resistance were
raised with regards to use-cases.

The intended use cases are for debugging resp. introspection
only for providing a stable "tag" over the instruction sequence
that both kernel and user space can calculate independently.
It's not usable at all for making a security relevant decision.
So collisions where two different instruction sequences generate
the same tag can happen, but ideally at a rather low rate. The
"tag" will be dumped in hex and is short enough to introspect
in tracepoints or kallsyms output along with other data such
as stack trace, etc. Thus, this patch performs a rename into
prog_tag and truncates the tag to a short output (64 bits) to
make it obvious it's not collision-free.

Should in future a hash or facility be needed with a security
relevant focus, then we can think about requirements, constraints,
etc that would fit to that situation. For now, rework the exposed
parts for the current use cases as long as nothing has been
released yet. Tested on x86_64 and s390x.

Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: do not use KMALLOC_SHIFT_MAX</title>
<updated>2017-01-11T02:31:54Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-01-11T00:57:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7984c27c2c5cd3298de8afdba3e1bd46f884e934'/>
<id>urn:sha1:7984c27c2c5cd3298de8afdba3e1bd46f884e934</id>
<content type='text'>
Commit 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and
integer overflow") has added checks for the maximum allocateable size.
It (ab)used KMALLOC_SHIFT_MAX for that purpose.

While this is not incorrect it is not very clean because we already have
KMALLOC_MAX_SIZE for this very reason so let's change both checks to use
KMALLOC_MAX_SIZE instead.

The original motivation for using KMALLOC_SHIFT_MAX was to work around
an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
will fit into MAX_ORDER".

Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix mark_reg_unknown_value for spilled regs on map value marking</title>
<updated>2016-12-18T02:27:44Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6760bf2ddde8ad64f8205a651223a93de3a35494'/>
<id>urn:sha1:6760bf2ddde8ad64f8205a651223a93de3a35494</id>
<content type='text'>
Martin reported a verifier issue that hit the BUG_ON() for his
test case in the mark_reg_unknown_value() function:

  [  202.861380] kernel BUG at kernel/bpf/verifier.c:467!
  [...]
  [  203.291109] Call Trace:
  [  203.296501]  [&lt;ffffffff811364d5&gt;] mark_map_reg+0x45/0x50
  [  203.308225]  [&lt;ffffffff81136558&gt;] mark_map_regs+0x78/0x90
  [  203.320140]  [&lt;ffffffff8113938d&gt;] do_check+0x226d/0x2c90
  [  203.331865]  [&lt;ffffffff8113a6ab&gt;] bpf_check+0x48b/0x780
  [  203.343403]  [&lt;ffffffff81134c8e&gt;] bpf_prog_load+0x27e/0x440
  [  203.355705]  [&lt;ffffffff8118a38f&gt;] ? handle_mm_fault+0x11af/0x1230
  [  203.369158]  [&lt;ffffffff812d8188&gt;] ? security_capable+0x48/0x60
  [  203.382035]  [&lt;ffffffff811351a4&gt;] SyS_bpf+0x124/0x960
  [  203.393185]  [&lt;ffffffff810515f6&gt;] ? __do_page_fault+0x276/0x490
  [  203.406258]  [&lt;ffffffff816db320&gt;] entry_SYSCALL_64_fastpath+0x13/0x94

This issue got uncovered after the fix in a08dd0da5307 ("bpf: fix
regression on verifier pruning wrt map lookups"). The reason why it
wasn't noticed before was, because as mentioned in a08dd0da5307,
mark_map_regs() was doing the id matching incorrectly based on the
uncached regs[regno].id. So, in the first loop, we walked all regs
and as soon as we found regno == i, then this reg's id was cleared
when calling mark_reg_unknown_value() thus that every subsequent
register was probed against id of 0 (which, in combination with the
PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
register state can hold), and therefore wasn't type transitioned such
as in the spilled register case for the second loop.

Now since that got fixed, it turned out that 57a09bf0a416 ("bpf:
Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
mark_reg_unknown_value() incorrectly for the spilled regs, and thus
hitting the BUG_ON() in some cases due to regno &gt;= MAX_BPF_REG.

Although spilled regs have the same type as the non-spilled regs
for the verifier state, that is, struct bpf_reg_state, they are
semantically different from the non-spilled regs. In other words,
there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
in the stack, for example, register R&lt;x&gt; could have been spilled by
the program to stack location X, Y, Z, and in mark_map_regs() we
need to scan these stack slots of type STACK_SPILL for potential
registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
Therefore, depending on the location, the spilled_regs regno can
be a lot higher than just MAX_BPF_REG's value since we operate on
stack instead. The reset in mark_reg_unknown_value() itself is
just fine, only that the BUG_ON() was inappropriate for this. Fix
it by making a __mark_reg_unknown_value() version that can be
called from mark_map_reg() generically; we know for the non-spilled
case that the regno is always &lt; MAX_BPF_REG anyway.

Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Reported-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix overflow in prog accounting</title>
<updated>2016-12-18T02:27:44Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5ccb071e97fbd9ffe623a0d3977cc6d013bee93c'/>
<id>urn:sha1:5ccb071e97fbd9ffe623a0d3977cc6d013bee93c</id>
<content type='text'>
Commit aaac3ba95e4c ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog-&gt;pages.
Unlike map-&gt;pages, prog-&gt;pages are still subject to change when we
need to expand the program through bpf_prog_realloc().

This can for example happen during verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog-&gt;pages is not the same anymore as
its original value that we used to bpf_prog_charge_memlock() on. Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog
is freed eventually. I noticed this that despite having unlimited
memlock, programs suddenly refused to load with EPERM error due to
insufficient memlock.

There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog-&gt;pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) We want accounting
rather to be more accurate instead of further fooling limits, ii) adding
yet another page counter on struct bpf_prog would also be a waste just
for this purpose. We also do want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since only here we expand the program, so we try to account for the
needed delta and should we fail, call-sites check for outcome anyway.
On cBPF to eBPF migrations, we don't grab a reference to the user as
they are charged differently. With that in place, my test case worked
fine.

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: dynamically allocate digest scratch buffer</title>
<updated>2016-12-18T02:27:44Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aafe6ae9cee32df85eb5e8bb6dd1d918e6807b09'/>
<id>urn:sha1:aafe6ae9cee32df85eb5e8bb6dd1d918e6807b09</id>
<content type='text'>
Geert rightfully complained that 7bd509e311f4 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during program creation path, which can be
considered slow-path anyway, lets allocate that dynamically and be not
implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at
the beginning of replace_map_fd_with_map_ptr() and also error handling
stays straight forward.

Reported-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix regression on verifier pruning wrt map lookups</title>
<updated>2016-12-17T15:51:31Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-15T00:30:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a08dd0da5307ba01295c8383923e51e7997c3576'/>
<id>urn:sha1:a08dd0da5307ba01295c8383923e51e7997c3576</id>
<content type='text'>
Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
registers") introduced a regression where existing programs stopped
loading due to reaching the verifier's maximum complexity limit,
whereas prior to this commit they were loading just fine; the affected
program has roughly 2k instructions.

What was found is that state pruning couldn't be performed effectively
anymore due to mismatches of the verifier's register state, in particular
in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per
se, but rather that verifier needs to perform a lot more work for the
same program with regards to involved map lookups.

Since commit 57a09bf0a416 is only about tracking registers with type
PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
until they are promoted through pattern matching with a NULL check to
either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
id becomes irrelevant for the transitioned types.

For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
transferred further into other types that don't make use of it. Among
others, one example is where UNKNOWN_VALUE is set on function call
return with RET_INTEGER return type.

states_equal() will then fall through the memcmp() on register state;
note that the second memcmp() uses offsetofend(), so the id is part of
that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect
pointed already to 57a09bf0a416, where we really reach beyond complexity
limit. What I found was that states_equal() often failed in this
case due to id mismatches in spilled regs with registers in type
PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
a memcmp() on their reg state and don't have any other optimizations
in place, therefore also id was relevant in this case for making a
pruning decision.

We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
For the affected program, it resulted in a ~17 fold reduction of
complexity and let the program load fine again. Selftest suite also
runs fine. The only other place where env-&gt;id_gen is used currently is
through direct packet access, but for these cases id is long living, thus
a different scenario.

Also, the current logic in mark_map_regs() is not fully correct when
marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
it's id is reset and any subsequent registers that hold the original id
and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
anymore, since mark_map_reg() reuses the uncached regs[regno].id that
was just overridden. Note, we don't need to cache it outside of
mark_map_regs(), since it's called once on this_branch and the other
time on other_branch, which are both two independent verifier states.
A test case for this is added here, too.

Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: xdp: Allow head adjustment in XDP prog</title>
<updated>2016-12-08T19:25:13Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-12-07T23:53:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17bedab2723145d17b14084430743549e6943d03'/>
<id>urn:sha1:17bedab2723145d17b14084430743549e6943d03</id>
<content type='text'>
This patch allows XDP prog to extend/remove the packet
data at the head (like adding or removing header).  It is
done by adding a new XDP helper bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect
that XDP prog does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP prog requires
bpf_xdp_adjust_head() support.  The driver can then decide
to error out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.r.fastabend@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix state equivalence</title>
<updated>2016-12-08T18:31:11Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2016-12-07T18:57:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2a4dd37f6b41fbcad76efbf63124eb3126c66fe'/>
<id>urn:sha1:d2a4dd37f6b41fbcad76efbf63124eb3126c66fe</id>
<content type='text'>
Commmits 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
and 484611357c19 ("bpf: allow access into map value arrays") by themselves
are correct, but in combination they make state equivalence ignore 'id' field
of the register state which can lead to accepting invalid program.

Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
