<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/bpf/arraymap.c, branch v4.4.166</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.166</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.166'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-11-10T15:41:37Z</updated>
<entry>
<title>bpf: generally move prog destruction to RCU deferral</title>
<updated>2018-11-10T15:41:37Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-06-30T15:24:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e25dc63aa366fd0f61d1d9ba67b66f5d75fc4372'/>
<id>urn:sha1:e25dc63aa366fd0f61d1d9ba67b66f5d75fc4372</id>
<content type='text'>
[ Upstream commit 1aacde3d22c42281236155c1ef6d7a5aa32a826b ]

Jann Horn reported the following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race; to quote his
event timeline:

 - Set up a process with threads T1, T2 and T3
 - Let T1 set up a socket filter F1 that invokes another filter F2
   through a BPF map [tail call]
 - Let T1 trigger the socket filter via a unix domain socket write,
   don't wait for completion
 - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
 - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
 - Let T3 close the file descriptor for F2, dropping the reference
   count of F2 to 2
 - At this point, T1 should have looked up F2 from the map, but not
   finished executing it
 - Let T3 remove F2 from the BPF map, dropping the reference count of
   F2 to 1
 - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
   the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
   via schedule_work()
 - At this point, the BPF program could be freed
 - BPF execution is still running in a freed BPF program

While at PERF_EVENT_IOC_SET_BPF time it is only guaranteed that the perf
event fd we are doing the syscall on does not disappear from underneath
us for the whole syscall time, the same may not hold for the bpf fd
passed as an argument once we did the put. It needs to be a valid fd
pointing to a BPF program at the time of the call for the bpf_prog_get()
to succeed, and while T2 gets preempted, F2 must have dropped its
reference count to 1 on the other CPU. The fput() from the close() in T3
should also add additional delay to the reference drop via
exit_task_work() when bpf_prog_release() gets called, as well as to the
scheduling of bpf_prog_free_deferred().

That said, it nevertheless makes sense to generally move BPF prog
destruction after an RCU grace period, to guarantee that the scenario
above, but also others as recently fixed in ceb56070359b ("bpf, perf:
delay release of BPF prog after grace period") with regard to tail
calls, cannot happen. Integrating bpf_prog_free_deferred() directly into
the RCU callback is not possible since the invocation might happen from
either softirq or process context, so we are not permitted to block. All
bpf_prog_put() invocations from the eBPF side (note, cBPF -&gt; eBPF progs
do not use this for their destruction) have been reviewed and look fine
with call_rcu().
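
A minimal sketch of the resulting put path (helper and field names
approximated from the v4.4 sources, not quoted from the diff):

  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
  {
    struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);

    /* Runs after a grace period; bpf_prog_free() internally defers
     * the actual (blocking) free via schedule_work(), so nothing
     * here needs to block in softirq context.
     */
    free_used_maps(aux);
    bpf_prog_free(aux-&gt;prog);
  }

  void bpf_prog_put(struct bpf_prog *prog)
  {
    if (atomic_dec_and_test(&amp;prog-&gt;aux-&gt;refcnt))
      call_rcu(&amp;prog-&gt;aux-&gt;rcu, __bpf_prog_put_rcu);
  }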

Since we do not know at attach time whether the program is already part
of a tail call map, we need to use the RCU variant unconditionally.
This will not put severely more stress on the RCU callback queue,
however: situations with the above bpf_prog_get() and bpf_prog_put()
combo in practice normally will not lead to releases, and even if they
did, enough effort/cycles have to be put into loading a BPF program
into the kernel already.

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: map_get_next_key to return first key on NULL</title>
<updated>2018-05-16T08:06:46Z</updated>
<author>
<name>Teng Qin</name>
<email>qinteng@fb.com</email>
</author>
<published>2017-04-25T02:00:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ea7c24c78551c8b3e6a7e9824e5ad8ba6224f5fe'/>
<id>urn:sha1:ea7c24c78551c8b3e6a7e9824e5ad8ba6224f5fe</id>
<content type='text'>
commit 8fe45924387be6b5c1be59a7eb330790c61d5d10 upstream.

When iterating through a map, we need to find a key that does not exist
in the map, so that map_get_next_key will give us the first key of the
map. This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer parameter is NULL, as sketched in the example below.
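
A minimal sketch of a full iteration from user space under this
semantic (raw bpf(2) syscall; u32 keys and a valid map_fd assumed):

  #include &lt;linux/bpf.h&gt;
  #include &lt;string.h&gt;
  #include &lt;sys/syscall.h&gt;
  #include &lt;unistd.h&gt;

  static void walk_keys(int map_fd)
  {
    union bpf_attr attr;
    __u32 key, next_key;
    void *kp = NULL; /* NULL key: kernel returns the first key */

    for (;;) {
      memset(&amp;attr, 0, sizeof(attr));
      attr.map_fd = map_fd;
      attr.key = (__u64)(unsigned long)kp;
      attr.next_key = (__u64)(unsigned long)&amp;next_key;
      if (syscall(__NR_bpf, BPF_MAP_GET_NEXT_KEY, &amp;attr, sizeof(attr)))
        break; /* ENOENT after the last key */
      key = next_key;
      kp = &amp;key; /* continue from the key just returned */
    }
  }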

Signed-off-by: Teng Qin &lt;qinteng@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Chenbo Feng &lt;fengc@google.com&gt;
Cc: Lorenzo Colitti &lt;lorenzo@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf, array: fix overflow in max_entries and undefined behavior in index_mask</title>
<updated>2018-01-17T08:35:31Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-01-12T16:58:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=095b0ba360ff9a86c592c1293602d42a9297e047'/>
<id>urn:sha1:095b0ba360ff9a86c592c1293602d42a9297e047</id>
<content type='text'>
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.

syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.

However, this will generate an index_mask of 0xffffffff, and therefore
the + 1 will let this overflow into a new max_entries of 0. This will
pass allocation and friends, but later on map access we still enforce
the original attr-&gt;max_entries value of 0xfffffffd, therefore
triggering a GPF all over the place. Thus bail out on overflow in such
a case.

Moreover, on 32 bit archs roundup_pow_of_two() cannot be used either,
since fls_long(max_entries - 1) can return 32, and 1UL &lt;&lt; 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable, as
sketched below.
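
A sketch of that computation (approximated from the upstream fix):

  u64 mask64;

  /* Round up in u64 space to avoid 1UL &lt;&lt; 32 on 32 bit archs. */
  mask64 = fls_long(max_entries - 1);
  mask64 = 1ULL &lt;&lt; mask64;
  mask64 -= 1;

  index_mask = mask64;
  if (unpriv) {
    /* Round up array size to the nearest power of 2. */
    max_entries = index_mask + 1;
    /* Bail out on overflow (0xffffffff + 1 wraps to 0). */
    if (max_entries &lt; attr-&gt;max_entries)
      return ERR_PTR(-E2BIG);
  }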

This fixes all the issues triggered by syzkaller's reproducers.

Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: prevent out-of-bounds speculation</title>
<updated>2018-01-17T08:35:31Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-01-08T01:33:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9a7fad4c0e215fb1c256fee27c45f9f8bc4364c5'/>
<id>urn:sha1:9a7fad4c0e215fb1c256fee27c45f9f8bc4364c5</id>
<content type='text'>
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data, round up array-based maps and mask the
index after the bounds check, so that a speculated load with an
out-of-bounds index will load either a valid value from the array or
zero from the padded area.
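
As a sketch, the masked lookup on the array map side then looks roughly
like this (field names as in kernel/bpf/arraymap.c):

  static void *array_map_lookup_elem(struct bpf_map *map, void *key)
  {
    struct bpf_array *array = container_of(map, struct bpf_array, map);
    u32 index = *(u32 *)key;

    if (unlikely(index &gt;= array-&gt;map.max_entries))
      return NULL;

    /* Mask so that a speculated access stays inside the rounded-up
     * array, hitting either a valid slot or the zeroed padding.
     */
    return array-&gt;value + array-&gt;elem_size * (index &amp; array-&gt;index_mask);
  }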

Unconditionally mask the index for all array types, even when
max_entries is not rounded to a power of 2 for the root user.
When a map is created by an unprivileged user, generate a sequence of
bpf insns that includes an AND operation to make sure that the JITed
code includes the same 'index &amp; index_mask' operation.

If a prog_array map is created by an unprivileged user, replace
  bpf_tail_call(ctx, map, index);
with
  if (index &gt;= max_entries)
    goto out;
  index &amp;= map-&gt;index_mask;
  bpf_tail_call(ctx, map, index);
out:
(along with the roundup to power of 2) to prevent out-of-bounds
speculation. There is a secondary, redundant 'if (index &gt;= max_entries)'
check in the interpreter and in all JITs, but it can be optimized later
if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2-&gt;v3:
Daniel noticed that the attack can potentially also be crafted via
syscall commands without loading a program, so add masking to those
paths as well.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix allocation warnings in bpf maps and integer overflow</title>
<updated>2015-12-03T04:36:00Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2015-11-30T00:59:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=01b3f52157ff5a47d6d8d796f396a4b34a53c61d'/>
<id>urn:sha1:01b3f52157ff5a47d6d8d796f396a4b34a53c61d</id>
<content type='text'>
For large map-&gt;value_size, user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
 [&lt;     inline     &gt;] __dump_stack lib/dump_stack.c:15
 [&lt;ffffffff82743b56&gt;] dump_stack+0x68/0x92 lib/dump_stack.c:50
 [&lt;ffffffff81244ec9&gt;] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [&lt;ffffffff812450f9&gt;] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [&lt;     inline     &gt;] __alloc_pages_slowpath mm/page_alloc.c:2989
 [&lt;ffffffff81554e95&gt;] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
 [&lt;ffffffff816188fe&gt;] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [&lt;     inline     &gt;] alloc_pages include/linux/gfp.h:451
 [&lt;ffffffff81550706&gt;] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [&lt;ffffffff815a1c89&gt;] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
 [&lt;ffffffff815a1cef&gt;] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
 [&lt;     inline     &gt;] kmalloc_large include/linux/slab.h:390
 [&lt;ffffffff81627784&gt;] __kmalloc+0x234/0x250 mm/slub.c:3525
 [&lt;     inline     &gt;] kmalloc include/linux/slab.h:463
 [&lt;     inline     &gt;] map_update_elem kernel/bpf/syscall.c:288
 [&lt;     inline     &gt;] SYSC_bpf kernel/bpf/syscall.c:744

To avoid kmalloc attempts with order &gt;= MAX_ORDER, which can never
succeed, check that elem-&gt;value_size and the computed elem_size are
within limits for both hash and array type maps.
Also add __GFP_NOWARN to the kmalloc(value_size | elem_size) calls to
avoid OOM warnings. Note that kmalloc(key_size) is highly unlikely to
trigger OOM, since key_size &lt;= 512, so keep those kmalloc()s as-is.

Large value_size can cause integer overflows in the elem_size and
map.pages formulas, so check for that as well, as sketched below.
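
A sketch of the array map side of those checks (bounds approximated
from the upstream patch):

  if (attr-&gt;value_size &gt;= 1 &lt;&lt; (KMALLOC_SHIFT_MAX - 1))
    /* if value_size is bigger, user space won't be able
     * to access the elements anyway
     */
    return ERR_PTR(-E2BIG);

  elem_size = round_up(attr-&gt;value_size, 8);

  /* check round_up into zero and u32 overflow */
  if (elem_size == 0 ||
      attr-&gt;max_entries &gt; (U32_MAX - PAGE_SIZE - sizeof(*array)) / elem_size)
    return ERR_PTR(-ENOMEM);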

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, array: fix heap out-of-bounds access when updating elements</title>
<updated>2015-12-02T02:56:17Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-11-30T12:02:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fbca9d2d35c6ef1b323fae75cc9545005ba25097'/>
<id>urn:sha1:fbca9d2d35c6ef1b323fae75cc9545005ba25097</id>
<content type='text'>
During my own review, but also reported by Dmitry's syzkaller [1], it
has been noticed that we trigger a heap out-of-bounds access on eBPF
array maps when updating elements. This happens with each map whose
map-&gt;value_size (specified at map creation time) is not a multiple of
8 bytes.

In array_map_alloc(), elem_size is round_up(attr-&gt;value_size, 8) and
used to align array map slots for faster access. However, in function
array_map_update_elem(), we update the element as ...

memcpy(array-&gt;value + array-&gt;elem_size * index, value, array-&gt;elem_size);

... where we access 'value' out-of-bounds, since it was allocated on
the syscall side in map_update_elem() as kmalloc(map-&gt;value_size,
GFP_USER) and later filled through copy_from_user(value, uvalue,
map-&gt;value_size). Thus, we can access up to 7 bytes out-of-bounds.

Same could happen from within an eBPF program, where in worst case we
access beyond an eBPF program's designated stack.

Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") hasn't hit an
official release yet, this only affects privileged users.

In case of array_map_lookup_elem(), the verifier prevents eBPF programs
from accessing beyond map-&gt;value_size through check_map_access(). Also
from syscall side map_lookup_elem() only copies map-&gt;value_size back to
user, so nothing could leak.
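
The fix itself is then to bound the copy length by what was actually
allocated, i.e. copy map-&gt;value_size bytes instead of elem_size into
the 8-byte aligned slot:

  memcpy(array-&gt;value + array-&gt;elem_size * index, value, map-&gt;value_size);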

  [1] http://github.com/google/syzkaller

Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix bpf_perf_event_read() helper</title>
<updated>2015-10-27T04:49:26Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-10-23T00:10:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=62544ce8e01c1879d420ba309f7f319d24c0f4e6'/>
<id>urn:sha1:62544ce8e01c1879d420ba309f7f319d24c0f4e6</id>
<content type='text'>
Fix safety checks for bpf_perf_event_read():
- only non-inherited events can be added to the perf_event_array map
  (do this check statically at map insertion time)
- dynamically check that the event is local and !pmu-&gt;count
Otherwise a buggy bpf program can cause a kernel splat. Both checks are
sketched below.
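
A sketch of the two checks (approximated from the patch; the static one
on the map insertion side, the dynamic one inside the helper):

  /* perf_event_fd_array_get_ptr(), at map insertion time: */
  if (attr-&gt;inherit)
    goto err;

  /* bpf_perf_event_read(), at run time: */
  if (event-&gt;oncpu != smp_processor_id() ||
      event-&gt;pmu-&gt;count)
    return -EINVAL;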

Also fix error path after perf_event_attrs()
and remove redundant 'extern'.

Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Tested-by: Wang Nan &lt;wangnan0@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: introduce bpf_perf_event_output() helper</title>
<updated>2015-10-22T13:42:15Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-10-21T03:02:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a43eec304259a6c637f4014a6d4767159b6a3aa3'/>
<id>urn:sha1:a43eec304259a6c637f4014a6d4767159b6a3aa3</id>
<content type='text'>
This helper is used to send raw data from an eBPF program into a
special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
User space needs to perf_event_open() it (either for one or all cpus)
and store the FD into a perf_event_array (similar to the
bpf_perf_event_read() helper) before the eBPF program can send data
into it.

Today the programs triggered by kprobe collect the data and either
store it into maps or print it via bpf_trace_printk(), where the latter
is a debug facility and not suitable for streaming data. This new
helper replaces such bpf_trace_printk() usage and gives programs a
dedicated channel into user space for post-processing of the raw data
collected, as sketched in the example below.
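
A minimal sketch of the call from a kprobe program (map name, section
names and record layout are made up for illustration; helpers as in the
kernel samples):

  struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    .key_size = sizeof(int),
    .value_size = sizeof(u32),
    .max_entries = 32, /* assumed: one slot per possible CPU */
  };

  SEC("kprobe/sys_write")
  int bpf_prog1(struct pt_regs *ctx)
  {
    struct S { u64 pid; u64 cookie; } data = {
      .pid = bpf_get_current_pid_tgid(),
      .cookie = 0x12345678, /* made-up marker value */
    };

    /* index selects the perf_event_array slot, here the current CPU */
    bpf_perf_event_output(ctx, &amp;my_map, bpf_get_smp_processor_id(),
                          &amp;data, sizeof(data));
    return 0;
  }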

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: charge user for creation of BPF maps and programs</title>
<updated>2015-10-13T02:13:36Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-10-08T05:23:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aaac3ba95e4c8b496d22f68bd1bc01cfbf525eca'/>
<id>urn:sha1:aaac3ba95e4c8b496d22f68bd1bc01cfbf525eca</id>
<content type='text'>
Since eBPF programs and maps use kernel memory, consider it 'locked'
memory from the user accounting point of view and charge it against the
RLIMIT_MEMLOCK limit. This limit is typically set to 64 Kbytes by
distros, so almost all bpf+tracing programs would need to increase it,
since they use maps and the kernel charges the maximum map size
upfront. For example, a hash map of 1024 elements will be charged as
64 Kbytes. It's inconvenient for current users and changes the current
behavior for root, but it is probably worth doing to be consistent
between root and non-root.

Similar accounting logic is done by mmap of perf_event.
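
A sketch of the map charging side (helper name and fields approximated
from the patch):

  static int bpf_map_charge_memlock(struct bpf_map *map)
  {
    struct user_struct *user = get_current_user();
    unsigned long memlock_limit = rlimit(RLIMIT_MEMLOCK) &gt;&gt; PAGE_SHIFT;

    atomic_long_add(map-&gt;pages, &amp;user-&gt;locked_vm);
    if (atomic_long_read(&amp;user-&gt;locked_vm) &gt; memlock_limit) {
      atomic_long_sub(map-&gt;pages, &amp;user-&gt;locked_vm);
      free_uid(user);
      return -EPERM;
    }
    map-&gt;user = user;
    return 0;
  }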

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ebpf: include perf_event only where really needed</title>
<updated>2015-10-05T14:04:08Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2015-10-02T16:42:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0cdf5640e4f6940bdbbefee4bb0adb7dffb185ec'/>
<id>urn:sha1:0cdf5640e4f6940bdbbefee4bb0adb7dffb185ec</id>
<content type='text'>
Commit ea317b267e9d ("bpf: Add new bpf map type to store the pointer
to struct perf_event") added perf_event.h to the main eBPF header, so
it gets included for all users. perf_event.h is actually only needed
from the array map side, so let's sanitize this a bit.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Kaixu Xia &lt;xiakaixu@huawei.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
