<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/bpf, branch v4.19.100</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.100</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.100'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-01-27T13:51:20Z</updated>
<entry>
<title>bpf, offload: Unlock on error in bpf_offload_dev_create()</title>
<updated>2020-01-27T13:51:20Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2019-11-04T09:15:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4622676d8fe988981982a67550bfa16e18dd0d69'/>
<id>urn:sha1:4622676d8fe988981982a67550bfa16e18dd0d69</id>
<content type='text'>
[ Upstream commit d0fbb51dfaa612f960519b798387be436e8f83c5 ]

We need to drop the bpf_devs_lock on error before returning.

Fixes: 9fd7c5559165 ("bpf: offload: aggregate offloads per-device")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Link: https://lore.kernel.org/bpf/20191104091536.GB31509@mwanda
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Add missed newline in verifier verbose log</title>
<updated>2020-01-27T13:50:37Z</updated>
<author>
<name>Andrey Ignatov</name>
<email>rdna@fb.com</email>
</author>
<published>2019-04-04T06:22:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=462c72919bcc533a2d82c6f5f0e36a7c483b2580'/>
<id>urn:sha1:462c72919bcc533a2d82c6f5f0e36a7c483b2580</id>
<content type='text'>
[ Upstream commit 1fbd20f8b77b366ea4aeb92ade72daa7f36a7e3b ]

check_stack_access() that prints verbose log is used in
adjust_ptr_min_max_vals() that prints its own verbose log and now they
stick together, e.g.:

  variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16
  size=1R2 stack pointer arithmetic goes out of range, prohibited for
  !root

Add missing newline so that log is more readable:
  variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1
  R2 stack pointer arithmetic goes out of range, prohibited for !root

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Andrey Ignatov &lt;rdna@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix incorrect verifier simulation of ARSH under ALU32</title>
<updated>2020-01-23T07:21:32Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-01-15T20:47:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=042a3a6d938481da53f04299bbfd43401c42d41b'/>
<id>urn:sha1:042a3a6d938481da53f04299bbfd43401c42d41b</id>
<content type='text'>
commit 0af2ffc93a4b50948f9dad2786b7f1bd253bf0b9 upstream.

Anatoly has been fuzzing with kBdysch harness and reported a hang in one
of the outcomes:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (85) call bpf_get_socket_cookie#46
  1: R0_w=invP(id=0) R10=fp0
  1: (57) r0 &amp;= 808464432
  2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  2: (14) w0 -= 810299440
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  3: (c4) w0 s&gt;&gt;= 1
  4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  4: (76) if w0 s&gt;= 0x30303030 goto pc+216
  221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  221: (95) exit
  processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

  # ./bpftool p d x i 12
  0: (85) call bpf_get_socket_cookie#7800896
  1: (bf) r6 = r0
  2: (57) r6 &amp;= 808464432
  3: (14) w6 -= 810299440
  4: (c4) w6 s&gt;&gt;= 1
  5: (76) if w6 s&gt;= 0x30303030 goto pc+216
  6: (05) goto pc-1
  7: (05) goto pc-1
  8: (05) goto pc-1
  [...]
  220: (05) goto pc-1
  221: (05) goto pc-1
  222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix
precision tracking for unbounded scalars"), that is, the fall-through
branch in the instruction 5 is considered to be never taken given the
conclusion from the min/max bounds tracking in w6, and therefore the
dead-code sanitation rewrites it as goto pc-1. However, real-life input
disagrees with verification analysis since a soft-lockup was observed.

The bug sits in the analysis of the ARSH. The definition is that we shift
the target register value right by K bits through shifting in copies of
its sign bit. In adjust_scalar_min_max_vals(), we do first coerce the
register into 32 bit mode, same happens after simulating the operation.
However, for the case of simulating the actual ARSH, we don't take the
mode into account and act as if it's always 64 bit, but location of sign
bit is different:

  dst_reg-&gt;smin_value &gt;&gt;= umin_val;
  dst_reg-&gt;smax_value &gt;&gt;= umin_val;
  dst_reg-&gt;var_off = tnum_arshift(dst_reg-&gt;var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would
for example return 0xffff. With the above ARSH simulation, we'd see the
following results:

  [...]
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (57) r0 &amp;= 808464432
    -&gt; R0_runtime = 0x3030
  3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  3: (14) w0 -= 810299440
    -&gt; R0_runtime = 0xcfb40000
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
                              (0xffffffff)
  4: (c4) w0 s&gt;&gt;= 1
    -&gt; R0_runtime = 0xe7da0000
  5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
                              (0x67c00000)           (0x7ffbfff8)
  [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011
0100 0000 0000 0000 0000', the result after the shift has 0xe7da0000 that
is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly
retained in 32 bit mode. In insn4, the umax was 0xffffffff, and changed into
0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000'
and means here that the simulation didn't retain the sign bit. With above
logic, the updates happen on the 64 bit min/max bounds and given we coerced
the register, the sign bits of the bounds are cleared as well, meaning, we
need to force the simulation into s32 space for 32 bit alu mode.

Verification after the fix below. We're first analyzing the fall-through branch
on 32 bit signed &gt;= test eventually leading to rejection of the program in this
specific case:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r2 = 808464432
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (bf) r6 = r0
  3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
  3: (57) r6 &amp;= 808464432
  4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  4: (14) w6 -= 810299440
  5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  5: (c4) w6 s&gt;&gt;= 1
  6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
                                              (0x67c00000)          (0xfffbfff8)
  6: (76) if w6 s&gt;= 0x30303030 goto pc+216
  7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
  7: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko &lt;anatoly.trosinenko@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200115204733.16648-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf: Fix passing modified ctx to ld/abs/ind instruction</title>
<updated>2020-01-12T11:17:04Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2020-01-06T21:51:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f70280ee8937f1e86bca940fb64b394efe74bb52'/>
<id>urn:sha1:f70280ee8937f1e86bca940fb64b394efe74bb52</id>
<content type='text'>
commit 6d4f151acf9a4f6fab09b615f246c717ddedcf0c upstream.

Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:

  [...]
  [   77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
  [   77.361119]
  [   77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
  [   77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  [   77.362984] Call Trace:
  [   77.363249]  dump_stack+0x97/0xe0
  [   77.363603]  print_address_description.constprop.0+0x1d/0x220
  [   77.364251]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365030]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.365860]  __kasan_report.cold+0x37/0x7b
  [   77.366365]  ? bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.366940]  kasan_report+0xe/0x20
  [   77.367295]  bpf_skb_load_helper_8_no_cache+0x71/0x130
  [   77.367821]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.368278]  ? mark_lock+0xa3/0x9b0
  [   77.368641]  ? kvm_sched_clock_read+0x14/0x30
  [   77.369096]  ? sched_clock+0x5/0x10
  [   77.369460]  ? sched_clock_cpu+0x18/0x110
  [   77.369876]  ? bpf_skb_load_helper_8+0xf0/0xf0
  [   77.370330]  ___bpf_prog_run+0x16c0/0x28f0
  [   77.370755]  __bpf_prog_run32+0x83/0xc0
  [   77.371153]  ? __bpf_prog_run64+0xc0/0xc0
  [   77.371568]  ? match_held_lock+0x1b/0x230
  [   77.371984]  ? rcu_read_lock_held+0xa1/0xb0
  [   77.372416]  ? rcu_is_watching+0x34/0x50
  [   77.372826]  sk_filter_trim_cap+0x17c/0x4d0
  [   77.373259]  ? sock_kzfree_s+0x40/0x40
  [   77.373648]  ? __get_filter+0x150/0x150
  [   77.374059]  ? skb_copy_datagram_from_iter+0x80/0x280
  [   77.374581]  ? do_raw_spin_unlock+0xa5/0x140
  [   77.375025]  unix_dgram_sendmsg+0x33a/0xa70
  [   77.375459]  ? do_raw_spin_lock+0x1d0/0x1d0
  [   77.375893]  ? unix_peer_get+0xa0/0xa0
  [   77.376287]  ? __fget_light+0xa4/0xf0
  [   77.376670]  __sys_sendto+0x265/0x280
  [   77.377056]  ? __ia32_sys_getpeername+0x50/0x50
  [   77.377523]  ? lock_downgrade+0x350/0x350
  [   77.377940]  ? __sys_setsockopt+0x2a6/0x2c0
  [   77.378374]  ? sock_read_iter+0x240/0x240
  [   77.378789]  ? __sys_socketpair+0x22a/0x300
  [   77.379221]  ? __ia32_sys_socket+0x50/0x50
  [   77.379649]  ? mark_held_locks+0x1d/0x90
  [   77.380059]  ? trace_hardirqs_on_thunk+0x1a/0x1c
  [   77.380536]  __x64_sys_sendto+0x74/0x90
  [   77.380938]  do_syscall_64+0x68/0x2a0
  [   77.381324]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [   77.381878] RIP: 0033:0x44c070
  [...]

After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b50c ("bpf/verifier: rework value tracking").

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko &lt;anatoly.trosinenko@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf/stackmap: Fix deadlock with rq_lock in bpf_get_stack()</title>
<updated>2019-12-31T15:35:20Z</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-10-14T17:12:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a7f4da875c16f3b8bef0d9ec67528111045bfcd8'/>
<id>urn:sha1:a7f4da875c16f3b8bef0d9ec67528111045bfcd8</id>
<content type='text'>
[ Upstream commit eac9153f2b584c702cea02c1f1a57d85aa9aea42 ]

bpf stackmap with build-id lookup (BPF_F_STACK_BUILD_ID) can trigger A-A
deadlock on rq_lock():

rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[...]
Call Trace:
 try_to_wake_up+0x1ad/0x590
 wake_up_q+0x54/0x80
 rwsem_wake+0x8a/0xb0
 bpf_get_stack+0x13c/0x150
 bpf_prog_fbdaf42eded9fe46_on_event+0x5e3/0x1000
 bpf_overflow_handler+0x60/0x100
 __perf_event_overflow+0x4f/0xf0
 perf_swevent_overflow+0x99/0xc0
 ___perf_sw_event+0xe7/0x120
 __schedule+0x47d/0x620
 schedule+0x29/0x90
 futex_wait_queue_me+0xb9/0x110
 futex_wait+0x139/0x230
 do_futex+0x2ac/0xa50
 __x64_sys_futex+0x13c/0x180
 do_syscall_64+0x42/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This can be reproduced by:
1. Start a multi-thread program that does parallel mmap() and malloc();
2. taskset the program to 2 CPUs;
3. Attach bpf program to trace_sched_switch and gather stackmap with
   build-id, e.g. with trace.py from bcc tools:
   trace.py -U -p &lt;pid&gt; -s &lt;some-bin,some-lib&gt; t:sched:sched_switch

A sample reproducer is attached at the end.

This could also trigger deadlock with other locks that are nested with
rq_lock.

Fix this by checking whether irqs are disabled. Since rq_lock and all
other nested locks are irq safe, it is safe to do up_read() when irqs are
not disable. If the irqs are disabled, postpone up_read() in irq_work.

Fixes: 615755a77b24 ("bpf: extend stackmap to save binary_build_id+offset instead of address")
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/bpf/20191014171223.357174-1-songliubraving@fb.com

Reproducer:
============================ 8&lt; ============================

char *filename;

void *worker(void *p)
{
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd &lt; 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&amp;ts, NULL);
        }
        close(fd);
        return NULL;
}

int main(int argc, char *argv[])
{
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc &lt; 2)
                return 0;

        filename = argv[1];

        for (i = 0; i &lt; THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i &lt; THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
}
============================ 8&lt; ============================

Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: btf: check name validity for various types</title>
<updated>2019-12-13T07:52:09Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-11-27T21:23:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=12c49ac4cfa935ba76cde8f717c49d422e05e7bb'/>
<id>urn:sha1:12c49ac4cfa935ba76cde8f717c49d422e05e7bb</id>
<content type='text'>
[ Upstream commit eb04bbb608e683f8fd3ef7f716e2fa32dd90861f ]

This patch added name checking for the following types:
 . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
   BTF_KIND_CONST, BTF_KIND_RESTRICT:
     the name must be null
 . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
     is either null or a valid identifier
 . BTF_KIND_ENUM: the enum type name is either null or a valid
     identifier; the enumerator name must be a valid identifier.
 . BTF_KIND_FWD: the name must be a valid identifier
 . BTF_KIND_TYPEDEF: the name must be a valid identifier

For those places a valid name is required, the name must be
a valid C identifier. This can be relaxed later if we found
use cases for a different (non-C) frontend.

Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: btf: implement btf_name_valid_identifier()</title>
<updated>2019-12-13T07:52:09Z</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2018-11-27T21:23:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2f3e380d494c2126f7a9520ceeccbff00ab3e82f'/>
<id>urn:sha1:2f3e380d494c2126f7a9520ceeccbff00ab3e82f</id>
<content type='text'>
[ Upstream commit cdbb096adddb3f42584cecb5ec2e07c26815b71f ]

Function btf_name_valid_identifier() have been implemented in
bpf-next commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and
BTF_KIND_FUNC_PROTO"). Backport this function so later patch
can use it.

Fixes: 69b693f0aefa ("bpf: btf: Introduce BPF Type Format (BTF)")
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>xdp: fix cpumap redirect SKB creation bug</title>
<updated>2019-12-05T08:21:24Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>brouer@redhat.com</email>
</author>
<published>2019-03-29T09:18:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7d8677ff106f9410a8b71e4a4de06fe11fec8501'/>
<id>urn:sha1:7d8677ff106f9410a8b71e4a4de06fe11fec8501</id>
<content type='text'>
[ Upstream commit 676e4a6fe703f2dae699ee9d56f14516f9ada4ea ]

We want to avoid leaking pointer info from xdp_frame (that is placed in
top of frame) like commit 6dfb970d3dbd ("xdp: avoid leaking info stored in
frame data on page reuse"), and followup commit 97e19cce05e5 ("bpf:
reserve xdp_frame size in xdp headroom") that reserve this headroom.

These changes also affected how cpumap constructed SKBs, as xdpf-&gt;headroom
size changed, the skb data starting point were in-effect shifted with 32
bytes (sizeof xdp_frame). This was still okay, as the cpumap frame_size
calculation also included xdpf-&gt;headroom which were reduced by same amount.

A bug was introduced in commit 77ea5f4cbe20 ("bpf/cpumap: make sure
frame_size for build_skb is aligned if headroom isn't"), where the
xdpf-&gt;headroom became part of the SKB_DATA_ALIGN rounding up. This
round-up to find the frame_size is in principle still correct as it does
not exceed the 2048 bytes frame_size (which is max for ixgbe and i40e),
but the 32 bytes offset of pkt_data_start puts this over the 2048 bytes
limit. This cause skb_shared_info to spill into next frame. It is a little
hard to trigger, as the SKB need to use above 15 skb_shinfo-&gt;frags[] as
far as I calculate. This does happen in practise for TCP streams when
skb_try_coalesce() kicks in.

KASAN can be used to detect these wrong memory accesses, I've seen:
 BUG: KASAN: use-after-free in skb_try_coalesce+0x3cb/0x760
 BUG: KASAN: wild-memory-access in skb_release_data+0xe2/0x250

Driver veth also construct a SKB from xdp_frame in this way, but is not
affected, as it doesn't reserve/deduct the room (used by xdp_frame) from
the SKB headroom. Instead is clears the pointers via xdp_scrub_frame(),
and allows SKB to use this area.

The fix in this patch is to do like veth and instead allow SKB to (re)use
the area occupied by xdp_frame, by clearing via xdp_scrub_frame().  (This
does kill the idea of the SKB being able to access (mem) info from this
area, but I guess it was a bad idea anyhow, and it was already killed by
the veth changes.)

Fixes: 77ea5f4cbe20 ("bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't")
Signed-off-by: Jesper Dangaard Brouer &lt;brouer@redhat.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: drop refcount if bpf_map_new_fd() fails in map_create()</title>
<updated>2019-12-05T08:21:15Z</updated>
<author>
<name>Peng Sun</name>
<email>sironhide0null@gmail.com</email>
</author>
<published>2019-02-27T14:36:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a5df073d06cfe457ff04ffd3d7172617b06f527'/>
<id>urn:sha1:1a5df073d06cfe457ff04ffd3d7172617b06f527</id>
<content type='text'>
[ Upstream commit 352d20d611414715353ee65fc206ee57ab1a6984 ]

In bpf/syscall.c, map_create() first set map-&gt;usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.

Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun &lt;sironhide0null@gmail.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()</title>
<updated>2019-12-05T08:21:13Z</updated>
<author>
<name>Peng Sun</name>
<email>sironhide0null@gmail.com</email>
</author>
<published>2019-02-26T14:15:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=944df7b4b9dc1a75ec13e48505402bcc96906a89'/>
<id>urn:sha1:944df7b4b9dc1a75ec13e48505402bcc96906a89</id>
<content type='text'>
[ Upstream commit 781e62823cb81b972dc8652c1827205cda2ac9ac ]

In bpf/syscall.c, bpf_map_get_fd_by_id() use bpf_map_inc_not_zero()
to increase the refcount, both map-&gt;refcnt and map-&gt;usercnt. Then, if
bpf_map_new_fd() fails, should handle map-&gt;usercnt too.

Fixes: bd5f5f4ecb78 ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun &lt;sironhide0null@gmail.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
