<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v4.20.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-03-13T21:04:14Z</updated>
<entry>
<title>bpf: Fix syscall's stackmap lookup potential deadlock</title>
<updated>2019-03-13T21:04:14Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2019-01-31T02:12:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1038213107fc64be7d421a8340c45297747ecb94'/>
<id>urn:sha1:1038213107fc64be7d421a8340c45297747ecb94</id>
<content type='text'>
[ Upstream commit 7c4cd051add3d00bbff008a133c936c515eaa8fe ]

The map_lookup_elem used to not acquiring spinlock
in order to optimize the reader.

It was true until commit 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
If that is the case, pcpu_freelist_push() saves this elem for reuse later.
This push requires a spinlock.

If a tracing bpf_prog got run in the middle of the syscall's
map_lookup_elem(stackmap) and this tracing bpf_prog is calling
bpf_get_stackid(stackmap) which also requires the same pcpu_freelist's
spinlock, it may end up with a dead lock situation as reported by
Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/

The situation is the same as the syscall's map_update_elem() which
needs to acquire the pcpu_freelist's spinlock and could race
with tracing bpf_prog.  Hence, this patch fixes it by protecting
bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
to prevent tracing bpf_prog from running.

A later syscall's map_lookup_elem commit f1a2e44a3aec ("bpf: add queue and stack maps")
also acquires a spinlock and races with tracing bpf_prog similarly.
Hence, this patch is forward looking and protects the majority
of the map lookups.  bpf_map_offload_lookup_elem() is the exception
since it is for network bpf_prog only (i.e. never called by tracing
bpf_prog).

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix potential deadlock in bpf_prog_register</title>
<updated>2019-03-13T21:04:14Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-31T02:12:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0795cf1a9315a75f8910760620806934fbd1f2fd'/>
<id>urn:sha1:0795cf1a9315a75f8910760620806934fbd1f2fd</id>
<content type='text'>
[ Upstream commit e16ec34039c701594d55d08a5aa49ee3e1abc821 ]

Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[   13.007000] WARNING: possible circular locking dependency detected
[   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[   13.008124] ------------------------------------------------------
[   13.008624] test_progs/246 is trying to acquire lock:
[   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[   13.009770]
[   13.009770] but task is already holding lock:
[   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.010877]
[   13.010877] which lock already depends on the new lock.
[   13.010877]
[   13.011532]
[   13.011532] the existing dependency chain (in reverse order) is:
[   13.012129]
[   13.012129] -&gt; #4 (bpf_event_mutex){+.+.}:
[   13.012582]        perf_event_query_prog_array+0x9b/0x130
[   13.013016]        _perf_ioctl+0x3aa/0x830
[   13.013354]        perf_ioctl+0x2e/0x50
[   13.013668]        do_vfs_ioctl+0x8f/0x6a0
[   13.014003]        ksys_ioctl+0x70/0x80
[   13.014320]        __x64_sys_ioctl+0x16/0x20
[   13.014668]        do_syscall_64+0x4a/0x180
[   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.015469]
[   13.015469] -&gt; #3 (&amp;cpuctx_mutex){+.+.}:
[   13.015910]        perf_event_init_cpu+0x5a/0x90
[   13.016291]        perf_event_init+0x1b2/0x1de
[   13.016654]        start_kernel+0x2b8/0x42a
[   13.016995]        secondary_startup_64+0xa4/0xb0
[   13.017382]
[   13.017382] -&gt; #2 (pmus_lock){+.+.}:
[   13.017794]        perf_event_init_cpu+0x21/0x90
[   13.018172]        cpuhp_invoke_callback+0xb3/0x960
[   13.018573]        _cpu_up+0xa7/0x140
[   13.018871]        do_cpu_up+0xa4/0xc0
[   13.019178]        smp_init+0xcd/0xd2
[   13.019483]        kernel_init_freeable+0x123/0x24f
[   13.019878]        kernel_init+0xa/0x110
[   13.020201]        ret_from_fork+0x24/0x30
[   13.020541]
[   13.020541] -&gt; #1 (cpu_hotplug_lock.rw_sem){++++}:
[   13.021051]        static_key_slow_inc+0xe/0x20
[   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
[   13.021891]        perf_trace_event_init+0x11f/0x250
[   13.022297]        perf_trace_init+0x6b/0xa0
[   13.022644]        perf_tp_event_init+0x25/0x40
[   13.023011]        perf_try_init_event+0x6b/0x90
[   13.023386]        perf_event_alloc+0x9a8/0xc40
[   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
[   13.024173]        do_syscall_64+0x4a/0x180
[   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.024968]
[   13.024968] -&gt; #0 (tracepoints_mutex){+.+.}:
[   13.025434]        __mutex_lock+0x86/0x970
[   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
[   13.026215]        bpf_probe_register+0x40/0x60
[   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.027042]        __do_sys_bpf+0x94f/0x1a90
[   13.027389]        do_syscall_64+0x4a/0x180
[   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.028171]
[   13.028171] other info that might help us debug this:
[   13.028171]
[   13.028807] Chain exists of:
[   13.028807]   tracepoints_mutex --&gt; &amp;cpuctx_mutex --&gt; bpf_event_mutex
[   13.028807]
[   13.029666]  Possible unsafe locking scenario:
[   13.029666]
[   13.030140]        CPU0                    CPU1
[   13.030510]        ----                    ----
[   13.030875]   lock(bpf_event_mutex);
[   13.031166]                                lock(&amp;cpuctx_mutex);
[   13.031645]                                lock(bpf_event_mutex);
[   13.032135]   lock(tracepoints_mutex);
[   13.032441]
[   13.032441]  *** DEADLOCK ***
[   13.032441]
[   13.032911] 1 lock held by test_progs/246:
[   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.033909]
[   13.033909] stack backtrace:
[   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   13.035657] Call Trace:
[   13.035859]  dump_stack+0x5f/0x8b
[   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
[   13.036526]  __lock_acquire+0x1158/0x1350
[   13.036852]  ? lock_acquire+0x98/0x190
[   13.037154]  lock_acquire+0x98/0x190
[   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.037876]  __mutex_lock+0x86/0x970
[   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.039028]  ? __mutex_lock+0x86/0x970
[   13.039337]  ? __mutex_lock+0x24a/0x970
[   13.039649]  ? bpf_probe_register+0x1d/0x60
[   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
[   13.041325]  bpf_probe_register+0x40/0x60
[   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.042068]  ? __might_fault+0x3e/0x90
[   13.042374]  __do_sys_bpf+0x94f/0x1a90
[   13.042678]  do_syscall_64+0x4a/0x180
[   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.043382] RIP: 0033:0x7f23b10a07f9
[   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.

Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix lockdep false positive in percpu_freelist</title>
<updated>2019-03-13T21:04:13Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-31T02:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad9c8ec58ccb52cb9dde782a4ca04e0474073730'/>
<id>urn:sha1:ad9c8ec58ccb52cb9dde782a4ca04e0474073730</id>
<content type='text'>
[ Upstream commit a89fac57b5d080771efd4d71feaae19877cf68f0 ]

Lockdep warns about false positive:
[   12.492084] 00000000e6b28347 (&amp;head-&gt;lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[   12.493275]  (&amp;rq-&gt;lock){-.-.}
[   12.493276]
[   12.493276]
[   12.493276] and interrupts could create inverse lock ordering between them.
[   12.493276]
[   12.494435]
[   12.494435] other info that might help us debug this:
[   12.494979]  Possible interrupt unsafe locking scenario:
[   12.494979]
[   12.495518]        CPU0                    CPU1
[   12.495879]        ----                    ----
[   12.496243]   lock(&amp;head-&gt;lock);
[   12.496502]                                local_irq_disable();
[   12.496969]                                lock(&amp;rq-&gt;lock);
[   12.497431]                                lock(&amp;head-&gt;lock);
[   12.497890]   &lt;Interrupt&gt;
[   12.498104]     lock(&amp;rq-&gt;lock);
[   12.498368]
[   12.498368]  *** DEADLOCK ***
[   12.498368]
[   12.498837] 1 lock held by dd/276:
[   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[   12.499747]
[   12.499747] the shortest dependencies between 2nd lock and 1st lock:
[   12.500389]  -&gt; (&amp;rq-&gt;lock){-.-.} {
[   12.500669]     IN-HARDIRQ-W at:
[   12.500934]                       _raw_spin_lock+0x2f/0x40
[   12.501373]                       scheduler_tick+0x4c/0xf0
[   12.501812]                       update_process_times+0x40/0x50
[   12.502294]                       tick_periodic+0x27/0xb0
[   12.502723]                       tick_handle_periodic+0x1f/0x60
[   12.503203]                       timer_interrupt+0x11/0x20
[   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
[   12.504167]                       handle_irq_event_percpu+0x20/0x50
[   12.504674]                       handle_irq_event+0x37/0x60
[   12.505139]                       handle_level_irq+0xa7/0x120
[   12.505601]                       handle_irq+0xa1/0x150
[   12.506018]                       do_IRQ+0x77/0x140
[   12.506411]                       ret_from_intr+0x0/0x1d
[   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
[   12.507362]                       __setup_irq+0x481/0x730
[   12.507789]                       setup_irq+0x49/0x80
[   12.508195]                       hpet_time_init+0x21/0x32
[   12.508644]                       x86_late_time_init+0xb/0x16
[   12.509106]                       start_kernel+0x390/0x42a
[   12.509554]                       secondary_startup_64+0xa4/0xb0
[   12.510034]     IN-SOFTIRQ-W at:
[   12.510305]                       _raw_spin_lock+0x2f/0x40
[   12.510772]                       try_to_wake_up+0x1c7/0x4e0
[   12.511220]                       swake_up_locked+0x20/0x40
[   12.511657]                       swake_up_one+0x1a/0x30
[   12.512070]                       rcu_process_callbacks+0xc5/0x650
[   12.512553]                       __do_softirq+0xe6/0x47b
[   12.512978]                       irq_exit+0xc3/0xd0
[   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
[   12.513876]                       apic_timer_interrupt+0xf/0x20
[   12.514343]                       default_idle+0x1c/0x170
[   12.514765]                       do_idle+0x199/0x240
[   12.515159]                       cpu_startup_entry+0x19/0x20
[   12.515614]                       start_kernel+0x422/0x42a
[   12.516045]                       secondary_startup_64+0xa4/0xb0
[   12.516521]     INITIAL USE at:
[   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
[   12.517258]                      rq_attach_root+0x16/0xd0
[   12.517685]                      sched_init+0x2f2/0x3eb
[   12.518096]                      start_kernel+0x1fb/0x42a
[   12.518525]                      secondary_startup_64+0xa4/0xb0
[   12.518986]   }
[   12.519132]   ... key      at: [&lt;ffffffff82b7bc28&gt;] __key.71384+0x0/0x8
[   12.519649]   ... acquired at:
[   12.519892]    pcpu_freelist_pop+0x7b/0xd0
[   12.520221]    bpf_get_stackid+0x1d2/0x4d0
[   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
[   12.520887]
[   12.521008] -&gt; (&amp;head-&gt;lock){+...} {
[   12.521292]    HARDIRQ-ON-W at:
[   12.521539]                     _raw_spin_lock+0x2f/0x40
[   12.521950]                     pcpu_freelist_push+0x2a/0x40
[   12.522396]                     bpf_get_stackid+0x494/0x4d0
[   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
[   12.523296]    INITIAL USE at:
[   12.523537]                    _raw_spin_lock+0x2f/0x40
[   12.523944]                    pcpu_freelist_populate+0xc0/0x120
[   12.524417]                    htab_map_alloc+0x405/0x500
[   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
[   12.525253]                    do_syscall_64+0x4a/0x180
[   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   12.526167]  }
[   12.526311]  ... key      at: [&lt;ffffffff838f7668&gt;] __key.13130+0x0/0x8
[   12.526812]  ... acquired at:
[   12.527047]    __lock_acquire+0x521/0x1350
[   12.527371]    lock_acquire+0x98/0x190
[   12.527680]    _raw_spin_lock+0x2f/0x40
[   12.527994]    pcpu_freelist_push+0x2a/0x40
[   12.528325]    bpf_get_stackid+0x494/0x4d0
[   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
[   12.528970]
[   12.529092]
[   12.529092] stack backtrace:
[   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   12.530750] Call Trace:
[   12.530948]  dump_stack+0x5f/0x8b
[   12.531248]  check_usage_backwards+0x10c/0x120
[   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
[   12.531935]  ? mark_lock+0x382/0x560
[   12.532229]  mark_lock+0x382/0x560
[   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
[   12.532928]  __lock_acquire+0x521/0x1350
[   12.533271]  ? find_get_entry+0x17f/0x2e0
[   12.533586]  ? find_get_entry+0x19c/0x2e0
[   12.533902]  ? lock_acquire+0x98/0x190
[   12.534196]  lock_acquire+0x98/0x190
[   12.534482]  ? pcpu_freelist_push+0x2a/0x40
[   12.534810]  _raw_spin_lock+0x2f/0x40
[   12.535099]  ? pcpu_freelist_push+0x2a/0x40
[   12.535432]  pcpu_freelist_push+0x2a/0x40
[   12.535750]  bpf_get_stackid+0x494/0x4d0
[   12.536062]  ___bpf_prog_run+0x8b4/0x11a0

It has been explained that is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
  has already been running (ensured by the percpu bpf_prog_active counter).

Eric pointed out that this lockdep splats stops other
legit lockdep splats in selftests/bpf/test_progs.c.

Fix this by calling local_irq_save/restore for stackmap.

Another false positive had also been worked around by calling
local_irq_save in commit 89ad2fa3f043 ("bpf: fix lockdep splat").
That commit added unnecessary irq_save/restore to fast path of
bpf hash map. irqs are already disabled at that point, since htab
is holding per bucket spin_lock with irqsave.

Let's reduce overhead for htab by introducing __pcpu_freelist_push/pop
function w/o irqsave and convert pcpu_freelist_push/pop to irqsave
to be used elsewhere (right now only in stackmap).
It stops lockdep false positive in stackmap with a bit of acceptable overhead.

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Reported-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: run bpf programs with preemption disabled</title>
<updated>2019-03-13T21:04:13Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2019-01-29T02:43:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55bc09838a42a8479ba08f6cb553137f626c5de5'/>
<id>urn:sha1:55bc09838a42a8479ba08f6cb553137f626c5de5</id>
<content type='text'>
[ Upstream commit 6cab5e90ab2bd323c9f3811b6c70a4687df51e27 ]

Disabled preemption is necessary for proper access to per-cpu maps
from BPF programs.

But the sender side of socket filters didn't have preemption disabled:
unix_dgram_sendmsg-&gt;sk_filter-&gt;sk_filter_trim_cap-&gt;bpf_prog_run_save_cb-&gt;BPF_PROG_RUN

and a combination of af_packet with tun device didn't disable either:
tpacket_snd-&gt;packet_direct_xmit-&gt;packet_pick_tx_queue-&gt;ndo_select_queue-&gt;
  tun_select_queue-&gt;tun_ebpf_select_queue-&gt;bpf_prog_run_clear_cb-&gt;BPF_PROG_RUN

Disable preemption before executing BPF programs (both classic and extended).

Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>relay: check return of create_buf_file() properly</title>
<updated>2019-03-13T21:04:13Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2019-01-31T12:57:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4aebaff3e9bfcad6770e0248044d7c7ba2c97e90'/>
<id>urn:sha1:4aebaff3e9bfcad6770e0248044d7c7ba2c97e90</id>
<content type='text'>
[ Upstream commit 2c1cf00eeacb784781cf1c9896b8af001246d339 ]

If create_buf_file() returns an error, don't try to reference it later
as a valid dentry pointer.

This problem was exposed when debugfs started to return errors instead
of just NULL for some calls when they do not succeed properly.

Also, the check for WARN_ON(dentry) was just wrong :)

Reported-by: Kees Cook &lt;keescook@chromium.org&gt;
Reported-and-tested-by: syzbot+16c3a70e1e9b29346c43@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>psi: fix aggregation idle shut-off</title>
<updated>2019-03-13T21:04:08Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-02-01T22:20:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94ea61db81c6ab4e3f5e8c2c2735ad8e80ea3caa'/>
<id>urn:sha1:94ea61db81c6ab4e3f5e8c2c2735ad8e80ea3caa</id>
<content type='text'>
[ Upstream commit 1b69ac6b40ebd85eed73e4dbccde2a36961ab990 ]

psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating.  However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again.  This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep.  To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself.  If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity.  But that
should not be a problem.  The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation.  If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf core: Fix perf_proc_update_handler() bug</title>
<updated>2019-03-13T21:04:00Z</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2019-01-11T01:17:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b5b1eed6b863ef020a6309d9040f3cea31fcdb10'/>
<id>urn:sha1:b5b1eed6b863ef020a6309d9040f3cea31fcdb10</id>
<content type='text'>
[ Upstream commit 1a51c5da5acc6c188c917ba572eebac5f8793432 ]

The perf_proc_update_handler() handles /proc/sys/kernel/perf_event_max_sample_rate
syctl variable.  When the PMU IRQ handler timing monitoring is disabled, i.e,
when /proc/sys/kernel/perf_cpu_time_max_percent is equal to 0 or 100,
then no modification to sysctl_perf_event_sample_rate is allowed to prevent
possible hang from wrong values.

The problem is that the test to prevent modification is made after the
sysctl variable is modified in perf_proc_update_handler().

You get an error:

  $ echo 10001 &gt;/proc/sys/kernel/perf_event_max_sample_rate
  echo: write error: invalid argument

But the value is still modified causing all sorts of inconsistencies:

  $ cat /proc/sys/kernel/perf_event_max_sample_rate
  10001

This patch fixes the problem by moving the parsing of the value after
the test.

Committer testing:

  # echo 100 &gt; /proc/sys/kernel/perf_cpu_time_max_percent
  # echo 10001 &gt; /proc/sys/kernel/perf_event_max_sample_rate
  -bash: echo: write error: Invalid argument
  # cat /proc/sys/kernel/perf_event_max_sample_rate
  10001
  #

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Reviewed-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1547169436-6266-1-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix sanitation rewrite in case of non-pointers</title>
<updated>2019-03-10T06:10:16Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-03-01T21:05:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f0dcfe51152a9242b9391724f546553b1077f17a'/>
<id>urn:sha1:f0dcfe51152a9242b9391724f546553b1077f17a</id>
<content type='text'>
commit 3612af783cf52c74a031a2f11b82247b2599d3cd upstream.

Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:

    [...]
    uint64_t a = bpf_ktime_get_ns();
    uint64_t b = bpf_ktime_get_ns();
    uint64_t delta = b - a;
    if ((int64_t)delta &gt; 0) {
    [...]

Turns out there is a bug where a corner case is missing in the fix
d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).

Fixes: d3bd7413e0ca ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: Marek Majkowski &lt;marek@cloudflare.com&gt;
Reported-by: Arthur Fabre &lt;afabre@cloudflare.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tracing: Fix event filters and triggers to handle negative numbers</title>
<updated>2019-03-10T06:10:15Z</updated>
<author>
<name>Pavel Tikhomirov</name>
<email>ptikhomirov@virtuozzo.com</email>
</author>
<published>2018-08-23T10:25:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b3a9963ed5df9c3f520a51281bf8d11714324871'/>
<id>urn:sha1:b3a9963ed5df9c3f520a51281bf8d11714324871</id>
<content type='text'>
commit 6a072128d262d2b98d31626906a96700d1fc11eb upstream.

Then tracing syscall exit event it is extremely useful to filter exit
codes equal to some negative value, to react only to required errors.
But negative numbers does not work:

[root@snorch sys_exit_read]# echo "ret == -1" &gt; filter
bash: echo: write error: Invalid argument
[root@snorch sys_exit_read]# cat filter
ret == -1
        ^
parse_error: Invalid value (did you forget quotes)?

Similar thing happens when setting triggers.

These is a regression in v4.17 introduced by the commit mentioned below,
testing without these commit shows no problem with negative numbers.

Link: http://lkml.kernel.org/r/20180823102534.7642-1-ptikhomirov@virtuozzo.com

Cc: stable@vger.kernel.org
Fixes: 80765597bc58 ("tracing: Rewrite filter logic to be simpler and faster")
Signed-off-by: Pavel Tikhomirov &lt;ptikhomirov@virtuozzo.com&gt;
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/rwsem: Fix (possible) missed wakeup</title>
<updated>2019-03-05T16:59:39Z</updated>
<author>
<name>Xie Yongji</name>
<email>xieyongji@baidu.com</email>
</author>
<published>2018-11-29T12:50:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d086c4edc659e859079ac653fbb6220e1e66fcd'/>
<id>urn:sha1:0d086c4edc659e859079ac653fbb6220e1e66fcd</id>
<content type='text'>
[ Upstream commit e158488be27b157802753a59b336142dc0eb0380 ]

Because wake_q_add() can imply an immediate wakeup (cmpxchg failure
case), we must not rely on the wakeup being delayed. However, commit:

  e38513905eea ("locking/rwsem: Rework zeroing reader waiter-&gt;task")

relies on exactly that behaviour in that the wakeup must not happen
until after we clear waiter-&gt;task.

[ peterz: Added changelog. ]

Signed-off-by: Xie Yongji &lt;xieyongji@baidu.com&gt;
Signed-off-by: Zhang Yu &lt;zhangyu31@baidu.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: e38513905eea ("locking/rwsem: Rework zeroing reader waiter-&gt;task")
Link: https://lkml.kernel.org/r/1543495830-2644-1-git-send-email-xieyongji@baidu.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
