summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2022-12-19perf: Fix perf_pending_task() UaFPeter Zijlstra
[ Upstream commit 517e6a301f34613bff24a8e35b5455884f2d83d8 ] Per syzbot it is possible for perf_pending_task() to run after the event is free()'d. There are two related but distinct cases: - the task_work was already queued before destroying the event; - destroying the event itself queues the task_work. The first cannot be solved using task_work_cancel() since perf_release() itself might be called from a task_work (____fput), which means the current->task_works list is already empty and task_work_cancel() won't be able to find the perf_pending_task() entry. The simplest alternative is extending the perf_event lifetime to cover the task_work. The second is just silly, queueing a task_work while you know the event is going away makes no sense and is easily avoided by re-arranging how the event is marked STATE_DEAD and ensuring it goes through STATE_OFF on the way down. Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-14memcg: fix possible use-after-free in memcg_write_event_control()Tejun Heo
commit 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 upstream. memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jann Horn <jannh@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> [3.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08proc: proc_skip_spaces() shouldn't think it is working on C stringsLinus Torvalds
commit bce9332220bd677d83b19d21502776ad555a0e73 upstream. proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08proc: avoid integer type confusion in get_proc_longLinus Torvalds
commit e6cfaf34be9fcd1a8285a294e18986bfc41a409c upstream. proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08tracing: Free buffers when a used dynamic event is removedSteven Rostedt (Google)
commit 4313e5a613049dfc1819a6dfb5f94cf2caff9452 upstream. After 65536 dynamic events have been added and removed, the "type" field of the event then uses the first type number that is available (not currently used by other events). A type number is the identifier of the binary blobs in the tracing ring buffer (known as events) to map them to logic that can parse the binary blob. The issue is that if a dynamic event (like a kprobe event) is traced and is in the ring buffer, and then that event is removed (because it is dynamic, which means it can be created and destroyed), if another dynamic event is created that has the same number that new event's logic on parsing the binary blob will be used. To show how this can be an issue, the following can crash the kernel: # cd /sys/kernel/tracing # for i in `seq 65536`; do echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events # done For every iteration of the above, the writing to the kprobe_events will remove the old event and create a new one (with the same format) and increase the type number to the next available on until the type number reaches over 65535 which is the max number for the 16 bit type. After it reaches that number, the logic to allocate a new number simply looks for the next available number. When an dynamic event is removed, that number is then available to be reused by the next dynamic event created. That is, once the above reaches the max number, the number assigned to the event in that loop will remain the same. Now that means deleting one dynamic event and created another will reuse the previous events type number. This is where bad things can happen. After the above loop finishes, the kprobes/foo event which reads the do_sys_openat2 function call's first parameter as an integer. # echo 1 > kprobes/foo/enable # cat /etc/passwd > /dev/null # cat trace cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 # echo 0 > kprobes/foo/enable Now if we delete the kprobe and create a new one that reads a string: # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events And now we can the trace: # cat trace sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U And dmesg has: ================================================================== BUG: KASAN: use-after-free in string+0xd4/0x1c0 Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: <TASK> dump_stack_lvl+0x5b/0x77 print_report+0x17f/0x47b kasan_report+0xad/0x130 string+0xd4/0x1c0 vsnprintf+0x500/0x840 seq_buf_vprintf+0x62/0xc0 trace_seq_printf+0x10e/0x1e0 print_type_string+0x90/0xa0 print_kprobe_event+0x16b/0x290 print_trace_line+0x451/0x8e0 s_show+0x72/0x1f0 seq_read_iter+0x58e/0x750 seq_read+0x115/0x160 vfs_read+0x11d/0x460 ksys_read+0xa9/0x130 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fc2e972ade2 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 </TASK> The buggy address belongs to the physical page: page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== This was found when Zheng Yejian sent a patch to convert the event type number assignment to use IDA, which gives the next available number, and this bug showed up in the fuzz testing by Yujie Liu and the kernel test robot. But after further analysis, I found that this behavior is the same as when the event type numbers go past the 16bit max (and the above shows that). As modules have a similar issue, but is dealt with by setting a "WAS_ENABLED" flag when a module event is enabled, and when the module is freed, if any of its events were enabled, the ring buffer that holds that event is also cleared, to prevent reading stale events. The same can be done for dynamic events. If any dynamic event that is being removed was enabled, then make sure the buffers they were enabled in are now cleared. Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/ Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function") Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework") Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events") Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in") Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced") Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event") Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Reported-by: Yujie Liu <yujie.liu@intel.com> Reported-by: kernel test robot <yujie.liu@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08tracing: Fix race where histograms can be called before the eventSteven Rostedt (Google)
commit ef38c79a522b660f7f71d45dad2d6244bc741841 upstream. commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before the event") fixed an issue where if an event is soft disabled, and the trigger is being added, there's a small window where the event sees that there's a trigger but does not see that it requires reading the event yet, and then calls the trigger with the record == NULL. This could be solved with adding memory barriers in the hot path, or to make sure that all the triggers requiring a record check for NULL. The latter was chosen. Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but the same needs to be done with histograms. Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08tracing/osnoise: Fix duration typeDaniel Bristot de Oliveira
commit 022632f6c43a86f2135642dccd5686de318e861d upstream. The duration type is a 64 long value, not an int. This was causing some long noise to report wrong values. Change the duration to a 64 bits value. Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08bpf: Do not copy spin lock field from user in bpf_selem_allocXu Kuohai
[ Upstream commit 836e49e103dfeeff670c934b7d563cbd982fce87 ] bpf_selem_alloc function is used by inode_storage, sk_storage and task_storage maps to set map value, for these map types, there may be a spin lock in the map value, so if we use memcpy to copy the whole map value from user, the spin lock field may be initialized incorrectly. Since the spin lock field is zeroed by kzalloc, call copy_map_value instead of memcpy to skip copying the spin lock field to fix it. Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20221114134720.1057939-2-xukuohai@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-08bpf, perf: Use subprog name when reporting subprog ksymbolHou Tao
[ Upstream commit 47df8a2f78bc34ff170d147d05b121f84e252b85 ] Since commit bfea9a8574f3 ("bpf: Add name to struct bpf_ksym"), when reporting subprog ksymbol to perf, prog name instead of subprog name is used. The backtrace of bpf program with subprogs will be incorrect as shown below: ffffffffc02deace bpf_prog_e44a3057dcb151f8_overwrite+0x66 ffffffffc02de9f7 bpf_prog_e44a3057dcb151f8_overwrite+0x9f ffffffffa71d8d4e trace_call_bpf+0xce ffffffffa71c2938 perf_call_bpf_enter.isra.0+0x48 overwrite is the entry program and it invokes the overwrite_htab subprog through bpf_loop, but in above backtrace, overwrite program just jumps inside itself. Fixing it by using subprog name when reporting subprog ksymbol. After the fix, the output of perf script will be correct as shown below: ffffffffc031aad2 bpf_prog_37c0bec7d7c764a4_overwrite_htab+0x66 ffffffffc031a9e7 bpf_prog_c7eb827ef4f23e71_overwrite+0x9f ffffffffa3dd8d4e trace_call_bpf+0xce ffffffffa3dc2938 perf_call_bpf_enter.isra.0+0x48 Fixes: bfea9a8574f3 ("bpf: Add name to struct bpf_ksym") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20221114095733.158588-1-houtao@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE()Nathan Chancellor
commit a679120edfcf3d63f066f53afd425d51b480e533 upstream. When building with clang: kernel/bpf/dispatcher.c:126:33: error: pointer type mismatch ('void *' and 'unsigned int (*)(const void *, const struct bpf_insn *, bpf_func_t)' (aka 'unsigned int (*)(const void *, const struct bpf_insn *, unsigned int (*)(const void *, const struct bpf_insn *))')) [-Werror,-Wpointer-type-mismatch] __BPF_DISPATCHER_UPDATE(d, new ?: &bpf_dispatcher_nop_func); ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/bpf.h:1045:54: note: expanded from macro '__BPF_DISPATCHER_UPDATE' __static_call_update((_d)->sc_key, (_d)->sc_tramp, (_new)) ^~~~ 1 error generated. The warning is pointing out that the type of new ('void *') and &bpf_dispatcher_nop_func are not compatible, which could have side effects coming out of a conditional operator due to promotion rules. Add the explicit cast to 'void *' to make it clear that this is expected, as __BPF_DISPATCHER_UPDATE() expands to a call to __static_call_update(), which expects a 'void *' as its final argument. Fixes: c86df29d11df ("bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)") Link: https://github.com/ClangBuiltLinux/linux/issues/1755 Reported-by: kernel test robot <lkp@intel.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221107170711.42409-1-nathan@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-02bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)Peter Zijlstra
[ Upstream commit c86df29d11dfba27c0a1f5039cd6fe387fbf4239 ] The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/Im@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/bpf/20221103120647.796772565@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-02gcov: clang: fix the buffer overflow issueMukesh Ojha
commit a6f810efabfd789d3bbafeacb4502958ec56c5ce upstream. Currently, in clang version of gcov code when module is getting removed gcov_info_add() incorrectly adds the sfn_ptr->counter to all the dst->functions and it result in the kernel panic in below crash report. Fix this by properly handling it. [ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000 [ 8.899100][ T599] Mem abort info: [ 8.899102][ T599] ESR = 0x9600004f [ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.899105][ T599] SET = 0, FnV = 0 [ 8.899107][ T599] EA = 0, S1PTW = 0 [ 8.899108][ T599] FSC = 0x0f: level 3 permission fault [ 8.899110][ T599] Data abort info: [ 8.899111][ T599] ISV = 0, ISS = 0x0000004f [ 8.899113][ T599] CM = 0, WnR = 1 [ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000 [ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787 [ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP [ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0 .... .., [ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1 [ 8.899547][ T599] Hardware name: XXX (DT) [ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) [ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8 [ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8 [ 8.899559][ T599] sp : ffffffc00e733b00 [ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470 [ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000 [ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000 [ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058 [ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785 [ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785 [ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80 [ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000 [ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f [ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0 [ 8.899589][ T599] Call trace: [ 8.899590][ T599] gcov_info_add+0x9c/0xb8 [ 8.899592][ T599] gcov_module_notifier+0xbc/0x120 [ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c [ 8.899598][ T599] do_init_module+0x2a8/0x33c [ 8.899600][ T599] load_module+0x23cc/0x261c [ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194 [ 8.899604][ T599] invoke_syscall+0x94/0x2bc [ 8.899607][ T599] el0_svc_common+0x1d8/0x34c [ 8.899609][ T599] do_el0_svc+0x40/0x54 [ 8.899611][ T599] el0_svc+0x94/0x2f0 [ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec [ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8 [ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c) [ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]--- Link: https://lkml.kernel.org/r/1668020497-13142-1-git-send-email-quic_mojha@quicinc.com Fixes: e178a5beb369 ("gcov: clang support") Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> [5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26bpf: Prevent bpf program recursion for raw tracepoint probesJiri Olsa
commit 05b24ff9b2cfabfcfd951daaa915a036ab53c9e1 upstream. We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev <sdf@google.com> Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encounteredMathieu Desnoyers
commit 448dca8c88755b768552e19bd1618be34ef6d1ff upstream. These commits use WARN_ON_ONCE() and kill the offending processes when deprecated and unknown flags are encountered: commit c17a6ff93213 ("rseq: Kill process when unknown flags are encountered in ABI structures") commit 0190e4198e47 ("rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags") The WARN_ON_ONCE() triggered by userspace input prevents use of Syzkaller to fuzz the rseq system call. Replace this WARN_ON_ONCE() by pr_warn_once() messages which contain actually useful information. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lkml.kernel.org/r/20221102130635.7379-1-mathieu.desnoyers@efficios.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'Zheng Yejian
[ Upstream commit 067df9e0ad48a97382ab2179bbe773a13a846028 ] Entries in list 'tr->err_log' will be reused after entry number exceed TRACING_LOG_ERRS_MAX. The cmd string of the to be reused entry will be freed first then allocated a new one. If the allocation failed, then the entry will still be in list 'tr->err_log' but its 'cmd' field is set to be NULL, later access of 'cmd' is risky. Currently above problem can cause the loss of 'cmd' information of first entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`, reproduce logs like: [ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^ [ 38.412517] trace_kprobe: error: Maxactive is not for kprobe Command: p4:myprobe2 do_sys_openat2 ^ Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace caseLi Huafei
[ Upstream commit 5dd7caf0bdc5d0bae7cf9776b4d739fb09bd5ebb ] In __unregister_kprobe_top(), if the currently unregistered probe has post_handler but other child probes of the aggrprobe do not have post_handler, the post_handler of the aggrprobe is cleared. If this is a ftrace-based probe, there is a problem. In later calls to disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in __disarm_kprobe_ftrace() and may even cause use-after-free: Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2) WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0 Modules linked in: testKprobe_007(-) CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18 [...] Call Trace: <TASK> __disable_kprobe+0xcd/0xe0 __unregister_kprobe_top+0x12/0x150 ? mutex_lock+0xe/0x30 unregister_kprobes.part.23+0x31/0xa0 unregister_kprobe+0x32/0x40 __x64_sys_delete_module+0x15e/0x260 ? do_user_addr_fault+0x2cd/0x6b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] For the kprobe-on-ftrace case, we keep the post_handler setting to identify this aggrprobe armed with kprobe_ipmodify_ops. This way we can disarm it correctly. Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/ Fixes: 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with livepatch") Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26ring-buffer: Include dropped pages in counting dirty patchesSteven Rostedt (Google)
[ Upstream commit 31029a8b2c7e656a0289194ef16415050ae4c4ac ] The function ring_buffer_nr_dirty_pages() was created to find out how many pages are filled in the ring buffer. There's two running counters. One is incremented whenever a new page is touched (pages_touched) and the other is whenever a page is read (pages_read). The dirty count is the number touched minus the number read. This is used to determine if a blocked task should be woken up if the percentage of the ring buffer it is waiting for is hit. The problem is that it does not take into account dropped pages (when the new writes overwrite pages that were not read). And then the dirty pages will always be greater than the percentage. This makes the "buffer_percent" file inaccurate, as the number of dirty pages end up always being larger than the percentage, event when it's not and this causes user space to be woken up more than it wants to be. Add a new counter to keep track of lost pages, and include that in the accounting of dirty pages so that it is actually accurate. Link: https://lkml.kernel.org/r/20221021123013.55fb6055@gandalf.local.home Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26perf: Improve missing SIGTRAP checkingMarco Elver
[ Upstream commit bb88f9695460bec25aa30ba9072595025cf6c8af ] To catch missing SIGTRAP we employ a WARN in __perf_event_overflow(), which fires if pending_sigtrap was already set: returning to user space without consuming pending_sigtrap, and then having the event fire again would re-enter the kernel and trigger the WARN. This, however, seemed to miss the case where some events not associated with progress in the user space task can fire and the interrupt handler runs before the IRQ work meant to consume pending_sigtrap (and generate the SIGTRAP). syzbot gifted us this stack trace: | WARNING: CPU: 0 PID: 3607 at kernel/events/core.c:9313 __perf_event_overflow | Modules linked in: | CPU: 0 PID: 3607 Comm: syz-executor100 Not tainted 6.1.0-rc2-syzkaller-00073-g88619e77b33d #0 | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 | RIP: 0010:__perf_event_overflow+0x498/0x540 kernel/events/core.c:9313 | <...> | Call Trace: | <TASK> | perf_swevent_hrtimer+0x34f/0x3c0 kernel/events/core.c:10729 | __run_hrtimer kernel/time/hrtimer.c:1685 [inline] | __hrtimer_run_queues+0x1c6/0xfb0 kernel/time/hrtimer.c:1749 | hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 | local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1096 [inline] | __sysvec_apic_timer_interrupt+0x17c/0x640 arch/x86/kernel/apic/apic.c:1113 | sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1107 | asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 | <...> | </TASK> In this case, syzbot produced a program with event type PERF_TYPE_SOFTWARE and config PERF_COUNT_SW_CPU_CLOCK. The hrtimer manages to fire again before the IRQ work got a chance to run, all while never having returned to user space. Improve the WARN to check for real progress in user space: approximate this by storing a 32-bit hash of the current IP into pending_sigtrap, and if an event fires while pending_sigtrap still matches the previous IP, we assume no progress (false negatives are possible given we could return to user space and trigger again on the same IP). Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Reported-by: syzbot+b8ded3e2e2c6adde4990@syzkaller.appspotmail.com Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221031093513.3032814-1-elver@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26rethook: fix a potential memleak in rethook_alloc()Yi Yang
commit 0a1ebe35cb3b7aa1f4b26b37e2a0b9ae68dc4ffb upstream. In rethook_alloc(), the variable rh is not freed or passed out if handler is NULL, which could lead to a memleak, fix it. Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/ [Masami: Add "rethook:" tag to the title.] Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook") Cc: stable@vger.kernel.org Signed-off-by: Yi Yang <yiyang13@huawei.com> Acke-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: kprobe: Fix potential null-ptr-deref on trace_array in ↵Shang XiaoJing
kprobe_event_gen_test_exit() commit 22ea4ca9631eb137e64e5ab899e9c89cb6670959 upstream. When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it will goto delete, which will call kprobe_event_delete() and release the corresponding resource. However, the trace_array in gen_kretprobe_test will point to the invalid resource. Set gen_kretprobe_test to NULL after called kprobe_event_delete() to prevent null-ptr-deref. BUG: kernel NULL pointer dereference, address: 0000000000000070 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 246 Comm: modprobe Tainted: G W 6.1.0-rc1-00174-g9522dc5c87da-dirty #248 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0 Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c 01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65 70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000 RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064 R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000 FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __ftrace_set_clr_event+0x3e/0x60 trace_array_set_clr_event+0x35/0x50 ? 0xffffffffa0000000 kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test] __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f89eeb061b7 Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/ Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in ↵Shang XiaoJing
kprobe_event_gen_test_exit() commit e0d75267f59d7084e0468bd68beeb1bf9c71d7c0 upstream. When trace_get_event_file() failed, gen_kretprobe_test will be assigned as the error code. If module kprobe_event_gen_test is removed now, the null pointer dereference will happen in kprobe_event_gen_test_exit(). Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL before dereference them. BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2210 Comm: modprobe Not tainted 6.1.0-rc1-00171-g2159299a3b74-dirty #217 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test] Code: Unable to access opcode bytes at 0xffffffff9ffffff2. RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246 RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000 RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800 R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/ Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix race where eprobes can be called before the eventSteven Rostedt (Google)
commit 94eedf3dded5fb472ce97bfaf3ac1c6c29c35d26 upstream. The flag that tells the event to call its triggers after reading the event is set for eprobes after the eprobe is enabled. This leads to a race where the eprobe may be triggered at the beginning of the event where the record information is NULL. The eprobe then dereferences the NULL record causing a NULL kernel pointer bug. Test for a NULL record to keep this from happening. Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix wild-memory-access in register_synth_event()Shang XiaoJing
commit 1b5f1c34d3f5a664a57a5a7557a50e4e3cc2505c upstream. In register_synth_event(), if set_synth_event_print_fmt() failed, then both trace_remove_event_call() and unregister_trace_event() will be called, which means the trace_event_call will call __unregister_trace_event() twice. As the result, the second unregister will causes the wild-memory-access. register_synth_event set_synth_event_print_fmt failed trace_remove_event_call event_remove if call->event.funcs then __unregister_trace_event (first call) unregister_trace_event __unregister_trace_event (second call) Fix the bug by avoiding to call the second __unregister_trace_event() by checking if the first one is called. general protection fault, probably for non-canonical address 0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127] CPU: 0 PID: 3807 Comm: modprobe Not tainted 6.1.0-rc1-00186-g76f33a7eedb4 #299 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_trace_event+0x6e/0x280 Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b RSP: 0018:ffff88810413f370 EFLAGS: 00010a06 RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000 RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20 RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481 R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122 R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028 FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __create_synth_event+0x1e37/0x1eb0 create_or_delete_synth_event+0x110/0x250 synth_event_run_command+0x2f/0x110 test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test] synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test] do_one_initcall+0xdb/0x480 do_init_module+0x1cf/0x680 load_module+0x6a50/0x70a0 __do_sys_finit_module+0x12f/0x1c0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()Shang XiaoJing
commit a4527fef9afe5c903c718d0cd24609fe9c754250 upstream. test_gen_synth_cmd() only free buf in fail path, hence buf will leak when there is no failure. Add kfree(buf) to prevent the memleak. The same reason and solution in test_empty_synth_event(). unreferenced object 0xffff8881127de000 (size 2048): comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s) hex dump (first 32 bytes): 20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_ backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<0000000039eb1cf5>] 0xffffffffa00083cd [<000000000e8c3bc8>] 0xffffffffa00086ba [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd unreferenced object 0xffff8881127df000 (size 2048): comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s) hex dump (first 32 bytes): 20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes 74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<00000000d4db9a3d>] 0xffffffffa0008071 [<00000000c31354a5>] 0xffffffffa00086ce [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Cc: stable@vger.kernel.org Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing/ring-buffer: Have polling block on watermarkSteven Rostedt (Google)
commit 42fb0a1e84ff525ebe560e2baf9451ab69127e2b upstream. Currently the way polling works on the ring buffer is broken. It will return immediately if there's any data in the ring buffer whereas a read will block until the watermark (defined by the tracefs buffer_percent file) is hit. That is, a select() or poll() will return as if there's data available, but then the following read will block. This is broken for the way select()s and poll()s are supposed to work. Have the polling on the ring buffer also block the same way reads and splice does on the ring buffer. Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Primiano Tucci <primiano@google.com> Cc: stable@vger.kernel.org Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26tracing: Fix memory leak in tracing_read_pipe()Wang Yufen
commit 649e72070cbbb8600eb823833e4748f5a0815116 upstream. kmemleak reports this issue: unreferenced object 0xffff888105a18900 (size 128): comm "test_progs", pid 18933, jiffies 4336275356 (age 22801.766s) hex dump (first 32 bytes): 25 73 00 90 81 88 ff ff 26 05 00 00 42 01 58 04 %s......&...B.X. 03 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000560143a1>] __kmalloc_node_track_caller+0x4a/0x140 [<000000006af00822>] krealloc+0x8d/0xf0 [<00000000c309be6a>] trace_iter_expand_format+0x99/0x150 [<000000005a53bdb6>] trace_check_vprintf+0x1e0/0x11d0 [<0000000065629d9d>] trace_event_printf+0xb6/0xf0 [<000000009a690dc7>] trace_raw_output_bpf_trace_printk+0x89/0xc0 [<00000000d22db172>] print_trace_line+0x73c/0x1480 [<00000000cdba76ba>] tracing_read_pipe+0x45c/0x9f0 [<0000000015b58459>] vfs_read+0x17b/0x7c0 [<000000004aeee8ed>] ksys_read+0xed/0x1c0 [<0000000063d3d898>] do_syscall_64+0x3b/0x90 [<00000000a06dda7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd iter->fmt alloced in tracing_read_pipe() -> .. ->trace_iter_expand_format(), but not freed, to fix, add free in tracing_release_pipe() Link: https://lkml.kernel.org/r/1667819090-4643-1-git-send-email-wangyufen@huawei.com Cc: stable@vger.kernel.org Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26ring_buffer: Do not deactivate non-existant pagesDaniil Tatianin
commit 56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd upstream. rb_head_page_deactivate() expects cpu_buffer to contain a valid list of ->pages, so verify that the list is actually present before calling it. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru Cc: stable@vger.kernel.org Fixes: 77ae365eca895 ("ring-buffer: make lockless") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26ftrace: Fix null pointer dereference in ftrace_add_mod()Xiu Jianfeng
commit 19ba6c8af9382c4c05dc6a0a79af3013b9a35cd0 upstream. The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next} of @ftrace_mode->list are NULL, it's not a valid state to call list_del(). If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del() will write prev->next and next->prev, where null pointer dereference happens. BUG: kernel NULL pointer dereference, address: 0000000000000008 Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ftrace_mod_callback+0x20d/0x220 ? do_filp_open+0xd9/0x140 ftrace_process_regex.isra.51+0xbf/0x130 ftrace_regex_write.isra.52.part.53+0x6e/0x90 vfs_write+0xee/0x3a0 ? __audit_filter_op+0xb1/0x100 ? auditd_test_task+0x38/0x50 ksys_write+0xa5/0xe0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Kernel panic - not syncing: Fatal exception So call INIT_LIST_HEAD() to initialize the list member to fix this issue. Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com Cc: stable@vger.kernel.org Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26ftrace: Optimize the allocation for mcount entriesWang Wensheng
commit bcea02b096333dc74af987cb9685a4dbdd820840 upstream. If we can't allocate this size, try something smaller with half of the size. Its order should be decreased by one instead of divided by two. Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26ftrace: Fix the possible incorrect kernel messageWang Wensheng
commit 08948caebe93482db1adfd2154eba124f66d161d upstream. If the number of mcount entries is an integer multiple of ENTRIES_PER_PAGE, the page count showing on the console would be wrong. Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-26bpf: Initialize same number of free nodes for each pcpu_freelistXu Kuohai
[ Upstream commit 4b45cd81f737d79d0fbfc0d320a1e518e7f0bbf0 ] pcpu_freelist_populate() initializes nr_elems / num_possible_cpus() + 1 free nodes for some CPUs, and then possibly one CPU with fewer nodes, followed by remaining cpus with 0 nodes. For example, when nr_elems == 256 and num_possible_cpus() == 32, CPU 0~27 each gets 9 free nodes, CPU 28 gets 4 free nodes, CPU 29~31 get 0 free nodes, while in fact each CPU should get 8 nodes equally. This patch initializes nr_elems / num_possible_cpus() free nodes for each CPU firstly, then allocates the remaining free nodes by one for each CPU until no free nodes left. Fixes: e19494edab82 ("bpf: introduce percpu_freelist") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221110122128.105214-1-xukuohai@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-26bpf: Fix memory leaks in __check_func_callWang Yufen
[ Upstream commit eb86559a691cea5fa63e57a03ec3dc9c31e97955 ] kmemleak reports this issue: unreferenced object 0xffff88817139d000 (size 2048): comm "test_progs", pid 33246, jiffies 4307381979 (age 45851.820s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000045f075f0>] kmalloc_trace+0x27/0xa0 [<0000000098b7c90a>] __check_func_call+0x316/0x1230 [<00000000b4c3c403>] check_helper_call+0x172e/0x4700 [<00000000aa3875b7>] do_check+0x21d8/0x45e0 [<000000001147357b>] do_check_common+0x767/0xaf0 [<00000000b5a595b4>] bpf_check+0x43e3/0x5bc0 [<0000000011e391b1>] bpf_prog_load+0xf26/0x1940 [<0000000007f765c0>] __sys_bpf+0xd2c/0x3650 [<00000000839815d6>] __x64_sys_bpf+0x75/0xc0 [<00000000946ee250>] do_syscall_64+0x3b/0x90 [<0000000000506b7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd The root case here is: In function prepare_func_exit(), the callee is not released in the abnormal scenario after "state->curframe--;". To fix, move "state->curframe--;" to the very bottom of the function, right when we free callee and reset frame[] pointer to NULL, as Andrii suggested. In addition, function __check_func_call() has a similar problem. In the abnormal scenario before "state->curframe++;", the callee also should be released by free_func_state(). Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Fixes: fd978bf7fd31 ("bpf: Add reference tracking to verifier") Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/1667884291-15666-1-git-send-email-wangyufen@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-16bpf: Fix wrong reg type conversion in release_reference()Youlin Li
[ Upstream commit f1db20814af532f85e091231223e5e4818e8464b ] Some helper functions will allocate memory. To avoid memory leaks, the verifier requires the eBPF program to release these memories by calling the corresponding helper functions. When a resource is released, all pointer registers corresponding to the resource should be invalidated. The verifier use release_references() to do this job, by apply __mark_reg_unknown() to each relevant register. It will give these registers the type of SCALAR_VALUE. A register that will contain a pointer value at runtime, but of type SCALAR_VALUE, which may allow the unprivileged user to get a kernel pointer by storing this register into a map. Using __mark_reg_not_init() while NOT allow_ptr_leaks can mitigate this problem. Fixes: fd978bf7fd31 ("bpf: Add reference tracking to verifier") Signed-off-by: Youlin Li <liulin063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221103093440.3161-1-liulin063@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-16bpf: Add helper macro bpf_for_each_reg_in_vstateKumar Kartikeya Dwivedi
[ Upstream commit b239da34203f49c40b5d656220c39647c3ff0b3c ] For a lot of use cases in future patches, we will want to modify the state of registers part of some same 'group' (e.g. same ref_obj_id). It won't just be limited to releasing reference state, but setting a type flag dynamically based on certain actions, etc. Hence, we need a way to easily pass a callback to the function that iterates over all registers in current bpf_verifier_state in all frames upto (and including) the curframe. While in C++ we would be able to easily use a lambda to pass state and the callback together, sadly we aren't using C++ in the kernel. The next best thing to avoid defining a function for each case seems like statement expressions in GNU C. The kernel already uses them heavily, hence they can passed to the macro in the style of a lambda. The statement expression will then be substituted in the for loop bodies. Variables __state and __reg are set to current bpf_func_state and reg for each invocation of the expression inside the passed in verifier state. Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers, release_reference, find_good_pkt_pointers, find_equal_scalars to use bpf_for_each_reg_in_vstate. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Stable-dep-of: f1db20814af5 ("bpf: Fix wrong reg type conversion in release_reference()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-16bpf, verifier: Fix memory leak in array reallocation for stack stateKees Cook
[ Upstream commit 42378a9ca55347102bbf86708776061d8fe3ece2 ] If an error (NULL) is returned by krealloc(), callers of realloc_array() were setting their allocation pointers to NULL, but on error krealloc() does not touch the original allocation. This would result in a memory resource leak. Instead, free the old allocation on the error handling path. The memory leak information is as follows as also reported by Zhengchao: unreferenced object 0xffff888019801800 (size 256): comm "bpf_repo", pid 6490, jiffies 4294959200 (age 17.170s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000b211474b>] __kmalloc_node_track_caller+0x45/0xc0 [<0000000086712a0b>] krealloc+0x83/0xd0 [<00000000139aab02>] realloc_array+0x82/0xe2 [<00000000b1ca41d1>] grow_stack_state+0xfb/0x186 [<00000000cd6f36d2>] check_mem_access.cold+0x141/0x1341 [<0000000081780455>] do_check_common+0x5358/0xb350 [<0000000015f6b091>] bpf_check.cold+0xc3/0x29d [<000000002973c690>] bpf_prog_load+0x13db/0x2240 [<00000000028d1644>] __sys_bpf+0x1605/0x4ce0 [<00000000053f29bd>] __x64_sys_bpf+0x75/0xb0 [<0000000056fedaf5>] do_syscall_64+0x35/0x80 [<000000002bd58261>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: c69431aab67a ("bpf: verifier: Improve function state reallocation") Reported-by: Zhengchao Shao <shaozhengchao@huawei.com> Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Bill Wendling <morbo@google.com> Cc: Lorenz Bauer <oss@lmb.io> Link: https://lore.kernel.org/bpf/20221029025433.2533810-1-keescook@chromium.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-10ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()Steven Rostedt (Google)
commit 7433632c9ff68a991bd0bc38cabf354e9d2de410 upstream. On some machines the number of listed CPUs may be bigger than the actual CPUs that exist. The tracing subsystem allocates a per_cpu directory with access to the per CPU ring buffer via a cpuX file. But to save space, the ring buffer will only allocate buffers for online CPUs, even though the CPU array will be as big as the nr_cpu_ids. With the addition of waking waiters on the ring buffer when closing the file, the ring_buffer_wake_waiters() now needs to make sure that the buffer is allocated (with the irq_work allocated with it) before trying to wake waiters, as it will cause a NULL pointer dereference. While debugging this, I added a NULL check for the buffer itself (which is OK to do), and also NULL pointer checks against buffer->buffers (which is not fine, and will WARN) as well as making sure the CPU number passed in is within the nr_cpu_ids (which is also not fine if it isn't). Link: https://lore.kernel.org/all/87h6zklb6n.wl-tiwai@suse.de/ Link: https://lore.kernel.org/all/CAM6Wdxc0KRJMXVAA0Y=u6Jh2V=uWB-_Fn6M4xRuNppfXzL1mUg@mail.gmail.com/ Link: https://lkml.kernel.org/linux-trace-kernel/20221101191009.1e7378c8@rorschach.local.home Cc: stable@vger.kernel.org Cc: Steven Noonan <steven.noonan@gmail.com> Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1204705 Reported-by: Takashi Iwai <tiwai@suse.de> Reported-by: Roland Ruckerbauer <roland.rucky@gmail.com> Fixes: f3ddb74ad079 ("tracing: Wake up ring buffer waiters on closing of the file") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10kprobe: reverse kp->flags when arm_kprobe failedLi Qiang
commit 4a6f316d6855a434f56dbbeba05e14c01acde8f8 upstream. In aggregate kprobe case, when arm_kprobe failed, we need set the kp->flags with KPROBE_FLAG_DISABLED again. If not, the 'kp' kprobe will been considered as enabled but it actually not enabled. Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/ Fixes: 12310e343755 ("kprobes: Propagate error from arm_kprobe_ftrace()") Cc: stable@vger.kernel.org Signed-off-by: Li Qiang <liq3ea@163.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()Shang XiaoJing
commit 66f0919c953ef7b55e5ab94389a013da2ce80a2c upstream. test_gen_kprobe_cmd() only free buf in fail path, hence buf will leak when there is no failure. Move kfree(buf) from fail path to common path to prevent the memleak. The same reason and solution in test_gen_kretprobe_cmd(). unreferenced object 0xffff888143b14000 (size 2048): comm "insmod", pid 52490, jiffies 4301890980 (age 40.553s) hex dump (first 32 bytes): 70 3a 6b 70 72 6f 62 65 73 2f 67 65 6e 5f 6b 70 p:kprobes/gen_kp 72 6f 62 65 5f 74 65 73 74 20 64 6f 5f 73 79 73 robe_test do_sys backtrace: [<000000006d7b836b>] kmalloc_trace+0x27/0xa0 [<0000000009528b5b>] 0xffffffffa059006f [<000000008408b580>] do_one_initcall+0x87/0x2a0 [<00000000c4980a7e>] do_init_module+0xdf/0x320 [<00000000d775aad0>] load_module+0x3006/0x3390 [<00000000e9a74b80>] __do_sys_finit_module+0x113/0x1b0 [<000000003726480d>] do_syscall_64+0x35/0x80 [<000000003441e93b>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Link: https://lore.kernel.org/all/20221102072954.26555-1-shangxiaojing@huawei.com/ Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Cc: stable@vger.kernel.org Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10fprobe: Check rethook_alloc() return in rethook initializationRafael Mendonca
commit d05ea35e7eea14d32f29fd688d3daeb9089de1a5 upstream. Check if fp->rethook succeeded to be allocated. Otherwise, if rethook_alloc() fails, then we end up dereferencing a NULL pointer in rethook_add_node(). Link: https://lore.kernel.org/all/20221025031209.954836-1-rafaelmendsr@gmail.com/ Fixes: 5b0ab78998e3 ("fprobe: Add exit_handler support") Cc: stable@vger.kernel.org Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10tracing/fprobe: Fix to check whether fprobe is registered correctlyMasami Hiramatsu (Google)
commit 61b304b73ab4b48b1cd7796efe42a570e2a0e0fc upstream. Since commit ab51e15d535e ("fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe") introduced fprobe_kprobe_handler() for fprobe::ops::func, unregister_fprobe() fails to unregister the registered if user specifies FPROBE_FL_KPROBE_SHARED flag. Moreover, __register_ftrace_function() is possible to change the ftrace_ops::func, thus we have to check fprobe::ops::saved_func instead. To check it correctly, it should confirm the fprobe::ops::saved_func is either fprobe_handler() or fprobe_kprobe_handler(). Link: https://lore.kernel.org/all/166677683946.1459107.15997653945538644683.stgit@devnote3/ Fixes: cad9931f64dc ("fprobe: Add ftrace based probe APIs") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-10ftrace: Fix use-after-free for dynamic ftrace_opsLi Huafei
commit 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294 upstream. KASAN reported a use-after-free with ftrace ops [1]. It was found from vmcore that perf had registered two ops with the same content successively, both dynamic. After unregistering the second ops, a use-after-free occurred. In ftrace_shutdown(), when the second ops is unregistered, the FTRACE_UPDATE_CALLS command is not set because there is another enabled ops with the same content. Also, both ops are dynamic and the ftrace callback function is ftrace_ops_list_func, so the FTRACE_UPDATE_TRACE_FUNC command will not be set. Eventually the value of 'command' will be 0 and ftrace_shutdown() will skip the rcu synchronization. However, ftrace may be activated. When the ops is released, another CPU may be accessing the ops. Add the missing synchronization to fix this problem. [1] BUG: KASAN: use-after-free in __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] BUG: KASAN: use-after-free in ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 Read of size 8 at addr ffff56551965bbc8 by task syz-executor.2/14468 CPU: 1 PID: 14468 Comm: syz-executor.2 Not tainted 5.10.0 #7 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b4/0x248 lib/dump_stack.c:118 print_address_description.constprop.0+0x28/0x48c mm/kasan/report.c:387 __kasan_report mm/kasan/report.c:547 [inline] kasan_report+0x118/0x210 mm/kasan/report.c:564 check_memory_region_inline mm/kasan/generic.c:187 [inline] __asan_load8+0x98/0xc0 mm/kasan/generic.c:253 __ftrace_ops_list_func kernel/trace/ftrace.c:7020 [inline] ftrace_ops_list_func+0x2b0/0x31c kernel/trace/ftrace.c:7049 ftrace_graph_call+0x0/0x4 __might_sleep+0x8/0x100 include/linux/perf_event.h:1170 __might_fault mm/memory.c:5183 [inline] __might_fault+0x58/0x70 mm/memory.c:5171 do_strncpy_from_user lib/strncpy_from_user.c:41 [inline] strncpy_from_user+0x1f4/0x4b0 lib/strncpy_from_user.c:139 getname_flags+0xb0/0x31c fs/namei.c:149 getname+0x2c/0x40 fs/namei.c:209 [...] Allocated by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc mm/kasan/common.c:479 [inline] __kasan_kmalloc.constprop.0+0x110/0x13c mm/kasan/common.c:449 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:493 kmem_cache_alloc_trace+0x440/0x924 mm/slub.c:2950 kmalloc include/linux/slab.h:563 [inline] kzalloc include/linux/slab.h:675 [inline] perf_event_alloc.part.0+0xb4/0x1350 kernel/events/core.c:11230 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 __arm64_sys_perf_event_open+0x6c/0x80 kernel/events/core.c:11723 [...] Freed by task 14445: kasan_save_stack+0x24/0x50 mm/kasan/common.c:48 kasan_set_track+0x24/0x34 mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:358 __kasan_slab_free.part.0+0x11c/0x1b0 mm/kasan/common.c:437 __kasan_slab_free mm/kasan/common.c:445 [inline] kasan_slab_free+0x2c/0x40 mm/kasan/common.c:446 slab_free_hook mm/slub.c:1569 [inline] slab_free_freelist_hook mm/slub.c:1608 [inline] slab_free mm/slub.c:3179 [inline] kfree+0x12c/0xc10 mm/slub.c:4176 perf_event_alloc.part.0+0xa0c/0x1350 kernel/events/core.c:11434 perf_event_alloc kernel/events/core.c:11733 [inline] __do_sys_perf_event_open kernel/events/core.c:11831 [inline] __se_sys_perf_event_open+0x550/0x15f4 kernel/events/core.c:11723 [...] Link: https://lore.kernel.org/linux-trace-kernel/20221103031010.166498-1-lihuafei1@huawei.com Fixes: edb096e00724f ("ftrace: Fix memleak when unregistering dynamic ops when tracing disabled") Cc: stable@vger.kernel.org Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-04PM: hibernate: Allow hybrid sleep to work with s2idleMario Limonciello
[ Upstream commit 85850af4fc47132f3f2f0dd698b90f67906600b4 ] Hybrid sleep is currently hardcoded to only operate with S3 even on systems that might not support it. Instead of assuming this mode is what the user wants to use, for hybrid sleep follow the setting of `mem_sleep_current` which will respect mem_sleep_default kernel command line and policy decisions made by the presence of the FADT low power idle bit. Fixes: 81d45bdf8913 ("PM / hibernate: Untangle power_down()") Reported-and-tested-by: kolAflash <kolAflash@kolahilft.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216574 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-04rcu: Keep synchronize_rcu() from enabling irqs in early bootPaul E. McKenney
[ Upstream commit 31d8aaa87fcef1be5932f3813ea369e21bd3b11d ] Making polled RCU grace periods account for expedited grace periods required acquiring the leaf rcu_node structure's lock during early boot, but after rcu_init() was called. This lock is irq-disabled, but the code incorrectly assumes that irqs are always disabled when invoking synchronize_rcu(). The exception is early boot before the scheduler has started, which means that upon return from synchronize_rcu(), irqs will be incorrectly enabled. This commit fixes this bug by using irqsave/irqrestore locking primitives. Fixes: bf95b2bc3e42 ("rcu: Switch polled grace-period APIs to ->gp_seq_polled") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-04bpf: prevent decl_tag from being referenced in func_protoStanislav Fomichev
[ Upstream commit ea68376c8bed5cd156900852aada20c3a0874d17 ] Syzkaller was able to hit the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3609 at kernel/bpf/btf.c:1946 btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Modules linked in: CPU: 0 PID: 3609 Comm: syz-executor361 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 RIP: 0010:btf_type_id_size+0x2d5/0x9d0 kernel/bpf/btf.c:1946 Code: ef e8 7f 8e e4 ff 41 83 ff 0b 77 28 f6 44 24 10 18 75 3f e8 6d 91 e4 ff 44 89 fe bf 0e 00 00 00 e8 20 8e e4 ff e8 5b 91 e4 ff <0f> 0b 45 31 f6 e9 98 02 00 00 41 83 ff 12 74 18 e8 46 91 e4 ff 44 RSP: 0018:ffffc90003cefb40 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: ffff8880259c0000 RSI: ffffffff81968415 RDI: 0000000000000005 RBP: ffff88801270ca00 R08: 0000000000000005 R09: 000000000000000e R10: 0000000000000011 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000011 R14: ffff888026ee6424 R15: 0000000000000011 FS: 000055555641b300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000f2e258 CR3: 000000007110e000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btf_func_proto_check kernel/bpf/btf.c:4447 [inline] btf_check_all_types kernel/bpf/btf.c:4723 [inline] btf_parse_type_sec kernel/bpf/btf.c:4752 [inline] btf_parse kernel/bpf/btf.c:5026 [inline] btf_new_fd+0x1926/0x1e70 kernel/bpf/btf.c:6892 bpf_btf_load kernel/bpf/syscall.c:4324 [inline] __sys_bpf+0xb7d/0x4cf0 kernel/bpf/syscall.c:5010 __do_sys_bpf kernel/bpf/syscall.c:5069 [inline] __se_sys_bpf kernel/bpf/syscall.c:5067 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:5067 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f0fbae41c69 Code: 28 c3 e8 2a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc8aeb6228 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fbae41c69 RDX: 0000000000000020 RSI: 0000000020000140 RDI: 0000000000000012 RBP: 00007f0fbae05e10 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f0fbae05ea0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Looks like it tries to create a func_proto which return type is decl_tag. For the details, see Martin's spot on analysis in [0]. 0: https://lore.kernel.org/bpf/CAKH8qBuQDLva_hHxxBuZzyAcYNO4ejhovz6TQeVSk8HY-2SO6g@mail.gmail.com/T/#mea6524b3fcd6298347432226e81b1e6155efc62c Cc: Yonghong Song <yhs@fb.com> Cc: Martin KaFai Lau <martin.lau@kernel.org> Fixes: bd16dee66ae4 ("bpf: Add BTF_KIND_DECL_TAG typedef support") Reported-by: syzbot+d8bd751aef7c6b39a344@syzkaller.appspotmail.com Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221015002444.2680969-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-04sched/core: Fix comparison in sched_group_cookie_match()Lin Shengwang
[ Upstream commit e705968dd687574b6ca3ebe772683d5642759132 ] In commit 97886d9dcd86 ("sched: Migration changes for core scheduling"), sched_group_cookie_match() was added to help determine if a cookie matches the core state. However, while it iterates the SMT group, it fails to actually use the RQ for each of the CPUs iterated, use cpu_rq(cpu) instead of rq to fix things. Fixes: 97886d9dcd86 ("sched: Migration changes for core scheduling") Signed-off-by: Lin Shengwang <linshengwang1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008022709.642-1-linshengwang1@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-11-04perf: Fix missing SIGTRAPsPeter Zijlstra
[ Upstream commit ca6c21327c6af02b7eec31ce4b9a740a18c6c13f ] Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Marco Elver <elver@google.com> Debugged-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26gcov: support GCC 12.1 and newer compilersMartin Liska
commit 977ef30a7d888eeb52fb6908f99080f33e5309a8 upstream. Starting with GCC 12.1, the created .gcda format can't be read by gcov tool. There are 2 significant changes to the .gcda file format that need to be supported: a) [gcov: Use system IO buffering] (23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05) changed that all sizes in the format are in bytes and not in words (4B) b) [gcov: make profile merging smarter] (72e0c742bd01f8e7e6dcca64042b9ad7e75979de) add a new checksum to the file header. Tested with GCC 7.5, 10.4, 12.2 and the current master. Link: https://lkml.kernel.org/r/624bda92-f307-30e9-9aaa-8cc678b2dfb2@suse.cz Signed-off-by: Martin Liska <mliska@suse.cz> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-21bpf: use bpf_prog_pack for bpf_dispatcherSong Liu
[ Upstream commit 19c02415da2345d0dda2b5c4495bc17cc14b18b5 ] Allocate bpf_dispatcher with bpf_prog_pack_alloc so that bpf_dispatcher can share pages with bpf programs. arch_prepare_bpf_dispatcher() is updated to provide a RW buffer as working area for arch code to write to. This also fixes CPA W^X warnning like: CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220926184739.3512547-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBTJiri Olsa
[ Upstream commit c09eb2e578eb1668bbc84dc07e8d8bd6f04b9a02 ] Martynas reported bpf_get_func_ip returning +4 address when CONFIG_X86_KERNEL_IBT option is enabled. When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction at the function entry, which screws return value of bpf_get_func_ip() helper that should return the function address. There's short term workaround for kprobe_multi bpf program made by Alexei [1], but we need this fixup also for bpf_get_attach_cookie, that returns cookie based on the entry_ip value. Moving the fixup in the fprobe handler, so both bpf_get_func_ip and bpf_get_attach_cookie get expected function address when CONFIG_X86_KERNEL_IBT option is enabled. Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip so it's clearer this is an ftrace __fentry__ ip. [1] commit 7f0059b58f02 ("selftests/bpf: Fix kprobe_multi test.") Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: Martynas Pumputis <m@lambda.lt> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-21rcu-tasks: Ensure RCU Tasks Trace loops have quiescent statesPaul E. McKenney
[ Upstream commit d6ad60635cafe900bcd11ad588d8accb36c36b1b ] The RCU Tasks Trace grace-period kthread loops across all CPUs, and there can be quite a few CPUs, with some commercially available systems sporting well over a thousand of them. Some of these loops can feature IPIs, which can take some time. This commit therefore places a call to cond_resched_tasks_rcu_qs() in each such loop. Link: https://docs.google.com/document/d/1V0YnG1HTWMt9WHJjroiJL9lf-hMrud4v8Fn3fhyY0cI/edit?usp=sharing Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>