summaryrefslogtreecommitdiff
path: root/kernel/trace/trace.c
AgeCommit message (Collapse)Author
2022-10-29tracing: Do not free snapshot if tracer is on cmdlineSteven Rostedt (Google)
[ Upstream commit a541a9559bb0a8ecc434de01d3e4826c32e8bb53 ] The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the snapshot buffer at boot up for use later. The ftrace_boot_snapshot in particular requires the snapshot to be allocated because it will take a snapshot at the end of boot up allowing to see the traces that happened during boot so that it's not lost when user space takes over. When a tracer is registered (started) there's a path that checks if it requires the snapshot buffer or not, and if it does not and it was allocated it will do a synchronization and free the snapshot buffer. This is only required if the previous tracer was using it for "max latency" snapshots, as it needs to make sure all max snapshots are complete before freeing. But this is only needed if the previous tracer was using the snapshot buffer for latency (like irqoff tracer and friends). But it does not make sense to free it, if the previous tracer was not using it, and the snapshot was allocated by the cmdline parameters. This basically takes away the point of allocating it in the first place! Note, the allocated snapshot worked fine for just trace events, but fails when a tracer is enabled on the cmdline. Further investigation, this goes back even further and it does not require a tracer on the cmdline to fail. Simply enable snapshots and then enable a tracer, and it will remove the snapshot. Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated") Reported-by: Ross Zwisler <zwisler@kernel.org> Tested-by: Ross Zwisler <zwisler@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29tracing: Simplify conditional compilation code in tracing_set_tracer()sunliming
[ Upstream commit f4b0d318097e45cbac5e14976f8bb56aa2cef504 ] Two conditional compilation directives "#ifdef CONFIG_TRACER_MAX_TRACE" are used consecutively, and no other code in between. Simplify conditional the compilation code and only use one "#ifdef CONFIG_TRACER_MAX_TRACE". Link: https://lkml.kernel.org/r/20220602140613.545069-1-sunliming@kylinos.cn Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Stable-dep-of: a541a9559bb0 ("tracing: Do not free snapshot if tracer is on cmdline") Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26tracing: Add ioctl() to force ring buffer waiters to wake upSteven Rostedt (Google)
commit 01b2a52171735c6eea80ee2f355f32bea6c41418 upstream. If a process is waiting on the ring buffer for data, there currently isn't a clean way to force it to wake up. Add an ioctl call that will force any tasks that are waiting on the trace_pipe_raw file to wake up. Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26tracing: Wake up waiters when tracing is disabledSteven Rostedt (Google)
commit 2b0fd9a59b7990c161fa1cb7b79edb22847c87c2 upstream. When tracing is disabled, there's no reason that waiters should stay waiting, wake them up, otherwise tasks get stuck when they should be flushing the buffers. Cc: stable@vger.kernel.org Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26tracing: Wake up ring buffer waiters on closing of the fileSteven Rostedt (Google)
commit f3ddb74ad0790030c9592229fb14d8c451f4e9a8 upstream. When the file that represents the ring buffer is closed, there may be waiters waiting on more input from the ring buffer. Call ring_buffer_wake_waiters() to wake up any waiters when the file is closed. Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-26tracing: Disable interrupt or preemption before acquiring arch_spinlock_tWaiman Long
commit c0a581d7126c0bbc96163276f585fd7b4e4d8d0e upstream. It was found that some tracing functions in kernel/trace/trace.c acquire an arch_spinlock_t with preemption and irqs enabled. An example is the tracing_saved_cmdlines_size_read() function which intermittently causes a "BUG: using smp_processor_id() in preemptible" warning when the LTP read_all_proc test is run. That can be problematic in case preemption happens after acquiring the lock. Add the necessary preemption or interrupt disabling code in the appropriate places before acquiring an arch_spinlock_t. The convention here is to disable preemption for trace_cmdline_lock and interupt for max_lock. Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org Fixes: a35873a0993b ("tracing: Add conditional snapshot") Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-07-29tracing: Fix return value of trace_pid_write()Wonhyuk Yang
[ Upstream commit b27f266f74fbda4ee36c2b2b04d15992860cf23b ] Setting set_event_pid with trailing whitespace lead to endless write system calls like below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 4 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 write(1, "\n", 1) = 0 .... This is because, the result of trace_get_user's are not returned when it read at least one pid. To fix it, update read variable even if parser->idx == 0. The result of applied patch is below. $ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid execve("/usr/bin/echo", ["echo", "123 "], ...) = 0 ... write(1, "123 \n", 5) = 5 close(1) = 0 Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Baik Song An <bsahn@etri.re.kr> Cc: Hong Yeon Kim <kimhy@etri.re.kr> Cc: Taeung Song <taeung@reallinux.co.kr> Cc: linuxgeek@linuxgeek.io Cc: stable@vger.kernel.org Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use") Signed-off-by: Wonhyuk Yang <vvghjk1234@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-29tracing: Place trace_pid_list logic into abstract functionsSteven Rostedt (VMware)
[ Upstream commit 6954e415264eeb5ee6be0d22d789ad12c995ee64 ] Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users. Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening. The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-07-21tracing: Fix sleeping while atomic in kdb ftdumpDouglas Anderson
[ Upstream commit 495fcec8648cdfb483b5b9ab310f3839f07cb3b8 ] If you drop into kdb and type "ftdump" you'll get a sleeping while atomic warning from memory allocation in trace_find_next_entry(). This appears to have been caused by commit ff895103a84a ("tracing: Save off entry when peeking at next entry"), which added the allocation in that path. The problematic commit was already fixed by commit 8e99cf91b99b ("tracing: Do not allocate buffer in trace_find_next_entry() in atomic") but that fix missed the kdb case. The fix here is easy: just move the assignment of the static buffer to the place where it should have been to begin with: trace_init_global_iter(). That function is called in two places, once is right before the assignment of the static buffer added by the previous fix and once is in kdb. Note that it appears that there's a second static buffer that we need to assign that was added in commit efbbdaa22bb7 ("tracing: Show real address for trace event arguments"), so we'll move that too. Link: https://lkml.kernel.org/r/20220708170919.1.I75844e5038d9425add2ad853a608cb44bb39df40@changeid Fixes: ff895103a84a ("tracing: Save off entry when peeking at next entry") Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments") Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tracing: Avoid adding tracer option before update_tracer_optionsMark-PK Tsai
[ Upstream commit ef9188bcc6ca1d8a2ad83e826b548e6820721061 ] To prepare for support asynchronous tracer_init_tracefs initcall, avoid calling create_trace_option_files before __update_tracer_options. Otherwise, create_trace_option_files will show warning because some tracers in trace_types list are already in tr->topts. For example, hwlat_tracer call register_tracer in late_initcall, and global_trace.dir is already created in tracing_init_dentry, hwlat_tracer will be put into tr->topts. Then if the __update_tracer_options is executed after hwlat_tracer registered, create_trace_option_files find that hwlat_tracer is already in tr->topts. Link: https://lkml.kernel.org/r/20220426122407.17042-2-mark-pk.tsai@mediatek.com Link: https://lore.kernel.org/lkml/20220322133339.GA32582@xsang-OptiPlex-9020/ Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14tracing: Fix sleeping function called from invalid context on RT kernelJun Miao
[ Upstream commit 12025abdc8539ed9d5014e2d647a3fd1bd3de5cd ] When setting bootparams="trace_event=initcall:initcall_start tp_printk=1" in the cmdline, the output_printk() was called, and the spin_lock_irqsave() was called in the atomic and irq disable interrupt context suitation. On the PREEMPT_RT kernel, these locks are replaced with sleepable rt-spinlock, so the stack calltrace will be triggered. Fix it by raw_spin_lock_irqsave when PREEMPT_RT and "trace_event=initcall:initcall_start tp_printk=1" enabled. BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 Preemption disabled at: [<ffffffff8992303e>] try_to_wake_up+0x7e/0xba0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt17+ #19 34c5812404187a875f32bee7977f7367f9679ea7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x60/0x8c dump_stack+0x10/0x12 __might_resched.cold+0x11d/0x155 rt_spin_lock+0x40/0x70 trace_event_buffer_commit+0x2fa/0x4c0 ? map_vsyscall+0x93/0x93 trace_event_raw_event_initcall_start+0xbe/0x110 ? perf_trace_initcall_finish+0x210/0x210 ? probe_sched_wakeup+0x34/0x40 ? ttwu_do_wakeup+0xda/0x310 ? trace_hardirqs_on+0x35/0x170 ? map_vsyscall+0x93/0x93 do_one_initcall+0x217/0x3c0 ? trace_event_raw_event_initcall_level+0x170/0x170 ? push_cpu_stop+0x400/0x400 ? cblist_init_generic+0x241/0x290 kernel_init_freeable+0x1ac/0x347 ? _raw_spin_unlock_irq+0x65/0x80 ? rest_init+0xf0/0xf0 kernel_init+0x1e/0x150 ret_from_fork+0x22/0x30 </TASK> Link: https://lkml.kernel.org/r/20220419013910.894370-1-jun.miao@intel.com Signed-off-by: Jun Miao <jun.miao@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08tracing: Have trace event string test handle zero length stringsSteven Rostedt (Google)
commit eca344a7362e0f34f179298fd8366bcd556eede1 upstream. If a trace event has in its TP_printk(): "%*.s", len, len ? __get_str(string) : NULL It is perfectly valid if len is zero and passing in the NULL. Unfortunately, the runtime string check at time of reading the trace sees the NULL and flags it as a bad string and produces a WARN_ON(). Handle this case by passing into the test function if the format has an asterisk (star) and if so, if the length is zero, then mark it as safe. Link: https://lore.kernel.org/all/YjsWzuw5FbWPrdqq@bfoster/ Cc: stable@vger.kernel.org Reported-by: Brian Foster <bfoster@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Fixes: 9a6944fee68e2 ("tracing: Add a verifier to check string pointers for trace events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16tracing: Ensure trace buffer is at least 4096 bytes largeSven Schnelle
[ Upstream commit 7acf3a127bb7c65ff39099afd78960e77b2ca5de ] Booting the kernel with 'trace_buf_size=1' give a warning at boot during the ftrace selftests: [ 0.892809] Running postponed tracer tests: [ 0.892893] Testing tracer function: [ 0.901899] Callback from call_rcu_tasks_trace() invoked. [ 0.983829] Callback from call_rcu_tasks_rude() invoked. [ 1.072003] .. bad ring buffer .. corrupted trace buffer .. [ 1.091944] Callback from call_rcu_tasks() invoked. [ 1.097695] PASSED [ 1.097701] Testing dynamic ftrace: .. filter failed count=0 ..FAILED! [ 1.353474] ------------[ cut here ]------------ [ 1.353478] WARNING: CPU: 0 PID: 1 at kernel/trace/trace.c:1951 run_tracer_selftest+0x13c/0x1b0 Therefore enforce a minimum of 4096 bytes to make the selftest pass. Link: https://lkml.kernel.org/r/20220214134456.1751749-1-svens@linux.ibm.com Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08tracing: Fix return value of __setup handlersRandy Dunlap
commit 1d02b444b8d1345ea4708db3bab4db89a7784b55 upstream. __setup() handlers should generally return 1 to indicate that the boot options have been handled. Using invalid option values causes the entire kernel boot option string to be reported as Unknown and added to init's environment strings, polluting it. Unknown kernel command line parameters "BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies", will be passed to user space. Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux BOOT_IMAGE=/boot/bzImage-517rc6 kprobe_event=p,syscall_any,$arg1 trace_options=quiet trace_clock=jiffies Return 1 from the __setup() handlers so that init's environment is not polluted with kernel boot options. Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lkml.kernel.org/r/20220303031744.32356-1-rdunlap@infradead.org Cc: stable@vger.kernel.org Fixes: 7bcfaf54f591 ("tracing: Add trace_options kernel command line parameter") Fixes: e1e232ca6b8f ("tracing: Add trace_clock=<clock> kernel parameter") Fixes: 970988e19eb0 ("tracing/kprobe: Add kprobe_event= boot parameter") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-23tracing: Fix tp_printk option related with tp_printk_stop_on_bootJaeSang Yoo
[ Upstream commit 3203ce39ac0b2a57a84382ec184c7d4a0bede175 ] The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is the same as another kernel parameter "tp_printk". If "tp_printk" setup is called before the "tp_printk_stop_on_boot", it will override the latter and keep it from being set. This is similar to other kernel parameter issues, such as: Commit 745a600cf1a6 ("um: console: Ignore console= option") or init/do_mounts.c:45 (setup function of "ro" kernel param) Fix it by checking for a "_" right after the "tp_printk" and if that exists do not process the parameter. Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com> [ Fixed up change log and added space after if condition ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-01tracing: Don't inc err_log entry count if entry allocation failsTom Zanussi
commit 67ab5eb71b37b55f7c5522d080a1b42823351776 upstream. tr->n_err_log_entries should only be increased if entry allocation succeeds. Doing it when it fails won't cause any problems other than wasting an entry, but should be fixed anyway. Link: https://lkml.kernel.org/r/cad1ab28f75968db0f466925e7cba5970cec6c29.1643319703.git.zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: 2f754e771b1a6 ("tracing: Don't inc err_log entry count if entry allocation fails") Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11tracing: Tag trace_percpu_buffer as a percpu pointerNaveen N. Rao
commit f28439db470cca8b6b082239314e9fd10bd39034 upstream. Tag trace_percpu_buffer as a percpu pointer to resolve warnings reported by sparse: /linux/kernel/trace/trace.c:3218:46: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3218:46: expected void const [noderef] __percpu *__vpp_verify /linux/kernel/trace/trace.c:3218:46: got struct trace_buffer_struct * /linux/kernel/trace/trace.c:3234:9: warning: incorrect type in initializer (different address spaces) /linux/kernel/trace/trace.c:3234:9: expected void const [noderef] __percpu *__vpp_verify /linux/kernel/trace/trace.c:3234:9: got int * Link: https://lkml.kernel.org/r/ebabd3f23101d89cb75671b68b6f819f5edc830b.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Fixes: 07d777fe8c398 ("tracing: Add percpu buffers for trace_printk()") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-11tracing: Fix check for trace_percpu_buffer validity in get_trace_buf()Naveen N. Rao
commit 823e670f7ed616d0ce993075c8afe0217885f79d upstream. With the new osnoise tracer, we are seeing the below splat: Kernel attempted to read user page (c7d880000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0xc7d880000 Faulting instruction address: 0xc0000000002ffa10 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries ... NIP [c0000000002ffa10] __trace_array_vprintk.part.0+0x70/0x2f0 LR [c0000000002ff9fc] __trace_array_vprintk.part.0+0x5c/0x2f0 Call Trace: [c0000008bdd73b80] [c0000000001c49cc] put_prev_task_fair+0x3c/0x60 (unreliable) [c0000008bdd73be0] [c000000000301430] trace_array_printk_buf+0x70/0x90 [c0000008bdd73c00] [c0000000003178b0] trace_sched_switch_callback+0x250/0x290 [c0000008bdd73c90] [c000000000e70d60] __schedule+0x410/0x710 [c0000008bdd73d40] [c000000000e710c0] schedule+0x60/0x130 [c0000008bdd73d70] [c000000000030614] interrupt_exit_user_prepare_main+0x264/0x270 [c0000008bdd73de0] [c000000000030a70] syscall_exit_prepare+0x150/0x180 [c0000008bdd73e10] [c00000000000c174] system_call_vectored_common+0xf4/0x278 osnoise tracer on ppc64le is triggering osnoise_taint() for negative duration in get_int_safe_duration() called from trace_sched_switch_callback()->thread_exit(). The problem though is that the check for a valid trace_percpu_buffer is incorrect in get_trace_buf(). The check is being done after calculating the pointer for the current cpu, rather than on the main percpu pointer. Fix the check to be against trace_percpu_buffer. Link: https://lkml.kernel.org/r/a920e4272e0b0635cf20c444707cbce1b2c8973d.1640255304.git.naveen.n.rao@linux.vnet.ibm.com Cc: stable@vger.kernel.org Fixes: e2ace001176dc9 ("tracing: Choose static tp_printk buffer by explicit nesting count") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-08tracing: Don't use out-of-sync va_list in event printingNikita Yushchenko
[ Upstream commit 2ef75e9bd2c998f1c6f6f23a3744136105ddefd5 ] If trace_seq becomes full, trace_seq_vprintf() no longer consumes arguments from va_list, making va_list out of sync with format processing by trace_check_vprintf(). This causes va_arg() in trace_check_vprintf() to return wrong positional argument, which results into a WARN_ON_ONCE() hit. ftrace_stress_test from LTP triggers this situation. Fix it by explicitly avoiding further use if va_list at the point when it's consistency can no longer be guaranteed. Link: https://lkml.kernel.org/r/20211118145516.13219-1-nikita.yushchenko@virtuozzo.com Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18tracing: Disable "other" permission bits in the tracefs filesSteven Rostedt (VMware)
[ Upstream commit 21ccc9cd72116289469e5519b6159c675a2fa58f ] When building the files in the tracefs file system, do not by default set any permissions for OTH (other). This will make it easier for admins who want to define a group for accessing tracefs and not having to first disable all the permission bits for "other" in the file system. As tracing can leak sensitive information, it should never by default allowing all users access. An admin can still set the permission bits for others to have access, which may be useful for creating a honeypot and seeing who takes advantage of it and roots the machine. Link: https://lkml.kernel.org/r/20210818153038.864149276@goodmis.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-10-07tracing: Fix missing osnoise tracer on max_latencyJackie Liu
The compiler warns when the data are actually unused: kernel/trace/trace.c:1712:13: error: ‘trace_create_maxlat_file’ defined but not used [-Werror=unused-function] 1712 | static void trace_create_maxlat_file(struct trace_array *tr, | ^~~~~~~~~~~~~~~~~~~~~~~~ [Why] CONFIG_HWLAT_TRACER=n, CONFIG_TRACER_MAX_TRACE=n, CONFIG_OSNOISE_TRACER=y gcc report warns. [How] Now trace_create_maxlat_file will only take effect when CONFIG_HWLAT_TRACER=y or CONFIG_TRACER_MAX_TRACE=y. In fact, after adding osnoise trace, it also needs to take effect. Link: https://lore.kernel.org/all/c1d9e328-ad7c-920b-6c24-9e1598a6421c@infradead.org/ Link: https://lkml.kernel.org/r/20210922025122.3268022-1-liu.yun@linux.dev Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-09Merge tag 'trace-v5.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull more tracing updates from Steven Rostedt: - Add migrate-disable counter to tracing header - Fix error handling in event probes - Fix missed unlock in osnoise in error path - Fix merge issue with tools/bootconfig - Clean up bootconfig data when init memory is removed - Fix bootconfig to loop only on subkeys - Have kernel command lines override bootconfig options - Increase field counts for synthetic events - Have histograms dynamic allocate event elements to save space - Fixes in testing and documentation * tag 'trace-v5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/boot: Fix to loop on only subkeys selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe events tracing: Dynamically allocate the per-elt hist_elt_data array tracing: synth events: increase max fields count tools/bootconfig: Show whole test command for each test case bootconfig: Fix missing return check of xbc_node_compose_key function tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.sh docs: bootconfig: Add how to use bootconfig for kernel parameters init/bootconfig: Reorder init parameter from bootconfig and cmdline init: bootconfig: Remove all bootconfig data when the init memory is removed tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads() tracing: Fix some alloc_event_probe() error handling bugs tracing: Add migrate-disabled counter to tracing output.
2021-09-05Merge tag 'trace-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - simplify the Kconfig use of FTRACE and TRACE_IRQFLAGS_SUPPORT - bootconfig can now start histograms - bootconfig supports group/all enabling - histograms now can put values in linear size buckets - execnames can be passed to synthetic events - introduce "event probes" that attach to other events and can retrieve data from pointers of fields, or record fields as different types (a pointer to a string as a string instead of just a hex number) - various fixes and clean ups * tag 'trace-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (35 commits) tracing/doc: Fix table format in histogram code selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes selftests/ftrace: Add selftest for testing eprobe events on synthetic events selftests/ftrace: Add test case to test adding and removing of event probe selftests/ftrace: Fix requirement check of README file selftests/ftrace: Add clear_dynamic_events() to test cases tracing: Add a probe that attaches to trace events tracing/probes: Reject events which have the same name of existing one tracing/probes: Have process_fetch_insn() take a void * instead of pt_regs tracing/probe: Change traceprobe_set_print_fmt() to take a type tracing/probes: Use struct_size() instead of defining custom macros tracing/probes: Allow for dot delimiter as well as slash for system names tracing/probe: Have traceprobe_parse_probe_arg() take a const arg tracing: Have dynamic events have a ref counter tracing: Add DYNAMIC flag for dynamic events tracing: Replace deprecated CPU-hotplug functions. MAINTAINERS: Add an entry for os noise/latency tracepoint: Fix kerneldoc comments bootconfig/tracing/ktest: Update ktest example for boot-time tracing tools/bootconfig: Use per-group/all enable option in ftrace2bconf script ...
2021-09-03tracing: Add migrate-disabled counter to tracing output.Thomas Gleixner
migrate_disable() forbids task migration to another CPU. It is available since v5.11 and has already users such as highmem or BPF. It is useful to observe this task state in tracing which already has other states like the preemption counter. Instead of adding the migrate disable counter as a new entry to struct trace_entry, which would extend the whole struct by four bytes, it is squashed into the preempt-disable counter. The lower four bits represent the preemption counter, the upper four bits represent the migrate disable counter. Both counter shouldn't exceed 15 but if they do, there is a safety net which caps the value at 15. Add the migrate-disable counter to the trace entry so it shows up in the trace. Due to the users mentioned above, it is already possible to observe it: | bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B | bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50 | bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3) The last value is the migrate-disable counter. Things that popped up: - trace_print_lat_context() does not print the migrate counter. Not sure if it should. It is used in "verbose" mode and uses 8 digits and I'm not sure ther is something processing the value. - trace_define_common_fields() now defines a different variable. This probably breaks things. No ide what to do in order to preserve the old behaviour. Since this is used as a filter it should be split somehow to be able to match both nibbles here. Link: https://lkml.kernel.org/r/20210810132625.ylssabmsrkygokuv@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [bigeasy: patch description.] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ SDR: Removed change to common_preempt_count field name ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-01Merge tag 'printk-for-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-08-30Merge branch 'rework/printk_safe-removal' into for-linusPetr Mladek
2021-08-20tracing: Add a probe that attaches to trace eventsTzvetomir Stoyanov (VMware)
A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can specify custom format string of the new event, select what tracepoint arguments will be printed and how to print them. An event probe is created by writing configuration string in 'dynamic_events' ftrace file: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe -:SNAME/ENAME - Delete an event probe Where: SNAME - System name, if omitted 'eprobes' is used. ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used. SYSTEM - Name of the system, where the tracepoint is defined, mandatory. EVENT - Name of the tracepoint event in SYSTEM, mandatory. FETCHARGS - Arguments: <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print it as given TYPE with given name. Supported types are: (u8/u16/u32/u64/s8/s16/s32/s64), basic type (x8/x16/x32/x64), hexadecimal types "string", "ustring" and bitfield. Example, attach an event probe on openat system call and print name of the file that will be opened: echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events A new dynamic event is created in events/esys/eopen/ directory. It can be deleted with: echo "-:esys/eopen" >> dynamic_events Filters, triggers and histograms can be attached to the new event, it can be matched in synthetic events. There is one limitation - an event probe can not be attached to kprobe, uprobe or another event probe. Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-18tracing: Have dynamic events have a ref counterSteven Rostedt (VMware)
As dynamic events are not created by modules, if something is attached to one, calling "try_module_get()" on its "mod" field, is not going to keep the dynamic event from going away. Since dynamic events do not need the "mod" pointer of the event structure, make a union out of it in order to save memory (there's one structure for each of the thousand+ events in the kernel), and have any event with the DYNAMIC flag set to use a ref counter instead. Link: https://lore.kernel.org/linux-trace-devel/20210813004448.51c7de69ce432d338f4d226b@kernel.org/ Link: https://lkml.kernel.org/r/20210817035027.174869074@goodmis.org Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16tracing/histogram: Update the documentation for the buckets modifierSteven Rostedt (VMware)
Update both the tracefs README file as well as the histogram.rst to include an explanation of what the buckets modifier is and how to use it. Include an example with the wakeup_latency example for both log2 and the buckets modifiers as there was no existing log2 example. Link: https://lkml.kernel.org/r/20210707213922.167218794@goodmis.org Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-16tracing: Apply trace filters on all output channelsPingfan Liu
The event filters are not applied on all of the output, which results in the flood of printk when using tp_printk. Unfolding event_trigger_unlock_commit_regs() into trace_event_buffer_commit(), so the filters can be applied on every output. Link: https://lkml.kernel.org/r/20210814034538.8428-1-kernelfans@gmail.com Cc: stable@vger.kernel.org Fixes: 0daa2302968c1 ("tracing: Add tp_printk cmdline to have tracepoints go to printk()") Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-30tracing: Fix NULL pointer dereference in start_creatingKamal Agrawal
The event_trace_add_tracer() can fail. In this case, it leads to a crash in start_creating with below call stack. Handle the error scenario properly in trace_array_create_dir. Call trace: down_write+0x7c/0x204 start_creating.25017+0x6c/0x194 tracefs_create_file+0xc4/0x2b4 init_tracer_tracefs+0x5c/0x940 trace_array_create_dir+0x58/0xb4 trace_array_create+0x1bc/0x2b8 trace_array_get_by_name+0xdc/0x18c Link: https://lkml.kernel.org/r/1627651386-21315-1-git-send-email-kamaagra@codeaurora.org Cc: stable@vger.kernel.org Fixes: 4114fbfd02f1 ("tracing: Enable creating new instance early boot") Signed-off-by: Kamal Agrawal <kamaagra@codeaurora.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-26printk: remove NMI trackingJohn Ogness
All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
2021-07-23tracing/histogram: Rename "cpu" to "common_cpu"Steven Rostedt (VMware)
Currently the histogram logic allows the user to write "cpu" in as an event field, and it will record the CPU that the event happened on. The problem with this is that there's a lot of events that have "cpu" as a real field, and using "cpu" as the CPU it ran on, makes it impossible to run histograms on the "cpu" field of events. For example, if I want to have a histogram on the count of the workqueue_queue_work event on its cpu field, running: ># echo 'hist:keys=cpu' > events/workqueue/workqueue_queue_work/trigger Gives a misleading and wrong result. Change the command to "common_cpu" as no event should have "common_*" fields as that's a reserved name for fields used by all events. And this makes sense here as common_cpu would be a field used by all events. Now we can even do: ># echo 'hist:keys=common_cpu,cpu if cpu < 100' > events/workqueue/workqueue_queue_work/trigger ># cat events/workqueue/workqueue_queue_work/hist # event histogram # # trigger info: hist:keys=common_cpu,cpu:vals=hitcount:sort=hitcount:size=2048 if cpu < 100 [active] # { common_cpu: 0, cpu: 2 } hitcount: 1 { common_cpu: 0, cpu: 4 } hitcount: 1 { common_cpu: 7, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 7 } hitcount: 1 { common_cpu: 0, cpu: 1 } hitcount: 1 { common_cpu: 0, cpu: 6 } hitcount: 2 { common_cpu: 0, cpu: 5 } hitcount: 2 { common_cpu: 1, cpu: 1 } hitcount: 4 { common_cpu: 6, cpu: 6 } hitcount: 4 { common_cpu: 5, cpu: 5 } hitcount: 14 { common_cpu: 4, cpu: 4 } hitcount: 26 { common_cpu: 0, cpu: 0 } hitcount: 39 { common_cpu: 2, cpu: 2 } hitcount: 184 Now for backward compatibility, I added a trick. If "cpu" is used, and the field is not found, it will fall back to "common_cpu" and work as it did before. This way, it will still work for old programs that use "cpu" to get the actual CPU, but if the event has a "cpu" as a field, it will get that event's "cpu" field, which is probably what it wants anyway. I updated the tracefs/README to include documentation about both the common_timestamp and the common_cpu. This way, if that text is present in the README, then an application can know that common_cpu is supported over just plain "cpu". Link: https://lkml.kernel.org/r/20210721110053.26b4f641@oasis.local.home Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 8b7622bf94a44 ("tracing: Add cpu field for hist triggers") Reviewed-by: Tom Zanussi <zanussi@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-03Merge tag 'trace-v5.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Added option for per CPU threads to the hwlat tracer - Have hwlat tracer handle hotplug CPUs - New tracer: osnoise, that detects latency caused by interrupts, softirqs and scheduling of other tasks. - Added timerlat tracer that creates a thread and measures in detail what sources of latency it has for wake ups. - Removed the "success" field of the sched_wakeup trace event. This has been hardcoded as "1" since 2015, no tooling should be looking at it now. If one exists, we can revert this commit, fix that tool and try to remove it again in the future. - tgid mapping fixed to handle more than PID_MAX_DEFAULT pids/tgids. - New boot command line option "tp_printk_stop", as tp_printk causes trace events to write to console. When user space starts, this can easily live lock the system. Having a boot option to stop just after boot up is useful to prevent that from happening. - Have ftrace_dump_on_oops boot command line option take numbers that match the numbers shown in /proc/sys/kernel/ftrace_dump_on_oops. - Bootconfig clean ups, fixes and enhancements. - New ktest script that tests bootconfig options. - Add tracepoint_probe_register_may_exist() to register a tracepoint without triggering a WARN*() if it already exists. BPF has a path from user space that can do this. All other paths are considered a bug. - Small clean ups and fixes * tag 'trace-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (49 commits) tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT tracing: Simplify & fix saved_tgids logic treewide: Add missing semicolons to __assign_str uses tracing: Change variable type as bool for clean-up trace/timerlat: Fix indentation on timerlat_main() trace/osnoise: Make 'noise' variable s64 in run_osnoise() tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing tracing: Fix spelling in osnoise tracer "interferences" -> "interference" Documentation: Fix a typo on trace/osnoise-tracer trace/osnoise: Fix return value on osnoise_init_hotplug_support trace/osnoise: Make interval u64 on osnoise_main trace/osnoise: Fix 'no previous prototype' warnings tracing: Have osnoise_main() add a quiescent state for task rcu seq_buf: Make trace_seq_putmem_hex() support data longer than 8 seq_buf: Fix overflow in seq_buf_putmem_hex() trace/osnoise: Support hotplug operations trace/hwlat: Support hotplug operations trace/hwlat: Protect kdata->kthread with get/put_online_cpus trace: Add timerlat tracer trace: Add osnoise tracer ...
2021-07-01tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULTPaul Burton
Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that on systems where pid_max is configured higher than PID_MAX_DEFAULT the ftrace record-tgid option doesn't work so well. Any tasks with PIDs higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and don't show up in the saved_tgids file. In particular since systemd v243 & above configure pid_max to its highest possible 1<<22 value by default on 64 bit systems this renders the record-tgids option of little use. Increase the size of tgid_map to the configured pid_max instead, allowing it to cover the full range of PIDs up to the maximum value of PID_MAX_LIMIT if the system is configured that way. On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in memory overhead sounds significant 64 bit systems are presumably best placed to accommodate it, and since tgid_map is only allocated when the record-tgid option is actually used presumably the user would rather it spends sufficient memory to actually record the tgids they expect. The size of tgid_map could also increase for CONFIG_BASE_SMALL=y configurations, but these seem unlikely to be systems upon which people are both configuring a large pid_max and running ftrace with record-tgid anyway. Of note is that we only allocate tgid_map once, the first time that the record-tgid option is enabled. Therefore its size is only set once, to the value of pid_max at the time the record-tgid option is first enabled. If a user increases pid_max after that point, the saved_tgids file will not contain entries for any tasks with pids beyond the earlier value of pid_max. Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joelaf@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Burton <paulburton@google.com> [ Fixed comment coding style ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-01kernel.h: split out panic and oops helpersAndy Shevchenko
kernel.h is being used as a dump for all kinds of stuff for a long time. Here is the attempt to start cleaning it up by splitting out panic and oops helpers. There are several purposes of doing this: - dropping dependency in bug.h - dropping a loop by moving out panic_notifier.h - unload kernel.h from something which has its own domain At the same time convert users tree-wide to use new headers, although for the time being include new header back to kernel.h to avoid twisted indirected includes for existing users. [akpm@linux-foundation.org: thread_info.h needs limits.h] [andriy.shevchenko@linux.intel.com: ia64 fix] Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Wei Liu <wei.liu@kernel.org> Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30tracing: Simplify & fix saved_tgids logicPaul Burton
The tgid_map array records a mapping from pid to tgid, where the index of an entry within the array is the pid & the value stored at that index is the tgid. The saved_tgids_next() function iterates over pointers into the tgid_map array & dereferences the pointers which results in the tgid, but then it passes that dereferenced value to trace_find_tgid() which treats it as a pid & does a further lookup within the tgid_map array. It seems likely that the intent here was to skip over entries in tgid_map for which the recorded tgid is zero, but instead we end up skipping over entries for which the thread group leader hasn't yet had its own tgid recorded in tgid_map. A minimal fix would be to remove the call to trace_find_tgid, turning: if (trace_find_tgid(*ptr)) into: if (*ptr) ..but it seems like this logic can be much simpler if we simply let seq_read() iterate over the whole tgid_map array & filter out empty entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that approach, removing the incorrect logic here entirely. Link: https://lkml.kernel.org/r/20210630003406.4013668-1-paulburton@google.com Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joelaf@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Burton <paulburton@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-25tracing: Add LATENCY_FS_NOTIFY to define if latency_fsnotify() is definedSteven Rostedt (VMware)
With the coming addition of the osnoise tracer, the configs needed to include the latency_fsnotify() has become more complex, and to keep the declaration in the header file the same as in the C file, just have the logic needed to define it in one place, and that defines LATENCY_FS_NOTIFY which will be used in the C code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-25trace: Add a generic function to read/write u64 values from tracefsDaniel Bristot de Oliveira
The hwlat detector and (in preparation for) the osnoise/timerlat tracers have a set of u64 parameters that the user can read/write via tracefs. For instance, we have hwlat_detector's window and width. To reduce the code duplication, hwlat's window and width share the same read function. However, they do not share the write functions because they do different parameter checks. For instance, the width needs to be smaller than the window, while the window needs to be larger than the window. The same pattern repeats on osnoise/timerlat, and a large portion of the code was devoted to the write function. Despite having different checks, the write functions have the same structure: read a user-space buffer take the lock that protects the value check for minimum and maximum acceptable values save the value release the lock return success or error To reduce the code duplication also in the write functions, this patch provides a generic read and write implementation for u64 values that need to be within some minimum and/or maximum parameters, while (potentially) being protected by a lock. To use this interface, the structure trace_min_max_param needs to be filled: struct trace_min_max_param { struct mutex *lock; u64 *val; u64 *min; u64 *max; }; The desired value is stored on the variable pointed by *val. If *min points to a minimum acceptable value, it will be checked during the write operation. Likewise, if *max points to a maximum allowable value, it will be checked during the write operation. Finally, if *lock points to a mutex, it will be taken at the beginning of the operation and released at the end. The definition of a trace_min_max_param needs to passed as the (private) *data for tracefs_create_file(), and the trace_min_max_fops (added by this patch) as the *fops file_operations. Link: https://lkml.kernel.org/r/3e35760a7c8b5c55f16ae5ad5fc54a0e71cbe647.1624372313.git.bristot@redhat.com Cc: Phil Auld <pauld@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Kate Carcia <kcarcia@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexandre Chartre <alexandre.chartre@oracle.com> Cc: Clark Willaims <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18tracing: Do not stop recording comms if the trace file is being readSteven Rostedt (VMware)
A while ago, when the "trace" file was opened, tracing was stopped, and code was added to stop recording the comms to saved_cmdlines, for mapping of the pids to the task name. Code has been added that only records the comm if a trace event occurred, and there's no reason to not trace it if the trace file is opened. Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-18tracing: Do not stop recording cmdlines when tracing is offSteven Rostedt (VMware)
The saved_cmdlines is used to map pids to the task name, such that the output of the tracing does not just show pids, but also gives a human readable name for the task. If the name is not mapped, the output looks like this: <...>-1316 [005] ...2 132.044039: ... Instead of this: gnome-shell-1316 [005] ...2 132.044039: ... The names are updated when tracing is running, but are skipped if tracing is stopped. Unfortunately, this stops the recording of the names if the top level tracer is stopped, and not if there's other tracers active. The recording of a name only happens when a new event is written into a ring buffer, so there is no need to test if tracing is on or not. If tracing is off, then no event is written and no need to test if tracing is off or not. Remove the check, as it hides the names of tasks for events in the instance buffers. Cc: stable@vger.kernel.org Fixes: 7ffbd48d5cab2 ("tracing: Cache comms only after an event occurred") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-17tracing: Have ftrace_dump_on_oops kernel parameter take numbersSteven Rostedt (VMware)
The kernel parameter for ftrace_dump_on_oops can take a single assignment. That is, it can be: ftrace_dump_on_oops or ftrace_dump_on_oops=orig_cpu But the content in the sysctl file is a number. 0 for disabled 1 for ftrace_dump_on_oops (all CPUs) 2 for ftrace_dump_on_oops (orig CPU) Allow the kernel command line to take a number as well to match the sysctl numbers. That is: ftrace_dump_on_oops=1 is the same as ftrace_dump_on_oops and ftrace_dump_on_oops=2 is the same as ftrace_dump_on_oops=orig_cpu Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-17tracing: Add tp_printk_stop_on_boot optionSteven Rostedt (VMware)
Add a kernel command line option that disables printing of events to console at late_initcall_sync(). This is useful when needing to see specific events written to console on boot up, but not wanting it when user space starts, as user space may make the console so noisy that the system becomes inoperable. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10tracing: Add better comments for the filtering temp buffer use caseSteven Rostedt (VMware)
When filtering is enabled, the event is copied into a temp buffer instead of being written into the ring buffer directly, because the discarding of events from the ring buffer is very expensive, and doing the extra copy is much faster than having to discard most of the time. As that logic is subtle, add comments to explain in more detail to what is going on and how it works. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10tracing: Simplify the max length test when using the filtering temp bufferSteven Rostedt (VMware)
When filtering trace events, a temp buffer is used because the extra copy from the temp buffer into the ring buffer is still faster than the direct write into the ring buffer followed by a discard if the filter does not match. But the data that can be stored in the temp buffer is a PAGE_SIZE minus the ring buffer event header. The calculation of that header size is complex, but using the helper macro "struct_size()" can simplify it. Link: https://lore.kernel.org/stable/CAHk-=whKbJkuVmzb0hD3N6q7veprUrSpiBHRxVY=AffWZPtxmg@mail.gmail.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-10tracing: Remove redundant initialization of variable retColin Ian King
The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lkml.kernel.org/r/20210513115517.58178-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-08tracing: Correct the length check which causes memory corruptionLiangyan
We've suffered from severe kernel crashes due to memory corruption on our production environment, like, Call Trace: [1640542.554277] general protection fault: 0000 [#1] SMP PTI [1640542.554856] CPU: 17 PID: 26996 Comm: python Kdump: loaded Tainted:G [1640542.556629] RIP: 0010:kmem_cache_alloc+0x90/0x190 [1640542.559074] RSP: 0018:ffffb16faa597df8 EFLAGS: 00010286 [1640542.559587] RAX: 0000000000000000 RBX: 0000000000400200 RCX: 0000000006e931bf [1640542.560323] RDX: 0000000006e931be RSI: 0000000000400200 RDI: ffff9a45ff004300 [1640542.560996] RBP: 0000000000400200 R08: 0000000000023420 R09: 0000000000000000 [1640542.561670] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9a20608d [1640542.562366] R13: ffff9a45ff004300 R14: ffff9a45ff004300 R15: 696c662f65636976 [1640542.563128] FS: 00007f45d7c6f740(0000) GS:ffff9a45ff840000(0000) knlGS:0000000000000000 [1640542.563937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1640542.564557] CR2: 00007f45d71311a0 CR3: 000000189d63e004 CR4: 00000000003606e0 [1640542.565279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1640542.566069] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1640542.566742] Call Trace: [1640542.567009] anon_vma_clone+0x5d/0x170 [1640542.567417] __split_vma+0x91/0x1a0 [1640542.567777] do_munmap+0x2c6/0x320 [1640542.568128] vm_munmap+0x54/0x70 [1640542.569990] __x64_sys_munmap+0x22/0x30 [1640542.572005] do_syscall_64+0x5b/0x1b0 [1640542.573724] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [1640542.575642] RIP: 0033:0x7f45d6e61e27 James Wang has reproduced it stably on the latest 4.19 LTS. After some debugging, we finally proved that it's due to ftrace buffer out-of-bound access using a debug tool as follows: [ 86.775200] BUG: Out-of-bounds write at addr 0xffff88aefe8b7000 [ 86.780806] no_context+0xdf/0x3c0 [ 86.784327] __do_page_fault+0x252/0x470 [ 86.788367] do_page_fault+0x32/0x140 [ 86.792145] page_fault+0x1e/0x30 [ 86.795576] strncpy_from_unsafe+0x66/0xb0 [ 86.799789] fetch_memory_string+0x25/0x40 [ 86.804002] fetch_deref_string+0x51/0x60 [ 86.808134] kprobe_trace_func+0x32d/0x3a0 [ 86.812347] kprobe_dispatcher+0x45/0x50 [ 86.816385] kprobe_ftrace_handler+0x90/0xf0 [ 86.820779] ftrace_ops_assist_func+0xa1/0x140 [ 86.825340] 0xffffffffc00750bf [ 86.828603] do_sys_open+0x5/0x1f0 [ 86.832124] do_syscall_64+0x5b/0x1b0 [ 86.835900] entry_SYSCALL_64_after_hwframe+0x44/0xa9 commit b220c049d519 ("tracing: Check length before giving out the filter buffer") adds length check to protect trace data overflow introduced in 0fc1b09ff1ff, seems that this fix can't prevent overflow entirely, the length check should also take the sizeof entry->array[0] into account, since this array[0] is filled the length of trace data and occupy addtional space and risk overflow. Link: https://lkml.kernel.org/r/20210607125734.1770447-1-liangyan.peng@linux.alibaba.com Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Xunlei Pang <xlpang@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: b220c049d519 ("tracing: Check length before giving out the filter buffer") Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: yinbinbin <yinbinbin@alibabacloud.com> Reviewed-by: Wetp Zhang <wetp.zy@linux.alibaba.com> Tested-by: James Wang <jnwang@linux.alibaba.com> Signed-off-by: Liangyan <liangyan.peng@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-05-13tracing: Handle %.*s in trace_check_vprintf()Steven Rostedt (VMware)
If a trace event uses the %*.s notation, the trace_check_vprintf() will fail and will warn about a bad processing of strings, because it does not take into account the length field when processing the star (*) part. Have it handle this case as well. Link: https://lore.kernel.org/linux-nfs/238C0E2D-C2A4-4578-ADD2-C565B3B99842@oracle.com/ Reported-by: Chuck Lever III <chuck.lever@oracle.com> Fixes: 9a6944fee68e2 ("tracing: Add a verifier to check string pointers for trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-05-03Merge tag 'trace-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New feature: - A new "func-no-repeats" option in tracefs/options directory. When set the function tracer will detect if the current function being traced is the same as the previous one, and instead of recording it, it will keep track of the number of times that the function is repeated in a row. And when another function is recorded, it will write a new event that shows the function that repeated, the number of times it repeated and the time stamp of when the last repeated function occurred. Enhancements: - In order to implement the above "func-no-repeats" option, the ring buffer timestamp can now give the accurate timestamp of the event as it is being recorded, instead of having to record an absolute timestamp for all events. This helps the histogram code which no longer needs to waste ring buffer space. - New validation logic to make sure all trace events that access dereferenced pointers do so in a safe way, and will warn otherwise. Fixes: - No longer limit the PIDs of tasks that are recorded for "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger range. This caused the mapping of PIDs to the task names to be dropped for all tasks with a PID greater than 32768. - Change trace_clock_global() to never block. This caused a deadlock. Clean ups: - Typos, prototype fixes, and removing of duplicate or unused code. - Better management of ftrace_page allocations" * tag 'trace-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (32 commits) tracing: Restructure trace_clock_global() to never block tracing: Map all PIDs to command lines ftrace: Reuse the output of the function tracer for func_repeats tracing: Add "func_no_repeats" option for function tracing tracing: Unify the logic for function tracing options tracing: Add method for recording "func_repeats" events tracing: Add "last_func_repeats" to struct trace_array tracing: Define new ftrace event "func_repeats" tracing: Define static void trace_print_time() ftrace: Simplify the calculation of page number for ftrace_page->records some more ftrace: Store the order of pages allocated in ftrace_page tracing: Remove unused argument from "ring_buffer_time_stamp() tracing: Remove duplicate struct declaration in trace_events.h tracing: Update create_system_filter() kernel-doc comment tracing: A minor cleanup for create_system_filter() kernel: trace: Mundane typo fixes in the file trace_events_filter.c tracing: Fix various typos in comments scripts/recordmcount.pl: Make vim and emacs indent the same scripts/recordmcount.pl: Make indent spacing consistent tracing: Add a verifier to check string pointers for trace events ...
2021-04-28tracing: Map all PIDs to command linesSteven Rostedt (VMware)
The default max PID is set by PID_MAX_DEFAULT, and the tracing infrastructure uses this number to map PIDs to the comm names of the tasks, such output of the trace can show names from the recorded PIDs in the ring buffer. This mapping is also exported to user space via the "saved_cmdlines" file in the tracefs directory. But currently the mapping expects the PIDs to be less than PID_MAX_DEFAULT, which is the default maximum and not the real maximum. Recently, systemd will increases the maximum value of a PID on the system, and when tasks are traced that have a PID higher than PID_MAX_DEFAULT, its comm is not recorded. This leads to the entire trace to have "<...>" as the comm name, which is pretty useless. Instead, keep the array mapping the size of PID_MAX_DEFAULT, but instead of just mapping the index to the comm, map a mask of the PID (PID_MAX_DEFAULT - 1) to the comm, and find the full PID from the map_cmdline_to_pid array (that already exists). This bug goes back to the beginning of ftrace, but hasn't been an issue until user space started increasing the maximum value of PIDs. Link: https://lkml.kernel.org/r/20210427113207.3c601884@gandalf.local.home Cc: stable@vger.kernel.org Fixes: bc0c38d139ec7 ("ftrace: latency tracer infrastructure") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>