| Age | Commit message (Collapse) | Author |
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
# Conflicts:
# include/uapi/linux/kvm.h
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
|
|
to resolve the conflict with the urgent clockevents fixes.
|
|
When an unqualified kprobe target exists in both vmlinux and a loaded
module, number_of_same_symbols() returns a count greater than 1,
causing kprobe attachment to fail with -EADDRNOTAVAIL even though the
vmlinux symbol is unambiguous.
When no module qualifier is given and the symbol is found in vmlinux,
return the vmlinux-only count without scanning loaded modules. This
preserves the existing behavior for all other cases:
- Symbol only in a module: vmlinux count is 0, falls through to module
scan as before.
- Symbol qualified with MOD:SYM: mod != NULL, unchanged path.
- Symbol ambiguous within vmlinux itself: count > 1 is returned as-is.
Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well")
Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")
Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Link: https://lore.kernel.org/r/20260407203912.1787502-2-andrey.grodzovsky@crowdstrike.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
|
|
|
|
|
|
|
|
When the persistent ring buffer was first introduced, it did not make
sense to start tracing for it on the kernel command line. That's because
if there was a crash, the start of events would invalidate the events from
the previous boot that had the crash.
But now that there's a "backup" instance that can take a snapshot of the
persistent ring buffer when boot starts, it is possible to have the
persistent ring buffer start events at boot up and not lose the old events.
Update the code where the boot events start after all boot time instances
are created. This will allow the backup instance to copy the persistent
ring buffer from the previous boot, and allow the persistent ring buffer
to start tracing new events for the current boot.
reserve_mem=100M:12M:trace trace_instance=boot_mapped^@trace,sched trace_instance=backup=boot_mapped
The above will create a boot_mapped persistent ring buffer and enabled the
scheduler events. If there's a crash, a "backup" instance will be created
holding the events of the persistent ring buffer from the previous boot,
while the persistent ring buffer will once again start tracing scheduler
events of the current boot.
Now the user doesn't have to remember to start the persistent ring buffer.
It will always have the events started at each boot.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260331163924.6ccb3896@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the backup instance is readonly, after reading all data via pipe, no
data is left on the instance. Thus it can be removed safely after closing
all files. This also removes it if user resets the ring buffer manually
via 'trace' file.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502547711.1311542.12572973358010839400.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since there is no reason to reuse the backup instance, make it readonly
(but erasable). Note that only backup instances are readonly, because
other trace instances will be empty unless it is writable. Only backup
instances have copy entries from the original.
With this change, most of the trace control files are removed from the
backup instance, including eventfs enable/filter etc.
# find /sys/kernel/tracing/instances/backup/events/ | wc -l
4093
# find /sys/kernel/tracing/instances/boot_map/events/ | wc -l
9573
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/177502546939.1311542.1826814401724828930.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a
ring_buffer_per_cpu will be allocated and its corresponding bit toggled
in the cpumask. Many readers check this cpumask to know if they can
safely read the ring_buffer_per_cpu but they are doing so without memory
ordering and may observe the cpumask bit set while having NULL buffer
pointer.
Enforce the memory read ordering by sending an IPI to all online CPUs.
The hotplug path is a slow-path anyway and it saves us from adding read
barriers in numerous call sites.
Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
kprobe.multi programs run in atomic/RCU context and cannot sleep.
However, bpf_kprobe_multi_link_attach() did not validate whether the
program being attached had the sleepable flag set, allowing sleepable
helpers such as bpf_copy_from_user() to be invoked from a non-sleepable
context.
This causes a "sleeping function called from invalid context" splat:
BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 0
Fix this by rejecting sleepable programs early in
bpf_kprobe_multi_link_attach(), before any further processing.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Varun R Mallya <varunrmallya@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Leon Hwang <leon.hwang@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260401191126.440683-1-varunrmallya@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a trace_buffer is created while a CPU is offline, this CPU is
cleared from the trace_buffer CPU mask, preventing the creation of a
non-consuming iterator (ring_buffer_iter). For trace remotes, it means
the iterator fails to be allocated (-ENOMEM) even though there are
available ring buffers in the trace_buffer.
For non-consuming reads of trace remotes, skip missing ring_buffer_iter
to allow reading the available ring buffers.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://patch.msgid.link/20260401045100.3394299-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
use kallsyms_lookup_name() for O(log N) binary search instead of the
full linear scan via kallsyms_on_each_symbol().
ftrace_lookup_symbols() was designed for batch resolution of many
symbols in a single pass. For large cnt this is efficient: a single
O(N) walk over all symbols with O(log cnt) binary search into the
sorted input array. But for cnt == 1 it still decompresses all ~200K
kernel symbols only to match one.
kallsyms_lookup_name() uses the sorted kallsyms index and needs only
~17 decompressions for a single lookup.
This is the common path for kprobe.session with exact function names,
where libbpf sends one symbol per BPF_LINK_CREATE syscall.
If binary lookup fails (duplicate symbol names where the first match
is not ftrace-instrumented), the function falls through to the existing
linear scan path.
Before (cnt=1, 50 kprobe.session programs):
Attach: 858 ms (kallsyms_expand_symbol 25% of CPU)
After:
Attach: 52 ms (16x faster)
Cc: <bpf@vger.kernel.org>
Link: https://patch.msgid.link/20260302200837.317907-3-andrey.grodzovsky@crowdstrike.com
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the SNAPSHOT is defined but FSNOTIFY is not the latency_fsnotify()
function is turned into a static inline stub. But this stub was defined in
both trace.h and trace_snapshot.c causing a error in build when
CONFIG_SNAPSHOT is defined but FSNOTIFY is not. The stub is not needed in
trace_snapshot.c as it will be defined in trace.h, remove it from the C
file.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260330205859.24c0aae3@gandalf.local.home
Fixes: bade44fe5462 ("tracing: Move snapshot code out of trace.c and into trace_snapshot.c")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603310604.lGE9LDBK-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace_trigger= tokenizes bootup_trigger_buf in place and stores pointers
into that buffer for later trigger registration. Repeated trace_trigger=
parameters overwrite the buffer contents from earlier calls, leaving
only the last set of parsed event and trigger strings.
Keep each new trace_trigger= string at the end of bootup_trigger_buf and
parse only the appended range. That preserves the earlier event and
trigger strings while still letting repeated parameters queue additional
boot-time triggers.
This also lets Bootconfig array values work naturally when they expand
to repeated trace_trigger= entries.
Before this change, only the last trace_trigger= instance survived boot.
Link: https://patch.msgid.link/20260330181103.1851230-2-atwellwea@gmail.com
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Some tracing boot parameters already accept delimited value lists, but
their __setup() handlers keep only the last instance seen at boot.
Make repeated instances append to the same boot-time buffer in the
format each parser already consumes.
Use a shared trace_append_boot_param() helper for the ftrace filters,
trace_options, and kprobe_event boot parameters.
This also lets Bootconfig array values work naturally when they expand
to repeated param=value entries.
Before this change, only the last instance from each repeated
parameter survived boot.
Link: https://patch.msgid.link/20260330181103.1851230-1-atwellwea@gmail.com
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
formats
Change 2d8b7f9bf8e6e ("tracing: Have show_event_trigger/filter format a bit more in columns")
added space padding to align the output.
However it used ("%*.s", len, "") which requests the default precision.
It doesn't matter here whether the userspace default (0) or kernel
default (no precision) is used, but the format should be "%*s".
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260326201824.3919-1-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function tracing_alloc_snapshot() is only used between trace.c and
trace_snapshot.c. When snapshot isn't configured, it's not used at all.
The stub function was defined as a global with no users and no prototype
causing build issues.
Remove the function when snapshot isn't configured as nothing is calling
it.
Also remove the EXPORT_SYMBOL_GPL() that was associated with it as it's
not used outside of the tracing subsystem which also includes any modules.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260328101946.2c4ef4a5@robin
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/acb-IuZ4vDkwwQLW@sirena.co.uk/
Fixes: bade44fe546212 (tracing: Move snapshot code out of trace.c and into trace_snapshot.c)
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Boot-time trigger registration can fail before the trigger-data cleanup
kthread exists. Deferring those frees until late init is fine, but the
post-boot fallback must still drain the deferred list if kthread
creation never succeeds.
Otherwise, boot-deferred nodes can accumulate on
trigger_data_free_list, later frees fall back to synchronously freeing
only the current object, and the older queued entries are leaked
forever.
To trigger this, add the following to the kernel command line:
trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon
The second traceon trigger will fail and be freed. This triggers a NULL
pointer dereference and crashes the kernel.
Keep the deferred boot-time behavior, but when kthread creation fails,
drain the whole queued list synchronously. Do the same in the late-init
drain path so queued entries are not stranded there either.
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com
Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The following sequence may leads deadlock in cpu hotplug:
task1 task2 task3
----- ----- -----
mutex_lock(&interface_lock)
[CPU GOING OFFLINE]
cpus_write_lock();
osnoise_cpu_die();
kthread_stop(task3);
wait_for_completion();
osnoise_sleep();
mutex_lock(&interface_lock);
cpus_read_lock();
[DEAD LOCK]
Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock).
Cc: stable@vger.kernel.org
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.tao172@zte.com.cn>
Cc: <ran.xiaokai@zte.com.cn>
Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer")
Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Conflicts:
kernel/trace/trace.c
|
|
The trace.c file was a dumping ground for most tracing code. Start
organizing it better by moving various functions out into their own files.
Move all the snapshot code, including the max trace code into its own
trace_snapshot.c file.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260324140145.36352d6a@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The testing for tracing was triggering a timestamp count issue that was
always off by one. This has been happening for some time but has never
been reported by anyone else. It was finally discovered to be an issue
with the "uptime" (jiffies) clock that happened to be traced and the
internal recursion caused the discrepancy. This would have been much
easier to solve if the clock function being used was displayed when the
error was detected.
Add the clock function to the error output.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace/ring-buffer/core
The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of
persistent ring buffer") was a fix and merged upstream. It is needed for
some other work in the ring buffer. The current branch has the remote
buffer code that is shared with the Arm64 subsystem and can't be rebased.
Merge in the upstream commit to allow continuing of the ring buffer work.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If fprobe_entry does not fill the allocated fgraph_data completely, the
unused part does not have to be zeroed.
fgraph_data is a short-lived part of the shadow stack. The preceding
length field allows locating the end regardless of the content.
Link: https://lore.kernel.org/all/20260324084804.375764-1-martin@kaiser.cx/
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Currently, print_function_args() prints enum parameter values
in decimal format, reducing trace log readability.
Use BTF information to resolve enum parameters and print their
symbolic names (where available). This improves readability by
showing meaningful identifiers instead of raw numbers.
Before:
mod_memcg_lruvec_state(lruvec=0xffff..., idx=5, val=320)
After:
mod_memcg_lruvec_state(lruvec=0xffff..., idx=5 [NR_SLAB_RECLAIMABLE_B], val=320)
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260209071949.4040193-1-dolinux.peng@gmail.com
Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing_open_file_tr() function currently copies the trace_event_file
pointer from inode->i_private to file->private_data when the file is
successfully opened. This duplication is not particularly useful, as all
event code should utilize event_file_file() or event_file_data() to
retrieve a trace_event_file pointer from a file struct and these access
functions read file->f_inode->i_private. Moreover, this setup requires the
code for opening hist files to explicitly clear file->private_data before
calling single_open(), since this function expects the private_data member
to be set to NULL and uses it to store a pointer to a seq_file.
Remove the unnecessary setting of file->private_data in
tracing_open_file_tr() and simplify the hist code.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-6-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing code provides two functions event_file_file() and
event_file_data() to obtain a trace_event_file pointer from a file struct.
The primary method to use is event_file_file(), as it checks for the
EVENT_FILE_FL_FREED flag to determine whether the event is being removed.
The second function event_file_data() is an optimization for retrieving the
same data when the event_mutex is still held.
In the past, when removing an event directory in remove_event_file_dir(),
the code set i_private to NULL for all event files and readers were
expected to check for this state to recognize that the event is being
removed. In the case of event_id_read(), the value was read using
event_file_data() without acquiring the event_mutex. This required
event_file_data() to use READ_ONCE() when retrieving the i_private data.
With the introduction of eventfs, i_private is assigned when an eventfs
inode is allocated and remains set throughout its lifetime.
Remove the now unnecessary READ_ONCE() access to i_private in both
event_file_file() and event_file_data(). Inline the access to i_private in
remove_event_file_dir(), which allows event_file_data() to handle i_private
solely as a trace_event_file pointer. Add a check in event_file_data() to
ensure that the event_mutex is held and that file->flags doesn't have the
EVENT_FILE_FL_FREED flag set. Finally, move event_file_data() immediately
after event_file_code() since the latter provides a comment explaining how
both functions should be used together.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-5-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The event_filter_write() function calls event_file_file() to retrieve
a trace_event_file associated with a given file struct. If a non-NULL
pointer is returned, the function then checks whether the trace_event_file
instance has the EVENT_FILE_FL_FREED flag set. This check is redundant
because event_file_file() already performs this validation and returns NULL
if the flag is set. The err value is also already initialized to -ENODEV.
Remove the unnecessary check for EVENT_FILE_FL_FREED in
event_filter_write().
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-4-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The sunrpc change to use trace_printk() for debugging caused
a new warning for every instance of dprintk() in some configurations,
when -Wformat-security is enabled:
fs/nfs/getroot.c: In function 'nfs_get_root':
fs/nfs/getroot.c:90:17: error: format not a string literal and no format arguments [-Werror=format-security]
90 | nfs_errorf(fc, "NFS: Couldn't getattr on root");
I've been slowly chipping away at those warnings over time with the
intention of enabling them by default in the future. While I could not
figure out why this only happens for this one instance, I see that the
__trace_bprintk() function is always called with a local variable as
the format string, rather than a literal.
Move the __printf(2,3) annotation on this function from the declaration
to the caller. As this is can only be validated for literals, the
attribute on the declaration causes the warnings every time, but
removing it entirely introduces a new warning on the __ftrace_vbprintk()
definition.
The format strings still get checked because the underlying literal keeps
getting passed into __trace_printk() in the "else" branch, which is not
taken but still evaluated for compile-time warnings.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260203164545.3174910-1-arnd@kernel.org
Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer")
Acked-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the check_undefined command in kernel/trace/Makefile fails, there
is no output, making it hard to understand why the build failed. Capture
the output of the $(NM) + grep command and print it when failing to make
it clearer what the problem is.
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260320-cmd_check_undefined-verbose-v1-1-54fc5b061f94@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger
the slow path for all direct callers to be able to safely modify attached
addresses.
At the moment we use ops->func_hash for tmp_ops filter, which represents all
the systems attachments. It's faster to use just the passed hash filter, which
contains only the modified sites and is always a subset of the ops->func_hash.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Cc: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org
Fixes: e93672f770d7 ("ftrace: Add update_ftrace_direct_mod function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the "copy_trace_marker" option is enabled for an instance, anything
written into /sys/kernel/tracing/trace_marker is also copied into that
instances buffer. When the option is set, that instance's trace_array
descriptor is added to the marker_copies link list. This list is protected
by RCU, as all iterations uses an RCU protected list traversal.
When the instance is deleted, all the flags that were enabled are cleared.
This also clears the copy_trace_marker flag and removes the trace_array
descriptor from the list.
The issue is after the flags are called, a direct call to
update_marker_trace() is performed to clear the flag. This function
returns true if the state of the flag changed and false otherwise. If it
returns true here, synchronize_rcu() is called to make sure all readers
see that its removed from the list.
But since the flag was already cleared, the state does not change and the
synchronization is never called, leaving a possible UAF bug.
Move the clearing of all flags below the updating of the copy_trace_marker
option which then makes sure the synchronization is performed.
Also use the flag for checking the state in update_marker_trace() instead
of looking at if the list is empty.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home
Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.
There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:
pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.
Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.
To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Resolve conflict between this change in the upstream kernel:
4c652a47722f ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline")
... and this pending change in timers/core:
0e98eb14814e ("entry: Prepare for deferred hrtimer rearming")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Restore the SPDX tag that was accidentally dropped.
Fixes: 7e4b6c94300e3 ("tracing: add more symbols to whitelist")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/20260317194252.1890568-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Compiler and tooling-generated symbols are difficult to maintain
across all supported architectures. Make the allowlist more robust by
replacing the harcoded list with a mechanism that automatically detects
these symbols.
This mechanism generates a C function designed to trigger common
compiler-inserted symbols.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260316092845.3367411-1-vdonnefort@google.com
[maz: added __msan prefix to allowlist as pointed out by Arnd]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Randconfig builds show a number of cryptic build errors from
hitting undefined symbols in simple_ring_buffer.o:
make[7]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:147: kernel/trace/simple_ring_buffer.o.checked] Error 1
These happen with CONFIG_TRACE_BRANCH_PROFILING, CONFIG_KASAN_HW_TAGS,
CONFIG_STACKPROTECTOR, CONFIG_DEBUG_IRQFLAGS and indirectly from WARN_ON().
Add exceptions for each one that I have hit so far on arm64, x86_64 and arm
randconfig builds.
Other architectures likely hit additional ones, so it would be nice
to produce a little more verbose output that include the name of the
missing symbols directly.
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260312123601.625063-2-arnd@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Undefined symbols are not allowed for simple_ring_buffer.c. But some
compiler emitted symbols are missing in the allowlist. Update it.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Closes: https://lore.kernel.org/all/20260311221816.GA316631@ax162/
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/20260312113535.2213350-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The sentinel value added by the wrapper macros __print_symbolic() et al
prevents the callers from adding their own trailing comma. This makes
constructing symbol list dynamically based on kconfig values tedious.
Drop the sentinel elements, so callers can either specify the trailing
comma or not, just like in regular array initializers.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-2-095357392669@linutronix.de
|
|
The simple_ring_buffer implementation must remain simple enough to be
used by the pKVM hypervisor. Prevent the object build if unresolved
symbols are found.
Link: https://patch.msgid.link/20260309162516.2623589-19-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add load/unload callback used for each admitted page in the ring-buffer.
This will be later useful for the pKVM hypervisor which uses a different
VA space and need to dynamically map/unmap the ring-buffer pages.
Link: https://patch.msgid.link/20260309162516.2623589-18-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a module to help testing the tracefs support for trace remotes. This
module:
* Use simple_ring_buffer to write into a ring-buffer.
* Declare a single "selftest" event that can be triggered from
user-space.
* Register a "test" trace remote.
This is intended to be used by trace remote selftests.
Link: https://patch.msgid.link/20260309162516.2623589-15-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a simple implementation of the kernel ring-buffer. This intends to
be used later by ring-buffer remotes such as the pKVM hypervisor, hence
the need for a cut down version (write only) without any dependency.
Link: https://patch.msgid.link/20260309162516.2623589-14-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In preparation for allowing the writing of ring-buffer compliant pages
outside of ring_buffer.c, move buffer_data_page and timestamps encoding
macros into the publicly available ring_buffer_types.h.
Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Just like for the kernel events directory, add 'enable', 'header_page'
and 'header_event' at the root of the trace remote events/ directory.
Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|