| Age | Commit message (Collapse) | Author |
|
commit 78041c0c9e935d9ce4086feeff6c569ed88ddfd4 upstream.
The tracing seftests checks various aspects of the tracing infrastructure,
and one is filtering. If trace_printk() is active during a self test, it can
cause the filtering to fail, which will disable that part of the trace.
To keep the selftests from failing because of trace_printk() calls,
trace_printk() checks the variable tracing_selftest_running, and if set, it
does not write to the tracing buffer.
As some tracers were registered earlier in boot, the selftest they triggered
would fail because not all the infrastructure was set up for the full
selftest. Thus, some of the tests were post poned to when their
infrastructure was ready (namely file system code). The postpone code did
not set the tracing_seftest_running variable, and could fail if a
trace_printk() was added and executed during their run.
Cc: stable@vger.kernel.org
Fixes: 9afecfbb95198 ("tracing: Postpone tracer start-up tests till the system is more robust")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 6722b23e7a2ace078344064a9735fb73e554e9ef ]
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
Without patch:
# dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
# Available triggers:
# traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
6+1 records in
6+1 records out
206 bytes copied, 0.00027916 s, 738 kB/s
Notice the printing of "# Available triggers:..." after the line.
With the patch:
# dd bs=30 skip=1 if=/sys/kernel/tracing/events/sched/sched_switch/trigger
dd: /sys/kernel/tracing/events/sched/sched_switch/trigger: cannot skip to specified offset
n traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist
2+1 records in
2+1 records out
88 bytes copied, 0.000526867 s, 167 kB/s
It only prints the end of the file, and does not restart.
Link: http://lkml.kernel.org/r/3c35ee24-dd3a-8119-9c19-552ed253388a@virtuozzo.com
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e4075e8bdffd93a9b6d6e1d52fabedceeca5a91b ]
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
Without patch:
# dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
id
no pid
2+1 records in
2+1 records out
10 bytes copied, 0.000213285 s, 46.9 kB/s
Notice the "id" followed by "no pid".
With the patch:
# dd bs=4 skip=1 if=/sys/kernel/tracing/set_ftrace_pid
dd: /sys/kernel/tracing/set_ftrace_pid: cannot skip to specified offset
id
0+1 records in
0+1 records out
3 bytes copied, 0.000202112 s, 14.8 kB/s
Notice that it only prints "id" and not the "no pid" afterward.
Link: http://lkml.kernel.org/r/4f87c6ad-f114-30bb-8506-c32274ce2992@virtuozzo.com
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit dfb6cd1e654315168e36d947471bd2a0ccd834ae ]
Looking through old emails in my INBOX, I came across a patch from Luis
Henriques that attempted to fix a race of two stat tracers registering the
same stat trace (extremely unlikely, as this is done in the kernel, and
probably doesn't even exist). The submitted patch wasn't quite right as it
needed to deal with clean up a bit better (if two stat tracers were the
same, it would have the same files).
But to make the code cleaner, all we needed to do is to keep the
all_stat_sessions_mutex held for most of the registering function.
Link: http://lkml.kernel.org/r/1410299375-20068-1-git-send-email-luis.henriques@canonical.com
Fixes: 002bb86d8d42f ("tracing/ftrace: separate events tracing and stats tracing engine")
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit afccc00f75bbbee4e4ae833a96c2d29a7259c693 ]
tracing_stat_init() was always returning '0', even on the error paths. It
now returns -ENODEV if tracing_init_dentry() fails or -ENOMEM if it fails
to created the 'trace_stat' debugfs directory.
Link: http://lkml.kernel.org/r/1410299381-20108-1-git-send-email-luis.henriques@canonical.com
Fixes: ed6f1c996bfe4 ("tracing: Check return value of tracing_init_dentry()")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
[ Pulled from the archeological digging of my INBOX ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 54a16ff6f2e50775145b210bcd94d62c3c2af117 ]
As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.
Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72
Cc: stable@vger.kernel.org
Fixes: b9b0c831bed26 ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 16052dd5bdfa16dbe18d8c1d4cde2ddab9d23177 ]
Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit fd0e6852c407dd9aefc594f54ddcc21d84803d3b ]
Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison
Use rcu_dereference_protected to dereference the newly annotated pointer.
Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 24a9729f831462b1d9d61dc85ecc91c59037243f ]
Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison
Use rcu_dereference_protected to access the __rcu annotated pointer.
Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 64ae572bc7d0060429e40e1c8d803ce5eb31a0d6 upstream.
Reading the sched_cmdline_ref and sched_tgid_ref initial state within
tracing_start_sched_switch without holding the sched_register_mutex is
racy against concurrent updates, which can lead to tracepoint probes
being registered more than once (and thus trigger warnings within
tracepoint.c).
[ May be the fix for this bug ]
Link: https://lore.kernel.org/r/000000000000ab6f84056c786b93@google.com
Link: http://lkml.kernel.org/r/20190817141208.15226-1-mathieu.desnoyers@efficios.com
Cc: stable@vger.kernel.org
CC: Steven Rostedt (VMware) <rostedt@goodmis.org>
CC: Joel Fernandes (Google) <joel@joelfernandes.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Paul E. McKenney <paulmck@linux.ibm.com>
Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b8299d362d0837ae39e87e9019ebe6b736e0f035 upstream.
On some archs with some configurations, MCOUNT_INSN_SIZE is not defined, and
this makes the stack tracer fail to compile. Just define it to zero in this
case.
Link: https://lore.kernel.org/r/202001020219.zvE3vsty%lkp@intel.com
Cc: stable@vger.kernel.org
Fixes: 4df297129f622 ("tracing: Remove most or all of stack tracer stack size from stack_max_size")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
sched_migrate_task fail
commit 50f9ad607ea891a9308e67b81f774c71736d1098 upstream.
In the function, if register_trace_sched_migrate_task() returns error,
sched_switch/sched_wakeup_new/sched_wakeup won't unregister. That is
why fail_deprobe_sched_switch was added.
Link: http://lkml.kernel.org/r/20191231133530.2794-1-pilgrimtao@gmail.com
Cc: stable@vger.kernel.org
Fixes: 478142c39c8c2 ("tracing: do not grab lock in wakeup latency function tracing")
Signed-off-by: Kaitao Cheng <pilgrimtao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e31f7939c1c27faa5d0e3f14519eaf7c89e8a69d upstream.
The ftrace_profile->counter is unsigned long and
do_div truncates it to 32 bits, which means it can test
non-zero and be truncated to zero for division.
Fix this issue by using div64_ul() instead.
Link: http://lkml.kernel.org/r/20200103030248.14516-1-wenyang@linux.alibaba.com
Cc: stable@vger.kernel.org
Fixes: e330b3bcd8319 ("tracing: Show sample std dev in function profiling")
Fixes: 34886c8bc590f ("tracing: add average time in function to function profiler")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 106f41f5a302cb1f36c7543fae6a05de12e96fa4 upstream.
The compare functions of the histogram code would be specific for the size
of the value being compared (byte, short, int, long long). It would
reference the value from the array via the type of the compare, but the
value was stored in a 64 bit number. This is fine for little endian
machines, but for big endian machines, it would end up comparing zeros or
all ones (depending on the sign) for anything but 64 bit numbers.
To fix this, first derference the value as a u64 then convert it to the type
being compared.
Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map")
Acked-by: Tom Zanussi <zanussi@kernel.org>
Reported-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3a53acf1d9bea11b57c1f6205e3fe73f9d8a3688 upstream.
Task T2 Task T3
trace_options_core_write() subsystem_open()
mutex_lock(trace_types_lock) mutex_lock(event_mutex)
set_tracer_flag()
trace_event_enable_tgid_record() mutex_lock(trace_types_lock)
mutex_lock(event_mutex)
This gives a circular dependency deadlock between trace_types_lock and
event_mutex. To fix this invert the usage of trace_types_lock and
event_mutex in trace_options_core_write(). This keeps the sequence of
lock usage consistent.
Link: http://lkml.kernel.org/r/0101016eef175e38-8ca71caf-a4eb-480d-a1e6-6f0bbc015495-000000@us-west-2.amazonses.com
Cc: stable@vger.kernel.org
Fixes: d914ba37d7145 ("tracing: Add support for recording tgid of tasks")
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit d303de1fcf344ff7c15ed64c3f48a991c9958775 ]
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa773053ab182877 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 194c2c74f5532e62c218adeb8e2b683119503907 upstream.
As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.
Cc: stable@vger.kernel.org
Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9ef16693aff8137faa21d16ffe65bb9832d24d71 upstream.
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.
It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.
Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab3 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fc64e4ad80d4b72efce116f87b3174f0b7196f8e upstream.
max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.
Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu
Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 98dc19c11470ee6048aba723d77079ad2cda8a52 upstream.
nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.
Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 372e0d01da71c84dcecf7028598a33813b0d5256 upstream.
The race between adding a function probe and reading the probes that exist
is very subtle. It needs a comment. Also, the issue can also happen if the
probe has has the EMPTY_HASH as its func_hash.
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5b0022dd32b7c2e15edf1827ba80aa1407edf9ff upstream.
In register_ftrace_function_probe(), we are not checking the return
value of alloc_and_copy_ftrace_hash(). The subsequent call to
ftrace_match_records() may end up dereferencing the same. Add a check to
ensure this doesn't happen.
Link: http://lkml.kernel.org/r/26e92574f25ad23e7cafa3cf5f7a819de1832cbe.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 1ec3a81a0cf42 ("ftrace: Have each function probe use its own ftrace_ops")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 7bd46644ea0f6021dc396a39a8bfd3a58f6f1f9f upstream.
LTP testsuite on powerpc results in the below crash:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc00000000029d800
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
...
CPU: 68 PID: 96584 Comm: cat Kdump: loaded Tainted: G W
NIP: c00000000029d800 LR: c00000000029dac4 CTR: c0000000001e6ad0
REGS: c0002017fae8ba10 TRAP: 0300 Tainted: G W
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28022422 XER: 20040000
CFAR: c00000000029d90c DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
...
NIP [c00000000029d800] t_probe_next+0x60/0x180
LR [c00000000029dac4] t_mod_start+0x1a4/0x1f0
Call Trace:
[c0002017fae8bc90] [c000000000cdbc40] _cond_resched+0x10/0xb0 (unreliable)
[c0002017fae8bce0] [c0000000002a15b0] t_start+0xf0/0x1c0
[c0002017fae8bd30] [c0000000004ec2b4] seq_read+0x184/0x640
[c0002017fae8bdd0] [c0000000004a57bc] sys_read+0x10c/0x300
[c0002017fae8be30] [c00000000000b388] system_call+0x5c/0x70
The test (ftrace_set_ftrace_filter.sh) is part of ftrace stress tests
and the crash happens when the test does 'cat
$TRACING_PATH/set_ftrace_filter'.
The address points to the second line below, in t_probe_next(), where
filter_hash is dereferenced:
hash = iter->probe->ops.func_hash->filter_hash;
size = 1 << hash->size_bits;
This happens due to a race with register_ftrace_function_probe(). A new
ftrace_func_probe is created and added into the func_probes list in
trace_array under ftrace_lock. However, before initializing the filter,
we drop ftrace_lock, and re-acquire it after acquiring regex_lock. If
another process is trying to read set_ftrace_filter, it will be able to
acquire ftrace_lock during this window and it will end up seeing a NULL
filter_hash.
Fix this by just checking for a NULL filter_hash in t_probe_next(). If
the filter_hash is NULL, then this probe is just being added and we can
simply return from here.
Link: http://lkml.kernel.org/r/05e021f757625cbbb006fad41380323dbe4e3b43.1562249521.git.naveen.n.rao@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Fixes: 7b60f3d876156 ("ftrace: Dynamically create the probe ftrace_ops for the trace_array")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit a124692b698b00026a58d89831ceda2331b2e1d0 ]
Custom trampolines can only be enabled if there is only a single ops
attached to it. If there's only a single callback registered to a function,
and the ops has a trampoline registered for it, then we can call the
trampoline directly. This is very useful for improving the performance of
ftrace and livepatch.
If more than one callback is registered to a function, the general
trampoline is used, and the custom trampoline is not restored back to the
direct call even if all the other callbacks were unregistered and we are
back to one callback for the function.
To fix this, set FTRACE_FL_TRAMP flag if rec count is decremented
to one, and the ops that left has a trampoline.
Testing After this patch :
insmod livepatch_unshare_files.ko
cat /sys/kernel/debug/tracing/enabled_functions
unshare_files (1) R I tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
echo unshare_files > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions
unshare_files (2) R I ->ftrace_ops_list_func+0x0/0x150
echo nop > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/enabled_functions
unshare_files (1) R I tramp: 0xffffffffc0000000(klp_ftrace_handler+0x0/0xa0) ->ftrace_ops_assist_func+0x0/0xf0
Link: http://lkml.kernel.org/r/1556969979-111047-1-git-send-email-cj.chengjian@huawei.com
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
ftrace_run_update_code()
commit d5b844a2cf507fc7642c9ae80a9d585db3065c28 upstream.
The commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text
permissions race") causes a possible deadlock between register_kprobe()
and ftrace_run_update_code() when ftrace is using stop_machine().
The existing dependency chain (in reverse order) is:
-> #1 (text_mutex){+.+.}:
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
__mutex_lock+0x88/0x908
mutex_lock_nested+0x32/0x40
register_kprobe+0x254/0x658
init_kprobes+0x11a/0x168
do_one_initcall+0x70/0x318
kernel_init_freeable+0x456/0x508
kernel_init+0x22/0x150
ret_from_fork+0x30/0x34
kernel_thread_starter+0x0/0xc
-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x90c/0xde0
validate_chain.isra.21+0xb32/0xd70
__lock_acquire+0x4b8/0x928
lock_acquire+0x102/0x230
cpus_read_lock+0x62/0xd0
stop_machine+0x2e/0x60
arch_ftrace_update_code+0x2e/0x40
ftrace_run_update_code+0x40/0xa0
ftrace_startup+0xb2/0x168
register_ftrace_function+0x64/0x88
klp_patch_object+0x1a2/0x290
klp_enable_patch+0x554/0x980
do_one_initcall+0x70/0x318
do_init_module+0x6e/0x250
load_module+0x1782/0x1990
__s390x_sys_finit_module+0xaa/0xf0
system_call+0xd8/0x2d0
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);
lock(text_mutex);
lock(cpu_hotplug_lock.rw_sem);
It is similar problem that has been solved by the commit 2d1e38f56622b9b
("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
To be on the safe side, text_mutex must become a low level lock taken
after cpu_hotplug_lock.rw_sem.
This can't be achieved easily with the current ftrace design.
For example, arm calls set_all_modules_text_rw() already in
ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
This functions is called:
+ outside stop_machine() from ftrace_run_update_code()
+ without stop_machine() from ftrace_module_enable()
Fortunately, the problematic fix is needed only on x86_64. It is
the only architecture that calls set_all_modules_text_rw()
in ftrace path and supports livepatching at the same time.
Therefore it is enough to move text_mutex handling from the generic
kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
ftrace_arch_code_modify_prepare()
ftrace_arch_code_modify_post_process()
This patch basically reverts the ftrace part of the problematic
commit 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module
text permissions race"). And provides x86_64 specific-fix.
Some refactoring of the ftrace code will be needed when livepatching
is implemented for arm or nds32. These architectures call
set_all_modules_text_rw() and use stop_machine() at the same time.
Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
Fixes: 9f255b632bf12c4dd7 ("module: Fix livepatch/ftrace module text permissions race")
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
[
As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
ftrace_run_update_code() as it is a void function.
]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 46cc0b44428d0f0e81f11ea98217fc0edfbeab07 upstream.
Current snapshot implementation swaps two ring_buffers even though their
sizes are different from each other, that can cause an inconsistency
between the contents of buffer_size_kb file and the current buffer size.
For example:
# cat buffer_size_kb
7 (expanded: 1408)
# echo 1 > events/enable
# grep bytes per_cpu/cpu0/stats
bytes: 1441020
# echo 1 > snapshot // current:1408, spare:1408
# echo 123 > buffer_size_kb // current:123, spare:1408
# echo 1 > snapshot // current:1408, spare:123
# grep bytes per_cpu/cpu0/stats
bytes: 1443700
# cat buffer_size_kb
123 // != current:1408
And also, a similar per-cpu case hits the following WARNING:
Reproducer:
# echo 1 > per_cpu/cpu0/snapshot
# echo 123 > buffer_size_kb
# echo 1 > per_cpu/cpu0/snapshot
WARNING:
WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
Modules linked in:
CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
FS: 00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
Call Trace:
? trace_array_printk_buf+0x140/0x140
? __mutex_lock_slowpath+0x10/0x10
tracing_snapshot_write+0x4c8/0x7f0
? trace_printk_init_buffers+0x60/0x60
? selinux_file_permission+0x3b/0x540
? tracer_preempt_off+0x38/0x506
? trace_printk_init_buffers+0x60/0x60
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This patch adds resize_buffer_duplicate_size() to check if there is a
difference between current/spare buffer sizes and resize a spare buffer
if necessary.
Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com
Cc: stable@vger.kernel.org
Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 04e03d9a616c19a47178eaca835358610e63a1dd ]
The mapper may be NULL when called from register_ftrace_function_probe()
with probe->data == NULL.
This issue can be reproduced as follow (it may be covered by compiler
optimization sometime):
/ # cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
/ # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
[ 206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 206.952402] Mem abort info:
[ 206.952819] ESR = 0x96000006
[ 206.955326] Exception class = DABT (current EL), IL = 32 bits
[ 206.955844] SET = 0, FnV = 0
[ 206.956272] EA = 0, S1PTW = 0
[ 206.956652] Data abort info:
[ 206.957320] ISV = 0, ISS = 0x00000006
[ 206.959271] CM = 0, WnR = 0
[ 206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
[ 206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
[ 206.964953] Internal error: Oops: 96000006 [#1] SMP
[ 206.971122] Dumping ftrace buffer:
[ 206.973677] (ftrace buffer empty)
[ 206.975258] Modules linked in:
[ 206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
[ 206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
[ 206.978955] Hardware name: linux,dummy-virt (DT)
[ 206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
[ 206.980874] lr : ftrace_count_free+0x68/0x80
[ 206.982539] sp : ffff0000182f3ab0
[ 206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
[ 206.983632] x27: ffff000013054b40 x26: 0000000000000001
[ 206.984000] x25: ffff00001385f000 x24: 0000000000000000
[ 206.984394] x23: ffff000013453000 x22: ffff000013054000
[ 206.984775] x21: 0000000000000000 x20: ffff00001385fe28
[ 206.986575] x19: ffff000013872c30 x18: 0000000000000000
[ 206.987111] x17: 0000000000000000 x16: 0000000000000000
[ 206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
[ 206.987850] x13: 000000000017430e x12: 0000000000000580
[ 206.988251] x11: 0000000000000000 x10: cccccccccccccccc
[ 206.988740] x9 : 0000000000000000 x8 : ffff000013917550
[ 206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
[ 206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
[ 206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
[ 206.991771] x1 : 0000000000000000 x0 : 0000000000000000
[ 206.992557] Call trace:
[ 206.993101] free_ftrace_func_mapper+0x2c/0x118
[ 206.994827] ftrace_count_free+0x68/0x80
[ 206.995238] release_probe+0xfc/0x1d0
[ 206.995555] register_ftrace_function_probe+0x4a8/0x868
[ 206.995923] ftrace_trace_probe_callback.isra.4+0xb8/0x180
[ 206.996330] ftrace_dump_callback+0x50/0x70
[ 206.996663] ftrace_regex_write.isra.29+0x290/0x3a8
[ 206.997157] ftrace_filter_write+0x44/0x60
[ 206.998971] __vfs_write+0x64/0xf0
[ 206.999285] vfs_write+0x14c/0x2f0
[ 206.999591] ksys_write+0xbc/0x1b0
[ 206.999888] __arm64_sys_write+0x3c/0x58
[ 207.000246] el0_svc_common.constprop.0+0x408/0x5f0
[ 207.000607] el0_svc_handler+0x144/0x1c8
[ 207.000916] el0_svc+0x8/0xc
[ 207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
[ 207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
[ 207.010126] Kernel panic - not syncing: Fatal exception
[ 207.011322] SMP: stopping secondary CPUs
[ 207.013956] Dumping ftrace buffer:
[ 207.014595] (ftrace buffer empty)
[ 207.015632] Kernel Offset: disabled
[ 207.017187] CPU features: 0x002,20006008
[ 207.017985] Memory Limit: none
[ 207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.com
Signed-off-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 9f255b632bf12c4dd7fc31caee89aa991ef75176 ]
It's possible for livepatch and ftrace to be toggling a module's text
permissions at the same time, resulting in the following panic:
BUG: unable to handle page fault for address: ffffffffc005b1d9
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
Oops: 0003 [#1] PREEMPT SMP PTI
CPU: 1 PID: 453 Comm: insmod Tainted: G O K 5.2.0-rc1-a188339ca5 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:apply_relocate_add+0xbe/0x14c
Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
FS: 00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
klp_init_object_loaded+0x10f/0x219
? preempt_latency_start+0x21/0x57
klp_enable_patch+0x662/0x809
? virt_to_head_page+0x3a/0x3c
? kfree+0x8c/0x126
patch_init+0x2ed/0x1000 [livepatch_test02]
? 0xffffffffc0060000
do_one_initcall+0x9f/0x1c5
? kmem_cache_alloc_trace+0xc4/0xd4
? do_init_module+0x27/0x210
do_init_module+0x5f/0x210
load_module+0x1c41/0x2290
? fsnotify_path+0x3b/0x42
? strstarts+0x2b/0x2b
? kernel_read+0x58/0x65
__do_sys_finit_module+0x9f/0xc3
? __do_sys_finit_module+0x9f/0xc3
__x64_sys_finit_module+0x1a/0x1c
do_syscall_64+0x52/0x61
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The above panic occurs when loading two modules at the same time with
ftrace enabled, where at least one of the modules is a livepatch module:
CPU0 CPU1
klp_enable_patch()
klp_init_object_loaded()
module_disable_ro()
ftrace_module_enable()
ftrace_arch_code_modify_post_process()
set_all_modules_text_ro()
klp_write_object_relocations()
apply_relocate_add()
*patches read-only code* - BOOM
A similar race exists when toggling ftrace while loading a livepatch
module.
Fix it by ensuring that the livepatch and ftrace code patching
operations -- and their respective permissions changes -- are protected
by the text_mutex.
Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.com
Reported-by: Johannes Erdfelt <johannes@erdfelt.com>
Fixes: 444d13ff10fb ("modules: add ro_after_init support")
Acked-by: Jessica Yu <jeyu@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
This reverts commit 8190d6fbb1e9b7fa4eb41fe7aa337c46ca514e79, which was
upstream commit 4a6c91fbdef846ec7250b82f2eeeb87ac5f18cf9.
On Tue, Jun 25, 2019 at 09:39:45AM +0200, Sebastian Andrzej Siewior wrote:
>Please backport commit e74deb11931ff682b59d5b9d387f7115f689698e to
>stable _or_ revert the backport of commit 4a6c91fbdef84 ("x86/uaccess,
>ftrace: Fix ftrace_likely_update() vs. SMAP"). It uses
>user_access_{save|restore}() which has been introduced in the following
>commit.
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 0c97bf863efce63d6ab7971dad811601e6171d2f upstream.
Starting with GCC 9, -Warray-bounds detects cases when memset is called
starting on a member of a struct but the size to be cleared ends up
writing over further members.
Such a call happens in the trace code to clear, at once, all members
after and including `seq` on struct trace_iterator:
In function 'memset',
inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
./include/linux/string.h:344:9: warning: '__builtin_memset' offset
[8505, 8560] from the object at 'iter' is out of the bounds of
referenced subobject 'seq' with type 'struct trace_seq' at offset
4368 [-Warray-bounds]
344 | return __builtin_memset(p, c, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to avoid GCC complaining about it, we compute the address
ourselves by adding the offsetof distance instead of referring
directly to the member.
Since there are two places doing this clear (trace.c and trace_kdb.c),
take the chance to move the workaround into a single place in
the internal header.
Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
[ Removed unnecessary parenthesis around "iter" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 4a6c91fbdef846ec7250b82f2eeeb87ac5f18cf9 ]
For CONFIG_TRACE_BRANCH_PROFILING=y the likely/unlikely things get
overloaded and generate callouts to this code, and thus also when
AC=1.
Make it safe.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit cbe08bcbbe787315c425dde284dcb715cfbf3f39 upstream.
When reading only part of the id file, the ppos isn't tracked correctly.
This is taken care by simple_read_from_buffer.
Reading a single byte, and then the next byte would result EOF.
While this seems like not a big deal, this breaks abstractions that
reads information from files unbuffered. See for example
https://github.com/golang/go/issues/29399
This code was mentioned as problematic in
commit cd458ba9d5a5
("tracing: Do not (ab)use trace_seq in event_id_read()")
An example C code that show this bug is:
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv) {
if (argc < 2)
return 1;
int fd = open(argv[1], O_RDONLY);
char c;
read(fd, &c, 1);
printf("First %c\n", c);
read(fd, &c, 1);
printf("Second %c\n", c);
}
Then run with, e.g.
sudo ./a.out /sys/kernel/debug/tracing/events/tcp/tcp_set_state/id
You'll notice you're getting the first character twice, instead of the
first two characters in the id file.
Link: http://lkml.kernel.org/r/20181231115837.4932-1-elazar@lightbitslabs.com
Cc: Orit Wasserman <orit.was@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 23725aeeab10b ("ftrace: provide an id file for each event")
Signed-off-by: Elazar Leibovich <elazar@lightbitslabs.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 5cf99a0f3161bc3ae2391269d134d6bf7e26f00e ]
The tracefs file set_graph_function is used to only function graph functions
that are listed in that file (or all functions if the file is empty). The
way this is implemented is that the function graph tracer looks at every
function, and if the current depth is zero and the function matches
something in the file then it will trace that function. When other functions
are called, the depth will be greater than zero (because the original
function will be at depth zero), and all functions will be traced where the
depth is greater than zero.
The issue is that when a function is first entered, and the handler that
checks this logic is called, the depth is set to zero. If an interrupt comes
in and a function in the interrupt handler is traced, its depth will be
greater than zero and it will automatically be traced, even if the original
function was not. But because the logic only looks at depth it may trace
interrupts when it should not be.
The recent design change of the function graph tracer to fix other bugs
caused the depth to be zero while the function graph callback handler is
being called for a longer time, widening the race of this happening. This
bug was actually there for a longer time, but because the race window was so
small it seldom happened. The Fixes tag below is for the commit that widen
the race window, because that commit belongs to a series that will also help
fix the original bug.
Cc: stable@kernel.org
Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
|
|
commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream.
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount. All
callers converted to handle a failure.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d6097c9e4454adf1f8f2c9547c2fa6060d55d952 upstream.
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.
Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net
Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8af ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.
This fixes multiple issues in buffer_pipe_buf_ops:
- The ->steal() handler must not return zero unless the pipe buffer has
the only reference to the page. But generic_pipe_buf_steal() assumes
that every reference to the pipe is tracked by the page's refcount,
which isn't true for these buffers - buffer_pipe_buf_get(), which
duplicates a buffer, doesn't touch the page's refcount.
Fix it by using generic_pipe_buf_nosteal(), which refuses every
attempted theft. It should be easy to actually support ->steal, but the
only current users of pipe_buf_steal() are the virtio console and FUSE,
and they also only use it as an optimization. So it's probably not worth
the effort.
- The ->get() and ->release() handlers can be invoked concurrently on pipe
buffers backed by the same struct buffer_ref. Make them safe against
concurrency by using refcount_t.
- The pointers stored in ->private were only zeroed out when the last
reference to the buffer_ref was dropped. As far as I know, this
shouldn't be necessary anyway, but if we do it, let's always do it.
Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 91862cc7867bba4ee5c8fcf0ca2f1d30427b6129 upstream.
In trace_pid_write(), the buffer for trace parser is allocated through
kmalloc() in trace_parser_get_init(). Later on, after the buffer is used,
it is then freed through kfree() in trace_parser_put(). However, it is
possible that trace_pid_write() is terminated due to unexpected errors,
e.g., ENOMEM. In that case, the allocated buffer will not be freed, which
is a memory leak bug.
To fix this issue, free the allocated buffer when an error is encountered.
Link: http://lkml.kernel.org/r/1555726979-15633-1-git-send-email-wang6495@umn.edu
Fixes: f4d34a87e9c10 ("tracing: Use pid bitmap instead of a pid array for set_event_pid")
Cc: stable@vger.kernel.org
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit fabe38ab6b2bd9418350284c63825f13b8a6abba upstream.
Mark ftrace mcount handler functions nokprobe since
probing on these functions with kretprobe pushes
return address incorrectly on kretprobe shadow stack.
Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/155094062044.6137.6419622920568680640.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 31b265b3baaf55f209229888b7ffea523ddab366 ]
As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
BUG for "sleeping function called from invalid context".
kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
atomic context. A very simple solution for this is to add allocation
flags to ring_buffer_read_prepare() so kdb can call it without
triggering the allocation error. This patch does that.
Note that in the original email thread about this, it was suggested
that perhaps the solution for kdb was to either preallocate the buffer
ahead of time or create our own iterator. I'm hoping that this
alternative of adding allocation flags to ring_buffer_read_prepare()
can be considered since it means I don't need to duplicate more of the
core trace code into "trace_kdb.c" (for either creating my own
iterator or re-preparing a ring allocator whose memory was already
allocated).
NOTE: another option for kdb is to actually figure out how to make it
reuse the existing ftrace_dump() function and totally eliminate the
duplication. This sounds very appealing and actually works (the "sr
z" command can be seen to properly dump the ftrace buffer). The
downside here is that ftrace_dump() fully consumes the trace buffer.
Unless that is changed I'd rather not use it because it means "ftdump
| grep xyz" won't be very useful to search the ftrace buffer since it
will throw away the whole trace on the first grep. A future patch to
dump only the last few lines of the buffer will also be hard to
implement.
[1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org
Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit e7f0c424d0806b05d6f47be9f202b037eb701707 upstream.
Commit d716ff71dd12 ("tracing: Remove taking of trace_types_lock in
pipe files") use the current tracer instead of the copy in
tracing_open_pipe(), but it forget to remove the freeing sentence in
the error path.
There's an error path that can call kfree(iter->trace) after the iter->trace
was assigned to tr->current_trace, which would be bad to free.
Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com
Cc: stable@vger.kernel.org
Fixes: d716ff71dd12 ("tracing: Remove taking of trace_types_lock in pipe files")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9f0bbf3115ca9f91f43b7c74e9ac7d79f47fc6c2 upstream.
Because there may be random garbage beyond a string's null terminator,
it's not correct to copy the the complete character array for use as a
hist trigger key. This results in multiple histogram entries for the
'same' string key.
So, in the case of a string key, use strncpy instead of memcpy to
avoid copying in the extra bytes.
Before, using the gdbus entries in the following hist trigger as an
example:
# echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist
...
{ comm: ImgDecoder #4 } hitcount: 203
{ comm: gmain } hitcount: 213
{ comm: gmain } hitcount: 216
{ comm: StreamTrans #73 } hitcount: 221
{ comm: mozStorage #3 } hitcount: 230
{ comm: gdbus } hitcount: 233
{ comm: StyleThread#5 } hitcount: 253
{ comm: gdbus } hitcount: 256
{ comm: gdbus } hitcount: 260
{ comm: StyleThread#4 } hitcount: 271
...
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
51
After:
# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
1
Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9e7382153f80ba45a0bbcd540fb77d4b15f6e966 upstream.
The following commit
441dae8f2f29 ("tracing: Add support for display of tgid in trace output")
removed the call to print_event_info() from print_func_help_header_irq()
which results in the ftrace header not reporting the number of entries
written in the buffer. As this wasn't the original intent of the patch,
re-introduce the call to print_event_info() to restore the orginal
behaviour.
Link: http://lkml.kernel.org/r/20190214152950.4179-1-quentin.perret@arm.com
Acked-by: Joel Fernandes <joelaf@google.com>
Cc: stable@vger.kernel.org
Fixes: 441dae8f2f29 ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0722069a5374b904ec1a67f91249f90e1cfae259 upstream.
When printing multiple uprobe arguments as strings the output for the
earlier arguments would also include all later string arguments.
This is best explained in an example:
Consider adding a uprobe to a function receiving two strings as
parameters which is at offset 0xa0 in strlib.so and we want to print
both parameters when the uprobe is hit (on x86_64):
$ echo 'p:func /lib/strlib.so:0xa0 +0(%di):string +0(%si):string' > \
/sys/kernel/debug/tracing/uprobe_events
When the function is called as func("foo", "bar") and we hit the probe,
the trace file shows a line like the following:
[...] func: (0x7f7e683706a0) arg1="foobar" arg2="bar"
Note the extra "bar" printed as part of arg1. This behaviour stacks up
for additional string arguments.
The strings are stored in a dynamically growing part of the uprobe
buffer by fetch_store_string() after copying them from userspace via
strncpy_from_user(). The return value of strncpy_from_user() is then
directly used as the required size for the string. However, this does
not take the terminating null byte into account as the documentation
for strncpy_from_user() cleary states that it "[...] returns the
length of the string (not including the trailing NUL)" even though the
null byte will be copied to the destination.
Therefore, subsequent calls to fetch_store_string() will overwrite
the terminating null byte of the most recently fetched string with
the first character of the current string, leading to the
"accumulation" of strings in earlier arguments in the output.
Fix this by incrementing the return value of strncpy_from_user() by
one if we did not hit the maximum buffer size.
Link: http://lkml.kernel.org/r/20190116141629.5752-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ea6eb5e7d15e1838de335609994b4546e2abcaaf upstream.
The subsystem-specific message prefix for uprobes was also
"trace_kprobe: " instead of "trace_uprobe: " as described in
the original commit message.
Link: http://lkml.kernel.org/r/20190117133023.19292-1-andreas.ziegler@fau.de
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 7257634135c24 ("tracing/probe: Show subsystem name in messages")
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream.
The following commands will cause a memory leak:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo schedule > instance/foo/set_ftrace_filter
# rmdir instances/foo
The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3cec638b3d793b7cacdec5b8072364b41caeb0e1 upstream.
When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).
But because the error would just exit, filter was left hanging and
nothing could free it.
Found by kmemleak detector.
Cc: stable@vger.kernel.org
Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 1efb6ee3edea57f57f9fb05dba8dcb3f7333f61f ]
A format string consisting of "%p" or "%s" followed by an invalid
specifier (e.g. "%p%\n" or "%s%") could pass the check which
would make format_decode (lib/vsprintf.c) to warn.
Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: syzbot+1ec5c5ec949c4adaa0c4@syzkaller.appspotmail.com
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 83f365554e47997ec68dc4eca3f5dce525cd15c3 upstream.
When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.
After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.
Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com
Cc: stable@vger.kernel.org
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer <jbehmer@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.
While debugging another bug, I was looking at all the synchronize*()
functions being used in kernel/trace, and noticed that trace_uprobes was
using synchronize_sched(), with a comment to synchronize with
{u,ret}_probe_trace_func(). When looking at those functions, the data is
protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
is using the wrong synchronize_*() function.
Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 757d9140072054528b13bbe291583d9823cde195 upstream.
Masami Hiramatsu reported:
Current trace-enable attribute in sysfs returns an error
if user writes the same setting value as current one,
e.g.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
bash: echo: write error: Invalid argument
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
bash: echo: write error: Device or resource busy
But this is not a preferred behavior, it should ignore
if new setting is same as current one. This fixes the
problem as below.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|