| Age | Commit message (Collapse) | Author |
|
Set in symbol__annotate() but never used.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.ibm.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Charlie Jenkins <charlie@rivosinc.com>
Cc: Collin Funk <collin.funk1@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Haibo Xu <haibo1.xu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It can slowdown annotation browser if objdump is processing large DWARF
data. Let's add a hashmap to save the data type info for each line.
Note that this is needed for TUI only because stdio only processes each
line once. TUI will display the same line whenever it refreshes the
screen.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-13-namhyung@kernel.org
[ Add lines around an if block and use zfree() in one case, acked by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It'd be great if it can get the correct debug information using DSO
build-Id not just the path name. Instead of adding new callsites of
debuginfo__new(), let's add dso__debuginfo() which can hide the access
using the pathname and help the future conversion.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's mostly unnecessary to print when it has no actual type information
like in the stack operations and canary. Let's have them if -v option
is given.
Before:
$ perf annotate --code-with-type
...
: 0 0xd640 <_dl_relocate_object>:
0.00 : 0: endbr64
0.00 : 4: pushq %rbp # data-type: (stack operation)
0.00 : 5: movq %rsp, %rbp
0.00 : 8: pushq %r15 # data-type: (stack operation)
0.00 : a: pushq %r14 # data-type: (stack operation)
0.00 : c: pushq %r13 # data-type: (stack operation)
0.00 : e: pushq %r12 # data-type: (stack operation)
0.00 : 10: pushq %rbx # data-type: (stack operation)
0.00 : 11: subq $0xf8, %rsp
...
0.00 : d4: testl %eax, %eax
0.00 : d6: jne 0xf424
0.00 : dc: movq 0xf0(%r14), %rbx # data-type: struct link_map +0xf0
0.00 : e3: testq %rbx, %rbx
0.00 : e6: jne 0xf2dd
0.00 : ec: cmpq $0, 0xf8(%r14) # data-type: struct link_map +0xf8
...
After:
: 0 0xd640 <_dl_relocate_object>:
0.00 : 0: endbr64
0.00 : 4: pushq %rbp
0.00 : 5: movq %rsp, %rbp
0.00 : 8: pushq %r15
0.00 : a: pushq %r14
0.00 : c: pushq %r13
0.00 : e: pushq %r12
0.00 : 10: pushq %rbx
0.00 : 11: subq $0xf8, %rsp
...
0.00 : d4: testl %eax, %eax
0.00 : d6: jne 0xf424
0.00 : dc: movq 0xf0(%r14), %rbx # data-type: struct link_map +0xf0
0.00 : e3: testq %rbx, %rbx
0.00 : e6: jne 0xf2dd
0.00 : ec: cmpq $0, 0xf8(%r14) # data-type: struct link_map +0xf8
...
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Until now, the --code-with-type option is available only on stdio.
But it was an artifical limitation because of an implemention issue.
Implement the same logic in annotation_line__write() for stdio2/TUI
and remove the limitation and update the man page.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like other print functions, make disasm_line__write() return the number
of printed characters. It'll be used to skip unnecessary operations
when the screen is full.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The width is updated after each part is printed. It can skip the output
processing if the total printed size is bigger than the width.
No function changes intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It will be used for data type display later.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250816031635.25318-5-namhyung@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Get rid of the internal function and convert function arguments into
local variables if they are used more than once.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The start field is to control whether the output shows full address or
offset from the function start. But actually it can be changed
dynamically in annotation__toggle_full_addr(). The informaiton should
be available through struct annotation. Let's use it directly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250816031635.25318-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Recently it uses llvm and capstone to speed up annotation or disassembly
of instructions. But they don't support source code view yet. Until it
fixed, we can force to use objdump for source code annotation.
To prevent performance loss, it's disabled by default and turned it on
when user requests it in TUI by pressing 's' key.
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250625230339.702610-1-namhyung@kernel.org
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To avoid initial clutter, and not to change the view users that are not
interested in toggling the source code view, just show it when the user
does the first toggle keypress (pressing 's').
I know that there are users that really disable the source code view by
using:
# perf config annotate.hide_src_code=yes
Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/Z_TYux5fUg2pW-pF@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Sometimes it's useful to see both instructions and their data type
together. Let's extend the annotate code to use data type profiling
functions.
To make it easy to pass more argument, introduce a struct to carry
necessary information together. Also add a new annotation_option called
'code_with_type' to control the behavior. This is not enabled yet but
it'll be set later from the command line.
For simplicity, this is implemented for --stdio only.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
So that it can only handle a single disasm_linme and hopefully make the
code simpler. This is also a preparation to be called from different
places later.
The NO_TYPE macro was added to distinguish when it failed or needs retry.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-6-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's a prepartion to support code annotation and data type
annotation at the same time. Data type annotation needs more
information in the hist_entry so it needs to be passed deeper.
Also rename a function with the same name in the builtin-annotate.c
to hist_entry__stdio_annotate since it matches better to the command
line option. And change the condition inside to be simpler.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The annotation_line__print() has many arguments. But min_percent,
max_lines and percent_type are from struct annotaion_options. So let's
pass a pointer to the option instead of passing them separately to
reduce the number of function arguments.
Actually it has a recursive call if 'queue' is set. Add a new option
instance to pass different values for the case.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's not used anywhere, let's get rid of it.
Reviewed-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250310224925.799005-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Prior to this change a string was used which could cause issues with
an unrecognized disassembler in symbol__disassembler. Change to
initializing an array of perf_disassembler enum values. If a value
already exists then adding it a second time is ignored to avoid array
out of bounds problems present in the previous code, it also allows a
statically sized array and removes memory allocation needs. Errors in
the disassembler string are reported when the config is parsed during
perf annotate or perf top start up. If the array is uninitialized
after processing the config file the default llvm, capstone then
objdump values are added but without a need to parse a string.
Fixes: a6e8a58de629 ("perf disasm: Allow configuring what disassemblers to use")
Closes: https://lore.kernel.org/lkml/CAP-5=fUdfCyxmEiTpzS2uumUp3-SyQOseX2xZo81-dQtWXj6vA@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250124043856.1177264-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Chen Ni <nichen@iscas.ac.cn>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, elliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various diassemblers should be
introduced, to test these codebases.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Link: https://lore.kernel.org/r/20241111151734.1018476-4-acme@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Rather than pass 0/EM_NONE, use the value computed in the disasm
struct arch. Switch the EM_NONE case to EM_HOST, rewriting EM_NONE if
it were passed to get_dwarf_regnum. Pass a flags value as
architectures like csky need the flags to determine the ABI variant.
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Anup Patel <anup@brainfault.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Shenlin Liang <liangshenlin@eswincomputing.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Guilherme Amadio <amadio@gentoo.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Chen Pei <cp0613@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Aditya Gupta <adityag@linux.ibm.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Cc: Bibo Mao <maobibo@loongson.cn>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-csky@vger.kernel.org
Link: https://lore.kernel.org/r/20241108234606.429459-6-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
No event is printed in the "Branch Counter" column on hybrid machines.
For example,
$ perf record -e "{cpu_core/branch-instructions/pp,cpu_core/branches/}:S" -j any,counter
$ perf report --total-cycles
# Branch counter abbr list:
# cpu_core/branch-instructions/pp = A
# cpu_core/branches/ = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter
# ............... .............. ........... .......... ..............
44.54% 727.1K 0.00% 1 |+ |+ |
36.31% 592.7K 0.00% 2 |+ |+ |
17.83% 291.1K 0.00% 1 |+ |+ |
The branch counter information (br_cntr_width and br_cntr_nr) in the
perf_env is retrieved from the CPU_PMU_CAPS. However, the CPU_PMU_CAPS
is not available on hybrid machines. Without the width information, the
number of occurrences of an event cannot be calculated.
For a hybrid machine, the caps information should be retrieved from the
PMU_CAPS, and stored in the perf_env->pmu_caps.
Add a perf_env__find_br_cntr_info() to return the correct branch counter
information from the corresponding fields.
Committer notes:
While testing I couldn't s ee those "Branch counter" columns enabled by
pressing 'B' on the TUI, after reporting it to the list Kan explained
the situation:
<quote Kan Liang>
For a hybrid client, the "Branch Counter" feature is only supported
starting from the just released Lunar Lake. Perf falls back to only
"ANY" on your Raptor Lake.
The "The branch counter is not available" message is expected.
Here is the 'perf evlist' result from my Lunar Lake machine,
# perf evlist -v
cpu_core/branch-instructions/pp: type: 4 (cpu_core), size: 136, config: 0xc4 (branch-instructions), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|READ|PERIOD|BRANCH_STACK|IDENTIFIER, read_format: ID|GROUP|LOST, disabled: 1, freq: 1, enable_on_exec: 1, precise_ip: 2, sample_id_all: 1, exclude_guest: 1, branch_sample_type: ANY|COUNTERS
#
</quote>
Fixes: 6f9d8d1de2c61288 ("perf script: Add branch counters")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240909184201.553519-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I found some portion of mem-store events sampled on CALL instruction
which has no memory access. But it actually saves a return address
into stack. It should be considered as a stack operation like RET
instruction.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240909214251.3033827-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged branch counter events. They are shown right after all the
cycle-related annotations.
Extend the 'struct block_info' to store and pass the branch counter
related information.
The annotation_br_cntr_entry() is to print the histogram of each branch
counter event. If the number of logged events is less than 4, the exact
number of the abbr name is printed. Otherwise, using '+' to stands for
more than 3 events.
Assume the number of logged events is less than 4.
The annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.
$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$ perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
# Event count (approx.): 1610046
#
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 |A |- | ...
25.27% 1.1M 0.00% 2 |AA |- | ...
15.61% 667.2K 0.00% 1 |A |- | ...
0.16% 6.9K 0.81% 575 |A |- | ...
0.16% 6.8K 1.38% 977 |AA |- | ...
0.16% 6.8K 0.04% 28 |AA |B | ...
0.15% 6.6K 1.33% 946 |A |- | ...
0.11% 4.5K 0.06% 46 |AAA+|- | ...
0.10% 4.4K 0.88% 624 |A |- | ...
0.09% 3.7K 0.74% 524 |AAA+|B | ...
With -v applied,
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... .............. ..................
#
57.55% 2.5M 0.00% 3 A=1 ,B=- ...
25.27% 1.1M 0.00% 2 A=2 ,B=- ...
15.61% 667.2K 0.00% 1 A=1 ,B=- ...
0.16% 6.9K 0.81% 575 A=1 ,B=- ...
0.16% 6.8K 1.38% 977 A=2 ,B=- ...
0.16% 6.8K 0.04% 28 A=2 ,B=1 ...
0.15% 6.6K 1.33% 946 A=1 ,B=- ...
0.11% 4.5K 0.06% 46 A=3+,B=- ...
0.10% 4.4K 0.88% 624 A=1 ,B=- ...
0.09% 3.7K 0.74% 524 A=3+,B=1 ...
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-7-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When annotating a basic block, it's useful to display the occurrences
of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation.
When the branch counters information is available, the branch counters
are automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in
hist__account_cycles().
In 'struct annotated_branch', introduce a br_cntr array to save the
accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64
space.
Because the saturation of a branch counter is small, e.g., for Intel
Sierra Forest, the saturation is only 3.
Add ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter
once saturated. That can be used to indicate a potential event lost
because of the saturation.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240813160208.2493643-5-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In find_data_type(), it creates and deletes a debug info whenver it
tries to find data type for a sample. This is inefficient and it most
likely accesses the same binary again and again.
Let's add a single entry cache the debug info structure for the last DSO.
Depending on sample data, it usually gives me 2~3x (and sometimes more)
speed ups.
Note that this will introduce a little difference in the output due to
the order of checking stack operations. It used to check the stack ops
before checking the availability of debug info but I moved it after the
symbol check. So it'll report stack operations in DSOs without debug
info as unknown. But I think it's ok and better to have the checking
near the caching logic.
Committer testing:
root@x1:~# perf mem record -a sleep 5s
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~# diff -u before after
--- before 2024-08-08 09:33:53.880780784 -0300
+++ after 2024-08-08 09:35:13.917325041 -0300
@@ -81,8 +81,8 @@
# Overhead Data Type
# ........ .........
#
- 55.43% (unknown)
- 11.61% (stack operation)
+ 55.56% (unknown)
+ 11.48% (stack operation)
4.93% struct pcpu_hot
3.26% unsigned int
2.48% struct
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240805234648.1453689-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Like in 'perf report', we want to hide empty events in the 'perf annotate'
output. This is consistent when the option is set in perf report.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show the all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
Committer testing:
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#
Before:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
After:
root@x1:~# perf annotate --group --skip-empty --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The annotation__pcnt_width() calculates the screen width for the
overhead (percent) area considering event groups properly. Use this
function consistently so that we can make sure it has similar output
in different modes. But there's a difference in stdio and tui output:
stdio uses 8 and tui uses 7 for a percent.
Let's use 8 and adjust the print width in __annotation_line__write()
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We want to use it in different places so make sure it sets properly
in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The data_nr keeps the number of entries in al->data[] so it should use
it when it iterates the array. The notes->src->nr_events should have
the same number but it'd be natural to use al->data_nr.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240803211332.1107222-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction
Since the "ins.name" is not set while using raw instruction,
'perf annotate' with insn-stat gives wrong data:
Result from "./perf annotate --data-type --insn-stat":
Annotate Instruction stats
total 615, ok 419 (68.1%), bad 196 (31.9%)
Name : Good Bad
-----------------------------------------------------------
: 419 196
This patch sets "dl->ins.name" in arch specific function
"check_ppc_insn" while initialising "struct disasm_line".
Also update "ins_find" function to pass "struct disasm_line" as a
parameter so as to set its name field in arch specific call.
With the patch changes:
Annotate Instruction stats
total 609, ok 446 (73.2%), bad 163 (26.8%)
Name/opcode : Good Bad
-----------------------------------------------------------
58 : 323 80
32 : 49 43
34 : 33 11
OP_31_XOP_LDX : 8 20
40 : 23 0
OP_31_XOP_LWARX : 5 1
OP_31_XOP_LWZX : 2 3
OP_31_XOP_LDARX : 3 0
33 : 0 2
OP_31_XOP_LBZX : 0 1
OP_31_XOP_LWAX : 0 1
OP_31_XOP_LHZX : 0 1
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-16-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
instruction on powerpc
Use the raw instruction code and macros to identify memory instructions,
extract register fields and also offset.
The implementation addresses the D-form, X-form, DS-form instructions.
Adds "mem_ref" field to check whether source/target has memory
reference.
Add function "get_powerpc_regs" which will set these fields: reg1, reg2,
offset depending of where it is source or target ops.
Update "parse" callback for "struct ins_ops" to also pass "struct
disasm_line" as argument. This is needed in parse functions where opcode
is used to determine whether to set multi_regs and other fields
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-7-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
A symbol can have no samples, then accessing the annotated_source->samples
hashmap will result in a segfault.
Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240510210452.2449944-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Freeing hash map doesn't free the entries added to the hashmap, add
the missing free().
Fixes: d3e7cad6f36d9e80 ("perf annotate: Add a hashmap for symbol histogram")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240507183545.1236093-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Fix a comment in function which explains how multi_regs field gets set
for an instruction. In the example, "mov %rsi, 8(%rbx,%rcx,4)", the
comment mistakenly referred to "dst_multi_regs = 0". Correct it to use
"src_multi_regs = 0"
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/r/20240506121906.76639-4-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When freeing a->b it is good practice to set a->b to NULL using
zfree(&a->b) so that when we have a bug where a reference to a freed 'a'
pointer is kept somewhere, we can more quickly cause a segfault if some
code tries to use a->b.
This is mostly done but some new cases were introduced recently, convert
them to zfree().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZjmbHHrjIm5YRIBv@x1
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Add reference count checking to struct dso, this can help with
implementing correct reference counting discipline. To avoid
RC_CHK_ACCESS everywhere, add accessor functions for the variables in
struct dso.
The majority of the change is mechanical in nature and not easy to
split up.
Committer testing:
'perf test' up to this patch shows no regressions.
But:
util/symbol.c: In function ‘dso__load_bfd_symbols’:
util/symbol.c:1683:9: error: too few arguments to function ‘dso__set_adjust_symbols’
1683 | dso__set_adjust_symbols(dso);
| ^~~~~~~~~~~~~~~~~~~~~~~
In file included from util/symbol.c:21:
util/dso.h:268:20: note: declared here
268 | static inline void dso__set_adjust_symbols(struct dso *dso, bool val)
| ^~~~~~~~~~~~~~~~~~~~~~~
make[6]: *** [/home/acme/git/perf-tools-next/tools/build/Makefile.build:106: /tmp/tmp.ZWHbQftdN6/util/symbol.o] Error 1
MKDIR /tmp/tmp.ZWHbQftdN6/tests/workloads/
make[6]: *** Waiting for unfinished jobs....
This was updated:
- symbols__fixup_end(&dso->symbols, false);
- symbols__fixup_duplicate(&dso->symbols);
- dso->adjust_symbols = 1;
+ symbols__fixup_end(dso__symbols(dso), false);
+ symbols__fixup_duplicate(dso__symbols(dso));
+ dso__set_adjust_symbols(dso);
But not build tested with BUILD_NONDISTRO and libbfd devel files installed
(binutils-devel on fedora).
Add the missing argument:
symbols__fixup_end(dso__symbols(dso), false);
symbols__fixup_duplicate(dso__symbols(dso));
- dso__set_adjust_symbols(dso);
+ dso__set_adjust_symbols(dso, true);
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Chengen Du <chengen.du@canonical.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Dima Kogan <dima@secretsauce.net>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240504213803.218974-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
'perf report' TUI
As it removed the sample accounting for code when no symbol sort key is
given for 'perf report' TUI, it might not have allocated the
'struct annotated_source' yet. Let's check if it's NULL first.
Fixes: 6cdd977ec24e1538 ("perf report: Do not collect sample histogram unnecessarily")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240424230015.1054013-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up fixes sent via perf-tools, by Namhyung Kim.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The data type profiling alone doesn't need the sample histogram for
functions. It only needs the histogram for the types.
Let's remove the condition in the report_callback to check if data type
profiling is selected and make sure the annotation has the 'struct
annotated_source' instantiated before calling symbol__disassemble().
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's a pseudo data type and has no field.
Fixes: b3c95109c131fcc9 ("perf annotate-data: Add stack canary type")
Closes: https://lore.kernel.org/lkml/Zhb6jJneP36Z-or0@x1
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240411033256.2099646-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Now symbol__annotate() is reentrant and it doesn't need to remove
non-instruction lines. Let's get rid of symbol__ensure_annotate() and
call symbol__annotate() directly. Also we can use it to get the arch
pointer instead of calling evsel__get_arch() directly.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
For data type profiling, it removed non-instruction lines from the list
of annotation lines. It was to simplify the implementation dealing with
instructions like to calculate the PC-relative address and to search the
shortest path to the target instruction or basic block.
But it means that it removes all the comments and debug information in
the annotate output like source file name and line numbers. To support
both code annotation and data type annotation, it'd be better to keep
the non-instruction lines as well.
So this change is to skip those lines during the data type profiling
and to display them in the normal perf annotate output.
No function changes intended (other than having more lines).
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The symbol__annotate2() initializes some data structures needed by TUI.
It has a logic to prevent calling it multiple times by checking if it
has the annotated source. But data type profiling uses a different
code (symbol__annotate) to allocate the annotated lines in advance.
So TUI missed to call symbol__annotate2() when it shows the annotation
browser.
Make symbol__annotate() reentrant and handle that situation properly.
This fixes a crash in the annotation browser started by perf report in
TUI like below.
$ perf report -s type,sym --tui
# and press 'a' key and then move down
Fixes: 81e57deec325 ("perf report: Support data type profiling")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240405211800.1412920-2-namhyung@kernel.org
|
|
It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-10-namhyung@kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: <linux-perf-users@vger.kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's only used in 'perf annotate' output which means functions with actual
samples. No need to consume memory for every symbol ('struct annotation').
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It's only used in 'perf annotate' output which means functions with
actual samples. No need to consume memory for every symbol
('struct annotation').
Also move the 'max_line_len' field into it as it's related.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The struct annotated_source.offsets[] is to save pointers to
annotation_line at each offset. We can use annotated_source__get_line()
helper instead so let's get rid of the array.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240404175716.1225482-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|