summaryrefslogtreecommitdiff
path: root/tools/perf/util/annotate.c
AgeCommit message (Collapse)Author
2024-04-08perf annotate: Check annotation lines more efficientlyNamhyung Kim
In some places, it checks annotated (disasm) lines for each byte. But as it already has a list of disasm lines, it'd be better to traverse the list entries instead of checking every offset with linear search (by annotated_source__get_line() helper). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08perf annotate: Introduce annotated_source__get_line()Namhyung Kim
It's a helper function to get annotation_line at the given offset without using the offsets array. The goal is to get rid of the offsets array altogether. It just does the linear search but I think it's better to save memory as it won't be called in a hot path. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08perf annotate: Staticize some local functionsNamhyung Kim
I found annotation__mark_jump_targets(), annotation__set_offsets() and annotation__init_column_widths() are only used in the same file. Let's make them static. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-08perf annotate: Fix annotation_calc_lines() to pass correct address to ↵Namhyung Kim
get_srcline() It should pass a proper address (i.e. suitable for objdump or addr2line) to get_srcline() in order to work correctly. It used to pass an address with map__rip_2objdump() as the second argument but later it's changed to use notes->start. It's ok in normal cases but it can be changed when annotate_opts.full_addr is set. So let's convert the address directly instead of using the notes->start. Also the last argument is an IP to print symbol offset if requested. So it should pass symbol-relative address. Fixes: 7d18a824b5e57ddd ("perf annotate: Toggle full address <-> offset display") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240404175716.1225482-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Initialize 'arch' variable not to trip some ↵Arnaldo Carvalho de Melo
-Werror=maybe-uninitialized In some older distros the build is failing due to -Werror=maybe-uninitialized, in this case we know that this isn't the case because 'arch' gets initialized by evsel__get_arch(), so make sure it is initialized to NULL before returning from evsel__get_arch(), as suggested by Ian Rogers. E.g.: 32 17.12 opensuse:15.5 : FAIL gcc version 7.5.0 (SUSE Linux) util/annotate.c: In function 'hist_entry__get_data_type': util/annotate.c:2269:15: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized] struct arch *arch; ^~~~ cc1: all warnings being treated as errors 43 7.30 ubuntu:18.04-x-powerpc64el : FAIL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) util/annotate.c: In function 'hist_entry__get_data_type': util/annotate.c:2351:36: error: 'arch' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (map__dso(ms->map)->kernel && arch__is(arch, "x86") && ^~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/CAP-5=fUqtjxAsmdGrnkjhUTLHs-JvV10TtxyocpYDJK_+LYTiQ@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Split out util/disasm.cNamhyung Kim
The util/annotate.c code has both disassembly and sample annotation related codes. Factor out the disasm part so that it can be handled more easily. No functional changes intended. Committer notes: Add missing include env.h, util.h, bpf-event.h and bpf-util.h to disasm.c, to fix things like: util/disasm.c: In function ‘symbol__disassemble_bpf’: util/disasm.c:1203:9: error: implicit declaration of function ‘perf_exe’ [-Werror=implicit-function-declaration] 1203 | perf_exe(tpath, sizeof(tpath)); | ^~~~~~~~ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Add and use ins__is_nop()Namhyung Kim
Likewise, add ins__is_nop() to check if the current instruction is NOP. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-04-03perf annotate: Use ins__is_xxx() if possibleNamhyung Kim
This is to prepare separation of disasm related code. Use the public ins API instead of checking the internal data structure. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240329215812.537846-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Add stack canary typeNamhyung Kim
When the stack protector is enabled, compiler would generate code to check stack overflow with a special value called 'stack carary' at runtime. On x86_64, GCC hard-codes the stack canary as %gs:40. While there's a definition of fixed_percpu_data in asm/processor.h, it seems that the header is not included everywhere and many places it cannot find the type info. As it's in the well-known location (at %gs:40), let's add a pseudo stack canary type to handle it specially. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-22-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Handle this-cpu variables in kernelNamhyung Kim
On x86, the kernel gets the current task using the current macro like below: #define current get_current() static __always_inline struct task_struct *get_current(void) { return this_cpu_read_stable(pcpu_hot.current_task); } So it returns the current_task field of struct pcpu_hot which is the first member. On my build, it's located at 0x32940. $ nm vmlinux | grep pcpu_hot 0000000000032940 D pcpu_hot And the current macro generates the instructions like below: mov %gs:0x32940, %rcx So the %gs segment register points to the beginning of the per-cpu region of this cpu and it points the variable with a constant. Let's update the instruction location info to have a segment register and handle %gs in kernel to look up a global variable. Pretend it as a global variable by changing the register number to DWARF_REG_PC. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate: Parse x86 segment register locationNamhyung Kim
Add a segment field in the struct annotated_insn_loc and save it for the segment based addressing like %gs:0x28. For simplicity it now handles %gs register only. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Add get_global_var_type()Namhyung Kim
Accessing global variable is common when it tracks execution later. Factor out the common code into a function for later use. It adds thread and cpumode to struct data_loc_info to find (global) symbols if needed. Also remove var_name as it's retrieved in the helper function. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Add update_insn_state()Namhyung Kim
The update_insn_state() function is to update the type state table after processing each instruction. For now, it handles MOV (on x86) insn to transfer type info from the source location to the target. The location can be a register or a stack slot. Check carefully when memory reference happens and fetch the type correctly. It basically ignores write to a memory since it doesn't change the type info. One exception is writes to (new) stack slots for register spilling. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate: Add annotate_get_basic_blocks()Namhyung Kim
The annotate_get_basic_blocks() is to find a list of basic blocks from the source instruction to the destination instruction in a function. It'll be used to find variables in a scope. Use BFS (Breadth First Search) to find a shortest path to carry the variable/register state minimally. Also change find_disasm_line() to be used in annotate_get_basic_blocks() and add 'allow_update' argument to control if it can update the IP. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-21perf annotate-data: Introduce 'struct data_loc_info'Namhyung Kim
The find_data_type() needs many information to describe the location of the data. Add the new 'struct data_loc_info' to pass those information at once. No functional changes intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240319055115.4063940-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-03-06perf annotate: Remove sym_hist.addr[] arrayNamhyung Kim
It's not used anymore and the code is coverted to use a hash map. Now sym_hist has a static size, so no need to have sizeof_sym_hist in the struct annotated_source. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-4-namhyung@kernel.org
2024-03-06perf annotate: Calculate instruction overhead using hashmapNamhyung Kim
Use annotated_source.samples hashmap instead of addr array in the struct sym_hist. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-3-namhyung@kernel.org
2024-03-06perf annotate: Add a hashmap for symbol histogramNamhyung Kim
Now symbol histogram uses an array to save per-offset sample counts. But it wastes a lot of memory if the symbol has a few samples only. Add a hashmap to save values only for actual samples. For now, it has duplicate histogram (one in the existing array and another in the new hash map). Once it can convert to use the hash in all places, we can get rid of the array later. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240304230815.1440583-2-namhyung@kernel.org
2024-01-22perf annotate-data: Support global variablesNamhyung Kim
Global variables are accessed using PC-relative address so it needs to be handled separately. The PC-rel addressing is detected by using DWARF_REG_PC. On x86, %rip register would be used. The address can be calculated using the ip and offset in the instruction. But it should start from the next instruction so add calculate_pcrel_addr() to do it properly. But global variables defined in a different file would only have a declaration which doesn't include a location list. So it first tries to get the type info using the address, and then looks up the variable declarations using name. The name of global variables should be get from the symbol table. The declaration would have the type info. So extend find_var_type() to take both address and name for global variables. The stat is now looks like: Annotate data type stats: total 294, ok 153 (52.0%), bad 141 (48.0%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 61 : no_var 10 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22perf annotate-data: Add stack operation pseudo typeNamhyung Kim
A typical function prologue and epilogue include multiple stack operations to save and restore the current value of registers. On x86, it looks like below: push r15 push r14 push r13 push r12 ... pop r12 pop r13 pop r14 pop r15 ret As these all touches the stack memory region, chances are high that they appear in a memory profile data. But these are not used for any real purpose yet so it'd return no types. One of my profile type shows that non neglible portion of data came from the stack operations. It also seems GCC generates more stack operations than clang. Annotate Instruction stats total 264, ok 169 (64.0%), bad 95 (36.0%) Name : Good Bad ----------------------------------------------------------- movq : 49 27 movl : 24 9 popq : 0 19 <-- here cmpl : 17 2 addq : 14 1 cmpq : 12 2 cmpxchgl : 3 7 Instead of dealing them as unknown, let's create a seperate pseudo type to represent those stack operations separately. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22perf annotate-data: Handle array style accessesNamhyung Kim
On x86, instructions for array access often looks like below. mov 0x1234(%rax,%rbx,8), %rcx Usually the first register holds the type information and the second one has the index. And the current code only looks up a variable for the first register. But it's possible to be in the other way around so it needs to check the second register if the first one failed. The stat changed like this. Annotate data type stats: total 294, ok 148 (50.3%), bad 146 (49.7%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 66 : no_var 10 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22perf annotate-data: Handle macro fusion on x86Namhyung Kim
When a sample was come from a conditional branch without a memory operand, it could be due to a macro fusion with a previous instruction. So it needs to check the memory operand in the previous one. This improves the stat like below: Annotate data type stats: total 294, ok 147 (50.0%), bad 147 (50.0%) ----------------------------------------------------------- 30 : no_sym 32 : no_mem_ops 71 : no_var 6 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-01-22perf annotate-data: Parse 'lock' prefix from llvm-objdumpNamhyung Kim
For the performance reason, I prefer llvm-objdump over GNU's. But I found that llvm-objdump puts x86 lock prefix in a separate line like below. ffffffff81000695: f0 lock ffffffff81000696: ff 83 54 0b 00 00 incl 2900(%rbx) This should be parsed properly, but I just changed to find the insn with next offset for now. This improves the statistics as it can process more instructions. Annotate data type stats: total 294, ok 144 (49.0%), bad 150 (51.0%) ----------------------------------------------------------- 30 : no_sym 35 : no_mem_ops 71 : no_var 6 : no_typeinfo 8 : bad_offset Reviewed-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240117062657.985479-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-12-23perf annotate: Add --insn-stat option for debuggingNamhyung Kim
This is for a debugging purpose. It'd be useful to see per-instrucion level success/failure stats. $ perf annotate --data-type --insn-stat Annotate Instruction stats total 264, ok 143 (54.2%), bad 121 (45.8%) Name : Good Bad ----------------------------------------------------------- movq : 45 31 movl : 22 11 popq : 0 19 cmpl : 16 3 addq : 8 7 cmpq : 11 3 cmpxchgl : 3 7 cmpxchgq : 8 0 incl : 3 3 movzbl : 4 2 incq : 4 2 decl : 6 0 ... Committer notes: So these are about being able to find the type for accesses from these instructions, we should improve the naming, but it is for debugging, we can improve this later: @@ -3726,6 +3759,10 @@ struct annotated_data_type *hist_entry__get_data_type(struct hist_entry *he) continue; mem_type = find_data_type(ms, ip, op_loc->reg, op_loc->offset); + if (mem_type) + istat->good++; + else + istat->bad++; Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-18-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add --type-stat option for debuggingNamhyung Kim
The --type-stat option is to be used with --data-type and to print detailed failure reasons for the data type annotation. $ perf annotate --data-type --type-stat Annotate data type stats: total 294, ok 116 (39.5%), bad 178 (60.5%) ----------------------------------------------------------- 30 : no_sym 40 : no_insn_ops 33 : no_mem_ops 63 : no_var 4 : no_typeinfo 8 : bad_offset Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-17-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add --data-type optionNamhyung Kim
Support data type annotation with new --data-type option. It internally uses type sort key to collect sample histogram for the type and display every members like below. $ perf annotate --data-type ... Annotate type: 'struct cfs_rq' in [kernel.kallsyms] (13 samples): ============================================================================ samples offset size field 13 0 640 struct cfs_rq { 2 0 16 struct load_weight load { 2 0 8 unsigned long weight; 0 8 4 u32 inv_weight; }; 0 16 8 unsigned long runnable_weight; 0 24 4 unsigned int nr_running; 1 28 4 unsigned int h_nr_running; ... For simplicity it prints the number of samples per field for now. But it should be easy to show the overhead percentage instead. The number at the outer struct is a sum of the numbers of the inner members. For example, struct cfs_rq got total 13 samples, and 2 came from the load (struct load_weight) and 1 from h_nr_running. Similarly, the struct load_weight got total 2 samples and they all came from the weight field. I've added two new flags in the symbol_conf for this. The annotate_data_member is to get the members of the type. This is also needed for perf report with typeoff sort key. The annotate_data_sample is to update sample stats for each offset and used only in annotate. Currently it only support stdio output mode, TUI support can be added later. Committer testing: With the perf.data from the previous csets, a very simple, short duration one: # perf annotate --data-type Annotate type: 'struct list_head' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 16 struct list_head { 0 0 8 struct list_head* next; 1 8 8 struct list_head* prev; }; Annotate type: 'char' in [kernel.kallsyms] (1 samples): ============================================================================ samples offset size field 1 0 1 char ; # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-15-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf report: Add 'typeoff' sort keyNamhyung Kim
The typeoff sort key shows the data type name, offset and the name of the field. This is useful to see which field in the struct is accessed most frequently. $ perf report -s type,typeoff --hierarchy --stdio ... # Overhead Data Type / Data Type Offset # ............ ............................ # ... 1.23% struct cfs_rq 0.19% struct cfs_rq +404 (throttle_count) 0.19% struct cfs_rq +0 (load.weight) 0.19% struct cfs_rq +336 (leaf_cfs_rq_list.next) 0.09% struct cfs_rq +272 (propagate) 0.09% struct cfs_rq +196 (removed.nr) 0.09% struct cfs_rq +80 (curr) 0.09% struct cfs_rq +544 (lt_b_children_throttled) 0.06% struct cfs_rq +320 (rq) Committer testing: Again with the perf.data from the previous csets: # perf report --stdio -s type,typeoff # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Data Type Data Type Offset # ........ ......... ................ # 42.86% struct list_head struct list_head +8 (prev) 42.86% (unknown) (unknown) +0 (no field) 14.29% char char +0 (no field) # # (Tip: To see callchains in a more compact form: perf report -g folded) # # perf report --stdio -s dso,type,typeoff # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4 of event 'cpu_atom/mem-loads,ldlat=30/P' # Event count (approx.): 7 # # Overhead Shared Object Data Type Data Type Offset # ........ .................... ......... ................ # 42.86% [kernel.kallsyms] struct list_head struct list_head +8 (prev) 28.57% libc.so.6 (unknown) (unknown) +0 (no field) 14.29% [kernel.kallsyms] char char +0 (no field) 14.29% ld-linux-x86-64.so.2 (unknown) (unknown) +0 (no field) # # (Tip: If you have debuginfo enabled, try: perf report -s sym,srcline) # # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-13-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate-data: Update sample histogram for typeNamhyung Kim
The annotated_data_type__update_samples() to get histogram for data type access. It'll be called by perf annotate to show which fields in the data type are accessed frequently. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Implement hist_entry__get_data_type()Namhyung Kim
It's the function to find out the type info from the given sample data and will be called from the hist_entry sort logic when 'type' sort key is used. It first calls objdump to disassemble the instructions and figure out information about memory access at the location. Maybe we can do it better by analyzing the instruction directly, but I'll leave it for later work. The memory access is determined by checking instruction operands to have "(" and then extract register name and offset. It'll return NULL if no data type is found. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Add annotate_get_insn_location()Namhyung Kim
The annotate_get_insn_location() is to get the detailed information of instruction locations like registers and offset. It has source and target operands locations in an array. Each operand can have a register and an offset. The offset is meaningful when mem_ref flag is set. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-23perf annotate: Factor out evsel__get_arch()Namhyung Kim
The evsel__get_arch() is to get architecture info from the environment. It'll be used by other places later so let's factor it out. Also add arch__is() to check the arch info by name. Committer notes: "get" is usually associated with refcounting, so we better rename this at some point to a better name. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231213001323.718046-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Get rid of local annotation optionsNamhyung Kim
It doesn't need the option in the struct annotation which is allocated for each symbol. It can directly use the global options and save 8 bytes per symbol. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Remove remaining usages of local annotation optionsNamhyung Kim
So that it can get rid of the unused data. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Ensure init/exit for global optionsNamhyung Kim
Now it only cares about the global options so it can just handle it without the argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Use global annotation_optionsNamhyung Kim
Now it can directly use the global options and no need to pass it as an argument. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-5-namhyung@kernel.org [ Fixup build with GTK2=1 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-12-07perf annotate: Introduce global annotation_optionsNamhyung Kim
The annotation options are to control the behavior of objdump and the output. It's basically used by 'perf annotate' but 'perf report' and 'perf top' can call it on TUI dynamically. But it doesn't need to have a copy of annotation options in many places. As most of the work is done in the util/annotate.c file, add a global variable and set/use it instead of having their own copies. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231128175441.721579-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-27perf annotate: Check if operand has multiple regsNamhyung Kim
It needs to check all possible information in an instruction. Let's add a field indicating if the operand has multiple registers. I'll be used to search type information like in an array access on x86 like: mov 0x10(%rax,%rbx,8), %rcx ------------- here Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231012035111.676789-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-10perf annotate: Move raw_comment and raw_func_start fields out of 'struct ↵Namhyung Kim
ins_operands' Thoese two fields are used only for the jump_ops, so move them into the union to save some bytes. Also add jump__delete() callback not to free the fields as they didn't allocate new strings. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: WANG Rui <wangrui@loongson.cn> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-10perf annotate: Pass "-l" option to objdump conditionallyNamhyung Kim
The "-l" option is to print line numbers in the objdump output. perf annotate TUI only can show the line numbers later but it causes big slow downs for the kernel binary. Similarly, showing source code also takes a long time and it already has an option to control it. $ time objdump ... -d -S -C vmlinux > /dev/null real 0m3.474s user 0m3.047s sys 0m0.428s $ time objdump ... -d -l -C vmlinux > /dev/null real 0m1.796s user 0m1.459s sys 0m0.338s $ time objdump ... -d -C vmlinux > /dev/null real 0m0.051s user 0m0.036s sys 0m0.016s As it's not needed for data type profiling, let's make it conditional so that it can skip the unnecessary work. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: linux-toolchains@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Link: https://lore.kernel.org/r/20231110000012.3538610-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Move offsets array from 'struct annotation' to 'struct ↵Namhyung Kim
annotated_source' The offsets array keeps pointers to 'struct annotation_line' entries which are available in the 'struct annotated_source'. Let's move it to there. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Move some source code related fields from 'struct annotation' ↵Namhyung Kim
to 'struct annotated_source' Some fields in the 'struct annotation' are only used with 'struct annotated_source' so better to be moved there in order to reduce memory consumption for other symbols. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Move max_coverage from 'struct annotation' to 'struct ↵Namhyung Kim
annotated_branch' The max_coverage field is only used when branch stack info is available so it'd be natural to move to 'struct annotated_branch'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Split branch stack cycles info from 'struct annotation'Namhyung Kim
The cycles info is only meaningful when sample has branch stacks. To save the memory for normal cases, move those fields to a new 'struct annotated_branch' and dynamically allocate it when needed. Also move cycles_hist from annotated_source as it's related here. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-11-09perf annotate: Split branch stack cycles information out of 'struct ↵Namhyung Kim
annotation_line' The cycles info is used only when branch stack is provided. Separate them from 'struct annotation_line' into a separate struct and lazy allocate them to save some memory. Committer notes: Make annotation__compute_ipc() check if the lazy allocation works, bailing out if so, its callers already do error checking and propagation. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231103191907.54531-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-08-03perf annotate bpf: Don't enclose non-debug code with an assert()Arnaldo Carvalho de Melo
In 616b14b47a86d880 ("perf build: Conditionally define NDEBUG") we started using NDEBUG=1 when DEBUG=1 isn't present, so code that is enclosed with assert() is not called. In dd317df072071903 ("perf build: Make binutil libraries opt in") we stopped linking against binutils-devel, for licensing reasons. Recently people asked me why annotation of BPF programs wasn't working, i.e. this: $ perf annotate bpf_prog_5280546344e3f45c_kfree_skb was returning: case SYMBOL_ANNOTATE_ERRNO__NO_LIBOPCODES_FOR_BPF: scnprintf(buf, buflen, "Please link with binutils's libopcode to enable BPF annotation"); This was on a fedora rpm, so its new enough that I had to try to test by rebuilding using BUILD_NONDISTRO=1, only to get it segfaulting on me. This combination made this libopcode function not to be called: assert(bfd_check_format(bfdf, bfd_object)); Changing it to: if (!bfd_check_format(bfdf, bfd_object)) abort(); Made it work, looking at this "check" function made me realize it changes the 'bfdf' internal state, i.e. we better call it. So stop using assert() on it, just call it and abort if it fails. Probably it is better to propagate the error, etc, but it seems it is unlikely to fail from the usage done so far and we really need to stop using libopcodes, so do the quick fix above and move on. With it we have BPF annotation back working when built with BUILD_NONDISTRO=1: ⬢[acme@toolbox perf-tools-next]$ perf annotate --stdio2 bpf_prog_5280546344e3f45c_kfree_skb | head No kallsyms or vmlinux with build-id 939bc71a1a51cdc434e60af93c7e734f7d5c0e7e was found Samples: 12 of event 'cpu-clock:ppp', 4000 Hz, Event count (approx.): 3000000, [percent: local period] bpf_prog_5280546344e3f45c_kfree_skb() bpf_prog_5280546344e3f45c_kfree_skb Percent int kfree_skb(struct trace_event_raw_kfree_skb *args) { nop 33.33 xchg %ax,%ax push %rbp mov %rsp,%rbp sub $0x180,%rsp push %rbx push %r13 ⬢[acme@toolbox perf-tools-next]$ Fixes: 6987561c9e86eace ("perf annotate: Enable annotation of BPF programs") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mohamed Mahmoud <mmahmoud@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Dave Tucker <datucker@redhat.com> Cc: Derek Barbosa <debarbos@redhat.com> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/ZMrMzoQBe0yqMek1@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-20perf annotate: Fix instruction association and parsing for LoongArchWANG Rui
In the perf annotate view for LoongArch, there is no arrowed line pointing to the target from the branch instruction. This issue is caused by incorrect instruction association and parsing. $ perf record alloc-6276705c94ad1398 # rust benchmark $ perf report 0.28 │ ori $a1, $zero, 0x63 │ move $a2, $zero 10.55 │ addi.d $a3, $a2, 1(0x1) │ sltu $a4, $a3, $s7 9.53 │ masknez $a4, $s7, $a4 │ sub.d $a3, $a3, $a4 12.12 │ st.d $a1, $fp, 24(0x18) │ st.d $a3, $fp, 16(0x10) 16.29 │ slli.d $a2, $a2, 0x2 │ ldx.w $a2, $s8, $a2 12.77 │ st.w $a2, $sp, 724(0x2d4) │ st.w $s0, $sp, 720(0x2d0) 7.03 │ addi.d $a2, $sp, 720(0x2d0) │ addi.d $a1, $a1, -1(0xfff) 12.03 │ move $a2, $a3 │ → bne $a1, $s3, -52(0x3ffcc) # 82ce8 <test::bench::Bencher::iter+0x3f4> 2.50 │ addi.d $a0, $a0, 1(0x1) This patch fixes instruction association issues, such as associating branch instructions with jump_ops instead of call_ops, and corrects false instruction matches. It also implements branch instruction parsing specifically for LoongArch. With this patch, we will be able to see the arrowed line. 0.79 │3ec: ori $a1, $zero, 0x63 │ move $a2, $zero 10.32 │3f4:┌─→addi.d $a3, $a2, 1(0x1) │ │ sltu $a4, $a3, $s7 10.44 │ │ masknez $a4, $s7, $a4 │ │ sub.d $a3, $a3, $a4 14.17 │ │ st.d $a1, $fp, 24(0x18) │ │ st.d $a3, $fp, 16(0x10) 13.15 │ │ slli.d $a2, $a2, 0x2 │ │ ldx.w $a2, $s8, $a2 11.00 │ │ st.w $a2, $sp, 724(0x2d4) │ │ st.w $s0, $sp, 720(0x2d0) 8.00 │ │ addi.d $a2, $sp, 720(0x2d0) │ │ addi.d $a1, $a1, -1(0xfff) 11.99 │ │ move $a2, $a3 │ └──bne $a1, $s3, 3f4 3.17 │ addi.d $a0, $a0, 1(0x1) Signed-off-by: WANG Rui <wangrui@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: loongarch@lists.linux.dev Cc: loongson-kernel@lists.loongnix.cn Cc: Huacai Chen <chenhuacai@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: WANG Xuerui <kernel@xen0n.name> Link: https://lore.kernel.org/r/20230620132025.105563-1-wangrui@loongson.cn Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-20perf annotation: Switch lock from a mutex to a sharded_mutexIan Rogers
Remove the "struct mutex lock" variable from annotation that is allocated per symbol. This removes in the region of 40 bytes per symbol allocation. Use a sharded mutex where the number of shards is set to the number of CPUs. Assuming good hashing of the annotation (done based on the pointer), this means in order to contend there needs to be more threads than CPUs, which is not currently true in any perf command. Were contention an issue it is straightforward to increase the number of shards in the mutex. On my Debian/glibc based machine, this reduces the size of struct annotation from 136 bytes to 96 bytes, or nearly 30%. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Andres Freund <andres@anarazel.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yuan Can <yuancan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/r/20230615040715.2064350-2-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2023-06-16perf annotate: Work with vmlinux outside symfsVincent Whitchurch
It is currently possible to use --symfs along with a vmlinux which lies outside of the symfs by passing an absolute path to --vmlinux, thanks to the check in dso__load_vmlinux() which handles this explicitly. However, the annotate code lacks this check and thus 'perf annotate' does not work ("Internal error: Invalid -1 error code") for kernel functions with this combination. Add the missing handling. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel@axis.com Link: https://lore.kernel.org/r/20221125114210.2353820-1-vincent.whitchurch@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf annotate: Allow whitespace between insn operandsNamhyung Kim
The llvm-objdump adds a space between the operands while GNU objdump does not. Allow a space to handle the both. In GNU objdump: Disassembly of section .text: here | ffffffff81000000 <_stext>: v ffffffff81000000: 48 8d 25 51 1f 40 01 lea 0x1401f51(%rip),%rsp ffffffff81000007: e8 d4 00 00 00 call ffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff lea -0x13(%rip),%rdi In llvm-objdump: Disassembly of section .text: here | ffffffff81000000 <startup_64>: v ffffffff81000000: 48 8d 25 51 1f 40 01 leaq 20979537(%rip), %rsp ffffffff81000007: e8 d4 00 00 00 callq 0xffffffff810000e0 <verify_cpu> ffffffff8100000c: 48 8d 3d ed ff ff ff leaq -19(%rip), %rdi Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230612230026.3887586-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-06-12perf srcline: Change free_srcline to zfree_srclineIan Rogers
Make use after free more unlikely. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Fangrui Song <maskray@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Babrou <ivan@cloudflare.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Ye Xingchen <ye.xingchen@zte.com.cn> Cc: Yuan Can <yuancan@huawei.com> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230608232823.4027869-26-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>