summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2025-10-20perf jevents: Fix build when there are other json files in the treeJames Clark
The unquoted glob *.json will expand to a real file if, for example, there is any file in the Perf source ending in .json. This can happen when using tools like Bear and clangd which generate a compile_commands.json file. With the glob already expanded by the shell, the find command will fail to wildcard any real json events files. Fix it by wrapping the star in quotes so it's passed to find rather than the shell. This fixes the following build error (most of the diff output omitted): $ make V=1 -C tools/perf O=/tmp/perf_build_with_json TEST /tmp/perf_build_with_json/pmu-events/empty-pmu-events.log ... /* offset=121053 */ "node-access\000legacy cache\000Local memory read accesses\000legacy-cache-config=6\000\00010\000\000\000\000\000" /* offset=121135 */ "node-misses\000legacy cache\000Local memory read misses\000legacy-cache-config=0x10006\000\00010\000\000\000\000\000" /* offset=121221 */ "node-miss\000legacy cache\000Local memory read misses\000legacy-cache-config=0x10006\000\00010\000\000\000\000\000" ... - { .event_table = { 0, 0 }, .metric_table = { 0, 0 }, }, make[3]: *** [pmu-events/Build:54: /tmp/perf_build_with_json/pmu-events/empty-pmu-events.log] Error 1 Fixes: 4bb55de4ff03 ("perf jevents: Support copying the source json files to OUTPUT") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Leo Yan <leo.yan@arm.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-20perf parse-events: Make X modifier more respectful of groupsIan Rogers
Events with an X modifier were reordered within a group, for example slots was made the leader in: ``` $ perf record -e '{cpu/mem-stores/ppu,cpu/slots/uX}' -- sleep 1 ``` Fix by making `dont_regroup` evsels always use their index for sorting. Make the cur_leader, when fixing the groups, be that of `dont_regroup` evsel so that the `dont_regroup` evsel doesn't become a leader. On a tigerlake this patch corrects this and meets expectations in: ``` $ perf stat -e '{cpu/mem-stores/,cpu/slots/uX}' -a -- sleep 0.1 Performance counter stats for 'system wide': 83,458,652 cpu/mem-stores/ 2,720,854,880 cpu/slots/uX 0.103780587 seconds time elapsed $ perf stat -e 'slots,slots:X' -a -- sleep 0.1 Performance counter stats for 'system wide': 732,042,247 slots (48.96%) 643,288,155 slots:X (51.04%) 0.102731018 seconds time elapsed ``` Closes: https://lore.kernel.org/lkml/18f20d38-070c-4e17-bc90-cf7102e1e53d@linux.intel.com/ Fixes: 035c17893082 ("perf parse-events: Add 'X' modifier to exclude an event from being regrouped") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf c2c annotate: Start from the contention lineTianyou Li
Add support to highlight the contention line in the annotate browser, use 'TAB'/'UNTAB' to refocus to the contention line. Signed-off-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Jiebin Sun <jiebin.sun@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com> Reviewed-by: Wangyang Guo <wangyang.guo@intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf c2c: Add annotation support to perf c2c reportTianyou Li
Perf c2c report currently specified the code address and source:line information in the cacheline browser, while it is lack of annotation support like perf report to directly show the disassembly code for the particular symbol shared that same cacheline. This patches add a key 'a' binding to the cacheline browser which reuse the annotation browser to show the disassembly view for easier analysis of cacheline contentions. Signed-off-by: Tianyou Li <tianyou.li@intel.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Reviewed-by: Jiebin Sun <jiebin.sun@intel.com> Reviewed-by: Pan Deng <pan.deng@intel.com> Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com> Reviewed-by: Wangyang Guo <wangyang.guo@intel.com> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024Ian Rogers
The MAX_EVENTS value ensured a counted loop presumably to satisfy the BPF verifier. It is possible to go past 32 events when gathering uncore events. Increase the amount to 1024 as that should provide some amount of headroom. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf ilist: Add PMU information to metricsIan Rogers
Duplicate metrics may exist on hybrid platforms, with the metric's PMU being used to select the metric to use. Incorporate the metric PMU into the ilist display and support opening it just for a given PMU. Before: ``` ⭘ Interactive Perf List ├── ▼ TopdownL1 tma_backend_bound │ ├── tma_backend_bound Counts the total number of issue slots that were │ ├── ▶ tma_backend_bound_group not consumed by the backend due to backend stalls │ ├── tma_backend_bound Counts the total number of issue slots that were │ ├── ▶ tma_backend_bound_group not consumed by the backend due to backend stalls. │ ├── tma_bad_speculation Note that uops must be available for consumption │ ├── ▶ tma_bad_speculation_group in order for this event to count. If a uop is not │ ├── tma_bad_speculation available (IQ is empty), this event will not count │ ├── ▶ tma_bad_speculation_group cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 * │ ├── tma_frontend_bound cpu_atom@CPU_CLK_UNHALTED.CORE@) │ ├── ▶ tma_frontend_bound_group tma_backend_bound > 0.1 │ ├── tma_frontend_bound ▆▆ │ ├── ▶ tma_frontend_bound_group │ ├── tma_retiring │ ├── ▶ tma_retiring_group │ ├── tma_retiring │ └── ▶ tma_retiring_group ├── ▶ TopdownL2 total▄▄▅▅▆▅▅▂▁▁▁▁▂▃▂▂▃▄▄▇▇█▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▅▅▅▄▆▆▆▅▅▅▅▅▅▇▇▇▇▆▅▆▆▆▆▅▅▅▄▃▃▃▃▃▃▃▃▃▃▄▄▄▅▅▅▅▅▆▆▆▆▆▆ cpu0▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu1▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu2▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ cpu3▁▁▁▁▁▁▁▁▁▄▄▄▄▄▄▄▄▄▄█████▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅ cpu4████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ cpu5▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▇▇▇▇▇▇▆▆ cpu6▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ cpu7▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu8▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu9▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂█████▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅ cpu10▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu11▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁ cpu12▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu13▁▁▁▁▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu14▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ cpu15▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu16▃▃▃▃▃▃▃▃▃▄▄▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▃▃▃▃▃▃▃▃▄▄▄▄▄▄▁▁▃▃▃▃▃▃▃▃▃▃▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ cpu17▁▁▁▁▁▄▄▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▄▄▄▄▃▃▃▃▂▂▂▂▄▄▄▄▄▄▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ s Search n Next p Previous c Collapse ^q Quit ▏^p palette ``` After: ``` ⭘ Interactive Perf List ├── ▼ TopdownL1 tma_backend_bound │ ├── tma_backend_bound (cpu_atom) Counts the total number of issue slots that were │ ├── ▼ tma_backend_bound_group (cpu_atom) not consumed by the backend due to backend stalls │ │ ├── tma_core_bound (cpu_atom) Counts the total number of issue slots that were │ │ ├── ▶ tma_core_bound_group (cpu_atom not consumed by the backend due to backend stalls. │ │ ├── tma_resource_bound (cpu_atom) Note that uops must be available for consumption │ │ └── ▶ tma_resource_bound_group (cpu_ in order for this event to count. If a uop is not │ ├── tma_backend_bound (cpu_core) available (IQ is empty), this event will not count │ ├── ▶ tma_backend_bound_group (cpu_core) cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 * │ ├── tma_bad_speculation (cpu_atom) cpu_atom@CPU_CLK_UNHALTED.CORE@) │ ├── ▶ tma_bad_speculation_group (cpu_ato▆▆tma_backend_bound > 0.1 │ ├── tma_bad_speculation (cpu_core) │ ├── ▶ tma_bad_speculation_group (cpu_cor▃▃ │ ├── tma_frontend_bound (cpu_atom) │ ├── ▶ tma_frontend_bound_group (cpu_atom │ ├── tma_frontend_bound (cpu_core) │ ├── ▶ tma_frontend_bound_group (cpu_core ▌ total▁▁▁▁▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▆▇▇ cpu16▇▇▇▇▇▇▇▇▇▇▇▆▆▁▁▁▁▁▁▁▁▁▁▁▁▂▂▄▄▅▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇▆▆▆▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▄▄▄▄▃▃▄▄▄▄▇▇▇▇▇▇▇▇▇▇ cpu17█▇▇▇▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▃▃▃▃▂▂▁▁▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▅▅▄▄▂▂▇▇▇▇▆▆▅▅▆▆ cpu18▇▇▇▇▇██▇▇▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▃▃▃▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄▄▄▄▅▅▅▅▅▅▅▅ cpu19▇▃▃▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▁▁▂▂▃▃▃▃▅▅▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▇▇▇▆▆▅▅▅▅▆▆▄▄▄▄▅▅ cpu20▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▄▄▄▄▅▅▅▅▅▅▆▆▇▇ cpu21▇▇▇▇▇▇▇██▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▅▅▄▄▂▂▂▂▂▂▁▁▁▁ cpu22█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▂▂▁▁▁▁▂▂▂▂▂▂▂▂ cpu23▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▆▆██▇▇▇▇▇▇ cpu24▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▇▇▆▆▆▆▆▆▇▇▇▇ cpu25▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▆▆▆▆▇▇▇▇▇▇██ cpu26▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▄▄▇▇▇▇▇▇▇▇▇▇██▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▂▂▁▁▁▁▂▂▂▂▂▂▂▂ cpu27▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▄▄▄▄▅▅▅▅▅▅▇▇ total 7.4923074548462605 cpu16 0.2961618003253457 cpu17 0.3065719718925585 cpu18 0.27800656881051855 cpu19 0.28564742078353406 cpu20 0.2764790653117084 s Search n Next p Previous c Collapse ^q Quit ▏^p palette ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf python: Add PMU argument to parse_metricsIan Rogers
Add an optional PMU argument to parse_metrics to allow restriction of the particular metrics to be opened. If no argument is provided then all metrics with the given name/group are opened Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf ilist: Don't display deprecated eventsIan Rogers
Unsupported legacy events are flagged as deprecated. Don't display these events in ilist as they won't open and there are over 1,000 legacy cache events. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-19perf trace: Don't synthesize mmaps unless callchains are enabledIan Rogers
Synthesizing mmaps in perf trace is unnecessary unless call chains are being generated. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test parse-events: Add evsel test helperIan Rogers
Add TEST_ASSERT_EVSEL to dump the failing evsel in the event of a failure. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test parse-events: Add evlist test helperIan Rogers
Add TEST_ASSERT_EVLIST to dump the failing evlist in the event of a failure. Add the macro to a number of tests not currently checking the evlist length. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test: Clean up test_..config helpersIan Rogers
Just have a single test_hw_config helper that strips extended type information in the case of hardware and hardware cache events. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test: Switch cycles event to cpu-cyclesIan Rogers
Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. As tests assume a core event, switch to use "cpu-cycles" that avoids the overloaded "cycles" event on troublesome PMUs and is so far not overloaded. Note, on x86 this changes a legacy event into a sysfs one. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test parse-events: Remove cpu PMU requirementIan Rogers
In the event parse string, switch "cpu" to "default_core" and then rewrite this to the first core PMU name prior to parsing. This enables testing with a PMU on hybrid x86 and other systems that don't use "cpu" for the core PMU name. The name "default_core" is already used by jevents. Update test expectations to match. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test parse-events: Without a PMU use cpu-cycles rather than cyclesIan Rogers
Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. Switch to using "cpu-cycles" which is an indentical legacy event but avoids the multiple PMU confusion introduced by the PMU drivers. Note, on x86 cpu-cycles is also a sysfs event but cycles isn't. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf test parse-events: Use evsel__match for legacy eventsIan Rogers
Switch from the test's assert_hw/test_config to the common evsel__match code that appropriately handles events with both legacy and sysfs/json encoding. For tests asserting that a config value matches that placed in the perf_event_attr just directly compare the config values. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf evsel: Improvements to __evsel__matchIan Rogers
Ensure both the perf_event_attr and alternate_hw_config are checked in the match. Don't mask the config if the perf_event_attr isn't a HARDWARE or HW_CACHE event. Add common early exit cases. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf evlist: Avoid scanning all PMUs for evlist__new_defaultIan Rogers
Rather than wildcard matching the cycles event specify only the core PMUs. This avoids potentially loading unnecessary uncore PMUs. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf top: Use evlist__new_default when no events specifiedIan Rogers
Rather than distributing the code doing similar things to evlist__new_default, use the one implementation so that paranoia and wildcard scanning can be optimized. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf record: Use evlist__new_default when no events specifiedIan Rogers
Rather than distributing the code doing similar things to evlist__new_default, use the one implementation so that paranoia and wildcard scanning can be optimized. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Remove hard coded legacy hardware and cache parsingIan Rogers
Now that legacy hardware and cache events are in json, having the lexer match the specific event is no longer necessary and generic PMU parsing can take place. Because of this remove the specific term parsing, event adding, and passing of alternate_hw_config which was now always PERF_COUNT_HW_MAX. This mirrors a similar change for software events in commit 6e9fa4131abb ("perf parse-events: Remove non-json software events"). With no hard coded legacy hardware or cache events the wild card, case insensitivity, etc. is consistent for events. This does, however, mean events like cycles will wild card against all PMUs. A change does the same was originally posted and merged from: https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com and reverted by Linus in commit 4f1b067359ac ("Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"") due to his dislike for the cycles behavior on ARM. Earlier patches in this series make perf record event opening failures non-fatal and hide the cycles event's failure to open on ARM in perf record, so it is expected the behavior will now be transparent in perf record. perf stat with a cycles event will wildcard open the event on all PMUs. As cycles is a "default event", the perf stat behavior for default events was updated to only open them on core/software PMUs. The change to support legacy events with PMUs was done to clean up Intel's hybrid PMU implementation. Having sysfs/json events with increased priority to legacy was requested by Mark Rutland <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy events on that PMU. It was requested that RISC-V be able to add events to the perf tool json so the PMU driver didn't need to map legacy events to config encodings: https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/ A previous series of patches decreasing legacy hardware event priorities was posted in: https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/ Namhyung Kim <namhyung@kernel.org> mentioned that hardware and software events can be implemented similarly: https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/ Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf print-events: Remove print_symbol_eventsIan Rogers
Now legacy hardware events are in json there's no need for a specific printing routine that previously served for both hardware and software events. The associated event_symbols_hw is also removed. To support the previous filtered version use an event glob of "legacy hardware" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf print-events: Remove print_hwcache_eventsIan Rogers
Now legacy cache events are in json there's no need for a specific printing routine. To support the previous filtered version use an event glob of "legacy cache" which matches the topic of the json events. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf jevents: Add legacy-hardware and legacy-cache jsonIan Rogers
The legacy-hardware.json is added containing hardware events similarly to the software.json file. A difference is that for the software PMU the name is known and matches sysfs. In the legacy-hardware.json no Unit/PMU is specified for the events meaning default_core is used and the events will appear for all core PMUs. There are potentially 1216 legacy cache events, rather than list them in a json file add a make_legacy_cache.py helper to generate them. By using json for legacy hardware and cache events: descriptions of the events can be added; events can be marked as deprecated, such as those misleadingly named l2 (deprecated is also used to mark all events that weren't previously displayed in perf list); and the name lookup becomes case insensitive. The C string encoding all the perf events and metrics is increased in size by 123,499 bytes which will increase the perf binary size. Later changes will remove hard coded event parsing for legacy hardware and cache events, turning parsing overhead into a binary search during event lookup. That event descriptions are based off of those in perf_event_open man page, credit to Vince Weaver <vincent.weaver@maine.edu>. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Add and use legacy_terms in alias informationIan Rogers
Add support to finding/adding events from the default_core event table. If an event already exists from sysfs/json then the default_core configuration is saved in the legacy_terms string. Lazily use the legacy_terms string to set a legacy hardware or cache event as deprecated if the core PMU doesn't support it. Use the legacy terms string to set the alternate_hw_config, avoiding the value needing to be passed from the parse_events parser. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf jevents: Add legacy json terms and default_core event table helperIan Rogers
Add json LegacyConfigCode and LegacyCacheCode values that translate to legacy-hardware-config and legacy-cache-config event terms respectively. Add perf_pmu__default_core_events_table as a means to find a default_core event table that will later contain legacy events. In situations like hypervisors it is more likely that tables will be NULL. Rather than testing in the calling PMU code, early exit in the pmu-event.c routines. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Add terms for legacy hardware and cache config valuesIan Rogers
Add the PMU terms legacy-hardware-config and legacy-cache-config. These terms are similar to the config term in that their values are assigned to the perf_event_attr config value. They differ in that the PMU type is switched to be either PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE, and the PMU type is moved into the extended type information of the config value. This will allow later patches to add legacy events to json. An example use of the terms is in the following: ``` $ perf stat -vv -e 'cpu/legacy-hardware-config=1/,cpu/legacy-cache-config=0x10001/' true Using CPUID GenuineIntel-6-8D-1 Attempt to add: cpu/legacy-hardware-config=0x1/ ..after resolving event: cpu/legacy-hardware-config=0x1/ Attempt to add: cpu/legacy-cache-config=0x10001/ ..after resolving event: cpu/legacy-cache-config=0x10001/ Control descriptor is not initialized ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x1 (PERF_COUNT_HW_INSTRUCTIONS) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 3 ------------------------------------------------------------ perf_event_attr: type 3 (PERF_TYPE_HW_CACHE) size 136 config 0x10001 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_L1I) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 4 cpu/legacy-hardware-config=1/: -1: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: -1: 57453 414756 414756 cpu/legacy-hardware-config=1/: 1364046 414756 414756 cpu/legacy-cache-config=0x10001/: 57453 414756 414756 Performance counter stats for 'true': 1,364,046 cpu/legacy-hardware-config=1/ 57,453 cpu/legacy-cache-config=0x10001/ 0.001988593 seconds time elapsed 0.002194000 seconds user 0.000000000 seconds sys ``` Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Factor term parsing into a perf_event_attr into a helperIan Rogers
Factor existing functionality in perf_pmu__name_from_config into a helper that will be used in later patches. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Use fd rather than FILE from new_aliasIan Rogers
The FILE argument was necessary for the scanner but now that functionality is not being used we can switch to just using io__getline which should cut down on stdio buffer usage. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Remove unused FILE input argument to scannerIan Rogers
Now the events file isn't directly parsed from a FILE but stored in a string prior to parsing, remove the FILE argument to the associated scanner functions as they only ever pass NULL. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf pmu: Don't eagerly parse event termsIan Rogers
When an event/alias is created for a PMU the terms are eagerly parsed using parse_events_terms. For a command like perf stat or perf record, the particular event/alias will be found, the terms parsed, the terms cloned for use in the event parsing, and then the terms used to configure the perf_event_attr. Events/aliases may be eagerly loaded, such as from sysfs or in perf list, in which case the aliases terms will be little or never used. To avoid redundant work, to avoid cloning, and to reduce memory overhead, hold the terms for an event as a string until they need handling as a term list. This may introduce duplicate parsing if an event is repeated in a list, but this situation is expected to be uncommon. Measuring the number of instructions before and after with a sysfs event and perf stat, there is a minor reduction in the number of instructions executed by 0.3%. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf jevents: Support copying the source json files to OUTPUTIan Rogers
The jevents command expects all json files to be organized under a single directory. When generating json files from scripts (to reduce laborious copy and paste in the json) we don't want to generate the json into the source directory if there is an OUTPUT directory specified. This change adds a GEN_JSON for this case where the GEN_JSON copies the JSON files to OUTPUT, only when OUTPUT is specified. The Makefile.perf clean code is updated to clean up this directory when present. This patch is part of: https://lore.kernel.org/lkml/20240926173554.404411-12-irogers@google.com/ which was similarly adding support for generating json in scripts for the consumption of jevents.py. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf record: Skip don't fail for events that don't openIan Rogers
Whilst for many tools it is an expected behavior that failure to open a perf event is a failure, ARM decided to name PMU events the same as legacy events and then failed to rename such events on a server uncore SLC PMU. As perf's default behavior when no PMU is specified is to open the event on all PMUs that advertise/"have" the event, this yielded failures when trying to make the priority of legacy and sysfs/json events uniform - something requested by RISC-V and ARM. A legacy event user on ARM hardware may find their event opened on an uncore PMU which for perf record will fail. Arnaldo suggested skipping such events which this patch implements. Rather than have the skipping conditional on running on ARM, the skipping is done on all architectures as such a fundamental behavioral difference could lead to problems with tools built/depending on perf. An example of perf record failing to open events on x86 is: ``` $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1 Error: Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed. The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read). "dmesg | grep -i perf" may provide additional information. Error: Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed. The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read). "dmesg | grep -i perf" may provide additional information. Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ] $ perf report --stats Aggregated stats: TOTAL events: 17255 MMAP events: 284 ( 1.6%) COMM events: 1961 (11.4%) EXIT events: 1 ( 0.0%) FORK events: 1960 (11.4%) SAMPLE events: 87 ( 0.5%) MMAP2 events: 12836 (74.4%) KSYMBOL events: 83 ( 0.5%) BPF_EVENT events: 36 ( 0.2%) FINISHED_ROUND events: 2 ( 0.0%) ID_INDEX events: 1 ( 0.0%) THREAD_MAP events: 1 ( 0.0%) CPU_MAP events: 1 ( 0.0%) TIME_CONV events: 1 ( 0.0%) FINISHED_INIT events: 1 ( 0.0%) cycles stats: SAMPLE events: 87 ``` If all events fail to open then the perf record will fail: ``` $ perf record -e LLC-prefetch-read true Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. Error: Failure to open any events for recording ``` As an evlist may have dummy events that open when all command line events fail we ignore dummy events when detecting if at least some events open. This still permits the dummy event on its own to be used as a permission check: ``` $ perf record -e dummy true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] ``` but allows failure when a dummy event is implicilty inserted or when there are insufficient permissions to open it: ``` $ perf record -e LLC-prefetch-read -a true Error: Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed. The LLC-prefetch-read event is not supported. Error: Failure to open any events for recording ``` As the first parsed event in an evlist is marked as tracking, removing this event can remove tracking from the evlist, removing mmap events and breaking symbolization. To avoid this, if a tracking event is removed then the next event has tracking added. The issue with legacy events is that on RISC-V they want the driver to not have mappings from legacy to non-legacy config encodings for each vendor/model due to size, complexity and difficulty to update. It was reported that on ARM Apple-M? CPUs the legacy mapping in the driver was broken and the sysfs/json events should always take precedent, however, it isn't clear this is still the case. It is the case that without working around this issue a legacy event like cycles without a PMU can encode differently than when specified with a PMU - the non-PMU version favoring legacy encodings, the PMU one avoiding legacy encodings. Legacy events are also case sensitive while sysfs/json events are not. The patch removes events and then adjusts the idx value for each evsel. This is done so that the dense xyarrays used for file descriptors, etc. don't contain broken entries. On ARM it could be common following this change to see a lot of warnings for the cycles event due to many ARM PMUs advertising the cycles event (ARM inconsistently have events bus_cycles and then cycles implying CPU cycles, they also sometimes have a cpu_cycles event). As cycles is a popular event, avoid potentially spamming users with error messages on ARM when there are multiple cycles events in the evlist, the error is still shown when verbose is enabled. Prior versions without adding the tracking data and not warning for cycles on ARM was: Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf stat: Avoid wildcarding PMUs for default eventsIan Rogers
Without a PMU perf matches an event against any PMU with the event. Unfortunately some PMU drivers advertise a "cycles" event which is typically just a core event. To make perf's behavior consistent, just look up default events with their designated PMU types. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf perf_api_probe: Avoid scanning all PMUs, try software PMU firstIan Rogers
Scan the software PMU first rather than last as it is the least likely to fail the probe. Specifying the software PMU by name was enabled by commit 9957d8c801fe ("perf jevents: Add common software event json"). For hardware events, add core PMU names when getting events to probe so that not all PMUs are scanned. For example, when legacy events support wildcards and for the event "cycles:u" on x86, we want to only scan the "cpu" PMU and not all uncore PMUs for the event too. Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-15perf parse-events: Fix legacy cache events if event is duplicated in a PMUIan Rogers
The term list when adding an event to a PMU is expected to have the event name for the alias lookup. Also, set found_supported so that -EINVAL isn't returned. Fixes: 62593394f66a ("perf parse-events: Legacy cache names on all PMUs and lower priority") Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-13perf bpf_counter: Fix opening of "any"(-1) CPU eventsIan Rogers
The bperf BPF counter code doesn't handle "any"(-1) CPU events, always wanting to aggregate a count against a CPU, which avoids the need for atomics so let's not change that. Force evsels used for BPF counters to require a CPU when not in system-wide mode so that the "any"(-1) value isn't used during map propagation and evsel's CPU map matches that of the PMU. Fixes: b91917c0c6fa ("perf bpf_counter: Fix handling of cpumap fixing hybrid") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-13perf build python: Don't leave a.out file when building with clangIan Rogers
Testing clang features doesn't specify a "-o" option so an a.out file is created and left in the make directory (not the output). Fix this by specifying a "-o" of "/dev/null". Reorganize the code a little to help with readability. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-13perf stat: Additional verbose details for <not supported> eventsIan Rogers
If an event shows as "<not supported>" in perf stat output, in verbose mode add the strerror output to help diagnose the issue. Consider: ``` $ perf stat -e cycles,data_read,instructions true Performance counter stats for 'true': 357,457 cycles:u <not supported> MiB data_read:u 156,182 instructions:u # 0.44 insn per cycle 0.001250315 seconds time elapsed 0.001283000 seconds user 0.000000000 seconds sys ``` To understand why the data_read uncore event failed you can run it again with -v option. This change adds detailed message about the error and suggestion how to fix it potentially. Warning: data_read:u event is not supported by the kernel. Invalid event (data_read:u) in per-thread mode, enable system wide with '-a'. Signed-off-by: Ian Rogers <irogers@google.com> [ simplified the commit message ] Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-13perf tests: use strdup() in "Object code reading"James Clark
Use strdup() instead of fixed PATH_MAX buffer for storing paths to not waste memory. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-10-11Merge tag 'x86_cleanups_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Simplify inline asm flag output operands now that the minimum compiler version supports the =@ccCOND syntax - Remove a bunch of AS_* Kconfig symbols which detect assembler support for various instruction mnemonics now that the minimum assembler version supports them all - The usual cleanups all over the place * tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__ x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h> x86/mtrr: Remove license boilerplate text with bad FSF address x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h> x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h> x86/entry/fred: Push __KERNEL_CS directly x86/kconfig: Remove CONFIG_AS_AVX512 crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ crypto: X86 - Remove CONFIG_AS_VAES crypto: x86 - Remove CONFIG_AS_GFNI x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-08Merge tag 'perf-tools-for-v6.18-1-2025-10-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Arnaldo Carvalho de Melo: - Extended 'perf annotate' with DWARF type information (--code-with-type) integration in the TUI, including a 'T' hotkey to toggle it - Enhanced 'perf bench mem' with new mmap() workloads and control over page/chunk sizes - Fix 'perf stat' error handling to correctly display unsupported events - Improved support for Clang cross-compilation - Refactored LLVM and Capstone disasm for modularity - Introduced the :X modifier to exclude an event from automatic regrouping - Adjusted KVM sampling defaults to use the "cycles" event to prevent failures - Added comprehensive support for decoding PowerPC Dispatch Trace Log (DTL) - Updated Arm SPE tracing logic for better analysis of memory and snoop details - Synchronized Intel PMU events and metrics with TMA 5.1 across multiple processor generations - Converted dependencies like libperl and libtracefs to be opt-in - Handle more Rust symbols in kallsyms ('N', debugging) - Improve the python binding to allow for python based tools to use more of the libraries, add a 'ilist' utility to test those new bindings - Various 'perf test' fixes - Kan Liang no longer a perf tools reviewer * tag 'perf-tools-for-v6.18-1-2025-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (192 commits) perf tools: Fix arm64 libjvmti build by generating unistd_64.h perf tests: Don't retest sections in "Object code reading" perf docs: Document building with Clang perf build: Support build with clang perf test coresight: Dismiss clang warning for unroll loop thread perf test coresight: Dismiss clang warning for thread loop perf test coresight: Dismiss clang warning for memcpy thread perf build: Disable thread safety analysis for perl header perf build: Correct CROSS_ARCH for clang perf python: split Clang options when invoking Popen tools build: Align warning options with perf perf disasm: Remove unused evsel from 'struct annotate_args' perf srcline: Fallback between addr2line implementations perf disasm: Make ins__scnprintf() and ins__is_nop() static perf dso: Clean up read_symbol() error handling perf dso: Support BPF programs in dso__read_symbol() perf dso: Move read_symbol() from llvm/capstone to dso perf llvm: Reduce LLVM initialization perf check: Add libLLVM feature perf parse-events: Fix parsing of >30kb event strings ...
2025-10-06perf tools: Fix arm64 libjvmti build by generating unistd_64.hVincent Minet
Since commit 22f72088ffe6 ("tools headers: Update the syscall table with the kernel sources") the arm64 syscall header is generated at build time. Later, commit bfb713ea53c7 ("perf tools: Fix arm64 build by generating unistd_64.h") added a dependency to libperf to guarantee that this header was created before building libperf or perf itself. However, libjvmti also requires this header but does not depend on libperf, leading to build failures such as: In file included from /usr/include/sys/syscall.h:24, from /usr/include/syscall.h:1, from jvmti/jvmti_agent.c:36: tools/arch/arm64/include/uapi/asm/unistd.h:2:10: fatal error: asm/unistd_64.h: No such file or directory 2 | #include <asm/unistd_64.h> Fix this by ensuring that libperf is built before libjvmti, so that unistd_64.h is always available. Fixes: 22f72088ffe69a37 ("tools headers: Update the syscall table with the kernel sources") Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vincent Minet <v.minet@criteo.com> Link: https://lore.kernel.org/r/20250922053702.2688374-1-v.minet@criteo.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf tests: Don't retest sections in "Object code reading"James Clark
We already only test each kcore map once, but on slow systems (particularly with network filesystems) even the non-kcore maps are slow. The test can test the same objdump output over and over which only wastes time. Generalize the skipping mechanism to track all DSOs and addresses so that each section is only tested once. On a fully loaded ARM Juno (simulating a parallel 'perf test' run) with a network filesystem, the original runtime is: real 1m51.126s user 0m19.445s sys 1m15.431s And the new runtime is: real 0m48.873s user 0m8.031s sys 0m32.353s Committer testing: # perf test "code read" 22: Object code reading : Ok # Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf docs: Document building with ClangLeo Yan
Add example commands for building perf with Clang. Since recent Android NDK releases use Clang as the default compiler, a separate Android specific document is no longer needed; point to the general build documentation instead. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-9-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf build: Support build with clangLeo Yan
Add support for building perf with clang. For cross compilation, the Makefile dynamically selects target flag for corresponding arch. This patch has been verified on x86_64 machine with Ubuntu distro, it can build successfully for native target, and for cross building Arm64 and s390. Example: native build on x86_64 / Ubuntu machine: $ HOSTCC=clang CC=clang CXX=clang++ make -C tools/perf Example: cross building s390 target on x86_64 / Ubuntu machine: # Install x390x cross toolchain and headers $ sudo apt-get install gcc-s390x-linux-gnu g++-s390x-linux-gnu \ libc6-dev-s390x-cross linux-libc-dev-s390x-cross # Build with clang $ HOSTCC=clang CC=clang CXX=clang++ \ ARCH=s390 CROSS_COMPILE=s390x-linux-gnu- \ make -C tools/perf NO_LIBELF=1 NO_LIBTRACEEVENT=1 NO_LIBPYTHON=1 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-8-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf test coresight: Dismiss clang warning for unroll loop threadLeo Yan
clang-18.1.3 on Ubuntu 24.04.2 reports warning: unroll_loop_thread.c:35:25: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] 35 | : /* in */ [in] "r" (in) | ^ unroll_loop_thread.c:39:1: warning: non-void function does not return a value [-Wreturn-type] 39 | } | ^ Use the modifier "w" for 32-bit register access and return NULL at the end of thread function. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-7-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf test coresight: Dismiss clang warning for thread loopLeo Yan
clang-18.1.3 on Ubuntu 24.04.2 reports warning: thread_loop.c:41:23: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] 41 | : /* in */ [i] "r" (i), [len] "r" (len) | ^ thread_loop.c:37:8: note: use constraint modifier "w" 37 | "add %[i], %[i], #1\n" | ^~~~ | %w[i] thread_loop.c:41:23: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] 41 | : /* in */ [i] "r" (i), [len] "r" (len) | ^ thread_loop.c:37:14: note: use constraint modifier "w" 37 | "add %[i], %[i], #1\n" | ^~~~ | %w[i] thread_loop.c:41:23: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] 41 | : /* in */ [i] "r" (i), [len] "r" (len) | ^ thread_loop.c:38:8: note: use constraint modifier "w" 38 | "cmp %[i], %[len]\n" | ^~~~ | %w[i] thread_loop.c:41:38: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] 41 | : /* in */ [i] "r" (i), [len] "r" (len) | ^ thread_loop.c:38:14: note: use constraint modifier "w" 38 | "cmp %[i], %[len]\n" | ^~~~~~ | %w[len] Use the modifier "w" for 32-bit register access. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-6-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf test coresight: Dismiss clang warning for memcpy threadLeo Yan
clang-18.1.3 on Ubuntu 24.04.2 reports warning: memcpy_thread.c:30:1: warning: non-void function does not return a value in all control paths [-Wreturn-type] 30 | } | ^ Dismiss the warning with returning NULL from the thread function. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-5-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-10-06perf build: Disable thread safety analysis for perl headerLeo Yan
When build with perl5, it reports error: In file included from /usr/lib/perl5/5.42.0/x86_64-linux-thread-multi/CORE/perl.h:7933: /usr/lib/perl5/5.42.0/x86_64-linux-thread-multi/CORE/inline.h:298:5: error: mutex 'PL_env_mutex.lock' is not held on every path through here [-Werror,-Wthread-safety-analysis] 298 | ENV_UNLOCK; | ^ /usr/lib/perl5/5.42.0/x86_64-linux-thread-multi/CORE/perl.h:7091:31: note: expanded from macro 'ENV_UNLOCK' 7091 | # define ENV_UNLOCK PERL_REENTRANT_UNLOCK("env"... | ^ /usr/lib/perl5/5.42.0/x86_64-linux-thread-multi/CORE/perl.h:6465:7: note: expanded from macro 'PERL_REENTRANT_UNLOCK' 6465 | } STMT_END | ^ /usr/lib/perl5/5.42.0/x86_64-linux-thread-multi/CORE/perl.h:865:28: note: expanded from macro 'STMT_END' 865 | # define STMT_END while (0) | ^ The error is caused by perl header but not perf code, disable thread safety analysis if including the header. Though GCC does not support the thread safety analysis option, this negative warning flag is silently ignored by it. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20251006-perf_build_android_ndk-v3-4-4305590795b2@arm.com Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: llvm@lists.linux.dev Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>