summaryrefslogtreecommitdiff
path: root/tools/perf
AgeCommit message (Collapse)Author
2025-11-18perf arm_spe: Report extended memory operations in recordsLeo Yan
Extended memory operations include atomic (AT), acquire/release (AR), and exclusive (EXCL) operations. Save the relevant information in the records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report MTE allocation tag in recordLeo Yan
Save MTE tag info in memory record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Report register access in recordLeo Yan
Record register access info for load / store operations. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Introduce data processing macro for SVE operationsLeo Yan
Introduce the ARM_SPE_OP_DP (data processing) macro as associated information for SVE operations. For SVE register access, only ARM_SPE_OP_SVE is set; for SVE data processing, both ARM_SPE_OP_SVE and ARM_SPE_OP_DP are set together. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Consolidate operation typesLeo Yan
Consolidate operation types in a way: (a) Extract the second-level types into separate enums. (b) The second-level types for memory and SIMD operations are classified by modules. E.g., an operation may relate to general register, SIMD/FP, SVE, etc. (c) The associated information tells details. E.g., an operation is load or store, whether it is atomic operation, etc. Start the enum items for the second-level types from 8 to accommodate more entries within a 32-bit integer. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Remove unused operation typesLeo Yan
Remove unused SVE operation types. These operations will be reintroduced in subsequent refactoring, but with a different format. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Decode SME data processing packetLeo Yan
For SME data processing, decode its Effective vector length or Tile Size (ETS), and print out if a floating-point operation. After: . 00000000: 49 00 SME-OTHER ETS 1024 FP . 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18 . 0000000b: 9a 00 00 LAT 0 XLAT . 0000000e: 43 00 DATA-SOURCE 0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Decode ASE and FP fields in other operationLeo Yan
Add a check for other operation, which prevents any incorrectly classifying. Parse the ASE and FP fields. After: . 0000002f: 48 06 OTHER ASE FP INSN-OTHER . 00000031: b2 08 80 48 01 08 00 ff ff VA 0xffff000801488008 . 0000003a: 9a 00 00 LAT 0 XLAT . 0000003d: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Rename SPE_OP_PKT_IS_OTHER_SVE_OP macroLeo Yan
Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Decode GCS operationLeo Yan
Decode a load or store from a GCS operation and the associated "common" field. After: . 00000000: 49 44 LD GCS COMM . 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18 . 0000000b: 9a 00 00 LAT 0 XLAT . 0000000e: 43 00 DATA-SOURCE 0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Unify operation namingLeo Yan
Rename extended subclass and SVE/SME register access subclass, so that the naming can be consistent cross all sub classes. Add an log "SVE-SME-REG" for the SVE/SME register access, this is easier for parsing. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-18perf arm_spe: Fix memset subclass in operationLeo Yan
The operation subclass is extracted from bits [7..1] of the payload. Since bit [0] is not parsed, there is no chance to match the memset type (0x25). As a result, the memset payload is never parsed successfully. Instead of extracting a unified bit field, change to extract the specific bits for each operation subclass. Fixes: 34fb60400e32 ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf tool_pmu: More accurately set the cpus for tool eventsIan Rogers
The user and system time events can record on different CPUs, but for all other events a single CPU map of just CPU 0 makes sense. In parse-events detect a tool PMU and then pass the perf_event_attr so that the tool_pmu can return CPUs specific for the event. This avoids a CPU map of all online CPUs being used for events like duration_time. Avoiding this avoids the evlist CPUs containing CPUs for which duration_time just gives 0. Minimizing the evlist CPUs can remove unnecessary sched_setaffinity syscalls that delay metric calculations. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf stat: Reduce scope of walltime_nsecs_statsIan Rogers
walltime_nsecs_stats is no longer used for counter values, move into that stat_config where it controls certain things like noise measurement. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf stat: Reduce scope of ru_statsIan Rogers
The ru_stats are used to capture user and system time stats when a process exits. These are then applied to user and system time tool events if their reads fail due to the process terminating. Reduce the scope now the metric code no longer reads these values. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf stat-shadow: Read tool events directlyIan Rogers
When reading time values for metrics don't use the globals updated in builtin-stat, just read the events as regular events. The only exception is for time events where nanoseconds need converting to seconds as metrics assume time metrics are in seconds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf tool_pmu: Use old_count when computing count values for time eventsIan Rogers
When running in interval mode every third count of a time event isn't showing properly: ``` $ perf stat -e duration_time -a -I 1000 1.001082862 1,002,290,425 duration_time 2.004264262 1,003,183,516 duration_time 3.007381401 <not counted> duration_time 4.011160141 1,003,705,631 duration_time 5.014515385 1,003,290,110 duration_time 6.018539680 <not counted> duration_time 7.022065321 1,003,591,720 duration_time ``` The regression came in with a different fix, found through bisection, commit 68cb1567439f ("perf tool_pmu: Fix aggregation on duration_time"). The issue is caused by the enabled and running time of the event matching the old_count's and creating a delta of 0, which is indicative of an error. Fixes: 68cb1567439f ("perf tool_pmu: Fix aggregation on duration_time") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf pmu: perf_cpu_map__new_int to avoid parsing a stringIan Rogers
Prefer perf_cpu_map__new_int(0) to perf_cpu_map__new("0") as it avoids strings parsing. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-17perf stat: Display metric-only for 0 countersIan Rogers
0 counters may occur in hypervisor settings but metric-only output is always expected. This resolves an issue in the "perf stat STD output linter" test. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16perf test: Don't fail if user rdpmc returns 0 when disabledIan Rogers
In certain hypervisor set ups the value 0 may be returned but this is only erroneous if the user rdpmc isn't disabled. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16perf parse-events: Add debug logging to perf_eventIan Rogers
If verbose is enabled and parse_event is called, typically by tests, log failures. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16perf test: Be tolerant of missing json metric none valueIan Rogers
print_metric_only_json and print_metric_end in stat-display.c may create a metric value of "none" which fails validation as isfloat. Add a helper to properly validate metric numeric values. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-16perf sample: Fix the wrong format specifierliujing
In the file tools/perf/util/cs-etm.c, queue_nr is of type unsigned int and should be printed with %u. Signed-off-by: liujing <liujing@cmss.chinamobile.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-15perf script: Fix build by removing unused evsel_script()James Clark
The evsel_script() function is unused since the linked commit. Fix the build by removing it. Fixes the following compilation error: static inline struct evsel_script *evsel_script(struct evsel *evsel) ^ builtin-script.c:347:36: error: unused function 'evsel_script' [-Werror,-Wunused-function] Fixes: 3622990efaab ("perf script: Change metric format to use json metrics") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf vendor metrics s390: Avoid has_event(INSTRUCTIONS)Ian Rogers
The instructions event is now provided in json meaning the has_event test always succeeds. Switch to using non-legacy event names in the affected metrics. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/3e80f453-f015-4f4f-93d3-8df6bb6b3c95@linux.ibm.com/ Fixes: 0012e0fa221b ("perf jevents: Add legacy-hardware and legacy-cache json") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf auxtrace: Remove errno.h from auxtrace.h and fix transitive dependenciesIan Rogers
errno.h isn't used in auxtrace.h so remove it and fix build failures caused by transitive dependencies through auxtrace.h on errno.h. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf build: Remove NO_AUXTRACE build optionIan Rogers
The NO_AUXTRACE build option was used when the __get_cpuid feature test failed or if it was provided on the command line. The option no longer avoids a dependency on a library and so having the option is just adding complexity to the code base. Remove the option CONFIG_AUXTRACE from Build files and HAVE_AUXTRACE_SUPPORT by assuming it is always defined. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf build: Don't add NO_AUXTRACE if missing feature-get_cpuidIan Rogers
The intel-pt code dependent on __get_cpuid is no longer present so remove the feature test in the Makefile.config. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf intel-pt: Use the perf provided "cpuid.h"Ian Rogers
Rather than having a feature test and include of <cpuid.h> for the __get_cpuid function, use the cpuid function provided by tools/perf/arch/x86/util/cpuid.h. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-13perf libbfd: Ensure libbfd is initialized prior to useIan Rogers
Multiple threads may be creating and destroying BFD objects in situations like `perf top`. Without appropriate initialization crashes may occur during libbfd's cache management. BFD's locks require recursive mutexes, add support for these. Committer testing: This happens only when building with 'make BUILD_NONDISTRO=1' and having the binutils-devel package (or equivalent) installed, i.e. linking with binutils devel files, an opt-in perf build. Before: root@x1:~# perf top perf: Segmentation fault -------- backtrace -------- <SNIP multiple failed attempts at printing a backtrace> root@x1:~# After this patch it works as before. Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/ Fixes: 95931d9a594dd0b5 ("perf libbfd: Move libbfd functionality to its own file") Reported-by: Guilherme Amadio <amadio@gentoo.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c 96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") 61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-13perf test: Fix lock contention testRavi Bangoria
Couple of independent fixes: 1. Wire in SIGSEGV handler that terminates the test with a failure code. 2. Use "--lock-cgroup" instead of "-g"; "-g" was proposed but never merged. See commit 4d1792d0a2564caf ("perf lock contention: Add --lock-cgroup option") 3. Call cleanup() on every normal exit so trap_cleanup() doesn't mistake it for an unexpected signal and emit a false-negative "Unexpected signal in main" message. Before patch: # ./perf test -vv "lock contention" 85: kernel lock contention analysis test: --- start --- test child forked, pid 610711 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --lock-cgroup Unexpected signal in test_aggr_cgroup ---- end(0) ---- 85: kernel lock contention analysis test : Ok After patch: # ./perf test -vv "lock contention" 85: kernel lock contention analysis test: --- start --- test child forked, pid 602637 Testing perf lock record and perf lock contention Testing perf lock contention --use-bpf Testing perf lock record and perf lock contention at the same time Testing perf lock contention --threads Testing perf lock contention --lock-addr Testing perf lock contention --lock-cgroup Testing perf lock contention --type-filter (w/ spinlock) Testing perf lock contention --lock-filter (w/ tasklist_lock) Testing perf lock contention --callstack-filter (w/ unix_stream) [Skip] Could not find 'unix_stream' Testing perf lock contention --callstack-filter with task aggregation [Skip] Could not find 'unix_stream' Testing perf lock contention --cgroup-filter Testing perf lock contention CSV output ---- end(0) ---- 85: kernel lock contention analysis test : Ok Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Tycho Andersen <tycho@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13perf lock: Fix segfault due to missing kernel mapRavi Bangoria
Kernel maps are encoded in PERF_RECORD_MMAP2 samples but "perf lock report" and "perf lock contention" do not process MMAP2 samples. Because of that, machine->vmlinux_map stays NULL and any later access triggers a segmentation fault. Fix it by adding ->mmap2() callbacks. Fixes: 53b00ff358dc75b1 ("perf record: Make --buildid-mmap the default") Reported-by: Tycho Andersen (AMD) <tycho@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Tested-by: Tycho Andersen (AMD) <tycho@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13perf build: Don't fail fast path feature detection when binutils-devel is ↵Arnaldo Carvalho de Melo
not available This is one more remnant of the BUILD_NONDISTRO series to make building with binutils-devel opt-in due to license incompatibility. In this case just the references at link time were still in place, which make building the test-all.bin file fail, which wasn't detected before probably because the last test was done with binutils-devel available, doh. Now: $ rpm -q binutils-devel package binutils-devel is not installed $ file /tmp/build/perf-tools/feature/test-all.bin /tmp/build/perf-tools/feature/test-all.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4b5388a346b51f1b993f0b0dbd49f4570769b03c, for GNU/Linux 3.2.0, not stripped $ Fixes: 970ae86307718c34 ("perf build: The bfd features are opt-in, stop testing for them by default") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13perf header: Write bpf_prog (infos|btfs)_cnt to data fileThomas Falcon
With commit f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info"), the write_bpf_( prog_info() | btf() ) functions exit without writing anything if env->bpf_prog.(infos| btfs)_cnt is zero. process_bpf_( prog_info() | btf() ), however, still expect a "count" value to exist in the data file. If btf information is empty, for example, process_bpf_btf will read garbage or some other data as the number of btf nodes in the data file. As a result, the data file will not be processed correctly. Instead, write the count to the data file and exit if it is zero. Fixes: f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-11-13Merge tag 'v6.18-rc5' into objtool/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-11-12perf test: Add a perf event fallback testZide Chen
This adds test cases to verify the precise ip fallback logic: - If the system supports precise ip, for an event given with the maximum precision level, it should be able to decrease precise_ip to find a supported level. - The same fallback behavior should also work in more complex scenarios, such as event groups or when PEBS is involved Additional fallback tests, such as those covering missing feature cases, can be added in the future. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Reviewed-by: Ian Rogers <irogers!@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Align metric output without eventsNamhyung Kim
One of my concern in the perf stat output was the alignment in the metrics and shadow stats. I think it missed to calculate the basic output length using COUNTS_LEN and EVNAME_LEN but missed to add the unit length like "msec" and surround 2 spaces. I'm not sure why it's not printed below though. But anyway, now it shows correctly aligned metric output. $ perf stat true Performance counter stats for 'true': 859,772 task-clock # 0.395 CPUs utilized 0 context-switches # 0.000 /sec 0 cpu-migrations # 0.000 /sec 56 page-faults # 65.134 K/sec 1,075,022 instructions # 0.86 insn per cycle 1,255,911 cycles # 1.461 GHz 220,573 branches # 256.548 M/sec 7,381 branch-misses # 3.35% of all branches TopdownL1 # 19.2 % tma_retiring # 28.6 % tma_backend_bound # 9.5 % tma_bad_speculation # 42.6 % tma_frontend_bound 0.002174871 seconds time elapsed ^ | 0.002154000 seconds user | 0.000000000 seconds sys here Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf tool_pmu: Make core_wide and target_cpu json eventsIan Rogers
For the sake of better documentation, add core_wide and target_cpu to the tool.json. When the values of system_wide and user_requested_cpu_list are unknown, use the values from the global stat_config. Example output showing how '-a' modifies the values in `perf stat`: ``` $ perf stat -e core_wide,target_cpu true Performance counter stats for 'true': 0 core_wide 0 target_cpu 0.000993787 seconds time elapsed 0.001128000 seconds user 0.000000000 seconds sys $ perf stat -e core_wide,target_cpu -a true Performance counter stats for 'system wide': 1 core_wide 1 target_cpu 0.002271723 seconds time elapsed $ perf list ... tool: core_wide [1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool] ... target_cpu [1 if CPUs being analyzed,0 if threads/processes. Unit: tool] ... ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat csv: Update test expectations and eventsIan Rogers
Explicitly use a metric rather than implicitly expecting '-e instructions,cycles' to produce a metric. Use a metric with software events to make it more compatible. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat: Update test expectations and eventsIan Rogers
test_stat_record_report and test_stat_record_script used default output which triggers a bug when sending metrics. As this isn't relevant to the test switch to using named software events. Update the match in test_hybrid as the cycles event is now cpu-cycles to workaround potential ARM issues. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat: Update shadow test to use metricsIan Rogers
Previously '-e cycles,instructions' would implicitly create an IPC metric. This now has to be explicit with '-M insn_per_cycle'. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test metrics: Update all metrics for possibly failing default metricsIan Rogers
Default metrics may use unsupported events and be ignored. These metrics shouldn't cause metric testing to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat: Update std_output testing metric expectationsIan Rogers
Make the expectations match json metrics rather than the previous hard coded ones. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat: Ignore failures in Default[234] metricgroupsIan Rogers
The Default[234] metric groups may contain unsupported legacy events. Allow those metric groups to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf test stat+json: Improve metric-only testingIan Rogers
When testing metric-only, pass a metric to perf rather than expecting a hard coded metric value to be generated. Remove keys that were really metric-only units and instead don't expect metric only to have a matching json key as it encodes metrics as {"metric_name", "metric_value"}. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Remove "unit" workarounds for metric-onlyIan Rogers
Remove code that tested the "unit" as in KB/sec for certain hard coded metric values and did workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Sort default events/metricsIan Rogers
To improve the readability of default events/metrics, sort the evsels after the Default metric groups have be parsed. Before: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,087 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation # 25.8 % tma_frontend_bound # 34.5 % tma_backend_bound # 29.3 % tma_retiring 7,829 page-faults # nan faults/sec page_faults_per_second 880,144,270 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.10%) 1,693,081,235 cpu_core/cpu-cycles/ # nan GHz cycles_frequency TopdownL1 (cpu_atom) # 20.5 % tma_bad_speculation # 13.8 % tma_retiring (50.26%) # 34.6 % tma_frontend_bound (50.23%) 89,326,916 cpu_atom/branches/ # nan M/sec branch_frequency (60.19%) 538,123,088 cpu_core/branches/ # nan M/sec branch_frequency 1,368 cpu-migrations # nan migrations/sec migrations_per_second # 31.1 % tma_backend_bound (60.19%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 485,744,856 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (59.87%) 3,093,112,283 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 4,939,427 cpu_atom/branch-misses/ # 5.0 % branch_miss_rate (49.77%) 7,632,248 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.005084693 seconds time elapsed ``` After: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,165 context-switches # nan cs/sec cs_per_second 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 2,260 cpu-migrations # nan migrations/sec migrations_per_second 20,476 page-faults # nan faults/sec page_faults_per_second 17,052,357 cpu_core/branch-misses/ # 1.5 % branch_miss_rate 1,120,090,590 cpu_core/branches/ # nan M/sec branch_frequency 3,402,892,275 cpu_core/cpu-cycles/ # nan GHz cycles_frequency 6,129,236,701 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 6,159,523 cpu_atom/branch-misses/ # 3.1 % branch_miss_rate (49.86%) 222,158,812 cpu_atom/branches/ # nan M/sec branch_frequency (50.25%) 1,547,610,244 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.40%) 1,304,901,260 cpu_atom/instructions/ # 0.8 instructions insn_per_cycle (50.41%) TopdownL1 (cpu_core) # 13.7 % tma_bad_speculation # 23.5 % tma_frontend_bound # 33.3 % tma_backend_bound # 29.6 % tma_retiring TopdownL1 (cpu_atom) # 32.1 % tma_backend_bound (59.65%) # 30.1 % tma_frontend_bound (59.51%) # 22.3 % tma_bad_speculation # 15.5 % tma_retiring (59.53%) 1.008405429 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Fix default metricgroup display on hybridIan Rogers
The logic to skip output of a default metric line was firing on Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the need_full_name check as it is equivalent to the different PMU test in the cases we care about, merge the 'if's and flip the evsel of the PMU test. The 'if' is now basically saying, if the output matches the last printed output then skip the output. Before: ``` TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation # 24.3 % tma_frontend_bound TopdownL1 (cpu_core) # 33.9 % tma_backend_bound # 30.6 % tma_retiring # 42.2 % tma_backend_bound # 25.0 % tma_frontend_bound (49.81%) # 12.8 % tma_bad_speculation # 20.0 % tma_retiring (59.46%) ``` After: ``` TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation # 43.7 % tma_frontend_bound # 30.7 % tma_backend_bound # 17.2 % tma_retiring TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound # 37.6 % tma_frontend_bound (49.66%) # 18.0 % tma_bad_speculation # 12.6 % tma_retiring (59.58%) ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-11-11perf stat: Remove hard coded shadow metricsIan Rogers
Now that the metrics are encoded in common json the hard coded printing means the metrics are shown twice. Remove the hard coded version. This means that when specifying events, and those events correspond to a hard coded metric, the metric will no longer be displayed. The metric will be displayed if the metric is requested. Due to the adhoc printing in the previous approach it was often found frustrating, the new approach avoids this. The default perf stat output on an alderlake now looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 19,697 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation # 24.9 % tma_frontend_bound TopdownL1 (cpu_core) # 34.3 % tma_backend_bound # 30.1 % tma_retiring 6,593 page-faults # nan faults/sec page_faults_per_second 729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%) 1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 19.7 % tma_bad_speculation # 14.2 % tma_retiring (50.14%) # 37.3 % tma_frontend_bound (50.31%) 87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%) 512,046,956 cpu_core/branches/ # nan M/sec branch_frequency 1,111 cpu-migrations # nan migrations/sec migrations_per_second # 28.8 % tma_backend_bound (60.26%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%) 2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle 3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%) 7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.006621701 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>