Extended memory operations include atomic (AT), acquire/release (AR),
and exclusive (EXCL) operations. Save the relevant information
in the records.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Save MTE tag info in memory record.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Record register access info for load / store operations.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce the ARM_SPE_OP_DP (data processing) macro as associated
information for SVE operations. For SVE register access, only
ARM_SPE_OP_SVE is set; for SVE data processing, both ARM_SPE_OP_SVE and
ARM_SPE_OP_DP are set together.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Consolidate operation types as follows:
(a) Extract the second-level types into separate enums.
(b) Classify the second-level types for memory and SIMD operations by
module, e.g., whether an operation relates to general-purpose
registers, SIMD/FP, SVE, etc.
(c) Use the associated information for details, e.g., whether an
operation is a load or a store, whether it is atomic, etc.
Start the enum items for the second-level types from 8 to accommodate
more entries within a 32-bit integer.
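A minimal sketch of this layering in C, assuming plain bit-flag enums. The enum names, function names and exact bit positions are hypothetical and chosen only to illustrate the scheme; they are not copied from the perf source. The sketch also encodes the SVE rule from the ARM_SPE_OP_DP commit: register access sets only the SVE flag, while data processing sets both SVE and DP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First-level operation types live in the low bits (hypothetical values). */
enum op_type {
	OP_OTHER	= 1 << 0,
	OP_LDST		= 1 << 1,
	OP_BRANCH	= 1 << 2,
};

/* Second-level types start at bit 8: which module the operation uses. */
enum op_module {
	OP_GP_REG	= 1 << 8,	/* general-purpose register */
	OP_SIMD_FP	= 1 << 9,
	OP_SVE		= 1 << 10,
};

/* Associated information: details such as load/store or atomicity. */
enum op_info {
	OP_LD		= 1 << 16,
	OP_ST		= 1 << 17,
	OP_ATOMIC	= 1 << 18,
	OP_DP		= 1 << 19,	/* data processing */
};

/* All first-level types must fit below the first second-level bit. */
static_assert(OP_BRANCH < OP_GP_REG, "first-level types overlap bit 8");

/* SVE data processing sets OP_SVE and OP_DP; register access sets only OP_SVE. */
static bool is_sve_data_processing(uint32_t op)
{
	return (op & OP_SVE) && (op & OP_DP);
}

/* A load that is also atomic, regardless of which module it uses. */
static bool is_atomic_load(uint32_t op)
{
	return (op & OP_LDST) && (op & OP_LD) && (op & OP_ATOMIC);
}
```

Because each layer occupies its own bit range, one 32-bit value can carry the class, the module and the details together, and each question is a simple mask test.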
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove unused SVE operation types. These operations will be reintroduced
in subsequent refactoring, but with a different format.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
For SME data processing, decode its Effective vector length or Tile Size
(ETS), and print whether it is a floating-point operation.
After:
. 00000000: 49 00 SME-OTHER ETS 1024 FP
. 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18
. 0000000b: 9a 00 00 LAT 0 XLAT
. 0000000e: 43 00 DATA-SOURCE 0
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a check for the other operation class, which prevents incorrect
classification. Parse the ASE and FP fields.
After:
. 0000002f: 48 06 OTHER ASE FP INSN-OTHER
. 00000031: b2 08 80 48 01 08 00 ff ff VA 0xffff000801488008
. 0000003a: 9a 00 00 LAT 0 XLAT
. 0000003d: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Decode a load or store from a GCS operation and the associated "common"
field.
After:
. 00000000: 49 44 LD GCS COMM
. 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18
. 0000000b: 9a 00 00 LAT 0 XLAT
. 0000000e: 43 00 DATA-SOURCE 0
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rename the extended subclass and the SVE/SME register access subclass so
that naming is consistent across all subclasses.
Add a "SVE-SME-REG" log for SVE/SME register access, which is easier
to parse.
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The operation subclass is extracted from bits [7..1] of the payload.
Since bit [0] is not parsed, there is no chance to match the memset type
(0x25). As a result, the memset payload is never parsed successfully.
Instead of extracting a unified bit field, change to extract the
specific bits for each operation subclass.
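Here is a worked example of the bug. It assumes the unified field was obtained by masking bits [7..1] (0xfe). In that case 0x25 & 0xfe = 0x24, which can never equal 0x25. The struct and helper names below are hypothetical, and matching memset on all eight bits is an assumption about the per-subclass masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBCLASS_MEMSET	0x25	/* memset subclass value, from the commit text */

/* memset is distinguished by bit 0, which a bits-[7..1] field throws away. */
static_assert(SUBCLASS_MEMSET & 0x01, "memset subclass needs bit 0");

/* Buggy: one unified field for every subclass, masking bits [7..1]. */
static uint8_t unified_subclass(uint8_t payload)
{
	return payload & 0xfe;	/* 0x25 & 0xfe == 0x24, never 0x25 */
}

/* Fixed: each subclass is matched with its own mask (hypothetical layout). */
struct subclass_match {
	uint8_t mask;
	uint8_t value;
};

static const struct subclass_match memset_match = { 0xff, SUBCLASS_MEMSET };

static bool subclass_is(uint8_t payload, struct subclass_match m)
{
	return (payload & m.mask) == m.value;
}
```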
Fixes: 34fb60400e32 ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store")
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The user and system time events can record on different CPUs, but for
all other events a single CPU map of just CPU 0 makes sense. In
parse-events, detect a tool PMU and then pass the perf_event_attr so
that the tool_pmu can return CPUs specific to the event. This avoids
using a CPU map of all online CPUs for events like duration_time, and
so keeps the evlist CPUs from containing CPUs for which duration_time
just gives 0. Minimizing the evlist CPUs can remove unnecessary
sched_setaffinity syscalls that delay metric calculations.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
walltime_nsecs_stats is no longer used for counter values, so move it
into stat_config, where it controls certain things like noise
measurement.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The ru_stats are used to capture user and system time stats when a
process exits. These are then applied to user and system time tool
events if their reads fail due to the process terminating. Reduce their
scope now that the metric code no longer reads these values.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When reading time values for metrics, don't use the globals updated in
builtin-stat; just read the events as regular events. The only
exception is time events, whose nanosecond values need converting to
seconds because metrics assume time values are in seconds.
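The following sketch shows that single exception. It assumes time-event counts arrive in nanoseconds. metric_value_for() and NSEC_PER_SEC are illustrative names, not perf's actual helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: the value a metric sees for an event count. */
static double metric_value_for(uint64_t count, bool is_time_event)
{
	/* Time events count nanoseconds; metrics expect seconds. */
	if (is_time_event)
		return (double)count / NSEC_PER_SEC;
	/* Every other event is used as a regular count. */
	return (double)count;
}
```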
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When running in interval mode, every third count of a time event isn't
showing properly:
```
$ perf stat -e duration_time -a -I 1000
1.001082862 1,002,290,425 duration_time
2.004264262 1,003,183,516 duration_time
3.007381401 <not counted> duration_time
4.011160141 1,003,705,631 duration_time
5.014515385 1,003,290,110 duration_time
6.018539680 <not counted> duration_time
7.022065321 1,003,591,720 duration_time
```
The regression came in with a different fix, found through bisection,
commit 68cb1567439f ("perf tool_pmu: Fix aggregation on
duration_time"). The issue is caused by the enabled and running time
of the event matching the old_count's and creating a delta of 0, which
is indicative of an error.
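A simplified sketch of how a zero delta turns into `<not counted>`. It assumes each reading carries a value plus enabled and running times. As a simplification, it also assumes a zero running-time delta is what gets displayed as not counted. The struct and helpers are hypothetical, not perf's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of one counter read: value, enabled and running time. */
struct counts {
	uint64_t val;
	uint64_t ena;
	uint64_t run;
};

/* Interval mode reports the difference between this read and the previous one. */
static struct counts counts_delta(struct counts now, struct counts old)
{
	struct counts d = {
		.val = now.val - old.val,
		.ena = now.ena - old.ena,
		.run = now.run - old.run,
	};

	return d;
}

/* A zero running-time delta looks like an error and is shown as <not counted>. */
static bool delta_not_counted(struct counts d)
{
	return d.run == 0;
}
```

If the enabled and running times in a read match the old_count's, the value may still have advanced, but the zero running-time delta makes the interval look like an error.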
Fixes: 68cb1567439f ("perf tool_pmu: Fix aggregation on duration_time")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Prefer perf_cpu_map__new_int(0) to perf_cpu_map__new("0") as it avoids
string parsing.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Counter values of 0 may occur in hypervisor settings, but metric-only
output is always expected. This resolves an issue in the "perf stat STD
output linter" test.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Falcon <thomas.falcon@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In certain hypervisor setups, the value 0 may be returned, but this is
only erroneous if user rdpmc isn't disabled.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If verbose is enabled and parse_event is called, typically by tests,
log failures.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
print_metric_only_json and print_metric_end in stat-display.c may
create a metric value of "none", which fails the isfloat validation. Add
a helper to properly validate metric numeric values.
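A C sketch of such a validator. The commit does not show the helper, so the names here are hypothetical. Accepting the literal "none" alongside numeric strings is an assumption about what "properly validate" means in this case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the whole string parses as a floating-point number. */
static bool isfloat(const char *s)
{
	char *end;

	if (*s == '\0')
		return false;
	(void)strtod(s, &end);
	return *end == '\0';
}

/* Hypothetical: a metric value is numeric, or the placeholder "none". */
static bool is_metric_value(const char *s)
{
	return isfloat(s) || strcmp(s, "none") == 0;
}
```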
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In the file tools/perf/util/cs-etm.c, queue_nr is of type unsigned
int and should be printed with %u.
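A small illustration of why the specifier matters. On platforms with a 32-bit unsigned int, %u prints UINT_MAX as 4294967295, whereas passing it to %d would typically print -1. format_queue_nr() is a hypothetical helper, not the cs-etm code.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: print an unsigned int queue number with the matching %u. */
static int format_queue_nr(char *buf, size_t len, unsigned int queue_nr)
{
	return snprintf(buf, len, "queue_nr: %u", queue_nr);
}
```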
Signed-off-by: liujing <liujing@cmss.chinamobile.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The evsel_script() function is unused since the linked commit. Fix the
build by removing it.
Fixes the following compilation error:
builtin-script.c:347:36: error: unused function 'evsel_script' [-Werror,-Wunused-function]
static inline struct evsel_script *evsel_script(struct evsel *evsel)
                                   ^
Fixes: 3622990efaab ("perf script: Change metric format to use json metrics")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The instructions event is now provided in json, meaning the has_event
test always succeeds. Switch to using non-legacy event names in the
affected metrics.
Reported-by: Thomas Richter <tmricht@linux.ibm.com>
Closes: https://lore.kernel.org/linux-perf-users/3e80f453-f015-4f4f-93d3-8df6bb6b3c95@linux.ibm.com/
Fixes: 0012e0fa221b ("perf jevents: Add legacy-hardware and legacy-cache json")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
errno.h isn't used in auxtrace.h so remove it and fix build failures
caused by transitive dependencies through auxtrace.h on errno.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The NO_AUXTRACE build option was used when the __get_cpuid feature
test failed or if it was provided on the command line. The option no
longer avoids a dependency on a library, so having the option just
adds complexity to the code base. Remove the option, dropping
CONFIG_AUXTRACE from the Build files and HAVE_AUXTRACE_SUPPORT by
assuming it is always defined.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The intel-pt code dependent on __get_cpuid is no longer present so
remove the feature test in the Makefile.config.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than having a feature test and include of <cpuid.h> for the
__get_cpuid function, use the cpuid function provided by
tools/perf/arch/x86/util/cpuid.h.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Multiple threads may be creating and destroying BFD objects in
situations like `perf top`.
Without appropriate initialization, crashes may occur during libbfd's
cache management.
BFD's locks require recursive mutexes, so add support for these.
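A minimal POSIX sketch of creating a recursive mutex, which the owning thread can lock again without deadlocking. The commit does not show how perf wires such mutexes into libbfd, and init_recursive_mutex() is a hypothetical name.

```c
#define _GNU_SOURCE	/* expose PTHREAD_MUTEX_RECURSIVE under strict -std modes */
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: initialize a mutex the same thread may lock repeatedly. */
static int init_recursive_mutex(pthread_mutex_t *mtx)
{
	pthread_mutexattr_t attr;
	int err;

	err = pthread_mutexattr_init(&attr);
	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	if (!err)
		err = pthread_mutex_init(mtx, &attr);
	/* The attribute object is no longer needed once the mutex exists. */
	pthread_mutexattr_destroy(&attr);
	return err;
}
```

Each lock of a recursive mutex must be matched by an unlock before another thread can acquire it.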
Committer testing:
This happens only when building with 'make BUILD_NONDISTRO=1' and having
the binutils-devel package (or equivalent) installed, i.e. linking with
binutils devel files, an opt-in perf build.
Before:
root@x1:~# perf top
perf: Segmentation fault
-------- backtrace --------
<SNIP multiple failed attempts at printing a backtrace>
root@x1:~#
After this patch it works as before.
Closes: https://lore.kernel.org/lkml/aQt66zhfxSA80xwt@gentoo.org/
Fixes: 95931d9a594dd0b5 ("perf libbfd: Move libbfd functionality to its own file")
Reported-by: Guilherme Amadio <amadio@gentoo.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.18-rc6).
No conflicts, adjacent changes in:
drivers/net/phy/micrel.c
96a9178a29a6 ("net: phy: micrel: lan8814 fix reset of the QSGMII interface")
61b7ade9ba8c ("net: phy: micrel: Add support for non PTP SKUs for lan8814")
and a trivial one in tools/testing/selftests/drivers/net/Makefile.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A couple of independent fixes:
1. Wire in SIGSEGV handler that terminates the test with a failure code.
2. Use "--lock-cgroup" instead of "-g"; "-g" was proposed but never
merged. See commit 4d1792d0a2564caf ("perf lock contention: Add
--lock-cgroup option")
3. Call cleanup() on every normal exit so trap_cleanup() doesn't mistake
it for an unexpected signal and emit a false-negative "Unexpected
signal in main" message.
Before patch:
# ./perf test -vv "lock contention"
85: kernel lock contention analysis test:
--- start ---
test child forked, pid 610711
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --lock-cgroup
Unexpected signal in test_aggr_cgroup
---- end(0) ----
85: kernel lock contention analysis test : Ok
After patch:
# ./perf test -vv "lock contention"
85: kernel lock contention analysis test:
--- start ---
test child forked, pid 602637
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --lock-cgroup
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
[Skip] Could not find 'unix_stream'
Testing perf lock contention --callstack-filter with task aggregation
[Skip] Could not find 'unix_stream'
Testing perf lock contention --cgroup-filter
Testing perf lock contention CSV output
---- end(0) ----
85: kernel lock contention analysis test : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Tycho Andersen <tycho@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Kernel maps are encoded in PERF_RECORD_MMAP2 samples but "perf lock
report" and "perf lock contention" do not process MMAP2 samples.
Because of that, machine->vmlinux_map stays NULL and any later access
triggers a segmentation fault.
Fix it by adding ->mmap2() callbacks.
Fixes: 53b00ff358dc75b1 ("perf record: Make --buildid-mmap the default")
Reported-by: Tycho Andersen (AMD) <tycho@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Tested-by: Tycho Andersen (AMD) <tycho@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
not available
This is one more remnant of the BUILD_NONDISTRO series to make building
with binutils-devel opt-in due to license incompatibility.
In this case, only the references at link time were still in place,
which made building the test-all.bin file fail. This wasn't detected
before, probably because the last test was done with binutils-devel
available, doh.
Now:
$ rpm -q binutils-devel
package binutils-devel is not installed
$ file /tmp/build/perf-tools/feature/test-all.bin
/tmp/build/perf-tools/feature/test-all.bin: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2,
BuildID[sha1]=4b5388a346b51f1b993f0b0dbd49f4570769b03c, for GNU/Linux 3.2.0, not stripped
$
Fixes: 970ae86307718c34 ("perf build: The bfd features are opt-in, stop testing for them by default")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
With commit f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF
info"), the write_bpf_prog_info() and write_bpf_btf() functions exit
without writing anything if env->bpf_prog.infos_cnt or
env->bpf_prog.btfs_cnt, respectively, is zero.
process_bpf_prog_info() and process_bpf_btf(), however, still expect a
"count" value to exist in the data file. If the btf information is
empty, for example, process_bpf_btf() will read garbage or some other
data as the number of btf nodes in the data file. As a result, the data
file will not be processed correctly.
Instead, write the count to the data file and exit if it is zero.
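A sketch of the fixed writer/reader contract. It uses a hypothetical in-memory section rather than perf's real header format. The writer always emits the count, even when it is zero, and only then returns early, so a reader that unconditionally reads a count first stays in sync.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one section of the data file. */
struct section {
	unsigned char buf[64];
	size_t len;
};

static void put_u32(struct section *s, uint32_t v)
{
	assert(s->len + sizeof(v) <= sizeof(s->buf));
	memcpy(s->buf + s->len, &v, sizeof(v));
	s->len += sizeof(v);
}

/* Writer: always emit the count, then return early if there is nothing else. */
static void write_entries(struct section *s, const uint32_t *entries, uint32_t count)
{
	put_u32(s, count);
	if (count == 0)
		return;
	for (uint32_t i = 0; i < count; i++)
		put_u32(s, entries[i]);
}

/* Reader: unconditionally expects a count at the start of the section. */
static uint32_t read_count(const struct section *s)
{
	uint32_t count;

	assert(s->len >= sizeof(count));
	memcpy(&count, s->buf, sizeof(count));
	return count;
}
```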
Fixes: f0d0f978f3f5830a ("perf header: Don't write empty BPF/BTF info")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This adds test cases to verify the precise ip fallback logic:
- If the system supports precise ip, for an event given with the maximum
precision level, it should be able to decrease precise_ip to find a
supported level.
- The same fallback behavior should also work in more complex scenarios,
such as event groups or when PEBS is involved.
Additional fallback tests, such as those covering missing feature cases,
can be added in the future.
Suggested-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
One of my concerns in the perf stat output was the alignment of the
metrics and shadow stats. I think it calculated the basic output
length using COUNTS_LEN and EVNAME_LEN but missed adding the unit
length, like "msec", and the 2 surrounding spaces. I'm not sure why
it's not printed below, though.
But anyway, it now shows correctly aligned metric output.
$ perf stat true
Performance counter stats for 'true':
859,772 task-clock # 0.395 CPUs utilized
0 context-switches # 0.000 /sec
0 cpu-migrations # 0.000 /sec
56 page-faults # 65.134 K/sec
1,075,022 instructions # 0.86 insn per cycle
1,255,911 cycles # 1.461 GHz
220,573 branches # 256.548 M/sec
7,381 branch-misses # 3.35% of all branches
TopdownL1 # 19.2 % tma_retiring
# 28.6 % tma_backend_bound
# 9.5 % tma_bad_speculation
# 42.6 % tma_frontend_bound
0.002174871 seconds time elapsed ^
|
0.002154000 seconds user |
0.000000000 seconds sys here
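A sketch of the corrected length computation. The widths 18 and 32 for COUNTS_LEN and EVNAME_LEN are assumptions, and metric_prefix_len() and format_metric_continuation() are hypothetical helpers. The point is only that the unit length and its 2 surrounding spaces are added to the prefix width, so continuation lines line up with the first metric.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed column widths; perf's real values may differ. */
#define COUNTS_LEN	18
#define EVNAME_LEN	32

/*
 * Width of everything printed before the "#" metric column: the counts
 * field, the event name field, the unit (e.g. "msec") and 2 spaces around it.
 */
static int metric_prefix_len(const char *unit)
{
	return COUNTS_LEN + EVNAME_LEN + (int)strlen(unit) + 2;
}

/* Hypothetical: print a metric continuation line aligned with the first one. */
static int format_metric_continuation(char *buf, size_t len,
				      const char *unit, const char *metric)
{
	return snprintf(buf, len, "%*s# %s", metric_prefix_len(unit), "", metric);
}
```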
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
For the sake of better documentation, add core_wide and target_cpu to
the tool.json. When the values of system_wide and
user_requested_cpu_list are unknown, use the values from the global
stat_config.
Example output showing how '-a' modifies the values in `perf stat`:
```
$ perf stat -e core_wide,target_cpu true
Performance counter stats for 'true':
0 core_wide
0 target_cpu
0.000993787 seconds time elapsed
0.001128000 seconds user
0.000000000 seconds sys
$ perf stat -e core_wide,target_cpu -a true
Performance counter stats for 'system wide':
1 core_wide
1 target_cpu
0.002271723 seconds time elapsed
$ perf list
...
tool:
core_wide
[1 if not SMT,if SMT are events being gathered on all SMT threads 1 otherwise 0. Unit: tool]
...
target_cpu
[1 if CPUs being analyzed,0 if threads/processes. Unit: tool]
...
```
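A sketch of the two definitions from the perf list descriptions above, written as plain boolean functions with hypothetical parameters. With SMT enabled (an assumption about the machine in the example), it reproduces the 0/0 result without -a and the 1/1 result with -a.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * core_wide: 1 if not SMT; with SMT, 1 only if events are gathered on all
 * SMT threads (hypothetically: -a, or a CPU list covering every sibling).
 */
static bool core_wide(bool smt_on, bool system_wide, bool cpu_list_covers_all_smt_threads)
{
	if (!smt_on)
		return true;
	return system_wide || cpu_list_covers_all_smt_threads;
}

/* target_cpu: 1 if CPUs are being analyzed, 0 if threads/processes. */
static bool target_cpu(bool system_wide, bool user_requested_cpu_list)
{
	return system_wide || user_requested_cpu_list;
}
```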
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Explicitly use a metric rather than implicitly expecting '-e
instructions,cycles' to produce a metric. Use a metric with software
events to make it more compatible.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
test_stat_record_report and test_stat_record_script used default
output, which triggers a bug when sending metrics. As this isn't
relevant to the test, switch to using named software events.
Update the match in test_hybrid as the cycles event is now cpu-cycles
to work around potential ARM issues.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Previously '-e cycles,instructions' would implicitly create an IPC
metric. This now has to be explicit with '-M insn_per_cycle'.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Default metrics may use unsupported events and be ignored. These
metrics shouldn't cause metric testing to fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Make the expectations match json metrics rather than the previous hard
coded ones.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The Default[234] metric groups may contain unsupported legacy
events. Allow those metric groups to fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When testing metric-only, pass a metric to perf rather than expecting
a hard-coded metric value to be generated.
Remove keys that were really metric-only units, and instead don't
expect metric-only output to have a matching json key, as it encodes
metrics as {"metric_name", "metric_value"}.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove code that tested the "unit" as in KB/sec for certain hard coded
metric values and did workarounds.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To improve the readability of default events/metrics, sort the evsels
after the Default metric groups have been parsed.
Before:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
22,087 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation
# 25.8 % tma_frontend_bound
# 34.5 % tma_backend_bound
# 29.3 % tma_retiring
7,829 page-faults # nan faults/sec page_faults_per_second
880,144,270 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.10%)
1,693,081,235 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
TopdownL1 (cpu_atom) # 20.5 % tma_bad_speculation
# 13.8 % tma_retiring (50.26%)
# 34.6 % tma_frontend_bound (50.23%)
89,326,916 cpu_atom/branches/ # nan M/sec branch_frequency (60.19%)
538,123,088 cpu_core/branches/ # nan M/sec branch_frequency
1,368 cpu-migrations # nan migrations/sec migrations_per_second
# 31.1 % tma_backend_bound (60.19%)
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
485,744,856 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (59.87%)
3,093,112,283 cpu_core/instructions/ # 1.8 instructions insn_per_cycle
4,939,427 cpu_atom/branch-misses/ # 5.0 % branch_miss_rate (49.77%)
7,632,248 cpu_core/branch-misses/ # 1.4 % branch_miss_rate
1.005084693 seconds time elapsed
```
After:
```
$ perf stat -a sleep 1
Performance counter stats for 'system wide':
22,165 context-switches # nan cs/sec cs_per_second
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
2,260 cpu-migrations # nan migrations/sec migrations_per_second
20,476 page-faults # nan faults/sec page_faults_per_second
17,052,357 cpu_core/branch-misses/ # 1.5 % branch_miss_rate
1,120,090,590 cpu_core/branches/ # nan M/sec branch_frequency
3,402,892,275 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
6,129,236,701 cpu_core/instructions/ # 1.8 instructions insn_per_cycle
6,159,523 cpu_atom/branch-misses/ # 3.1 % branch_miss_rate (49.86%)
222,158,812 cpu_atom/branches/ # nan M/sec branch_frequency (50.25%)
1,547,610,244 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.40%)
1,304,901,260 cpu_atom/instructions/ # 0.8 instructions insn_per_cycle (50.41%)
TopdownL1 (cpu_core) # 13.7 % tma_bad_speculation
# 23.5 % tma_frontend_bound
# 33.3 % tma_backend_bound
# 29.6 % tma_retiring
TopdownL1 (cpu_atom) # 32.1 % tma_backend_bound (59.65%)
# 30.1 % tma_frontend_bound (59.51%)
# 22.3 % tma_bad_speculation
# 15.5 % tma_retiring (59.53%)
1.008405429 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The logic to skip output of a default metric line was firing on
Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the
need_full_name check as it is equivalent to the different PMU test in
the cases we care about, merge the 'if's and flip the evsel of the PMU
test. The 'if' is now basically saying, if the output matches the last
printed output then skip the output.
Before:
```
TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation
# 24.3 % tma_frontend_bound
TopdownL1 (cpu_core) # 33.9 % tma_backend_bound
# 30.6 % tma_retiring
# 42.2 % tma_backend_bound
# 25.0 % tma_frontend_bound (49.81%)
# 12.8 % tma_bad_speculation
# 20.0 % tma_retiring (59.46%)
```
After:
```
TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation
# 43.7 % tma_frontend_bound
# 30.7 % tma_backend_bound
# 17.2 % tma_retiring
TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound
# 37.6 % tma_frontend_bound (49.66%)
# 18.0 % tma_bad_speculation
# 12.6 % tma_retiring (59.58%)
```
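A sketch of the simplified condition described in this commit. It keeps a copy of the last printed header and skips a header only when it is identical. The fixed-size buffer and should_print_header() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the last printed metric-group header. */
static char last_printed[64];

/*
 * Print a header only when it differs from the previous one, so
 * "TopdownL1 (cpu_core)" followed by "TopdownL1 (cpu_atom)" prints both.
 */
static bool should_print_header(const char *header)
{
	if (strcmp(header, last_printed) == 0)
		return false;
	snprintf(last_printed, sizeof(last_printed), "%s", header);
	return true;
}
```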
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that the metrics are encoded in common json, the hard-coded
printing means the metrics are shown twice. Remove the hard-coded
version.
This means that when specifying events, and those events correspond to
a hard-coded metric, the metric will no longer be displayed. The
metric will be displayed if the metric is requested. The ad hoc
printing in the previous approach was often found frustrating; the
new approach avoids this.
The default perf stat output on an alderlake now looks like:
```
$ perf stat -a -- sleep 1
Performance counter stats for 'system wide':
19,697 context-switches # nan cs/sec cs_per_second
TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation
# 24.9 % tma_frontend_bound
TopdownL1 (cpu_core) # 34.3 % tma_backend_bound
# 30.1 % tma_retiring
6,593 page-faults # nan faults/sec page_faults_per_second
729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%)
1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency
# 19.7 % tma_bad_speculation
# 14.2 % tma_retiring (50.14%)
# 37.3 % tma_frontend_bound (50.31%)
87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%)
512,046,956 cpu_core/branches/ # nan M/sec branch_frequency
1,111 cpu-migrations # nan migrations/sec migrations_per_second
# 28.8 % tma_backend_bound (60.26%)
0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized
392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%)
2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle
3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%)
7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate
1.006621701 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|