<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/perf, branch v6.4.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.4.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.4.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-07-19T14:35:14Z</updated>
<entry>
<title>drivers/perf: apple_m1: Force 63bit counters for M2 CPUs</title>
<updated>2023-07-19T14:35:14Z</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2023-05-28T08:02:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dafc70ae6beab16e21d89666fff4d9c353fcaa15'/>
<id>urn:sha1:dafc70ae6beab16e21d89666fff4d9c353fcaa15</id>
<content type='text'>
[ Upstream commit 8be3593b9efa8903d2ee7bb9cdf57a8e56c66f36 ]

Sidharth reports that on M2, the PMU never generates any interrupt
when using 'perf record', which is a annoying as you get no sample.
I'm temped to say "no sample, no problem", but others may have
a different opinion.

Upon investigation, it appears that the counters on M2 are
significantly different from the ones on M1, as they count on
64 bits instead of 48. Which of course, in the fine M1 tradition,
means that we can only use 63 bits, as the top bit is used to signal
the interrupt...

This results in having to introduce yet another flag to indicate yet
another odd counter width. Who knows what the next crazy implementation
will do...

With this, perf can work out the correct offset, and 'perf record'
works as intended.

Tested on M2 and M2-Pro CPUs.

Cc: Janne Grunau &lt;j@jannau.net&gt;
Cc: Hector Martin &lt;marcan@marcan.st&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support")
Reported-by: Sidharth Kshatriya &lt;sid.kshatriya@gmail.com&gt;
Tested-by: Sidharth Kshatriya &lt;sid.kshatriya@gmail.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Link: https://lore.kernel.org/r/20230528080205.288446-1-maz@kernel.org
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: perf: Abstract system register accesses away</title>
<updated>2023-03-27T13:01:18Z</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2023-03-17T19:50:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=df29ddf4f04b00cf60669686dc7f534c3ed7c433'/>
<id>urn:sha1:df29ddf4f04b00cf60669686dc7f534c3ed7c433</id>
<content type='text'>
As we want to enable 32bit support, we need to distanciate the
PMUv3 driver from the AArch64 system register names.

This patch moves all system register accesses to an architecture
specific include file, allowing the 32bit counterpart to be
slotted in at a later time.

Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Co-developed-by: Zaid Al-Bassam &lt;zalbassam@google.com&gt;
Signed-off-by: Zaid Al-Bassam &lt;zalbassam@google.com&gt;
Tested-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Link: https://lore.kernel.org/r/20230317195027.3746949-3-zalbassam@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: perf: Move PMUv3 driver to drivers/perf</title>
<updated>2023-03-27T13:01:18Z</updated>
<author>
<name>Marc Zyngier</name>
<email>marc.zyngier@arm.com</email>
</author>
<published>2023-03-17T19:50:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7755cec63adeecea3cbbf4032047812c37cf7cc3'/>
<id>urn:sha1:7755cec63adeecea3cbbf4032047812c37cf7cc3</id>
<content type='text'>
Having the ARM PMUv3 driver sitting in arch/arm64/kernel is getting
in the way of being able to use perf on ARMv8 cores running a 32bit
kernel, such as 32bit KVM guests.

This patch moves it into drivers/perf/arm_pmuv3.c, with an include
file in include/linux/perf/arm_pmuv3.h. The only thing left in
arch/arm64 is some mundane perf stuff.

Signed-off-by: Marc Zyngier &lt;marc.zyngier@arm.com&gt;
Signed-off-by: Zaid Al-Bassam &lt;zalbassam@google.com&gt;
Tested-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt;
Link: https://lore.kernel.org/r/20230317195027.3746949-2-zalbassam@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2023-02-25T19:30:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-02-25T19:30:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49d575926890e6ada930bf6f06d62b2fde8fce95'/>
<id>urn:sha1:49d575926890e6ada930bf6f06d62b2fde8fce95</id>
<content type='text'>
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Provide a virtual cache topology to the guest to avoid
     inconsistencies with migration on heterogenous systems. Non secure
     software has no practical need to traverse the caches by set/way in
     the first place

   - Add support for taking stage-2 access faults in parallel. This was
     an accidental omission in the original parallel faults
     implementation, but should provide a marginal improvement to
     machines w/o FEAT_HAFDBS (such as hardware from the fruit company)

   - A preamble to adding support for nested virtualization to KVM,
     including vEL2 register state, rudimentary nested exception
     handling and masking unsupported features for nested guests

   - Fixes to the PSCI relay that avoid an unexpected host SVE trap when
     resuming a CPU when running pKVM

   - VGIC maintenance interrupt support for the AIC

   - Improvements to the arch timer emulation, primarily aimed at
     reducing the trap overhead of running nested

   - Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
     interest of CI systems

   - Avoid VM-wide stop-the-world operations when a vCPU accesses its
     own redistributor

   - Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
     exceptions in the host

   - Aesthetic and comment/kerneldoc fixes

   - Drop the vestiges of the old Columbia mailing list and add [Oliver]
     as co-maintainer

  RISC-V:

   - Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE

   - Correctly place the guest in S-mode after redirecting a trap to the
     guest

   - Redirect illegal instruction traps to guest

   - SBI PMU support for guest

  s390:

   - Sort out confusion between virtual and physical addresses, which
     currently are the same on s390

   - A new ioctl that performs cmpxchg on guest memory

   - A few fixes

  x86:

   - Change tdp_mmu to a read-only parameter

   - Separate TDP and shadow MMU page fault paths

   - Enable Hyper-V invariant TSC control

   - Fix a variety of APICv and AVIC bugs, some of them real-world, some
     of them affecting architecurally legal but unlikely to happen in
     practice

   - Mark APIC timer as expired if its in one-shot mode and the count
     underflows while the vCPU task was being migrated

   - Advertise support for Intel's new fast REP string features

   - Fix a double-shootdown issue in the emergency reboot code

   - Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
     SVM similar treatment to VMX

   - Update Xen's TSC info CPUID sub-leaves as appropriate

   - Add support for Hyper-V's extended hypercalls, where "support" at
     this point is just forwarding the hypercalls to userspace

   - Clean up the kvm-&gt;lock vs. kvm-&gt;srcu sequences when updating the
     PMU and MSR filters

   - One-off fixes and cleanups

   - Fix and cleanup the range-based TLB flushing code, used when KVM is
     running on Hyper-V

   - Add support for filtering PMU events using a mask. If userspace
     wants to restrict heavily what events the guest can use, it can now
     do so without needing an absurd number of filter entries

   - Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
     support is disabled

   - Add PEBS support for Intel Sapphire Rapids

   - Fix a mostly benign overflow bug in SEV's
     send|receive_update_data()

   - Move several SVM-specific flags into vcpu_svm

  x86 Intel:

   - Handle NMI VM-Exits before leaving the noinstr region

   - A few trivial cleanups in the VM-Enter flows

   - Stop enabling VMFUNC for L1 purely to document that KVM doesn't
     support EPTP switching (or any other VM function) for L1

   - Fix a crash when using eVMCS's enlighted MSR bitmaps

  Generic:

   - Clean up the hardware enable and initialization flow, which was
     scattered around multiple arch-specific hooks. Instead, just let
     the arch code call into generic code. Both x86 and ARM should
     benefit from not having to fight common KVM code's notion of how to
     do initialization

   - Account allocations in generic kvm_arch_alloc_vm()

   - Fix a memory leak if coalesced MMIO unregistration fails

  selftests:

   - On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
     emit the correct hypercall instruction instead of relying on KVM to
     patch in VMMCALL

   - Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
  KVM: SVM: hyper-v: placate modpost section mismatch error
  KVM: x86/mmu: Make tdp_mmu_allowed static
  KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Handle SMCs taken from virtual EL2
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: Use the S2 MMU context to iterate over S2 table
  ...
</content>
</entry>
<entry>
<title>arm_pmu: fix event CPU filtering</title>
<updated>2023-02-16T21:23:52Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2023-02-16T14:12:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=61d03862734360aad470019f160d484403a3923e'/>
<id>urn:sha1:61d03862734360aad470019f160d484403a3923e</id>
<content type='text'>
Janne reports that perf has been broken on Apple M1 as of commit:

  bd27568117664b8b ("perf: Rewrite core context handling")

That commit replaced the pmu::filter_match() callback with
pmu::filter(), whose return value has the opposite polarity, with true
implying events should be ignored rather than scheduled. While an
attempt was made to update the logic in armv8pmu_filter() and
armpmu_filter() accordingly, the return value remains inverted in a
couple of cases:

* If the arm_pmu does not have an arm_pmu::filter() callback,
  armpmu_filter() will always return whether the CPU is supported rather
  than whether the CPU is not supported.

  As a result, the perf core will not schedule events on supported CPUs,
  resulting in a loss of events. Additionally, the perf core will
  attempt to schedule events on unsupported CPUs, but this will be
  rejected by armpmu_add(), which may result in a loss of events from
  other PMUs on those unsupported CPUs.

* If the arm_pmu does have an arm_pmu::filter() callback, and
  armpmu_filter() is called on a CPU which is not supported by the
  arm_pmu, armpmu_filter() will return false rather than true.

  As a result, the perf core will attempt to schedule events on
  unsupported CPUs, but this will be rejected by armpmu_add(), which may
  result in a loss of events from other PMUs on those unsupported CPUs.

This means a loss of events can be seen with any arm_pmu driver, but
with the ARMv8 PMUv3 driver (which is the only arm_pmu driver with an
arm_pmu::filter() callback) the event loss will be more limited and may
go unnoticed, which is how this issue evaded testing so far.

Fix the CPU filtering by performing this consistently in
armpmu_filter(), and remove the redundant arm_pmu::filter() callback and
armv8pmu_filter() implementation.

Commit bd2756811766 also silently removed the CHAIN event filtering from
armv8pmu_filter(), which will be addressed by a separate patch without
using the filter callback.

Fixes: bd2756811766 ("perf: Rewrite core context handling")
Reported-by: Janne Grunau &lt;j@jannau.net&gt;
Link: https://lore.kernel.org/asahi/20230215-arm_pmu_m1_regression-v1-1-f5a266577c8d@jannau.net/
Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Asahi Lina &lt;lina@asahilina.net&gt;
Cc: Eric Curtin &lt;ecurtin@redhat.com&gt;
Tested-by: Janne Grunau &lt;j@jannau.net&gt;
Link: https://lore.kernel.org/r/20230216141240.3833272-2-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: RISC-V: Improve privilege mode filtering for perf</title>
<updated>2023-02-07T15:05:33Z</updated>
<author>
<name>Atish Patra</name>
<email>atishp@rivosinc.com</email>
</author>
<published>2023-02-05T01:15:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8929283a687bb4b71ec9d3f1e827aecf829c6b1a'/>
<id>urn:sha1:8929283a687bb4b71ec9d3f1e827aecf829c6b1a</id>
<content type='text'>
Currently, the host driver doesn't have any method to identify if the
requested perf event is from kvm or bare metal. As KVM runs in HS
mode, there are no separate hypervisor privilege mode to distinguish
between the attributes for guest/host.

Improve the privilege mode filtering by using the event specific
config1 field.

Reviewed-by: Andrew Jones &lt;ajones@ventanamicro.com&gt;
Reviewed-by: Anup Patel &lt;anup@brainfault.org&gt;
Signed-off-by: Atish Patra &lt;atishp@rivosinc.com&gt;
Signed-off-by: Anup Patel &lt;anup@brainfault.org&gt;
</content>
</entry>
<entry>
<title>perf: RISC-V: Define helper functions expose hpm counter width and count</title>
<updated>2023-02-07T15:05:31Z</updated>
<author>
<name>Atish Patra</name>
<email>atishp@rivosinc.com</email>
</author>
<published>2023-02-05T01:15:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=585e351ff359c032ea7ab48d999b252ba09f8051'/>
<id>urn:sha1:585e351ff359c032ea7ab48d999b252ba09f8051</id>
<content type='text'>
KVM module needs to know how many hardware counters and the counter
width that the platform supports. Otherwise, it will not be able to show
optimal value of virtual counters to the guest. The virtual hardware
counters also need to have the same width as the logical hardware
counters for simplicity. However, there shouldn't be mapping between
virtual hardware counters and logical hardware counters. As we don't
support hetergeneous harts or counters with different width as of now,
the implementation relies on the counter width of the first available
programmable counter.

Reviewed-by: Anup Patel &lt;anup@brainfault.org&gt;
Reviewed-by: Andrew Jones &lt;ajones@ventanamicro.com&gt;
Signed-off-by: Atish Patra &lt;atishp@rivosinc.com&gt;
Signed-off-by: Anup Patel &lt;anup@brainfault.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2022-12-12T23:19:38Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-12-12T23:19:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=add76959575736c194b3118d96e43f8cd7bcec82'/>
<id>urn:sha1:add76959575736c194b3118d96e43f8cd7bcec82</id>
<content type='text'>
Pull perf events updates from Ingo Molnar:

 - Thoroughly rewrite the data structures that implement perf task
   context handling, with the goal of fixing various quirks and
   unfeatures both in already merged, and in upcoming proposed code.

   The old data structure is the per task and per cpu
   perf_event_contexts:

         task_struct::perf_events_ctxp[] &lt;-&gt; perf_event_context &lt;-&gt; perf_cpu_context
              ^                                 |    ^     |           ^
              `---------------------------------'    |     `--&gt; pmu ---'
                                                     v           ^
                                                perf_event ------'

   In this new design this is replaced with a single task context and a
   single CPU context, plus intermediate data-structures:

         task_struct::perf_event_ctxp -&gt; perf_event_context &lt;- perf_cpu_context
              ^                           |   ^ ^
              `---------------------------'   | |
                                              | |    perf_cpu_pmu_context &lt;--.
                                              | `----.    ^                  |
                                              |      |    |                  |
                                              |      v    v                  |
                                              | ,--&gt; perf_event_pmu_context  |
                                              | |                            |
                                              | |                            |
                                              v v                            |
                                         perf_event ---&gt; pmu ----------------'

   [ See commit bd2756811766 for more details. ]

   This rewrite was developed by Peter Zijlstra and Ravi Bangoria.

 - Optimize perf_tp_event()

 - Update the Intel uncore PMU driver, extending it with UPI topology
   discovery on various hardware models.

 - Misc fixes &amp; cleanups

* tag 'perf-core-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  perf/x86/intel/uncore: Fix reference count leak in __uncore_imc_init_box()
  perf/x86/intel/uncore: Fix reference count leak in snr_uncore_mmio_map()
  perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox()
  perf/x86/intel/uncore: Fix reference count leak in sad_cfg_iio_topology()
  perf/x86/intel/uncore: Make set_mapping() procedure void
  perf/x86/intel/uncore: Update sysfs-devices-mapping file
  perf/x86/intel/uncore: Enable UPI topology discovery for Sapphire Rapids
  perf/x86/intel/uncore: Enable UPI topology discovery for Icelake Server
  perf/x86/intel/uncore: Get UPI NodeID and GroupID
  perf/x86/intel/uncore: Enable UPI topology discovery for Skylake Server
  perf/x86/intel/uncore: Generalize get_topology() for SKX PMUs
  perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D
  perf/x86/intel/uncore: Clear attr_update properly
  perf/x86/intel/uncore: Introduce UPI topology type
  perf/x86/intel/uncore: Generalize IIO topology support
  perf/core: Don't allow grouping events from different hw pmus
  perf/amd/ibs: Make IBS a core pmu
  perf: Fix function pointer case
  perf/x86/amd: Remove the repeated declaration
  perf: Fix possible memleak in pmu_dev_alloc()
  ...
</content>
</entry>
<entry>
<title>arm_pmu: rework ACPI probing</title>
<updated>2022-11-07T16:16:20Z</updated>
<author>
<name>Mark Rutland</name>
<email>mark.rutland@arm.com</email>
</author>
<published>2022-09-30T11:18:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fe40ffdb7656d1f9c42dd402740765ff8b418b17'/>
<id>urn:sha1:fe40ffdb7656d1f9c42dd402740765ff8b418b17</id>
<content type='text'>
The current ACPI PMU probing logic tries to associate PMUs with CPUs
when the CPU is first brought online, in order to handle late hotplug,
though PMUs are only registered during early boot, and so for late
hotplugged CPUs this can only associate the CPU with an existing PMU.

We tried to be clever and the have the arm_pmu_acpi_cpu_starting()
callback allocate a struct arm_pmu when no matching instance is found,
in order to avoid duplication of logic. However, as above this doesn't
do anything useful for late hotplugged CPUs, and this requires us to
allocate memory in an atomic context, which is especially problematic
for PREEMPT_RT, as reported by Valentin and Pierre.

This patch reworks the probing to detect PMUs for all online CPUs in the
arm_pmu_acpi_probe() function, which is more aligned with how DT probing
works. The arm_pmu_acpi_cpu_starting() callback only tries to associate
CPUs with an existing arm_pmu instance, avoiding the problem of
allocating in atomic context.

Note that as we didn't previously register PMUs for late-hotplugged
CPUs, this change doesn't result in a loss of existing functionality,
though we will now warn when we cannot associate a CPU with a PMU.

This change allows us to pull the hotplug callback registration into the
arm_pmu_acpi_probe() function, as we no longer need the callbacks to be
invoked shortly after probing the boot CPUs, and can register it without
invoking the calls.

For the moment the arm_pmu_acpi_init() initcall remains to register the
SPE PMU, though in future this should probably be moved elsewhere (e.g.
the arm64 ACPI init code), since this doesn't need to be tied to the
regular CPU PMU code.

Signed-off-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Reported-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Link: https://lore.kernel.org/r/20210810134127.1394269-2-valentin.schneider@arm.com/
Reported-by: Pierre Gondois &lt;pierre.gondois@arm.com&gt;
Link: https://lore.kernel.org/linux-arm-kernel/20220912155105.1443303-1-pierre.gondois@arm.com/
Cc: Pierre Gondois &lt;pierre.gondois@arm.com&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Reviewed-and-tested-by: Pierre Gondois &lt;pierre.gondois@arm.com&gt;
Link: https://lore.kernel.org/r/20220930111844.1522365-4-mark.rutland@arm.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Rewrite core context handling</title>
<updated>2022-10-27T18:12:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-08T06:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd27568117664b8b3e259721393df420ed51f57b'/>
<id>urn:sha1:bd27568117664b8b3e259721393df420ed51f57b</id>
<content type='text'>
There have been various issues and limitations with the way perf uses
(task) contexts to track events. Most notable is the single hardware
PMU task context, which has resulted in a number of yucky things (both
proposed and merged).

Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU

*Current design:*

Currently we have a per task and per cpu perf_event_contexts:

  task_struct::perf_events_ctxp[] &lt;-&gt; perf_event_context &lt;-&gt; perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--&gt; pmu ---'
                                              v           ^
                                         perf_event ------'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of
events for that PMU. The task related perf_event_context's have a
pointer back to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to
that PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc.

Each perf_event is then associated with its PMU and one
perf_event_context.

*Proposed design:*

New design proposed by this patch reduce to a single task context and
a single CPU context but adds some intermediate data-structures:

  task_struct::perf_event_ctxp -&gt; perf_event_context &lt;- perf_cpu_context
       ^                           |   ^ ^
       `---------------------------'   | |
                                       | |    perf_cpu_pmu_context &lt;--.
                                       | `----.    ^                  |
                                       |      |    |                  |
                                       |      v    v                  |
                                       | ,--&gt; perf_event_pmu_context  |
                                       | |                            |
                                       | |                            |
                                       v v                            |
                                  perf_event ---&gt; pmu ----------------'

With the new design, perf_event_context will hold all events for all
pmus in the (respective pinned/flexible) rbtrees. This can be achieved
by adding pmu to rbtree key:

  {cpu, pmu, cgroup, group_index}

Each perf_event_context carries a list of perf_event_pmu_context which
is used to hold per-pmu-per-context state. For example, it keeps track
of currently active events for that pmu, a pmu specific task_ctx_data,
a flag to tell whether rotation is required or not etc.

Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu
state like hrtimer details to drive the event rotation, a pointer to
perf_event_pmu_context of currently running task and some other
ancillary information.

Each perf_event is associated to it's pmu, perf_event_context and
perf_event_pmu_context.

Further optimizations to current implementation are possible. For
example, ctx_resched() can be optimized to reschedule only single pmu
events.

Much thanks to Ravi for picking this up and pushing it towards
completion.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Co-developed-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
</content>
</entry>
</feed>
