<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/perf_event.h, branch v6.2.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.2.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.2.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-10-27T18:12:16Z</updated>
<entry>
<title>perf: Rewrite core context handling</title>
<updated>2022-10-27T18:12:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-08T06:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd27568117664b8b3e259721393df420ed51f57b'/>
<id>urn:sha1:bd27568117664b8b3e259721393df420ed51f57b</id>
<content type='text'>
There have been various issues and limitations with the way perf uses
(task) contexts to track events. Most notable is the single hardware
PMU task context, which has resulted in a number of yucky things (both
proposed and merged).

Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU

*Current design:*

Currently we have a per task and per cpu perf_event_contexts:

  task_struct::perf_events_ctxp[] &lt;-&gt; perf_event_context &lt;-&gt; perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--&gt; pmu ---'
                                              v           ^
                                         perf_event ------'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of
events for that PMU. The task related perf_event_context's have a
pointer back to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to
that PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc.

Each perf_event is then associated with its PMU and one
perf_event_context.

*Proposed design:*

New design proposed by this patch reduce to a single task context and
a single CPU context but adds some intermediate data-structures:

  task_struct::perf_event_ctxp -&gt; perf_event_context &lt;- perf_cpu_context
       ^                           |   ^ ^
       `---------------------------'   | |
                                       | |    perf_cpu_pmu_context &lt;--.
                                       | `----.    ^                  |
                                       |      |    |                  |
                                       |      v    v                  |
                                       | ,--&gt; perf_event_pmu_context  |
                                       | |                            |
                                       | |                            |
                                       v v                            |
                                  perf_event ---&gt; pmu ----------------'

With the new design, perf_event_context will hold all events for all
pmus in the (respective pinned/flexible) rbtrees. This can be achieved
by adding pmu to rbtree key:

  {cpu, pmu, cgroup, group_index}

Each perf_event_context carries a list of perf_event_pmu_context which
is used to hold per-pmu-per-context state. For example, it keeps track
of currently active events for that pmu, a pmu specific task_ctx_data,
a flag to tell whether rotation is required or not etc.

Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu
state like hrtimer details to drive the event rotation, a pointer to
perf_event_pmu_context of currently running task and some other
ancillary information.

Each perf_event is associated to it's pmu, perf_event_context and
perf_event_pmu_context.

Further optimizations to current implementation are possible. For
example, ctx_resched() can be optimized to reschedule only single pmu
events.

Much thanks to Ravi for picking this up and pushing it towards
completion.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Co-developed-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
</content>
</entry>
<entry>
<title>perf: Fix missing SIGTRAPs</title>
<updated>2022-10-17T14:32:05Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-06T13:00:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ca6c21327c6af02b7eec31ce4b9a740a18c6c13f'/>
<id>urn:sha1:ca6c21327c6af02b7eec31ce4b9a740a18c6c13f</id>
<content type='text'>
Marco reported:

Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:

  1. Missing SIGTRAP due to a race with event_sched_out() (more
     details below).

  2. Hardware PMU events being disabled due to returning 1 from
     perf_event_overflow(). The only way to re-enable the event is
     for user space to first "properly" disable the event and then
     re-enable it.

  3. The inability to automatically disable an event after a
     specified number of overflows via PERF_EVENT_IOC_REFRESH.

The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:

		CPU0			|	CPU1
	--------------------------------+---------------------------
	__perf_event_overflow()		|
	 perf_event_disable_inatomic()	|
	  pending_disable = CPU0	| ...
					| _perf_event_enable()
					|  event_function_call()
					|   task_function_call()
					|    /* sends IPI to CPU0 */
	&lt;IPI&gt;				| ...
	 __perf_event_enable()		+---------------------------
	  ctx_resched()
	   task_ctx_sched_out()
	    ctx_sched_out()
	     group_sched_out()
	      event_sched_out()
	       pending_disable = -1
	&lt;/IPI&gt;
	&lt;IRQ-work&gt;
	 perf_pending_event()
	  perf_pending_event_disable()
	   /* Fails to send SIGTRAP because no pending_disable! */
	&lt;/IRQ-work&gt;

In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.

To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.

Additionally; and different to Marco's proposed patch:

 - recognise that pending_disable effectively duplicates oncpu for
   the case where it is set. As such, change the irq_work handler to
   use -&gt;oncpu to target the event and use pending_* as boolean toggles.

 - observe that SIGTRAP targets the ctx-&gt;task, so the context switch
   optimization that carries contexts between tasks is invalid. If
   the irq_work were delayed enough to hit after a context switch the
   SIGTRAP would be delivered to the wrong task.

 - observe that if the event gets scheduled out
   (rotation/migration/context-switch/...) the irq-work would be
   insufficient to deliver the SIGTRAP when the event gets scheduled
   back in (the irq-work might still be pending on the old CPU).

   Therefore have event_sched_out() convert the pending sigtrap into a
   task_work which will deliver the signal at return_to_user.

Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Debugged-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reported-by: Marco Elver &lt;elver@google.com&gt;
Debugged-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Marco Elver &lt;elver@google.com&gt;
</content>
</entry>
<entry>
<title>perf: Fix lockdep_assert_event_ctx()</title>
<updated>2022-10-04T11:32:08Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-04T08:46:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0ce38047e82a02017839b6cae837f13a1383a3a0'/>
<id>urn:sha1:0ce38047e82a02017839b6cae837f13a1383a3a0</id>
<content type='text'>
I'm a flamin' moron; because even after Mark told me it should be '&amp;&amp;'
I still got it wrong in the final commit.

Fixes: f3c0eba28704 ("perf: Add a few assertions")
Reported-by: Borislav Petkov &lt;bp@alien8.de&gt;
Reported-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Link: https://lkml.kernel.org/r/YvvIWmDBWdIUCMZj@FVFF77S0Q05N
</content>
</entry>
<entry>
<title>perf: Use sample_flags for raw_data</title>
<updated>2022-09-27T20:50:24Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2022-09-21T22:00:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=838d9bb62d132ec3baf1b5aba2e95ef9a7a9a3cd'/>
<id>urn:sha1:838d9bb62d132ec3baf1b5aba2e95ef9a7a9a3cd</id>
<content type='text'>
Use the new sample_flags to indicate whether the raw data field is
filled by the PMU driver.  Although it could check with the NULL,
follow the same rule with other fields.

Remove the raw field from the perf_sample_data_init() to minimize
the number of cache lines touched.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220921220032.2858517-2-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf: Use sample_flags for addr</title>
<updated>2022-09-27T20:50:24Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2022-09-21T22:00:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b084630153152239d84990ac4540c2dd360186f'/>
<id>urn:sha1:7b084630153152239d84990ac4540c2dd360186f</id>
<content type='text'>
Use the new sample_flags to indicate whether the addr field is filled by
the PMU driver.  As most PMU drivers pass 0, it can set the flag only if
it has a non-zero value.  And use 0 in perf_sample_output() if it's not
filled already.

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220921220032.2858517-1-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf: Add a few assertions</title>
<updated>2022-09-07T19:54:01Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-09-02T16:48:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3c0eba287049237b23d1300376768293eb89e69'/>
<id>urn:sha1:f3c0eba287049237b23d1300376768293eb89e69</id>
<content type='text'>
While auditing 6b959ba22d34 ("perf/core: Fix reentry problem in
perf_output_read_group()") a few spots were found that wanted
assertions.

Notable for_each_sibling_event() relies on exclusion from
modification. This would normally be holding either ctx-&gt;lock or
ctx-&gt;mutex, however due to how things are constructed disabling IRQs
is a valid and sufficient substitute for ctx-&gt;lock.

Another possible site to add assertions would be the various
pmu::{add,del,read,..}() methods, but that's not trivially expressable
in C -- the best option is wrappers, but those are easy enough to
forget.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Assert PERF_EVENT_FLAG_ARCH does not overlap with generic flags</title>
<updated>2022-09-07T19:54:00Z</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2022-09-07T09:19:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f67dd218fafd9de9a13d095e775b621db76a058f'/>
<id>urn:sha1:f67dd218fafd9de9a13d095e775b621db76a058f</id>
<content type='text'>
This just ensures that PERF_EVENT_FLAG_ARCH does not overlap with generic
hardware event flags.

Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Link: https://lkml.kernel.org/r/20220907091924.439193-3-anshuman.khandual@arm.com
</content>
</entry>
<entry>
<title>perf/core: Expand PERF_EVENT_FLAG_ARCH</title>
<updated>2022-09-07T19:54:00Z</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2022-09-07T09:19:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7517f08b9a5eef0fa683b976c97d6178d00e6a3d'/>
<id>urn:sha1:7517f08b9a5eef0fa683b976c97d6178d00e6a3d</id>
<content type='text'>
Two hardware event flags on x86 platform has overshot PERF_EVENT_FLAG_ARCH
(0x0000ffff). These flags are PERF_X86_EVENT_PEBS_LAT_HYBRID (0x20000) and
PERF_X86_EVENT_AMD_BRS (0x10000). Lets expand PERF_EVENT_FLAG_ARCH mask to
accommodate those flags, and also create room for two more in the future.

Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Link: https://lkml.kernel.org/r/20220907091924.439193-2-anshuman.khandual@arm.com
</content>
</entry>
<entry>
<title>perf: Consolidate branch sample filter helpers</title>
<updated>2022-09-07T19:54:00Z</updated>
<author>
<name>Anshuman Khandual</name>
<email>anshuman.khandual@arm.com</email>
</author>
<published>2022-09-06T08:44:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=03b02db93be407103c385814033633364674a6f6'/>
<id>urn:sha1:03b02db93be407103c385814033633364674a6f6</id>
<content type='text'>
Besides the branch type filtering requests, 'event.attr.branch_sample_type'
also contains various flags indicating which additional information should
be captured, along with the base branch record. These flags help configure
the underlying hardware, and capture the branch records appropriately when
required e.g after PMU interrupt. But first, this moves an existing helper
perf_sample_save_hw_index() into the header before adding some more helpers
for other branch sample filter flags.

Signed-off-by: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20220906084414.396220-1-anshuman.khandual@arm.com
</content>
</entry>
<entry>
<title>perf: Use sample_flags for txn</title>
<updated>2022-09-06T09:33:03Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2022-09-01T13:09:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee9db0e14b0575aa827579dc2471a29ec5fc6877'/>
<id>urn:sha1:ee9db0e14b0575aa827579dc2471a29ec5fc6877</id>
<content type='text'>
Use the new sample_flags to indicate whether the txn field is filled by
the PMU driver.

Remove the txn field from the perf_sample_data_init() to minimize the
number of cache lines touched.

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220901130959.1285717-7-kan.liang@linux.intel.com
</content>
</entry>
</feed>
