<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/perf_event.h, branch v6.4.9</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.4.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.4.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-06-16T14:46:33Z</updated>
<entry>
<title>perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype</title>
<updated>2023-06-16T14:46:33Z</updated>
<author>
<name>Marc Zyngier</name>
<email>maz@kernel.org</email>
</author>
<published>2023-06-16T11:48:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b50f26a44887f3f71ff5457135ee1d5f1d542d7d'/>
<id>urn:sha1:b50f26a44887f3f71ff5457135ee1d5f1d542d7d</id>
<content type='text'>
Reiji reports that the arm64 implementation of arch_perf_update_userpage()
is now ignored and replaced by the dummy stub in core code.
This seems to happen since the PMUv3 driver was moved to drivers/perf.

As it turns out, dropping the __weak attribute from the *prototype*
of the function solves the problem. You're right, this doesn't seem
to make much sense. And yet... It appears that both symbols get
flagged as weak, and that the first one to appear in the link order
wins:

$ nm drivers/perf/arm_pmuv3.o|grep arch_perf_update_userpage
0000000000001db0 W arch_perf_update_userpage

Dropping the attribute from the prototype restores the expected
behaviour, and arm64 is able to enjoy arch_perf_update_userpage()
again.
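
As an illustration, a minimal sketch of the C semantics at play
(bodies abridged): a __weak attribute on the *prototype* marks every
definition that follows it in the same translation unit as weak,
including the arch override, so the linker is free to keep whichever
weak symbol comes first in link order:

  /* header: the prototype must NOT carry __weak */
  extern void arch_perf_update_userpage(struct perf_event *event,
                                        struct perf_event_mmap_page *userpg,
                                        u64 now);

  /* core code: only the default stub is marked weak */
  void __weak arch_perf_update_userpage(struct perf_event *event,
                                        struct perf_event_mmap_page *userpg,
                                        u64 now)
  {
  }

  /* arch code: a plain (strong) definition now reliably overrides
     the weak stub at link time */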

Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf")
Fixes: f1ec3a517b43 ("kernel/events: Add a missing prototype for arch_perf_update_userpage()")
Reported-by: Reiji Watanabe &lt;reijiw@google.com&gt;
Signed-off-by: Marc Zyngier &lt;maz@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Tested-by: Reiji Watanabe &lt;reijiw@google.com&gt;
Link: https://lkml.kernel.org/r/20230616114831.3186980-1-maz@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Introduce perf_prepare_header()</title>
<updated>2023-01-18T10:57:20Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-01-18T06:05:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f6e707156e1d5d150f288823987bee1ba0104c4c'/>
<id>urn:sha1:f6e707156e1d5d150f288823987bee1ba0104c4c</id>
<content type='text'>
Factor out perf_prepare_header() so that callers can invoke
perf_prepare_sample() without a header when one is not needed.

It also checks the filtered_sample_type to avoid duplicate work
when perf_prepare_sample() is called twice (or more).
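
To illustrate, a rough sketch of the resulting split (signatures
approximated from the description above, so treat them as a sketch
rather than the exact interface):

  /* prepare only the sample payload; no header required */
  void perf_prepare_sample(struct perf_sample_data *data,
                           struct perf_event *event,
                           struct pt_regs *regs);

  /* fill in the record header; relies on filtered_sample_type so
     that work already done by perf_prepare_sample() is not redone */
  void perf_prepare_header(struct perf_event_header *header,
                           struct perf_sample_data *data,
                           struct perf_event *event,
                           struct pt_regs *regs);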

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230118060559.615653-8-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Add perf_sample_save_brstack() helper</title>
<updated>2023-01-18T10:57:20Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-01-18T06:05:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eb55b455ef9c7123bdfa7e8a7f1ebeaa8034eb83'/>
<id>urn:sha1:eb55b455ef9c7123bdfa7e8a7f1ebeaa8034eb83</id>
<content type='text'>
When we save the branch stack to the perf sample data, we need to
update the sample flags and the dynamic size.  To make sure this is
done consistently, add the perf_sample_save_brstack() helper and
convert all call sites.
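
For illustration, the shape such a helper takes (a simplified sketch,
not the verbatim kernel code):

  static inline void
  perf_sample_save_brstack(struct perf_sample_data *data,
                           struct perf_event *event,
                           struct perf_branch_stack *brs)
  {
          int size = sizeof(u64);         /* nr */

          if (branch_sample_hw_index(event))
                  size += sizeof(u64);    /* hw_idx */
          size += brs-&gt;nr * sizeof(struct perf_branch_entry);

          data-&gt;br_stack      = brs;
          data-&gt;dyn_size     += size;
          data-&gt;sample_flags |= PERF_SAMPLE_BRANCH_STACK;
  }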

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230118060559.615653-5-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Add perf_sample_save_raw_data() helper</title>
<updated>2023-01-18T10:57:19Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-01-18T06:05:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a9081cf0a11770f6b0affd377db8caa3ec4c793'/>
<id>urn:sha1:0a9081cf0a11770f6b0affd377db8caa3ec4c793</id>
<content type='text'>
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size.  To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.
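
As an illustration, a before/after sketch of a typical call-site
conversion (simplified; the real helper also walks the raw record's
fragments to compute the size):

  /* before: pointer and flags set by hand, size accounted elsewhere */
  data-&gt;raw = &amp;raw;
  data-&gt;sample_flags |= PERF_SAMPLE_RAW;

  /* after: one helper updates the pointer, the flags and the
     dynamic size together */
  perf_sample_save_raw_data(data, &amp;raw);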

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Add perf_sample_save_callchain() helper</title>
<updated>2023-01-18T10:57:19Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-01-18T06:05:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=31046500c1864b8ab25d1b9846ad10aa3f7b1821'/>
<id>urn:sha1:31046500c1864b8ab25d1b9846ad10aa3f7b1821</id>
<content type='text'>
When we save the callchain to the perf sample data, we need to update
the sample flags and the dynamic size.  To ensure this is done consistently,
add the perf_sample_save_callchain() helper and convert all call sites.
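
For illustration, a simplified sketch of the helper (close to, though
not necessarily verbatim, the kernel code):

  static inline void
  perf_sample_save_callchain(struct perf_sample_data *data,
                             struct perf_event *event,
                             struct pt_regs *regs)
  {
          int size = 1;                   /* nr */

          data-&gt;callchain = perf_callchain(event, regs);
          size += data-&gt;callchain-&gt;nr;

          data-&gt;dyn_size     += size * sizeof(u64);
          data-&gt;sample_flags |= PERF_SAMPLE_CALLCHAIN;
  }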

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230118060559.615653-3-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Save the dynamic parts of sample data size</title>
<updated>2023-01-18T10:57:19Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2023-01-18T06:05:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4cf7a136115e96241f9f1089d2b53c47accf3823'/>
<id>urn:sha1:4cf7a136115e96241f9f1089d2b53c47accf3823</id>
<content type='text'>
The perf sample data can be divided into parts.  The event-&gt;header_size
and event-&gt;id_header_size keep the static part of the sample data which
is determined by the sample_type flags.

But other parts like CALLCHAIN and BRANCH_STACK change dynamically,
so their size has to be determined from the actual data.  In
preparation for handling repeated calls to perf_prepare_sample(), save
the dynamic size in the perf sample data to avoid duplicating the work.
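
The idea in miniature (a sketch; static_size() below is a stand-in
for the existing header_size/id_header_size bookkeeping, not a real
kernel function):

  struct perf_sample_data {
          u64     sample_flags;
          u64     period;
          u64     dyn_size;       /* accumulated size of dynamic parts */
          ...
  };

  /* the record size then decomposes as: */
  header-&gt;size = static_size(event) + data-&gt;dyn_size;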

Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230118060559.615653-2-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf/core: Change the layout of perf_sample_data</title>
<updated>2023-01-09T11:22:09Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2022-12-29T20:40:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7bdb1767bf011c7f6065ac483ad2f00e434c3979'/>
<id>urn:sha1:7bdb1767bf011c7f6065ac483ad2f00e434c3979</id>
<content type='text'>
The layout of perf_sample_data is designed to minimize cache-line
access.  perf_sample_data_init() used to initialize a couple of
fields unconditionally, so they were placed together at the head.

But it has been changed to set the fields according to the actual
sample_type flags.  The main user (the perf tools) always sets IP, TID,
TIME and PERIOD.  Also group related fields like addr, phys_addr and
data_page_size together.
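
For illustration, the resulting grouping looks roughly like this (an
abridged sketch, not the full struct):

  struct perf_sample_data {
          /* set unconditionally by perf_sample_data_init(): */
          u64     sample_flags;
          u64     period;

          /* almost always requested by the perf tools: */
          u64     ip;
          struct { u32 pid, tid; }   tid_entry;
          u64     time;

          /* grouped so they share cache lines when requested: */
          u64     addr;
          u64     phys_addr;
          u64     data_page_size;
          ...
  };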

Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20221229204101.1099430-1-namhyung@kernel.org
</content>
</entry>
<entry>
<title>perf: Add PMU_FORMAT_ATTR_SHOW</title>
<updated>2023-01-09T11:22:07Z</updated>
<author>
<name>Kan Liang</name>
<email>kan.liang@linux.intel.com</email>
</author>
<published>2023-01-04T20:13:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b6c00fb9949fbd073e651a77aa75faca978cf2a6'/>
<id>urn:sha1:b6c00fb9949fbd073e651a77aa75faca978cf2a6</id>
<content type='text'>
The macro PMU_FORMAT_ATTR facilitates the definition of both the "show"
function and "format_attr". But it only works for a non-hybrid platform.
For a hybrid platform, the name "format_attr_hybrid_" is used.

The definition of the "show" function can be shared between a non-hybrid
platform and a hybrid platform. Add a new macro PMU_FORMAT_ATTR_SHOW.

No functional change. PMU_FORMAT_ATTR_SHOW will be used in the
following patch.
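
To illustrate, a sketch of the split (close to the obvious
implementation, but treat the details as approximate):

  #define PMU_FORMAT_ATTR_SHOW(_name, _format)                    \
  static ssize_t                                                  \
  _name##_show(struct device *dev,                                \
               struct device_attribute *attr, char *page)         \
  {                                                               \
          BUILD_BUG_ON(sizeof(_format) &gt;= PAGE_SIZE);          \
          return sprintf(page, _format "\n");                     \
  }

  /* PMU_FORMAT_ATTR() then becomes the shared "show" part plus
     the non-hybrid attribute definition: */
  #define PMU_FORMAT_ATTR(_name, _format)                         \
          PMU_FORMAT_ATTR_SHOW(_name, _format)                    \
                                                                  \
  static struct device_attribute format_attr_##_name = __ATTR_RO(_name)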

Signed-off-by: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230104201349.1451191-1-kan.liang@linux.intel.com
</content>
</entry>
<entry>
<title>perf: Rewrite core context handling</title>
<updated>2022-10-27T18:12:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-08T06:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd27568117664b8b3e259721393df420ed51f57b'/>
<id>urn:sha1:bd27568117664b8b3e259721393df420ed51f57b</id>
<content type='text'>
There have been various issues and limitations with the way perf uses
(task) contexts to track events. Most notable is the single hardware
PMU task context, which has resulted in a number of yucky things (both
proposed and merged).

Notably:
 - HW breakpoint PMU
 - ARM big.little PMU / Intel ADL PMU
 - Intel Branch Monitoring PMU
 - AMD IBS PMU
 - S390 cpum_cf PMU
 - PowerPC trace_imc PMU

*Current design:*

Currently we have per-task and per-CPU perf_event_contexts:

  task_struct::perf_events_ctxp[] &lt;-&gt; perf_event_context &lt;-&gt; perf_cpu_context
       ^                                 |    ^     |           ^
       `---------------------------------'    |     `--&gt; pmu ---'
                                              v           ^
                                         perf_event ------'

Each task has an array of pointers to a perf_event_context. Each
perf_event_context has a direct relation to a PMU and a group of
events for that PMU. The task related perf_event_context's have a
pointer back to that task.

Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
includes a perf_event_context, which again has a direct relation to
that PMU, and a group of events for that PMU.

The perf_cpu_context also tracks which task context is currently
associated with that CPU and includes a few other things like the
hrtimer for rotation etc.

Each perf_event is then associated with its PMU and one
perf_event_context.

*Proposed design:*

The new design proposed by this patch reduces this to a single task
context and a single CPU context, but adds some intermediate
data structures:

  task_struct::perf_event_ctxp -&gt; perf_event_context &lt;- perf_cpu_context
       ^                           |   ^ ^
       `---------------------------'   | |
                                       | |    perf_cpu_pmu_context &lt;--.
                                       | `----.    ^                  |
                                       |      |    |                  |
                                       |      v    v                  |
                                       | ,--&gt; perf_event_pmu_context  |
                                       | |                            |
                                       | |                            |
                                       v v                            |
                                  perf_event ---&gt; pmu ----------------'

With the new design, perf_event_context will hold all events for all
pmus in the (respective pinned/flexible) rbtrees. This can be achieved
by adding the pmu to the rbtree key:

  {cpu, pmu, cgroup, group_index}
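
For illustration, the ordering this implies for the group rbtree (a
sketch of the "less than" predicate; the left_*/right_* names are
stand-ins, not the actual variables):

  /* sort key: {cpu, pmu, cgroup, group_index} */
  if (left-&gt;cpu != right-&gt;cpu)
          return left-&gt;cpu &lt; right-&gt;cpu;
  if (left_pmu != right_pmu)              /* new: compare the pmu */
          return left_pmu &lt; right_pmu;
  if (left_cgroup_id != right_cgroup_id)
          return left_cgroup_id &lt; right_cgroup_id;
  return left-&gt;group_index &lt; right-&gt;group_index;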

Each perf_event_context carries a list of perf_event_pmu_contexts
which hold per-pmu-per-context state. For example, this keeps track of
the currently active events for that pmu, a pmu-specific
task_ctx_data, a flag telling whether rotation is required, etc.

Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu
state like hrtimer details to drive the event rotation, a pointer to
the perf_event_pmu_context of the currently running task, and some
other ancillary information.
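
For illustration, the two intermediate structures reduce to roughly
this (field names follow the description above; abridged):

  struct perf_event_pmu_context {         /* per-pmu-per-context */
          struct pmu                     *pmu;
          struct perf_event_context     *ctx;
          struct list_head                pinned_active;
          struct list_head                flexible_active;
          void                           *task_ctx_data;
          int                             rotate_necessary;
          ...
  };

  struct perf_cpu_pmu_context {           /* per-pmu-per-cpu */
          struct perf_event_pmu_context   epc;
          struct perf_event_pmu_context  *task_epc;  /* running task */
          struct hrtimer                  hrtimer;   /* rotation */
          ...
  };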

Each perf_event is associated with its pmu, perf_event_context and
perf_event_pmu_context.

Further optimizations to the current implementation are possible. For
example, ctx_resched() can be optimized to reschedule only the events
of a single pmu.

Many thanks to Ravi for picking this up and pushing it towards
completion.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Co-developed-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
</content>
</entry>
<entry>
<title>perf: Fix missing SIGTRAPs</title>
<updated>2022-10-17T14:32:05Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-10-06T13:00:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ca6c21327c6af02b7eec31ce4b9a740a18c6c13f'/>
<id>urn:sha1:ca6c21327c6af02b7eec31ce4b9a740a18c6c13f</id>
<content type='text'>
Marco reported:

Due to the implementation of how SIGTRAPs are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:

  1. Missing SIGTRAP due to a race with event_sched_out() (more
     details below).

  2. Hardware PMU events being disabled due to returning 1 from
     perf_event_overflow(). The only way to re-enable the event is
     for user space to first "properly" disable the event and then
     re-enable it.

  3. The inability to automatically disable an event after a
     specified number of overflows via PERF_EVENT_IOC_REFRESH.

The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:

		CPU0			|	CPU1
	--------------------------------+---------------------------
	__perf_event_overflow()		|
	 perf_event_disable_inatomic()	|
	  pending_disable = CPU0	| ...
					| _perf_event_enable()
					|  event_function_call()
					|   task_function_call()
					|    /* sends IPI to CPU0 */
	&lt;IPI&gt;				| ...
	 __perf_event_enable()		+---------------------------
	  ctx_resched()
	   task_ctx_sched_out()
	    ctx_sched_out()
	     group_sched_out()
	      event_sched_out()
	       pending_disable = -1
	&lt;/IPI&gt;
	&lt;IRQ-work&gt;
	 perf_pending_event()
	  perf_pending_event_disable()
	   /* Fails to send SIGTRAP because no pending_disable! */
	&lt;/IRQ-work&gt;

In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.

To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.

Additionally; and different to Marco's proposed patch:

 - recognise that pending_disable effectively duplicates oncpu for
   the case where it is set. As such, change the irq_work handler to
   use -&gt;oncpu to target the event and use pending_* as boolean toggles.

 - observe that SIGTRAP targets the ctx-&gt;task, so the context switch
   optimization that carries contexts between tasks is invalid. If
   the irq_work were delayed enough to hit after a context switch the
   SIGTRAP would be delivered to the wrong task.

 - observe that if the event gets scheduled out
   (rotation/migration/context-switch/...) the irq-work would be
   insufficient to deliver the SIGTRAP when the event gets scheduled
   back in (the irq-work might still be pending on the old CPU).

   Therefore have event_sched_out() convert the pending sigtrap into a
   task_work which will deliver the signal at return_to_user (see the
   sketch below).
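
For illustration, the hand-off then looks roughly like this (a
simplified sketch of the logic described above, omitting the
reference-count bookkeeping):

  /* in event_sched_out(): */
  if (event-&gt;pending_sigtrap) {
          event-&gt;pending_sigtrap = 0;
          /*
           * The irq_work may still be pending on the old CPU, so
           * queue a task_work instead; it raises the SIGTRAP at
           * return-to-user, on the correct task.
           */
          if (state != PERF_EVENT_STATE_OFF &amp;&amp; !event-&gt;pending_work) {
                  event-&gt;pending_work = 1;
                  task_work_add(current, &amp;event-&gt;pending_task,
                                TWA_RESUME);
          }
  }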

Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Debugged-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Reported-by: Marco Elver &lt;elver@google.com&gt;
Debugged-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Marco Elver &lt;elver@google.com&gt;
</content>
</entry>
</feed>
