<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/events, branch v5.10.100</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.100</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.10.100'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-02-01T16:25:45Z</updated>
<entry>
<title>perf: Fix perf_event_read_local() time</title>
<updated>2022-02-01T16:25:45Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-12-20T12:19:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=91b04e83c71057927380d7597efe1e93e0bf3462'/>
<id>urn:sha1:91b04e83c71057927380d7597efe1e93e0bf3462</id>
<content type='text'>
[ Upstream commit 09f5e7dc7ad705289e1b1ec065439aa3c42951c4 ]

Time readers that cannot take locks (due to NMI etc..) currently make
use of perf_event::shadow_ctx_time, which, for that event gives:

  time' = now + (time - timestamp)

or, alternatively arranged:

  time' = time + (now - timestamp)

IOW, the progression of time since the last time the shadow_ctx_time
was updated.

There's problems with this:

 A) the shadow_ctx_time is per-event, even though the ctx_time it
    reflects is obviously per context. The direct concequence of this
    is that the context needs to iterate all events all the time to
    keep the shadow_ctx_time in sync.

 B) even with the prior point, the context itself might not be active
    meaning its time should not advance to begin with.

 C) shadow_ctx_time isn't consistently updated when ctx_time is

There are 3 users of this stuff, that suffer differently from this:

 - calc_timer_values()
   - perf_output_read()
   - perf_event_update_userpage()	/* A */

 - perf_event_read_local()		/* A,B */

In particular, perf_output_read() doesn't suffer at all, because it's
sample driven and hence only relevant when the event is actually
running.

This same was supposed to be true for perf_event_update_userpage(),
after all self-monitoring implies the context is active *HOWEVER*, as
per commit f79256532682 ("perf/core: fix userpage-&gt;time_enabled of
inactive events") this goes wrong when combined with counter
overcommit, in that case those events that do not get scheduled when
the context becomes active (task events typically) miss out on the
EVENT_TIME update and ENABLED time is inflated (for a little while)
with the time the context was inactive. Once the event gets rotated
in, this gets corrected, leading to a non-monotonic timeflow.

perf_event_read_local() made things even worse, it can request time at
any point, suffering all the problems perf_event_update_userpage()
does and more. Because while perf_event_update_userpage() is limited
by the context being active, perf_event_read_local() users have no
such constraint.

Therefore, completely overhaul things and do away with
perf_event::shadow_ctx_time. Instead have regular context time updates
keep track of this offset directly and provide perf_event_time_now()
to complement perf_event_time().

perf_event_time_now() will, in adition to being context wide, also
take into account if the context is active. For inactive context, it
will not advance time.

This latter property means the cgroup perf_cgroup_info context needs
to grow addition state to track this.

Additionally, since all this is strictly per-cpu, we can use barrier()
to order context activity vs context time.

Fixes: 7d9285e82db5 ("perf/bpf: Extend the perf_event_read_local() interface, a.k.a. "bpf: perf event change needed for subsequent bpf helpers"")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Song Liu &lt;song@kernel.org&gt;
Tested-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/r/YcB06DasOBtU0b00@hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel: delete repeated words in comments</title>
<updated>2022-02-01T16:25:45Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2021-02-26T01:21:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cffed7e631b565310029cd5c69b8a2394adc72b3'/>
<id>urn:sha1:cffed7e631b565310029cd5c69b8a2394adc72b3</id>
<content type='text'>
[ Upstream commit c034f48e99907d5be147ac8f0f3e630a9307c2be ]

Drop repeated words in kernel/events/.
{if, the, that, with, time}

Drop repeated words in kernel/locking/.
{it, no, the}

Drop repeated words in kernel/sched/.
{in, not}

Link: https://lkml.kernel.org/r/20210127023412.26292-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;	[kernel/locking/]
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@kernel.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf: Protect perf_guest_cbs with RCU</title>
<updated>2022-01-20T08:17:50Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2021-11-11T02:07:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=723acd75a062f7630ed9149733a47d4158f5dbdf'/>
<id>urn:sha1:723acd75a062f7630ed9149733a47d4158f5dbdf</id>
<content type='text'>
commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream.

Protect perf_guest_cbs with RCU to fix multiple possible errors.  Luckily,
all paths that read perf_guest_cbs already require RCU protection, e.g. to
protect the callback chains, so only the direct perf_guest_cbs touchpoints
need to be modified.

Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
Fixed via the READ_ONCE() in rcu_dereference().

Bug #2 is that on weakly-ordered architectures, updates to the callbacks
themselves are not guaranteed to be visible before the pointer is made
visible to readers.  Fixed by the smp_store_release() in
rcu_assign_pointer() when the new pointer is non-NULL.

Bug #3 is that, because the callbacks are global, it's possible for
readers to run in parallel with an unregisters, and thus a module
implementing the callbacks can be unloaded while readers are in flight,
resulting in a use-after-free.  Fixed by a synchronize_rcu() call when
unregistering callbacks.

Bug #1 escaped notice because it's extremely unlikely a compiler will
reload perf_guest_cbs in this sequence.  perf_guest_cbs does get reloaded
for future derefs, e.g. for -&gt;is_user_mode(), but the -&gt;is_in_guest()
guard all but guarantees the consumer will win the race, e.g. to nullify
perf_guest_cbs, KVM has to completely exit the guest and teardown down
all VMs before KVM start its module unload / unregister sequence.  This
also makes it all but impossible to encounter bug #3.

Bug #2 has not been a problem because all architectures that register
callbacks are strongly ordered and/or have a static set of callbacks.

But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
kvm_intel module load/unload leads to:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:perf_misc_flags+0x1c/0x70
  Call Trace:
   perf_prepare_sample+0x53/0x6b0
   perf_event_output_forward+0x67/0x160
   __perf_event_overflow+0x52/0xf0
   handle_pmi_common+0x207/0x300
   intel_pmu_handle_irq+0xcf/0x410
   perf_event_nmi_handler+0x28/0x50
   nmi_handle+0xc7/0x260
   default_do_nmi+0x6b/0x170
   exc_nmi+0x103/0x130
   asm_exc_nmi+0x76/0xbf

Fixes: 39447b386c84 ("perf: Enhance perf to allow for guest statistic collection from host")
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "perf: Rework perf_event_exit_event()"</title>
<updated>2021-11-26T09:39:22Z</updated>
<author>
<name>Sasha Levin</name>
<email>sashal@kernel.org</email>
</author>
<published>2021-11-25T00:18:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d35250ec5a23771187c85a46e6812d5943b5c13e'/>
<id>urn:sha1:d35250ec5a23771187c85a46e6812d5943b5c13e</id>
<content type='text'>
This reverts commit 94902ee2996a7f71471138093495df452dab87b6 which is
upstream commit ef54c1a476aef7eef26fe13ea10dc090952c00f8.

Reverting for now due to issues that need to get fixed upstream.

Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Avoid put_page() when GUP fails</title>
<updated>2021-11-21T12:46:37Z</updated>
<author>
<name>Greg Thelen</name>
<email>gthelen@google.com</email>
</author>
<published>2021-11-11T02:18:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd40513d0bee24095948abfaa17d3ad8f444a85b'/>
<id>urn:sha1:bd40513d0bee24095948abfaa17d3ad8f444a85b</id>
<content type='text'>
commit 4716023a8f6a0f4a28047f14dd7ebdc319606b84 upstream.

PEBS PERF_SAMPLE_PHYS_ADDR events use perf_virt_to_phys() to convert PMU
sampled virtual addresses to physical using get_user_page_fast_only()
and page_to_phys().

Some get_user_page_fast_only() error cases return false, indicating no
page reference, but still initialize the output page pointer with an
unreferenced page. In these error cases perf_virt_to_phys() calls
put_page(). This causes page reference count underflow, which can lead
to unintentional page sharing.

Fix perf_virt_to_phys() to only put_page() if get_user_page_fast_only()
returns a referenced page.

Fixes: fc7ce9c74c3ad ("perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR")
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20211111021814.757086-1-gthelen@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>perf/core: fix userpage-&gt;time_enabled of inactive events</title>
<updated>2021-10-17T08:43:33Z</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2021-09-29T19:43:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bdae2a08343613782be6c5be97b6c3a1ebccbfff'/>
<id>urn:sha1:bdae2a08343613782be6c5be97b6c3a1ebccbfff</id>
<content type='text'>
[ Upstream commit f792565326825ed806626da50c6f9a928f1079c1 ]

Users of rdpmc rely on the mmapped user page to calculate accurate
time_enabled. Currently, userpage-&gt;time_enabled is only updated when the
event is added to the pmu. As a result, inactive event (due to counter
multiplexing) does not have accurate userpage-&gt;time_enabled. This can
be reproduced with something like:

   /* open 20 task perf_event "cycles", to create multiplexing */

   fd = perf_event_open();  /* open task perf_event "cycles" */
   userpage = mmap(fd);     /* use mmap and rdmpc */

   while (true) {
     time_enabled_mmap = xxx; /* use logic in perf_event_mmap_page */
     time_enabled_read = read(fd).time_enabled;
     if (time_enabled_mmap &gt; time_enabled_read)
         BUG();
   }

Fix this by updating userpage for inactive events in merge_sched_in.

Suggested-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reported-and-tested-by: Lucian Grijincu &lt;lucian@fb.com&gt;
Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210929194313.2398474-1-songliubraving@fb.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>events: Reuse value read using READ_ONCE instead of re-reading it</title>
<updated>2021-09-22T10:27:57Z</updated>
<author>
<name>Baptiste Lepers</name>
<email>baptiste.lepers@gmail.com</email>
</author>
<published>2021-09-06T01:53:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c5102ced8ac0a106182332de15ea2b2920581811'/>
<id>urn:sha1:c5102ced8ac0a106182332de15ea2b2920581811</id>
<content type='text'>
commit b89a05b21f46150ac10a962aa50109250b56b03b upstream.

In perf_event_addr_filters_apply, the task associated with
the event (event-&gt;ctx-&gt;task) is read using READ_ONCE at the beginning
of the function, checked, and then re-read from event-&gt;ctx-&gt;task,
voiding all guarantees of the checks. Reuse the value that was read by
READ_ONCE to ensure the consistency of the task struct throughout the
function.

Fixes: 375637bc52495 ("perf/core: Introduce address range filtering")
Signed-off-by: Baptiste Lepers &lt;baptiste.lepers@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210906015310.12802-1-baptiste.lepers@gmail.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>perf: Fix data race between pin_count increment/decrement</title>
<updated>2021-06-16T10:01:45Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2021-05-27T10:47:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c64a3be39fa9fe473dd85cf1e6a4a7b27c0d935f'/>
<id>urn:sha1:c64a3be39fa9fe473dd85cf1e6a4a7b27c0d935f</id>
<content type='text'>
commit 6c605f8371159432ec61cbb1488dcf7ad24ad19a upstream.

KCSAN reports a data race between increment and decrement of pin_count:

  write to 0xffff888237c2d4e0 of 4 bytes by task 15740 on cpu 1:
   find_get_context		kernel/events/core.c:4617
   __do_sys_perf_event_open	kernel/events/core.c:12097 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...
  read to 0xffff888237c2d4e0 of 4 bytes by task 15743 on cpu 0:
   perf_unpin_context		kernel/events/core.c:1525 [inline]
   __do_sys_perf_event_open	kernel/events/core.c:12328 [inline]
   __se_sys_perf_event_open	kernel/events/core.c:11933
   ...

Because neither read-modify-write here is atomic, this can lead to one
of the operations being lost, resulting in an inconsistent pin_count.
Fix it by adding the missing locking in the CPU-event case.

Fixes: fe4b04fa31a6 ("perf: Cure task_oncpu_function_call() races")
Reported-by: syzbot+142c9018f5962db69c7e@syzkaller.appspotmail.com
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210527104711.2671610-1-elver@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>perf: Rework perf_event_exit_event()</title>
<updated>2021-05-11T12:47:31Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-04-08T10:35:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94902ee2996a7f71471138093495df452dab87b6'/>
<id>urn:sha1:94902ee2996a7f71471138093495df452dab87b6</id>
<content type='text'>
[ Upstream commit ef54c1a476aef7eef26fe13ea10dc090952c00f8 ]

Make perf_event_exit_event() more robust, such that we can use it from
other contexts. Specifically the up and coming remove_on_exec.

For this to work we need to address a few issues. Remove_on_exec will
not destroy the entire context, so we cannot rely on TASK_TOMBSTONE to
disable event_function_call() and we thus have to use
perf_remove_from_context().

When using perf_remove_from_context(), there's two races to consider.
The first is against close(), where we can have concurrent tear-down
of the event. The second is against child_list iteration, which should
not find a half baked event.

To address this, teach perf_remove_from_context() to special case
!ctx-&gt;is_active and about DETACH_CHILD.

[ elver@google.com: fix racing parent/child exit in sync_child_event(). ]
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20210408103605.1676875-2-elver@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf/core: Fix unconditional security_locked_down() call</title>
<updated>2021-05-07T09:04:33Z</updated>
<author>
<name>Ondrej Mosnacek</name>
<email>omosnace@redhat.com</email>
</author>
<published>2021-02-24T21:56:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4348d3b5027bc3ff6336368b6c60605d4ef8e1ce'/>
<id>urn:sha1:4348d3b5027bc3ff6336368b6c60605d4ef8e1ce</id>
<content type='text'>
commit 08ef1af4de5fe7de9c6d69f1e22e51b66e385d9b upstream.

Currently, the lockdown state is queried unconditionally, even though
its result is used only if the PERF_SAMPLE_REGS_INTR bit is set in
attr.sample_type. While that doesn't matter in case of the Lockdown LSM,
it causes trouble with the SELinux's lockdown hook implementation.

SELinux implements the locked_down hook with a check whether the current
task's type has the corresponding "lockdown" class permission
("integrity" or "confidentiality") allowed in the policy. This means
that calling the hook when the access control decision would be ignored
generates a bogus permission check and audit record.

Fix this by checking sample_type first and only calling the hook when
its result would be honored.

Fixes: b0c8fdc7fdb7 ("lockdown: Lock down perf when in confidentiality mode")
Signed-off-by: Ondrej Mosnacek &lt;omosnace@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Paul Moore &lt;paul@paul-moore.com&gt;
Link: https://lkml.kernel.org/r/20210224215628.192519-1-omosnace@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
