<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/perf_event.h, branch v6.11.4</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.11.4</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.11.4'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-10-10T10:03:15Z</updated>
<entry>
<title>drivers/perf: arm_spe: Use perf_allow_kernel() for permissions</title>
<updated>2024-10-10T10:03:15Z</updated>
<author>
<name>James Clark</name>
<email>james.clark@linaro.org</email>
</author>
<published>2024-08-27T14:51:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1c35dd7caf0e8b21917c637157d592d4f13ffa77'/>
<id>urn:sha1:1c35dd7caf0e8b21917c637157d592d4f13ffa77</id>
<content type='text'>
[ Upstream commit 5e9629d0ae977d6f6916d7e519724804e95f0b07 ]

Use perf_allow_kernel() for 'pa_enable' (physical addresses),
'pct_enable' (physical timestamps) and context IDs. This means that
perf_event_paranoid is now taken into account and LSM hooks can be used,
which is more consistent with other perf_event_open calls. For example,
PERF_SAMPLE_PHYS_ADDR uses perf_allow_kernel() rather than just
perfmon_capable().
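
A rough sketch of the resulting check in arm_spe_pmu_event_init() (function
and macro names follow drivers/perf/arm_spe_pmu.c but are illustrative here,
not the verbatim upstream hunk):

```
  reg = arm_spe_event_to_pmscr(event);
  if (reg &amp; (PMSCR_EL1_PA | PMSCR_EL1_PCT | PMSCR_EL1_CX)) {
          /* honours perf_event_paranoid and LSM hooks, unlike a bare
           * perfmon_capable() check */
          ret = perf_allow_kernel(&amp;event-&gt;attr);
          if (ret)
                  return ret;
  }
```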

This also indirectly fixes the following error message which is
misleading because perf_event_paranoid is not taken into account by
perfmon_capable():

  $ perf record -e arm_spe/pa_enable/

  Error:
  Access to performance monitoring and observability operations is
  limited. Consider adjusting /proc/sys/kernel/perf_event_paranoid
  setting ...

Suggested-by: Al Grant &lt;al.grant@arm.com&gt;
Signed-off-by: James Clark &lt;james.clark@linaro.org&gt;
Link: https://lore.kernel.org/r/20240827145113.1224604-1-james.clark@linaro.org
Link: https://lore.kernel.org/all/20240807120039.GD37996@noisy.programming.kicks-ass.net/
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sysctl: treewide: constify the ctl_table argument of proc_handlers</title>
<updated>2024-07-24T18:59:29Z</updated>
<author>
<name>Joel Granados</name>
<email>j.granados@samsung.com</email>
</author>
<published>2024-07-24T18:59:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=78eb4ea25cd5fdbdae7eb9fdf87b99195ff67508'/>
<id>urn:sha1:78eb4ea25cd5fdbdae7eb9fdf87b99195ff67508</id>
<content type='text'>
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table structs
into .rodata, which will ensure that proc_handler function pointers cannot
be modified.
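
For illustration, the signature of a stock handler such as proc_dointvec()
changes from the first form to the second:

```
  /* before: handlers receive a mutable table entry */
  int proc_dointvec(struct ctl_table *table, int write,
                    void *buffer, size_t *lenp, loff_t *ppos);

  /* after: the const qualifier lets static ctl_table arrays live in .rodata */
  int proc_dointvec(const struct ctl_table *table, int write,
                    void *buffer, size_t *lenp, loff_t *ppos);
```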

This patch has been generated by the following coccinelle script:

```
  virtual patch

  @r1@
  identifier ctl, write, buffer, lenp, ppos;
  identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

  @r2@
  identifier func, ctl, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos)
  { ... }

  @r3@
  identifier func;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int , void *, size_t *, loff_t *);

  @r4@
  identifier func, ctl;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int , void *, size_t *, loff_t *);

  @r5@
  identifier func, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
  conventions. The xfs_stats_clear_proc_handler,
  xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were
  adjusted.

* The ctl_table argument in proc_watchdog_common was const-qualified as
  well. It is called from a proc_handler itself and calls back into another
  proc_handler, which makes changing it necessary as part of the
  proc_handler migration.

Co-developed-by: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Signed-off-by: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Co-developed-by: Joel Granados &lt;j.granados@samsung.com&gt;
Signed-off-by: Joel Granados &lt;j.granados@samsung.com&gt;
</content>
</entry>
<entry>
<title>perf: Split __perf_pending_irq() out of perf_pending_irq()</title>
<updated>2024-07-09T11:26:37Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-07-04T17:03:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2b84def990d388bed4afe4f21ae383a01991046c'/>
<id>urn:sha1:2b84def990d388bed4afe4f21ae383a01991046c</id>
<content type='text'>
perf_pending_irq() invokes perf_event_wakeup() and __perf_pending_irq().
The former is in charge of waking any tasks that are waiting to be woken
up, while the latter disables perf events.

perf_pending_irq() is an irq_work, but on PREEMPT_RT its callback is
invoked in thread context. This is needed because all the waking functions
(wake_up_all(), kill_fasync()) acquire sleeping locks which must not be
used with interrupts disabled. Disabling events, as done by
__perf_pending_irq(), however expects a hardirq context and disabled
interrupts. This requirement is not fulfilled on PREEMPT_RT.

Split the functionality based on perf_event::pending_disable into a
separate irq_work named `pending_disable_irq' and invoke it in hardirq
context even on PREEMPT_RT. Rename the split-out callback to
perf_pending_disable().
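
A condensed sketch of the resulting arrangement (names follow the
description above; the callback body and initialisation site are abridged
and not the verbatim diff):

```
  static void perf_pending_disable(struct irq_work *entry)
  {
          struct perf_event *event =
                  container_of(entry, struct perf_event, pending_disable_irq);

          /* former __perf_pending_irq() body: disable the event */
  }

  /* in perf_event_alloc(): the wakeup work may be deferred to thread
   * context on PREEMPT_RT, while the disable work is "hard" and stays in
   * hardirq context */
  init_irq_work(&amp;event-&gt;pending_irq, perf_pending_irq);
  event-&gt;pending_disable_irq = IRQ_WORK_INIT_HARD(perf_pending_disable);
```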

Reported-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Link: https://lore.kernel.org/r/20240704170424.1466941-8-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>perf: Move swevent_htable::recursion into task_struct.</title>
<updated>2024-07-09T11:26:36Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-07-04T17:03:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0d40a6d83e3e6751f1107ba33587262d937c969f'/>
<id>urn:sha1:0d40a6d83e3e6751f1107ba33587262d937c969f</id>
<content type='text'>
The swevent_htable::recursion counter is used to avoid recursion: it
prevents a swevent from being created while another event is being
processed. The counter is per-CPU, and preemption must be disabled for it
to remain stable. perf_pending_task() disables preemption to access the
counter and then sends the signal. This is problematic on PREEMPT_RT
because sending a signal takes a spinlock_t, which becomes a sleeping lock
there and therefore must not be acquired in atomic context.

The atomic context can be avoided by moving the counter into task_struct.
There is a 4-byte hole between futex_state (usually compiled in) and the
following perf pointer (perf_event_ctxp). After the recursion counter was
slimmed down, it fits there perfectly.

Move swevent_htable::recursion into task_struct.
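
As a layout sketch (surrounding task_struct members abridged; the array
size PERF_NR_CONTEXTS accounts for the 4 bytes that fill the hole):

```
  struct task_struct {
          /* ... */
          unsigned int                    futex_state;
          /* recursion counter moved out of swevent_htable; one byte per
           * recursion context, filling the hole before the perf pointer */
          u8                              perf_recursion[PERF_NR_CONTEXTS];
          struct perf_event_context       *perf_event_ctxp;
          /* ... */
  };
```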

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Marco Elver &lt;elver@google.com&gt;
Link: https://lore.kernel.org/r/20240704170424.1466941-6-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>perf: Enqueue SIGTRAP always via task_work.</title>
<updated>2024-07-09T11:26:35Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2024-07-04T17:03:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c5d93d23a26012a98b77f9e354ab9b3afd420059'/>
<id>urn:sha1:c5d93d23a26012a98b77f9e354ab9b3afd420059</id>
<content type='text'>
A signal is delivered by raising an irq_work, which can be done from any
context including NMI. The irq_work can be delayed if the architecture does
not provide a dedicated interrupt vector for it. In order not to lose the
signal, it is additionally injected via task_work during event_sched_out().

Instead of going via irq_work, the signal could be queued directly via
task_work. The signal is sent to current, so it can be enqueued on
current's return path to userland.

Queue the signal via task_work and take possible NMI context into account.
Remove perf_event::pending_sigtrap and use perf_event::pending_work
instead.
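
Roughly, the enqueue then looks as follows (TWA_NMI_CURRENT is the notify
mode added alongside this series for the NMI case; the bookkeeping around
pending_work is condensed):

```
  enum task_work_notify_mode notify =
          in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;

  if (!event-&gt;pending_work &amp;&amp;
      !task_work_add(current, &amp;event-&gt;pending_task, notify))
          event-&gt;pending_work = 1;  /* SIGTRAP raised on return to userland */
```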

Reported-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Marco Elver &lt;elver@google.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Link: https://lore.kernel.org/r/20240704170424.1466941-4-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>perf: Fix event leak upon exec and file release</title>
<updated>2024-07-09T11:26:33Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2024-06-21T09:16:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a5465418f5fd970e86a86c7f4075be262682840'/>
<id>urn:sha1:3a5465418f5fd970e86a86c7f4075be262682840</id>
<content type='text'>
The perf pending task work is never waited upon when the matching event is
released. In the case of a child event released directly via free_event(),
this can result in a leaked event, as in the following scenario, which
doesn't even require a weak IRQ work implementation to trigger:

schedule()
   prepare_task_switch()
=======&gt; &lt;NMI&gt;
      perf_event_overflow()
         event-&gt;pending_sigtrap = ...
         irq_work_queue(&amp;event-&gt;pending_irq)
&lt;======= &lt;/NMI&gt;
      perf_event_task_sched_out()
          event_sched_out()
              event-&gt;pending_sigtrap = 0;
              atomic_long_inc_not_zero(&amp;event-&gt;refcount)
              task_work_add(&amp;event-&gt;pending_task)
   finish_lock_switch()
=======&gt; &lt;IRQ&gt;
   perf_pending_irq()
      //do nothing, rely on pending task work
&lt;======= &lt;/IRQ&gt;

begin_new_exec()
   perf_event_exit_task()
      perf_event_exit_event()
         // If is child event
         free_event()
            WARN(atomic_long_cmpxchg(&amp;event-&gt;refcount, 1, 0) != 1)
            // event is leaked

Similar scenarios can also happen with perf_event_remove_on_exec() or
simply against concurrent perf_event_release().

Fix this by synchronizing against any remaining pending task work while
freeing the event, just as is done with remaining pending IRQ work. This
means that the pending task callback neither needs nor should hold a
reference to the event, since holding one would prevent the event from ever
being freed.
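
A simplified sketch of the synchronization idea (the helper name and the
rcuwait field are assumptions for illustration; the actual fix in
kernel/events/core.c differs in detail):

```
  static void perf_pending_task_sync(struct perf_event *event)
  {
          if (!event-&gt;pending_work)
                  return;

          /* not run yet: cancel the queued callback instead of waiting */
          if (task_work_cancel(current, &amp;event-&gt;pending_task)) {
                  event-&gt;pending_work = 0;
                  return;
          }

          /* otherwise wait for the callback to clear pending_work before
           * the event may be freed */
          rcuwait_wait_event(&amp;event-&gt;pending_work_wait,
                             !event-&gt;pending_work, TASK_UNINTERRUPTIBLE);
  }
```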

Fixes: 517e6a301f34 ("perf: Fix perf_pending_task() UaF")
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240621091601.18227-5-frederic@kernel.org
</content>
</entry>
<entry>
<title>perf: Move perf_event_fasync() to perf_event.h</title>
<updated>2024-04-14T20:26:32Z</updated>
<author>
<name>Kyle Huey</name>
<email>me@kylehuey.com</email>
</author>
<published>2024-04-13T14:16:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a013980666857c1eb2df6a2137817caa21d38a6'/>
<id>urn:sha1:4a013980666857c1eb2df6a2137817caa21d38a6</id>
<content type='text'>
This will allow it to be called from perf_output_wakeup().
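
The helper being moved is essentially the following static inline,
previously local to kernel/events/core.c:

```
  static inline struct fasync_struct **perf_event_fasync(struct perf_event *event)
  {
          /* only the parent has fasync state */
          if (event-&gt;parent)
                  event = event-&gt;parent;
          return &amp;event-&gt;fasync;
  }
```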

Signed-off-by: Kyle Huey &lt;khuey@kylehuey.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20240413141618.4160-2-khuey@kylehuey.com
</content>
</entry>
<entry>
<title>perf/bpf: Remove unneeded uses_default_overflow_handler()</title>
<updated>2024-04-12T09:49:50Z</updated>
<author>
<name>Kyle Huey</name>
<email>me@kylehuey.com</email>
</author>
<published>2024-04-12T01:50:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=76f6d58845829e5d6ef55532e67a323e7d30c26e'/>
<id>urn:sha1:76f6d58845829e5d6ef55532e67a323e7d30c26e</id>
<content type='text'>
Now that struct perf_event's orig_overflow_handler is gone, there's no need
for the functions and macros to support looking past overflow_handler to
orig_overflow_handler.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey &lt;khuey@kylehuey.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20240412015019.7060-6-khuey@kylehuey.com
</content>
</entry>
<entry>
<title>perf/bpf: Call BPF handler directly, not through overflow machinery</title>
<updated>2024-04-12T09:49:49Z</updated>
<author>
<name>Kyle Huey</name>
<email>me@kylehuey.com</email>
</author>
<published>2024-04-12T01:50:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f11f10bfa1ca23b32020b2073aa13131a27978fe'/>
<id>urn:sha1:f11f10bfa1ca23b32020b2073aa13131a27978fe</id>
<content type='text'>
To ultimately allow BPF programs attached to perf events to completely
suppress all of the effects of a perf event overflow (rather than just the
sample output, as they do today), call bpf_overflow_handler() from
__perf_event_overflow() directly rather than modifying struct perf_event's
overflow_handler. Return the BPF program's return value from
bpf_overflow_handler() so that __perf_event_overflow() knows how to
proceed. Remove the now unnecessary orig_overflow_handler from struct
perf_event.
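
The new call site has roughly this shape (condensed; ret is the value
__perf_event_overflow() would otherwise return for this sample):

```
  /* in __perf_event_overflow(): the attached BPF program decides whether
   * the default overflow handling runs for this sample */
  if (event-&gt;prog &amp;&amp; !bpf_overflow_handler(event, data, regs))
          return ret;
```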

This patch is solely a refactoring and results in no behavior change.

Suggested-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Signed-off-by: Kyle Huey &lt;khuey@kylehuey.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/r/20240412015019.7060-5-khuey@kylehuey.com
</content>
</entry>
<entry>
<title>perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members</title>
<updated>2024-04-12T09:49:49Z</updated>
<author>
<name>Kyle Huey</name>
<email>me@kylehuey.com</email>
</author>
<published>2024-04-12T01:50:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14e40a9578b70cc5323e55f61292a7e021f6037c'/>
<id>urn:sha1:14e40a9578b70cc5323e55f61292a7e021f6037c</id>
<content type='text'>
This will allow __perf_event_overflow() (which is independent of
CONFIG_BPF_SYSCALL) to use struct perf_event's prog to decide whether to
call bpf_overflow_handler().

Suggested-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Kyle Huey &lt;khuey@kylehuey.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20240412015019.7060-4-khuey@kylehuey.com
</content>
</entry>
</feed>
