From 9d475d80c93735f0f336b34a8e2c22beea6145ab Mon Sep 17 00:00:00 2001
From: Gabriele Monaco
Date: Mon, 28 Jul 2025 15:50:17 +0200
Subject: rv: Retry when da monitor detects race conditions

A DA monitor can be accessed from multiple cores simultaneously. This is
likely, for instance, when dealing with per-task monitors reacting to
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:

  [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
  [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
  [63] error_srs: 27: event sched_switch_suspend not expected in the state running

In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task gets suspended.

Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again, re-reading the current
state. After a maximum number of failed retries, react by calling a
special tracepoint, printing on the console and resetting the monitor.

Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.

Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.

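The retry scheme described above can be sketched in userspace C11, where
atomic_compare_exchange_strong() plays the role of the kernel's
try_cmpxchg(). This is only an illustration, not the code touched by this
patch: the state/event enums, the transition table, MAX_RETRIES,
struct monitor and handle_event() are all hypothetical names, and the
targets in the table are made up to resemble the trace above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, loosely based on the trace in the commit message. */
enum state { SLEEPABLE, RUNNING, STATE_MAX, INVALID_STATE = STATE_MAX };
enum event { SCHED_WAKEUP, SCHED_SET_STATE_SLEEPABLE, SCHED_SWITCH_SUSPEND, EVENT_MAX };

/* Illustrative bound; stands in for MAX_DA_RETRY_RACING_EVENTS. */
#define MAX_RETRIES 3

/* Made-up transition table: transitions[current state][event] = next state. */
static const enum state transitions[STATE_MAX][EVENT_MAX] = {
	[SLEEPABLE] = { RUNNING, SLEEPABLE, SLEEPABLE },
	[RUNNING]   = { RUNNING, SLEEPABLE, INVALID_STATE },
};

struct monitor {
	_Atomic int curr_state;
};

enum result { EVENT_OK, EVENT_UNEXPECTED, EVENT_RETRIES_EXHAUSTED };

static enum result handle_event(struct monitor *mon, enum event ev)
{
	assert(ev < EVENT_MAX);

	int curr = atomic_load(&mon->curr_state);

	for (int i = 0; i < MAX_RETRIES; i++) {
		enum state next = transitions[curr][ev];

		if (next == INVALID_STATE)
			return EVENT_UNEXPECTED;
		/*
		 * Store next only if the state is still curr. On failure,
		 * curr is refreshed with the value another thread stored,
		 * so the next iteration recomputes the transition from it.
		 */
		if (atomic_compare_exchange_strong(&mon->curr_state, &curr, next))
			return EVENT_OK;
	}
	/* Caller would report the failure and reset the monitor here. */
	return EVENT_RETRIES_EXHAUSTED;
}
```

The point of the loop is that the transition is always computed from the
same value the compare-and-exchange checks against, so a concurrent event
can delay a transition but cannot make it start from a stale state.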
Cc: Ingo Molnar
Cc: Masami Hiramatsu
Cc: Tomas Glozar
Cc: Juri Lelli
Cc: Clark Williams
Cc: John Kacur
Cc: Peter Zijlstra
Link: https://lore.kernel.org/20250728135022.255578-6-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco
Reviewed-by: Nam Cao
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/rv/Kconfig    |  5 +++++
 kernel/trace/rv/rv_trace.h | 24 ++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

(limited to 'kernel/trace')

diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 26017378f79b..34164eb4ec91 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -3,12 +3,17 @@
 config RV_MON_EVENTS
 	bool
 
+config RV_MON_MAINTENANCE_EVENTS
+	bool
+
 config DA_MON_EVENTS_IMPLICIT
 	select RV_MON_EVENTS
+	select RV_MON_MAINTENANCE_EVENTS
 	bool
 
 config DA_MON_EVENTS_ID
 	select RV_MON_EVENTS
+	select RV_MON_MAINTENANCE_EVENTS
 	bool
 
 config LTL_MON_EVENTS_ID
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index d38e0d3abdfd..3af46cd185b3 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -176,6 +176,30 @@ DECLARE_EVENT_CLASS(error_ltl_monitor_id,
 #include
 // Add new monitors based on CONFIG_LTL_MON_EVENTS_ID here
 #endif /* CONFIG_LTL_MON_EVENTS_ID */
+
+#ifdef CONFIG_RV_MON_MAINTENANCE_EVENTS
+/* Tracepoint useful for monitors development, currently only used in DA */
+TRACE_EVENT(rv_retries_error,
+
+	TP_PROTO(char *name, char *event),
+
+	TP_ARGS(name, event),
+
+	TP_STRUCT__entry(
+		__string(	name,	name	)
+		__string(	event,	event	)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__assign_str(event);
+	),
+
+	TP_printk(__stringify(MAX_DA_RETRY_RACING_EVENTS)
+		" retries reached for event %s, resetting monitor %s",
+		__get_str(event), __get_str(name))
+);
+#endif /* CONFIG_RV_MON_MAINTENANCE_EVENTS */
 #endif /* _TRACE_RV_H */
 
 /* This part must be outside protection */
-- 
cgit v1.2.3
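
The TP_printk() format in the new tracepoint is built at compile time:
__stringify() turns the retry limit into a string literal, which the
compiler then concatenates with the adjacent literal. A userspace sketch
of the same trick follows. STRINGIFY() is a stand-in for the kernel's
__stringify(), and the value 3, the RETRY_FMT macro and
format_retry_msg() are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two levels so the argument is macro-expanded before being stringified. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Hypothetical retry limit, standing in for MAX_DA_RETRY_RACING_EVENTS. */
#define MAX_RETRIES 3

/* Adjacent string literals are merged into a single format string. */
#define RETRY_FMT STRINGIFY(MAX_RETRIES) \
	" retries reached for event %s, resetting monitor %s"

/* Formats the message into buf; returns what snprintf() returns. */
static int format_retry_msg(char *buf, size_t len,
			    const char *event, const char *name)
{
	return snprintf(buf, len, RETRY_FMT, event, name);
}
```

With these assumed values, formatting the event "sched_wakeup" for a
monitor named "srs" gives "3 retries reached for event sched_wakeup,
resetting monitor srs". Using a single-level macro instead would embed
the macro's name rather than its value.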