user/sven/linux.git/kernel/softirq.c, branch v5.15.141

timers/nohz: Last resort update jiffies on nohz_full IRQ entry

2023-08-16T16:22:04Z

[ Upstream commit 53e87e3cdc155f20c3417b689df8d2ac88d79576 ] When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick. Meanwhile on some rare case, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine. If jiffies stop being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in an tick storm. Here is a scenario where it matters: 0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU. 1) A stop machine callback is queued to execute somewhere. 2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs. 3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward. 4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled. 5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock. 6) The tick fires immediately, goto 3) 7) Tick storm, the nohz_full CPU is drown and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation. Solve this with unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() to actually matter. Reported-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Thomas Gleixner Tested-by: Paul E. McKenney Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org Signed-off-by: Joel Fernandes (Google) Signed-off-by: Greg Kroah-Hartman

genirq: Change force_irqthreads to a static key

2021-08-10T20:50:07Z

With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads could incur a cache line miss in invoke_softirq() and other places. Replace the test with a static key to avoid the potential cache miss. [ tglx: Dropped the IDE part, removed the export and updated blk-mq ] Suggested-by: Eric Dumazet Signed-off-by: Tanner Love Signed-off-by: Thomas Gleixner Reviewed-by: Eric Dumazet Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/20210602180338.3324213-1-tannerlove.kernel@gmail.com

sched: Introduce task_is_running()

2021-06-18T09:43:07Z

Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p). Signed-off-by: Peter Zijlstra (Intel) Acked-by: Davidlohr Bueso Acked-by: Geert Uytterhoeven Acked-by: Will Deacon Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org

sched: Unbreak wakeups

2021-06-18T09:43:06Z

Remove broken task->state references and let wake_up_process() DTRT. The anti-pattern in these patches breaks the ordering of ->state vs COND as described in the comment near set_current_state() and can lead to missed wakeups: (OoO load, observes RUNNING)<-. for (;;) { | t->state = UNINTERRUPTIBLE; | smp_mb(); ,-----> | (observes !COND) | / if (COND) ---------' | COND = 1; break; `- if (t->state != RUNNING) wake_up_process(t); // not done schedule(); // forever waiting } t->state = TASK_RUNNING; Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Davidlohr Bueso Acked-by: Greg Kroah-Hartman Acked-by: Will Deacon Link: https://lore.kernel.org/r/20210611082838.160855222@infradead.org

Merge tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2021-04-28T19:00:13Z

Pull RCU updates from Ingo Molnar: - Support for "N" as alias for last bit in bitmap parsing library (eg using syntax like "nohz_full=2-N") - kvfree_rcu updates - mm_dump_obj() updates. (One of these is to mm, but was suggested by Andrew Morton.) - RCU callback offloading update - Polling RCU grace-period interfaces - Realtime-related RCU updates - Tasks-RCU updates - Torture-test updates - Torture-test scripting updates - Miscellaneous fixes * tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcutorture: Test start_poll_synchronize_rcu() and poll_state_synchronize_rcu() rcu: Provide polling interfaces for Tiny RCU grace periods torture: Fix kvm.sh --datestamp regex check torture: Consolidate qemu-cmd duration editing into kvm-transform.sh torture: Print proper vmlinux path for kvm-again.sh runs torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment torture: Make kvm-transform.sh update jitter commands torture: Add --duration argument to kvm-again.sh torture: Add kvm-again.sh to rerun a previous torture-test torture: Create a "batches" file for build reuse torture: De-capitalize TORTURE_SUITE torture: Make upper-case-only no-dot no-slash scenario names official torture: Rename SRCU-t and SRCU-u to avoid lowercase characters torture: Remove no-mpstat error message torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs torture: Record jitter start/stop commands torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd torture: Abstract jitter.sh start/stop into scripts rcu: Provide polling interfaces for Tree RCU grace periods ...

tick/sched: Prevent false positive softirq pending warnings on RT

2021-03-17T15:34:11Z

On RT a task which has soft interrupts disabled can block on a lock and schedule out to idle while soft interrupts are pending. This triggers the warning in the NOHZ idle code which complains about going idle with pending soft interrupts. But as the task is blocked soft interrupt processing is temporarily blocked as well which means that such a warning is a false positive. To prevent that check the per CPU state which indicates that a scheduled out task has soft interrupts disabled. Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Tested-by: Paul E. McKenney Reviewed-by: Frederic Weisbecker Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de

softirq: Make softirq control and processing RT aware

2021-03-17T15:34:10Z

Provide a local lock based serialization for soft interrupts on RT which allows the local_bh_disabled() sections and servicing soft interrupts to be preemptible. Provide the necessary inline helpers which allow to reuse the bulk of the softirq processing code. Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Tested-by: Paul E. McKenney Reviewed-by: Frederic Weisbecker Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20210309085727.426370483@linutronix.de

softirq: Move various protections into inline helpers

2021-03-17T15:34:10Z

To allow reuse of the bulk of softirq processing code for RT and to avoid #ifdeffery all over the place, split protections for various code sections out into inline helpers so the RT variant can just replace them in one go. Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Tested-by: Paul E. McKenney Reviewed-by: Frederic Weisbecker Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20210309085727.310118772@linutronix.de

tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT

2021-03-17T15:33:57Z

tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in the tasklet state to be cleared. This works on !RT nicely because the corresponding execution can only happen on a different CPU. On RT softirq processing is preemptible, therefore a task preempting the softirq processing thread can spin forever. Prevent this by invoking local_bh_disable()/enable() inside the loop. In case that the softirq processing thread was preempted by the current task, current will block on the local lock which yields the CPU to the preempted softirq processing thread. If the tasklet is processed on a different CPU then the local_bh_disable()/enable() pair is just a waste of processor cycles. Signed-off-by: Thomas Gleixner Tested-by: Sebastian Andrzej Siewior Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20210309084241.988908275@linutronix.de

tasklets: Replace spin wait in tasklet_kill()

2021-03-17T15:33:57Z

tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking yield() from inside the loop. yield() is an ill defined mechanism and the result might still be wasting CPU cycles in a tight loop which is especially painful in a guest when the CPU running the tasklet is scheduled out. tasklet_kill() is used in teardown paths and not performance critical at all. Replace the spin wait with wait_var_event(). Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20210309084241.890532921@linutronix.de