user/sven/linux.git/include/linux/wait.h, branch v5.4.204

wait: add wake_up_pollfree()

2021-12-14T13:49:02Z

commit 42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 upstream.

Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.

However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone.

Considering the three non-blocking poll systems:

- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.
- aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later.
- epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile.

Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up.

Reported-by: Linus Torvalds
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers
Signed-off-by: Greg Kroah-Hartman
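
The nr_exclusive behaviour described in this commit can be illustrated with a small userspace model. This is only a sketch and not the kernel implementation: every sketch_* name is hypothetical, the flag value is illustrative, and the convention that nr_exclusive == 0 means "wake everyone" is an assumption based on how the early-exit test in the loop below is written.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's exclusive-waiter flag. */
#define SKETCH_FLAG_EXCLUSIVE 0x01u
#define SKETCH_MAX_WAITERS 8

struct sketch_waiter {
	unsigned int flags;
	int called;	/* number of times the wake function ran */
};

struct sketch_queue {
	struct sketch_waiter *entries[SKETCH_MAX_WAITERS];
	size_t count;
};

static void sketch_add(struct sketch_queue *q, struct sketch_waiter *w)
{
	assert(q->count < SKETCH_MAX_WAITERS);
	q->entries[q->count++] = w;
}

/* Wake function that always reports success (a positive return). */
static int sketch_wake_fn(struct sketch_waiter *w)
{
	w->called++;
	return 1;
}

/*
 * Walk the queue in order. Stop once nr_exclusive exclusive waiters have
 * reported success. Starting from 0, the decrement never lands on zero,
 * so the walk visits every waiter.
 */
static void sketch_wake_up(struct sketch_queue *q, int nr_exclusive)
{
	for (size_t i = 0; i < q->count; i++) {
		struct sketch_waiter *w = q->entries[i];
		int ret = sketch_wake_fn(w);

		if (ret > 0 && (w->flags & SKETCH_FLAG_EXCLUSIVE) &&
		    !--nr_exclusive)
			break;
	}
}

/* Queue three exclusive waiters and count how many get called. */
static int sketch_count_called(int nr_exclusive)
{
	struct sketch_waiter w[3] = {
		{ SKETCH_FLAG_EXCLUSIVE, 0 },
		{ SKETCH_FLAG_EXCLUSIVE, 0 },
		{ SKETCH_FLAG_EXCLUSIVE, 0 },
	};
	struct sketch_queue q = { { NULL }, 0 };
	int called = 0;

	for (size_t i = 0; i < 3; i++)
		sketch_add(&q, &w[i]);
	sketch_wake_up(&q, nr_exclusive);
	for (size_t i = 0; i < 3; i++)
		called += w[i].called;
	return called;
}
```

In this model sketch_count_called(1) reaches only the first waiter, which is the failure mode the commit describes for POLLFREE, while sketch_count_called(0) reaches all three.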

rq-qos: fix missed wake-ups in rq_qos_throttle try two

2021-07-19T06:53:16Z

commit 11c7aa0ddea8611007768d3e6b58d45dc60a19e1 upstream.

Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle") tried to fix a problem that a process could be sleeping in rq_qos_wait() without anyone to wake it up. However the fix is not complete and the following can still happen:

    CPU1 (waiter1)                CPU2 (waiter2)                CPU3 (waker)
    rq_qos_wait()                 rq_qos_wait()
      acquire_inflight_cb() -> fails
                                    acquire_inflight_cb() -> fails
                                                                completes IOs, inflight
                                                                  decreased
      prepare_to_wait_exclusive()
                                    prepare_to_wait_exclusive()
      has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
                                    has_sleeper = !wq_has_single_sleeper() -> true
      io_schedule()                 io_schedule()

Deadlock as now there's nobody to wakeup the two waiters.

The logic automatically blocking when there are already sleepers is really subtle and the only way to make it work reliably is that we check whether there are some waiters in the queue when adding ourselves there. That way, we are guaranteed that at least the first process to enter the wait queue will recheck the waiting condition before going to sleep and thus guarantee forward progress.

Fixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara
Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
Signed-off-by: Jens Axboe
Signed-off-by: Greg Kroah-Hartman
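
The difference between looking at the queue after enqueueing and learning its state at enqueue time can be replayed in a single-threaded model. This is a sketch under stated assumptions, not the kernel code: the sketch_* names are hypothetical, the queue is reduced to a counter, and there is no locking, whereas the real check-and-add has to happen atomically with respect to other waiters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the wait queue is just a count of queued waiters. */
struct sketch_wq {
	int nr_waiters;
};

/* Add ourselves and report whether the queue was empty beforehand. */
static bool sketch_enqueue(struct sketch_wq *q)
{
	assert(q->nr_waiters >= 0);
	bool was_empty = (q->nr_waiters == 0);

	q->nr_waiters++;
	return was_empty;
}

/* Old-style query, made some time after enqueueing. */
static bool sketch_has_single_sleeper(const struct sketch_wq *q)
{
	return q->nr_waiters == 1;
}

/*
 * Old scheme, replaying the interleaving from the commit message: both
 * waiters enqueue first, then each one looks at the queue. A waiter
 * rechecks the condition only if it believes nobody else is sleeping.
 */
static int sketch_old_rechecks(void)
{
	struct sketch_wq q = { 0 };
	int rechecks = 0;

	sketch_enqueue(&q);			/* waiter1 */
	sketch_enqueue(&q);			/* waiter2 */
	if (sketch_has_single_sleeper(&q))	/* waiter1 looks */
		rechecks++;
	if (sketch_has_single_sleeper(&q))	/* waiter2 looks */
		rechecks++;
	return rechecks;
}

/* New scheme: each waiter learns at enqueue time whether it was first. */
static int sketch_new_rechecks(void)
{
	struct sketch_wq q = { 0 };
	int rechecks = 0;

	if (sketch_enqueue(&q))			/* waiter1: queue was empty */
		rechecks++;
	if (sketch_enqueue(&q))			/* waiter2: queue was not empty */
		rechecks++;
	return rechecks;
}
```

In the old replay no waiter rechecks the condition, which matches the deadlock above; in the new replay the first waiter to enter the queue always does.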

Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2019-09-17T19:35:15Z

Pull core timer updates from Thomas Gleixner:

"Timers and timekeeping updates:

- A large overhaul of the posix CPU timer code which is a preparation for moving the CPU timer expiry out into task work so it can be properly accounted on the task/process. An update to the bogus permission checks will come later during the merge window as feedback was not complete before heading off for travel.
- Switch the timerqueue code to use cached rbtrees and get rid of the homebrewed caching of the leftmost node.
- Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a single function
- Implement the separation of hrtimers to be forced to expire in hard interrupt context even when PREEMPT_RT is enabled and mark the affected timers accordingly.
- Implement a mechanism for hrtimers and the timer wheel to protect RT against priority inversion and live lock issues when a (hr)timer which should be canceled is currently executing the callback. Instead of infinitely spinning, the task which tries to cancel the timer blocks on a per cpu base expiry lock which is held and released by the (hr)timer expiry code.
- Enable the Hyper-V TSC page based sched_clock for Hyper-V guests resulting in faster access to timekeeping functions.
- Updates to various clocksource/clockevent drivers and their device tree bindings.
- The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...

hrtimer: Consolidate hrtimer_init() + hrtimer_init_sleeper() calls

2019-08-01T15:43:15Z

hrtimer_init_sleeper() calls require prior initialisation of the hrtimer object which is embedded into the hrtimer_sleeper. Combine the initialization and spare a function call. Fixup all call sites.

This is also a preparatory change for PREEMPT_RT to do hrtimer sleeper specific initializations of the embedded hrtimer without modifying any of the call sites.

No functional change.

[ anna-maria: Minor cleanups ]
[ tglx: Adapted to the removal of the task argument of hrtimer_init_sleeper() and trivial polishing. Folded a fix from Stephen Rothwell for the vsoc code ]

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190726185752.887468908@linutronix.de

hrtimer: Remove task argument from hrtimer_init_sleeper()

2019-07-30T21:57:51Z

All callers hand in 'current' and that's the only task pointer which actually makes sense. Remove the task argument and set current in the function. Signed-off-by: Thomas Gleixner Reviewed-by: Steven Rostedt (VMware) Acked-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20190726185752.791885290@linutronix.de

wait: add wq_has_single_sleeper helper

2019-07-18T16:20:13Z

rq-qos sits in the io path so we want to take locks as sparingly as possible. To accomplish this we try not to take the waitqueue head lock unless we are sure we need to go to sleep, and we have an optimization to make sure that we don't starve out existing waiters. Since we check if there are existing waiters locklessly we need to be able to update our view of the waitqueue list after we've added ourselves to the waitqueue. Accomplish this by adding this helper to see if there is more than just ourselves on the list. Reviewed-by: Oleg Nesterov Signed-off-by: Josef Bacik Signed-off-by: Jens Axboe
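
The "more than just ourselves on the list" test amounts to asking whether a circular doubly linked list holds exactly one entry. The following userspace sketch shows that check; the sketch_* names are hypothetical and the list type is a minimal stand-in written for this example, not the kernel's list implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_ENTRIES 4

/* Minimal circular doubly linked list node; the head is a node too. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list *entry,
				 struct sketch_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* True when the list is non-empty and first and last entry coincide. */
static bool sketch_list_is_singular(const struct sketch_list *head)
{
	return head->next != head && head->next == head->prev;
}

/* Build a list with nr_entries waiters and ask whether it is singular. */
static bool sketch_single_sleeper_with(int nr_entries)
{
	struct sketch_list head;
	struct sketch_list entries[SKETCH_MAX_ENTRIES];

	assert(nr_entries >= 0 && nr_entries <= SKETCH_MAX_ENTRIES);
	sketch_list_init(&head);
	for (int i = 0; i < nr_entries; i++)
		sketch_list_add_tail(&entries[i], &head);
	return sketch_list_is_singular(&head);
}
```

A waiter that has just added itself and sees a singular list knows it is alone; any other result means somebody else is already queued.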

docs: Add colon clearing sphinx warning

2019-04-09T21:14:49Z

Sphinx emits various warnings all caused by a missing colon before code block:

    WARNING: Block quote ends without a blank line; unexpected unindent.
    ERROR: Unexpected indentation.
    WARNING: Block quote ends without a blank line; unexpected unindent.

Add the colon, clearing sphinx warnings.

Signed-off-by: Tobin C. Harding
Signed-off-by: Jonathan Corbet

sched/wait: Use freezable_schedule() when possible

2019-02-11T07:34:04Z

Replace 'schedule(); try_to_freeze();' with a call to freezable_schedule().

Tasks calling freezable_schedule() set the PF_FREEZER_SKIP flag before calling schedule(). Unlike tasks calling 'schedule(); try_to_freeze();', tasks calling freezable_schedule() are not woken by try_to_freeze_tasks(). Instead they call try_to_freeze() when they wake up if the freeze is still underway.

It is not a problem since sleeping tasks can't do anything which isn't allowed for a frozen task while sleeping.

The result is a potential performance gain during freeze, since fewer tasks have to be woken.

For instance on a bare Debian vm running a 4.19 stable kernel, the number of tasks skipped in freeze_task() went up from 12 without the patch to 32 with the patch (out of 448), an increase of > x2.5.

Signed-off-by: Hugo Lefeuvre
Reviewed-by: Joel Fernandes (Google)
Cc: Joel Fernandes
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20190207200352.GA27859@behemoth.owl.eu.com.local
Signed-off-by: Ingo Molnar

scsi: sched/wait: Add wait_event_lock_irq_timeout for TASK_UNINTERRUPTIBLE usage

2018-10-16T04:11:13Z

Short of reverting commit 00d909a10710 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted") for v4.19, target-core needs a wait_event_t macro that can be executed using TASK_UNINTERRUPTIBLE to function correctly with existing fabric drivers that expect to run with signals pending during session shutdown and active se_cmd I/O quiesce.

The most notable is iscsi-target/iser-target, while ibmvscsi_tgt invokes session shutdown logic from userspace via configfs attribute that could also potentially have signals pending.

So go ahead and introduce wait_event_lock_irq_timeout() to achieve this, and update + rename __wait_event_lock_irq_timeout() to make it accept 'state' as a parameter.

Fixes: 00d909a10710 ("scsi: target: Make the session shutdown code also wait for commands that are being aborted")
Cc: # v4.19+
Cc: Bart Van Assche
Cc: Mike Christie
Cc: Hannes Reinecke
Cc: Christoph Hellwig
Cc: Sagi Grimberg
Cc: Bryant G. Ly
Cc: Peter Zijlstra (Intel)
Tested-by: Nicholas Bellinger
Signed-off-by: Nicholas Bellinger
Reviewed-by: Bryant G. Ly
Acked-by: Peter Zijlstra (Intel)
Reviewed-by: Bart Van Assche
Signed-off-by: Martin K. Petersen

sched/wait: add wait_event_idle() functions.

2018-02-16T14:19:09Z

The new TASK_IDLE state (TASK_UNINTERRUPTIBLE | __TASK_NOLOAD) is not much used. One way to make it easier to use is to add wait_event*() family functions that make use of it.

This patch adds:

    wait_event_idle()
    wait_event_idle_timeout()
    wait_event_idle_exclusive()
    wait_event_idle_exclusive_timeout()

This set was chosen because lustre needs them before it can discard its own l_wait_event() macro.

Acked-by: Peter Zijlstra (Intel)
Reviewed-by: James Simmons
Signed-off-by: NeilBrown
Reviewed-by: Patrick Farrell
Signed-off-by: Greg Kroah-Hartman
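
The composition of TASK_IDLE from two state bits can be sketched in userspace as follows. The SKETCH_* names and bit values are illustrative assumptions rather than a copy of the kernel headers, and the load-accounting predicate is a simplified guess at what a "no load" bit is for: keeping an uninterruptible sleeper out of the load average.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the real definitions live in the kernel headers. */
#define SKETCH_TASK_UNINTERRUPTIBLE 0x0002u
#define SKETCH_TASK_NOLOAD          0x0400u
#define SKETCH_TASK_IDLE \
	(SKETCH_TASK_UNINTERRUPTIBLE | SKETCH_TASK_NOLOAD)

/* The idle state must carry both bits it is composed of. */
static_assert((SKETCH_TASK_IDLE & SKETCH_TASK_NOLOAD) != 0,
	      "idle state must include the no-load bit");

/*
 * Simplified predicate: an uninterruptible sleeper counts toward load
 * unless the no-load bit is also set.
 */
static bool sketch_counts_toward_load(unsigned int state)
{
	return (state & SKETCH_TASK_UNINTERRUPTIBLE) != 0 &&
	       (state & SKETCH_TASK_NOLOAD) == 0;
}
```

Under these assumptions a task sleeping in SKETCH_TASK_IDLE still ignores signals like any uninterruptible sleeper, but sketch_counts_toward_load() returns false for it.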