user/sven/linux.git/kernel/workqueue.c, branch v6.19.9

workqueue: Use POOL_BH instead of WQ_BH when checking pool flags

2026-03-19T15:14:48Z

[ Upstream commit f42f9091be9e5ff57567a3945cfcdd498f475348 ]

pr_cont_worker_id() checks pool->flags against WQ_BH, which is a workqueue-level flag (defined in workqueue.h). Pool flags use a separate namespace with POOL_* constants (defined in workqueue.c). The correct constant is POOL_BH. Both WQ_BH and POOL_BH are defined as (1 << 0) so this has no behavioral impact, but it is semantically wrong and inconsistent with every other pool-level BH check in the file.

Fixes: 4cb1ef64609f ("workqueue: Implement BH workqueues to eventually replace tasklets")
Signed-off-by: Breno Leitao
Acked-by: Song Liu
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
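As an illustration of why the mix-up was harmless at runtime yet still wrong, here is a minimal userspace sketch. Only the constant names and the (1 << 0) value come from the commit message; the enum names, struct pool_sketch and both helper functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Workqueue-level flag namespace (workqueue.h in the kernel). */
enum wq_flags_sketch { WQ_BH = 1 << 0 };

/* Pool-level flag namespace (workqueue.c in the kernel). */
enum pool_flags_sketch { POOL_BH = 1 << 0 };

/* Hypothetical stand-in for a worker pool; only the flags field matters. */
struct pool_sketch {
    unsigned int flags;
};

/* Semantically wrong: tests pool flags with a workqueue-level constant. */
bool pool_is_bh_wrong(const struct pool_sketch *pool)
{
    return (pool->flags & WQ_BH) != 0;
}

/* Correct: tests pool flags with the pool-level constant. */
bool pool_is_bh(const struct pool_sketch *pool)
{
    return (pool->flags & POOL_BH) != 0;
}
```

Both helpers return the same result only because the two constants happen to share a value; if either namespace were renumbered, the first helper would silently test the wrong bit.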

workqueue: Process rescuer work items one-by-one using a cursor

2026-02-26T23:00:55Z

[ Upstream commit e5a30c303b07a4d6083e0f7f051b53add6d93c5d ]

Previously, the rescuer scanned for all matching work items at once and processed them within a single rescuer thread, which could cause one blocking work item to stall all others.

Make the rescuer process work items one-by-one instead of slurping all matches in a single pass. Break the rescuer loop after finding and processing the first matching work item, then restart the search to pick up the next. This gives normal worker threads a chance to process the remaining items instead of leaving them to wait on the rescuer's queue, and it prevents a blocking work item from stalling the rest once memory pressure is relieved.

Introduce a dummy cursor work item to avoid potentially O(N^2) rescans of the work list. The cursor records the resume position for the next scan, eliminating redundant traversals.

Also introduce RESCUER_BATCH to control the maximum number of work items the rescuer processes in each turn, and move on to other PWQs when the limit is reached.

Cc: ying chen
Reported-by: ying chen
Fixes: e22bee782b3b ("workqueue: implement concurrency managed dynamic worker pool")
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
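The cursor idea can be sketched in userspace C as follows. This is a simplified model under stated assumptions, not the kernel implementation: struct item, the list helpers, take_next() and rescue_turn() are all hypothetical names, the integer owner field stands in for "belongs to this PWQ", and RESCUER_BATCH_SKETCH is an arbitrary illustrative value (the commit message does not state the real RESCUER_BATCH value).

```c
#include <assert.h>
#include <stddef.h>

#define RESCUER_BATCH_SKETCH 2  /* illustrative value only */

/* Node of a circular doubly linked list; owner -1 marks head and cursor. */
struct item {
    struct item *prev, *next;
    int owner;
    int id;
};

void list_init(struct item *head)
{
    head->prev = head->next = head;
    head->owner = -1;
    head->id = -1;
}

void list_add_after(struct item *new_item, struct item *pos)
{
    new_item->prev = pos;
    new_item->next = pos->next;
    pos->next->prev = new_item;
    pos->next = new_item;
}

void list_add_tail(struct item *new_item, struct item *head)
{
    list_add_after(new_item, head->prev);
}

void list_del(struct item *it)
{
    it->prev->next = it->next;
    it->next->prev = it->prev;
    it->prev = it->next = NULL;
}

/*
 * Unlink and return the next item owned by `owner`, scanning forward from
 * the dummy cursor node rather than from the head. The cursor is moved to
 * the position of the removed item, so the next call resumes there.
 */
struct item *take_next(struct item *head, struct item *cursor, int owner)
{
    struct item *it;

    for (it = cursor->next; it != head; it = it->next) {
        if (it->owner != owner)
            continue;
        list_del(cursor);
        list_add_after(cursor, it);  /* remember the resume position */
        list_del(it);
        return it;
    }
    return NULL;
}

/* One rescuer turn: take at most RESCUER_BATCH_SKETCH items, one by one. */
int rescue_turn(struct item *head, struct item *cursor, int owner, int *out)
{
    int n = 0;

    while (n < RESCUER_BATCH_SKETCH) {
        struct item *it = take_next(head, cursor, owner);

        if (!it)
            break;
        out[n++] = it->id;  /* "process" the item */
    }
    return n;
}
```

In this model the cursor starts right after the head. Each turn walks only the part of the list behind the cursor, so items already skipped are not traversed again, and the batch limit bounds how long one owner can hold the rescuer before it moves on.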

workqueue: Don't rely on wq->rescuer to stop rescuer

2025-11-21T19:45:36Z

Commit1, def98c84b6cd ("workqueue: Fix spurious sanity check failures in destroy_workqueue()"), tries to fix spurious sanity check failures by stopping send_mayday() via setting wq->rescuer to NULL. But it fails to stop the pwq->mayday_node requeuing in the rescuer, and commit2, e66b39af00f4 ("workqueue: Fix pwq ref leak in rescuer_thread()"), fixes that by checking wq->rescuer, whose state is the result of commit1.

Together the two commits do fix the spurious sanity check failures caused by the rescuer, but both use a convoluted method that relies on the wq->rescuer state rather than the real count of work items.

In fact, __WQ_DESTROYING and drain_workqueue() together already stop send_mayday() by draining all the work items and ensuring that no new work items are requeued. The more proper fix for stopping the pwq->mayday_node requeuing in the rescuer comes from commit3, 4f3f4cf388f8 ("workqueue: avoid unneeded requeuing the pwq in rescuer thread"), which renders the wq->rescuer check in commit2 unnecessary.

So __WQ_DESTROYING, drain_workqueue() and commit3 together fix the spurious sanity check failures introduced by the rescuer. Just remove the convoluted code that uses wq->rescuer.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: Only assign rescuer work when really needed

2025-11-21T19:45:36Z

If the pwq does not need rescue (normal workers have been created or become available), the rescuer can immediately move on to other stalled pwqs.

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: Factor out assign_rescuer_work()

2025-11-21T19:45:36Z

Move the code that assigns work to the rescuer into assign_rescuer_work().

Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: Init rescuer's affinities as wq_unbound_cpumask

2025-11-20T20:27:55Z

The affinity set on the rescuers should be consistent in all paths when a rescuer is in the detached state. The affinity could be either wq_unbound_cpumask or unbound_effective_cpumask(wq).

Related paths:
    rescuer's worker_detach_from_pool()
    update wq_unbound_cpumask
    update wq's cpumask
    init_rescuer()

Both affinities are OK as long as they are consistent in all paths.

Commit 449b31ad2937 ("workqueue: Init rescuer's affinities as the wq's effective cpumask") made init_rescuer() use unbound_effective_cpumask(wq), which was consistent with apply_wqattrs_commit() at the time.

But using unbound_effective_cpumask(wq) requires much more code to maintain the consistency, and it doesn't make much sense since the affinity is only effective when the rescuer is not processing works. wq_unbound_cpumask is more favorable.

So apply_wqattrs_commit() and the path of "updating wq's cpumask" have been changed to not update the rescuer's affinity, and both the paths of "updating wq_unbound_cpumask" and "rescuer's worker_detach_from_pool()" have been changed to use wq_unbound_cpumask.

Now, make init_rescuer() use wq_unbound_cpumask for the rescuer's affinity and make all the paths consistent.

Cc: Juri Lelli
Cc: Waiman Long
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes

2025-11-20T20:27:55Z

When workqueue cpumask changes are committed, the DISASSOCIATED workers' affinity is not touched. This might be a problem down the line for isolated setups when the DISASSOCIATED pools still have works to run after the CPU is offline.

Make sure the workers' affinity is updated every time a workqueue cpumask changes, so these workers can't break isolation.

Cc: Juri Lelli
Cc: Waiman Long
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: Update the rescuer's affinity only when it is detached

2025-11-20T20:27:55Z

When a rescuer is attached to a pool, its affinity should be managed only by the pool.

But updating the detached rescuer's affinity is still meaningful, so that it will not disrupt isolated CPUs when it is woken up.

But commit d64f2fa064f8 ("kernel/workqueue: Let rescuers follow unbound wq cpumask changes") updates the affinity unconditionally, which causes several issues:

1) It also changes the affinity when the rescuer is already attached to a pool, which violates the affinity management.

2) The said commit tries to update the affinity of the rescuers, but it misses the rescuers of the PERCPU workqueues, and isolated CPUs can possibly be disrupted by these rescuers when they are summoned.

3) The affinity set on the rescuers should be consistent in all paths when a rescuer is in the detached state. The affinity could be either wq_unbound_cpumask or unbound_effective_cpumask(wq). Related paths:
    rescuer's worker_detach_from_pool()
    update wq_unbound_cpumask
    update wq's cpumask
    init_rescuer()
Both affinities are OK as long as they are consistent in all paths. But using unbound_effective_cpumask(wq) requires much more code to maintain the consistency, and it doesn't make much sense since the affinity is only effective when the rescuer is not processing works. wq_unbound_cpumask is more favorable.

Fix issue 1) by testing rescuer->pool before updating, with wq_pool_attach_mutex held.

Fix issue 2) by moving the rescuer's affinity updating code to the place that updates wq_unbound_cpumask, and make it also update for PERCPU workqueues.

Partially clean up consistency issue 3) by using wq_unbound_cpumask, so that the path of "updating wq's cpumask" doesn't need to maintain it, and both the paths of "updating wq_unbound_cpumask" and "rescuer's worker_detach_from_pool()" use wq_unbound_cpumask.

Cleanup of init_rescuer()'s affinity consistency can be done in the future.

Cc: Juri Lelli
Cc: Waiman Long
Signed-off-by: Lai Jiangshan
Signed-off-by: Tejun Heo
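The fix for issue 1) amounts to a guarded update: take the attach mutex, test whether the rescuer is attached, and touch the affinity only if it is not. A minimal userspace sketch of that pattern follows, assuming a pthread mutex in place of wq_pool_attach_mutex and an unsigned long bitmask in place of a cpumask; struct rescuer_sketch, struct pool_sketch and update_detached_affinity() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a worker pool. */
struct pool_sketch {
    int id;
};

/* Hypothetical stand-in for a rescuer worker. */
struct rescuer_sketch {
    struct pool_sketch *pool;  /* non-NULL while attached to a pool */
    unsigned long cpumask;     /* bitmask standing in for a cpumask */
};

/* Stand-in for wq_pool_attach_mutex. */
static pthread_mutex_t attach_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Apply the new unbound mask only when the rescuer is detached.
 * Returns true if the affinity was updated.
 */
bool update_detached_affinity(struct rescuer_sketch *rescuer,
                              unsigned long unbound_mask)
{
    bool updated = false;

    pthread_mutex_lock(&attach_mutex);
    if (!rescuer->pool) {
        /* Detached: nobody else manages the affinity right now. */
        rescuer->cpumask = unbound_mask;
        updated = true;
    }
    /* Attached: leave the affinity to the pool. */
    pthread_mutex_unlock(&attach_mutex);

    return updated;
}
```

Holding the mutex across both the test and the update is what keeps an attach from slipping in between them in this model.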

workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex

2025-11-10T16:10:18Z

assert_rcu_or_wq_mutex_or_pool_mutex is never referenced in the code. Just remove it.

Signed-off-by: zhang jiao
Reviewed-by: Lai Jiangshan
Signed-off-by: Tejun Heo

workqueue: WQ_PERCPU added to alloc_workqueue users

2025-09-16T20:33:53Z

Currently, if a user enqueues a work item using schedule_delayed_work(), the wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work(), which uses system_wq, and queue_work(), which again makes use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt in via WQ_UNBOUND.

This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated.

This patch adds a new WQ_PERCPU flag to explicitly request per-CPU behavior. Both flags coexist for one release cycle to allow callers to transition their calls.

Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU.

All existing users have been updated accordingly.

Suggested-by: Tejun Heo
Signed-off-by: Marco Crivellari
Signed-off-by: Tejun Heo
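The transition rule described above (every caller names exactly one of WQ_UNBOUND or WQ_PERCPU) can be expressed as a small predicate. The sketch below is a userspace model only: the bit positions are placeholders chosen for illustration, and wq_flags_are_explicit() is a hypothetical helper; the commit message does not say whether the kernel performs any such check.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit positions for illustration, not taken from the source. */
enum {
    WQ_UNBOUND_SKETCH = 1 << 1,
    WQ_PERCPU_SKETCH  = 1 << 8,
};

/*
 * Hypothetical check for the transition period: a caller should request
 * exactly one of "unbound" or "per-CPU", never neither and never both.
 */
bool wq_flags_are_explicit(unsigned int flags)
{
    bool unbound = (flags & WQ_UNBOUND_SKETCH) != 0;
    bool percpu = (flags & WQ_PERCPU_SKETCH) != 0;

    return unbound != percpu;
}
```

A caller that previously passed no placement flag and relied on the per-CPU default would, under this rule, add the per-CPU flag explicitly so that its behavior does not change when the default flips.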