<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v5.15.27</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.27</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.27'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-01-16T08:12:41Z</updated>
<entry>
<title>workqueue: Fix unbind_workers() VS wq_worker_running() race</title>
<updated>2022-01-16T08:12:41Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2021-12-01T15:19:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf5b6bd2c79269c7774105f81fcae460063a6199'/>
<id>urn:sha1:cf5b6bd2c79269c7774105f81fcae460063a6199</id>
<content type='text'>
commit 07edfece8bcb0580a1828d939e6f8d91a8603eb2 upstream.

At CPU-hotplug time, unbind_worker() may preempt a worker while it is
waking up. In that case the following scenario can happen:

        unbind_workers()                     wq_worker_running()
        --------------                      -------------------
        	                      if (!(worker-&gt;flags &amp; WORKER_NOT_RUNNING))
        	                          //PREEMPTED by unbind_workers
        worker-&gt;flags |= WORKER_UNBOUND;
        [...]
        atomic_set(&amp;pool-&gt;nr_running, 0);
        //resume to worker
		                              atomic_inc(&amp;worker-&gt;pool-&gt;nr_running);

After unbind_worker() resets pool-&gt;nr_running, the value is expected to
remain 0 until the pool ever gets rebound in case cpu_up() is called on
the target CPU in the future. But here the race leaves pool-&gt;nr_running
with a value of 1, triggering the following warning when the worker goes
idle:

	WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
	Modules linked in:
	CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
	Workqueue:  0x0 (rcu_par_gp)
	RIP: 0010:worker_enter_idle+0x95/0xc0
	Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 &lt;0f&gt; 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
	RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
	RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
	RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
	RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
	R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
	R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
	FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
	Call Trace:
	  &lt;TASK&gt;
	  worker_thread+0x89/0x3d0
	  ? process_one_work+0x400/0x400
	  kthread+0x162/0x190
	  ? set_kthread_struct+0x40/0x40
	  ret_from_fork+0x22/0x30
	  &lt;/TASK&gt;

Also due to this incorrect "nr_running == 1", further queued work may
end up not being served, because no worker is awaken at work insert time.
This raises rcutorture writer stalls for example.

Fix this with disabling preemption in the right place in
wq_worker_running().

It's worth noting that if the worker migrates and runs concurrently with
unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
due to set_cpus_allowed_ptr() acquiring/releasing rq-&gt;lock.

Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Tested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: make sysfs of unbound kworker cpumask more clever</title>
<updated>2021-11-18T18:16:17Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2021-10-17T12:04:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b09a201b714d698d712ba8af1052548df0350d1e'/>
<id>urn:sha1:b09a201b714d698d712ba8af1052548df0350d1e</id>
<content type='text'>
[ Upstream commit d25302e46592c97d29f70ccb1be558df31a9a360 ]

Some unfriendly component, such as dpdk, write the same mask to
unbound kworker cpumask again and again. Every time it write to
this interface some work is queue to cpu, even though the mask
is same with the original mask.

So, fix it by return success and do nothing if the cpumask is
equal with the old one.

Signed-off-by: Mengen Sun &lt;mengensun@tencent.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: fix state-dump console deadlock</title>
<updated>2021-10-11T16:50:28Z</updated>
<author>
<name>Johan Hovold</name>
<email>johan@kernel.org</email>
</author>
<published>2021-10-06T11:58:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57116ce17b04fde2fe30f0859df69d8dbe5809f6'/>
<id>urn:sha1:57116ce17b04fde2fe30f0859df69d8dbe5809f6</id>
<content type='text'>
Console drivers often queue work while holding locks also taken in their
console write paths, something which can lead to deadlocks on SMP when
dumping workqueue state (e.g. sysrq-t or on suspend failures).

For serial console drivers this could look like:

	CPU0				CPU1
	----				----

	show_workqueue_state();
	  lock(&amp;pool-&gt;lock);		&lt;IRQ&gt;
	  				  lock(&amp;port-&gt;lock);
					  schedule_work();
					    lock(&amp;pool-&gt;lock);
	  printk();
	    lock(console_owner);
	    lock(&amp;port-&gt;lock);

where workqueues are, for example, used to push data to the line
discipline, process break signals and handle modem-status changes. Line
disciplines and serdev drivers can also queue work on write-wakeup
notifications, etc.

Reworking every console driver to avoid queuing work while holding locks
also taken in their write paths would complicate drivers and is neither
desirable or feasible.

Instead use the deferred-printk mechanism to avoid printing while
holding pool locks when dumping workqueue state.

Note that there are a few WARN_ON() assertions in the workqueue code
which could potentially also trigger a deadlock. Hopefully the ongoing
printk rework will provide a general solution for this eventually.

This was originally reported after a lockdep splat when executing
sysrq-t with the imx serial driver.

Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t")
Cc: stable@vger.kernel.org	# 4.0
Reported-by: Fabio Estevam &lt;festevam@denx.de&gt;
Tested-by: Fabio Estevam &lt;festevam@denx.de&gt;
Signed-off-by: Johan Hovold &lt;johan@kernel.org&gt;
Reviewed-by: John Ogness &lt;john.ogness@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Assign a color to barrier work items</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d812796eb3906424cd3c0eea530983961e2f88bd'/>
<id>urn:sha1:d812796eb3906424cd3c0eea530983961e2f88bd</id>
<content type='text'>
There was no strong reason to or not to flush barrier work items in
flush_workqueue().  And we have to make barrier work items not participate
in nr_active so we had been using WORK_NO_COLOR for them which also makes
them can't be flushed by flush_workqueue().

And the users of flush_workqueue() often do not intend to wait barrier work
items issued by flush_work().  That made the choice sound perfect.

But barrier work items have reference to internal structure (pool_workqueue)
and the worker thread[s] is/are still busy for the workqueue user when the
barrrier work items are not done.  So it is reasonable to make flush_workqueue()
also watch for flush_work() to make it more robust.

And a problem[1] reported by Li Zhe shows that we need such robustness.
The warning logs are listed below:

WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
*****
destroy_workqueue: test_workqueue9 has the following busy pwq
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
      in-flight: 5658:wq_barrier_func
Showing busy workqueues and worker pools:
*****

It shows that even after drain_workqueue() returns, the barrier work item
is still in flight and the pwq (and a worker) is still busy on it.

The problem is caused by flush_workqueue() not watching flush_work():

Thread A				Worker
					/* normal work item with linked */
					process_scheduled_works()
destroy_workqueue()			  process_one_work()
  drain_workqueue()			    /* run normal work item */
				 /--	    pwq_dec_nr_in_flight()
    flush_workqueue()	    &lt;---/
		/* the last normal work item is done */
  sanity_check				  process_one_work()
				       /--  raw_spin_unlock_irq(&amp;pool-&gt;lock)
    raw_spin_lock_irq(&amp;pool-&gt;lock)  &lt;-/     /* maybe preempt */
    *WARNING*				    wq_barrier_func()
					    /* maybe preempt by cond_resched() */

Thread A can get the pool lock after the Worker unlocks the pool lock before
running wq_barrier_func().  And if there is any preemption happen around
wq_barrier_func(), destroy_workqueue()'s sanity check is more likely to
get the lock and catch it.  (Note: preemption is not necessary to cause the bug,
the unlocking is enough to possibly trigger the WARNING.)

A simple solution might be just executing all linked barrier work items
once without releasing pool lock after the head work item's
pwq_dec_nr_in_flight().  But this solution has two problems:

  1) the head work item might also be barrier work item when the user-queued
     work item is cancelled. For example:
	thread 1:		thread 2:
	queue_work(wq, &amp;my_work)
	flush_work(&amp;my_work)
				cancel_work_sync(&amp;my_work);
	/* Neiter my_work nor the barrier work is scheduled. */
				destroy_workqueue(wq);
	/* This is an easier way to catch the WARNING. */

  2) there might be too much linked barrier work items and running them
     all once without releasing pool lock just causes trouble.

The only solution is to make flush_workqueue() aslo watch barrier work
items.  So we have to assign a color to these barrier work items which
is the color of the head (user-queued) work item.

Assigning a color doesn't cause any problem in ative management, because
the prvious patch made barrier work items not participate in nr_active
via WORK_STRUCT_INACTIVE rather than reliance on the (old) WORK_NO_COLOR.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Reported-by: Li Zhe &lt;lizhe.67@bytedance.com&gt;
Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Mark barrier work with WORK_STRUCT_INACTIVE</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=018f3a13dd6300701103f268b6bfec0a56beea57'/>
<id>urn:sha1:018f3a13dd6300701103f268b6bfec0a56beea57</id>
<content type='text'>
Currently, WORK_NO_COLOR has two meanings:
	Not participate in flushing
	Not participate in nr_active

And only non-barrier work items are marked with WORK_STRUCT_INACTIVE
when they are in inactive_works list.  The barrier work items are not
marked INACTIVE even linked in inactive_works list since these tail
items are always moved together with the head work item.

These definitions are simple, clean and practical. (Except a small
blemish that only the first meaning of WORK_NO_COLOR is documented in
include/linux/workqueue.h while both meanings are in workqueue.c)

But dual-purpose WORK_NO_COLOR used for barrier work items has proven to
be problematical[1].  Only the second purpose is obligatory.  So we plan
to make barrier work items participate in flushing but keep them still
not participating in nr_active.

So the plan is to mark barrier work items inactive without using
WORK_NO_COLOR in this patch so that we can assign a flushing color to
them in next patch.

The reasonable way is to add or reuse a bit in work data of the work
item.  But adding a bit will double the size of pool_workqueue.

Currently, WORK_STRUCT_INACTIVE is only used in try_to_grab_pending()
for user-queued work items and try_to_grab_pending() can't work for
barrier work items.  So we extend WORK_STRUCT_INACTIVE to also mark
barrier work items no matter which list they are in because we don't
need to determind which list a barrier work item is in.

So the meaning of WORK_STRUCT_INACTIVE becomes just "the work items don't
participate in nr_active" (no matter whether it is a barrier work item or
a user-queued work item).  And WORK_STRUCT_INACTIVE for user-queued work
items means they are in inactive_works list.

This patch does it by setting WORK_STRUCT_INACTIVE for barrier work items
in insert_wq_barrier() and checking WORK_STRUCT_INACTIVE first in
pwq_dec_nr_in_flight().  And the meaning of WORK_NO_COLOR is reduced to
only "not participating in flushing".

There is no functionality change intended in this patch.  Because
WORK_NO_COLOR+WORK_STRUCT_INACTIVE represents the previous WORK_NO_COLOR
in meaning and try_to_grab_pending() doesn't use for barrier work items
and avoids being confused by this extended WORK_STRUCT_INACTIVE.

A bunch of comment for nr_active &amp; WORK_STRUCT_INACTIVE is also added for
documenting how WORK_STRUCT_INACTIVE works in nr_active management.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Change the code of calculating work_flags in insert_wq_barrier()</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d21cece0dbb424ad3ff9e49bde6954632b8efede'/>
<id>urn:sha1:d21cece0dbb424ad3ff9e49bde6954632b8efede</id>
<content type='text'>
Add a local var @work_flags to calculate work_flags step by step, so that
we don't need to squeeze several flags in only the last line of code.

Parepare for next patch to add a bit to barrier work item's flag.  Not
squshing this to next patch makes it clear that what it will have changed.

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Change arguement of pwq_dec_nr_in_flight()</title>
<updated>2021-08-17T17:49:09Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4560c2c88a4c809800ba8e76faabaf80bf6ee89'/>
<id>urn:sha1:c4560c2c88a4c809800ba8e76faabaf80bf6ee89</id>
<content type='text'>
Make pwq_dec_nr_in_flight() use work_data rather just work_color.

Prepare for later patch to get WORK_STRUCT_INACTIVE bit from work_data
in pwq_dec_nr_in_flight().

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Rename "delayed" (delayed by active management) to "inactive"</title>
<updated>2021-08-17T17:49:09Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f97a4a1a3f8769e3452885967955e21c88f3f263'/>
<id>urn:sha1:f97a4a1a3f8769e3452885967955e21c88f3f263</id>
<content type='text'>
There are two kinds of "delayed" work items in workqueue subsystem.

One is for timer-delayed work items which are visible to workqueue users.
The other kind is for work items delayed by active management which can
not be directly visible to workqueue users.  We mixed the word "delayed"
for both kinds and caused somewhat ambiguity.

This patch renames the later one (delayed by active management) to
"inactive", because it is used for workqueue active management and
most of its related symbols are named with "active" or "activate".

All "delayed" and "DELAYED" are carefully checked and renamed one by
one to avoid accidentally changing the name of the other kind for
timer-delayed.

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Replace deprecated CPU-hotplug functions.</title>
<updated>2021-08-09T22:33:30Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2021-08-03T14:16:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ffd8bea81fbb5abe6518bce8d6297a149b935cf7'/>
<id>urn:sha1:ffd8bea81fbb5abe6518bce8d6297a149b935cf7</id>
<content type='text'>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Replace deprecated ida_simple_*() with ida_alloc()/ida_free()</title>
<updated>2021-08-09T22:32:38Z</updated>
<author>
<name>Zhen Lei</name>
<email>thunder.leizhen@huawei.com</email>
</author>
<published>2021-08-04T03:50:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e441b56fe438fd126b9eea7d30c57d3cd3f34e14'/>
<id>urn:sha1:e441b56fe438fd126b9eea7d30c57d3cd3f34e14</id>
<content type='text'>
Replace ida_simple_get() with ida_alloc() and ida_simple_remove() with
ida_free(), the latter is more concise and intuitive.

In addition, if ida_alloc() fails, NULL is returned directly. This
eliminates unnecessary initialization of two local variables and an 'if'
judgment.

Signed-off-by: Zhen Lei &lt;thunder.leizhen@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
