<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v5.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-11-06T21:08:17Z</updated>
<entry>
<title>Merge branch 'akpm' (patches from Andrew)</title>
<updated>2021-11-06T21:08:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-11-06T21:08:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=512b7931ad0561ffe14265f9ff554a3c081b476b'/>
<id>urn:sha1:512b7931ad0561ffe14265f9ff554a3c081b476b</id>
<content type='text'>
Merge misc updates from Andrew Morton:
 "257 patches.

  Subsystems affected by this patch series: scripts, ocfs2, vfs, and
  mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
  gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
  pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
  memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
  vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
  cleanups, kfence, and damon)"

* emailed patches from Andrew Morton &lt;akpm@linux-foundation.org&gt;: (257 commits)
  mm/damon: remove return value from before_terminate callback
  mm/damon: fix a few spelling mistakes in comments and a pr_debug message
  mm/damon: simplify stop mechanism
  Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
  Docs/admin-guide/mm/damon/start: simplify the content
  Docs/admin-guide/mm/damon/start: fix a wrong link
  Docs/admin-guide/mm/damon/start: fix wrong example commands
  mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
  mm/damon: remove unnecessary variable initialization
  Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
  mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
  selftests/damon: support watermarks
  mm/damon/dbgfs: support watermarks
  mm/damon/schemes: activate schemes based on a watermarks mechanism
  tools/selftests/damon: update for regions prioritization of schemes
  mm/damon/dbgfs: support prioritization weights
  mm/damon/vaddr,paddr: support pageout prioritization
  mm/damon/schemes: prioritize regions within the quotas
  mm/damon/selftests: support schemes quotas
  mm/damon/dbgfs: support quotas of schemes
  ...
</content>
</entry>
<entry>
<title>workqueue, kasan: avoid alloc_pages() when recording stack</title>
<updated>2021-11-06T20:30:33Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2021-11-05T20:35:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f70da745be4d4c367568b8345f50db1ae04efcb2'/>
<id>urn:sha1:f70da745be4d4c367568b8345f50db1ae04efcb2</id>
<content type='text'>
Shuah Khan reported:

 | When CONFIG_PROVE_RAW_LOCK_NESTING=y and CONFIG_KASAN are enabled,
 | kasan_record_aux_stack() runs into "BUG: Invalid wait context" when
 | it tries to allocate memory attempting to acquire spinlock in page
 | allocation code while holding workqueue pool raw_spinlock.
 |
 | There are several instances of this problem when block layer tries
 | to __queue_work(). Call trace from one of these instances is below:
 |
 |     kblockd_mod_delayed_work_on()
 |       mod_delayed_work_on()
 |         __queue_delayed_work()
 |           __queue_work() (rcu_read_lock, raw_spin_lock pool-&gt;lock held)
 |             insert_work()
 |               kasan_record_aux_stack()
 |                 kasan_save_stack()
 |                   stack_depot_save()
 |                     alloc_pages()
 |                       __alloc_pages()
 |                         get_page_from_freelist()
 |                           rm_queue()
 |                             rm_queue_pcplist()
 |                               local_lock_irqsave(&amp;pagesets.lock, flags);
 |                               [ BUG: Invalid wait context triggered ]

The default kasan_record_aux_stack() calls stack_depot_save() with
GFP_NOWAIT, which in turn can then call alloc_pages(GFP_NOWAIT, ...).
In general, however, it is not even possible to use either GFP_ATOMIC
nor GFP_NOWAIT in certain non-preemptive contexts, including
raw_spin_locks (see gfp.h and commmit ab00db216c9c7).

Fix it by instructing stackdepot to not expand stack storage via
alloc_pages() in case it runs out by using
kasan_record_aux_stack_noalloc().

While there is an increased risk of failing to insert the stack trace,
this is typically unlikely, especially if the same insertion had already
succeeded previously (stack depot hit).

For frequent calls from the same location, it therefore becomes
extremely unlikely that kasan_record_aux_stack_noalloc() fails.

Link: https://lkml.kernel.org/r/20210902200134.25603-1-skhan@linuxfoundation.org
Link: https://lkml.kernel.org/r/20210913112609.2651084-7-elver@google.com
Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Reported-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Tested-by: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Acked-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Andrey Konovalov &lt;andreyknvl@gmail.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andrey Ryabinin &lt;ryabinin.a.a@gmail.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: "Gustavo A. R. Silva" &lt;gustavoars@kernel.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Taras Madan &lt;tarasmadan@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vijayanand Jitta &lt;vjitta@codeaurora.org&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Cc: Walter Wu &lt;walter-zh.wu@mediatek.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Introduce show_one_worker_pool and show_one_workqueue.</title>
<updated>2021-10-20T16:19:03Z</updated>
<author>
<name>Imran Khan</name>
<email>imran.f.khan@oracle.com</email>
</author>
<published>2021-10-20T03:09:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55df0933be74bd2e52aba0b67eb743ae0feabe7e'/>
<id>urn:sha1:55df0933be74bd2e52aba0b67eb743ae0feabe7e</id>
<content type='text'>
Currently show_workqueue_state shows the state of all workqueues and of
all worker pools. In certain cases we may need to dump state of only a
specific workqueue or worker pool. For example in destroy_workqueue we
only need to show state of the workqueue which is getting destroyed.

So rename show_workqueue_state to show_all_workqueues(to signify it
dumps state of all busy workqueues) and divide it into more granular
functions (show_one_workqueue and show_one_worker_pool), that would show
states of individual workqueues and worker pools and can be used in
cases such as the one mentioned above.

Also, as mentioned earlier, make destroy_workqueue dump data pertaining
to only the workqueue that is being destroyed and make user(s) of
earlier interface(show_workqueue_state), use new interface
(show_all_workqueues).

Signed-off-by: Imran Khan &lt;imran.f.khan@oracle.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: make sysfs of unbound kworker cpumask more clever</title>
<updated>2021-10-19T18:38:31Z</updated>
<author>
<name>Menglong Dong</name>
<email>imagedong@tencent.com</email>
</author>
<published>2021-10-17T12:04:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d25302e46592c97d29f70ccb1be558df31a9a360'/>
<id>urn:sha1:d25302e46592c97d29f70ccb1be558df31a9a360</id>
<content type='text'>
Some unfriendly component, such as dpdk, write the same mask to
unbound kworker cpumask again and again. Every time it write to
this interface some work is queue to cpu, even though the mask
is same with the original mask.

So, fix it by return success and do nothing if the cpumask is
equal with the old one.

Signed-off-by: Mengen Sun &lt;mengensun@tencent.com&gt;
Signed-off-by: Menglong Dong &lt;imagedong@tencent.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: fix state-dump console deadlock</title>
<updated>2021-10-11T16:50:28Z</updated>
<author>
<name>Johan Hovold</name>
<email>johan@kernel.org</email>
</author>
<published>2021-10-06T11:58:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57116ce17b04fde2fe30f0859df69d8dbe5809f6'/>
<id>urn:sha1:57116ce17b04fde2fe30f0859df69d8dbe5809f6</id>
<content type='text'>
Console drivers often queue work while holding locks also taken in their
console write paths, something which can lead to deadlocks on SMP when
dumping workqueue state (e.g. sysrq-t or on suspend failures).

For serial console drivers this could look like:

	CPU0				CPU1
	----				----

	show_workqueue_state();
	  lock(&amp;pool-&gt;lock);		&lt;IRQ&gt;
	  				  lock(&amp;port-&gt;lock);
					  schedule_work();
					    lock(&amp;pool-&gt;lock);
	  printk();
	    lock(console_owner);
	    lock(&amp;port-&gt;lock);

where workqueues are, for example, used to push data to the line
discipline, process break signals and handle modem-status changes. Line
disciplines and serdev drivers can also queue work on write-wakeup
notifications, etc.

Reworking every console driver to avoid queuing work while holding locks
also taken in their write paths would complicate drivers and is neither
desirable or feasible.

Instead use the deferred-printk mechanism to avoid printing while
holding pool locks when dumping workqueue state.

Note that there are a few WARN_ON() assertions in the workqueue code
which could potentially also trigger a deadlock. Hopefully the ongoing
printk rework will provide a general solution for this eventually.

This was originally reported after a lockdep splat when executing
sysrq-t with the imx serial driver.

Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t")
Cc: stable@vger.kernel.org	# 4.0
Reported-by: Fabio Estevam &lt;festevam@denx.de&gt;
Tested-by: Fabio Estevam &lt;festevam@denx.de&gt;
Signed-off-by: Johan Hovold &lt;johan@kernel.org&gt;
Reviewed-by: John Ogness &lt;john.ogness@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Assign a color to barrier work items</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d812796eb3906424cd3c0eea530983961e2f88bd'/>
<id>urn:sha1:d812796eb3906424cd3c0eea530983961e2f88bd</id>
<content type='text'>
There was no strong reason to or not to flush barrier work items in
flush_workqueue().  And we have to make barrier work items not participate
in nr_active so we had been using WORK_NO_COLOR for them which also makes
them can't be flushed by flush_workqueue().

And the users of flush_workqueue() often do not intend to wait barrier work
items issued by flush_work().  That made the choice sound perfect.

But barrier work items have reference to internal structure (pool_workqueue)
and the worker thread[s] is/are still busy for the workqueue user when the
barrrier work items are not done.  So it is reasonable to make flush_workqueue()
also watch for flush_work() to make it more robust.

And a problem[1] reported by Li Zhe shows that we need such robustness.
The warning logs are listed below:

WARNING: CPU: 0 PID: 19336 at kernel/workqueue.c:4430 destroy_workqueue+0x11a/0x2f0
*****
destroy_workqueue: test_workqueue9 has the following busy pwq
  pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=0/1 refcnt=2
      in-flight: 5658:wq_barrier_func
Showing busy workqueues and worker pools:
*****

It shows that even after drain_workqueue() returns, the barrier work item
is still in flight and the pwq (and a worker) is still busy on it.

The problem is caused by flush_workqueue() not watching flush_work():

Thread A				Worker
					/* normal work item with linked */
					process_scheduled_works()
destroy_workqueue()			  process_one_work()
  drain_workqueue()			    /* run normal work item */
				 /--	    pwq_dec_nr_in_flight()
    flush_workqueue()	    &lt;---/
		/* the last normal work item is done */
  sanity_check				  process_one_work()
				       /--  raw_spin_unlock_irq(&amp;pool-&gt;lock)
    raw_spin_lock_irq(&amp;pool-&gt;lock)  &lt;-/     /* maybe preempt */
    *WARNING*				    wq_barrier_func()
					    /* maybe preempt by cond_resched() */

Thread A can get the pool lock after the Worker unlocks the pool lock before
running wq_barrier_func().  And if there is any preemption happen around
wq_barrier_func(), destroy_workqueue()'s sanity check is more likely to
get the lock and catch it.  (Note: preemption is not necessary to cause the bug,
the unlocking is enough to possibly trigger the WARNING.)

A simple solution might be just executing all linked barrier work items
once without releasing pool lock after the head work item's
pwq_dec_nr_in_flight().  But this solution has two problems:

  1) the head work item might also be barrier work item when the user-queued
     work item is cancelled. For example:
	thread 1:		thread 2:
	queue_work(wq, &amp;my_work)
	flush_work(&amp;my_work)
				cancel_work_sync(&amp;my_work);
	/* Neiter my_work nor the barrier work is scheduled. */
				destroy_workqueue(wq);
	/* This is an easier way to catch the WARNING. */

  2) there might be too much linked barrier work items and running them
     all once without releasing pool lock just causes trouble.

The only solution is to make flush_workqueue() aslo watch barrier work
items.  So we have to assign a color to these barrier work items which
is the color of the head (user-queued) work item.

Assigning a color doesn't cause any problem in ative management, because
the prvious patch made barrier work items not participate in nr_active
via WORK_STRUCT_INACTIVE rather than reliance on the (old) WORK_NO_COLOR.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Reported-by: Li Zhe &lt;lizhe.67@bytedance.com&gt;
Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Mark barrier work with WORK_STRUCT_INACTIVE</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=018f3a13dd6300701103f268b6bfec0a56beea57'/>
<id>urn:sha1:018f3a13dd6300701103f268b6bfec0a56beea57</id>
<content type='text'>
Currently, WORK_NO_COLOR has two meanings:
	Not participate in flushing
	Not participate in nr_active

And only non-barrier work items are marked with WORK_STRUCT_INACTIVE
when they are in inactive_works list.  The barrier work items are not
marked INACTIVE even linked in inactive_works list since these tail
items are always moved together with the head work item.

These definitions are simple, clean and practical. (Except a small
blemish that only the first meaning of WORK_NO_COLOR is documented in
include/linux/workqueue.h while both meanings are in workqueue.c)

But dual-purpose WORK_NO_COLOR used for barrier work items has proven to
be problematical[1].  Only the second purpose is obligatory.  So we plan
to make barrier work items participate in flushing but keep them still
not participating in nr_active.

So the plan is to mark barrier work items inactive without using
WORK_NO_COLOR in this patch so that we can assign a flushing color to
them in next patch.

The reasonable way is to add or reuse a bit in work data of the work
item.  But adding a bit will double the size of pool_workqueue.

Currently, WORK_STRUCT_INACTIVE is only used in try_to_grab_pending()
for user-queued work items and try_to_grab_pending() can't work for
barrier work items.  So we extend WORK_STRUCT_INACTIVE to also mark
barrier work items no matter which list they are in because we don't
need to determind which list a barrier work item is in.

So the meaning of WORK_STRUCT_INACTIVE becomes just "the work items don't
participate in nr_active" (no matter whether it is a barrier work item or
a user-queued work item).  And WORK_STRUCT_INACTIVE for user-queued work
items means they are in inactive_works list.

This patch does it by setting WORK_STRUCT_INACTIVE for barrier work items
in insert_wq_barrier() and checking WORK_STRUCT_INACTIVE first in
pwq_dec_nr_in_flight().  And the meaning of WORK_NO_COLOR is reduced to
only "not participating in flushing".

There is no functionality change intended in this patch.  Because
WORK_NO_COLOR+WORK_STRUCT_INACTIVE represents the previous WORK_NO_COLOR
in meaning and try_to_grab_pending() doesn't use for barrier work items
and avoids being confused by this extended WORK_STRUCT_INACTIVE.

A bunch of comment for nr_active &amp; WORK_STRUCT_INACTIVE is also added for
documenting how WORK_STRUCT_INACTIVE works in nr_active management.

[1]: https://lore.kernel.org/lkml/20210812083814.32453-1-lizhe.67@bytedance.com/
Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Change the code of calculating work_flags in insert_wq_barrier()</title>
<updated>2021-08-17T17:49:10Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d21cece0dbb424ad3ff9e49bde6954632b8efede'/>
<id>urn:sha1:d21cece0dbb424ad3ff9e49bde6954632b8efede</id>
<content type='text'>
Add a local var @work_flags to calculate work_flags step by step, so that
we don't need to squeeze several flags in only the last line of code.

Parepare for next patch to add a bit to barrier work item's flag.  Not
squshing this to next patch makes it clear that what it will have changed.

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Change arguement of pwq_dec_nr_in_flight()</title>
<updated>2021-08-17T17:49:09Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4560c2c88a4c809800ba8e76faabaf80bf6ee89'/>
<id>urn:sha1:c4560c2c88a4c809800ba8e76faabaf80bf6ee89</id>
<content type='text'>
Make pwq_dec_nr_in_flight() use work_data rather just work_color.

Prepare for later patch to get WORK_STRUCT_INACTIVE bit from work_data
in pwq_dec_nr_in_flight().

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Rename "delayed" (delayed by active management) to "inactive"</title>
<updated>2021-08-17T17:49:09Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@linux.alibaba.com</email>
</author>
<published>2021-08-17T01:32:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f97a4a1a3f8769e3452885967955e21c88f3f263'/>
<id>urn:sha1:f97a4a1a3f8769e3452885967955e21c88f3f263</id>
<content type='text'>
There are two kinds of "delayed" work items in workqueue subsystem.

One is for timer-delayed work items which are visible to workqueue users.
The other kind is for work items delayed by active management which can
not be directly visible to workqueue users.  We mixed the word "delayed"
for both kinds and caused somewhat ambiguity.

This patch renames the later one (delayed by active management) to
"inactive", because it is used for workqueue active management and
most of its related symbols are named with "active" or "activate".

All "delayed" and "DELAYED" are carefully checked and renamed one by
one to avoid accidentally changing the name of the other kind for
timer-delayed.

No functional change intended.

Signed-off-by: Lai Jiangshan &lt;laijs@linux.alibaba.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
