<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v4.1.34</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.34</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.34'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-05-17T17:43:11Z</updated>
<entry>
<title>workqueue: fix rebind bound workers warning</title>
<updated>2016-05-17T17:43:11Z</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2016-05-11T09:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a1b9a74de5c10995d1fd842e3d2344d4e826ae1'/>
<id>urn:sha1:3a1b9a74de5c10995d1fd842e3d2344d4e826ae1</id>
<content type='text'>
[ Upstream commit f7c17d26f43d5cc1b7a6b896cd2fa24a079739b9 ]

------------[ cut here ]------------
WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
Modules linked in:
CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
Call Trace:
 dump_stack+0x89/0xd4
 __warn+0xfd/0x120
 warn_slowpath_null+0x1d/0x20
 rebind_workers+0x1c0/0x1d0
 workqueue_cpu_up_callback+0xf5/0x1d0
 notifier_call_chain+0x64/0x90
 ? trace_hardirqs_on_caller+0xf2/0x220
 ? notify_prepare+0x80/0x80
 __raw_notifier_call_chain+0xe/0x10
 __cpu_notify+0x35/0x50
 notify_down_prepare+0x5e/0x80
 ? notify_prepare+0x80/0x80
 cpuhp_invoke_callback+0x73/0x330
 ? __schedule+0x33e/0x8a0
 cpuhp_down_callbacks+0x51/0xc0
 cpuhp_thread_fun+0xc1/0xf0
 smpboot_thread_fn+0x159/0x2a0
 ? smpboot_create_threads+0x80/0x80
 kthread+0xef/0x110
 ? wait_for_completion+0xf0/0x120
 ? schedule_tail+0x35/0xf0
 ret_from_fork+0x22/0x50
 ? __init_kthread_worker+0x70/0x70
---[ end trace eb12ae47d2382d8f ]---
notify_down_prepare: attempt to take down CPU 0 failed

This bug can be reproduced with the config below and nohz_full= set to
all CPUs:

CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
CONFIG_DEBUG_HOTPLUG_CPU0=y
CONFIG_NO_HZ_FULL=y

As Thomas pointed out:

| If a down prepare callback fails, then DOWN_FAILED is invoked for all
| callbacks which have successfully executed DOWN_PREPARE.
|
| But, workqueue has actually two notifiers. One which handles
| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
|
| Now look at the priorities of those callbacks:
|
| CPU_PRI_WORKQUEUE_UP        = 5
| CPU_PRI_WORKQUEUE_DOWN      = -5
|
| So the call order on DOWN_PREPARE is:
|
| CB 1
| CB ...
| CB workqueue_up() -&gt; Ignores DOWN_PREPARE
| CB ...
| CB X ---&gt; Fails
|
| So we call up to CB X with DOWN_FAILED
|
| CB 1
| CB ...
| CB workqueue_up() -&gt; Handles DOWN_FAILED
| CB ...
| CB X-1
|
| So the problem is that the workqueue stuff handles DOWN_FAILED in the up
| callback, while it should do it in the down callback. Which is not a good idea
| either because it wants to be called early on rollback...
|
| Brilliant stuff, isn't it? The hotplug rework will solve this problem because
| the callbacks become symmetric, but for the existing mess, we need some
| workaround in the workqueue code.

The boot CPU handles housekeeping duty (unbound timers, workqueues,
timekeeping, ...) on behalf of full dynticks CPUs, so it must remain
online when nohz_full is enabled. Each notifier_block has a priority,
ordered as follows:

workqueue_cpu_up &gt; tick_nohz_cpu_down &gt; workqueue_cpu_down

So the tick_nohz_cpu_down callback fails on DOWN_PREPARE for CPU 0,
and the notifier_blocks after tick_nohz_cpu_down are never called,
so the workers are never actually unbound. The hotplug state machine
then falls back to undo and brings CPU 0 online again. The workers
are rebound unconditionally even though they were never unbound,
which triggers the warning in the process.

This patch fixes it by checking for !DISASSOCIATED to avoid rebinding
bound workers.
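
A minimal userspace model of the check follows. It is a hedged sketch,
not the upstream diff: struct pool_model, model_unbind_workers() and
model_rebind_workers() are hypothetical, and only the POOL_DISASSOCIATED
name is borrowed from workqueue.c (its bit value here is an assumption).
Rebinding is skipped when DOWN_PREPARE never unbound the pool.
<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Flag name borrowed from workqueue.c; the bit value is an assumption. */
#define POOL_DISASSOCIATED	(1u << 2)

struct pool_model {
	unsigned int flags;
	int nr_rebinds;		/* how often workers were rebound */
};

/* DOWN_PREPARE reached the pool: workers are unbound from the CPU. */
static void model_unbind_workers(struct pool_model *pool)
{
	pool->flags |= POOL_DISASSOCIATED;
}

/*
 * DOWN_FAILED / ONLINE: rebind only if the pool was really unbound.
 * When an earlier DOWN_PREPARE callback (tick_nohz_cpu_down here)
 * failed, the pool never saw DOWN_PREPARE and is still bound, so
 * rebinding it would hit the warning in rebind_workers().
 */
static bool model_rebind_workers(struct pool_model *pool)
{
	if (!(pool->flags & POOL_DISASSOCIATED))
		return false;

	pool->flags &= ~POOL_DISASSOCIATED;
	pool->nr_rebinds++;
	assert(!(pool->flags & POOL_DISASSOCIATED));
	return true;
}
```
]]>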

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Frédéric Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: stable@vger.kernel.org
Suggested-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: fix ghost PENDING flag while doing MQ IO</title>
<updated>2016-05-17T17:42:42Z</updated>
<author>
<name>Roman Pen</name>
<email>roman.penyaev@profitbricks.com</email>
</author>
<published>2016-04-26T11:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14794cfb6c9bcca151dbe940f3d7b9f9f818499f'/>
<id>urn:sha1:14794cfb6c9bcca151dbe940f3d7b9f9f818499f</id>
<content type='text'>
[ Upstream commit 346c09f80459a3ad97df1816d6d606169a51001a ]

A bug in the workqueue code leads to a stalled IO request in MQ
ctx-&gt;rq_list with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.351444]  [&lt;ffffffff815b01d5&gt;] schedule+0x35/0x80
[  601.351709]  [&lt;ffffffff815b2dd2&gt;] schedule_timeout+0x192/0x230
[  601.351958]  [&lt;ffffffff812d43f7&gt;] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [&lt;ffffffff810bd737&gt;] ? ktime_get+0x37/0xa0
[  601.352446]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.352688]  [&lt;ffffffff815af784&gt;] io_schedule_timeout+0xa4/0x110
[  601.352951]  [&lt;ffffffff815b3a4e&gt;] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [&lt;ffffffff815b093b&gt;] bit_wait_io+0x1b/0x70
[  601.353440]  [&lt;ffffffff815b056d&gt;] __wait_on_bit+0x5d/0x90
[  601.353689]  [&lt;ffffffff81127bd0&gt;] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [&lt;ffffffff81096db0&gt;] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [&lt;ffffffff81127cc4&gt;] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [&lt;ffffffff81127d34&gt;] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [&lt;ffffffff81129a9f&gt;] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [&lt;ffffffff811ced3b&gt;] blkdev_fsync+0x1b/0x50
[  601.355193]  [&lt;ffffffff811c82d9&gt;] vfs_fsync_range+0x49/0xa0
[  601.355432]  [&lt;ffffffff811cf45a&gt;] blkdev_write_iter+0xca/0x100
[  601.355679]  [&lt;ffffffff81197b1a&gt;] __vfs_write+0xaa/0xe0
[  601.355925]  [&lt;ffffffff81198379&gt;] vfs_write+0xa9/0x1a0
[  601.356164]  [&lt;ffffffff811c59d8&gt;] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debugging it became clear that the stalled request is always
inserted into the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, blk_mq_insert_requests() eventually offloads
execution of __blk_mq_run_hw_queue() to the kblockd workqueue, i.e. it
calls kblockd_schedule_delayed_work_on().

This means we race with another CPU, which is about to execute the
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  request A inserted
  STORE hctx-&gt;ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  &lt;schedule to kblockd workqueue&gt;
                                         request B inserted
                                         STORE hctx-&gt;ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result, request B remained pending forever.

This behaviour can be explained by a speculative LOAD of hctx-&gt;ctx_map
on CPU#0, which is reordered with the clearing of the PENDING bit and
executed _before_ the actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier &lt;mfence&gt;, which guarantees
that the clearing of the PENDING bit is executed before all possible
speculative LOADs or STOREs inside the actual work function.
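
The ordering argument can be sketched with C11 atomics in userspace.
This is an illustration under stated assumptions, not the kernel change:
pending stands in for the work item's PENDING bit, ctx_map for
hctx-&gt;ctx_map, atomic_thread_fence(memory_order_seq_cst) for the full
barrier, and producer_insert() / worker_run() are hypothetical names.
With fences on both sides, either the worker sees the producer's bit or
the producer sees PENDING cleared and queues the work again.
<![CDATA[
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model state: PENDING bit of the work item and the ctx bitmap. */
static atomic_bool pending;
static atomic_uint ctx_map;

/*
 * Producer (CPU#1 in the trace): publish the ctx bit, then try to set
 * PENDING. Returns true if this call queued the work, false if the
 * work was already pending. The fence models the full barrier implied
 * by test_and_set_bit() on the queueing side.
 */
static bool producer_insert(unsigned int bit)
{
	assert(bit < 32);
	atomic_fetch_or_explicit(&ctx_map, 1u << bit, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_exchange_explicit(&pending, true, memory_order_relaxed);
}

/*
 * Worker (CPU#0): clear PENDING, full barrier, then scan and consume
 * the bitmap. Without the fence, the LOAD of ctx_map could be satisfied
 * before the cleared PENDING is visible, so a producer that still saw
 * PENDING set would skip queueing and its bit would never be flushed.
 */
static unsigned int worker_run(void)
{
	atomic_store_explicit(&pending, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange_explicit(&ctx_map, 0u, memory_order_relaxed);
}
```
]]>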

Signed-off-by: Roman Pen &lt;roman.penyaev@profitbricks.com&gt;
Cc: Gioh Kim &lt;gi-oh.kim@profitbricks.com&gt;
Cc: Michael Wang &lt;yun.wang@profitbricks.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup</title>
<updated>2016-03-04T15:25:41Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-03T18:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3d69fd89df665c1fa1931f4af846039a4ac5dc4'/>
<id>urn:sha1:f3d69fd89df665c1fa1931f4af846039a4ac5dc4</id>
<content type='text'>
[ Upstream commit d6e022f1d207a161cd88e08ef0371554680ffc46 ]

When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node.  However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.

This has always been broken but hasn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcibly assigns the local CPU for
delayed work items without an explicit target CPU to fix a different
issue.  This widens the window in which a CPU can go offline while a
delayed work item is pending, causing delayed work items to be
dispatched with the target CPU set to an already offlined CPU.  The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.

While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen.  Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround.  The long term solution is keeping CPU
-&gt; NODE mapping stable across CPU off/online cycles which is being
worked on.
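
A sketch of the workaround, assuming simplified types: struct pwq_model,
struct wq_model and model_unbound_pwq_by_node() are hypothetical
stand-ins for the pool_workqueue, the workqueue and unbound_pwq_by_node();
NUMA_NO_NODE is -1 as in the kernel. A NUMA_NO_NODE lookup returns the
default pwq instead of indexing the per-node table.
<![CDATA[
```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)	/* same value as include/linux/numa.h */
#define MODEL_NR_NODES	2	/* hypothetical node count */

struct pwq_model {
	int node;
};

struct wq_model {
	struct pwq_model *dfl_pwq;			/* default pwq */
	struct pwq_model *numa_pwq_tbl[MODEL_NR_NODES];	/* per-node pwqs */
};

static struct pwq_model *model_unbound_pwq_by_node(struct wq_model *wq,
						   int node)
{
	/*
	 * An offlined CPU maps to NUMA_NO_NODE; indexing the table with
	 * -1 is out of bounds, so fall back to the default pwq instead.
	 */
	if (node == NUMA_NO_NODE)
		return wq->dfl_pwq;

	assert(node >= 0 && node < MODEL_NR_NODES);
	return wq->numa_pwq_tbl[node];
}
```
]]>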

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: wq_pool_mutex protects the attrs-installation</title>
<updated>2016-03-04T15:25:41Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2015-05-12T12:32:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3c4dd8843bef1885fabaa9f61d5d354ec2a5e3a'/>
<id>urn:sha1:d3c4dd8843bef1885fabaa9f61d5d354ec2a5e3a</id>
<content type='text'>
[ Upstream commit 5b95e1af8d17d85a17728f6de7dbff538e6e3c49 ]

Currently wq_pool_mutex doesn't protect the attrs-installation, so
-&gt;unbound_attrs, -&gt;numa_pwq_tbl[] and -&gt;dfl_pwq can only be accessed
under wq-&gt;mutex, which causes some inconveniences. For example,
wq_update_unbound_numa() has to acquire wq-&gt;mutex before fetching
wq-&gt;unbound_attrs-&gt;no_numa and the old_pwq.

attrs-installation is a short operation, so this change will not cause
any latency for other operations which also acquire the wq_pool_mutex.

The only unprotected attrs-installation code is in apply_workqueue_attrs(),
so this patch changes less code than comments.

It is also a preparation patch for the next several patches which read
wq-&gt;unbound_attrs, wq-&gt;numa_pwq_tbl[] and wq-&gt;dfl_pwq with
only wq_pool_mutex held.
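
The locking rule can be sketched with pthread mutexes. This is a model
under assumptions, not workqueue.c: model_pool_mutex plays wq_pool_mutex,
the per-workqueue mutex plays wq-&gt;mutex, the model_* functions are
hypothetical, and the lock order (wq_pool_mutex outside wq-&gt;mutex) is
assumed. Because the installation holds both locks, a reader holding
either one sees a consistent value.
<![CDATA[
```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stands in for the global wq_pool_mutex. */
static pthread_mutex_t model_pool_mutex = PTHREAD_MUTEX_INITIALIZER;

struct wq_lock_model {
	pthread_mutex_t mutex;	/* stands in for wq->mutex */
	int no_numa;		/* stands in for wq->unbound_attrs->no_numa */
};

static void model_wq_init(struct wq_lock_model *wq)
{
	int rc = pthread_mutex_init(&wq->mutex, NULL);

	assert(rc == 0);
	(void)rc;
	wq->no_numa = 0;
}

/* Writer: installs with BOTH locks held, outer lock first. */
static void model_install_attrs(struct wq_lock_model *wq, int no_numa)
{
	pthread_mutex_lock(&model_pool_mutex);
	pthread_mutex_lock(&wq->mutex);
	wq->no_numa = no_numa;
	pthread_mutex_unlock(&wq->mutex);
	pthread_mutex_unlock(&model_pool_mutex);
}

/* Reader holding only the pool mutex (what the follow-up patches want). */
static int model_read_under_pool_mutex(struct wq_lock_model *wq)
{
	int v;

	pthread_mutex_lock(&model_pool_mutex);
	v = wq->no_numa;
	pthread_mutex_unlock(&model_pool_mutex);
	return v;
}

/* Reader holding only wq->mutex (the only safe option before). */
static int model_read_under_wq_mutex(struct wq_lock_model *wq)
{
	int v;

	pthread_mutex_lock(&wq->mutex);
	v = wq->no_numa;
	pthread_mutex_unlock(&wq->mutex);
	return v;
}
```
]]>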

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: split apply_workqueue_attrs() into 3 stages</title>
<updated>2016-03-04T15:25:40Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2015-04-27T09:58:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9e1a3771b412f52694e22a93cf188dbe9f0eab24'/>
<id>urn:sha1:9e1a3771b412f52694e22a93cf188dbe9f0eab24</id>
<content type='text'>
[ Upstream commit 2d5f0764b5264d2954ba6e3deb04f4f5de8e4476 ]

Currently apply_workqueue_attrs() performs both pwqs-allocation and
pwqs-installation, so when we batch multiple apply_workqueue_attrs() calls
as a transaction, we can't ensure that the transaction succeeds or fails
as a complete unit.

To solve this, we split apply_workqueue_attrs() into three stages.
The first stage does the preparation: allocating memory and pwqs.
The second stage does the attrs-installation and pwqs-installation.
The third stage frees the allocated memory and the (old or unused) pwqs.

As a result, batching multiple apply_workqueue_attrs() calls can
succeed or fail as a complete unit:
	1) do the first stage for all the workqueues in a batch
	2) commit all of them only when all of the above succeed.
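
The three stages and the batching rule can be sketched as a small
transaction. All names here (model_prepare(), model_commit(),
model_cleanup(), model_apply_batch()) are hypothetical, and a single
heap-allocated int stands in for the attrs and pwqs: stage 1 may fail
without touching any workqueue, stage 2 cannot fail, and stage 3 frees
whatever was not installed or was replaced.
<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_BATCH 8

struct ctx_model {
	int *new_attr;	/* stage 1: allocated, not yet visible */
	int *old_attr;	/* stage 2: replaced value, freed in stage 3 */
};

struct wq_slot_model {
	int *attr;	/* currently installed attrs/pwqs stand-in */
};

/* Stage 1: allocate only; may fail and never touches the workqueue. */
static bool model_prepare(struct ctx_model *ctx, int value, bool fail)
{
	ctx->old_attr = NULL;
	ctx->new_attr = fail ? NULL : malloc(sizeof(*ctx->new_attr));
	if (!ctx->new_attr)
		return false;
	*ctx->new_attr = value;
	return true;
}

/* Stage 2: install; cannot fail. */
static void model_commit(struct ctx_model *ctx, struct wq_slot_model *wq)
{
	ctx->old_attr = wq->attr;
	wq->attr = ctx->new_attr;
	ctx->new_attr = NULL;
}

/* Stage 3: free unused new state or replaced old state. */
static void model_cleanup(struct ctx_model *ctx)
{
	free(ctx->new_attr);
	free(ctx->old_attr);
	ctx->new_attr = NULL;
	ctx->old_attr = NULL;
}

/* Prepare every workqueue; commit all only if every prepare succeeded. */
static bool model_apply_batch(struct wq_slot_model *wqs, int n, int value,
			      int fail_at)
{
	struct ctx_model ctx[MODEL_MAX_BATCH];
	bool ok = true;
	int i;

	assert(n >= 0 && n <= MODEL_MAX_BATCH);
	for (i = 0; i < n; i++)
		if (!model_prepare(&ctx[i], value, i == fail_at))
			ok = false;	/* keep going so every ctx is initialized */
	if (ok)
		for (i = 0; i < n; i++)
			model_commit(&ctx[i], &wqs[i]);
	for (i = 0; i < n; i++)
		model_cleanup(&ctx[i]);
	return ok;
}
```
]]>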

This patch is a preparation for the next patch ("Allow modifying low level
unbound workqueue cpumask") which will do a multiple apply_workqueue_attrs().

The patch doesn't change functionality except for two minor adjustments:
	1) free_unbound_pwq() for the error path is removed; we use the
	   heavier version put_pwq_unlocked() instead since the error path
	   is rare. This adjustment simplifies the code.
	2) the memory allocation is also moved under wq_pool_mutex.
	   This is needed to avoid further splitting.

tj: minor updates to comments.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: make sure delayed work run in local cpu"</title>
<updated>2016-03-04T15:25:40Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-09T21:11:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=68fce03ba7901aa338a566292a59e6a753948861'/>
<id>urn:sha1:68fce03ba7901aa338a566292a59e6a753948861</id>
<content type='text'>
[ Upstream commit 041bd12e272c53a35c54c13875839bcb98c999ce ]

This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76.

Workqueue used to implicitly guarantee that work items queued without
explicit CPU specified are put on the local CPU.  Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").

vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger.  As a preventive measure, 874bbfe600a6 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarantee.  Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd1018 ("timers: Use proper base migration in
add_timer_on()").  Due to code restructuring, the commit couldn't be
backported beyond a certain point and stable kernels which only had
874bbfe600a6 started crashing.

The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway.  Now that the vmstat case is fixed,
874bbfe600a6 is causing more problems than it's fixing, so it has been
decided to take the chance and officially break the guarantee by
reverting the commit.  A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
Cc: stable@vger.kernel.org
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Henrique de Moraes Holschuh &lt;hmh@hmh.eng.br&gt;
Cc: Daniel Bilik &lt;daniel.bilik@neosystem.cz&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: make sure delayed work run in local cpu</title>
<updated>2015-10-27T00:51:56Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-09-30T16:05:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=65825ff6388e1905ea128cb5341822840ddda62e'/>
<id>urn:sha1:65825ff6388e1905ea128cb5341822840ddda62e</id>
<content type='text'>
commit 874bbfe600a660cba9c776b3957b1ce393151b76 upstream.

My system keeps crashing with the message below. vmstat_update() schedules
delayed work on the current CPU and expects the work to run on that CPU.
schedule_delayed_work() is expected to make delayed work run on the local
CPU. The problem is that the timer can be migrated with NO_HZ. __queue_work()
queues the work from the timer handler, which could run on a different CPU
than the one where the delayed work was scheduled. The end result is that
the delayed work runs on a different CPU. This patch makes
__queue_delayed_work() record the local CPU earlier, so where the timer
runs no longer changes where the work runs.

[   28.010131] ------------[ cut here ]------------
[   28.010609] kernel BUG at ../mm/vmstat.c:1392!
[   28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   28.011860] Modules linked in:
[   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W       4.3.0-rc3+ #634
[   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[   28.014160] Workqueue: events vmstat_update
[   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[   28.015445] RIP: 0010:[&lt;ffffffff8115f921&gt;]  [&lt;ffffffff8115f921&gt;] vmstat_update+0x31/0x80
[   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
[   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX: 0000000000000000
[   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI: ffffffff81f4df8d
[   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09: 0000000000000000
[   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12: ffff8800baa9f640
[   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15: 0000000000000000
[   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4: 00000000000006f0
[   28.020071] Stack:
[   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00 ffffffff8106bd88
[   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000 ffffffff82f9b1e8
[   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460 ffff88011a81e340
[   28.020071] Call Trace:
[   28.020071]  [&lt;ffffffff8106bd88&gt;] process_one_work+0x1c8/0x540
[   28.020071]  [&lt;ffffffff8106bd0b&gt;] ? process_one_work+0x14b/0x540
[   28.020071]  [&lt;ffffffff8106c214&gt;] worker_thread+0x114/0x460
[   28.020071]  [&lt;ffffffff8106c100&gt;] ? process_one_work+0x540/0x540
[   28.020071]  [&lt;ffffffff81071bf8&gt;] kthread+0xf8/0x110
[   28.020071]  [&lt;ffffffff81071b00&gt;] ? kthread_create_on_node+0x200/0x200
[   28.020071]  [&lt;ffffffff81a6522f&gt;] ret_from_fork+0x3f/0x70
[   28.020071]  [&lt;ffffffff81071b00&gt;] ? kthread_create_on_node+0x200/0x200
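
The difference described above can be modelled in userspace, under
stated assumptions: MODEL_CPU_UNBOUND is a hypothetical sentinel (the
kernel uses WORK_CPU_UNBOUND), and the model_* functions are not kernel
APIs. Before the patch an unresolved CPU is resolved wherever the timer
fires; after it, the local CPU is recorded at queue time. Note that this
behaviour was later reverted by upstream commit 041bd12e272c.
<![CDATA[
```c
#include <assert.h>

/* Hypothetical sentinel for "no CPU requested"; not the kernel value. */
#define MODEL_CPU_UNBOUND (-1)

struct dwork_model {
	int cpu;	/* CPU recorded when the delayed work was queued */
};

/* Old behaviour: keep the request as-is and decide at timer expiry. */
static void model_queue_unpatched(struct dwork_model *dwork, int req_cpu,
				  int this_cpu)
{
	(void)this_cpu;
	dwork->cpu = req_cpu;
}

/* Patched behaviour: resolve an unbound request to the local CPU now. */
static void model_queue_patched(struct dwork_model *dwork, int req_cpu,
				int this_cpu)
{
	assert(this_cpu >= 0);
	dwork->cpu = (req_cpu == MODEL_CPU_UNBOUND) ? this_cpu : req_cpu;
}

/*
 * Timer expiry: with NO_HZ the timer may fire on timer_cpu, which can
 * differ from the CPU that queued the work. An unresolved request is
 * resolved here, i.e. on whatever CPU the timer happened to run.
 */
static int model_target_cpu(const struct dwork_model *dwork, int timer_cpu)
{
	return (dwork->cpu == MODEL_CPU_UNBOUND) ? timer_cpu : dwork->cpu;
}
```
]]>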

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: Reorder sysfs code</title>
<updated>2015-04-06T15:16:04Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2015-04-02T11:14:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6ba94429c8e7b87b0fff13c5ac90731b239b77fa'/>
<id>urn:sha1:6ba94429c8e7b87b0fff13c5ac90731b239b77fa</id>
<content type='text'>
The sysfs code usually belongs at the bottom of the file since it deals
with high-level objects. In the workqueue code it's misplaced, such
that we'd need to work around function references to allow the sysfs
code to call APIs like apply_workqueue_attrs().

Let's move that block further down in the file, almost to the bottom.

And declare workqueue_sysfs_unregister() just before destroy_workqueue(),
which references it.

tj: Moved workqueue_sysfs_unregister() forward declaration where other
    forward declarations are.

Suggested-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Kevin Hilman &lt;khilman@linaro.org&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Mike Galbraith &lt;bitbucket@online.de&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: dump workqueues on sysrq-t</title>
<updated>2015-03-09T13:22:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-03-09T13:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3494fc30846dceb808de4cc02930ef347fabd21a'/>
<id>urn:sha1:3494fc30846dceb808de4cc02930ef347fabd21a</id>
<content type='text'>
Workqueues are used extensively throughout the kernel but sometimes
it's difficult to debug stalls involving work items because visibility
into their inner workings is fairly limited.  Although the sysrq-t task dump
annotates each active worker task with the information on the work
item being executed, it is challenging to find out which work items
are pending or delayed on which queues and how pools are being
managed.

This patch implements show_workqueue_state() which dumps all busy
workqueues and pools and is called from the sysrq-t handler.  At the
end of sysrq-t dump, something like the following is printed.

 Showing busy workqueues and worker pools:
 ...
 workqueue filler_wq: flags=0x0
   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
     in-flight: 491:filler_workfn, 507:filler_workfn
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
     in-flight: 501:filler_workfn
     pending: filler_workfn
 ...
 workqueue test_wq: flags=0x8
   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
     in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
     delayed: test_workfn1 BAR(492), test_workfn2
 ...
 pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
 pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
 pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
 pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62

The above shows that test_wq is executing test_workfn() on pid 510,
which is the rescuer, and also that there are two tasks, 69 and 500,
waiting for the work item to finish in flush_work().  As test_wq has
max_active of 1, there are two work items for test_workfn1() and
test_workfn2() which are delayed till the current work item is
finished.  In addition, pid 492 is flushing test_workfn1().

The work item for test_workfn() is being executed on pwq of pool 2
which is the normal priority per-cpu pool for CPU 1.  The pool has
three workers, two of which are executing filler_workfn() for
filler_wq and the last one is assuming the manager role trying to
create more workers.

This extra workqueue state dump will hopefully help chase down hangs
involving workqueues.

v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.

v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
    printk()'s replaced with pr_info()'s, and cpumask printing now
    uses cpulist_pr_cont().

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
CC: Ingo Molnar &lt;mingo@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: keep track of the flushing task and pool manager</title>
<updated>2015-03-09T13:22:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-03-09T13:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2607d7a6dba1e790aaacb14600ceffa3aa2f43e7'/>
<id>urn:sha1:2607d7a6dba1e790aaacb14600ceffa3aa2f43e7</id>
<content type='text'>
Add wq_barrier-&gt;task and worker_pool-&gt;manager to keep track of the
flushing task and pool manager respectively.  These are purely
informational and will be used to implement sysrq dump of workqueues.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
