<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v4.4.193</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.193</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.193'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-05-30T05:49:04Z</updated>
<entry>
<title>workqueue: use put_device() instead of kfree()</title>
<updated>2018-05-30T05:49:04Z</updated>
<author>
<name>Arvind Yadav</name>
<email>arvind.yadav.cs@gmail.com</email>
</author>
<published>2018-03-06T10:05:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b29cfb8ae0a309b99251bb1286012ac5d77ce2e5'/>
<id>urn:sha1:b29cfb8ae0a309b99251bb1286012ac5d77ce2e5</id>
<content type='text'>
[ Upstream commit 537f4146c53c95aac977852b371bafb9c6755ee1 ]

Never directly free @dev after calling device_register(), even
if it returned an error! Always use put_device() to give up the
reference initialized in this function instead.

Signed-off-by: Arvind Yadav &lt;arvind.yadav.cs@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Allow retrieval of current task's work struct</title>
<updated>2018-03-18T10:17:48Z</updated>
<author>
<name>Lukas Wunner</name>
<email>lukas@wunner.de</email>
</author>
<published>2018-02-11T09:38:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e235f151a39b3af6d357c21f290087df7639580b'/>
<id>urn:sha1:e235f151a39b3af6d357c21f290087df7639580b</id>
<content type='text'>
commit 27d4ee03078aba88c5e07dcc4917e8d01d046f38 upstream.

Introduce a helper to retrieve the current task's work struct if it is
a workqueue worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the -&gt;runtime_suspend callback waits for a specific worker to
finish and that worker in turn calls a function which waits for runtime
suspend to finish.  That function is invoked from multiple call sites
and waiting for runtime suspend to finish is the correct thing to do
except if it's executing in the context of the worker.

Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Dave Airlie &lt;airlied@redhat.com&gt;
Cc: Ben Skeggs &lt;bskeggs@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Lyude Paul &lt;lyude@redhat.com&gt;
Signed-off-by: Lukas Wunner &lt;lukas@wunner.de&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq</title>
<updated>2017-12-16T09:33:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-03-06T20:33:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9d47a6d6862df5536837b43407d3b527cad5090'/>
<id>urn:sha1:d9d47a6d6862df5536837b43407d3b527cad5090</id>
<content type='text'>
[ Upstream commit 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d ]

If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronuosly on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones &lt;davej@codemonkey.org.uk&gt;
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: replace pool-&gt;manager_arb mutex with a flag</title>
<updated>2017-11-02T08:40:48Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-10-09T15:04:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fce67b31c7cd5a6599fe9cf1b2c398b8f2b874cb'/>
<id>urn:sha1:fce67b31c7cd5a6599fe9cf1b2c398b8f2b874cb</id>
<content type='text'>
commit 692b48258dda7c302e777d7d5f4217244478f1f6 upstream.

Josef reported a HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected by
lockdep:

 [ 1270.472259] WARNING: HARDIRQ-safe -&gt; HARDIRQ-unsafe lock order detected
 [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
 [ 1270.473240] -----------------------------------------------------
 [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [ 1270.474239]  (&amp;(&amp;lock-&gt;wait_lock)-&gt;rlock){+.+.}, at: [&lt;ffffffff8da253d2&gt;] __mutex_unlock_slowpath+0xa2/0x280
 [ 1270.474994]
 [ 1270.474994] and this task is already holding:
 [ 1270.475440]  (&amp;pool-&gt;lock/1){-.-.}, at: [&lt;ffffffff8d2992f6&gt;] worker_thread+0x366/0x3c0
 [ 1270.476046] which would create a new lock dependency:
 [ 1270.476436]  (&amp;pool-&gt;lock/1){-.-.} -&gt; (&amp;(&amp;lock-&gt;wait_lock)-&gt;rlock){+.+.}
 [ 1270.476949]
 [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
 [ 1270.477553]  (&amp;pool-&gt;lock/1){-.-.}
 ...
 [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
 [ 1270.489327]  (&amp;(&amp;lock-&gt;wait_lock)-&gt;rlock){+.+.}
 ...
 [ 1270.494735]  Possible interrupt unsafe locking scenario:
 [ 1270.494735]
 [ 1270.495250]        CPU0                    CPU1
 [ 1270.495600]        ----                    ----
 [ 1270.495947]   lock(&amp;(&amp;lock-&gt;wait_lock)-&gt;rlock);
 [ 1270.496295]                                local_irq_disable();
 [ 1270.496753]                                lock(&amp;pool-&gt;lock/1);
 [ 1270.497205]                                lock(&amp;(&amp;lock-&gt;wait_lock)-&gt;rlock);
 [ 1270.497744]   &lt;Interrupt&gt;
 [ 1270.497948]     lock(&amp;pool-&gt;lock/1);

, which will cause a irq inversion deadlock if the above lock scenario
happens.

The root cause of this safe -&gt; unsafe lock order is the
mutex_unlock(pool-&gt;manager_arb) in manage_workers() with pool-&gt;lock
held.

Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f73 ("locking/mutex: Fix
lockdep_assert_held() fail").

Using mutex for pool-&gt;manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag.  The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.

This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.

v2: Drop unnecessary flag clearing before pool destruction as
    suggested by Boqun.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: implicit ordered attribute should be overridable</title>
<updated>2017-08-11T16:09:00Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-07-23T12:36:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=34a08ae493f1970d5ce80dd3812b8dba4e5cbe22'/>
<id>urn:sha1:34a08ae493f1970d5ce80dd3812b8dba4e5cbe22</id>
<content type='text'>
commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream.

5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1.  Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.

This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
Cc: Holger Hoffstätte &lt;holger@applied-asynchrony.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: restore WQ_UNBOUND/max_active==1 to be ordered</title>
<updated>2017-08-11T16:08:46Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2017-07-18T22:41:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c59eec4dad4a95f6da1b8ea688e361416869e42d'/>
<id>urn:sha1:c59eec4dad4a95f6da1b8ea688e361416869e42d</id>
<content type='text'>
commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream.

The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution.  After NUMA affinity 4c16bd327c74 ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues.  Let's
automatically set __WQ_ORDERED for those workqueues.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Reported-by: Alexei Potashnik &lt;alexei@purestorage.com&gt;
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: fix rebind bound workers warning</title>
<updated>2016-05-19T00:06:50Z</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2016-05-11T09:55:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf73d8ad76e4555a45ee399887b7c0361354d10f'/>
<id>urn:sha1:cf73d8ad76e4555a45ee399887b7c0361354d10f</id>
<content type='text'>
commit f7c17d26f43d5cc1b7a6b896cd2fa24a079739b9 upstream.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 16 at kernel/workqueue.c:4559 rebind_workers+0x1c0/0x1d0
Modules linked in:
CPU: 0 PID: 16 Comm: cpuhp/0 Not tainted 4.6.0-rc4+ #31
Hardware name: IBM IBM System x3550 M4 Server -[7914IUW]-/00Y8603, BIOS -[D7E128FUS-1.40]- 07/23/2013
 0000000000000000 ffff881037babb58 ffffffff8139d885 0000000000000010
 0000000000000000 0000000000000000 0000000000000000 ffff881037babba8
 ffffffff8108505d ffff881037ba0000 000011cf3e7d6e60 0000000000000046
Call Trace:
 dump_stack+0x89/0xd4
 __warn+0xfd/0x120
 warn_slowpath_null+0x1d/0x20
 rebind_workers+0x1c0/0x1d0
 workqueue_cpu_up_callback+0xf5/0x1d0
 notifier_call_chain+0x64/0x90
 ? trace_hardirqs_on_caller+0xf2/0x220
 ? notify_prepare+0x80/0x80
 __raw_notifier_call_chain+0xe/0x10
 __cpu_notify+0x35/0x50
 notify_down_prepare+0x5e/0x80
 ? notify_prepare+0x80/0x80
 cpuhp_invoke_callback+0x73/0x330
 ? __schedule+0x33e/0x8a0
 cpuhp_down_callbacks+0x51/0xc0
 cpuhp_thread_fun+0xc1/0xf0
 smpboot_thread_fn+0x159/0x2a0
 ? smpboot_create_threads+0x80/0x80
 kthread+0xef/0x110
 ? wait_for_completion+0xf0/0x120
 ? schedule_tail+0x35/0xf0
 ret_from_fork+0x22/0x50
 ? __init_kthread_worker+0x70/0x70
---[ end trace eb12ae47d2382d8f ]---
notify_down_prepare: attempt to take down CPU 0 failed

This bug can be reproduced by below config w/ nohz_full= all cpus:

CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
CONFIG_DEBUG_HOTPLUG_CPU0=y
CONFIG_NO_HZ_FULL=y

As Thomas pointed out:

| If a down prepare callback fails, then DOWN_FAILED is invoked for all
| callbacks which have successfully executed DOWN_PREPARE.
|
| But, workqueue has actually two notifiers. One which handles
| UP/DOWN_FAILED/ONLINE and one which handles DOWN_PREPARE.
|
| Now look at the priorities of those callbacks:
|
| CPU_PRI_WORKQUEUE_UP        = 5
| CPU_PRI_WORKQUEUE_DOWN      = -5
|
| So the call order on DOWN_PREPARE is:
|
| CB 1
| CB ...
| CB workqueue_up() -&gt; Ignores DOWN_PREPARE
| CB ...
| CB X ---&gt; Fails
|
| So we call up to CB X with DOWN_FAILED
|
| CB 1
| CB ...
| CB workqueue_up() -&gt; Handles DOWN_FAILED
| CB ...
| CB X-1
|
| So the problem is that the workqueue stuff handles DOWN_FAILED in the up
| callback, while it should do it in the down callback. Which is not a good idea
| either because it wants to be called early on rollback...
|
| Brilliant stuff, isn't it? The hotplug rework will solve this problem because
| the callbacks become symetric, but for the existing mess, we need some
| workaround in the workqueue code.

The boot CPU handles housekeeping duty(unbound timers, workqueues,
timekeeping, ...) on behalf of full dynticks CPUs. It must remain
online when nohz full is enabled. There is a priority set to every
notifier_blocks:

workqueue_cpu_up &gt; tick_nohz_cpu_down &gt; workqueue_cpu_down

So tick_nohz_cpu_down callback failed when down prepare cpu 0, and
notifier_blocks behind tick_nohz_cpu_down will not be called any
more, which leads to workers are actually not unbound. Then hotplug
state machine will fallback to undo and online cpu 0 again. Workers
will be rebound unconditionally even if they are not unbound and
trigger the warning in this progress.

This patch fix it by catching !DISASSOCIATED to avoid rebind bound
workers.

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Frédéric Weisbecker &lt;fweisbec@gmail.com&gt;
Suggested-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: fix ghost PENDING flag while doing MQ IO</title>
<updated>2016-05-04T21:48:49Z</updated>
<author>
<name>Roman Pen</name>
<email>roman.penyaev@profitbricks.com</email>
</author>
<published>2016-04-26T11:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2da9606aea5a8fd1b710f8c8dd5295da4825e9cd'/>
<id>urn:sha1:2da9606aea5a8fd1b710f8c8dd5295da4825e9cd</id>
<content type='text'>
commit 346c09f80459a3ad97df1816d6d606169a51001a upstream.

The bug in a workqueue leads to a stalled IO request in MQ ctx-&gt;rq_list
with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.351444]  [&lt;ffffffff815b01d5&gt;] schedule+0x35/0x80
[  601.351709]  [&lt;ffffffff815b2dd2&gt;] schedule_timeout+0x192/0x230
[  601.351958]  [&lt;ffffffff812d43f7&gt;] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [&lt;ffffffff810bd737&gt;] ? ktime_get+0x37/0xa0
[  601.352446]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.352688]  [&lt;ffffffff815af784&gt;] io_schedule_timeout+0xa4/0x110
[  601.352951]  [&lt;ffffffff815b3a4e&gt;] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [&lt;ffffffff815b093b&gt;] bit_wait_io+0x1b/0x70
[  601.353440]  [&lt;ffffffff815b056d&gt;] __wait_on_bit+0x5d/0x90
[  601.353689]  [&lt;ffffffff81127bd0&gt;] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [&lt;ffffffff81096db0&gt;] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [&lt;ffffffff81127cc4&gt;] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [&lt;ffffffff81127d34&gt;] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [&lt;ffffffff81129a9f&gt;] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [&lt;ffffffff811ced3b&gt;] blkdev_fsync+0x1b/0x50
[  601.355193]  [&lt;ffffffff811c82d9&gt;] vfs_fsync_range+0x49/0xa0
[  601.355432]  [&lt;ffffffff811cf45a&gt;] blkdev_write_iter+0xca/0x100
[  601.355679]  [&lt;ffffffff81197b1a&gt;] __vfs_write+0xaa/0xe0
[  601.355925]  [&lt;ffffffff81198379&gt;] vfs_write+0xa9/0x1a0
[  601.356164]  [&lt;ffffffff811c59d8&gt;] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debug it became clear that stalled request is always inserted in
the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  reqeust A inserted
  STORE hctx-&gt;ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  &lt;schedule to kblockd workqueue&gt;
                                         request B inserted
                                         STORE hctx-&gt;ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result request B pended forever.

This behaviour can be explained by speculative LOAD of hctx-&gt;ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier &lt;mfence&gt;, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen &lt;roman.penyaev@profitbricks.com&gt;
Cc: Gioh Kim &lt;gi-oh.kim@profitbricks.com&gt;
Cc: Michael Wang &lt;yun.wang@profitbricks.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "workqueue: make sure delayed work run in local cpu"</title>
<updated>2016-03-03T23:07:27Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-09T21:11:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6684710434d07dfd7e512e5ecc77eefb5a30151e'/>
<id>urn:sha1:6684710434d07dfd7e512e5ecc77eefb5a30151e</id>
<content type='text'>
commit 041bd12e272c53a35c54c13875839bcb98c999ce upstream.

This reverts commit 874bbfe600a660cba9c776b3957b1ce393151b76.

Workqueue used to implicity guarantee that work items queued without
explicit CPU specified are put on the local CPU.  Recent changes in
timer broke the guarantee and led to vmstat breakage which was fixed
by 176bed1de5bf ("vmstat: explicitly schedule per-cpu work on the CPU
we need it to run on").

vmstat is the most likely to expose the issue and it's quite possible
that there are other similar problems which are a lot more difficult
to trigger.  As a preventive measure, 874bbfe600a6 ("workqueue: make
sure delayed work run in local cpu") was applied to restore the local
CPU guarnatee.  Unfortunately, the change exposed a bug in timer code
which got fixed by 22b886dd1018 ("timers: Use proper base migration in
add_timer_on()").  Due to code restructuring, the commit couldn't be
backported beyond certain point and stable kernels which only had
874bbfe600a6 started crashing.

The local CPU guarantee was accidental more than anything else and we
want to get rid of it anyway.  As, with the vmstat case fixed,
874bbfe600a6 is causing more problems than it's fixing, it has been
decided to take the chance and officially break the guarantee by
reverting the commit.  A debug feature will be added to force foreign
CPU assignment to expose cases relying on the guarantee and fixes for
the individual cases will be backported to stable as necessary.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 874bbfe600a6 ("workqueue: make sure delayed work run in local cpu")
Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
Cc: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Henrique de Moraes Holschuh &lt;hmh@hmh.eng.br&gt;
Cc: Daniel Bilik &lt;daniel.bilik@neosystem.cz&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Shaohua Li &lt;shli@fb.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Daniel Bilik &lt;daniel.bilik@neosystem.cz&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup</title>
<updated>2016-03-03T23:07:27Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-03T18:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=21b34b4574f8619907eb751b37c1831bfa3f2440'/>
<id>urn:sha1:21b34b4574f8619907eb751b37c1831bfa3f2440</id>
<content type='text'>
commit d6e022f1d207a161cd88e08ef0371554680ffc46 upstream.

When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node.  However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.

This has always been broken but hasn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue.  This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU.  The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.

While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen.  Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround.  The long term solution is keeping CPU
-&gt; NODE mapping stable across CPU off/online cycles which is being
worked on.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
