<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v3.18.32</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.32</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.18.32'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-01-25T12:00:06Z</updated>
<entry>
<title>Revert "workqueue: make sure delayed work run in local cpu"</title>
<updated>2016-01-25T12:00:06Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2016-01-25T12:00:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1eacda6c0dc441384129b5260a485cbb8b4e218a'/>
<id>urn:sha1:1eacda6c0dc441384129b5260a485cbb8b4e218a</id>
<content type='text'>
This reverts commit 1e7af294dd037af63e74fe13e2b6afb93105ed3f.

This commit is only needed on 4.1+

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: make sure delayed work run in local cpu</title>
<updated>2015-11-13T18:19:09Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-09-30T16:05:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1e7af294dd037af63e74fe13e2b6afb93105ed3f'/>
<id>urn:sha1:1e7af294dd037af63e74fe13e2b6afb93105ed3f</id>
<content type='text'>
[ Upstream commit 874bbfe600a660cba9c776b3957b1ce393151b76 ]

My system keeps crashing with below message. vmstat_update() schedules a delayed
work in current cpu and expects the work runs in the cpu.
schedule_delayed_work() is expected to make delayed work run in local cpu. The
problem is timer can be migrated with NO_HZ. __queue_work() queues work in
timer handler, which could run in a different cpu other than where the delayed
work is scheduled. The end result is the delayed work runs in different cpu.
The patch makes __queue_delayed_work records local cpu earlier. Where the timer
runs doesn't change where the work runs with the change.

[   28.010131] ------------[ cut here ]------------
[   28.010609] kernel BUG at ../mm/vmstat.c:1392!
[   28.011099] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[   28.011860] Modules linked in:
[   28.012245] CPU: 0 PID: 289 Comm: kworker/0:3 Tainted: G        W4.3.0-rc3+ #634
[   28.013065] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
[   28.014160] Workqueue: events vmstat_update
[   28.014571] task: ffff880117682580 ti: ffff8800ba428000 task.ti: ffff8800ba428000
[   28.015445] RIP: 0010:[&lt;ffffffff8115f921&gt;]  [&lt;ffffffff8115f921&gt;]vmstat_update+0x31/0x80
[   28.016282] RSP: 0018:ffff8800ba42fd80  EFLAGS: 00010297
[   28.016812] RAX: 0000000000000000 RBX: ffff88011a858dc0 RCX:0000000000000000
[   28.017585] RDX: ffff880117682580 RSI: ffffffff81f14d8c RDI:ffffffff81f4df8d
[   28.018366] RBP: ffff8800ba42fd90 R08: 0000000000000001 R09:0000000000000000
[   28.019169] R10: 0000000000000000 R11: 0000000000000121 R12:ffff8800baa9f640
[   28.019947] R13: ffff88011a81e340 R14: ffff88011a823700 R15:0000000000000000
[   28.020071] FS:  0000000000000000(0000) GS:ffff88011a800000(0000)knlGS:0000000000000000
[   28.020071] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   28.020071] CR2: 00007ff6144b01d0 CR3: 00000000b8e93000 CR4:00000000000006f0
[   28.020071] Stack:
[   28.020071]  ffff88011a858dc0 ffff8800baa9f640 ffff8800ba42fe00ffffffff8106bd88
[   28.020071]  ffffffff8106bd0b 0000000000000096 0000000000000000ffffffff82f9b1e8
[   28.020071]  ffffffff829f0b10 0000000000000000 ffffffff81f18460ffff88011a81e340
[   28.020071] Call Trace:
[   28.020071]  [&lt;ffffffff8106bd88&gt;] process_one_work+0x1c8/0x540
[   28.020071]  [&lt;ffffffff8106bd0b&gt;] ? process_one_work+0x14b/0x540
[   28.020071]  [&lt;ffffffff8106c214&gt;] worker_thread+0x114/0x460
[   28.020071]  [&lt;ffffffff8106c100&gt;] ? process_one_work+0x540/0x540
[   28.020071]  [&lt;ffffffff81071bf8&gt;] kthread+0xf8/0x110
[   28.020071]  [&lt;ffffffff81071b00&gt;] ?kthread_create_on_node+0x200/0x200
[   28.020071]  [&lt;ffffffff81a6522f&gt;] ret_from_fork+0x3f/0x70
[   28.020071]  [&lt;ffffffff81071b00&gt;] ?kthread_create_on_node+0x200/0x200

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: stable@vger.kernel.org # v2.6.31+
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE</title>
<updated>2015-03-28T13:37:48Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-03-05T13:04:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d4bc18f7bbd9aa62b1794fe8f409bd0c1b88a597'/>
<id>urn:sha1:d4bc18f7bbd9aa62b1794fe8f409bd0c1b88a597</id>
<content type='text'>
[ Upstream commit 8603e1b30027f943cc9c1eef2b291d42c3347af1 ]

cancel[_delayed]_work_sync() are implemented using
__cancel_work_timer() which grabs the PENDING bit using
try_to_grab_pending() and then flushes the work item with PENDING set
to prevent the on-going execution of the work item from requeueing
itself.

try_to_grab_pending() can always grab PENDING bit without blocking
except when someone else is doing the above flushing during
cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
this case, __cancel_work_timer() currently invokes flush_work().  The
assumption is that the completion of the work item is what the other
canceling task would be waiting for too and thus waiting for the same
condition and retrying should allow forward progress without excessive
busy looping

Unfortunately, this doesn't work if preemption is disabled or the
latter task has real time priority.  Let's say task A just got woken
up from flush_work() by the completion of the target work item.  If,
before task A starts executing, task B gets scheduled and invokes
__cancel_work_timer() on the same work item, its try_to_grab_pending()
will return -ENOENT as the work item is still being canceled by task A
and flush_work() will also immediately return false as the work item
is no longer executing.  This puts task B in a busy loop possibly
preventing task A from executing and clearing the canceling state on
the work item leading to a hang.

task A			task B			worker

						executing work
__cancel_work_timer()
  try_to_grab_pending()
  set work CANCELING
  flush_work()
    block for work completion
						completion, wakes up A
			__cancel_work_timer()
			while (forever) {
			  try_to_grab_pending()
			    -ENOENT as work is being canceled
			  flush_work()
			    false as work is no longer executing
			}

This patch removes the possible hang by updating __cancel_work_timer()
to explicitly wait for clearing of CANCELING rather than invoking
flush_work() after try_to_grab_pending() fails with -ENOENT.

Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com

v3: bit_waitqueue() can't be used for work items defined in vmalloc
    area.  Switched to custom wake function which matches the target
    work item and exclusive wait and wakeup.

v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
    the target bit waitqueue has wait_bit_queue's on it.  Use
    DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
    Vizoso.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Cc: Tomeu Vizoso &lt;tomeu.vizoso@gmail.com&gt;
Cc: stable@vger.kernel.org
Tested-by: Jesper Nilsson &lt;jesper.nilsson@axis.com&gt;
Tested-by: Rabin Vincent &lt;rabin.vincent@axis.com&gt;
Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
</content>
</entry>
<entry>
<title>workqueue: fix subtle pool management issue which can stall whole worker_pool</title>
<updated>2015-01-30T01:40:43Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-01-16T19:21:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8fdd4f779530d6de523b24816bba81b1f8d5766e'/>
<id>urn:sha1:8fdd4f779530d6de523b24816bba81b1f8d5766e</id>
<content type='text'>
commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items &amp;&amp; !nr_running &amp;&amp; !nr_idle

The first and third conditions are protected by pool-&gt;lock and thus
won't change while holding pool-&gt;lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Eric Sandeen &lt;sandeen@sandeen.net&gt;
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: Use cond_resched_rcu_qs macro</title>
<updated>2014-10-06T12:58:26Z</updated>
<author>
<name>Joe Lawrence</name>
<email>joe.lawrence@stratus.com</email>
</author>
<published>2014-10-05T17:24:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e28e377204badfc3c4119ff2abda473127ee0ff'/>
<id>urn:sha1:3e28e377204badfc3c4119ff2abda473127ee0ff</id>
<content type='text'>
Tidy up and use cond_resched_rcu_qs when calling cond_resched and
reporting potential quiescent state to RCU.  Splitting this change in
this way allows easy backporting to -stable for kernel versions not
having cond_resched_rcu_qs().

Signed-off-by: Joe Lawrence &lt;joe.lawrence@stratus.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>workqueue: Add quiescent state between work items</title>
<updated>2014-10-06T12:57:43Z</updated>
<author>
<name>Joe Lawrence</name>
<email>joe.lawrence@stratus.com</email>
</author>
<published>2014-10-05T17:24:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=789cbbeca4eb7141cbd748ee93772471101b507b'/>
<id>urn:sha1:789cbbeca4eb7141cbd748ee93772471101b507b</id>
<content type='text'>
Similar to the stop_machine deadlock scenario on !PREEMPT kernels
addressed in b22ce2785d97 "workqueue: cond_resched() after processing
each work item", kworker threads requeueing back-to-back with zero jiffy
delay can stall RCU. The cond_resched call introduced in that fix will
yield only iff there are other higher priority tasks to run, so force a
quiescent RCU state between work items.

Signed-off-by: Joe Lawrence &lt;joe.lawrence@stratus.com&gt;
Link: https://lkml.kernel.org/r/20140926105227.01325697@jlaw-desktop.mno.stratus.com
Link: https://lkml.kernel.org/r/20140929115445.40221d8e@jlaw-desktop.mno.stratus.com
Fixes: b22ce2785d97 ("workqueue: cond_resched() after processing each work item")
Cc: &lt;stable@vger.kernel.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu</title>
<updated>2014-08-04T17:09:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-04T17:09:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f2a84170ede80e4b80f636e3700ef4d4d5dc7d33'/>
<id>urn:sha1:f2a84170ede80e4b80f636e3700ef4d4d5dc7d33</id>
<content type='text'>
Pull percpu updates from Tejun Heo:

 - Major reorganization of percpu header files which I think makes
   things a lot more readable and logical than before.

 - percpu-refcount is updated so that it requires explicit destruction
   and can be reinitialized if necessary.  This was pulled into the
   block tree to replace the custom percpu refcnting implemented in
   blk-mq.

 - In the process, percpu and percpu-refcount got cleaned up a bit

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits)
  percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
  percpu-refcount: require percpu_ref to be exited explicitly
  percpu-refcount: use unsigned long for pcpu_count pointer
  percpu-refcount: add helpers for -&gt;percpu_count accesses
  percpu-refcount: one bit is enough for REF_STATUS
  percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  percpu: Use ALIGN macro instead of hand coding alignment calculation
  percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations
  percpu: preffity percpu header files
  percpu: use raw_cpu_*() to define __this_cpu_*()
  percpu: reorder macros in percpu header files
  percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h
  percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h
  percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops
  percpu: reorganize include/linux/percpu-defs.h
  percpu: move accessors from include/linux/percpu.h to percpu-defs.h
  percpu: include/asm-generic/percpu.h should contain only arch-overridable parts
  percpu: introduce arch_raw_cpu_ptr()
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2014-08-04T17:04:44Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-08-04T17:04:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c4c3f5fba01e189fb3618f09545abdb4cf8ec8ee'/>
<id>urn:sha1:c4c3f5fba01e189fb3618f09545abdb4cf8ec8ee</id>
<content type='text'>
Pull workqueue updates from Tejun Heo:
 "Lai has been doing a lot of cleanups of workqueue and kthread_work.
  No significant behavior change.  Just a lot of cleanups all over the
  place.  Some are a bit invasive but overall nothing too dangerous"

* 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  kthread_work: remove the unused wait_queue_head
  kthread_work: wake up worker only when the worker is idle
  workqueue: use nr_node_ids instead of wq_numa_tbl_len
  workqueue: remove the misnamed out_unlock label in get_unbound_pool()
  workqueue: remove the stale comment in pwq_unbound_release_workfn()
  workqueue: move rescuer pool detachment to the end
  workqueue: unfold start_worker() into create_worker()
  workqueue: remove @wakeup from worker_set_flags()
  workqueue: remove an unneeded UNBOUND test before waking up the next worker
  workqueue: wake regular worker if need_more_worker() when rescuer leave the pool
  workqueue: alloc struct worker on its local node
  workqueue: reuse the already calculated pwq in try_to_grab_pending()
  workqueue: stronger test in process_one_work()
  workqueue: clear POOL_DISASSOCIATED in rebind_workers()
  workqueue: sanity check pool-&gt;cpu in wq_worker_sleeping()
  workqueue: clear leftover flags when detached
  workqueue: remove useless WARN_ON_ONCE()
  workqueue: use schedule_timeout_interruptible() instead of open code
  workqueue: remove the empty check in too_many_workers()
  workqueue: use "pool-&gt;cpu &lt; 0" to stand for an unbound pool
</content>
</entry>
<entry>
<title>workqueue: use nr_node_ids instead of wq_numa_tbl_len</title>
<updated>2014-07-22T16:10:39Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-07-22T05:05:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ddcb57e2ed0a4d0de5aef06735dd9df98894f818'/>
<id>urn:sha1:ddcb57e2ed0a4d0de5aef06735dd9df98894f818</id>
<content type='text'>
They are the same and nr_node_ids is provided by the memory subsystem.

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: remove the misnamed out_unlock label in get_unbound_pool()</title>
<updated>2014-07-22T16:10:39Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-07-22T05:04:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3fb1823c093ebe1869d34005837f64df64713780'/>
<id>urn:sha1:3fb1823c093ebe1869d34005837f64df64713780</id>
<content type='text'>
After the locking was moved up to the caller of the get_unbound_pool(),
out_unlock label doesn't need to do any unlock operation and the name
became bad, so we just remove this label, and the only usage-site
"goto out_unlock" is subsituted to "return pool".

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
