<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v3.10.32</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.32</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.32'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-12-04T18:57:16Z</updated>
<entry>
<title>workqueue: fix ordered workqueues in NUMA setups</title>
<updated>2013-12-04T18:57:16Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-09-05T16:30:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ced4ac92852e8f17fabcbed7492ba459619640aa'/>
<id>urn:sha1:ced4ac92852e8f17fabcbed7492ba459619640aa</id>
<content type='text'>
commit 8a2b75384444488fc4f2cbb9f0921b6a0794838f upstream.

An ordered workqueue implements execution ordering by using a single
pool_workqueue with max_active == 1.  On a given pool_workqueue, work
items are processed in FIFO order and limiting max_active to 1
forces the queued work items to be processed one by one.

Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for
unbound workqueues") accidentally broke this guarantee by applying
NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
workqueue would end up with separate pool_workqueues for different
nodes.  Each pool_workqueue still limits max_active to 1 but multiple
work items may be executed concurrently and out of order depending on
which node they are queued to.

Fix it by using dedicated ordered_wq_attrs[] when creating ordered
workqueues.  The new attrs match the unbound ones except that no_numa
is always set thus forcing all NUMA nodes to share the default
pool_workqueue.

While at it, add sanity check in workqueue creation path which
verifies that an ordered workqueue has only the default
pool_workqueue.
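
The reordering described above can be illustrated with a toy userspace
model in Python.  This is a sketch, not kernel code: each pool_workqueue
is modeled as a FIFO that runs one item at a time, and the assumption
that the queue of the higher-numbered node drains first is arbitrary,
chosen only to make the effect visible.  The names run_ordered and
by_parity are hypothetical.

```python
from collections import deque


def run_ordered(items, node_of, share_default_pwq):
    """Return the order in which work items execute in this toy model.

    items: work item ids in the order they were queued.
    node_of: maps a work item id to the NUMA node it was queued from.
    share_default_pwq: True models no_numa (one pool_workqueue for all nodes).
    """
    pwqs = {}
    for item in items:
        key = "default" if share_default_pwq else node_of(item)
        pwqs.setdefault(key, deque()).append(item)

    executed = []
    # Assumption: the queue of the higher-numbered node happens to drain first.
    # Each pwq still runs its own items strictly FIFO, one at a time.
    for key in sorted(pwqs, reverse=True):
        while pwqs[key]:
            executed.append(pwqs[key].popleft())
    return executed


def by_parity(item):
    """Hypothetical placement: queuers alternate between node 0 and node 1."""
    return item % 2


items = list(range(6))
print(run_ordered(items, by_parity, share_default_pwq=False))
print(run_ordered(items, by_parity, share_default_pwq=True))
```

With per-node queues every queue is individually FIFO, yet the global
queueing order is lost; with a single shared queue it is preserved.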

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Libin &lt;huawei.libin@huawei.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: cond_resched() after processing each work item</title>
<updated>2013-09-08T05:09:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-28T21:33:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6ff96f7340e8a0b7b7e2c40a26bc47fb320e6475'/>
<id>urn:sha1:6ff96f7340e8a0b7b7e2c40a26bc47fb320e6475</id>
<content type='text'>
commit b22ce2785d97423846206cceec4efee0c4afd980 upstream.

If !PREEMPT, a kworker running work items back to back can hog the CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.
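
A rough userspace model of why the scheduling point matters, written in
Python.  It is only a sketch under stated assumptions: the cap max_items
stands in for "forever", and setting event_happened stands in for another
task getting the CPU and making progress.  The name run_worker is
hypothetical.

```python
from collections import deque


def run_worker(resched_after_each_item, max_items=1000):
    """Count work items processed before the queue drains or the cap is hit."""
    event_happened = False
    worklist = deque(["poll"])
    processed = 0
    while worklist and processed != max_items:
        worklist.popleft()
        processed += 1
        if not event_happened:
            worklist.append("poll")  # the item requeues itself and keeps waiting
        if resched_after_each_item:
            # Scheduling point: another task gets the CPU and makes progress,
            # so the event the work item is waiting for can finally happen.
            event_happened = True
    return processed


print(run_worker(resched_after_each_item=False))
print(run_worker(resched_after_each_item=True))
```

Without the scheduling point the loop only stops at the artificial cap;
with it the self-requeueing item finishes after a couple of iterations.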

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jamie Liu &lt;jamieliu@google.com&gt;
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: copy workqueue_attrs with all fields</title>
<updated>2013-08-12T01:35:25Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@kernel.org</email>
</author>
<published>2013-08-01T01:56:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=73b8bd6de83c0ca182622f83d31a1b0a137281fe'/>
<id>urn:sha1:73b8bd6de83c0ca182622f83d31a1b0a137281fe</id>
<content type='text'>
commit 2865a8fb44cc32420407362cbda80c10fa09c6b2 upstream.

 $echo '0' &gt; /sys/bus/workqueue/devices/xxx/numa
 $cat /sys/bus/workqueue/devices/xxx/numa

I got 1 but it should be 0.  The reason is that copy_workqueue_attrs(),
called in apply_workqueue_attrs(), doesn't copy the no_numa field.

Fix it by making copy_workqueue_attrs() copy -&gt;no_numa too.  This
would also make get_unbound_pool() set a pool's -&gt;no_numa attribute
according to the workqueue attributes used when the pool was created.
While harmless, as -&gt;no_numa isn't a pool attribute, this is a bit
confusing.  Clear it explicitly.
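
The symptom can be reproduced in a small Python model.  This is a sketch,
not the kernel code: WorkqueueAttrs is a simplified stand-in for struct
workqueue_attrs (the cpumask is omitted), and the helper names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkqueueAttrs:
    """Simplified stand-in for struct workqueue_attrs (cpumask omitted)."""
    nice: int = 0
    no_numa: bool = False


def copy_attrs_before_fix(to, src):
    to.nice = src.nice  # no_numa is silently left at the destination's value


def copy_attrs_after_fix(to, src):
    to.nice = src.nice
    to.no_numa = src.no_numa


def sysfs_numa(attrs):
    """Value the numa file would show: 1 when NUMA affinity is enabled."""
    return 0 if attrs.no_numa else 1


requested = WorkqueueAttrs(no_numa=True)  # models: echo 0 to the numa file
for copy in (copy_attrs_before_fix, copy_attrs_after_fix):
    applied = WorkqueueAttrs()
    copy(applied, requested)
    print(copy.__name__, sysfs_numa(applied))
```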

tj: Updated description and comments a bit.

Signed-off-by: Shaohua Li &lt;shli@fusionio.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init()</title>
<updated>2013-05-15T21:24:24Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-05-15T21:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1be0c25da56e860992af972a60321563ca2cfcd1'/>
<id>urn:sha1:1be0c25da56e860992af972a60321563ca2cfcd1</id>
<content type='text'>
wq_numa_init() builds per-node cpumasks which are later used to make
unbound workqueues NUMA-aware.  The cpumasks are allocated using
alloc_cpumask_var_node() for all possible nodes.  Unfortunately, on
machines with offline nodes, this leads to NUMA-aware allocations on
existing but offline nodes, which in turn triggers a BUG in the memory
allocation code.

Fix it by using NUMA_NO_NODE for cpumask allocations for offline
nodes.
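
The node selection can be sketched in Python as follows.  This is an
illustrative model only: alloc_node_for and the set of online nodes are
hypothetical, and no real allocation takes place.

```python
NUMA_NO_NODE = -1  # the kernel's "no node preference" value


def alloc_node_for(node, online_nodes):
    """Pick the node hint to pass to a per-node allocation."""
    # Only ask for node-local memory when the node is actually online.
    return node if node in online_nodes else NUMA_NO_NODE


online = {0, 1}        # hypothetical: nodes 0 and 1 are online
for node in range(4):  # hypothetical: four possible nodes
    print(node, alloc_node_for(node, online))
```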

  kernel BUG at include/linux/gfp.h:323!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1
  Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011
  task: ffff880234608000 ti: ffff880234602000 task.ti: ffff880234602000
  RIP: 0010:[&lt;ffffffff8117495d&gt;]  [&lt;ffffffff8117495d&gt;] new_slab+0x2ad/0x340
  RSP: 0000:ffff880234603bf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880237404b40 RCX: 00000000000000d0
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: 00000000002052d0
  RBP: ffff880234603c28 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: ffffffff812e3aa8 R12: 0000000000000001
  R13: ffff8802378161c0 R14: 0000000000030027 R15: 00000000000040d0
  FS:  0000000000000000(0000) GS:ffff880237800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff88043fdff000 CR3: 00000000018d5000 CR4: 00000000000007f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff880234603c28 0000000000000001 00000000000000d0 ffff8802378161c0
   ffff880237404b40 ffff880237404b40 ffff880234603d28 ffffffff815edba1
   ffff880237816140 0000000000000000 ffff88023740e1c0
  Call Trace:
   [&lt;ffffffff815edba1&gt;] __slab_alloc+0x330/0x4f2
   [&lt;ffffffff81174b25&gt;] kmem_cache_alloc_node_trace+0xa5/0x200
   [&lt;ffffffff812e3aa8&gt;] alloc_cpumask_var_node+0x28/0x90
   [&lt;ffffffff81a0bdb3&gt;] wq_numa_init+0x10d/0x1be
   [&lt;ffffffff81a0bec8&gt;] init_workqueues+0x64/0x341
   [&lt;ffffffff810002ea&gt;] do_one_initcall+0xea/0x1a0
   [&lt;ffffffff819f1f31&gt;] kernel_init_freeable+0xb7/0x1ec
   [&lt;ffffffff815d50de&gt;] kernel_init+0xe/0xf0
   [&lt;ffffffff815ff89c&gt;] ret_from_fork+0x7c/0xb0
  Code: 45  84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f 1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff &lt;0f&gt; 0b 4c 8b 73 38 44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-Tested-by: Lingzhu Xiang &lt;lxiang@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: Make schedule_work() available again to non GPL modules</title>
<updated>2013-05-14T18:52:51Z</updated>
<author>
<name>Marc Dionne</name>
<email>marc.c.dionne@gmail.com</email>
</author>
<published>2013-05-06T21:44:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad7b1f841f8a54c6d61ff181451f55b68175e15a'/>
<id>urn:sha1:ad7b1f841f8a54c6d61ff181451f55b68175e15a</id>
<content type='text'>
Commit 8425e3d5bdbe ("workqueue: inline trivial wrappers") changed
schedule_work() and schedule_delayed_work() to inline wrappers,
but these rely on some symbols that are EXPORT_SYMBOL_GPL, while
the original functions were EXPORT_SYMBOL.  This has the effect of
changing the licensing requirement for these functions and making
them unavailable to non GPL modules.

Make them available again by removing the restriction on the
required symbols.

Signed-off-by: Marc Dionne &lt;marc.dionne@your-file-system.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: correct handling of the pool spin_lock</title>
<updated>2013-05-14T18:48:15Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>js1304@gmail.com</email>
</author>
<published>2013-04-30T15:07:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f174b1175a10903ade40f36eb6c896412877ca0'/>
<id>urn:sha1:8f174b1175a10903ade40f36eb6c896412877ca0</id>
<content type='text'>
When we fail to mutex_trylock(), we release the pool spin_lock and do
mutex_lock().  After that, we should regrab the pool spin_lock, but
the regrab is missing in the current code.  Correct it.
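
The locking pattern can be sketched with Python's threading locks.  This
is a userspace analogy, not the kernel code: pool_lock and manager_mutex
are hypothetical stand-ins, and lock_manager_mutex is an illustrative
helper whose contract is "called with pool_lock held, returns with both
locks held".

```python
import threading

pool_lock = threading.Lock()      # stands in for the pool spin_lock
manager_mutex = threading.Lock()  # stands in for the mutex being taken


def lock_manager_mutex():
    """Call with pool_lock held; return with both locks held.

    Returns True if pool_lock was dropped along the way, in which case
    the caller should recheck any state it read under the lock.
    """
    if manager_mutex.acquire(blocking=False):
        return False  # fast path: pool_lock was never dropped
    # Slow path: do not sleep on the mutex while holding the spinlock.
    pool_lock.release()
    manager_mutex.acquire()
    pool_lock.acquire()  # the regrab that was missing before the fix
    return True
```

Without the final acquire, the slow path would return with pool_lock
released while the caller still believes it holds the lock.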

Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number</title>
<updated>2013-05-10T18:10:17Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-05-10T18:10:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3251859168b0b12841e1b90d6d768ab478dc23d'/>
<id>urn:sha1:d3251859168b0b12841e1b90d6d768ab478dc23d</id>
<content type='text'>
df2d5ae499 ("workqueue: map an unbound workqueues to multiple per-node
pool_workqueues") made unbound workqueues map to multiple per-node
pool_workqueues and accordingly updated workqueue_congested() so that,
for unbound workqueues, it maps the specified @cpu to the NUMA node
number to obtain the matching pool_workqueue to query the congested
state.

Before this change, workqueue_congested() ignored @cpu for unbound
workqueues as there was only one pool_workqueue and some users
(fscache) called it with WORK_CPU_UNBOUND.  After the commit, this
causes the following oops as WORK_CPU_UNBOUND gets translated to
garbage by cpu_to_node().

  BUG: unable to handle kernel paging request at ffff8803598d98b8
  IP: [&lt;ffffffff81043b7e&gt;] unbound_pwq_by_node+0xa1/0xfa
  PGD 2421067 PUD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 2689 Comm: cat Tainted: GF            3.9.0-fsdevel+ #4
  task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000
  RIP: 0010:[&lt;ffffffff81043b7e&gt;]  [&lt;ffffffff81043b7e&gt;] unbound_pwq_by_node+0xa1/0xfa
  RSP: 0018:ffff880025807ad8  EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003
  RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0
  RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898
  R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97
  R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0
  FS:  00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce
   ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0
   ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8
  Call Trace:
   [&lt;ffffffff810442ce&gt;] workqueue_congested+0xb7/0x12c
   [&lt;ffffffffa00810b0&gt;] fscache_enqueue_object+0xb2/0xe8 [fscache]
   [&lt;ffffffffa007facd&gt;] __fscache_acquire_cookie+0x3b9/0x56c [fscache]
   [&lt;ffffffffa00ad8fe&gt;] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs]
   [&lt;ffffffffa009e112&gt;] do_open+0x9/0xd [nfs]
   [&lt;ffffffff810e804a&gt;] do_dentry_open+0x175/0x24b
   [&lt;ffffffff810e8298&gt;] finish_open+0x41/0x51

Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND.
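
A toy Python model of the lookup, under stated assumptions: the machine
size, the CPU_TO_NODE table and the helper node_for_query are
hypothetical, and current_cpu stands in for smp_processor_id().  In this
model WORK_CPU_UNBOUND is a sentinel one past the last valid CPU, so
using it as a table index is out of range.

```python
NR_CPUS = 4                 # hypothetical machine size
WORK_CPU_UNBOUND = NR_CPUS  # sentinel, not a real CPU number
CPU_TO_NODE = [0, 0, 1, 1]  # hypothetical topology: two CPUs per node


def node_for_query(cpu, current_cpu):
    """Map the @cpu argument to the node whose pool_workqueue is queried."""
    if cpu == WORK_CPU_UNBOUND:
        cpu = current_cpu  # stands in for smp_processor_id()
    # Without the check above, the sentinel would index past the table:
    # Python raises IndexError here, the kernel read garbage instead.
    return CPU_TO_NODE[cpu]


print(node_for_query(WORK_CPU_UNBOUND, current_cpu=2))
print(node_for_query(1, current_cpu=3))
```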

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-and-Acked-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: include workqueue info when printing debug dump of a worker task</title>
<updated>2013-05-01T00:04:02Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-04-30T22:27:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d1cb2059d9374e58da481b783332cf191cb6620'/>
<id>urn:sha1:3d1cb2059d9374e58da481b783332cf191cb6620</id>
<content type='text'>
One of the problems that arise when converting a dedicated custom
threadpool to workqueue is that the shared worker pool used by workqueue
anonymizes each worker, making it more difficult to identify what the
worker was doing on which target from the output of sysrq-t or debug
dump from oops, BUG() and friends.

This patch implements set_worker_desc() which can be called from any
workqueue work function to set its description.  When the worker task is
dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion
and so on - the description will be printed out together with the
workqueue name and the worker function pointer.

The printing side is implemented by print_worker_info() which is called
from functions in task dump paths - sched_show_task() and
dump_stack_print_info().  print_worker_info() can be safely called on
any task in any state as long as the task struct itself is accessible.
It uses probe_*() functions to access worker fields.  It may print
garbage if something went very wrong, but it wouldn't cause (another)
oops.

The description is currently limited to 24 bytes including the
terminating \0.  worker-&gt;desc_valid and worker-&gt;desc[] are added and
the 64 bytes marker which was already incorrect before adding the new
fields is moved to the correct position.

Here's an example dump with writeback updated to set the bdi name as
worker desc.

 Hardware name: Bochs
 Modules linked in:
 Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1
 Workqueue: writeback bdi_writeback_workfn (flush-8:0)
  ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8
  ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0
  ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08
 Call Trace:
  [&lt;ffffffff81c61845&gt;] dump_stack+0x19/0x1b
  [&lt;ffffffff8108f50f&gt;] warn_slowpath_common+0x7f/0xc0
  [&lt;ffffffff8108f56a&gt;] warn_slowpath_null+0x1a/0x20
  [&lt;ffffffff81200150&gt;] bdi_writeback_workfn+0x2a0/0x3b0
 ...

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: use kmem_cache_free() instead of kfree()</title>
<updated>2013-04-09T18:33:40Z</updated>
<author>
<name>Wei Yongjun</name>
<email>yongjun_wei@trendmicro.com.cn</email>
</author>
<published>2013-04-09T06:29:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cece95dfe5aa56ba99e51b4746230ff0b8542abd'/>
<id>urn:sha1:cece95dfe5aa56ba99e51b4746230ff0b8542abd</id>
<content type='text'>
Memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: avoid false negative WARN_ON() in destroy_workqueue()</title>
<updated>2013-04-04T14:54:01Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2013-04-04T02:05:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c529597e922c26910fe49b8d5f93aeaca9a2415'/>
<id>urn:sha1:5c529597e922c26910fe49b8d5f93aeaca9a2415</id>
<content type='text'>
destroy_workqueue() performs several sanity checks before proceeding
with destruction of a workqueue.  One of the checks verifies that
refcnt of each pwq (pool_workqueue) is not over 1 as at that point there
should be no in-flight work items and the only holder of pwq refs is
the workqueue itself.

This worked fine as a workqueue used to hold only one reference to its
pwqs; however, since 4c16bd327c ("workqueue: implement NUMA affinity
for unbound workqueues"), a workqueue may hold multiple references to
its default pwq triggering this sanity check spuriously.

Fix it by not triggering the pwq-&gt;refcnt assertion on default pwqs.

An example spurious WARN trigger follows.

 WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e()
 Hardware name: 4286C12
 Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video
 Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29
 Call Trace:
  [&lt;c04314a7&gt;] warn_slowpath_common+0x7c/0x93
  [&lt;c04314e0&gt;] warn_slowpath_null+0x22/0x24
  [&lt;c044796a&gt;] destroy_workqueue+0x6a/0x13e
  [&lt;c056dc01&gt;] ext4_put_super+0x43/0x2c4
  [&lt;c04fb7b8&gt;] generic_shutdown_super+0x4b/0xb9
  [&lt;c04fb848&gt;] kill_block_super+0x22/0x60
  [&lt;c04fb960&gt;] deactivate_locked_super+0x2f/0x56
  [&lt;c04fc41b&gt;] deactivate_super+0x2e/0x31
  [&lt;c050f1e6&gt;] mntput_no_expire+0x103/0x108
  [&lt;c050fdce&gt;] sys_umount+0x2a2/0x2c4
  [&lt;c050fe0e&gt;] sys_oldumount+0x1e/0x20
  [&lt;c085ba4d&gt;] sysenter_do_call+0x12/0x38

tj: Rewrote description.

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
</feed>
