<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v3.10</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-05-15T21:24:24Z</updated>
<entry>
<title>workqueue: don't perform NUMA-aware allocations on offline nodes in wq_numa_init()</title>
<updated>2013-05-15T21:24:24Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-05-15T21:24:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1be0c25da56e860992af972a60321563ca2cfcd1'/>
<id>urn:sha1:1be0c25da56e860992af972a60321563ca2cfcd1</id>
<content type='text'>
wq_numa_init() builds per-node cpumasks which are later used to make
unbound workqueues NUMA-aware.  The cpumasks are allocated using
alloc_cpumask_var_node() for all possible nodes.  Unfortunately, on
machines with offline nodes, this leads to NUMA-aware allocations on
existing but offline nodes, which in turn triggers a BUG in the memory
allocation code.

Fix it by using NUMA_NO_NODE for cpumask allocations for offline
nodes.

  kernel BUG at include/linux/gfp.h:323!
  invalid opcode: 0000 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0+ #1
  Hardware name: ProLiant BL465c G7, BIOS A19 12/10/2011
  task: ffff880234608000 ti: ffff880234602000 task.ti: ffff880234602000
  RIP: 0010:[&lt;ffffffff8117495d&gt;]  [&lt;ffffffff8117495d&gt;] new_slab+0x2ad/0x340
  RSP: 0000:ffff880234603bf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff880237404b40 RCX: 00000000000000d0
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: 00000000002052d0
  RBP: ffff880234603c28 R08: 0000000000000000 R09: 0000000000000001
  R10: 0000000000000001 R11: ffffffff812e3aa8 R12: 0000000000000001
  R13: ffff8802378161c0 R14: 0000000000030027 R15: 00000000000040d0
  FS:  0000000000000000(0000) GS:ffff880237800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff88043fdff000 CR3: 00000000018d5000 CR4: 00000000000007f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff880234603c28 0000000000000001 00000000000000d0 ffff8802378161c0
   ffff880237404b40 ffff880237404b40 ffff880234603d28 ffffffff815edba1
   ffff880237816140 0000000000000000 ffff88023740e1c0
  Call Trace:
   [&lt;ffffffff815edba1&gt;] __slab_alloc+0x330/0x4f2
   [&lt;ffffffff81174b25&gt;] kmem_cache_alloc_node_trace+0xa5/0x200
   [&lt;ffffffff812e3aa8&gt;] alloc_cpumask_var_node+0x28/0x90
   [&lt;ffffffff81a0bdb3&gt;] wq_numa_init+0x10d/0x1be
   [&lt;ffffffff81a0bec8&gt;] init_workqueues+0x64/0x341
   [&lt;ffffffff810002ea&gt;] do_one_initcall+0xea/0x1a0
   [&lt;ffffffff819f1f31&gt;] kernel_init_freeable+0xb7/0x1ec
   [&lt;ffffffff815d50de&gt;] kernel_init+0xe/0xf0
   [&lt;ffffffff815ff89c&gt;] ret_from_fork+0x7c/0xb0
  Code: 45  84 ac 00 00 00 f0 41 80 4d 00 40 e9 f6 fe ff ff 66 0f 1f 84 00 00 00 00 00 e8 eb 4b ff ff 49 89 c5 e9 05 fe ff ff &lt;0f&gt; 0b 4c 8b 73 38 44 89 ff 81 cf 00 00 20 00 4c 89 f6 48 c1 ee
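
<![CDATA[
For illustration only, the node selection described above can be
pictured with a small user-space model.  This is a sketch, not the
kernel code: MAX_NODES, node_is_online[] and alloc_node_hint() are
hypothetical names, and only NUMA_NO_NODE is taken from the
description.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)  /* "no node preference" hint for the allocator */
#define MAX_NODES    4     /* hypothetical number of possible nodes */

/* Hypothetical online map: node 2 is possible but offline. */
static const bool node_is_online[MAX_NODES] = { true, true, false, true };

/*
 * Pick the node hint to hand to a node-aware allocator.  An offline
 * node must never be passed through, so fall back to NUMA_NO_NODE.
 */
static int alloc_node_hint(int node)
{
	return node_is_online[node] ? node : NUMA_NO_NODE;
}
```

With this map, alloc_node_hint(2) yields NUMA_NO_NODE while online
nodes are passed through unchanged.
]]>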

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-Tested-by: Lingzhu Xiang &lt;lxiang@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: Make schedule_work() available again to non GPL modules</title>
<updated>2013-05-14T18:52:51Z</updated>
<author>
<name>Marc Dionne</name>
<email>marc.c.dionne@gmail.com</email>
</author>
<published>2013-05-06T21:44:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ad7b1f841f8a54c6d61ff181451f55b68175e15a'/>
<id>urn:sha1:ad7b1f841f8a54c6d61ff181451f55b68175e15a</id>
<content type='text'>
Commit 8425e3d5bdbe ("workqueue: inline trivial wrappers") changed
schedule_work() and schedule_delayed_work() to inline wrappers,
but these rely on some symbols that are EXPORT_SYMBOL_GPL, while
the original functions were EXPORT_SYMBOL.  This has the effect of
changing the licensing requirement for these functions and making
them unavailable to non GPL modules.

Make them available again by removing the restriction on the
required symbols.

Signed-off-by: Marc Dionne &lt;marc.dionne@your-file-system.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: correct handling of the pool spin_lock</title>
<updated>2013-05-14T18:48:15Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>js1304@gmail.com</email>
</author>
<published>2013-04-30T15:07:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f174b1175a10903ade40f36eb6c896412877ca0'/>
<id>urn:sha1:8f174b1175a10903ade40f36eb6c896412877ca0</id>
<content type='text'>
When mutex_trylock() fails, we release the pool spin_lock and call
mutex_lock().  After that, we should regrab the pool spin_lock, but
the current code fails to do so.  Fix it.
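
<![CDATA[
A minimal single-threaded model of the locking pattern, assuming toy
locks that only track a "held" flag.  All names here (toy_lock,
toy_trylock(), lock_mutex_with_spin_held() and friends) are
hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lock {
	bool held;
};

static void toy_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Stands in for a sleeping mutex_lock(): the other owner eventually lets go. */
static void toy_wait_and_acquire(struct toy_lock *l)
{
	l->held = true;
}

/* Enter with @spin held; return with both @spin and @mutex held. */
static void lock_mutex_with_spin_held(struct toy_lock *spin,
				      struct toy_lock *mutex)
{
	if (toy_trylock(mutex))
		return;			/* fast path: spin lock never dropped */

	toy_release(spin);		/* must not sleep with the spin lock held */
	toy_wait_and_acquire(mutex);
	toy_acquire(spin);		/* the regrab that was missing */
}
```

Dropping the final toy_acquire() reproduces the bug: the caller
would continue as if it still held the spin lock.
]]>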

Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: workqueue_congested() shouldn't translate WORK_CPU_UNBOUND into node number</title>
<updated>2013-05-10T18:10:17Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-05-10T18:10:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d3251859168b0b12841e1b90d6d768ab478dc23d'/>
<id>urn:sha1:d3251859168b0b12841e1b90d6d768ab478dc23d</id>
<content type='text'>
df2d5ae499 ("workqueue: map an unbound workqueues to multiple per-node
pool_workqueues") made unbound workqueues map to multiple per-node
pool_workqueues and accordingly updated workqueue_congested() so that,
for unbound workqueues, it maps the specified @cpu to the NUMA node
number to obtain the matching pool_workqueue to query the congested
state.

Before this change, workqueue_congested() ignored @cpu for unbound
workqueues as there was only one pool_workqueue and some users
(fscache) called it with WORK_CPU_UNBOUND.  After the commit, this
causes the following oops as WORK_CPU_UNBOUND gets translated to
garbage by cpu_to_node().

  BUG: unable to handle kernel paging request at ffff8803598d98b8
  IP: [&lt;ffffffff81043b7e&gt;] unbound_pwq_by_node+0xa1/0xfa
  PGD 2421067 PUD 0
  Oops: 0000 [#1] SMP
  CPU: 1 PID: 2689 Comm: cat Tainted: GF            3.9.0-fsdevel+ #4
  task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000
  RIP: 0010:[&lt;ffffffff81043b7e&gt;]  [&lt;ffffffff81043b7e&gt;] unbound_pwq_by_node+0xa1/0xfa
  RSP: 0018:ffff880025807ad8  EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003
  RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0
  RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898
  R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97
  R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0
  FS:  00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Stack:
   ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce
   ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0
   ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8
  Call Trace:
   [&lt;ffffffff810442ce&gt;] workqueue_congested+0xb7/0x12c
   [&lt;ffffffffa00810b0&gt;] fscache_enqueue_object+0xb2/0xe8 [fscache]
   [&lt;ffffffffa007facd&gt;] __fscache_acquire_cookie+0x3b9/0x56c [fscache]
   [&lt;ffffffffa00ad8fe&gt;] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs]
   [&lt;ffffffffa009e112&gt;] do_open+0x9/0xd [nfs]
   [&lt;ffffffff810e804a&gt;] do_dentry_open+0x175/0x24b
   [&lt;ffffffff810e8298&gt;] finish_open+0x41/0x51

Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND.
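
<![CDATA[
The idea of the fix, sketched as a user-space model.  The topology
table, current_cpu() and node_for_congestion_query() are hypothetical
stand-ins, and the sentinel value chosen here (one past the last valid
CPU index) is an assumption of this sketch.

```c
#define NR_CPUS          4        /* hypothetical CPU count */
#define WORK_CPU_UNBOUND NR_CPUS  /* sentinel: not a valid CPU index */

/* Hypothetical topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static const int cpu_node[NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for smp_processor_id(); pretend we are running on CPU 2. */
static int current_cpu(void)
{
	return 2;
}

/*
 * Translate @cpu to a node number.  The sentinel is replaced with the
 * local CPU first so that it is never used as an index into cpu_node[].
 */
static int node_for_congestion_query(int cpu)
{
	if (cpu == WORK_CPU_UNBOUND)
		cpu = current_cpu();
	return cpu_node[cpu];
}
```

Without the check, the sentinel would index past the end of the
table, which is the kind of garbage lookup behind the oops above.
]]>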

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-and-Acked-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>workqueue: include workqueue info when printing debug dump of a worker task</title>
<updated>2013-05-01T00:04:02Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-04-30T22:27:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d1cb2059d9374e58da481b783332cf191cb6620'/>
<id>urn:sha1:3d1cb2059d9374e58da481b783332cf191cb6620</id>
<content type='text'>
One of the problems that arise when converting a dedicated custom
threadpool to workqueue is that the shared worker pool used by workqueue
anonymizes each worker, making it more difficult to identify what the
worker was doing on which target from the output of sysrq-t or debug
dump from oops, BUG() and friends.

This patch implements set_worker_desc() which can be called from any
workqueue work function to set its description.  When the worker task is
dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion
and so on - the description will be printed out together with the
workqueue name and the worker function pointer.

The printing side is implemented by print_worker_info() which is called
from functions in task dump paths - sched_show_task() and
dump_stack_print_info().  print_worker_info() can be safely called on
any task in any state as long as the task struct itself is accessible.
It uses probe_*() functions to access worker fields.  It may print
garbage if something went very wrong, but it wouldn't cause (another)
oops.

The description is currently limited to 24 bytes including the
terminating \0.  worker-&gt;desc_valid and worker-&gt;desc[] are added and
the 64 bytes marker, which was already incorrect before adding the new
fields, is moved to the correct position.
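
<![CDATA[
A user-space sketch of how a printf-style description can be stored
in a fixed 24-byte buffer.  struct toy_worker and toy_set_desc() are
hypothetical names for this example; the real set_worker_desc() acts
on the current worker rather than taking one as an argument.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

#define DESC_LEN 24	/* includes the terminating \0 */

struct toy_worker {
	char desc[DESC_LEN];
	bool desc_valid;
};

/* Format a description into @w, silently truncating to DESC_LEN - 1 chars. */
static void toy_set_desc(struct toy_worker *w, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf(w->desc, sizeof(w->desc), fmt, args);
	va_end(args);
	w->desc_valid = true;
}
```

For example, toy_set_desc(w, "flush-%d:%d", 8, 0) stores "flush-8:0",
and anything longer than 23 characters is cut off.
]]>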

Here's an example dump with writeback updated to set the bdi name as
worker desc.

 Hardware name: Bochs
 Modules linked in:
 Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1
 Workqueue: writeback bdi_writeback_workfn (flush-8:0)
  ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8
  ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0
  ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08
 Call Trace:
  [&lt;ffffffff81c61845&gt;] dump_stack+0x19/0x1b
  [&lt;ffffffff8108f50f&gt;] warn_slowpath_common+0x7f/0xc0
  [&lt;ffffffff8108f56a&gt;] warn_slowpath_null+0x1a/0x20
  [&lt;ffffffff81200150&gt;] bdi_writeback_workfn+0x2a0/0x3b0
 ...

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: use kmem_cache_free() instead of kfree()</title>
<updated>2013-04-09T18:33:40Z</updated>
<author>
<name>Wei Yongjun</name>
<email>yongjun_wei@trendmicro.com.cn</email>
</author>
<published>2013-04-09T06:29:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cece95dfe5aa56ba99e51b4746230ff0b8542abd'/>
<id>urn:sha1:cece95dfe5aa56ba99e51b4746230ff0b8542abd</id>
<content type='text'>
Memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: avoid false negative WARN_ON() in destroy_workqueue()</title>
<updated>2013-04-04T14:54:01Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2013-04-04T02:05:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c529597e922c26910fe49b8d5f93aeaca9a2415'/>
<id>urn:sha1:5c529597e922c26910fe49b8d5f93aeaca9a2415</id>
<content type='text'>
destroy_workqueue() performs several sanity checks before proceeding
with destruction of a workqueue.  One of the checks verifies that
refcnt of each pwq (pool_workqueue) is not over 1 as at that point there
should be no in-flight work items and the only holder of pwq refs is
the workqueue itself.

This worked fine as a workqueue used to hold only one reference to its
pwqs; however, since 4c16bd327c ("workqueue: implement NUMA affinity
for unbound workqueues"), a workqueue may hold multiple references to
its default pwq triggering this sanity check spuriously.

Fix it by not triggering the pwq-&gt;refcnt assertion on default pwqs.
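
<![CDATA[
Roughly, the adjusted check looks like the following user-space
sketch.  struct toy_pwq, struct toy_wq and pwq_refcnt_suspicious()
are hypothetical names; only the dfl_pwq and refcnt fields mirror
the description above.

```c
#include <stdbool.h>

struct toy_pwq {
	int refcnt;
};

struct toy_wq {
	struct toy_pwq *dfl_pwq;	/* default pwq, may hold several refs */
};

/*
 * A pwq with more than one reference at destruction time is
 * suspicious, except for the default pwq which the workqueue itself
 * may reference multiple times.
 */
static bool pwq_refcnt_suspicious(const struct toy_wq *wq,
				  const struct toy_pwq *pwq)
{
	return pwq != wq->dfl_pwq && pwq->refcnt > 1;
}
```
]]>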

An example spurious WARN trigger follows.

 WARNING: at kernel/workqueue.c:4201 destroy_workqueue+0x6a/0x13e()
 Hardware name: 4286C12
 Modules linked in: sdhci_pci sdhci mmc_core usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video
 Pid: 361, comm: umount Not tainted 3.9.0-rc5+ #29
 Call Trace:
  [&lt;c04314a7&gt;] warn_slowpath_common+0x7c/0x93
  [&lt;c04314e0&gt;] warn_slowpath_null+0x22/0x24
  [&lt;c044796a&gt;] destroy_workqueue+0x6a/0x13e
  [&lt;c056dc01&gt;] ext4_put_super+0x43/0x2c4
  [&lt;c04fb7b8&gt;] generic_shutdown_super+0x4b/0xb9
  [&lt;c04fb848&gt;] kill_block_super+0x22/0x60
  [&lt;c04fb960&gt;] deactivate_locked_super+0x2f/0x56
  [&lt;c04fc41b&gt;] deactivate_super+0x2e/0x31
  [&lt;c050f1e6&gt;] mntput_no_expire+0x103/0x108
  [&lt;c050fdce&gt;] sys_umount+0x2a2/0x2c4
  [&lt;c050fe0e&gt;] sys_oldumount+0x1e/0x20
  [&lt;c085ba4d&gt;] sysenter_do_call+0x12/0x38

tj: Rewrote description.

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'v3.9-rc5' into wq/for-3.10</title>
<updated>2013-04-02T01:45:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-04-02T00:08:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=229641a6f1f09e27a1f12fba38980f33f4c92975'/>
<id>urn:sha1:229641a6f1f09e27a1f12fba38980f33f4c92975</id>
<content type='text'>
Writeback conversion to workqueue will be based on top of wq/for-3.10
branch to take advantage of custom attrs and NUMA support for unbound
workqueues.  Mainline currently contains two commits which result in
non-trivial merge conflicts with wq/for-3.10 and because
block/for-3.10/core is based on v3.9-rc3 which contains one of the
conflicting commits, we need a pre-merge-window merge anyway.  Let's
pull v3.9-rc5 into wq/for-3.10 so that the block tree doesn't suffer
from workqueue merge conflicts.

The two conflicts and their resolutions:

* e68035fb65 ("workqueue: convert to idr_alloc()") in mainline changes
  worker_pool_assign_id() to use idr_alloc() instead of the old idr
  interface.  worker_pool_assign_id() goes through multiple locking
  changes in wq/for-3.10 causing the following conflict.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

  &lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
	  lockdep_assert_held(&amp;wq_pool_mutex);

	  do {
		  if (!idr_pre_get(&amp;worker_pool_idr, GFP_KERNEL))
			  return -ENOMEM;
		  ret = idr_get_new(&amp;worker_pool_idr, pool, &amp;pool-&gt;id);
	  } while (ret == -EAGAIN);
  =======
	  mutex_lock(&amp;worker_pool_idr_mutex);
	  ret = idr_alloc(&amp;worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret &gt;= 0)
		  pool-&gt;id = ret;
	  mutex_unlock(&amp;worker_pool_idr_mutex);
  &gt;&gt;&gt;&gt;&gt;&gt;&gt; c67bf5361e7e66a0ff1f4caf95f89347d55dfb89

	  return ret &lt; 0 ? ret : 0;
  }

  We want locking from the former and idr_alloc() usage from the
  latter, which can be combined to the following.

  static int worker_pool_assign_id(struct worker_pool *pool)
  {
	  int ret;

	  lockdep_assert_held(&amp;wq_pool_mutex);

	  ret = idr_alloc(&amp;worker_pool_idr, pool, 0, 0, GFP_KERNEL);
	  if (ret &gt;= 0) {
		  pool-&gt;id = ret;
		  return 0;
	  }
	  return ret;
  }

* eb2834285c ("workqueue: fix possible pool stall bug in
  wq_unbind_fn()") updated wq_unbind_fn() such that it has a single
  larger for_each_std_worker_pool() loop instead of two separate loops
  with a schedule() call in between.  wq/for-3.10 renamed
  pool-&gt;assoc_mutex to pool-&gt;manager_mutex causing the following
  conflict (earlier function body and comments omitted for brevity).

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&amp;pool-&gt;lock);
  &lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
		  mutex_unlock(&amp;pool-&gt;manager_mutex);
	  }
  =======
		  mutex_unlock(&amp;pool-&gt;assoc_mutex);
  &gt;&gt;&gt;&gt;&gt;&gt;&gt; c67bf5361e7e66a0ff1f4caf95f89347d55dfb89

		  schedule();

  &lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
	  for_each_cpu_worker_pool(pool, cpu)
  =======
  &gt;&gt;&gt;&gt;&gt;&gt;&gt; c67bf5361e7e66a0ff1f4caf95f89347d55dfb89
		  atomic_set(&amp;pool-&gt;nr_running, 0);

		  spin_lock_irq(&amp;pool-&gt;lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&amp;pool-&gt;lock);
	  }
  }

  The resolution is mostly trivial.  We want the control flow of the
  latter with the rename of the former.

  static void wq_unbind_fn(struct work_struct *work)
  {
  ...
		  spin_unlock_irq(&amp;pool-&gt;lock);
		  mutex_unlock(&amp;pool-&gt;manager_mutex);

		  schedule();

		  atomic_set(&amp;pool-&gt;nr_running, 0);

		  spin_lock_irq(&amp;pool-&gt;lock);
		  wake_up_worker(pool);
		  spin_unlock_irq(&amp;pool-&gt;lock);
	  }
  }

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity</title>
<updated>2013-04-01T18:23:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-04-01T18:23:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d55262c4d164759a8debe772da6c9b16059dec47'/>
<id>urn:sha1:d55262c4d164759a8debe772da6c9b16059dec47</id>
<content type='text'>
Unbound workqueues are now NUMA aware.  Let's add some control knobs
and update sysfs interface accordingly.

* Add kernel param workqueue.numa_disable which disables NUMA affinity
  globally.

* Replace sysfs file "pool_id" with "pool_ids" which contains
  node:pool_id pairs.  This change is userland-visible but "pool_id"
  hasn't seen a release yet, so this is okay.

* Add a new sysfs file "numa" which can toggle NUMA affinity on
  individual workqueues.  This is implemented as attrs-&gt;no_numa which
  is special in that it isn't part of a pool's attributes.  It only
  affects how apply_workqueue_attrs() picks which pools to use.

After "pool_ids" change, first_pwq() doesn't have any user left.
Removed.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
</content>
</entry>
<entry>
<title>workqueue: implement NUMA affinity for unbound workqueues</title>
<updated>2013-04-01T18:23:36Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-04-01T18:23:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c16bd327c74d6678858706211a0c6e4e53eb3e6'/>
<id>urn:sha1:4c16bd327c74d6678858706211a0c6e4e53eb3e6</id>
<content type='text'>
Currently, an unbound workqueue has a single current, or first, pwq
(pool_workqueue) to which all new work items are queued.  This often
isn't optimal on NUMA machines as workers may jump around across node
boundaries and work items get assigned to workers without any regard
to NUMA affinity.

This patch implements NUMA affinity for unbound workqueues.  Instead
of mapping all entries of numa_pwq_tbl[] to the same pwq,
apply_workqueue_attrs() now creates a separate pwq covering the
intersecting CPUs for each NUMA node which has online CPUs in
@attrs-&gt;cpumask.  Nodes which don't have intersecting possible CPUs
are mapped to pwqs covering whole @attrs-&gt;cpumask.

As CPUs come up and go down, the pool association is changed
accordingly.  Changing pool association may involve allocating new
pools which may fail.  To avoid failing CPU_DOWN, each workqueue
always keeps a default pwq which covers whole attrs-&gt;cpumask which is
used as fallback if pool creation fails during a CPU hotplug
operation.

This ensures that all work items issued on a NUMA node are executed on
the same node as long as the workqueue allows execution on the CPUs of
the node.
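
<![CDATA[
The per-node mask selection described above can be sketched with
plain bitmasks.  This is a simplified user-space model: toy_mask_t,
node_cpus[] and pwq_mask_for_node() are hypothetical names, and the
distinction between online and possible CPUs is deliberately
ignored.

```c
#include <stdint.h>

#define NR_NODES 3	/* hypothetical node count */

typedef uint32_t toy_mask_t;	/* bit N set means CPU N is in the mask */

/* Hypothetical topology: CPUs 0-1, 2-3 and 4-5 on nodes 0, 1 and 2. */
static const toy_mask_t node_cpus[NR_NODES] = { 0x03, 0x0c, 0x30 };

/*
 * Mask that the pwq serving @node should cover: the CPUs of the node
 * that are allowed by @attrs_mask, or the whole @attrs_mask (the
 * default pwq's coverage) when the two do not intersect.
 */
static toy_mask_t pwq_mask_for_node(toy_mask_t attrs_mask, int node)
{
	toy_mask_t hit = attrs_mask & node_cpus[node];

	return hit ? hit : attrs_mask;
}
```

With attrs_mask 0x0f, node 0 gets 0x03 and node 1 gets 0x0c, while
node 2 has no intersecting CPUs and falls back to the full 0x0f.
]]>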

As this maps a workqueue to multiple pwqs and max_active is per-pwq,
this changes the behavior of max_active.  The limit is now per NUMA
node instead of global.  While this is an actual change, max_active is
already per-cpu for per-cpu workqueues and primarily used as a safety
mechanism rather than for active concurrency control.  Concurrency is
usually limited from workqueue users by the number of concurrently
active work items and this change shouldn't matter much.

v2: Fixed pwq freeing in apply_workqueue_attrs() error path.  Spotted
    by Lai.

v3: The previous version incorrectly made a workqueue spanning
    multiple nodes spread work items over all online CPUs when some of
    its nodes don't have any desired cpus.  Reimplemented so that NUMA
    affinity is properly updated as CPUs go up and down.  This problem
    was spotted by Lai Jiangshan.

v4: destroy_workqueue() was putting wq-&gt;dfl_pwq and then clearing it;
    however, wq may be freed at any time after dfl_pwq is put making
    the clearing use-after-free.  Clear wq-&gt;dfl_pwq before putting it.

v5: apply_workqueue_attrs() was leaking @tmp_attrs, @new_attrs and
    @pwq_tbl after success.  Fixed.

    Retry loop in wq_update_unbound_numa_attrs() isn't necessary as
    application of new attrs is excluded via CPU hotplug.  Removed.

    Documentation on CPU affinity guarantee on CPU_DOWN added.

    All changes are suggested by Lai Jiangshan.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
</content>
</entry>
</feed>
