<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v3.14.33</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.14.33</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.14.33'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-02-06T06:35:52Z</updated>
<entry>
<title>workqueue: fix subtle pool management issue which can stall whole worker_pool</title>
<updated>2015-02-06T06:35:52Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2015-01-16T19:21:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=336d55193a33b2922f2cd3d70ca4e81a0e76c30f'/>
<id>urn:sha1:336d55193a33b2922f2cd3d70ca4e81a0e76c30f</id>
<content type='text'>
commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items &amp;&amp; !nr_running &amp;&amp; !nr_idle

The first and third conditions are protected by pool-&gt;lock and thus
won't change while holding pool-&gt;lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

tj: Updated to v3.14 where manage_workers() is responsible not only
    for creating more workers but also destroying surplus ones.
    maybe_create_worker() needs to keep its early exit condition to
    avoid creating a new worker when manage_workers() is called to
    destroy surplus ones.  Other than that, the adaptabion is
    straight-forward.  Both maybe_{create|destroy}_worker() functions
    are converted to return void and manage_workers() returns %false
    iff it lost manager arbitration.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Eric Sandeen &lt;sandeen@sandeen.net&gt;
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>workqueue: zero cpumask of wq_numa_possible_cpumask on init</title>
<updated>2014-07-17T23:21:03Z</updated>
<author>
<name>Yasuaki Ishimatsu</name>
<email>isimatu.yasuaki@jp.fujitsu.com</email>
</author>
<published>2014-07-07T13:56:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f53acc0b5b403434e038dd4ee9f83f63fdf3390f'/>
<id>urn:sha1:f53acc0b5b403434e038dd4ee9f83f63fdf3390f</id>
<content type='text'>
commit 5a6024f1604eef119cf3a6fa413fe0261a81a8f3 upstream.

When hot-adding and onlining CPU, kernel panic occurs, showing following
call trace.

  BUG: unable to handle kernel paging request at 0000000000001d08
  IP: [&lt;ffffffff8114acfd&gt;] __alloc_pages_nodemask+0x9d/0xb10
  PGD 0
  Oops: 0000 [#1] SMP
  ...
  Call Trace:
   [&lt;ffffffff812b8745&gt;] ? cpumask_next_and+0x35/0x50
   [&lt;ffffffff810a3283&gt;] ? find_busiest_group+0x113/0x8f0
   [&lt;ffffffff81193bc9&gt;] ? deactivate_slab+0x349/0x3c0
   [&lt;ffffffff811926f1&gt;] new_slab+0x91/0x300
   [&lt;ffffffff815de95a&gt;] __slab_alloc+0x2bb/0x482
   [&lt;ffffffff8105bc1c&gt;] ? copy_process.part.25+0xfc/0x14c0
   [&lt;ffffffff810a3c78&gt;] ? load_balance+0x218/0x890
   [&lt;ffffffff8101a679&gt;] ? sched_clock+0x9/0x10
   [&lt;ffffffff81105ba9&gt;] ? trace_clock_local+0x9/0x10
   [&lt;ffffffff81193d1c&gt;] kmem_cache_alloc_node+0x8c/0x200
   [&lt;ffffffff8105bc1c&gt;] copy_process.part.25+0xfc/0x14c0
   [&lt;ffffffff81114d0d&gt;] ? trace_buffer_unlock_commit+0x4d/0x60
   [&lt;ffffffff81085a80&gt;] ? kthread_create_on_node+0x140/0x140
   [&lt;ffffffff8105d0ec&gt;] do_fork+0xbc/0x360
   [&lt;ffffffff8105d3b6&gt;] kernel_thread+0x26/0x30
   [&lt;ffffffff81086652&gt;] kthreadd+0x2c2/0x300
   [&lt;ffffffff81086390&gt;] ? kthread_create_on_cpu+0x60/0x60
   [&lt;ffffffff815f20ec&gt;] ret_from_fork+0x7c/0xb0
   [&lt;ffffffff81086390&gt;] ? kthread_create_on_cpu+0x60/0x60

In my investigation, I found the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask is allocated by
alloc_cpumask_var_node(). And these entries are used without initializing.
So these entries have wrong value.

When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
calls get_unbound_pool(). In get_unbound_pool(), worker_pool-&gt;node is set
as follow:

3592         /* if cpumask is contained inside a NUMA node, we belong to that node */
3593         if (wq_numa_enabled) {
3594                 for_each_node(node) {
3595                         if (cpumask_subset(pool-&gt;attrs-&gt;cpumask,
3596                                            wq_numa_possible_cpumask[node])) {
3597                                 pool-&gt;node = node;
3598                                 break;
3599                         }
3600                 }
3601         }

But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
node is selected. As a result, kernel panic occurs.

By this patch, all entries of wq_numa_possible_cpumask are allocated by
zalloc_cpumask_var_node to initialize them. And the panic disappeared.

Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Reviewed-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: bce903809ab3 ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: fix dev_set_uevent_suppress() imbalance</title>
<updated>2014-07-17T23:21:03Z</updated>
<author>
<name>Maxime Bizon</name>
<email>mbizon@freebox.fr</email>
</author>
<published>2014-06-23T14:35:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a7332c9fc03b810731b13628a7431e7d8c6a36d'/>
<id>urn:sha1:7a7332c9fc03b810731b13628a7431e7d8c6a36d</id>
<content type='text'>
commit bddbceb688c6d0decaabc7884fede319d02f96c8 upstream.

Uevents are suppressed during attributes registration, but never
restored, so kobject_uevent() does nothing.

Signed-off-by: Maxime Bizon &lt;mbizon@freebox.fr&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 226223ab3c4118ddd10688cc2c131135848371ab
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: make rescuer_thread() empty wq-&gt;maydays list before exiting</title>
<updated>2014-06-07T17:28:22Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-04-18T15:04:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=efa2a039afc3eac43e529e02bb47df62042a504c'/>
<id>urn:sha1:efa2a039afc3eac43e529e02bb47df62042a504c</id>
<content type='text'>
commit 4d595b866d2c653dc90a492b9973a834eabfa354 upstream.

After a @pwq is scheduled for emergency execution, other workers may
consume the affectd work items before the rescuer gets to them.  This
means that a workqueue many have pwqs queued on @wq-&gt;maydays list
while not having any work item pending or in-flight.  If
destroy_workqueue() executes in such condition, the rescuer may exit
without emptying @wq-&gt;maydays.

This currently doesn't cause any actual harm.  destroy_workqueue() can
safely destroy all the involved data structures whether @wq-&gt;maydays
is populated or not as nobody access the list once the rescuer exits.

However, this is nasty and makes future development difficult.  Let's
update rescuer_thread() so that it empties @wq-&gt;maydays after seeing
should_stop to guarantee that the list is empty on rescuer exit.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: fix a possible race condition between rescuer and pwq-release</title>
<updated>2014-06-07T17:28:22Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-04-18T15:04:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b21cacc8bf8fda9822d63496fb38bc6def34046'/>
<id>urn:sha1:7b21cacc8bf8fda9822d63496fb38bc6def34046</id>
<content type='text'>
commit 77668c8b559e4fe2acf2a0749c7c83cde49a5025 upstream.

There is a race condition between rescuer_thread() and
pwq_unbound_release_workfn().

Even after a pwq is scheduled for rescue, the associated work items
may be consumed by any worker.  If all of them are consumed before the
rescuer gets to them and the pwq's base ref was put due to attribute
change, the pwq may be released while still being linked on
@wq-&gt;maydays list making the rescuer dereference already freed pwq
later.

Make send_mayday() pin the target pwq until the rescuer is done with
it.

tj: Updated comment and patch description.

Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: fix bugs in wq_update_unbound_numa() failure path</title>
<updated>2014-06-07T17:28:21Z</updated>
<author>
<name>Daeseok Youn</name>
<email>daeseok.youn@gmail.com</email>
</author>
<published>2014-04-16T05:32:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=65eefbee64455cb2f18a77be616ca1df10c95d6d'/>
<id>urn:sha1:65eefbee64455cb2f18a77be616ca1df10c95d6d</id>
<content type='text'>
commit 77f300b198f93328c26191b52655ce1b62e202cf upstream.

wq_update_unbound_numa() failure path has the following two bugs.

- alloc_unbound_pwq() is called without holding wq-&gt;mutex; however, if
  the allocation fails, it jumps to out_unlock which tries to unlock
  wq-&gt;mutex.

- The function should switch to dfl_pwq on failure but didn't do so
  after alloc_unbound_pwq() failure.

Fix it by regrabbing wq-&gt;mutex and jumping to use_dfl_pwq on
alloc_unbound_pwq() failure.

Signed-off-by: Daeseok Youn &lt;daeseok.youn@gmail.com&gt;
Acked-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>workqueue: ensure @task is valid across kthread_stop()</title>
<updated>2014-02-18T21:35:20Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-02-15T14:02:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb'/>
<id>urn:sha1:5bdfff96c69a4d5ab9c49e60abf9e070ecd2acbb</id>
<content type='text'>
When a kworker should die, the kworkre is notified through WORKER_DIE
flag instead of kthread_should_stop().  This, IIRC, is primarily to
keep the test synchronized inside worker_pool lock.  WORKER_DIE is
first set while holding pool-&gt;lock, the lock is dropped and
kthread_stop() is called.

Unfortunately, this means that there's a slight chance that the target
kworker may see WORKER_DIE before kthread_stop() finishes and exits
and frees the target task before or during kthread_stop().

Fix it by pinning the target task before setting WORKER_DIE and
putting it after kthread_stop() is done.

tj: Improved patch description and comment.  Moved pinning above
    WORKER_DIE for better signify what it's protecting.

CC: stable@vger.kernel.org
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2014-01-22T01:46:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-01-22T01:46:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a2829b97654ec773dabc681f232ab11cb347d01'/>
<id>urn:sha1:4a2829b97654ec773dabc681f232ab11cb347d01</id>
<content type='text'>
Pull workqueue update from Tejun Heo:
 "Just one patch to add destroy_work_on_stack() annotations to help
  debugobj debugging"

* 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
</content>
</entry>
<entry>
<title>workqueue: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()</title>
<updated>2014-01-12T03:26:33Z</updated>
<author>
<name>Chuansheng Liu</name>
<email>chuansheng.liu@intel.com</email>
</author>
<published>2014-01-12T03:26:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=440a11360326044a9addf1c652a0364aad0be90c'/>
<id>urn:sha1:440a11360326044a9addf1c652a0364aad0be90c</id>
<content type='text'>
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng &lt;chuansheng.liu@intel.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci</title>
<updated>2013-12-15T19:45:27Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-12-15T19:45:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9199c4caa1315c31d160abbd166df0b9a9e8551e'/>
<id>urn:sha1:9199c4caa1315c31d160abbd166df0b9a9e8551e</id>
<content type='text'>
Pull PCI updates from Bjorn Helgaas:
 "PCI device hotplug
    - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael
      Wysocki)

  Host bridge drivers
    - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn
      Helgaas)
    - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
      (Jason Gunthorpe)

  Miscellaneous
    - Avoid unnecessary CPU switch when calling .probe() (Alexander
      Duyck)
    - Revert "workqueue: allow work_on_cpu() to be called recursively"
      (Bjorn Helgaas)
    - Disable Bus Master only on kexec reboot (Khalid Aziz)
    - Omit PCI ID macro strings to shorten quirk names for LTO (Michal
      Marek)"

* tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers
  PCI: Disable Bus Master only on kexec reboot
  PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin
  PCI: Omit PCI ID macro strings to shorten quirk names
  PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
  Revert "workqueue: allow work_on_cpu() to be called recursively"
  PCI: Avoid unnecessary CPU switch when calling driver .probe() method
</content>
</entry>
</feed>
