<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/workqueue.c, branch v6.6.62</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.62</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.62'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-09-12T09:11:42Z</updated>
<entry>
<title>workqueue: Improve scalability of workqueue watchdog touch</title>
<updated>2024-09-12T09:11:42Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2024-06-25T11:42:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=241bce1c757d0587721512296952e6bba69631ed'/>
<id>urn:sha1:241bce1c757d0587721512296952e6bba69631ed</id>
<content type='text'>
[ Upstream commit 98f887f820c993e05a12e8aa816c80b8661d4c87 ]

On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.

In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
  get_work_pool
  __queue_work
  call_timer_fn
  run_timer_softirq
  __do_softirq
  do_softirq_own_stack
  irq_exit
  timer_interrupt
  decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
  multi_cpu_stop
  cpu_stopper_thread
  smpboot_thread_fn
  kthread

Fix this by having wq_watchdog_touch() only write to the line if the
last time a touch was recorded exceeds 1/4 of the watchdog threshold.

Reported-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: wq_watchdog_touch is always called with valid CPU</title>
<updated>2024-09-12T09:11:42Z</updated>
<author>
<name>Nicholas Piggin</name>
<email>npiggin@gmail.com</email>
</author>
<published>2024-06-25T11:42:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5ff0a441419576b8694a2b19c4a689f5c04095e5'/>
<id>urn:sha1:5ff0a441419576b8694a2b19c4a689f5c04095e5</id>
<content type='text'>
[ Upstream commit 18e24deb1cc92f2068ce7434a94233741fbd7771 ]

Warn in the case it is called with cpu == -1. This does not appear
to happen anywhere.

Signed-off-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Fix selection of wake_cpu in kick_pool()</title>
<updated>2024-05-17T10:02:31Z</updated>
<author>
<name>Sven Schnelle</name>
<email>svens@linux.ibm.com</email>
</author>
<published>2024-04-23T06:19:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c57824d4fe07c2131f8c48687cbd5ee2be60c767'/>
<id>urn:sha1:c57824d4fe07c2131f8c48687cbd5ee2be60c767</id>
<content type='text'>
commit 57a01eafdcf78f6da34fad9ff075ed5dfdd9f420 upstream.

With cpu_possible_mask=0-63 and cpu_online_mask=0-7 the following
kernel oops was observed:

smp: Bringing up secondary CPUs ...
smp: Brought up 1 node, 8 CPUs
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000000000000 TEID: 0000000000000803
[..]
 Call Trace:
arch_vcpu_is_preempted+0x12/0x80
select_idle_sibling+0x42/0x560
select_task_rq_fair+0x29a/0x3b0
try_to_wake_up+0x38e/0x6e0
kick_pool+0xa4/0x198
__queue_work.part.0+0x2bc/0x3a8
call_timer_fn+0x36/0x160
__run_timers+0x1e2/0x328
__run_timer_base+0x5a/0x88
run_timer_softirq+0x40/0x78
__do_softirq+0x118/0x388
irq_exit_rcu+0xc0/0xd8
do_ext_irq+0xae/0x168
ext_int_handler+0xbe/0xf0
psw_idle_exit+0x0/0xc
default_idle_call+0x3c/0x110
do_idle+0xd4/0x158
cpu_startup_entry+0x40/0x48
rest_init+0xc6/0xc8
start_kernel+0x3c4/0x5e0
startup_continue+0x3c/0x50

The crash is caused by calling arch_vcpu_is_preempted() for an offline
CPU. To avoid this, select the cpu with cpumask_any_and_distribute()
to mask __pod_cpumask with cpu_online_mask. In case no cpu is left in
the pool, skip the assignment.

tj: This doesn't fully fix the bug as CPUs can still go down between picking
the target CPU and the wake call. Fixing that likely requires adding
cpu_online() test to either the sched or s390 arch code. However, regardless
of how that is fixed, workqueue shouldn't be picking a CPU which isn't
online as that would result in unpredictable and worse behavior.

Signed-off-by: Sven Schnelle &lt;svens@linux.ibm.com&gt;
Fixes: 8639ecebc9b1 ("workqueue: Implement non-strict affinity scope for unbound workqueues")
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue.c: Increase workqueue name length"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a99d7274a2b1191744d7e905d01982f3225c0563'/>
<id>urn:sha1:a99d7274a2b1191744d7e905d01982f3225c0563</id>
<content type='text'>
This reverts commit 43a181f8f41aca27e7454cf44a6dfbccc8b14e92 which is
commit 31c89007285d365aa36f71d8fb0701581c770a27 upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: Move pwq-&gt;max_active to wq-&gt;max_active"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d8354f268d928b6f04119d7a7cdd0145f3a6823f'/>
<id>urn:sha1:d8354f268d928b6f04119d7a7cdd0145f3a6823f</id>
<content type='text'>
This reverts commit 82e098f5bed1ff167332d26f8551662098747ec4 which is
commit a045a272d887575da17ad86d6573e82871b50c27 upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: Factor out pwq_is_empty()"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=35bf38dd162bbbb8682b12e31ee78df54f40b7ca'/>
<id>urn:sha1:35bf38dd162bbbb8682b12e31ee78df54f40b7ca</id>
<content type='text'>
This reverts commit bad184d26a4f68aa00ad75502f9669950a790f71 which is
commit afa87ce85379e2d93863fce595afdb5771a84004 upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=957578ec33d4d21dced2d0b219bd88b1464307fb'/>
<id>urn:sha1:957578ec33d4d21dced2d0b219bd88b1464307fb</id>
<content type='text'>
This reverts commit 6c592f0bb96815117538491e5ba12e0a8a8c4493 which is
commit 4c6380305d21e36581b451f7337a36c93b64e050 upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: Move nr_active handling into helpers"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5debbff9539c9c536d71e91d4bb8995206672b90'/>
<id>urn:sha1:5debbff9539c9c536d71e91d4bb8995206672b90</id>
<content type='text'>
This reverts commit 4023a2d95076918abe2757d60810642a8115b586 which is
commit 1c270b79ce0b8290f146255ea9057243f6dd3c17 upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: Make wq_adjust_max_active() round-robin pwqs while activating"</title>
<updated>2024-04-04T18:23:07Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e3ee73b57a2e67010407c9f02514916cab19bbc7'/>
<id>urn:sha1:e3ee73b57a2e67010407c9f02514916cab19bbc7</id>
<content type='text'>
This reverts commit 5f99fee6f2dea1228980c3e785ab1a2c69b4da3c which is
commit qc5404d4e6df6faba1007544b5f4e62c7c14416dd upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "workqueue: RCU protect wq-&gt;dfl_pwq and implement accessors for it"</title>
<updated>2024-04-04T18:23:06Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2024-04-03T14:36:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3c11cb27a8b7c71cf48c8990ace6c8a0783f6db'/>
<id>urn:sha1:f3c11cb27a8b7c71cf48c8990ace6c8a0783f6db</id>
<content type='text'>
This reverts commit bd31fb926dfa02d2ccfb4b79389168b1d16f36b1 which is
commit 9f66cff212bb3c1cd25996aaa0dfd0c9e9d8baab upstream.

The workqueue patches backported to 6.6.y caused some reported
regressions, so revert them for now.

Reported-by: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Nathan Chancellor &lt;nathan@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: Audra Mitchell &lt;audra@redhat.com&gt;
Link: https://lore.kernel.org/all/ce4c2f67-c298-48a0-87a3-f933d646c73b@leemhuis.info/
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
