<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/watchdog.c, branch v4.4.232</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.232</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.232'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-01-06T10:16:16Z</updated>
<entry>
<title>kernel/watchdog: use nmi registers snapshot in hardlockup handler</title>
<updated>2017-01-06T10:16:16Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2016-12-14T23:04:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f2b8b3455b22405237110d929f107318a535a480'/>
<id>urn:sha1:f2b8b3455b22405237110d929f107318a535a480</id>
<content type='text'>
commit 4d1f0fb096aedea7bb5489af93498a82e467c480 upstream.

NMI handler doesn't call set_irq_regs(), it's set only by normal IRQ.
Thus get_irq_regs() returns NULL or stale registers snapshot with IP/SP
pointing to the code interrupted by IRQ which was interrupted by NMI.
NULL isn't a problem: in this case watchdog calls dump_stack() and
prints full stack trace including NMI.  But if we're stuck in IRQ
handler then NMI watchlog will print stack trace without IRQ part at
all.

This patch uses registers snapshot passed into NMI handler as arguments:
these registers point exactly to the instruction interrupted by NMI.

Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Jiri Kosina &lt;jkosina@suse.cz&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Cc: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>watchdog: don't run proc_watchdog_update if new value is same as old</title>
<updated>2016-04-12T16:08:54Z</updated>
<author>
<name>Joshua Hunt</name>
<email>johunt@akamai.com</email>
</author>
<published>2016-03-17T21:17:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5f4a82d5e3492c26fb0263ca7f007180612e8b54'/>
<id>urn:sha1:5f4a82d5e3492c26fb0263ca7f007180612e8b54</id>
<content type='text'>
commit a1ee1932aa6bea0bb074f5e3ced112664e4637ed upstream.

While working on a script to restore all sysctl params before a series of
tests I found that writing any value into the
/proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
causes them to call proc_watchdog_update().

  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
  NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

There doesn't appear to be a reason for doing this work every time a write
occurs, so only do it when the values change.

Signed-off-by: Josh Hunt &lt;johunt@akamai.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>kernel/watchdog.c: fix race between proc_watchdog_thresh() and watchdog_timer_fn()</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=39d2da2161d35de301ec5397ce9103c68b883054'/>
<id>urn:sha1:39d2da2161d35de301ec5397ce9103c68b883054</id>
<content type='text'>
Theoretically it is possible that the watchdog timer expires right at the
time when a user sets 'watchdog_thresh' to zero (note: this disables the
lockup detectors).  In this scenario, the is_softlockup() function - which
is called by the timer - could produce a false positive.

Fix this by checking the current value of 'watchdog_thresh'.

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: remove {get|put}_online_cpus() from watchdog_{park|unpark}_threads()</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2a45b85ec45db4b041ea5d93b21033dbc3cc0fc'/>
<id>urn:sha1:a2a45b85ec45db4b041ea5d93b21033dbc3cc0fc</id>
<content type='text'>
watchdog_{park|unpark}_threads() are now called in code paths that protect
themselves against CPU hotplug, so {get|put}_online_cpus() calls are
redundant and can be removed.

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: avoid races between /proc handlers and CPU hotplug</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8614ddef82139d08234dbf681188f9bcddae9f03'/>
<id>urn:sha1:8614ddef82139d08234dbf681188f9bcddae9f03</id>
<content type='text'>
The handler functions for watchdog parameters in /proc/sys/kernel do not
protect themselves against races with CPU hotplug.  Hence, theoretically
it is possible that a new watchdog thread is started on a hotplugged CPU
while a parameter is being modified, and the thread could thus use a
parameter value that is 'in transition'.

For example, if 'watchdog_thresh' is being set to zero (note: this
disables the lockup detectors) the thread would erroneously use the value
zero as the sample period.

To avoid such races and to keep the /proc handler code consistent,
call
     {get|put}_online_cpus() in proc_watchdog_common()
     {get|put}_online_cpus() in proc_watchdog_thresh()
     {get|put}_online_cpus() in proc_watchdog_cpumask()

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: avoid race between lockup detector suspend/resume and CPU hotplug</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee89e71eb091d3ef8ca2be8bd4ec77ccfa91334c'/>
<id>urn:sha1:ee89e71eb091d3ef8ca2be8bd4ec77ccfa91334c</id>
<content type='text'>
The lockup detector suspend/resume interface that was introduced by
commit 8c073d27d7ad ("watchdog: introduce watchdog_suspend() and
watchdog_resume()") does not protect itself against races with CPU
hotplug.  Hence, theoretically it is possible that a new watchdog thread
is started on a hotplugged CPU while the lockup detector is suspended,
and the thread could thus interfere unexpectedly with the code that
requested to suspend the lockup detector.

Avoid the race by calling

  get_online_cpus() in lockup_detector_suspend()
  put_online_cpus() in lockup_detector_resume()

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: add sysctl knob hardlockup_panic</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Don Zickus</name>
<email>dzickus@redhat.com</email>
</author>
<published>2015-11-06T02:44:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ac1f591249d95372f3a5ab3828d4af5dfbf5efd3'/>
<id>urn:sha1:ac1f591249d95372f3a5ab3828d4af5dfbf5efd3</id>
<content type='text'>
The only way to enable a hardlockup to panic the machine is to set
'nmi_watchdog=panic' on the kernel command line.

This makes it awkward for end users and folks who want to run automate
tests (like myself).

Mimic the softlockup_panic knob and create a /proc/sys/kernel/hardlockup_panic
knob.

Signed-off-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Jiri Kosina</name>
<email>jkosina@suse.cz</email>
</author>
<published>2015-11-06T02:44:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55537871ef666b4153fd1ef8782e4a13fee142cc'/>
<id>urn:sha1:55537871ef666b4153fd1ef8782e4a13fee142cc</id>
<content type='text'>
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) in posession of some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce sysctl / cmdline parameter that makes it possible to have
hardlockup detector perform all-CPU backtrace.

Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: do not unpark threads in watchdog_park_threads() on error</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee7fed540563b27e1028bec0b509921496c91bf9'/>
<id>urn:sha1:ee7fed540563b27e1028bec0b509921496c91bf9</id>
<content type='text'>
If kthread_park() returns an error, watchdog_park_threads() should not
blindly 'roll back' the already parked threads to the unparked state.
Instead leave it up to the callers to handle such errors appropriately in
their context.  For example, it is redundant to unpark the threads if the
lockup detectors will soon be disabled by the callers anyway.

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: implement error handling in lockup_detector_suspend()</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Ulrich Obergfell</name>
<email>uobergfe@redhat.com</email>
</author>
<published>2015-11-06T02:44:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c993590c6ae6273681d9fb2a8d26dce03bf9d96c'/>
<id>urn:sha1:c993590c6ae6273681d9fb2a8d26dce03bf9d96c</id>
<content type='text'>
lockup_detector_suspend() now handles errors from watchdog_park_threads().

Signed-off-by: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Reviewed-by: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Acked-by: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Ulrich Obergfell &lt;uobergfe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
