<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/watchdog.c, branch v6.16.2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.16.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.16.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-05-21T17:48:22Z</updated>
<entry>
<title>kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count</title>
<updated>2025-05-21T17:48:22Z</updated>
<author>
<name>Max Kellermann</name>
<email>max.kellermann@ionos.com</email>
</author>
<published>2025-05-04T18:08:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aaf05e96e93cb9c8fc8d35fc4525715530397655'/>
<id>urn:sha1:aaf05e96e93cb9c8fc8d35fc4525715530397655</id>
<content type='text'>
Patch series "sysfs: add counters for lockups and stalls", v2.

Commits 9db89b411170 ("exit: Expose "oops_count" to sysfs") and
8b05aa263361 ("panic: Expose "warn_count" to sysfs") added counters for
oopses and warnings to sysfs, and these two patches do the same for
hard/soft lockups and RCU stalls.

All of these counters are useful for monitoring tools to detect whether
the machine is healthy.  If the kernel has experienced a lockup or a
stall, it's probably due to a kernel bug, and I'd like to detect that
quickly and easily.  There is currently no way to detect that, other than
parsing dmesg.  Or observing indirect effects: such as certain tasks not
responding, but then I need to observe all tasks, and it may take a while
until these effects become visible/measurable.  I'd rather be able to
detect the primary cause more quickly, possibly before everything falls
apart.


This patch (of 2):

There is /proc/sys/kernel/hung_task_detect_count, /sys/kernel/warn_count
and /sys/kernel/oops_count but there is no userspace-accessible counter
for hard/soft lockups.  Having this is useful for monitoring tools.

Link: https://lkml.kernel.org/r/20250504180831.4190860-1-max.kellermann@ionos.com
Link: https://lkml.kernel.org/r/20250504180831.4190860-2-max.kellermann@ionos.com
Signed-off-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Cc:
Cc: Core Minyard &lt;cminyard@mvista.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Kees Cook &lt;kees@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>watchdog: fix watchdog may detect false positive of softlockup</title>
<updated>2025-05-12T00:54:10Z</updated>
<author>
<name>Luo Gengkun</name>
<email>luogengkun@huaweicloud.com</email>
</author>
<published>2025-04-21T03:50:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7123dbbef88cfd9f09e8a7899b0911834600cfa3'/>
<id>urn:sha1:7123dbbef88cfd9f09e8a7899b0911834600cfa3</id>
<content type='text'>
When updating `watchdog_thresh`, there is a race condition between writing
the new `watchdog_thresh` value and stopping the old watchdog timer.  If
the old timer triggers during this window, it may falsely detect a
softlockup due to the old interval and the new `watchdog_thresh` value
being used.  The problem can be described as follow:

 # We asuume previous watchdog_thresh is 60, so the watchdog timer is
 # coming every 24s.
echo 10 &gt; /proc/sys/kernel/watchdog_thresh (User space)
|
+------&gt;+ update watchdog_thresh (We are in kernel now)
	|
	|	  # using old interval and new `watchdog_thresh`
	+------&gt;+ watchdog hrtimer (irq context: detect softlockup)
		|
		|
	+-------+
	|
	|
	+ softlockup_stop_all

To fix this problem, introduce a shadow variable for `watchdog_thresh`. 
The update to the actual `watchdog_thresh` is delayed until after the old
timer is stopped, preventing false positives.

The following testcase may help to understand this problem.

---------------------------------------------
echo RT_RUNTIME_SHARE &gt; /sys/kernel/debug/sched/features
echo -1 &gt; /proc/sys/kernel/sched_rt_runtime_us
echo 0 &gt; /sys/kernel/debug/sched/fair_server/cpu3/runtime
echo 60 &gt; /proc/sys/kernel/watchdog_thresh
taskset -c 3 chrt -r 99 /bin/bash -c "while true;do true; done" &amp;
echo 10 &gt; /proc/sys/kernel/watchdog_thresh &amp;
---------------------------------------------

The test case above first removes the throttling restrictions for
real-time tasks.  It then sets watchdog_thresh to 60 and executes a
real-time task ,a simple while(1) loop, on cpu3.  Consequently, the final
command gets blocked because the presence of this real-time thread
prevents kworker:3 from being selected by the scheduler.  This eventually
triggers a softlockup detection on cpu3 due to watchdog_timer_fn operating
with inconsistent variable - using both the old interval and the updated
watchdog_thresh simultaneously.

[nysal@linux.ibm.com: fix the SOFTLOCKUP_DETECTOR=n case]
  Link: https://lkml.kernel.org/r/20250502111120.282690-1-nysal@linux.ibm.com
Link: https://lkml.kernel.org/r/20250421035021.3507649-1-luogengkun@huaweicloud.com
Signed-off-by: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Signed-off-by: Nysal Jan K.A. &lt;nysal@linux.ibm.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: "Nysal Jan K.A." &lt;nysal@linux.ibm.com&gt;
Cc: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-03-25T17:54:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-25T17:54:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a50b4fe095fb98e0b7da03b0a42fd1247284868e'/>
<id>urn:sha1:a50b4fe095fb98e0b7da03b0a42fd1247284868e</id>
<content type='text'>
Pull timer cleanups from Thomas Gleixner:
 "A treewide hrtimer timer cleanup

  hrtimers are initialized with hrtimer_init() and a subsequent store to
  the callback pointer. This turned out to be suboptimal for the
  upcoming Rust integration and is obviously a silly implementation to
  begin with.

  This cleanup replaces the hrtimer_init(T); T-&gt;function = cb; sequence
  with hrtimer_setup(T, cb);

  The conversion was done with Coccinelle and a few manual fixups.

  Once the conversion has completely landed in mainline, hrtimer_init()
  will be removed and the hrtimer::function becomes a private member"

* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
  wifi: rt2x00: Switch to use hrtimer_update_function()
  io_uring: Use helper function hrtimer_update_function()
  serial: xilinx_uartps: Use helper function hrtimer_update_function()
  ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
  RDMA: Switch to use hrtimer_setup()
  virtio: mem: Switch to use hrtimer_setup()
  drm/vmwgfx: Switch to use hrtimer_setup()
  drm/xe/oa: Switch to use hrtimer_setup()
  drm/vkms: Switch to use hrtimer_setup()
  drm/msm: Switch to use hrtimer_setup()
  drm/i915/request: Switch to use hrtimer_setup()
  drm/i915/uncore: Switch to use hrtimer_setup()
  drm/i915/pmu: Switch to use hrtimer_setup()
  drm/i915/perf: Switch to use hrtimer_setup()
  drm/i915/gvt: Switch to use hrtimer_setup()
  drm/i915/huc: Switch to use hrtimer_setup()
  drm/amdgpu: Switch to use hrtimer_setup()
  stm class: heartbeat: Switch to use hrtimer_setup()
  i2c: Switch to use hrtimer_setup()
  iio: Switch to use hrtimer_setup()
  ...
</content>
</entry>
<entry>
<title>watchdog/hardlockup/perf: Fix perf_event memory leak</title>
<updated>2025-03-06T11:05:33Z</updated>
<author>
<name>Li Huafei</name>
<email>lihuafei1@huawei.com</email>
</author>
<published>2024-10-21T19:30:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d6834d9c990333bfa433bc1816e2417f268eebbe'/>
<id>urn:sha1:d6834d9c990333bfa433bc1816e2417f268eebbe</id>
<content type='text'>
During stress-testing, we found a kmemleak report for perf_event:

  unreferenced object 0xff110001410a33e0 (size 1328):
    comm "kworker/4:11", pid 288, jiffies 4294916004
    hex dump (first 32 bytes):
      b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de  ...;....".......
      f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff  .3.A.....3.A....
    backtrace (crc 24eb7b3a):
      [&lt;00000000e211b653&gt;] kmem_cache_alloc_node_noprof+0x269/0x2e0
      [&lt;000000009d0985fa&gt;] perf_event_alloc+0x5f/0xcf0
      [&lt;00000000084ad4a2&gt;] perf_event_create_kernel_counter+0x38/0x1b0
      [&lt;00000000fde96401&gt;] hardlockup_detector_event_create+0x50/0xe0
      [&lt;0000000051183158&gt;] watchdog_hardlockup_enable+0x17/0x70
      [&lt;00000000ac89727f&gt;] softlockup_start_fn+0x15/0x40
      ...

Our stress test includes CPU online and offline cycles, and updating the
watchdog configuration.

After reading the code, I found that there may be a race between cleaning up
perf_event after updating watchdog and disabling event when the CPU goes offline:

  CPU0                          CPU1                           CPU2
  (update watchdog)                                            (hotplug offline CPU1)

  ...                                                          _cpu_down(CPU1)
  cpus_read_lock()                                             // waiting for cpu lock
    softlockup_start_all
      smp_call_on_cpu(CPU1)
                                softlockup_start_fn
                                ...
                                  watchdog_hardlockup_enable(CPU1)
                                    perf create E1
                                    watchdog_ev[CPU1] = E1
  cpus_read_unlock()
                                                               cpus_write_lock()
                                                               cpuhp_kick_ap_work(CPU1)
                                cpuhp_thread_fun
                                ...
                                  watchdog_hardlockup_disable(CPU1)
                                    watchdog_ev[CPU1] = NULL
                                    dead_event[CPU1] = E1
  __lockup_detector_cleanup
    for each dead_events_mask
      release each dead_event
      /*
       * CPU1 has not been added to
       * dead_events_mask, then E1
       * will not be released
       */
                                    CPU1 -&gt; dead_events_mask
    cpumask_clear(&amp;dead_events_mask)
    // dead_events_mask is cleared, E1 is leaked

In this case, the leaked perf_event E1 matches the perf_event leak
reported by kmemleak. Due to the low probability of problem recurrence
(only reported once), I added some hack delays in the code:

  static void __lockup_detector_reconfigure(void)
  {
    ...
          watchdog_hardlockup_start();
          cpus_read_unlock();
  +       mdelay(100);
          /*
           * Must be called outside the cpus locked section to prevent
           * recursive locking in the perf code.
    ...
  }

  void watchdog_hardlockup_disable(unsigned int cpu)
  {
    ...
                  perf_event_disable(event);
                  this_cpu_write(watchdog_ev, NULL);
                  this_cpu_write(dead_event, event);
  +               mdelay(100);
                  cpumask_set_cpu(smp_processor_id(), &amp;dead_events_mask);
                  atomic_dec(&amp;watchdog_cpus);
    ...
  }

  void hardlockup_detector_perf_cleanup(void)
  {
    ...
                          perf_event_release_kernel(event);
                  per_cpu(dead_event, cpu) = NULL;
          }
  +       mdelay(100);
          cpumask_clear(&amp;dead_events_mask);
  }

Then, simultaneously performing CPU on/off and switching watchdog, it is
almost certain to reproduce this leak.

The problem here is that releasing perf_event is not within the CPU
hotplug read-write lock. Commit:

  941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")

introduced deferred release to solve the deadlock caused by calling
get_online_cpus() when releasing perf_event. Later, commit:

  efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock")

removed the get_online_cpus() call on the perf_event release path to solve
another deadlock problem.

Therefore, it is now possible to move the release of perf_event back
into the CPU hotplug read-write lock, and release the event immediately
after disabling it.

Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241021193004.308303-1-lihuafei1@huawei.com
</content>
</entry>
<entry>
<title>watchdog: Switch to use hrtimer_setup()</title>
<updated>2025-02-18T09:32:33Z</updated>
<author>
<name>Nam Cao</name>
<email>namcao@linutronix.de</email>
</author>
<published>2025-02-05T10:39:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2254b0643222ffa7e4e7ac0ae8aa0e4cbf6d09a'/>
<id>urn:sha1:d2254b0643222ffa7e4e7ac0ae8aa0e4cbf6d09a</id>
<content type='text'>
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.

Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.

Patch was created by using Coccinelle.

Signed-off-by: Nam Cao &lt;namcao@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/a5c62f2b5e1ea1cf4d32f37bc2d21a8eeab2f875.1738746821.git.namcao@linutronix.de

</content>
</entry>
<entry>
<title>treewide: const qualify ctl_tables where applicable</title>
<updated>2025-01-28T12:48:37Z</updated>
<author>
<name>Joel Granados</name>
<email>joel.granados@kernel.org</email>
</author>
<published>2025-01-28T12:48:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1751f872cc97f992ed5c4c72c55588db1f0021e1'/>
<id>urn:sha1:1751f872cc97f992ed5c4c72c55588db1f0021e1</id>
<content type='text'>
Add the const qualifier to all the ctl_tables in the tree except for
watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls,
loadpin_sysctl_table and the ones calling register_net_sysctl (./net,
drivers/inifiniband dirs). These are special cases as they use a
registration function with a non-const qualified ctl_table argument or
modify the arrays before passing them on to the registration function.

Constifying ctl_table structs will prevent the modification of
proc_handler function pointers as the arrays would reside in .rodata.
This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide:
constify the ctl_table argument of proc_handlers") constified all the
proc_handlers.

Created this by running an spatch followed by a sed command:
Spatch:
    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @

    identifier table_name != {
      watchdog_hardlockup_sysctl,
      iwcm_ctl_table,
      ucma_ctl_table,
      memory_allocation_profiling_sysctls,
      loadpin_sysctl_table
    };
    @@

    + const
    struct ctl_table table_name [] = { ... };

sed:
    sed --in-place \
      -e "s/struct ctl_table .table = &amp;uts_kern/const struct ctl_table *table = \&amp;uts_kern/" \
      kernel/utsname_sysctl.c

Reviewed-by: Song Liu &lt;song@kernel.org&gt;
Acked-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt; # for kernel/trace/
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt; # SCSI
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt; # xfs
Acked-by: Jani Nikula &lt;jani.nikula@intel.com&gt;
Acked-by: Corey Minyard &lt;cminyard@mvista.com&gt;
Acked-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Bill O'Donnell &lt;bodonnel@redhat.com&gt;
Acked-by: Baoquan He &lt;bhe@redhat.com&gt;
Acked-by: Ashutosh Dixit &lt;ashutosh.dixit@intel.com&gt;
Acked-by: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Signed-off-by: Joel Granados &lt;joel.granados@kernel.org&gt;
</content>
</entry>
<entry>
<title>watchdog: output this_cpu when printing hard LOCKUP</title>
<updated>2025-01-13T04:21:05Z</updated>
<author>
<name>Yunhui Cui</name>
<email>cuiyunhui@bytedance.com</email>
</author>
<published>2024-12-10T09:52:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c16fc0c913a62d4ed52be1b31519f8837086187'/>
<id>urn:sha1:3c16fc0c913a62d4ed52be1b31519f8837086187</id>
<content type='text'>
When printing "Watchdog detected hard LOCKUP on cpu", also output the
detecting CPU.  It's more intuitive.

Link: https://lkml.kernel.org/r/20241210095238.63444-1-cuiyunhui@bytedance.com
Signed-off-by: Yunhui Cui &lt;cuiyunhui@bytedance.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Bitao Hu &lt;yaoma@linux.alibaba.com&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Liu Song &lt;liusong@linux.alibaba.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2024-11-26T00:09:48Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-26T00:09:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5f4745a7f057b58c9728ee4e2c5d6d79f382fe7'/>
<id>urn:sha1:f5f4745a7f057b58c9728ee4e2c5d6d79f382fe7</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - The series "resource: A couple of cleanups" from Andy Shevchenko
   performs some cleanups in the resource management code

 - The series "Improve the copy of task comm" from Yafang Shao addresses
   possible race-induced overflows in the management of
   task_struct.comm[]

 - The series "Remove unnecessary header includes from
   {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a
   small fix to the list_sort library code and to its selftest

 - The series "Enhance min heap API with non-inline functions and
   optimizations" also from Kuan-Wei Chiu optimizes and cleans up the
   min_heap library code

 - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi
   finishes off nilfs2's folioification

 - The series "add detect count for hung tasks" from Lance Yang adds
   more userspace visibility into the hung-task detector's activity

 - Apart from that, singelton patches in many places - please see the
   individual changelogs for details

* tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
  gdb: lx-symbols: do not error out on monolithic build
  kernel/reboot: replace sprintf() with sysfs_emit()
  lib: util_macros_kunit: add kunit test for util_macros.h
  util_macros.h: fix/rework find_closest() macros
  Improve consistency of '#error' directive messages
  ocfs2: fix uninitialized value in ocfs2_file_read_iter()
  hung_task: add docs for hung_task_detect_count
  hung_task: add detect count for hung tasks
  dma-buf: use atomic64_inc_return() in dma_buf_getfile()
  fs/proc/kcore.c: fix coccinelle reported ERROR instances
  resource: avoid unnecessary resource tree walking in __region_intersects()
  ocfs2: remove unused errmsg function and table
  ocfs2: cluster: fix a typo
  lib/scatterlist: use sg_phys() helper
  checkpatch: always parse orig_commit in fixes tag
  nilfs2: convert metadata aops from writepage to writepages
  nilfs2: convert nilfs_recovery_copy_block() to take a folio
  nilfs2: convert nilfs_page_count_clean_buffers() to take a folio
  nilfs2: remove nilfs_writepage
  nilfs2: convert checkpoint file to be folio-based
  ...
</content>
</entry>
<entry>
<title>sched_ext: Enable the ops breather and eject BPF scheduler on softlockup</title>
<updated>2024-11-08T20:42:22Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-11-05T21:49:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e32c260195e6ff72940ab7826e38e0a0066fc58f'/>
<id>urn:sha1:e32c260195e6ff72940ab7826e38e0a0066fc58f</id>
<content type='text'>
On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly
behaving BPF scheduler can live-lock the system by making multiple CPUs bang
on the same DSQ to the point where soft-lockup detection triggers before
SCX's own watchdog can take action. It also seems possible that the machine
can be live-locked enough to prevent scx_ops_helper, which is an RT task,
from running in a timely manner.

Implement scx_softlockup() which is called when three quarters of
soft-lockup threshold has passed. The function immediately enables the ops
breather and triggers an ops error to initiate ejection of the BPF
scheduler.

The previous and this patch combined enable the kernel to reliably recover
the system from live-lock conditions that can be triggered by a poorly
behaving BPF scheduler on Intel dual socket systems.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/watchdog: always restore watchdog_softlockup(,hardlockup)_user_enabled after proc show</title>
<updated>2024-11-06T01:12:27Z</updated>
<author>
<name>Tio Zhang</name>
<email>tiozhang@didiglobal.com</email>
</author>
<published>2024-09-06T09:47:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=838010180241f5a9779a9ef9a621cdd2842f7354'/>
<id>urn:sha1:838010180241f5a9779a9ef9a621cdd2842f7354</id>
<content type='text'>
Otherwise when watchdog_enabled becomes 0,
watchdog_softlockup(,hardlockup)_user_enabled will changes to 0 after proc
show.

Steps to reproduce:

  step 1:
  # cat /proc/sys/kernel/*watchdog
  1
  1
  1

  | name                             | value
  |----------------------------------|--------------------------
  | watchdog_enabled                 | 1
  |----------------------------------|--------------------------
  | watchdog_hardlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_softlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_user_enabled            | 1
  |----------------------------------|--------------------------

  step 2:
  # echo 0 &gt; /proc/sys/kernel/watchdog

  | name                             | value
  |----------------------------------|--------------------------
  | watchdog_enabled                 | 0
  |----------------------------------|--------------------------
  | watchdog_hardlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_softlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_user_enabled            | 0
  |----------------------------------|--------------------------

  step 3:
  # cat /proc/sys/kernel/*watchdog
  0
  0
  0

  | name                             | value
  |----------------------------------|--------------------------
  | watchdog_enabled                 | 0
  |----------------------------------|--------------------------
  | watchdog_hardlockup_user_enabled | 0
  |----------------------------------|--------------------------
  | watchdog_softlockup_user_enabled | 0
  |----------------------------------|--------------------------
  | watchdog_user_enabled            | 0
  |----------------------------------|--------------------------

  step 4:
  # echo 1 &gt; /proc/sys/kernel/watchdog

  | name                             | value
  |----------------------------------|--------------------------
  | watchdog_enabled                 | 0
  |----------------------------------|--------------------------
  | watchdog_hardlockup_user_enabled | 0
  |----------------------------------|--------------------------
  | watchdog_softlockup_user_enabled | 0
  |----------------------------------|--------------------------
  | watchdog_user_enabled            | 0
  |----------------------------------|--------------------------

  step 5:
  # cat /proc/sys/kernel/*watchdog
  0
  0
  0

If we dont do "step 3", do "step 4" right after "step 2", it will be

  | name                             | value
  |----------------------------------|--------------------------
  | watchdog_enabled                 | 1
  |----------------------------------|--------------------------
  | watchdog_hardlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_softlockup_user_enabled | 1
  |----------------------------------|--------------------------
  | watchdog_user_enabled            | 1
  |----------------------------------|--------------------------

then everything works correctly.

So this patch fix "step 3"'s value into

| name                             | value
|----------------------------------|--------------------------
| watchdog_enabled                 | 0
|----------------------------------|--------------------------
| watchdog_hardlockup_user_enabled | 1
|----------------------------------|--------------------------
| watchdog_softlockup_user_enabled | 1
|----------------------------------|--------------------------
| watchdog_user_enabled            | 0
|----------------------------------|--------------------------

And still print 0 as before.

Link: https://lkml.kernel.org/r/20240906094700.GA30052@didi-ThinkCentre-M930t-N000
Signed-off-by: Tio Zhang &lt;tiozhang@didiglobal.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Krister Johansen &lt;kjlx@templeofstupid.com&gt;
Cc: Li Zhe &lt;lizhe.67@bytedance.com&gt;
Cc: Luis Chamberlain &lt;mcgrof@kernel.org&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Thomas Weißschuh &lt;linux@weissschuh.net&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
