<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/debug/debug_core.c, branch v5.0.3</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.3</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.3'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-12-30T08:31:23Z</updated>
<entry>
<title>kdb: Don't back trace on a cpu that didn't round up</title>
<updated>2018-12-30T08:31:23Z</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2018-12-05T03:38:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=162bc7f5afd75b72acbe3c5f3488ef7e64a3fe36'/>
<id>urn:sha1:162bc7f5afd75b72acbe3c5f3488ef7e64a3fe36</id>
<content type='text'>
If you have a CPU that fails to round up and then run 'btc' you'll end
up crashing in kdb becaue we dereferenced NULL.  Let's add a check.
It's wise to also set the task to NULL when leaving the debugger so
that if we fail to round up on a later entry into the debugger we
won't backtrace a stale task.

Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Acked-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
</content>
</entry>
<entry>
<title>kgdb: Don't round up a CPU that failed rounding up before</title>
<updated>2018-12-30T08:29:13Z</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2018-12-05T03:38:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=87b095928584da7d5cb3149016f00b0b139c2292'/>
<id>urn:sha1:87b095928584da7d5cb3149016f00b0b139c2292</id>
<content type='text'>
If we're using the default implementation of kgdb_roundup_cpus() that
uses smp_call_function_single_async() we can end up hanging
kgdb_roundup_cpus() if we try to round up a CPU that failed to round
up before.

Specifically smp_call_function_single_async() will try to wait on the
csd lock for the CPU that we're trying to round up.  If the previous
round up never finished then that lock could still be held and we'll
just sit there hanging.

There's not a lot of use trying to round up a CPU that failed to round
up before.  Let's keep a flag that indicates whether the CPU started
but didn't finish to round up before.  If we see that flag set then
we'll skip the next round up.

In general we have a few goals here:
- We never want to end up calling smp_call_function_single_async()
  when the csd is still locked.  This is accomplished because
  flush_smp_call_function_queue() unlocks the csd _before_ invoking
  the callback.  That means that when kgdb_nmicallback() runs we know
  for sure the the csd is no longer locked.  Thus when we set
  "rounding_up = false" we know for sure that the csd is unlocked.
- If there are no timeouts rounding up we should never skip a round
  up.

NOTE #1: In general trying to continue running after failing to round
up CPUs doesn't appear to be supported in the debugger.  When I
simulate this I find that kdb reports "Catastrophic error detected"
when I try to continue.  I can overrule and continue anyway, but it
should be noted that we may be entering the land of dragons here.
Possibly the "Catastrophic error detected" was added _because_ of the
future failure to round up, but even so this is an area of the code
that hasn't been strongly tested.

NOTE #2: I did a bit of testing before and after this change.  I
introduced a 10 second hang in the kernel while holding a spinlock
that I could invoke on a certain CPU with 'taskset -c 3 cat /sys/...".

Before this change if I did:
- Invoke hang
- Enter debugger
- g (which warns about Catastrophic error, g again to go anyway)
- g
- Enter debugger

...I'd hang the rest of the 10 seconds without getting a debugger
prompt.  After this change I end up in the debugger the 2nd time after
only 1 second with the standard warning about 'Timed out waiting for
secondary CPUs.'

I'll also note that once the CPU finished waiting I could actually
debug it (aka "btc" worked)

I won't promise that everything works perfectly if the errant CPU
comes back at just the wrong time (like as we're entering or exiting
the debugger) but it certainly seems like an improvement.

NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in
kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some
implementations override kgdb_call_nmi_hook().  It shouldn't hurt to
have it in kgdb_nmicallback() in any case.

NOTE #4: this logic is really only needed because there is no API call
like "smp_try_call_function_single_async()" or "smp_csd_is_locked()".
If such an API existed then we'd use it instead, but it seemed a bit
much to add an API like this just for kgdb.

Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Acked-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
</content>
</entry>
<entry>
<title>kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()</title>
<updated>2018-12-30T08:28:02Z</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2018-12-05T03:38:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3cd99ac3559855f69afbc1d5080e17eaa12394ff'/>
<id>urn:sha1:3cd99ac3559855f69afbc1d5080e17eaa12394ff</id>
<content type='text'>
When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system.  Specifically it hit:
  DEBUG_LOCKS_WARN_ON(current-&gt;hardirq_context)

Specifically it looked like this:
  sysrq: SysRq : DEBUG
  ------------[ cut here ]------------
  DEBUG_LOCKS_WARN_ON(current-&gt;hardirq_context)
  WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
  pstate: 604003c9 (nZCv DAIF +PAN -UAO)
  pc : lockdep_hardirqs_on+0xf0/0x160
  ...
  Call trace:
   lockdep_hardirqs_on+0xf0/0x160
   trace_hardirqs_on+0x188/0x1ac
   kgdb_roundup_cpus+0x14/0x3c
   kgdb_cpu_enter+0x53c/0x5cc
   kgdb_handle_exception+0x180/0x1d4
   kgdb_compiled_brk_fn+0x30/0x3c
   brk_handler+0x134/0x178
   do_debug_exception+0xfc/0x178
   el1_dbg+0x18/0x78
   kgdb_breakpoint+0x34/0x58
   sysrq_handle_dbg+0x54/0x5c
   __handle_sysrq+0x114/0x21c
   handle_sysrq+0x30/0x3c
   qcom_geni_serial_isr+0x2dc/0x30c
  ...
  ...
  irq event stamp: ...45
  hardirqs last  enabled at (...44): [...] __do_softirq+0xd8/0x4e4
  hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
  softirqs last  enabled at (...42): [...] _local_bh_enable+0x2c/0x34
  softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
  ---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus().  If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs.  Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.

Looking at all the people who previously had copies of this code,
there were a few variants.  I've attempted to keep the variants
working like they used to.  Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
  get_irq_regs().
* For arch/mips there was a bit of extra code around
  kgdb_nmicallback()

NOTE: In this patch we will still get into trouble if we try to round
up a CPU that failed to round up before.  We'll try to round it up
again and potentially hang when we try to grab the csd lock.  That's
not new behavior but we'll still try to do better in a future patch.

Suggested-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: James Hogan &lt;jhogan@kernel.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
</content>
</entry>
<entry>
<title>kgdb: Remove irq flags from roundup</title>
<updated>2018-12-30T08:24:21Z</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2018-12-05T03:38:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9ef7fa507d6b53a96de4da3298c5f01bde603c0a'/>
<id>urn:sha1:9ef7fa507d6b53a96de4da3298c5f01bde603c0a</id>
<content type='text'>
The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

&gt; the flags that will be used when restoring the interrupts. There is
&gt; local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags.  Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them.  So we can definitely remove the flags.

Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Vineet Gupta &lt;vgupta@synopsys.com&gt;
Cc: Russell King &lt;linux@armlinux.org.uk&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Richard Kuo &lt;rkuo@codeaurora.org&gt;
Cc: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paul Burton &lt;paul.burton@mips.com&gt;
Cc: James Hogan &lt;jhogan@kernel.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for new header dependencies before moving code to &lt;linux/sched/nmi.h&gt;</title>
<updated>2017-03-02T07:42:30Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-08T17:51:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=38b8d208a4544c9a26b10baec89b8a21042e5305'/>
<id>urn:sha1:38b8d208a4544c9a26b10baec89b8a21042e5305</id>
<content type='text'>
We are going to move softlockup APIs out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

&lt;linux/nmi.h&gt; already includes &lt;linux/sched.h&gt;.

Include the &lt;linux/nmi.h&gt; header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/vmacache, sched/headers: Introduce 'struct vmacache' and move it from &lt;linux/sched.h&gt; to &lt;linux/mm_types&gt;</title>
<updated>2017-03-02T07:42:25Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-03T10:03:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=314ff7851fc8ea66cbf48eaa93d8ebfb5ca084a9'/>
<id>urn:sha1:314ff7851fc8ea66cbf48eaa93d8ebfb5ca084a9</id>
<content type='text'>
The &lt;linux/sched.h&gt; header includes various vmacache related defines,
which are arguably misplaced.

Move them to mm_types.h and minimize the sched.h impact by putting
all task vmacache state into a new 'struct vmacache' structure.

No change in functionality.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/debug/debug_core.c: more properly delay for secondary CPUs</title>
<updated>2016-12-15T00:04:08Z</updated>
<author>
<name>Douglas Anderson</name>
<email>dianders@chromium.org</email>
</author>
<published>2016-12-14T23:05:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2d13bb6494c807bcf3f78af0e96c0b8615a94385'/>
<id>urn:sha1:2d13bb6494c807bcf3f78af0e96c0b8615a94385</id>
<content type='text'>
We've got a delay loop waiting for secondary CPUs.  That loop uses
loops_per_jiffy.  However, loops_per_jiffy doesn't actually mean how
many tight loops make up a jiffy on all architectures.  It is quite
common to see things like this in the boot log:

  Calibrating delay loop (skipped), value calculated using timer
  frequency.. 48.00 BogoMIPS (lpj=24000)

In my case I was seeing lots of cases where other CPUs timed out
entering the debugger only to print their stack crawls shortly after the
kdb&gt; prompt was written.

Elsewhere in kgdb we already use udelay(), so that should be safe enough
to use to implement our timeout.  We'll delay 1 ms for 1000 times, which
should give us a full second of delay (just like the old code wanted)
but allow us to notice that we're done every 1 ms.

[akpm@linux-foundation.org: simplifications, per Daniel]
Link: http://lkml.kernel.org/r/1477091361-2039-1-git-send-email-dianders@chromium.org
Signed-off-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Reviewed-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Brian Norris &lt;briannorris@chromium.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.0+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>debug: prevent entering debug mode on panic/exception.</title>
<updated>2015-02-19T18:39:03Z</updated>
<author>
<name>Colin Cross</name>
<email>ccross@android.com</email>
</author>
<published>2015-01-28T11:32:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5516fd7b92a7c16b9111a88aaa1b13dd937d8ac5'/>
<id>urn:sha1:5516fd7b92a7c16b9111a88aaa1b13dd937d8ac5</id>
<content type='text'>
On non-developer devices, kgdb prevents the device from rebooting
after a panic.

Incase of panics and exceptions, to allow the device to reboot, prevent
entering debug mode to avoid getting stuck waiting for the user to
interact with debugger.

To avoid entering the debugger on panic/exception without any extra
configuration, panic_timeout is being used which can be set via
/proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the
default value.

Setting panic_timeout indicates that the user requested machine to
perform unattended reboot after panic. We dont want to get stuck waiting
for the user input incase of panic.

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: Android Kernel Team &lt;kernel-team@android.com&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Colin Cross &lt;ccross@android.com&gt;
[Kiran: Added context to commit message.
panic_timeout is used instead of break_on_panic and
break_on_exception to honor CONFIG_PANIC_TIMEOUT
Modified the commit as per community feedback]
Signed-off-by: Kiran Raparthy &lt;kiran.kumar@linaro.org&gt;
Signed-off-by: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
</entry>
<entry>
<title>kdb: Fix off by one error in kdb_cpu()</title>
<updated>2015-02-19T18:39:02Z</updated>
<author>
<name>Jason Wessel</name>
<email>jason.wessel@windriver.com</email>
</author>
<published>2015-01-08T21:46:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=df0036d117e6c9df36324e517728e33543065f9a'/>
<id>urn:sha1:df0036d117e6c9df36324e517728e33543065f9a</id>
<content type='text'>
There was a follow on replacement patch against the prior
"kgdb: Timeout if secondary CPUs ignore the roundup".

See: https://lkml.org/lkml/2015/1/7/442

This patch is the delta vs the patch that was committed upstream:
  * Fix an off-by-one error in kdb_cpu().
  * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we
    really want a static limit.
  * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c
    (kgdb-next contains a patch which introduced pr_fmt() to this file
    to the tag will now be applied automatically).

Cc: Daniel Thompson &lt;daniel.thompson@linaro.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
</entry>
<entry>
<title>kernel/debug/debug_core.c: Logging clean-up</title>
<updated>2014-11-11T15:31:53Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-06-12T19:30:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0f16996cf2ed7c368dd95b4c517ce572b96a10f5'/>
<id>urn:sha1:0f16996cf2ed7c368dd95b4c517ce572b96a10f5</id>
<content type='text'>
-Convert printk( to pr_foo()
-Add pr_fmt
-Coalesce formats

Cc: Jason Wessel &lt;jason.wessel@windriver.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Signed-off-by: Jason Wessel &lt;jason.wessel@windriver.com&gt;
</content>
</entry>
</feed>
