<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/trace/ring_buffer.c, branch v3.2.65</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.65</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.65'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-11-05T20:27:49Z</updated>
<entry>
<title>ring-buffer: Fix infinite spin in reading buffer</title>
<updated>2014-11-05T20:27:49Z</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2014-10-02T20:51:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=77179bb84cada16355fc832cc12dde32dd45a5f2'/>
<id>urn:sha1:77179bb84cada16355fc832cc12dde32dd45a5f2</id>
<content type='text'>
commit 24607f114fd14f2f37e3e0cb3d47bce96e81e848 upstream.

Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.

A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.

The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.

Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.

Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Always reset iterator to reader page</title>
<updated>2014-09-13T22:41:44Z</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2014-08-06T18:11:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d41fb3c8983178043c8172ac8ee21f415f9bc267'/>
<id>urn:sha1:d41fb3c8983178043c8172ac8ee21f415f9bc267</id>
<content type='text'>
commit 651e22f2701b4113989237c3048d17337dd2185c upstream.

When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with a empty page and this page that
was swapped out becomes the new reader page. The reader page
is owned by the reader and since it was swapped out of the ring
buffer, writers do not have access to it (there's an exception
to that rule, but it's out of scope for this commit).

When reading the "trace" file, it is a non consuming read, which
means that the data in the ring buffer will not be modified.
When the trace file is opened, a ring buffer iterator is allocated
and writes to the ring buffer are disabled, such that the iterator
will not have issues iterating over the data.

Although the ring buffer disabled writes, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
Call Trace:
 [&lt;c18796b3&gt;] dump_stack+0x4b/0x75
 [&lt;c103a0e3&gt;] warn_slowpath_common+0x7e/0x95
 [&lt;c10bd85a&gt;] ? __trace_find_cmdline+0x66/0xaa
 [&lt;c10bd85a&gt;] ? __trace_find_cmdline+0x66/0xaa
 [&lt;c103a185&gt;] warn_slowpath_fmt+0x33/0x35
 [&lt;c10bd85a&gt;] __trace_find_cmdline+0x66/0xaa^M
 [&lt;c10bed04&gt;] trace_find_cmdline+0x40/0x64
 [&lt;c10c3c16&gt;] trace_print_context+0x27/0xec
 [&lt;c10c4360&gt;] ? trace_seq_printf+0x37/0x5b
 [&lt;c10c0b15&gt;] print_trace_line+0x319/0x39b
 [&lt;c10ba3fb&gt;] ? ring_buffer_read+0x47/0x50
 [&lt;c10c13b1&gt;] s_show+0x192/0x1ab
 [&lt;c10bfd9a&gt;] ? s_next+0x5a/0x7c
 [&lt;c112e76e&gt;] seq_read+0x267/0x34c
 [&lt;c1115a25&gt;] vfs_read+0x8c/0xef
 [&lt;c112e507&gt;] ? seq_lseek+0x154/0x154
 [&lt;c1115ba2&gt;] SyS_read+0x54/0x7f
 [&lt;c188488e&gt;] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
&gt;&gt;&gt;&gt; ##### CPU 1 buffer started ####

Which was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted. It happened with the rb_iter_reset().

As consuming reads may not read the full reader page, and only part
of it, there's a "read" field to know where the last read took place.
The iterator, must also start at the read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the iterator
would start at the head page within the ring buffer (where writes still
happen). But the mistake there was that it still used the "read" field
to start the iterator on the head page, where it should always start
at zero because readers never read from within the ring buffer where
writes occur.

I originally wrote a patch to have it set the iter-&gt;head to 0 instead
of iter-&gt;head_page-&gt;read, but then I questioned why it wasn't always
setting the iter to point to the reader page, as the reader page is
still valid.  The list_empty(reader_page-&gt;list) just means that it was
successful in swapping out. But the reader_page may still have data.

There was a bug report a long time ago that was not reproducible that
had something about trace_pipe (consuming read) not matching trace
(iterator read). This may explain why that happened.

Anyway, the correct answer to this bug is to always use the reader page
an not reset the iterator to inside the writable ring buffer.

Fixes: d769041f8653 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Up rb_iter_peek() loop count to 3</title>
<updated>2014-09-13T22:41:43Z</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2014-08-06T19:36:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cb2dfe4e504ee38eaae667915c2ffaaab9669b4f'/>
<id>urn:sha1:cb2dfe4e504ee38eaae667915c2ffaaab9669b4f</id>
<content type='text'>
commit 021de3d904b88b1771a3a2cfc5b75023c391e646 upstream.

After writting a test to try to trigger the bug that caused the
ring buffer iterator to become corrupted, I hit another bug:

 WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
 Modules linked in: ipt_MASQUERADE sunrpc [...]
 CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
  ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
  ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
 Call Trace:
  [&lt;ffffffff81503fb0&gt;] ? dump_stack+0x4a/0x75
  [&lt;ffffffff81040ca1&gt;] ? warn_slowpath_common+0x7e/0x97
  [&lt;ffffffff810c138d&gt;] ? rb_iter_peek+0x113/0x238
  [&lt;ffffffff810c138d&gt;] ? rb_iter_peek+0x113/0x238
  [&lt;ffffffff810c14df&gt;] ? ring_buffer_iter_peek+0x2d/0x5c
  [&lt;ffffffff810c6f73&gt;] ? tracing_iter_reset+0x6e/0x96
  [&lt;ffffffff810c74a3&gt;] ? s_start+0xd7/0x17b
  [&lt;ffffffff8112b13e&gt;] ? kmem_cache_alloc_trace+0xda/0xea
  [&lt;ffffffff8114cf94&gt;] ? seq_read+0x148/0x361
  [&lt;ffffffff81132d98&gt;] ? vfs_read+0x93/0xf1
  [&lt;ffffffff81132f1b&gt;] ? SyS_read+0x60/0x8e
  [&lt;ffffffff8150bf9f&gt;] ? tracesys+0xdd/0xe2

Debugging this bug, which triggers when the rb_iter_peek() loops too
many times (more than 2 times), I discovered there's a case that can
cause that function to legitimately loop 3 times!

rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
only deals with the reader page (it's for consuming reads). The
rb_iter_peek() is for traversing the buffer without consuming it, and as
such, it can loop for one more reason. That is, if we hit the end of
the reader page or any page, it will go to the next page and try again.

That is, we have this:

 1. iter-&gt;head &gt; iter-&gt;head_page-&gt;page-&gt;commit
    (rb_inc_iter() which moves the iter to the next page)
    try again

 2. event = rb_iter_head_event()
    event-&gt;type_len == RINGBUF_TYPE_TIME_EXTEND
    rb_advance_iter()
    try again

 3. read the event.

But we never get to 3, because the count is greater than 2 and we
cause the WARNING and return NULL.

Up the counter to 3.

Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
[bwh: Backported to 3.2: drop inapplicable spelling correction]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Fix first commit on sub-buffer having non-zero delta</title>
<updated>2014-04-01T23:58:51Z</updated>
<author>
<name>Steven Rostedt (Red Hat)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2014-02-11T18:38:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b52b81c22aca6ce4a38015ec54911ede7ab563bc'/>
<id>urn:sha1:b52b81c22aca6ce4a38015ec54911ede7ab563bc</id>
<content type='text'>
commit d651aa1d68a2f0a7ee65697b04c6a92f8c0a12f2 upstream.

Each sub-buffer (buffer page) has a full 64 bit timestamp. The events on
that page use a 27 bit delta against that timestamp in order to save on
bits written to the ring buffer. If the time between events is larger than
what the 27 bits can hold, a "time extend" event is added to hold the
entire 64 bit timestamp again and the events after that hold a delta from
that timestamp.

As a "time extend" is always paired with an event, it is logical to just
allocate the event with the time extend, to make things a bit more efficient.

Unfortunately, when the pairing code was written, it removed the "delta = 0"
from the first commit on a page, causing the events on the page to be
slightly skewed.

Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Fix NULL pointer if rb_set_head_page() fails</title>
<updated>2013-01-03T03:33:24Z</updated>
<author>
<name>Steven Rostedt</name>
<email>srostedt@redhat.com</email>
</author>
<published>2012-11-30T03:27:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=51e8eac1d5138ab32debce67202b975c08920029'/>
<id>urn:sha1:51e8eac1d5138ab32debce67202b975c08920029</id>
<content type='text'>
commit 54f7be5b831254199522523ccab4c3d954bbf576 upstream.

The function rb_set_head_page() searches the list of ring buffer
pages for a the page that has the HEAD page flag set. If it does
not find it, it will do a WARN_ON(), disable the ring buffer and
return NULL, as this should never happen.

But if this bug happens to happen, not all callers of this function
can handle a NULL pointer being returned from it. That needs to be
fixed.

Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2011-10-26T15:03:38Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-10-26T15:03:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7115e3fcf45514db7525a05365b10454ff7f345e'/>
<id>urn:sha1:7115e3fcf45514db7525a05365b10454ff7f345e</id>
<content type='text'>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
  perf symbols: Increase symbol KSYM_NAME_LEN size
  perf hists browser: Refuse 'a' hotkey on non symbolic views
  perf ui browser: Use libslang to read keys
  perf tools: Fix tracing info recording
  perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
  perf hists: Don't consider filtered entries when calculating column widths
  perf hists: Don't decay total_period for filtered entries
  perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
  perf hists browser: Do not exit on tab key with single event
  perf annotate browser: Don't change selection line when returning from callq
  perf tools: handle endianness of feature bitmap
  perf tools: Add prelink suggestion to dso update message
  perf script: Fix unknown feature comment
  perf hists browser: Apply the dso and thread filters when merging new batches
  perf hists: Move the dso and thread filters from hist_browser
  perf ui browser: Honour the xterm colors
  perf top tui: Give color hints just on the percentage, like on --stdio
  perf ui browser: Make the colors configurable and change the defaults
  perf tui: Remove unneeded call to newtCls on startup
  perf hists: Don't format the percentage on hist_entry__snprintf
  ...

Fix up conflicts in arch/x86/kernel/kprobes.c manually.

Ingo's tree did the insane "add volatile to const array", which just
doesn't make sense ("volatile const"?).  But we could remove the const
*and* make the array volatile to make doubly sure that gcc doesn't
optimize it away..

Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
reader_lock has been turned into a raw lock by the core locking merge,
and there was a new user of it introduced in this perf core merge.  Make
sure that new use also uses the raw accessor functions.
</content>
</entry>
<entry>
<title>locking, tracing: Annotate tracing locks as raw</title>
<updated>2011-09-13T09:11:52Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2009-07-25T15:13:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5389f6fad27019f2ba78f1b332f719ec05f12a42'/>
<id>urn:sha1:5389f6fad27019f2ba78f1b332f719ec05f12a42</id>
<content type='text'>
The tracing locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>trace: Add ring buffer stats to measure rate of events</title>
<updated>2011-08-30T16:27:45Z</updated>
<author>
<name>Vaibhav Nagarnaik</name>
<email>vnagarnaik@google.com</email>
</author>
<published>2011-08-16T21:46:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c64e148a3be3cb786534ad38298c25c833116c26'/>
<id>urn:sha1:c64e148a3be3cb786534ad38298c25c833116c26</id>
<content type='text'>
The stats file under per_cpu folder provides the number of entries,
overruns and other statistics about the CPU ring buffer. However, the
numbers do not provide any indication of how full the ring buffer is in
bytes compared to the overall size in bytes. Also, it is helpful to know
the rate at which the cpu buffer is filling up.

This patch adds an entry "bytes: " in printed stats for per_cpu ring
buffer which provides the actual bytes consumed in the ring buffer. This
field includes the number of bytes used by recorded events and the
padding bytes added when moving the tail pointer to next page.

It also adds the following time stamps:
"oldest event ts:" - the oldest timestamp in the ring buffer
"now ts:"  - the timestamp at the time of reading

The field "now ts" provides a consistent time snapshot to the userspace
when being read. This is read from the same trace clock used by tracing
event timestamps.

Together, these values provide the rate at which the buffer is filling
up, from the formula:
bytes / (now_ts - oldest_event_ts)

Signed-off-by: Vaibhav Nagarnaik &lt;vnagarnaik@google.com&gt;
Cc: Michael Rubin &lt;mrubin@google.com&gt;
Cc: David Sharp &lt;dhsharp@google.com&gt;
Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Set __GFP_NORETRY flag for ring buffer allocating process</title>
<updated>2011-06-15T02:48:51Z</updated>
<author>
<name>Vaibhav Nagarnaik</name>
<email>vnagarnaik@google.com</email>
</author>
<published>2011-06-08T00:01:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d7ec4bfed6c97405c6417970ba06c439e08ab8e3'/>
<id>urn:sha1:d7ec4bfed6c97405c6417970ba06c439e08ab8e3</id>
<content type='text'>
The tracing ring buffer is allocated from kernel memory. While
allocating a large chunk of memory, OOM might happen which destabilizes
the system. Thus random processes might get killed during the
allocation.

This patch adds __GFP_NORETRY flag to the ring buffer allocation calls
to make it fail more gracefully if the system will not be able to
complete the allocation request.

Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Vaibhav Nagarnaik &lt;vnagarnaik@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Michael Rubin &lt;mrubin@google.com&gt;
Cc: David Sharp &lt;dhsharp@google.com&gt;
Link: http://lkml.kernel.org/r/1307491302-9236-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
<entry>
<title>tracing: Convert to kstrtoul_from_user</title>
<updated>2011-06-15T02:48:50Z</updated>
<author>
<name>Peter Huewe</name>
<email>peterhuewe@gmx.de</email>
</author>
<published>2011-06-07T19:58:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=22fe9b54d859e53bfbbbdc1a0a77a82bc453927c'/>
<id>urn:sha1:22fe9b54d859e53bfbbbdc1a0a77a82bc453927c</id>
<content type='text'>
This patch replaces the code for getting an unsigned long from a
userspace buffer by a simple call to kstroul_from_user.
This makes it easier to read and less error prone.

Signed-off-by: Peter Huewe &lt;peterhuewe@gmx.de&gt;
Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
</content>
</entry>
</feed>
