<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/tools/perf, branch v4.11.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.11.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-06-29T11:02:46Z</updated>
<entry>
<title>perf probe: Fix probe definition for inlined functions</title>
<updated>2017-06-29T11:02:46Z</updated>
<author>
<name>Björn Töpel</name>
<email>bjorn.topel@intel.com</email>
</author>
<published>2017-06-21T16:41:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=75d735389044a333d6ecd44e03fb25f8b413bf92'/>
<id>urn:sha1:75d735389044a333d6ecd44e03fb25f8b413bf92</id>
<content type='text'>
commit 7598f8bc1383ffd77686cb4e92e749bef3c75937 upstream.

In commit 613f050d68a8 ("perf probe: Fix to probe on gcc generated
functions in modules"), the offset from symbol is, incorrectly, added
to the trace point address. This leads to incorrect probe trace points
for inlined functions and when using relative line number on symbols.

Prior this patch:
  $ perf probe -m nf_nat -D in_range
  p:probe/in_range nf_nat:in_range.isra.9+0
  $ perf probe -m i40e -D i40e_clean_rx_irq
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
  $ perf probe -m i40e -D i40e_clean_rx_irq:16
  p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626

After:
  $ perf probe -m nf_nat -D in_range
  p:probe/in_range nf_nat:in_range.isra.9+0
  $ perf probe -m i40e -D i40e_clean_rx_irq
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
  $ perf probe -m i40e -D i40e_clean_rx_irq:16
  p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665

Committer testing:

Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
the functions that while not being explicitely marked as inline, were inlined
by the compiler:

  # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
  __ew32
  e1000_regdump
  e1000e_dump_ps_pages
  e1000_desc_unused
  e1000e_systim_to_hwtstamp
  e1000e_rx_hwtstamp
  e1000e_update_rdt_wa
  e1000e_update_tdt_wa
  e1000_put_txbuf
  e1000_consume_page

Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
them:

  # perf probe -m e1000e -D e1000e_rx_hwtstamp
  p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74

  # perf probe -m e1000e -D e1000_consume_page
  p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
  p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
  p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
that function:

  $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
  &lt;SNIP&gt;
  &lt;1&gt;&lt;13e27b&gt;: Abbrev Number: 121 (DW_TAG_subprogram)
    &lt;13e27c&gt;   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
    &lt;13e287&gt;   DW_AT_low_pc      : 0x17a30
  &lt;3&gt;&lt;13e6ef&gt;: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    &lt;13e6f0&gt;   DW_AT_abstract_origin: &lt;0x13ed2c&gt;
    &lt;13e6f4&gt;   DW_AT_low_pc      : 0x17be6
  &lt;SNIP&gt;
  &lt;1&gt;&lt;13ed2c&gt;: Abbrev Number: 142 (DW_TAG_subprogram)
     &lt;13ed2e&gt;   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page

So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
address, gives us the offset we should use in the probe definition:

  0x17be6 - 0x17a30 = 438

but above we have 876, which is twice as much.

Lets see the second inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

  &lt;3&gt;&lt;13e86e&gt;: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    &lt;13e86f&gt;   DW_AT_abstract_origin: &lt;0x13ed2c&gt;
    &lt;13e873&gt;   DW_AT_low_pc      : 0x17d21

  0x17d21 - 0x17a30 = 753

So we where adding it at twice the offset from the containing function as we
should.

And then after this patch:

  # perf probe -m e1000e -D e1000e_rx_hwtstamp
  p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37

  # perf probe -m e1000e -D e1000_consume_page
  p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
  p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
  p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
  #

Which matches the two first expansions and shows that because we were
doubling the offset it would spill over the next function:

  readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
   673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
   674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps

This is the 3rd inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

   &lt;3&gt;&lt;13ec77&gt;: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
    &lt;13ec78&gt;   DW_AT_abstract_origin: &lt;0x13ed2c&gt;
    &lt;13ec7c&gt;   DW_AT_low_pc      : 0x17f79

  0x17f79 - 0x17a30 = 1353

 So:

   0x17a30 + 2 * 1353 = 0x184c2

  And:

   0x184c2 - 0x18090 = 1074

Which explains the bogus third expansion for e1000_consume_page() to end up at:

   p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

All fixed now :-)

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/

Signed-off-by: Björn Töpel &lt;bjorn.topel@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Magnus Karlsson &lt;magnus.karlsson@intel.com&gt;
Acked-by: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Fixes: 613f050d68a8 ("perf probe: Fix to probe on gcc generated functions in modules")
Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf annotate s390: Implement jump types for perf annotate</title>
<updated>2017-05-20T12:49:45Z</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2017-04-06T07:51:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a17cbb4b8dec4a608869138a7f668314d70437b'/>
<id>urn:sha1:7a17cbb4b8dec4a608869138a7f668314d70437b</id>
<content type='text'>
commit d9f8dfa9baf9b6ae1f2f84f887176558ecde5268 upstream.

Implement simple detection for all kind of jumps and branches.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Andreas Krebbel &lt;krebbel@linux.vnet.ibm.com&gt;
Cc: Hendrik Brueckner &lt;brueckner@linux.vnet.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-s390 &lt;linux-s390@vger.kernel.org&gt;
Link: http://lkml.kernel.org/r/1491465112-45819-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf annotate s390: Fix perf annotate error -95 (4.10 regression)</title>
<updated>2017-05-20T12:49:45Z</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2017-04-06T07:51:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b09eace76ed399b92a4ede8d2854bb2347a9cbab'/>
<id>urn:sha1:b09eace76ed399b92a4ede8d2854bb2347a9cbab</id>
<content type='text'>
commit e77852b32d6d4430c68c38aaf73efe5650fa25af upstream.

since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b51844d ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Andreas Krebbel &lt;krebbel@linux.vnet.ibm.com&gt;
Cc: Hendrik Brueckner &lt;brueckner@linux.vnet.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-s390 &lt;linux-s390@vger.kernel.org&gt;
Fixes: 786c1b51844 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()</title>
<updated>2017-05-20T12:49:45Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2017-03-24T12:15:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08b0b5bd543a9f8fee67283e980cd7db51165bf9'/>
<id>urn:sha1:08b0b5bd543a9f8fee67283e980cd7db51165bf9</id>
<content type='text'>
commit c3a0bbc7ad7598dec5a204868bdf8a2b1b51df14 upstream.

Address filtering with kernel symbols incorrectly resulted in the error
"Cannot determine size of symbol" because the no_size logic was the wrong
way around.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge tag 'perf-urgent-for-mingo-4.11-20170411' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent</title>
<updated>2017-04-11T19:41:39Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-04-11T19:41:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0718b334068475d5eac4e25f1a88981880917f1d'/>
<id>urn:sha1:0718b334068475d5eac4e25f1a88981880917f1d</id>
<content type='text'>
Pull 'perf annotate' fix for s390:

- The move to support cross arch annotation introduced per arch
  initialization requirements, fullfill them for s/390 (Christian Borntraeger)

Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>perf annotate s390: Fix perf annotate error -95 (4.10 regression)</title>
<updated>2017-04-07T15:33:10Z</updated>
<author>
<name>Christian Borntraeger</name>
<email>borntraeger@de.ibm.com</email>
</author>
<published>2017-04-06T07:51:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c1a427954399fd1bda1ee7e1b356f47b61cee74'/>
<id>urn:sha1:3c1a427954399fd1bda1ee7e1b356f47b61cee74</id>
<content type='text'>
since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b51844d ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Andreas Krebbel &lt;krebbel@linux.vnet.ibm.com&gt;
Cc: Hendrik Brueckner &lt;brueckner@linux.vnet.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: linux-s390 &lt;linux-s390@vger.kernel.org&gt;
Cc: stable@kernel.org # v4.10+
Fixes: 786c1b51844 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2017-03-17T20:59:52Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-03-17T20:59:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a7fc726bb2e3ab3e960064392aa7cb66999d6927'/>
<id>urn:sha1:a7fc726bb2e3ab3e960064392aa7cb66999d6927</id>
<content type='text'>
Pull perf fixes from Thomas Gleixner:
 "A set of perf related fixes:

   - fix a CR4.PCE propagation issue caused by usage of mm instead of
     active_mm and therefore propagated the wrong value.

   - perf core fixes, which plug a use-after-free issue and make the
     event inheritance on fork more robust.

   - a tooling fix for symbol handling"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
  x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
  perf/core: Better explain the inherit magic
  perf/core: Simplify perf_event_free_task()
  perf/core: Fix event inheritance on fork()
  perf/core: Fix use-after-free in perf_release()
</content>
</entry>
<entry>
<title>perf symbols: Fix symbols__fixup_end heuristic for corner cases</title>
<updated>2017-03-17T13:30:22Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-03-15T21:53:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e7ede72a6d40cb3a30c087142d79381ca8a31dab'/>
<id>urn:sha1:e7ede72a6d40cb3a30c087142d79381ca8a31dab</id>
<content type='text'>
The current symbols__fixup_end() heuristic for the last entry in the rb
tree is suboptimal as it leads to not being able to recognize the symbol
in the call graph in a couple of corner cases, for example:

 i) If the symbol has a start address (f.e. exposed via kallsyms)
    that is at a page boundary, then the roundup(curr-&gt;start, 4096)
    for the last entry will result in curr-&gt;start == curr-&gt;end with
    a symbol length of zero.

ii) If the symbol has a start address that is shortly before a page
    boundary, then also here, curr-&gt;end - curr-&gt;start will just be
    very few bytes, where it's unrealistic that we could perform a
    match against.

Instead, change the heuristic to roundup(curr-&gt;start, 4096) + 4096, so
that we can catch such corner cases and have a better chance to find
that specific symbol. It's still just best effort as the real end of the
symbol is unknown to us (and could even be at a larger offset than the
current range), but better than the current situation.

Alexei reported that he recently run into case i) with a JITed eBPF
program (these are all page aligned) as the last symbol which wasn't
properly shown in the call graph (while other eBPF program symbols in
the rb tree were displayed correctly). Since this is a generic issue,
lets try to improve the heuristic a bit.

Reported-and-Tested-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Fixes: 2e538c4a1847 ("perf tools: Improve kernel/modules symbol lookup")
Link: http://lkml.kernel.org/r/bb5c80d27743be6f12afc68405f1956a330e1bc9.1489614365.git.daniel@iogearbox.net
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>x86/events: Remove last remnants of old filenames</title>
<updated>2017-03-01T10:27:26Z</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2017-02-18T11:31:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=940b2f2fd963c043418ce8af64605783b2b19140'/>
<id>urn:sha1:940b2f2fd963c043418ce8af64605783b2b19140</id>
<content type='text'>
Update to the new file paths, remove them from introductory comments.

Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20170218113140.8051-1-bp@alien8.de
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

</content>
</entry>
<entry>
<title>Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2017-02-28T19:38:18Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-02-28T19:38:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3f26b0c876bbfeed74325ada0329de53efbdf7a6'/>
<id>urn:sha1:3f26b0c876bbfeed74325ada0329de53efbdf7a6</id>
<content type='text'>
Pull perf fixes from Ingo Molnar:
 "Misc fixes on the kernel and tooling side - nothing in particular
  stands out"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf/core: Fix the perf_cpu_time_max_percent check
  perf/core: Fix perf_event_enable_on_exec() timekeeping (again)
  perf/core: Remove confusing comment and move put_ctx()
  perf record: Honor --quiet option properly
  perf annotate: Add -q/--quiet option
  perf diff: Add -q/--quiet option
  perf report: Add -q/--quiet option
  perf utils: Check verbose flag properly
  perf utils: Add perf_quiet_option()
  perf record: Add -a as default target
  perf stat: Add -a as default target
  perf tools: Fail on using multiple bits long terms without value
  perf tools: Move new_term arguments into struct parse_events_term template
  perf build: Add special fixdep cleaning rule
  perf tools: Replace _SC_NPROCESSORS_CONF with max_present_cpu in cpu_topology_map
  perf header: Make build_cpu_topology skip offline/absent CPUs
  perf cpumap: Add cpu__max_present_cpu()
  perf session: Fix DEBUG=1 build with clang
  tools lib traceevent: It's preempt not prempt
  perf python: Filter out -specs=/a/b/c from the python binding cc options
  ...
</content>
</entry>
</feed>
