<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sysctl.c, branch v3.8-rc2</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.8-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.8-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-12-16T23:18:08Z</updated>
<entry>
<title>Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma</title>
<updated>2012-12-16T23:18:08Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-12-16T22:33:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d59eebc5e137bd89c6351e4c70e90ba1d0dc234'/>
<id>urn:sha1:3d59eebc5e137bd89c6351e4c70e90ba1d0dc234</id>
<content type='text'>
Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
 "There are three implementations for NUMA balancing, this tree
  (balancenuma), numacore which has been developed in tip/master and
  autonuma which is in aa.git.

  In almost all respects balancenuma is the dumbest of the three because
  its main impact is on the VM side with no attempt to be smart about
  scheduling.  In the interest of getting the ball rolling, it would be
  desirable to see this much merged for 3.8 with the view to building
  scheduler smarts on top and adapting the VM where required for 3.9.

  The most recent set of comparisons available from different people are

    mel:    https://lkml.org/lkml/2012/12/9/108
    mingo:  https://lkml.org/lkml/2012/12/7/331
    tglx:   https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

  The results are a mixed bag.  In my own tests, balancenuma does
  reasonably well.  It's dumb as rocks and does not regress against
  mainline.  On the other hand, Ingo's tests show that balancenuma is
  incapable of converging for the workloads driven by perf, which is bad
  but is potentially explained by the lack of scheduler smarts.  Thomas'
  results show balancenuma improves on mainline but falls far short of
  numacore or autonuma.  Srikar's results indicate we all suffer on a
  large machine with imbalanced node sizes.

  My own testing showed that recent numacore results have improved
  dramatically, particularly in the last week but not universally.
  We've butted heads heavily on system CPU usage and high levels of
  migration even when it shows that overall performance is better.
  There are also cases where it regresses.  Of interest is that for
  specjbb in some configurations it will regress for lower numbers of
  warehouses and show gains for higher numbers which is not reported by
  the tool by default and sometimes missed in reports.  Recently I
  reported for numacore that the JVM was crashing with
  NullPointerExceptions but currently it's unclear what the source of
  this problem is.  Initially I thought it was in how numacore batch
  handles PTEs but I no longer think this is the case.  It's possible
  numacore is just able to trigger it due to higher rates of migration.

  These reports were quite late in the cycle so I/we would like to start
  with this tree as it contains much of the code we can agree on and has
  not changed significantly over the last 2-3 weeks."

* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
  mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
  mm/rmap: Convert the struct anon_vma::mutex to an rwsem
  mm: migrate: Account a transhuge page properly when rate limiting
  mm: numa: Account for failed allocations and isolations as migration failures
  mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
  mm: numa: Add THP migration for the NUMA working set scanning fault case.
  mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
  mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
  mm: sched: numa: Control enabling and disabling of NUMA balancing
  mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
  mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task&lt;-&gt;node relationships
  mm: numa: migrate: Set last_nid on newly allocated page
  mm: numa: split_huge_page: Transfer last_nid on tail page
  mm: numa: Introduce last_nid to the page frame
  sched: numa: Slowly increase the scanning period as NUMA faults are handled
  mm: numa: Rate limit setting of pte_numa if node is saturated
  mm: numa: Rate limit the amount of memory that is migrated between nodes
  mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
  mm: numa: Migrate pages handled during a pmd_numa hinting fault
  mm: numa: Migrate on reference policy
  ...
</content>
</entry>
<entry>
<title>mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate</title>
<updated>2012-12-11T14:42:55Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-11-21T01:18:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b8593bfda1652755136333cdd362de125b283a9c'/>
<id>urn:sha1:b8593bfda1652755136333cdd362de125b283a9c</id>
<content type='text'>
The PTE scanning rate and fault rates are two of the biggest sources of
system CPU overhead with automatic NUMA placement.  Ideally a proper policy
would detect if a workload was properly placed, schedule and adjust the
PTE scanning rate accordingly. We do not track the necessary information
to do that but we at least know if we migrated or not.

This patch slows the scan when a page was not migrated as the result of
a NUMA hinting fault, backing off up to
sysctl_numa_balancing_scan_period_max, which is now higher than the
previous default. Once every minute it resets the scanner in case of
phase changes.

This is hilariously crude and the numbers are arbitrary. Workloads will
converge quite slowly in comparison to what a proper policy should be able
to do. On the plus side, we will chew up less CPU for workloads that have
no need for automatic balancing.
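
As a rough illustration only (plain C; the helper name, the doubling
step and the clamping are assumptions, not the kernel's code), the
back-off rule described above amounts to something like:

   static unsigned int adapt_scan_period(unsigned int period_ms, int migrated,
                                         unsigned int period_max_ms)
   {
           if (!migrated)
                   period_ms *= 2;                 /* fault did not migrate: scan slower */
           if (period_ms &gt; period_max_ms)
                   period_ms = period_max_ms;      /* clamp at the sysctl maximum */
           return period_ms;
   }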

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
</content>
</entry>
<entry>
<title>mm: sched: numa: Implement slow start for working set sampling</title>
<updated>2012-12-11T14:42:47Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-10-25T12:16:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b96a29ba891dd59734cb7be80a900fe93aa2d9f'/>
<id>urn:sha1:4b96a29ba891dd59734cb7be80a900fe93aa2d9f</id>
<content type='text'>
Add a 1 second delay before starting to scan the working set of
a task and starting to balance it amongst nodes.

[ note that before the constant per task WSS sampling rate patch
  the initial scan would happen much later still, in effect that
  patch caused this regression. ]

The theory is that short-run tasks benefit very little from NUMA
placement: they come and go, and they are better off sticking to the
node they were started on. As tasks mature and rebalance to other CPUs
and nodes, their NUMA placement has to change and starts to matter
more and more.

In practice this change fixes an observable kbuild regression:

   # [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ]

   !NUMA:
   45.291088843 seconds time elapsed                                          ( +-  0.40% )
   45.154231752 seconds time elapsed                                          ( +-  0.36% )

   +NUMA, no slow start:
   46.172308123 seconds time elapsed                                          ( +-  0.30% )
   46.343168745 seconds time elapsed                                          ( +-  0.25% )

   +NUMA, 1 sec slow start:
   45.224189155 seconds time elapsed                                          ( +-  0.25% )
   45.160866532 seconds time elapsed                                          ( +-  0.17% )

and it also fixes an observable perf bench (hackbench) regression:

   # perf stat --null --repeat 10 perf bench sched messaging

   -NUMA:                  0.246225691 seconds time elapsed                   ( +-  1.31% )
   +NUMA no slow start:    0.252620063 seconds time elapsed                   ( +-  1.13% )
   +NUMA 1sec delay:       0.248076230 seconds time elapsed                   ( +-  1.35% )

The implementation is simple and straightforward, most of the patch
deals with adding the /proc/sys/kernel/numa_balancing_scan_delay_ms tunable
knob.
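
As a usage note, the knob can be read back from userspace once the
kernel exposes it; a minimal sketch in plain C (assuming the file is
present and readable, i.e. the kernel was built with NUMA balancing):

   #include &lt;stdio.h&gt;

   int main(void)
   {
           unsigned long delay_ms;
           FILE *f = fopen("/proc/sys/kernel/numa_balancing_scan_delay_ms", "r");

           if (!f) {
                   perror("numa_balancing_scan_delay_ms");
                   return 1;
           }
           if (fscanf(f, "%lu", &amp;delay_ms) != 1) {
                   fclose(f);
                   return 1;
           }
           fclose(f);
           printf("scan delay: %lu ms\n", delay_ms);
           return 0;
   }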

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
[ Wrote the changelog, ran measurements, tuned the default. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
</content>
</entry>
<entry>
<title>mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate</title>
<updated>2012-12-11T14:42:46Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-10-25T12:16:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6e5fb223e89dbe5cb5c563f8d4a4a0a7d62455a8'/>
<id>urn:sha1:6e5fb223e89dbe5cb5c563f8d4a4a0a7d62455a8</id>
<content type='text'>
Previously, to probe the working set of a task, we'd use
a very simple and crude method: mark all of its address
space PROT_NONE.

That method has various (obvious) disadvantages:

 - it samples the working set at dissimilar rates,
   giving some tasks a sampling quality advantage
   over others.

 - it creates performance problems for tasks with very
   large working sets.

 - it over-samples processes with large address spaces
   which only very rarely execute.

Improve that method by keeping a rotating offset into the
address space that marks the current position of the scan,
and advance it at a constant rate (proportional to the CPU cycles
the task executes). If the offset reaches the last mapped
address of the mm, it starts over at the first
address.
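
A minimal sketch of that rotating scan cursor (illustrative C only; the
names, and treating the address space as one contiguous range, are
simplifying assumptions rather than the patch's implementation):

   static unsigned long advance_scan_offset(unsigned long offset,
                                            unsigned long mm_start,
                                            unsigned long mm_end,
                                            unsigned long chunk)
   {
           if (offset &lt; mm_start || offset &gt;= mm_end)
                   offset = mm_start;      /* wrapped: start over at the first address */

           /* ... change protection on [offset, offset + chunk) so that later
            *     accesses raise NUMA hinting faults ... */

           return offset + chunk;          /* the next scan continues from here */
   }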

The per-task nature of the working set sampling functionality in this tree
allows such constant rate, per task, execution-weight proportional sampling
of the working set, with an adaptive sampling interval/frequency that
goes from once per 100ms up to just once per 8 seconds.  The current
sampling volume is 256 MB per interval.

As tasks mature and their working set converges, the sampling
rate slows down to just a trickle, 256 MB per 8
seconds of CPU time executed.

This, beyond being adaptive, also rate-limits rarely
executing tasks and does not over-sample on overloaded
systems.

[ In AutoNUMA speak, this patch deals with the effective sampling
  rate of the 'hinting page fault'. AutoNUMA's scanning is
  currently rate-limited, but it is also fundamentally
  single-threaded, executing in the knuma_scand kernel thread,
  so the limit in AutoNUMA is global and does not scale up with
  the number of CPUs, nor does it scan tasks in an execution
  proportional manner.

  So the idea of rate-limiting the scanning was first implemented
  in the AutoNUMA tree via a global rate limit. This patch goes
  beyond that by implementing an execution rate proportional
  working set sampling rate that is not implemented via a single
  global scanning daemon. ]

[ Dan Carpenter pointed out a possible NULL pointer dereference in the
  first version of this patch. ]

Based-on-idea-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Bug-Found-By: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
[ Wrote changelog and fixed bug. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
</content>
</entry>
<entry>
<title>mm: numa: Add fault driven placement and migration</title>
<updated>2012-12-11T14:42:45Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-10-25T12:16:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cbee9f88ec1b8dd6b58f25f54e4f52c82ed77690'/>
<id>urn:sha1:cbee9f88ec1b8dd6b58f25f54e4f52c82ed77690</id>
<content type='text'>
NOTE: This patch is based on "sched, numa, mm: Add fault driven
	placement and migration policy" but as it throws away all the policy
	to just leave a basic foundation I had to drop the signed-offs-by.

This patch creates a bare-bones method for setting PTEs pte_numa in the
context of the scheduler so that, when faulted later, they are faulted on
the node the CPU is running on.  In itself this does nothing useful but any
placement policy will fundamentally depend on receiving hints on placement
from fault context and doing something intelligent about it.
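
Very roughly, the hint boils down to the following decision (illustrative
C; the function and its trivial policy are assumptions layered on the
description above, not code from this patch):

   /* Bare-bones use of a hinting fault: the only signal available is which
    * node the faulting CPU sits on, so a trivial policy just follows it. */
   static int numa_hint_target(int page_node, int faulting_cpu_node)
   {
           if (page_node == faulting_cpu_node)
                   return -1;              /* already local, nothing to do */
           return faulting_cpu_node;       /* candidate node for the page */
   }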

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
</content>
</entry>
<entry>
<title>sysctl: remove CONFIG_HOTPLUG ifdefs</title>
<updated>2012-11-28T18:33:03Z</updated>
<author>
<name>Bill Pemberton</name>
<email>wfp5p@virginia.edu</email>
</author>
<published>2012-11-19T18:19:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3b572b506c76a7a38c8debe45b50d09295620810'/>
<id>urn:sha1:3b572b506c76a7a38c8debe45b50d09295620810</id>
<content type='text'>
Remove conditional code based on CONFIG_HOTPLUG being false.  It's
always on now in preparation for it going away as an option.

Signed-off-by: Bill Pemberton &lt;wfp5p@virginia.edu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry</title>
<updated>2012-10-09T07:22:14Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2012-10-08T23:28:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ac57a89de958fbb5271dc504d0c25e34dbeec32'/>
<id>urn:sha1:7ac57a89de958fbb5271dc504d0c25e34dbeec32</id>
<content type='text'>
Introduce SYSCTL_EXCEPTION_TRACE config option and select it in the
architectures requiring support for the "exception-trace" debug_table
entry in kernel/sysctl.c.

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>coredump: make core dump functionality optional</title>
<updated>2012-10-05T18:05:15Z</updated>
<author>
<name>Alex Kelly</name>
<email>alex.page.kelly@gmail.com</email>
</author>
<published>2012-10-05T00:15:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=046d662f481830e652ac34cd112249adde16452a'/>
<id>urn:sha1:046d662f481830e652ac34cd112249adde16452a</id>
<content type='text'>
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump.  This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.

Disabling CONFIG_COREDUMP also disables the coredump-related sysctls,
except for suid_dumpable and related functions, which are necessary
for ptrace.
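
One hedged way to see the effect from userspace is to probe for one of
the coredump sysctls; this sketch assumes core_pattern is among the
entries the option controls, which is not stated explicitly above:

   #include &lt;stdio.h&gt;

   int main(void)
   {
           FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

           if (!f) {
                   puts("core_pattern sysctl absent; CONFIG_COREDUMP likely disabled");
                   return 1;
           }
           fclose(f);
           puts("core_pattern sysctl present");
           return 0;
   }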

[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly &lt;alex.page.kelly@gmail.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64</title>
<updated>2012-10-01T18:51:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2012-10-01T18:51:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=81f56e5375e84689b891e0e6c5a02ec12a1f18d9'/>
<id>urn:sha1:81f56e5375e84689b891e0e6c5a02ec12a1f18d9</id>
<content type='text'>
Pull arm64 support from Catalin Marinas:
 "Linux support for the 64-bit ARM architecture (AArch64)

  Features currently supported:
   - 39-bit address space for user and kernel (each)
   - 4KB and 64KB page configurations
   - Compat (32-bit) user applications (ARMv7, EABI only)
   - Flattened Device Tree (mandated for all AArch64 platforms)
   - ARM generic timers"

* tag 'arm64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: (35 commits)
  arm64: ptrace: remove obsolete ptrace request numbers from user headers
  arm64: Do not set the SMP/nAMP processor bit
  arm64: MAINTAINERS update
  arm64: Build infrastructure
  arm64: Miscellaneous header files
  arm64: Generic timers support
  arm64: Loadable modules
  arm64: Miscellaneous library functions
  arm64: Performance counters support
  arm64: Add support for /proc/sys/debug/exception-trace
  arm64: Debugging support
  arm64: Floating point and SIMD
  arm64: 32-bit (compat) applications support
  arm64: User access library functions
  arm64: Signal handling support
  arm64: VDSO support
  arm64: System calls handling
  arm64: ELF definitions
  arm64: SMP support
  arm64: DMA mapping API
  ...
</content>
</entry>
<entry>
<title>arm64: Add support for /proc/sys/debug/exception-trace</title>
<updated>2012-09-17T12:42:16Z</updated>
<author>
<name>Catalin Marinas</name>
<email>catalin.marinas@arm.com</email>
</author>
<published>2012-08-14T16:08:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c4233697c3f5cb14eb7a969332e2d60f357f952'/>
<id>urn:sha1:5c4233697c3f5cb14eb7a969332e2d60f357f952</id>
<content type='text'>
This patch allows setting of the show_unhandled_signals variable via
/proc/sys/debug/exception-trace. The default value is currently 1,
showing unhandled user faults (undefined instructions, data aborts) and
invalid signal stack frames.
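
For example, reporting can be switched off again from userspace with a
few lines of C (a sketch, not part of the patch; it assumes the caller
has permission to write the file):

   #include &lt;stdio.h&gt;

   int main(void)
   {
           FILE *f = fopen("/proc/sys/debug/exception-trace", "w");

           if (!f) {
                   perror("exception-trace");
                   return 1;
           }
           fputs("0\n", f);        /* 0 disables, 1 re-enables the reports */
           fclose(f);
           return 0;
   }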

Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Acked-by: Tony Lindgren &lt;tony@atomide.com&gt;
Acked-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Acked-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Acked-by: Olof Johansson &lt;olof@lixom.net&gt;
Acked-by: Santosh Shilimkar &lt;santosh.shilimkar@ti.com&gt;
</content>
</entry>
</feed>
