<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/topology.h, branch v3.2.45</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.45</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.45'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-09-20T05:53:21Z</updated>
<entry>
<title>sched: Allow SD_NODES_PER_DOMAIN to be overridden</title>
<updated>2011-09-20T05:53:21Z</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2011-07-24T16:33:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=590e4d857153c5d4cf86052cdfd42cf9b0779841'/>
<id>urn:sha1:590e4d857153c5d4cf86052cdfd42cf9b0779841</id>
<content type='text'>
We want to override the default value of SD_NODES_PER_DOMAIN on ppc64,
so move it into linux/topology.h.

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
</content>
</entry>
<entry>
<title>mm: increase RECLAIM_DISTANCE to 30</title>
<updated>2011-06-16T03:03:59Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-06-15T22:08:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=32e45ff43eaf5c17f5a82c9ad358d515622c2562'/>
<id>urn:sha1:32e45ff43eaf5c17f5a82c9ad358d515622c2562</id>
<content type='text'>
Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
a very traditional single-process model.

  * a master process which reads config files and manages the other
    process
  * multiple imapd processes, one per connection
  * multiple pop3d processes, one per connection
  * multiple lmtpd processes, one per connection
  * periodical "cleanup" processes.

There are thousands of independent processes.  The problem is, recent
Intel motherboard turn on zone_reclaim_mode by default and traditional
prefork model software don't work well on it.  Unfortunatelly, such models
are still typical even in the 21st century.  We can't ignore them.

This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
such relatively cheap 2-4 socket machine are often used for traditional
servers as above.  The intention is that these machines don't use
zone_reclaim_mode.

Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
This patch doesn't change such high-end NUMA machine behavior.

Dave Hansen said:

: I know specifically of pieces of x86 hardware that set the information
: in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
: behavior which that implies.
:
: They've done performance testing and run very large and scary benchmarks
: to make sure that they _want_ this turned on.  What this means for them
: is that they'll probably be de-optimized, at least on newer versions of
: the kernel.
:
: If you want to do this for particular systems, maybe _that_'s what we
: should do.  Have a list of specific configurations that need the
: defaults overridden either because they're buggy, or they have an
: unusual hardware configuration not really reflected in the distance
: table.

And later said:

: The original change in the hardware tables was for the benefit of a
: benchmark.  Said benchmark isn't going to get run on mainline until the
: next batch of enterprise distros drops, at which point the hardware where
: this was done will be irrelevant for the benchmark.  I'm sure any new
: hardware will just set this distance to another yet arbitrary value to
: make the kernel do what it wants.  :)
:
: Also, when the hardware got _set_ to this initially, I complained.  So, I
: guess I'm getting my way now, with this patch.  I'm cool with it.

Reported-by: Robert Mueller &lt;robm@fastmail.fm&gt;
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Acked-by: Dave Hansen &lt;dave@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Add book scheduling domain</title>
<updated>2010-09-09T18:41:20Z</updated>
<author>
<name>Heiko Carstens</name>
<email>heiko.carstens@de.ibm.com</email>
</author>
<published>2010-08-31T08:28:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=01a08546af311c065f34727787dd0cc8dc0c216f'/>
<id>urn:sha1:01a08546af311c065f34727787dd0cc8dc0c216f</id>
<content type='text'>
On top of the SMT and MC scheduling domains this adds the BOOK scheduling
domain. This is useful for NUMA like machines which do not have an interface
which tells which piece of memory is attached to which node or where the
hardware performs striping.

Signed-off-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;20100831082844.253053798@de.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>topology: alternate fix for ia64 tiger_defconfig build breakage</title>
<updated>2010-08-10T03:44:57Z</updated>
<author>
<name>Lee Schermerhorn</name>
<email>Lee.Schermerhorn@hp.com</email>
</author>
<published>2010-08-10T00:19:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=251060006003b79b788f8ce5a827ee5354a42910'/>
<id>urn:sha1:251060006003b79b788f8ce5a827ee5354a42910</id>
<content type='text'>
Define stubs for the numa_*_id() generic percpu related functions for
non-NUMA configurations in &lt;asm-generic/topology.h&gt; where the other
non-numa stubs live.

Fixes ia64 !NUMA build breakage -- e.g., tiger_defconfig

Back out now unneeded '#ifndef CONFIG_NUMA' guards from ia64 smpboot.c

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Acked-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix spelling of sibling</title>
<updated>2010-06-29T08:44:29Z</updated>
<author>
<name>Michael Neuling</name>
<email>mikey@neuling.org</email>
</author>
<published>2010-06-29T02:02:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ec57d448b2e8fcfba539a46701b43f14f037f17'/>
<id>urn:sha1:2ec57d448b2e8fcfba539a46701b43f14f037f17</id>
<content type='text'>
No logic changes, only spelling.

Signed-off-by: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: linuxppc-dev@ozlabs.org
Cc: David Howells &lt;dhowells@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
LKML-Reference: &lt;15249.1277776921@neuling.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Add asymmetric group packing option for sibling domain</title>
<updated>2010-06-09T08:34:55Z</updated>
<author>
<name>Michael Neuling</name>
<email>mikey@neuling.org</email>
</author>
<published>2010-06-08T04:57:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=532cb4c401e225b084c14d6bd6a2f8ee561de2f1'/>
<id>urn:sha1:532cb4c401e225b084c14d6bd6a2f8ee561de2f1</id>
<content type='text'>
Check to see if the group is packed in a sched doman.

This is primarily intended to used at the sibling level.  Some cores
like POWER7 prefer to use lower numbered SMT threads.  In the case of
POWER7, it can move to lower SMT modes only when higher threads are
idle.  When in lower SMT modes, the threads will perform better since
they share less core resources.  Hence when we have idle threads, we
want them to be the higher ones.

This adds a hook into f_b_g() called check_asym_packing() to check the
packing.  This packing function is run on idle threads.  It checks to
see if the busiest CPU in this domain (core in the P7 case) has a
higher CPU number than what where the packing function is being run
on.  If it is, calculate the imbalance and return the higher busier
thread as the busiest group to f_b_g().  Here we are assuming a lower
CPU number will be equivalent to a lower SMT thread number.

It also creates a new SD_ASYM_PACKING flag to enable this feature at
any scheduler domain level.

It also creates an arch hook to enable this feature at the sibling
level.  The default function doesn't enable this feature.

Based heavily on patch from Peter Zijlstra.
Fixes from Srivatsa Vaddagiri.

Signed-off-by: Michael Neuling &lt;mikey@neuling.org&gt;
Signed-off-by: Srivatsa Vaddagiri &lt;vatsa@linux.vnet.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
LKML-Reference: &lt;20100608045702.2936CCC897@localhost.localdomain&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>numa: introduce numa_mem_id()- effective local memory node id</title>
<updated>2010-05-27T16:12:57Z</updated>
<author>
<name>Lee Schermerhorn</name>
<email>lee.schermerhorn@hp.com</email>
</author>
<published>2010-05-26T21:45:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7aac789885512388a66d47280d7e7777ffba1e59'/>
<id>urn:sha1:7aac789885512388a66d47280d7e7777ffba1e59</id>
<content type='text'>
Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define API in &lt;linux/topology.h&gt; when CONFIG_HAVE_MEMORYLESS_NODES
defined, else stubs.  Architectures will define HAVE_MEMORYLESS_NODES
if/when they support them.

Archs can override definitions of:

numa_mem_id() - returns node number of "local memory" node
set_numa_mem() - initialize [this cpus'] per cpu variable 'numa_mem'
cpu_to_mem()  - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist
rebuild.  Archs that support memoryless nodes will need to initialize
'numa_mem' for secondary cpus as they're brought on-line.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Signed-off-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>numa: add generic percpu var numa_node_id() implementation</title>
<updated>2010-05-27T16:12:57Z</updated>
<author>
<name>Lee Schermerhorn</name>
<email>lee.schermerhorn@hp.com</email>
</author>
<published>2010-05-26T21:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7281201922a0063fa60804ce39c277fc98142a47'/>
<id>urn:sha1:7281201922a0063fa60804ce39c277fc98142a47</id>
<content type='text'>
Rework the generic version of the numa_node_id() function to use the new
generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implemention will default this option to 'y'
when NUMA is configured.  This config option could be removed if/when all
archs switch over to the generic percpu implementation of numa_node_id().
Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
     when NUMA is configured.  This is required because I have
     retained the old implementation by default to allow archs to
     be modified incrementally, as desired.

Subsequent patches will convert x86_64 and ia64 to use this implemenation.

Signed-off-by: Lee Schermerhorn &lt;lee.schermerhorn@hp.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Reviewed-by: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Eric Whitney &lt;eric.whitney@hp.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Cc: &lt;linux-arch@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix vmark regression on big machines</title>
<updated>2010-01-21T12:39:03Z</updated>
<author>
<name>Mike Galbraith</name>
<email>efault@gmx.de</email>
</author>
<published>2010-01-04T13:44:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=50b926e439620c469565e8be0f28be78f5fca1ce'/>
<id>urn:sha1:50b926e439620c469565e8be0f28be78f5fca1ce</id>
<content type='text'>
SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to.  Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.

Reported-by: Lin Ming &lt;ming.m.lin@intel.com&gt;
Signed-off-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
LKML-Reference: &lt;1262612696.15495.15.camel@marge.simson.net&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Disable SD_PREFER_LOCAL for MC/CPU domains</title>
<updated>2009-10-14T13:02:34Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2009-10-09T10:16:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=799e2205ec65e174f752b558c62a92c4752df313'/>
<id>urn:sha1:799e2205ec65e174f752b558c62a92c4752df313</id>
<content type='text'>
Yanmin reported that both tbench and hackbench were significantly
hurt by trying to keep tasks local on these domains, esp on small
cache machines.

So disable it in order to promote spreading outside of the cache
domains.

Reported-by: "Zhang, Yanmin" &lt;yanmin_zhang@linux.intel.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
CC: Mike Galbraith &lt;efault@gmx.de&gt;
LKML-Reference: &lt;1255083400.8802.15.camel@laptop&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
