<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/memblock.h, branch v4.4.2</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-11-06T03:34:48Z</updated>
<entry>
<title>mm/memblock: make memblock_remove_range() static</title>
<updated>2015-11-06T03:34:48Z</updated>
<author>
<name>Alexander Kuleshov</name>
<email>kuleshovmail@gmail.com</email>
</author>
<published>2015-11-06T02:47:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=35bd16a227534cb6ffc9b26a33061c2dcf91934b'/>
<id>urn:sha1:35bd16a227534cb6ffc9b26a33061c2dcf91934b</id>
<content type='text'>
memblock_remove_range() is only used in mm/memblock.c, so we can make
it static.
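
For reference, the resulting definition in mm/memblock.c looks roughly
like this (a sketch of the signature only, not the full diff):

	static int __init_memblock memblock_remove_range(struct memblock_type *type,
							 phys_addr_t base,
							 phys_addr_t size)
	{
		...
	}

The now-unneeded extern declaration is dropped from
include/linux/memblock.h at the same time.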

Signed-off-by: Alexander Kuleshov &lt;kuleshovmail@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mem-hotplug: handle node hole when initializing numa_meminfo.</title>
<updated>2015-09-08T22:35:28Z</updated>
<author>
<name>Tang Chen</name>
<email>tangchen@cn.fujitsu.com</email>
</author>
<published>2015-09-08T22:02:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95cf82ecc1fcb44df1768162343cc8eb88083b86'/>
<id>urn:sha1:95cf82ecc1fcb44df1768162343cc8eb88083b86</id>
<content type='text'>
When parsing the SRAT, all memory ranges are added into numa_meminfo.  In
numa_init(), before entering numa_cleanup_meminfo(), all possible memory
ranges are in numa_meminfo.  numa_cleanup_meminfo() then removes all
ranges that are over max_pfn or empty.

But this only works if the nodes are contiguous.  Let's have a look at
the following example:

We have an SRAT like this:
SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug

On boot, only nodes 0, 1, 2 and 3 exist.

And the numa_meminfo will look like this:
numa_meminfo.nr_blks = 9
1. on node 0: [0, 60000000]
2. on node 0: [100000000, 20000000000]
3. on node 1: [20000000000, 40000000000]
4. on node 4: [40000000000, 60000000000]
5. on node 5: [60000000000, 80000000000]
6. on node 2: [80000000000, a0000000000]
7. on node 3: [a0000000000, a0800000000]
8. on node 6: [c0000000000, e0000000000]
9. on node 7: [e0000000000, 100000000000]

And numa_cleanup_meminfo() will merge 1 and 2, and remove 8 and 9 because
their end addresses are over max_pfn, which is a0800000000.  But 4 and 5
are not removed because their end addresses are less than max_pfn.  But in
fact, nodes 4 and 5 don't exist.

In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.

Since the memory ranges of nodes 4 and 5 are in numa_meminfo, in
numa_register_memblks(), nodes 4 and 5 will be mistakenly set to online.

If you run lscpu, it will show:
NUMA node0 CPU(s):     0-14,128-142
NUMA node1 CPU(s):     15-29,143-157
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s):     62-76,190-204
NUMA node5 CPU(s):     78-92,206-220

In this patch, we use memblock_overlaps_region() to check whether ranges in
numa_meminfo overlap with ranges in memblock.memory.  Since memblock.memory
contains all available memory at boot time, if they overlap, it means the
ranges exist.  If not, remove them from numa_meminfo.
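
The resulting check in numa_cleanup_meminfo() looks roughly like this
(a sketch, not the exact diff):

	for (i = 0; i &lt; mi-&gt;nr_blks; i++) {
		struct numa_memblk *bi = &amp;mi-&gt;blk[i];

		/* drop empty blocks and blocks covering no actual memory */
		if (bi-&gt;start &gt;= bi-&gt;end ||
		    !memblock_overlaps_region(&amp;memblock.memory, bi-&gt;start,
					      bi-&gt;end - bi-&gt;start))
			numa_remove_memblk_from(i--, mi);
	}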

After this patch, lscpu will show:
NUMA node0 CPU(s):     0-14,128-142
NUMA node1 CPU(s):     15-29,143-157
NUMA node4 CPU(s):     62-76,190-204
NUMA node5 CPU(s):     78-92,206-220

Signed-off-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Reviewed-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Vladimir Murzin &lt;vladimir.murzin@arm.com&gt;
Cc: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Alexander Kuleshov &lt;kuleshovmail@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memblock.c: make memblock_overlaps_region() return bool.</title>
<updated>2015-09-08T22:35:28Z</updated>
<author>
<name>Tang Chen</name>
<email>tangchen@cn.fujitsu.com</email>
</author>
<published>2015-09-08T22:02:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c5c5c9d1008fb15945d0173b3ca75931ef53ae1f'/>
<id>urn:sha1:c5c5c9d1008fb15945d0173b3ca75931ef53ae1f</id>
<content type='text'>
memblock_overlaps_region() checks if the given memblock region
intersects a region in memblock.  If so, it returns the index of the
intersected region.

But its only caller is memblock_is_region_reserved(), which simply
returns 0 if false and non-zero if true.

Both of these should return bool.
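
After the change, the declarations in include/linux/memblock.h read
(sketch):

	bool memblock_overlaps_region(struct memblock_type *type,
				      phys_addr_t base, phys_addr_t size);
	bool memblock_is_region_reserved(phys_addr_t base, phys_addr_t size);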

Signed-off-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Vladimir Murzin &lt;vladimir.murzin@arm.com&gt;
Cc: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Alexander Kuleshov &lt;kuleshovmail@gmail.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memblock: introduce a for_each_reserved_mem_region iterator</title>
<updated>2015-07-01T02:44:55Z</updated>
<author>
<name>Robin Holt</name>
<email>holt@sgi.com</email>
</author>
<published>2015-06-30T21:56:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8e7a7f8619f1f93736d9bb7e31caf4721bdc739d'/>
<id>urn:sha1:8e7a7f8619f1f93736d9bb7e31caf4721bdc739d</id>
<content type='text'>
Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot.  Patches were posted a long time
ago to defer initialisation until the pages were first used.  This was
rejected on the grounds that it should not be necessary to hurt the fast
paths.  This series reuses much of the work from that time but defers the
initialisation of memory to kswapd, so that one thread per node initialises
memory local to that node.

After applying the series and setting the appropriate Kconfig variable, I
see this in the boot log on a 64G machine:

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see:

[    8.406511] kswapd 3 initialised deferred memory in 1116ms
[    8.428518] kswapd 1 initialised deferred memory in 1140ms
[    8.435977] kswapd 0 initialised deferred memory in 1148ms
[    8.437416] kswapd 2 initialised deferred memory in 1148ms

Once booted, the machine appears to work as normal.  Boot times were
measured from the time shutdown was called until ssh was available again.
In the 64G case, the boot time savings are negligible.  On the 1TB machine,
the savings were 16 seconds.

Nate Zimmer said:

: On an older 8 TB box with lots and lots of cpus, the boot time, as
: measured from grub to login prompt, improved from 1484 seconds to
: exactly 1000 seconds.

Waiman Long said:

: I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system.  From
: grub menu to ssh login, the bootup time was 453s before the patch and 265s
: after the patch - a saving of 188s (42%).

Daniel Blueman said:

: On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing
: stock 4.0 boot in 7136s.  This drops to 2159s, or a 70% reduction with
: this patchset.  Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350)
: drops this to 1045s.

This patch (of 13):

As part of initializing struct pages in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), nothing had forced the reserved/allocated
4KiB pages to be initialized.

This helper function will be used for that expansion.
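
Typical usage of the new iterator is expected to look like this (sketch):

	u64 i;
	phys_addr_t start, end;

	for_each_reserved_mem_region(i, &amp;start, &amp;end) {
		/* initialise the struct pages covering [start, end) */
	}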

Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Signed-off-by: Nate Zimmer &lt;nzimmer@sgi.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Tested-by: Nate Zimmer &lt;nzimmer@sgi.com&gt;
Tested-by: Waiman Long &lt;waiman.long@hp.com&gt;
Tested-by: Daniel J Blueman &lt;daniel@numascale.com&gt;
Acked-by: Pekka Enberg &lt;penberg@kernel.org&gt;
Cc: Robin Holt &lt;robinmholt@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Waiman Long &lt;waiman.long@hp.com&gt;
Cc: Scott Norton &lt;scott.norton@hp.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memblock: allocate boot time data structures from mirrored memory</title>
<updated>2015-06-25T00:49:45Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2015-06-24T23:58:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95'/>
<id>urn:sha1:a3f5bafcc04aaf62990e0cf3ced1cc6d8dc6fe95</id>
<content type='text'>
Try to allocate all boot time kernel data structures from mirrored
memory.

If we run out of mirrored memory, print warnings but fall back to using
non-mirrored memory to make sure that we still boot.
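
The fallback in the allocator core works roughly like this (a sketch of
memblock_find_in_range(); surrounding code trimmed):

	ulong flags = choose_memblock_flags();
again:
	ret = memblock_find_in_range_node(size, align, start, end,
					  NUMA_NO_NODE, flags);
	if (!ret &amp;&amp; (flags &amp; MEMBLOCK_MIRROR)) {
		pr_warn("Could not allocate %pap bytes of mirrored memory\n",
			&amp;size);
		flags &amp;= ~MEMBLOCK_MIRROR;
		goto again;
	}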

By number of bytes, most of what we allocate at boot time is the page
structures: 64 bytes per 4K page on x86_64, or about 1.5% of total
system memory.  For workloads where the bulk of memory is allocated to
applications, this may represent a useful improvement to system
availability, since 1.5% of total memory might be a third of the memory
allocated to the kernel.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Cc: Xiexiuqi &lt;xiexiuqi@huawei.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute</title>
<updated>2015-06-25T00:49:44Z</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2015-06-24T23:58:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fc6daaf93151877748f8096af6b3fddb147f22d6'/>
<id>urn:sha1:fc6daaf93151877748f8096af6b3fddb147f22d6</id>
<content type='text'>
Some high end Intel Xeon systems report uncorrectable memory errors as a
recoverable machine check.  Linux has included code for some time to
process these and just signal the affected processes (or even recover
completely if the error was in a read-only page that can be replaced by
reading from disk).

But we have no recovery path for errors encountered during kernel code
execution.  Except for some very specific cases, we are unlikely to ever
be able to recover.

Enter memory mirroring.  Actually, the 3rd generation of memory mirroring.

Gen1: All memory is mirrored
	Pro: No s/w enabling - h/w just gets good data from other side of the
	     mirror
	Con: Halves effective memory capacity available to OS/applications

Gen2: Partial memory mirror - just mirror memory behind some memory controllers
	Pro: Keep more of the capacity
	Con: Nightmare to enable. Have to choose between allocating from
	     mirrored memory for safety vs. NUMA local memory for performance

Gen3: Address range partial memory mirror - some mirror on each memory
      controller
	Pro: Can tune the amount of mirror and keep NUMA performance
	Con: I have to write memory management code to implement

The current plan is just to use mirrored memory for kernel allocations.
This has been broken into two phases:

1) This patch series - find the mirrored memory, use it for boot time
   allocations

2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
   unused mirrored memory from mm/memblock.c and only give it out to
   select kernel allocations (this is still being scoped because
   page_alloc.c is scary).

This patch (of 3):

Add extra "flags" to memblock to allow selection of memory based on
attribute.  No functional changes
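
Concretely, the iterators gain a flags argument; callers that want the
old behaviour pass MEMBLOCK_NONE (sketch):

	u64 i;
	phys_addr_t start, end;

	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
				&amp;start, &amp;end, NULL)
		pr_info("free: %pa - %pa\n", &amp;start, &amp;end);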

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Hanjun Guo &lt;guohanjun@huawei.com&gt;
Cc: Xiexiuqi &lt;xiexiuqi@huawei.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Cc: Naoya Horiguchi &lt;nao.horiguchi@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memtest: use phys_addr_t for physical addresses</title>
<updated>2015-04-14T23:49:06Z</updated>
<author>
<name>Vladimir Murzin</name>
<email>vladimir.murzin@arm.com</email>
</author>
<published>2015-04-14T22:48:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f70baeeb9e2d2b2a37a4bd3727d709547c4ae41'/>
<id>urn:sha1:7f70baeeb9e2d2b2a37a4bd3727d709547c4ae41</id>
<content type='text'>
Since memtest might be used by other architectures, pass input parameters
as phys_addr_t instead of long to prevent overflow.
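
The prototype change is roughly:

	-void early_memtest(unsigned long start, unsigned long end);
	+void early_memtest(phys_addr_t start, phys_addr_t end);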

Signed-off-by: Vladimir Murzin &lt;vladimir.murzin@arm.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: move memtest under mm</title>
<updated>2015-04-14T23:49:06Z</updated>
<author>
<name>Vladimir Murzin</name>
<email>vladimir.murzin@arm.com</email>
</author>
<published>2015-04-14T22:48:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4a20799d11f64e6b8725cacc7619b1ae1dbf9acd'/>
<id>urn:sha1:4a20799d11f64e6b8725cacc7619b1ae1dbf9acd</id>
<content type='text'>
Memtest is a simple feature which fills memory with a given set of
patterns and validates memory contents; if bad memory regions are detected,
it reserves them via the memblock API.  Since the memblock API is widely
used by other architectures, this feature can be enabled outside of the
x86 world.
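
When a pattern mismatch is found, the bad range is taken out of the
allocator roughly like this (a sketch of what mm/memtest.c does):

	/* reserve the range that failed verification */
	pr_info("  %016llx bad mem addr %pa - %pa reserved\n",
		cpu_to_be64(pattern), &amp;start_bad, &amp;end_bad);
	memblock_reserve(start_bad, end_bad - start_bad);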

This patch set promotes memtest to live under the generic mm umbrella and
enables the memtest feature for arm/arm64.

It was reported that this patch set was useful for tracking down an issue
with some errant DMA on an arm64 platform.

This patch (of 6):

There is nothing platform-dependent in the core memtest code, so other
platforms might benefit from this feature too.

[linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
Signed-off-by: Vladimir Murzin &lt;vladimir.murzin@arm.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Tested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Paul Bolle &lt;pebolle@tiscali.nl&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include/linux/memblock.h: add __init to memblock_set_bottom_up()</title>
<updated>2014-08-07T01:01:15Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2014-08-06T23:05:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2cfb3665e864755400dc57b6ceee2ebb6b382910'/>
<id>urn:sha1:2cfb3665e864755400dc57b6ceee2ebb6b382910</id>
<content type='text'>
memblock_set_bottom_up() is only called by __init
cmdline_parse_movable_node() and __init numa_init(), so it can be marked
__init as well.
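
The resulting inline in include/linux/memblock.h (sketch):

	static inline void __init memblock_set_bottom_up(bool enable)
	{
		memblock.bottom_up = enable;
	}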

Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Reviewed-by: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memblock: introduce memblock_alloc_range()</title>
<updated>2014-06-04T23:53:57Z</updated>
<author>
<name>Akinobu Mita</name>
<email>akinobu.mita@gmail.com</email>
</author>
<published>2014-06-04T23:06:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2bfc2862c4fe38379a2fb2cfba33fad32ccb4ff4'/>
<id>urn:sha1:2bfc2862c4fe38379a2fb2cfba33fad32ccb4ff4</id>
<content type='text'>
This introduces memblock_alloc_range(), which allocates memory from the
specified range of physical addresses.  I would like to use this function
to specify the location of CMA.
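
The signature as introduced (in this tree it later gains a flags
argument):

	phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
						phys_addr_t start, phys_addr_t end);

e.g. to place a CMA area inside [base, limit):

	addr = memblock_alloc_range(size, alignment, base, limit);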

Signed-off-by: Akinobu Mita &lt;akinobu.mita@gmail.com&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Cc: Don Dutile &lt;ddutile@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Yinghai Lu &lt;yinghai@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
