<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/node.h, branch v5.15.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-10-16T18:11:18Z</updated>
<entry>
<title>mm: don't panic when links can't be created in sysfs</title>
<updated>2020-10-16T18:11:18Z</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2020-10-16T03:09:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=90c7eaeb14a325a760d732184ff1fbed47e5fa98'/>
<id>urn:sha1:90c7eaeb14a325a760d732184ff1fbed47e5fa98</id>
<content type='text'>
At boot time, or when doing memory hot-add operations, if the links in
sysfs can't be created, the system is still able to run, so just report
the error in the kernel log rather than BUG_ON and potentially make system
unusable because the callpath can be called with locks held.

Since the number of memory blocks managed could be high, the messages are
rate limited.

As a consequence, link_mem_sections() has no status to report anymore.

Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: "Rafael J . Wysocki" &lt;rafael@kernel.org&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-4-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't rely on system state to detect hot-plug operations</title>
<updated>2020-09-26T17:33:57Z</updated>
<author>
<name>Laurent Dufour</name>
<email>ldufour@linux.ibm.com</email>
</author>
<published>2020-09-26T04:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f85086f95fa36194eb0db5cd5c12e56801b98523'/>
<id>urn:sha1:f85086f95fa36194eb0db5cd5c12e56801b98523</id>
<content type='text'>
In register_mem_sect_under_node() the system_state's value is checked to
detect whether the call is made during boot time or during an hot-plug
operation.  Unfortunately, that check against SYSTEM_BOOTING is wrong
because regular memory is registered at SYSTEM_SCHEDULING state.  In
addition, memory hot-plug operation can be triggered at this system
state by the ACPI [1].  So checking against the system state is not
enough.

The consequence is that on system with interleaved node's ranges like this:

 Early memory node ranges
   node   1: [mem 0x0000000000000000-0x000000011fffffff]
   node   2: [mem 0x0000000120000000-0x000000014fffffff]
   node   1: [mem 0x0000000150000000-0x00000001ffffffff]
   node   0: [mem 0x0000000200000000-0x000000048fffffff]
   node   2: [mem 0x0000000490000000-0x00000007ffffffff]

This can be seen on PowerPC LPAR after multiple memory hot-plug and
hot-unplug operations are done.  At the next reboot the node's memory
ranges can be interleaved and since the call to link_mem_sections() is
made in topology_init() while the system is in the SYSTEM_SCHEDULING
state, the node's id is not checked, and the sections registered to
multiple nodes:

  $ ls -l /sys/devices/system/memory/memory21/node*
  total 0
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node1 -&gt; ../../node/node1
  lrwxrwxrwx 1 root root     0 Aug 24 05:27 node2 -&gt; ../../node/node2

In that case, the system is able to boot but if later one of theses
memory blocks is hot-unplugged and then hot-plugged, the sysfs
inconsistency is detected and this is triggering a BUG_ON():

  kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4
  CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25
  Call Trace:
    add_memory_resource+0x23c/0x340 (unreliable)
    __add_memory+0x5c/0xf0
    dlpar_add_lmb+0x1b4/0x500
    dlpar_memory+0x1f8/0xb80
    handle_dlpar_errorlog+0xc0/0x190
    dlpar_store+0x198/0x4a0
    kobj_attr_store+0x30/0x50
    sysfs_kf_write+0x64/0x90
    kernfs_fop_write+0x1b0/0x290
    vfs_write+0xe8/0x290
    ksys_write+0xdc/0x130
    system_call_exception+0x160/0x270
    system_call_common+0xf0/0x27c

This patch addresses the root cause by not relying on the system_state
value to detect whether the call is due to a hot-plug operation.  An
extra parameter is added to link_mem_sections() detailing whether the
operation is due to a hot-plug operation.

[1] According to Oscar Salvador, using this qemu command line, ACPI
memory hotplug operations are raised at SYSTEM_SCHEDULING state:

  $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \
        -m size=$MEM,slots=255,maxmem=4294967296k  \
        -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \
        -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
        -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
        -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \
        -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \
        -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \
        -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \
        -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \

Fixes: 4fbce633910e ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()")
Signed-off-by: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: make register_mem_sect_under_node() static</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d595c4c0f768f19db043d378b22e98405f9fd47'/>
<id>urn:sha1:8d595c4c0f768f19db043d378b22e98405f9fd47</id>
<content type='text'>
It is only used internally.

Link: http://lkml.kernel.org/r/20190614100114.311-4-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a31b264c2b415b29660da0bc2ba291a98629ce51'/>
<id>urn:sha1:a31b264c2b415b29660da0bc2ba291a98629ce51</id>
<content type='text'>
We really don't want anything during memory hotunplug to fail.  We
always pass a valid memory block device, that check can go.  Avoid
allocating memory and eventually failing.  As we are always called under
lock, we can use a static piece of memory.  This avoids having to put
the structure onto the stack, having to guess about the stack size of
callers.

Patch inspired by a patch from Oscar Salvador.

In the future, there might be no need to iterate over nodes at all.
mem-&gt;nid should tell us exactly what to remove.  Memory block devices
with mixed nodes (added during boot) should properly fenced off and
never removed.

Link: http://lkml.kernel.org/r/20190527111152.16324-11-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: remove memory block devices before arch_remove_memory()</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-07-18T22:57:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c4b7f9ba9486c565aead99a198ceeef73ae81f6'/>
<id>urn:sha1:4c4b7f9ba9486c565aead99a198ceeef73ae81f6</id>
<content type='text'>
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices.  Remove the devices before calling
arch_remove_memory().

This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().

Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: "Rafael J. Wysocki" &lt;rafael@kernel.org&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: "mike.travis@hpe.com" &lt;mike.travis@hpe.com&gt;
Cc: Andrew Banman &lt;andrew.banman@hpe.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Mark Brown &lt;broonie@kernel.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Chintan Pandya &lt;cpandya@codeaurora.org&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: "Kirill A. Shutemov" &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>node: Add memory-side caching attributes</title>
<updated>2019-04-04T16:41:21Z</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2019-03-11T20:56:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=acc02a109b0497e917c83f986a89c51e47d0022c'/>
<id>urn:sha1:acc02a109b0497e917c83f986a89c51e47d0022c</id>
<content type='text'>
System memory may have caches to help improve access speed to frequently
requested address ranges. While the system provided cache is transparent
to the software accessing these memory ranges, applications can optimize
their own access based on cache attributes.

Provide a new API for the kernel to register these memory-side caches
under the memory node that provides it.

The new sysfs representation is modeled from the existing cpu cacheinfo
attributes, as seen from /sys/devices/system/cpu/&lt;cpu&gt;/cache/.  Unlike CPU
cacheinfo though, the node cache level is reported from the view of the
memory. A higher level number is nearer to the CPU, while lower levels
are closer to the last level memory.

The exported attributes are the cache size, the line size, associativity
indexing, and write back policy, and add the attributes for the system
memory caches to sysfs stable documentation.

Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Tested-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>node: Add heterogenous memory access attributes</title>
<updated>2019-04-04T16:41:21Z</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2019-03-11T20:56:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e1cf33aafb8462c7d0a0e6349925870316f040ee'/>
<id>urn:sha1:e1cf33aafb8462c7d0a0e6349925870316f040ee</id>
<content type='text'>
Heterogeneous memory systems provide memory nodes with different latency
and bandwidth performance attributes. Provide a new kernel interface
for subsystems to register the attributes under the memory target
node's initiator access class. If the system provides this information,
applications may query these attributes when deciding which node to
request memory.

The following example shows the new sysfs hierarchy for a node exporting
performance attributes:

  # tree -P "read*|write*"/sys/devices/system/node/nodeY/accessZ/initiators/
  /sys/devices/system/node/nodeY/accessZ/initiators/
  |-- read_bandwidth
  |-- read_latency
  |-- write_bandwidth
  `-- write_latency

The bandwidth is exported as MB/s and latency is reported in
nanoseconds. The values are taken from the platform as reported by the
manufacturer.

Memory accesses from an initiator node that is not one of the memory's
access "Z" initiator nodes linked in the same directory may observe
different performance than reported here. When a subsystem makes use
of this interface, initiators of a different access number may not have
the same performance relative to initiators in other access numbers, or
omitted from the any access class' initiators.

Descriptions for memory access initiator performance access attributes
are added to sysfs stable documentation.

Acked-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Tested-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>node: Link memory nodes to their compute nodes</title>
<updated>2019-04-04T16:41:20Z</updated>
<author>
<name>Keith Busch</name>
<email>keith.busch@intel.com</email>
</author>
<published>2019-03-11T20:56:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08d9dbe72b1f899468b2b34f9309e88a84f440f2'/>
<id>urn:sha1:08d9dbe72b1f899468b2b34f9309e88a84f440f2</id>
<content type='text'>
Systems may be constructed with various specialized nodes. Some nodes
may provide memory, some provide compute devices that access and use
that memory, and others may provide both. Nodes that provide memory are
referred to as memory targets, and nodes that can initiate memory access
are referred to as memory initiators.

Memory targets will often have varying access characteristics from
different initiators, and platforms may have ways to express those
relationships. In preparation for these systems, provide interfaces for
the kernel to export the memory relationship among different nodes memory
targets and their initiators with symlinks to each other.

If a system provides access locality for each initiator-target pair, nodes
may be grouped into ranked access classes relative to other nodes. The
new interface allows a subsystem to register relationships of varying
classes if available and desired to be exported.

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, do not necessarily perform the same as each other.

A memory target node may have multiple memory initiators. All linked
initiators in a target's class have the same access characteristics to
that target.

The following example show the nodes' new sysfs hierarchy for a memory
target node 'Y' with access class 0 from initiator node 'X':

  # symlinks -v /sys/devices/system/node/nodeX/access0/
  relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -&gt; ../../nodeY

  # symlinks -v /sys/devices/system/node/nodeY/access0/
  relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -&gt; ../../nodeX

The new attributes are added to the sysfs stable documentation.

Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Signed-off-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Tested-by: Brice Goglin &lt;Brice.Goglin@inria.fr&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()</title>
<updated>2018-08-17T23:20:29Z</updated>
<author>
<name>Oscar Salvador</name>
<email>osalvador@suse.de</email>
</author>
<published>2018-08-17T22:46:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4fbce633910ed80b135b84160a22b219080c8082'/>
<id>urn:sha1:4fbce633910ed80b135b84160a22b219080c8082</id>
<content type='text'>
link_mem_sections() and walk_memory_range() share most of the code, so
we can use convert link_mem_sections() into a dummy function that calls
walk_memory_range() with a callback to register_mem_sect_under_node().

This patch converts register_mem_sect_under_node() in order to match a
walk_memory_range's callback, getting rid of the check_nid argument and
checking instead if the system is still boothing, since we only have to
check for the nid if the system is in such state.

Link: http://lkml.kernel.org/r/20180622111839.10071-4-osalvador@techadventures.net
Signed-off-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Suggested-by: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Tested-by: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Tested-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: fix leftover use of struct page during hotplug</title>
<updated>2018-05-26T01:12:11Z</updated>
<author>
<name>Jonathan Cameron</name>
<email>Jonathan.Cameron@huawei.com</email>
</author>
<published>2018-05-25T21:47:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a21558618c5dfc55b6086743a88ce5a9c1588f0a'/>
<id>urn:sha1:a21558618c5dfc55b6086743a88ce5a9c1588f0a</id>
<content type='text'>
The case of a new numa node got missed in avoiding using the node info
from page_struct during hotplug.  In this path we have a call to
register_mem_sect_under_node (which allows us to specify it is hotplug
so don't change the node), via link_mem_sections which unfortunately
does not.

Fix is to pass check_nid through link_mem_sections as well and disable
it in the new numa node path.

Note the bug only 'sometimes' manifests depending on what happens to be
in the struct page structures - there are lots of them and it only needs
to match one of them.

The result of the bug is that (with a new memory only node) we never
successfully call register_mem_sect_under_node so don't get the memory
associated with the node in sysfs and meminfo for the node doesn't
report it.

It came up whilst testing some arm64 hotplug patches, but appears to be
universal.  Whilst I'm triggering it by removing then reinserting memory
to a node with no other elements (thus making the node disappear then
appear again), it appears it would happen on hotplugging memory where
there was none before and it doesn't seem to be related the arm64
patches.

These patches call __add_pages (where most of the issue was fixed by
Pavel's patch).  If there is a node at the time of the __add_pages call
then all is well as it calls register_mem_sect_under_node from there
with check_nid set to false.  Without a node that function returns
having not done the sysfs related stuff as there is no node to use.
This is expected but it is the resulting path that fails...

Exact path to the problem is as follows:

 mm/memory_hotplug.c: add_memory_resource()

   The node is not online so we enter the 'if (new_node)' twice, on the
   second such block there is a call to link_mem_sections which calls
   into

  drivers/node.c: link_mem_sections() which calls

  drivers/node.c: register_mem_sect_under_node() which calls
     get_nid_for_pfn and keeps trying until the output of that matches
     the expected node (passed all the way down from
     add_memory_resource)

It is effectively the same fix as the one referred to in the fixes tag
just in the code path for a new node where the comments point out we
have to rerun the link creation because it will have failed in
register_new_memory (as there was no node at the time).  (actually that
comment is wrong now as we don't have register_new_memory any more it
got renamed to hotplug_memory_register in Pavel's patch).

Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com
Fixes: fc44f7f9231a ("mm/memory_hotplug: don't read nid from struct page during hotplug")
Signed-off-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
