<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/mm/memory_hotplug.c, branch v4.14.329</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.329</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.329'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-09-22T09:45:34Z</updated>
<entry>
<title>mm/memory_hotplug: use "unsigned long" for PFN in zone_for_pfn_range()</title>
<updated>2021-09-22T09:45:34Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2021-09-08T02:54:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=699bbc8bca3992bd42eb141787c0db452f485c2d'/>
<id>urn:sha1:699bbc8bca3992bd42eb141787c0db452f485c2d</id>
<content type='text'>
commit 7cf209ba8a86410939a24cb1aeb279479a7e0ca6 upstream.

Patch series "mm/memory_hotplug: preparatory patches for new online policy and memory"

These are all cleanups and one fix previously sent as part of [1]:
[PATCH v1 00/12] mm/memory_hotplug: "auto-movable" online policy and memory
groups.

These patches make sense even without the other series, therefore I pulled
them out to make the other series easier to digest.

[1] https://lkml.kernel.org/r/20210607195430.48228-1-david@redhat.com

This patch (of 4):

Checkpatch complained on a follow-up patch that we are using "unsigned"
here, which defaults to "unsigned int" and checkpatch is correct.

As we will search for a fitting zone using the wrong pfn, we might end
up onlining memory to one of the special kernel zones, such as ZONE_DMA,
which can end badly as the onlined memory does not satisfy properties of
these zones.

Use "unsigned long" instead, just as we do in other places when handling
PFNs.  This can bite us once we have physical addresses in the range of
multiple TB.
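
As a rough illustration of the truncation (a standalone user-space
sketch, not the kernel code; the 4 KiB page size, the toy zone layout
and every toy_* name are assumptions made for this example): a PFN at
or above 2^32 wraps when it is passed through an "unsigned int"
parameter, so the zone lookup runs on the wrong value.

<![CDATA[
```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

enum toy_zone { TOY_ZONE_DMA, TOY_ZONE_NORMAL };

/* Toy layout: PFNs below 4096 (the first 16 MiB) belong to the DMA zone. */
enum toy_zone toy_zone_for_pfn(unsigned long long pfn)
{
	return pfn < 4096 ? TOY_ZONE_DMA : TOY_ZONE_NORMAL;
}

/* Buggy variant: the PFN is narrowed to 32 bits before the lookup. */
enum toy_zone toy_zone_for_pfn_buggy(unsigned int pfn)
{
	return toy_zone_for_pfn(pfn);
}

/* Convert a physical address to its page frame number. */
unsigned long long toy_addr_to_pfn(unsigned long long addr)
{
	return addr >> TOY_PAGE_SHIFT;
}

/* Self-check: 16 TiB is PFN 2^32, which wraps to PFN 0 in 32 bits. */
void toy_check(void)
{
	unsigned long long pfn = toy_addr_to_pfn(16ULL << 40);

	assert(toy_zone_for_pfn(pfn) == TOY_ZONE_NORMAL);
	assert(toy_zone_for_pfn_buggy((unsigned int)pfn) == TOY_ZONE_DMA);
}
```
]]>

Under these assumptions, the address 16 TiB has PFN 2^32, which wraps
to PFN 0 and therefore appears to fall into the toy DMA zone.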

Link: https://lkml.kernel.org/r/20210712124052.26491-2-david@redhat.com
Fixes: e5e689302633 ("mm, memory_hotplug: display allowed zones in the preferred ordering")
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pankaj Gupta &lt;pankaj.gupta@ionos.com&gt;
Reviewed-by: Muchun Song &lt;songmuchun@bytedance.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Cc: "Michael S. Tsirkin" &lt;mst@redhat.com&gt;
Cc: Jason Wang &lt;jasowang@redhat.com&gt;
Cc: Pankaj Gupta &lt;pankaj.gupta.linux@gmail.com&gt;
Cc: Wei Yang &lt;richard.weiyang@linux.alibaba.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mike Rapoport &lt;rppt@kernel.org&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Heiko Carstens &lt;hca@linux.ibm.com&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: virtualization@lists.linux-foundation.org
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Anton Blanchard &lt;anton@ozlabs.org&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jia He &lt;justin.he@arm.com&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Cc: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: Laurent Dufour &lt;ldufour@linux.ibm.com&gt;
Cc: Michel Lespinasse &lt;michel@lespinasse.org&gt;
Cc: Nathan Lynch &lt;nathanl@linux.ibm.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Pierre Morel &lt;pmorel@linux.ibm.com&gt;
Cc: "Rafael J. Wysocki" &lt;rafael.j.wysocki@intel.com&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Scott Cheloha &lt;cheloha@linux.ibm.com&gt;
Cc: Sergei Trofimovich &lt;slyfox@gentoo.org&gt;
Cc: Thiago Jung Bauermann &lt;bauerman@linux.ibm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: don't access uninitialized memmaps in shrink_zone_span()</title>
<updated>2019-12-01T08:14:18Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-11-22T01:53:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b5ca5acbe49449598882b102440df587e40c147a'/>
<id>urn:sha1:b5ca5acbe49449598882b102440df587e40c147a</id>
<content type='text'>
commit 7ce700bf11b5e2cb84e4352bbdf2123a7a239c84 upstream.

Let's limit shrinking to !ZONE_DEVICE so we can fix the current code.
We should never try to touch the memmap of offline sections where we
could have uninitialized memmaps and could trigger BUGs when calling
page_to_nid() on poisoned pages.

There is no reliable way to distinguish an uninitialized memmap from an
initialized memmap that belongs to ZONE_DEVICE, as we don't have
anything like SECTION_IS_ONLINE we can use similar to
pfn_to_online_section() for !ZONE_DEVICE memory.

E.g., set_zone_contiguous() similarly relies on pfn_to_online_section()
and will therefore never mark a ZONE_DEVICE zone contiguous.  No longer
shrinking the ZONE_DEVICE zone therefore results in no observable changes,
besides /proc/zoneinfo indicating different boundaries - something we
can totally live with.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized with 0 and the node with the right
value.  So the zone might be wrong but not garbage.  After that commit,
both the zone and the node will be garbage when touching uninitialized
memmaps.

Toshiki reported a BUG (race between delayed initialization of
ZONE_DEVICE memmaps without holding the memory hotplug lock and
concurrent zone shrinking).

  https://lkml.org/lkml/2019/11/14/1040

"Iteration of create and destroy namespace causes the panic as below:

      kernel BUG at mm/page_alloc.c:535!
      CPU: 7 PID: 2766 Comm: ndctl Not tainted 5.4.0-rc4 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:set_pfnblock_flags_mask+0x95/0xf0
      Call Trace:
       memmap_init_zone_device+0x165/0x17c
       memremap_pages+0x4c1/0x540
       devm_memremap_pages+0x1d/0x60
       pmem_attach_disk+0x16b/0x600 [nd_pmem]
       nvdimm_bus_probe+0x69/0x1c0
       really_probe+0x1c2/0x3e0
       driver_probe_device+0xb4/0x100
       device_driver_attach+0x4f/0x60
       bind_store+0xc9/0x110
       kernfs_fop_write+0x116/0x190
       vfs_write+0xa5/0x1a0
       ksys_write+0x59/0xd0
       do_syscall_64+0x5b/0x180
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

  While creating a namespace and initializing memmap, if you destroy the
  namespace and shrink the zone, it will initialize the memmap outside
  the zone and trigger VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page),
  pfn), page) in set_pfnblock_flags_mask()."

This BUG is also mitigated by this commit, which for now stops shrinking
the ZONE_DEVICE zone until we can do it in a safe and clean way.

Link: http://lkml.kernel.org/r/20191006085646.5768-5-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e86b319]
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reported-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Reported-by: Toshiki Fukasawa &lt;t-fukasawa@vx.jp.nec.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Damian Tometzki &lt;damian.tometzki@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Pankaj Gupta &lt;pagupta@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Steve Capper &lt;steve.capper@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.13+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: make add_memory() take the device_hotplug_lock</title>
<updated>2019-12-01T08:13:57Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2018-10-30T22:10:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5cb8388a680a363ba9a8cca8f81687f9b0d238bb'/>
<id>urn:sha1:5cb8388a680a363ba9a8cca8f81687f9b0d238bb</id>
<content type='text'>
[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ]

add_memory() currently does not take the device_hotplug_lock; however,
it is already called under the lock from
	arch/powerpc/platforms/pseries/hotplug-memory.c
	drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.

In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline requests (e.g.  from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3be ("mm, hotplug: fix concurrent memory
hot-add deadlock").  add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.

Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.

The lock is not held yet in
	drivers/xen/balloon.c
	arch/powerpc/platforms/powernv/memtrace.c
	drivers/s390/char/sclp_cmd.c
	drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
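
A minimal sketch of the locked/unlocked split described above
(user-space C with a boolean standing in for device_hotplug_lock; all
toy_* names are hypothetical and do not correspond to the actual kernel
symbols):

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for device_hotplug_lock; a real lock would block, not assert. */
static bool toy_hotplug_lock_held;

void toy_lock_device_hotplug(void)
{
	assert(!toy_hotplug_lock_held);
	toy_hotplug_lock_held = true;
}

void toy_unlock_device_hotplug(void)
{
	assert(toy_hotplug_lock_held);
	toy_hotplug_lock_held = false;
}

/* Variant for callers that already hold the lock. */
int toy_add_memory_locked(unsigned long long start, unsigned long long size)
{
	assert(toy_hotplug_lock_held);
	(void)start;
	/* ... memory block devices would be created here ... */
	return size ? 0 : -1;
}

/* Variant for callers that do not hold the lock yet. */
int toy_add_memory(unsigned long long start, unsigned long long size)
{
	int rc;

	toy_lock_device_hotplug();
	rc = toy_add_memory_locked(start, size);
	toy_unlock_device_hotplug();
	return rc;
}
```
]]>

In this model, callers that already run under the lock would use
toy_add_memory_locked() directly; all others go through
toy_add_memory().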

Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module.  If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).

Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Reviewed-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Rashmica Gupta &lt;rashmica.g@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: "Rafael J. Wysocki" &lt;rjw@rjwysocki.net&gt;
Cc: Len Brown &lt;lenb@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: Nathan Fontenot &lt;nfont@linux.vnet.ibm.com&gt;
Cc: John Allen &lt;jallen@linux.vnet.ibm.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Mathieu Malaterre &lt;malat@debian.org&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: YASUAKI ISHIMATSU &lt;yasu.isimatu@gmail.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Haiyang Zhang &lt;haiyangz@microsoft.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kate Stewart &lt;kstewart@linuxfoundation.org&gt;
Cc: "K. Y. Srinivasan" &lt;kys@microsoft.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Philippe Ombredanne &lt;pombredanne@nexb.com&gt;
Cc: Stephen Hemminger &lt;sthemmin@microsoft.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: fix updating the node span</title>
<updated>2019-11-24T07:22:47Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-11-06T05:17:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=630ca7d4c08948e25ae4d6dc8c6f1fa783b56a41'/>
<id>urn:sha1:630ca7d4c08948e25ae4d6dc8c6f1fa783b56a41</id>
<content type='text'>
commit 656d571193262a11c2daa4012e53e4d645bbce56 upstream.

We recently started updating the node span based on the zone span to
avoid touching uninitialized memmaps.

Currently, we will always detect the node span to start at 0, meaning a
node can easily span too many pages.  pgdat_is_empty() will still work
correctly if all zones span no pages.  We should skip over all zones
without spanned pages and properly handle the first detected zone that
spans pages.
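
The intended span calculation can be sketched as follows (a simplified
user-space model, not the actual kernel function; struct demo_zone,
struct demo_span and demo_node_span() are names made up for this
example):

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>

struct demo_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

struct demo_span {
	unsigned long start_pfn;
	unsigned long end_pfn;	/* exclusive; start == end == 0 means empty */
};

/* Derive the node span from the zone spans, ignoring empty zones. */
struct demo_span demo_node_span(const struct demo_zone *zones, size_t nr_zones)
{
	struct demo_span span = { 0, 0 };
	size_t i;

	assert(zones != NULL || nr_zones == 0);
	for (i = 0; i < nr_zones; i++) {
		unsigned long zone_end = zones[i].start_pfn + zones[i].spanned_pages;

		if (!zones[i].spanned_pages)
			continue;	/* an empty zone must not pull the start to 0 */
		if (!span.end_pfn) {
			/* first populated zone: take its span as-is */
			span.start_pfn = zones[i].start_pfn;
			span.end_pfn = zone_end;
			continue;
		}
		if (zone_end > span.end_pfn)
			span.end_pfn = zone_end;
		if (zones[i].start_pfn < span.start_pfn)
			span.start_pfn = zones[i].start_pfn;
	}
	return span;
}
```
]]>

If no zone spans any pages, the result stays at start == end == 0,
which is what an emptiness check on the node would expect.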

Unfortunately, in contrast to the zone span (/proc/zoneinfo), the node
span cannot easily be inspected and tested.  The node span gives no real
guarantees when an architecture supports memory hotplug, meaning it can
easily contain holes or span pages of different nodes.

The node span is not really used after init on architectures that
support memory hotplug.

E.g., we use it in mm/memory_hotplug.c:try_offline_node() and in
mm/kmemleak.c:kmemleak_scan().  These users seem to be fine.

Link: http://lkml.kernel.org/r/20191027222714.5313-1-david@redhat.com
Fixes: 00d6c019b5bc ("mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()")
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/memory_hotplug: don't access uninitialized memmaps in shrink_pgdat_span()</title>
<updated>2019-11-24T07:22:46Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-10-19T03:19:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b55841e18213b42e6ee380e0cc1c03c9b406fce0'/>
<id>urn:sha1:b55841e18213b42e6ee380e0cc1c03c9b406fce0</id>
<content type='text'>
commit 00d6c019b5bc175cee3770e0e659f2b5f4804ea5 upstream.

We might use the nid of memmaps that were never initialized.  For
example, if the memmap was poisoned, we will crash the kernel in
pfn_to_nid() right now.  Let's use the calculated boundaries of the
separate zones instead.  This now also avoids having to iterate over a
whole bunch of subsections again, after shrinking one zone.

Before commit d0dc12e86b31 ("mm/memory_hotplug: optimize memory
hotplug"), the memmap was initialized to 0 and the node was set to the
right value.  After that commit, the node might be garbage.

We'll have to fix shrink_zone_span() next.

Link: http://lkml.kernel.org/r/20191006085646.5768-4-david@redhat.com
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[d0dc12e86b319]
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Reported-by: Aneesh Kumar K.V &lt;aneesh.kumar@linux.ibm.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: David Hildenbrand &lt;david@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Cc: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Cc: Alexander Potapenko &lt;glider@google.com&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Christian Borntraeger &lt;borntraeger@de.ibm.com&gt;
Cc: Christophe Leroy &lt;christophe.leroy@c-s.fr&gt;
Cc: Damian Tometzki &lt;damian.tometzki@gmail.com&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Fenghua Yu &lt;fenghua.yu@intel.com&gt;
Cc: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Halil Pasic &lt;pasic@linux.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ira Weiny &lt;ira.weiny@intel.com&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Jun Yao &lt;yaojun8558363@gmail.com&gt;
Cc: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masahiro Yamada &lt;yamada.masahiro@socionext.com&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Mike Rapoport &lt;rppt@linux.ibm.com&gt;
Cc: Pankaj Gupta &lt;pagupta@redhat.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Pavel Tatashin &lt;pavel.tatashin@microsoft.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qian Cai &lt;cai@lca.pw&gt;
Cc: Rich Felker &lt;dalias@libc.org&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Steve Capper &lt;steve.capper@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: Vasily Gorbik &lt;gor@linux.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Wei Yang &lt;richard.weiyang@gmail.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yoshinori Sato &lt;ysato@users.sourceforge.jp&gt;
Cc: Yu Zhao &lt;yuzhao@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[4.13+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone</title>
<updated>2019-03-13T21:03:18Z</updated>
<author>
<name>Mikhail Zaslonko</name>
<email>zaslonko@linux.ibm.com</email>
</author>
<published>2019-02-01T22:20:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d83382c6c3c751867dbbadd0d4a2740ba028dd85'/>
<id>urn:sha1:d83382c6c3c751867dbbadd0d4a2740ba028dd85</id>
<content type='text'>
[ Upstream commit 24feb47c5fa5b825efb0151f28906dfdad027e61 ]

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   test_pages_in_a_zone+0xde/0x160
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.
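
The idea of the check can be sketched like this (a user-space model
under the assumption that a zone is a half-open PFN range; struct
pfn_range and both range_* helpers are hypothetical names, not the
kernel implementation):

<![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open PFN range [start_pfn, end_pfn) covered by a zone. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* True if pfn lies inside the zone's range. */
bool range_spans_pfn(const struct pfn_range *zone, unsigned long pfn)
{
	return zone->start_pfn <= pfn && pfn < zone->end_pfn;
}

/*
 * Walk [start_pfn, end_pfn) and report whether every PFN is inside the zone.
 * The range check runs before anything that would touch per-page data.
 */
bool range_all_in_zone(const struct pfn_range *zone,
		       unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	assert(zone);
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!range_spans_pfn(zone, pfn))
			return false;	/* left the zone: stop before touching the page */
	}
	return true;
}
```
]]>

With 4 KiB pages (an assumption), mem=2050M ends at PFN 524800, so a
walk over a full section starting at PFN 524288 leaves the zone after
512 pages and is rejected in this model.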

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Signed-off-by: Mikhail Zaslonko &lt;zaslonko@linux.ibm.com&gt;
Tested-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Tested-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone</title>
<updated>2019-03-13T21:03:17Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2019-02-01T22:20:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=659052d1f16e0c3af60a371fa0147c007eba5a97'/>
<id>urn:sha1:659052d1f16e0c3af60a371fa0147c007eba5a97</id>
<content type='text'>
[ Upstream commit efad4e475c312456edb3c789d0996d12ed744c13 ]

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged, which
caused a regression [2][3].  The reason is that there might be memory
layouts where two NUMA nodes share the same memory section, so the
merged fix is simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for the `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an uninitialized struct page.  Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.
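
Enforcing the zone range amounts to clamping the end of the walk to
the end of the zone.  A minimal sketch (user-space C; clamp_min_pfn()
and clamp_walk_end() are names invented for this example, not kernel
functions):

<![CDATA[
```c
#include <assert.h>

/* Smaller of two PFNs. */
unsigned long clamp_min_pfn(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * End (exclusive) of the PFN walk for a block of nr_pages starting at
 * start_pfn, clamped so the walk never leaves the zone that ends at
 * zone_end_pfn (exclusive).
 */
unsigned long clamp_walk_end(unsigned long start_pfn, unsigned long nr_pages,
			     unsigned long zone_end_pfn)
{
	assert(start_pfn <= zone_end_pfn);
	return clamp_min_pfn(start_pfn + nr_pages, zone_end_pfn);
}
```
]]>

A block that lies fully inside the zone keeps its natural end; a block
that straddles the zone boundary is cut off at the boundary.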

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reported-by: Mikhail Zaslonko &lt;zaslonko@linux.ibm.com&gt;
Debugged-by: Mikhail Zaslonko &lt;zaslonko@linux.ibm.com&gt;
Tested-by: Gerald Schaefer &lt;gerald.schaefer@de.ibm.com&gt;
Tested-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined</title>
<updated>2019-01-13T09:01:02Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2018-12-28T08:38:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2c25071bed4b1f9c4cfb10a7914847d7069794bf'/>
<id>urn:sha1:2c25071bed4b1f9c4cfb10a7914847d7069794bf</id>
<content type='text'>
commit b15c87263a69272423771118c653e9a1d0672caa upstream.

We have received a bug report that an injected MCE about faulty memory
prevents memory offlining from succeeding on a 4.4 based kernel.  The
underlying reason was that the HWPoison page has an elevated reference
count and the migration keeps failing.  There are two problems with
that.  First of all, it is dubious to migrate the poisoned page because
we know that accessing that memory may fail.  Secondly, it doesn't make
any sense to migrate potentially broken content and preserve the memory
corruption over to a new location.

Oscar has found out that 4.4 and the current upstream kernels behave
slightly differently with his simple testcase:

===

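#include &lt;fcntl.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/mman.h&gt;
#include &lt;unistd.h&gt;

/* Assumes 4 KiB pages: round an address up to the next page boundary. */
#define PAGE_ALIGN(addr) (((addr) + 4095UL) &amp; ~4095UL)

int main(void)
{
        int ret;
        int i;
        int fd;
        char *array = malloc(4096);
        char *array_locked = malloc(4096);

        fd = open("/tmp/data", O_RDONLY);
        read(fd, array, 4095);

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        /* Lock one page; sizeof(array_locked) is only the pointer size. */
        ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), 4096);
        if (ret)
                perror("mlock");

        sleep (20);

        ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON);
        if (ret)
                perror("madvise");

        for (i = 0; i &lt; 4096; i++)
                array_locked[i] = 'd';

        return 0;
}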
===

+ offline this memory.

In 4.4 kernels he saw the hwpoisoned page being returned to the LRU
list:
kernel:  [&lt;ffffffff81019ac9&gt;] dump_trace+0x59/0x340
kernel:  [&lt;ffffffff81019e9a&gt;] show_stack_log_lvl+0xea/0x170
kernel:  [&lt;ffffffff8101ac71&gt;] show_stack+0x21/0x40
kernel:  [&lt;ffffffff8132bb90&gt;] dump_stack+0x5c/0x7c
kernel:  [&lt;ffffffff810815a1&gt;] warn_slowpath_common+0x81/0xb0
kernel:  [&lt;ffffffff811a275c&gt;] __pagevec_lru_add_fn+0x14c/0x160
kernel:  [&lt;ffffffff811a2eed&gt;] pagevec_lru_move_fn+0xad/0x100
kernel:  [&lt;ffffffff811a334c&gt;] __lru_cache_add+0x6c/0xb0
kernel:  [&lt;ffffffff81195236&gt;] add_to_page_cache_lru+0x46/0x70
kernel:  [&lt;ffffffffa02b4373&gt;] extent_readpages+0xc3/0x1a0 [btrfs]
kernel:  [&lt;ffffffff811a16d7&gt;] __do_page_cache_readahead+0x177/0x200
kernel:  [&lt;ffffffff811a18c8&gt;] ondemand_readahead+0x168/0x2a0
kernel:  [&lt;ffffffff8119673f&gt;] generic_file_read_iter+0x41f/0x660
kernel:  [&lt;ffffffff8120e50d&gt;] __vfs_read+0xcd/0x140
kernel:  [&lt;ffffffff8120e9ea&gt;] vfs_read+0x7a/0x120
kernel:  [&lt;ffffffff8121404b&gt;] kernel_read+0x3b/0x50
kernel:  [&lt;ffffffff81215c80&gt;] do_execveat_common.isra.29+0x490/0x6f0
kernel:  [&lt;ffffffff81215f08&gt;] do_execve+0x28/0x30
kernel:  [&lt;ffffffff81095ddb&gt;] call_usermodehelper_exec_async+0xfb/0x130
kernel:  [&lt;ffffffff8161c045&gt;] ret_from_fork+0x55/0x80

And the latter confuses the hotremove path because it attempts to
migrate an LRU page and that fails due to an elevated reference
count.  It is quite possible that the reuse of the HWPoisoned page is some
kind of fixed race condition but I am not really sure about that.

With the upstream kernel the failure is slightly different.  The page
doesn't seem to have LRU bit set but isolate_movable_page simply fails and
do_migrate_range simply puts all the isolated pages back to LRU and
therefore no progress is made and scan_movable_pages finds same set of
pages over and over again.

Fix both cases by explicitly checking for HWPoisoned pages before we
even try to get a reference on the page, and by trying to unmap the page
if it is still mapped.  As explained by Naoya:

: Hwpoison code never unmapped those for no big reason because
: Ksm pages never dominate memory, so we simply didn't have strong
: motivation to save the pages.

Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU
HWPoison pages which shouldn't happen but I couldn't convince myself about
that.  Naoya has noted the following:

: Theoretically there is no such guarantee, because try_to_unmap() doesn't
: have a guarantee of success and then memory_failure() returns immediately
: when hwpoison_user_mappings fails.
: Or the following code (comes after hwpoison_user_mappings block) also
: implies that the target page can still have PageLRU flag.
:
:         /*
:          * Torn down by someone else?
:          */
:         if (PageLRU(p) &amp;&amp; !PageSwapCache(p) &amp;&amp; p-&gt;mapping == NULL) {
:                 action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
:                 res = -EBUSY;
:                 goto out;
:         }
:
: So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in
: current version of your patch.

Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Debugged-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Tested-by: Oscar Salvador &lt;osalvador@suse.com&gt;
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/memory_hotplug: define find_{smallest|biggest}_section_pfn as unsigned long</title>
<updated>2017-10-04T00:54:26Z</updated>
<author>
<name>YASUAKI ISHIMATSU</name>
<email>yasu.isimatu@gmail.com</email>
</author>
<published>2017-10-03T23:16:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d09b0137d204bebeaafed672bc5a244e9ac92edb'/>
<id>urn:sha1:d09b0137d204bebeaafed672bc5a244e9ac92edb</id>
<content type='text'>
find_{smallest|biggest}_section_pfn() find the smallest/biggest section
and return the pfn of the section.  But the functions are defined as int,
so they can only return values in the range 0x00000000 - 0xffffffff.
This means that if the memory address is over 16TB, the functions do not
work correctly.

To handle 64-bit values, the patch defines
find_{smallest|biggest}_section_pfn() as unsigned long.
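
As a rough illustration (not the kernel code), the sketch below models
what a 32-bit return type does to a large pfn.  The helper names are
hypothetical, 4KB pages are assumed (so 16TB corresponds to pfn
0x100000000), and unsigned types are used so that the wrap-around is well
defined in plain C:

```c
/* Hypothetical stand-in for a helper whose return type is 32 bits wide. */
static unsigned int sketch_section_pfn_32(unsigned long long pfn)
{
	return (unsigned int)pfn;	/* only the low 32 bits survive */
}

/* The same helper with a 64-bit return type; the pfn is preserved. */
static unsigned long long sketch_section_pfn_64(unsigned long long pfn)
{
	return pfn;
}
```

With these definitions, sketch_section_pfn_32(0x100000000ULL) yields 0,
while sketch_section_pfn_64() returns the pfn unchanged.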

Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory")
Link: http://lkml.kernel.org/r/d9d5593a-d0a4-c4be-ab08-493df59a85c6@gmail.com
Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug: change pfn_to_section_nr/section_nr_to_pfn macro to inline function</title>
<updated>2017-10-04T00:54:25Z</updated>
<author>
<name>YASUAKI ISHIMATSU</name>
<email>yasu.isimatu@gmail.com</email>
</author>
<published>2017-10-03T23:16:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1dd2bfc86818ddbc95f98e312e7704350223fd7d'/>
<id>urn:sha1:1dd2bfc86818ddbc95f98e312e7704350223fd7d</id>
<content type='text'>
pfn_to_section_nr() and section_nr_to_pfn() are defined as macros.
pfn_to_section_nr() has no issue even if it is defined as a macro.  But
section_nr_to_pfn() has an overflow issue if sec is defined as int.

section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT.  If sec is
defined as unsigned long, section_nr_to_pfn() returns the pfn as a 64-bit
value.  But if sec is defined as int, section_nr_to_pfn() returns the pfn
as a 32-bit value.

__remove_section() calculates start_pfn using section_nr_to_pfn() and
scn_nr, which is defined as int.  So if the hot-removed memory address is
over 16TB, an overflow occurs and section_nr_to_pfn() does not calculate
the correct pfn.

To make callers pass an argument of the proper type, the patch changes
the macros to inline functions.
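
A minimal sketch of the overflow, assuming 128MB sections and 4KB pages
(one section then spans 32768 pfns, i.e. PFN_SECTION_SHIFT is 15).  All
names are hypothetical, the shift is written as a multiplication, and an
unsigned 32-bit type stands in for int so that the wrap-around is well
defined:

```c
/* Assumed value: 128MB sections / 4KB pages = 32768 pfns per section. */
#define SKETCH_PAGES_PER_SECTION 32768U

/* 32-bit section number: the multiplication is done in 32 bits and wraps. */
static unsigned long long sketch_nr_to_pfn_32(unsigned int sec)
{
	unsigned int pfn = sec * SKETCH_PAGES_PER_SECTION;

	return pfn;
}

/* 64-bit section number: the full pfn is kept. */
static unsigned long long sketch_nr_to_pfn_64(unsigned long long sec)
{
	return sec * SKETCH_PAGES_PER_SECTION;
}
```

Under these assumptions section 131072 starts at 16TB: the 32-bit variant
returns 0 for it, while the 64-bit variant returns 0x100000000.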

Fixes: 815121d2b5cd ("memory_hotplug: clear zone when removing the memory")
Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.com
Signed-off-by: Yasuaki Ishimatsu &lt;isimatu.yasuaki@jp.fujitsu.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Xishi Qiu &lt;qiuxishi@huawei.com&gt;
Cc: Reza Arbab &lt;arbab@linux.vnet.ibm.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
