<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/resource.c, branch v5.4.76</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.76</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.76'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-06-24T15:50:35Z</updated>
<entry>
<title>/dev/mem: Revoke mappings when a driver claims the region</title>
<updated>2020-06-24T15:50:35Z</updated>
<author>
<name>Dan Williams</name>
<email>dan.j.williams@intel.com</email>
</author>
<published>2020-05-21T21:06:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ece3a3337c50c8e4fdd76a9158ac2c5c1067d061'/>
<id>urn:sha1:ece3a3337c50c8e4fdd76a9158ac2c5c1067d061</id>
<content type='text'>
[ Upstream commit 3234ac664a870e6ea69ae3a57d824cd7edbeacc5 ]

Close the hole of holding a mapping over kernel driver takeover event of
a given address range.

Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.

Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.

Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.

The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.

Cc: Arnd Bergmann &lt;arnd@arndb.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm/memory_hotplug.c: use PFN_UP / PFN_DOWN in walk_system_ram_range()</title>
<updated>2019-09-24T22:54:09Z</updated>
<author>
<name>David Hildenbrand</name>
<email>david@redhat.com</email>
</author>
<published>2019-09-23T22:35:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00ff9a91bdb74933648a5b346d9f0edb99bd76d3'/>
<id>urn:sha1:00ff9a91bdb74933648a5b346d9f0edb99bd76d3</id>
<content type='text'>
Patch series "mm/memory_hotplug: online_pages() cleanups", v2.

Some cleanups (+ one fix for a special case) in the context of
online_pages().

This patch (of 5):

This makes it clearer that we will never call func() with duplicate PFNs
in case we have multiple sub-page memory resources.  All unaligned parts
of PFNs are completely discarded.

Link: http://lkml.kernel.org/r/20190814154109.3448-2-david@redhat.com
Signed-off-by: David Hildenbrand &lt;david@redhat.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Oscar Salvador &lt;osalvador@suse.de&gt;
Cc: Arun KS &lt;arunks@codeaurora.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@soleen.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>resource: add a not device managed request_free_mem_region variant</title>
<updated>2019-08-20T12:39:41Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2019-08-18T09:05:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c385190392d8c7128fb7517b3c676e19c7b8808'/>
<id>urn:sha1:0c385190392d8c7128fb7517b3c676e19c7b8808</id>
<content type='text'>
Factor out the guts of devm_request_free_mem_region so that we can
implement both a device managed and a manually release version as tiny
wrappers around it.

Link: https://lore.kernel.org/r/20190818090557.17853-2-hch@lst.de
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Ira Weiny &lt;ira.weiny@intel.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Tested-by: Bharata B Rao &lt;bharata@linux.ibm.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>resource: avoid unnecessary lookups in find_next_iomem_res()</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>Nadav Amit</name>
<email>namit@vmware.com</email>
</author>
<published>2019-07-18T22:57:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=756398750e11ade1e617cd2a8f8d66fe7ed637e1'/>
<id>urn:sha1:756398750e11ade1e617cd2a8f8d66fe7ed637e1</id>
<content type='text'>
find_next_iomem_res() shows up to be a source for overhead in dax
benchmarks.

Improve performance by not considering children of the tree if the top
level does not match.  Since the range of the parents should include the
range of the children such check is redundant.

Running sysbench on dax (pmem emulation, with write_cache disabled):

  sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
   --file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run

Provides the following results:

		events (avg/stddev)
		-------------------
  5.2-rc3:	1247669.0000/16075.39
  w/patch:	1286320.5000/16402.72	(+3%)

Link: http://lkml.kernel.org/r/20190613045903.4922-3-namit@vmware.com
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>resource: fix locking in find_next_iomem_res()</title>
<updated>2019-07-19T00:08:06Z</updated>
<author>
<name>Nadav Amit</name>
<email>namit@vmware.com</email>
</author>
<published>2019-07-18T22:57:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49f17c26c123b60fd1c74629eef077740d16ffc2'/>
<id>urn:sha1:49f17c26c123b60fd1c74629eef077740d16ffc2</id>
<content type='text'>
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it.  However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.

Keep holding the lock while the data is copied.  While at it, change the
return value to a more informative value.  It is disregarded by the
callers.

[akpm@linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit &lt;namit@vmware.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Toshi Kani &lt;toshi.kani@hpe.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: factor out a devm_request_free_mem_region helper</title>
<updated>2019-07-02T17:32:44Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2019-06-26T12:27:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0092908d16c604b8207c2141ec64b0fa4473bb03'/>
<id>urn:sha1:0092908d16c604b8207c2141ec64b0fa4473bb03</id>
<content type='text'>
Keep the physical address allocation that hmm_add_device does with the
rest of the resource code, and allow future reuse of it without the hmm
wrapper.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
Reviewed-by: John Hubbard &lt;jhubbard@nvidia.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@mellanox.com&gt;
</content>
</entry>
<entry>
<title>treewide: Add SPDX license identifier for missed files</title>
<updated>2019-05-21T08:50:45Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-19T12:08:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=457c89965399115e5cd8bf38f9c597293405703d'/>
<id>urn:sha1:457c89965399115e5cd8bf38f9c597293405703d</id>
<content type='text'>
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm/resource: Use resource_overlaps() to simplify region_intersects()</title>
<updated>2019-04-19T10:59:36Z</updated>
<author>
<name>Wei Yang</name>
<email>richardw.yang@linux.intel.com</email>
</author>
<published>2019-03-05T08:34:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f6c6010a07734103a31faa0cc977641b358c45b0'/>
<id>urn:sha1:f6c6010a07734103a31faa0cc977641b358c45b0</id>
<content type='text'>
The three checks in region_intersects() are basically an open-coded version
of resource_overlaps() - so use the real thing.

Also fix typos in comments while at it.

Signed-off-by: Wei Yang &lt;richardw.yang@linux.intel.com&gt;
Reviewed-by: Like Xu &lt;like.xu@linux.intel.com&gt;
Reviewed-by: Yuan Yao &lt;yuan.yao@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: akpm@linux-foundation.org
Cc: bhelgaas@google.com
Cc: bp@suse.de
Cc: dan.j.williams@intel.com
Cc: jack@suse.cz
Cc: rdunlap@infradead.org
Cc: tiwai@suse.de
Link: http://lkml.kernel.org/r/20190305083432.23675-1-richardw.yang@linux.intel.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm</title>
<updated>2019-03-16T20:05:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-03-16T20:05:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f67e3fb4891287b8248ebb3320f794b9f5e782d4'/>
<id>urn:sha1:f67e3fb4891287b8248ebb3320f794b9f5e782d4</id>
<content type='text'>
Pull device-dax updates from Dan Williams:
 "New device-dax infrastructure to allow persistent memory and other
  "reserved" / performance differentiated memories, to be assigned to
  the core-mm as "System RAM".

  Some users want to use persistent memory as additional volatile
  memory. They are willing to cope with potential performance
  differences, for example between DRAM and 3D Xpoint, and want to use
  typical Linux memory management apis rather than a userspace memory
  allocator layered over an mmap() of a dax file. The administration
  model is to decide how much Persistent Memory (pmem) to use as System
  RAM, create a device-dax-mode namespace of that size, and then assign
  it to the core-mm. The rationale for device-dax is that it is a
  generic memory-mapping driver that can be layered over any "special
  purpose" memory, not just pmem. On subsequent boots udev rules can be
  used to restore the memory assignment.

  One implication of using pmem as RAM is that mlock() no longer keeps
  data off persistent media. For this reason it is recommended to enable
  NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
  at rest. We considered making this recommendation an actively enforced
  requirement, but in the end decided to leave it as a distribution /
  administrator policy to allow for emulation and test environments that
  lack security capable NVDIMMs.

  Summary:

   - Replace the /sys/class/dax device model with /sys/bus/dax, and
     include a compat driver so distributions can opt-in to the new ABI.

   - Allow for an alternative driver for the device-dax address-range

   - Introduce the 'kmem' driver to hotplug / assign a device-dax
     address-range to the core-mm.

   - Arrange for the device-dax target-node to be onlined so that the
     newly added memory range can be uniquely referenced by numa apis"

NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.

And apparently the PMEM modules can cause that a lot more than regular
RAM.  The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.

Quoting Dan from another email:
 "The exposure can be reduced in the volatile-RAM case by scanning for
  and clearing errors before it is onlined as RAM. The userspace tooling
  for that can be in place before v5.1-final. There's also runtime
  notifications of errors via acpi_nfit_uc_error_notify() from
  background scrubbers on the DIMM devices. With that mechanism the
  kernel could proactively clear newly discovered poison in the volatile
  case, but that would be additional development more suitable for v5.2.

  I understand the concern, and the need to highlight this issue by
  tapping the brakes on feature development, but I don't see PMEM as RAM
  making the situation worse when the exposure is also there via DAX in
  the PMEM case. Volatile-RAM is arguably a safer use case since it's
  possible to repair pages where the persistent case needs active
  application coordination"

* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: "Hotplug" persistent memory for use like normal RAM
  mm/resource: Let walk_system_ram_range() search child resources
  mm/memory-hotplug: Allow memory resources to be children
  mm/resource: Move HMM pr_debug() deeper into resource code
  mm/resource: Return real error codes from walk failures
  device-dax: Add a 'modalias' attribute to DAX 'bus' devices
  device-dax: Add a 'target_node' attribute
  device-dax: Auto-bind device after successful new_id
  acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
  device-dax: Add /sys/class/dax backwards compatibility
  device-dax: Add support for a dax override driver
  device-dax: Move resource pinning+mapping into the common driver
  device-dax: Introduce bus + driver model
  device-dax: Start defining a dax bus model
  device-dax: Remove multi-resource infrastructure
  device-dax: Kill dax_region base
  device-dax: Kill dax_region ida
</content>
</entry>
<entry>
<title>mm/resource: Let walk_system_ram_range() search child resources</title>
<updated>2019-02-28T18:41:23Z</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2019-02-25T18:57:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2b539aefe9e48e3908cff02699aa63a8b9bd268e'/>
<id>urn:sha1:2b539aefe9e48e3908cff02699aa63a8b9bd268e</id>
<content type='text'>
In the process of onlining memory, we use walk_system_ram_range()
to find the actual RAM areas inside of the area being onlined.

However, it currently only finds memory resources which are
"top-level" iomem_resources.  Children are not currently
searched which causes it to skip System RAM in areas like this
(in the format of /proc/iomem):

a0000000-bfffffff : Persistent Memory (legacy)
  a0000000-afffffff : System RAM

Changing the true-&gt;false here allows children to be searched
as well.  We need this because we add a new "System RAM"
resource underneath the "persistent memory" resource when
we use persistent memory in a volatile mode.

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Jiang &lt;dave.jiang@intel.com&gt;
Cc: Ross Zwisler &lt;zwisler@kernel.org&gt;
Cc: Vishal Verma &lt;vishal.l.verma@intel.com&gt;
Cc: Tom Lendacky &lt;thomas.lendacky@amd.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying &lt;ying.huang@intel.com&gt;
Cc: Fengguang Wu &lt;fengguang.wu@intel.com&gt;
Cc: Borislav Petkov &lt;bp@suse.de&gt;
Cc: Bjorn Helgaas &lt;bhelgaas@google.com&gt;
Cc: Yaowei Bai &lt;baiyaowei@cmss.chinamobile.com&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Cc: Jerome Glisse &lt;jglisse@redhat.com&gt;
Signed-off-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
</entry>
</feed>
