user/sven/linux.git/kernel/memremap.c, branch v4.9.15

mm, devm_memremap_pages: hold device_hotplug lock over mem_hotplug_{begin, done}

2017-03-12T05:41:43Z

commit b5d24fda9c3dce51fcb4eee459550a458eaaf1e2 upstream. The mem_hotplug_{begin,done} lock coordinates with {get,put}_online_mems() to hold off "readers" of the current state of memory from new hotplug actions. mem_hotplug_begin() expects exclusive access, via the device_hotplug lock, to set mem_hotplug.active_writer. Calling mem_hotplug_begin() without locking device_hotplug can lead to corrupting mem_hotplug.refcount and missed wakeups / soft lockups. [dan.j.williams@intel.com: v2] Link: http://lkml.kernel.org/r/148728203365.38457.17804568297887708345.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/148693885680.16345.17802627926777862337.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: f931ab479dd2 ("mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}") Signed-off-by: Dan Williams Reported-by: Ben Hutchings Cc: Michal Hocko Cc: Toshi Kani Cc: Vlastimil Babka Cc: Logan Gunthorpe Cc: Masayoshi Mizuma Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

2017-01-19T19:17:58Z

commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream. Both arch_add_memory() and arch_remove_memory() expect a single threaded context. For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does not hold any locks over this check and branch: if (pgd_val(*pgd)) { pud = (pud_t *)pgd_page_vaddr(*pgd); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); continue; } pud = alloc_low_page(); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); The result is that two threads calling devm_memremap_pages() simultaneously can end up colliding on pgd initialization. This leads to crash signatures like the following where the loser of the race initializes the wrong pgd entry: BUG: unable to handle kernel paging request at ffff888ebfff0000 IP: memcpy_erms+0x6/0x10 PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */ Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13 task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000 RIP: memcpy_erms+0x6/0x10 [..] Call Trace: ? pmem_do_bvec+0x205/0x370 [nd_pmem] ? blk_queue_enter+0x3a/0x280 pmem_rw_page+0x38/0x80 [nd_pmem] bdev_read_page+0x84/0xb0 Hold the standard memory hotplug mutex over calls to arch_{add,remove}_memory(). Fixes: 41e94a851304 ("add devm_memremap_pages") Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman

mm: fix cache mode of dax pmd mappings

2016-09-10T00:34:46Z

track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as uncacheable rendering them impractical for application usage. DAX-pte mappings are cached and the goal of establishing DAX-pmd mappings is to attain more performance, not dramatically less (3 orders of magnitude). track_pfn_insert() relies on a previous call to reserve_memtype() to establish the expected page_cache_mode for the range. While memremap() arranges for reserve_memtype() to be called, devm_memremap_pages() does not. So, teach track_pfn_insert() and untrack_pfn() how to handle tracking without a vma, and arrange for devm_memremap_pages() to establish the write-back-cache reservation in the memtype tree. Cc: Cc: Matthew Wilcox Cc: Ross Zwisler Cc: Nilesh Choudhury Cc: Kirill A. Shutemov Reported-by: Toshi Kani Reported-by: Kai Zhang Acked-by: Andrew Morton Signed-off-by: Dan Williams

Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

2016-07-29T00:38:16Z

Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...

mm: cleanup ifdef guards for vmem_altmap

2016-07-28T23:07:41Z

Now that ZONE_DEVICE depends on SPARSEMEM_VMEMMAP we can simplify some ifdef guards to just ZONE_DEVICE. Link: http://lkml.kernel.org/r/146687646788.39261.8020536391978771940.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reported-by: Vlastimil Babka Cc: Eric Sandeen Cc: Jeff Moyer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

libnvdimm, pmem: allow nfit_test to override pmem_direct_access()

2016-06-24T18:39:29Z

Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to override it and indicate that nfit_test-pmem is not device-mapped. Now, we want to enable nfit_test to operate without DMA_CMA and the pmem it provides will no longer be physically contiguous, i.e. won't be capable of supporting direct_access requests larger than a page. Make pmem_direct_access() a weak symbol so that it can be replaced by the tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static inline now that it no longer needs to be overridden. Acked-by: Johannes Thumshirn Signed-off-by: Dan Williams

memremap: add arch specific hook for MEMREMAP_WB mappings

2016-04-04T08:26:41Z

Currently, the memremap code serves MEMREMAP_WB mappings directly from the kernel direct mapping, unless the region is in high memory, in which case it falls back to using ioremap_cache(). However, the semantics of ioremap_cache() are not unambiguously defined, and on ARM, it will actually result in a mapping type that differs from the attributes used for the linear mapping, and for this reason, the ioremap_cache() call fails if the region is part of the memory managed by the kernel. So instead, implement an optional hook 'arch_memremap_wb' whose default implementation calls ioremap_cache() as before, but which can be overridden by the architecture to do what is appropriate for it. Acked-by: Dan Williams Signed-off-by: Ard Biesheuvel

memremap: add MEMREMAP_WC flag

2016-03-22T22:36:02Z

Add a flag to memremap() for writecombine mappings. Mappings satisfied by this flag will not be cached, however writes may be delayed or combined into more efficient bursts. This is most suitable for buffers written sequentially by the CPU for use by other DMA devices. Signed-off-by: Brian Starkey Reviewed-by: Catalin Marinas Cc: Dan Williams Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

memremap: don't modify flags

2016-03-22T22:36:02Z

These patches implement a MEMREMAP_WC flag for memremap(), which can be used to obtain writecombine mappings. This is then used for setting up dma_coherent_mem regions which use the DMA_MEMORY_MAP flag. The motivation is to fix an alignment fault on arm64, and the suggestion to implement MEMREMAP_WC for this case was made at [1]. That particular issue is handled in patch 4, which makes sure that the appropriate memset function is used when zeroing allocations mapped as IO memory. This patch (of 4): Don't modify the flags input argument to memremap(). MEMREMAP_WB is already a special case so we can check for it directly instead of clearing flag bits in each mapper. Signed-off-by: Brian Starkey Cc: Catalin Marinas Cc: Dan Williams Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

mm: fix two typos in comments for to_vmem_altmap()

2016-03-15T23:55:16Z

Commit 4b94ffdc4163 ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()"), introduced the to_vmem_altmap() function. The comments in this function contain two typos (one misspelling of the Kconfig option CONFIG_SPARSEMEM_VMEMMAP, and one missing letter 'n'), let's fix them up. Signed-off-by: Andreas Ziegler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds