user/sven/linux.git/include/linux/mm.h, branch v3.16.74

mm/page_alloc.c: calculate 'available' memory in a separate function

2019-08-13T11:39:28Z

commit d02bd27bd33dd7e8d22594cd568b81be0cb584cd upstream. Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo. It indicates to the hypervisor how big the balloon can be inflated without pushing the guest system to swap. This metric would be very useful in VM orchestration software to improve memory management of different VMs under overcommit. This patch (of 2): Factor out calculation of the available memory counter into a separate exportable function, in order to be able to use it in other parts of the kernel. In particular, it appears a relevant metric to report to the hypervisor via virtio-balloon statistics interface (in a followup patch). Signed-off-by: Igor Redko Signed-off-by: Denis V. Lunev Reviewed-by: Roman Kagan Cc: Michael S. Tsirkin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16 as dependency of commit a1078e821b60 "xen: let alloc_xenballooned_pages() fail if not enough memory free"] Signed-off-by: Ben Hutchings

mm: introduce vma_is_anonymous(vma) helper

2019-06-20T17:11:26Z

commit b5330628546616af14ff23075fbf8d4ad91f6e25 upstream. special_mapping_fault() is absolutely broken. It seems it was always wrong, but this didn't matter until vdso/vvar started to use more than one page. And after this change vma_is_anonymous() becomes really trivial, it simply checks vm_ops == NULL. However, I do think the helper makes sense. There are a lot of ->vm_ops != NULL checks, the helper makes the caller's code more understandable (self-documented) and this is more grep-friendly. This patch (of 3): Preparation. Add the new simple helper, vma_is_anonymous(vma), and change handle_pte_fault() to use it. It will have more users. The name is not accurate, say a hpet_mmap()'ed vma is not anonymous. Perhaps it should be named vma_has_fault() instead. But it matches the logic in mmap.c/memory.c (see next changes). "True" just means that a page fault will use do_anonymous_page(). Signed-off-by: Oleg Nesterov Acked-by: Kirill A. Shutemov Cc: Andy Lutomirski Cc: Hugh Dickins Cc: Pavel Emelyanov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16 as dependency of "mm/mincore.c: make mincore() more conservative"; adjusted context] Signed-off-by: Ben Hutchings

mm: migration: fix migration of huge PMD shared pages

2019-04-04T15:14:08Z

commit 017b1660df89f5fb4bfe66c34e35f7d2031100c7 upstream. The page migration code employs try_to_unmap() to try and unmap the source page. This is accomplished by using rmap_walk to find all vmas where the page is mapped. This search stops when page mapcount is zero. For shared PMD huge pages, the page map count is always 1 no matter the number of mappings. Shared mappings are tracked via the reference count of the PMD page. Therefore, try_to_unmap stops prematurely and does not completely unmap all mappings of the source page. This problem can result is data corruption as writes to the original source page can happen after contents of the page are copied to the target page. Hence, data is lost. This problem was originally seen as DB corruption of shared global areas after a huge page was soft offlined due to ECC memory errors. DB developers noticed they could reproduce the issue by (hotplug) offlining memory used to back huge pages. A simple testcase can reproduce the problem by creating a shared PMD mapping (note that this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using migrate_pages() to migrate process pages between nodes while continually writing to the huge pages being migrated. To fix, have the try_to_unmap_one routine check for huge PMD sharing by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a shared mapping it will be 'unshared' which removes the page table entry and drops the reference on the PMD page. After this, flush caches and TLB. mmu notifiers are called before locking page tables, but we can not be sure of PMD sharing until page tables are locked. Therefore, check for the possibility of PMD sharing before locking so that notifiers can prepare for the worst possible case. Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com [mike.kravetz@oracle.com: make _range_in_vma() a static inline] Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com Fixes: 39dde65c9940 ("shared page table for hugetlb page") Signed-off-by: Mike Kravetz Acked-by: Kirill A. Shutemov Reviewed-by: Naoya Horiguchi Acked-by: Michal Hocko Cc: Vlastimil Babka Cc: Davidlohr Bueso Cc: Jerome Glisse Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Mike Kravetz Acked-by: Michal Hocko Reviewed-by: Jérôme Glisse Signed-off-by: Greg Kroah-Hartman [bwh: Backported from 4.4 to 3.16: adjust context] Signed-off-by: Ben Hutchings

fs/proc: Stop trying to report thread stacks

2018-11-20T18:05:58Z

commit b18cb64ead400c01bf1580eeba330ace51f8087d upstream. This reverts more of: b76437579d13 ("procfs: mark thread stack correctly in proc//maps") ... which was partially reverted by: 65376df58217 ("proc: revert /proc//maps [stack:TID] annotation") Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps. In current kernels, /proc/PID/maps (or /proc/TID/maps even for threads) shows "[stack]" for VMAs in the mm's stack address range. In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the target thread's stack's VMA. This is racy, probably returns garbage and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone: KSTK_ESP is not safe to use on tasks that aren't known to be running ordinary process-context kernel code. This patch removes the difference and just shows "[stack]" for VMAs in the mm's stack range. This is IMO much more sensible -- the actual "stack" address really is treated specially by the VM code, and the current thread stack isn't even well-defined for programs that frequently switch stacks on their own. Reported-by: Jann Horn Signed-off-by: Andy Lutomirski Acked-by: Thomas Gleixner Cc: Al Viro Cc: Andrew Morton Cc: Borislav Petkov Cc: Brian Gerst Cc: Johannes Weiner Cc: Kees Cook Cc: Linus Torvalds Cc: Linux API Cc: Peter Zijlstra Cc: Tycho Andersen Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org Signed-off-by: Ingo Molnar [bwh: Backported to 3.16: Squash in the earlier commits 58cb65487e92 "proc/maps: make vm_is_stack() logic namespace-friendly" and 65376df58217 "proc: revert /proc//maps [stack:TID] annotation", which would introduce build failures if applied separately.] Signed-off-by: Ben Hutchings

pagewalk: improve vma handling

2018-10-03T03:10:02Z

commit fafaa4264eba49fd10695c193a82760558d093f4 upstream. Current implementation of page table walker has a fundamental problem in vma handling, which started when we tried to handle vma(VM_HUGETLB). Because it's done in pgd loop, considering vma boundary makes code complicated and bug-prone. From the users viewpoint, some user checks some vma-related condition to determine whether the user really does page walk over the vma. In order to solve these, this patch moves vma check outside pgd loop and introduce a new callback ->test_walk(). Signed-off-by: Naoya Horiguchi Acked-by: Kirill A. Shutemov Cc: "Kirill A. Shutemov" Cc: Andrea Arcangeli Cc: Cyrill Gorcunov Cc: Dave Hansen Cc: Pavel Emelyanov Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16 as dependency of L1TF mitigation] Signed-off-by: Ben Hutchings

mm/pagewalk: remove pgd_entry() and pud_entry()

2018-10-03T03:10:02Z

commit 0b1fbfe50006c41014cc25660c0e735d21c34939 upstream. Currently no user of page table walker sets ->pgd_entry() or ->pud_entry(), so checking their existence in each loop is just wasting CPU cycle. So let's remove it to reduce overhead. Signed-off-by: Naoya Horiguchi Acked-by: Kirill A. Shutemov Cc: Andrea Arcangeli Cc: Cyrill Gorcunov Cc: Dave Hansen Cc: Kirill A. Shutemov Cc: Pavel Emelyanov Cc: Benjamin Herrenschmidt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16 as dependency of L1TF mitigation] Signed-off-by: Ben Hutchings

mm: Add vm_insert_pfn_prot()

2018-10-03T03:10:01Z

commit 1745cbc5d0dee0749a6bc0ea8e872c5db0074061 upstream. The x86 vvar vma contains pages with differing cacheability flags. x86 currently implements this by manually inserting all the ptes using (io_)remap_pfn_range when the vma is set up. x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up the mappings as needed. The correct API to use to insert a pfn in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the vma's cache mode, and the HPET page in particular needs to be uncached despite the fact that the rest of the VMA is cached. Add vm_insert_pfn_prot() to support varying cacheability within the same non-COW VMA in a more sane manner. x86 could alternatively use multiple VMAs, but that's messy, would break CRIU, and would create unnecessary VMAs that would waste memory. Signed-off-by: Andy Lutomirski Reviewed-by: Kees Cook Acked-by: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Quentin Casasnovas Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Ben Hutchings

mm: remove rest usage of VM_NONLINEAR and pte_file()

2018-10-03T03:09:52Z

commit 0661a33611fca12570cba48d9344ce68834ee86c upstream. One bit in ->vm_flags is unused now! Signed-off-by: Kirill A. Shutemov Cc: Dan Carpenter Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: Drop changes in mm/debug.c] Signed-off-by: Ben Hutchings

rmap: drop support of non-linear mappings

2018-10-03T03:09:51Z

commit 27ba0644ea9dfe6e7693abc85837b60e40583b96 upstream. We don't create non-linear mappings anymore. Let's drop code which handles them in rmap. Signed-off-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings

mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub

2018-10-03T03:09:51Z

commit d83a08db5ba6072caa658745881f4baa9bad6a08 upstream. Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: Kirill A. Shutemov Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: - Deleted code is slightly different - Adjust context] Signed-off-by: Ben Hutchings