<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/rmap.h, branch v3.14.36</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.14.36</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.14.36'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-03-21T05:09:09Z</updated>
<entry>
<title>mm: fix swapops.h:131 bug if remap_file_pages raced migration</title>
<updated>2014-03-21T05:09:09Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2014-03-21T04:52:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7e09e738afd21ef99f047425fc0b0c9be8b03254'/>
<id>urn:sha1:7e09e738afd21ef99f047425fc0b0c9be8b03254</id>
<content type='text'>
Add remove_linear_migration_ptes_from_nonlinear(), to fix an interesting
little include/linux/swapops.h:131 BUG_ON(!PageLocked) found by trinity:
indicating that remove_migration_ptes() failed to find one of the
migration entries that was temporarily inserted.

The problem comes from remap_file_pages()'s switch from vma_interval_tree
(good for inserting the migration entry) to i_mmap_nonlinear list (no good
for locating it again); but can only be a problem if the remap_file_pages()
range does not cover the whole of the vma (zap_pte() clears the range).

remove_migration_ptes() needs a file_nonlinear method to go down the
i_mmap_nonlinear list, applying linear location to look for migration
entries in those vmas too, just in case there was this race.

The file_nonlinear method does need rmap_walk_control.arg to do this;
but it never needed vma passed in - vma comes from its own iteration.
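
For illustration, a sketch of the control structure after this fix
(reconstructed from the v3.14-era include/linux/rmap.h; the real hunk
may differ in detail):

	struct rmap_walk_control {
		void *arg;	/* now also carries the migration context */
		int (*rmap_one)(struct page *page, struct vm_area_struct *vma,
				unsigned long addr, void *arg);
		int (*done)(struct page *page);
		/* vma is no longer passed in: each nonlinear vma comes
		 * from the i_mmap_nonlinear iteration itself */
		int (*file_nonlinear)(struct page *page,
				      struct address_space *mapping,
				      void *arg);
		struct anon_vma *(*anon_lock)(struct page *page);
		bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
	};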

Reported-and-tested-by: Dave Jones &lt;davej@redhat.com&gt;
Reported-and-tested-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: use rmap_walk() in page_referenced()</title>
<updated>2014-01-22T00:19:45Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2014-01-21T23:49:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9f32624be943538983eb0f18b73a9052d1493c80'/>
<id>urn:sha1:9f32624be943538983eb0f18b73a9052d1493c80</id>
<content type='text'>
rmap_walk() now has the infrastructure to handle the differences
between the variants of the rmap traversing functions.

So, just use it in page_referenced().

This patch changes the following things:

1. remove some variants of the rmap traversing functions
   (page_referenced_ksm, page_referenced_anon and
   page_referenced_file).

2. introduce a new struct page_referenced_arg and pass it to
   page_referenced_one(), the main function of the rmap walk, in order
   to count references, store vm_flags and check the finish condition
   (see the sketch below).

3. mechanically change page_referenced() to use rmap_walk().
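
A sketch of the new argument structure and of how it plugs into the
walk (names as used in this patch; surrounding code abridged):

	struct page_referenced_arg {
		int mapcount;
		int referenced;
		unsigned long vm_flags;
		struct mem_cgroup *memcg;
	};

	/* in page_referenced(): */
	struct page_referenced_arg pra = {
		.mapcount = page_mapcount(page),
		.memcg = memcg,
	};
	struct rmap_walk_control rwc = {
		.rmap_one = page_referenced_one,
		.arg = (void *)&amp;pra,
		.anon_lock = page_lock_anon_vma_read,
	};

	rmap_walk(page, &amp;rwc);
	*vm_flags = pra.vm_flags;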

[liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk]
Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: use rmap_walk() in try_to_unmap()</title>
<updated>2014-01-22T00:19:45Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2014-01-21T23:49:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=52629506420ce32997f1fba0a1ab2f1aaa9a4f79'/>
<id>urn:sha1:52629506420ce32997f1fba0a1ab2f1aaa9a4f79</id>
<content type='text'>
rmap_walk() now has the infrastructure to handle the differences
between the variants of the rmap traversing functions.

So, just use it in try_to_unmap().

This patch changes the following things:

1. enable rmap_walk() even when !CONFIG_MIGRATION.
2. mechanically change try_to_unmap() to use rmap_walk() (see the
   sketch below).
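
A sketch of the mechanical part (roughly as in this patch; the
TTU_MUNLOCK handling and error paths are omitted):

	/* in try_to_unmap(): */
	struct rmap_walk_control rwc = {
		.rmap_one = try_to_unmap_one,
		.arg = (void *)flags,
		.done = page_not_mapped,
		.file_nonlinear = try_to_unmap_nonlinear,
		.anon_lock = page_lock_anon_vma_read,
	};

	ret = rmap_walk(page, &amp;rwc);
	if (ret != SWAP_MLOCK &amp;&amp; !page_mapped(page))
		ret = SWAP_SUCCESS;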

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: extend rmap_walk_xxx() to cope with different cases</title>
<updated>2014-01-22T00:19:45Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2014-01-21T23:49:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0dd1c7bbce8d1d142bb25aefaa50262dfd77cb78'/>
<id>urn:sha1:0dd1c7bbce8d1d142bb25aefaa50262dfd77cb78</id>
<content type='text'>
The rmap traversing functions share a lot of common code, but they
also differ in a few details.  By assigning the proper function
pointers in each rmap_walk_control, we can handle these differences
correctly.

The following are the differences we need to handle:

1. the lock function differs in the anon mapping case
2. nonlinear handling is needed in the file mapping case
3. a precheck condition:
	checking the memcg in page_referenced(),
	checking VM_SHARED in page_mkclean(),
	checking for a temporary vma in try_to_unmap()
4. an exit condition:
	checking page_mapped() in try_to_unmap()

So, in this patch, I introduce four function pointers to handle the
above differences, as sketched below.
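
The resulting control structure looks roughly like this (a sketch;
the numbered comments map each hook to the cases above):

	struct rmap_walk_control {
		void *arg;
		int (*rmap_one)(struct page *page, struct vm_area_struct *vma,
				unsigned long addr, void *arg);
		/* 4: exit condition, e.g. page_mapped() in try_to_unmap() */
		int (*done)(struct page *page);
		/* 2: nonlinear handling in the file mapping case */
		int (*file_nonlinear)(struct page *page,
				      struct address_space *mapping,
				      struct vm_area_struct *vma);
		/* 1: lock function for the anon mapping case */
		struct anon_vma *(*anon_lock)(struct page *page);
		/* 3: precheck, e.g. skip temporary vmas or foreign memcgs */
		bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
	};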

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: make rmap_walk to get the rmap_walk_control argument</title>
<updated>2014-01-22T00:19:45Z</updated>
<author>
<name>Joonsoo Kim</name>
<email>iamjoonsoo.kim@lge.com</email>
</author>
<published>2014-01-21T23:49:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=051ac83adf69eea4f57a97356e4282e395a5fa6d'/>
<id>urn:sha1:051ac83adf69eea4f57a97356e4282e395a5fa6d</id>
<content type='text'>
Each rmap traversal case differs slightly, so we need function
pointers, and arguments to pass to them, in order to handle these
differences.

For this purpose, struct rmap_walk_control is introduced in this patch,
and will be extended in a following patch.  Introducing and extending
are kept separate because that clarifies the changes.
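
A sketch of the interface change (the structure starts minimal here
and grows the remaining hooks in the following patch):

	/* before */
	int rmap_walk(struct page *page,
		      int (*rmap_one)(struct page *, struct vm_area_struct *,
				      unsigned long, void *),
		      void *arg);

	/* after */
	struct rmap_walk_control {
		void *arg;
		int (*rmap_one)(struct page *page, struct vm_area_struct *vma,
				unsigned long addr, void *arg);
	};

	int rmap_walk(struct page *page, struct rmap_walk_control *rwc);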

Signed-off-by: Joonsoo Kim &lt;iamjoonsoo.kim@lge.com&gt;
Reviewed-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap: rename anon_vma_unlock() =&gt; anon_vma_unlock_write()</title>
<updated>2013-02-24T01:50:17Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2013-02-23T00:34:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08b52706d505658eac0962d215ff697f898bbc13'/>
<id>urn:sha1:08b52706d505658eac0962d215ff697f898bbc13</id>
<content type='text'>
The comment in commit 4fc3f1d66b1e ("mm/rmap, migration: Make
rmap_walk_anon() and try_to_unmap_anon() more scalable") says:

| Rename anon_vma_[un]lock() =&gt; anon_vma_[un]lock_write(),
| to make it clearer that it's an exclusive write-lock in
| that case - suggested by Rik van Riel.

But that commit renamed only anon_vma_lock(); rename anon_vma_unlock()
=&gt; anon_vma_unlock_write() as well, to complete the pair.
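
The rename amounts to (sketch of the include/linux/rmap.h hunk; the
body is unchanged):

	static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
	{
		up_write(&amp;anon_vma-&gt;root-&gt;rwsem);
	}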

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable</title>
<updated>2012-12-11T14:43:00Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2012-12-02T19:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4fc3f1d66b1ef0d7b8dc11f4ff1cc510f78b37d6'/>
<id>urn:sha1:4fc3f1d66b1ef0d7b8dc11f4ff1cc510f78b37d6</id>
<content type='text'>
rmap_walk_anon() and try_to_unmap_anon() appear to be too
careful about locking the anon vma: while they need protection
against anon vma list modifications, they do not need exclusive
access to the list itself.

Transforming this exclusive lock into a read-locked rwsem removes
a global lock from the hot path of page-migration-intensive
threaded workloads, which can otherwise show pathological
performance like this:

    96.43%        process 0  [kernel.kallsyms]  [k] perf_trace_sched_switch
                  |
                  --- perf_trace_sched_switch
                      __schedule
                      schedule
                      schedule_preempt_disabled
                      __mutex_lock_common.isra.6
                      __mutex_lock_slowpath
                      mutex_lock
                     |
                     |--50.61%-- rmap_walk
                     |          move_to_new_page
                     |          migrate_pages
                     |          migrate_misplaced_page
                     |          __do_numa_page.isra.69
                     |          handle_pte_fault
                     |          handle_mm_fault
                     |          __do_page_fault
                     |          do_page_fault
                     |          page_fault
                     |          __memset_sse2
                     |          |
                     |           --100.00%-- worker_thread
                     |                     |
                     |                      --100.00%-- start_thread
                     |
                      --49.39%-- page_lock_anon_vma
                                try_to_unmap_anon
                                try_to_unmap
                                migrate_pages
                                migrate_misplaced_page
                                __do_numa_page.isra.69
                                handle_pte_fault
                                handle_mm_fault
                                __do_page_fault
                                do_page_fault
                                page_fault
                                __memset_sse2
                                |
                                 --100.00%-- worker_thread
                                           start_thread

With this change applied, the profile is now nicely flat
and there is no anon-vma-related scheduling/blocking.

Rename anon_vma_[un]lock() =&gt; anon_vma_[un]lock_write(),
to make it clearer that it's an exclusive write-lock in
that case - suggested by Rik van Riel.
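
For reference, the read-side helpers introduced here look roughly
like this (a sketch of the rmap.h additions):

	static inline void anon_vma_lock_read(struct anon_vma *anon_vma)
	{
		down_read(&amp;anon_vma-&gt;root-&gt;rwsem);
	}

	static inline void anon_vma_unlock_read(struct anon_vma *anon_vma)
	{
		up_read(&amp;anon_vma-&gt;root-&gt;rwsem);
	}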

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
</content>
</entry>
<entry>
<title>mm/rmap: Convert the struct anon_vma::mutex to an rwsem</title>
<updated>2012-12-11T14:43:00Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2012-12-02T19:56:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5a505085f043e8380f83610f79642853c051e2f1'/>
<id>urn:sha1:5a505085f043e8380f83610f79642853c051e2f1</id>
<content type='text'>
Convert the struct anon_vma::mutex to an rwsem, which will help
in solving a page-migration scalability problem. (Addressed in
a separate patch.)

The conversion is simple and straightforward: in every case
where we mutex_lock()ed we'll now down_write().
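
For example, the write-lock helper becomes (a sketch; at this point
the function still carries its old name, which is renamed in a later
patch):

	static inline void anon_vma_lock(struct anon_vma *anon_vma)
	{
		/* was: mutex_lock() on anon_vma-&gt;root-&gt;mutex */
		down_write(&amp;anon_vma-&gt;root-&gt;rwsem);
	}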

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Paul Turner &lt;pjt@google.com&gt;
Cc: Lee Schermerhorn &lt;Lee.Schermerhorn@hp.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
</content>
</entry>
<entry>
<title>mm: cma: discard clean pages during contiguous allocation instead of migration</title>
<updated>2012-10-09T07:22:43Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan@kernel.org</email>
</author>
<published>2012-10-08T23:31:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=02c6de8d757cb32c0829a45d81c3dfcbcafd998b'/>
<id>urn:sha1:02c6de8d757cb32c0829a45d81c3dfcbcafd998b</id>
<content type='text'>
Drop clean cache pages instead of migrating them during
alloc_contig_range(), to minimise allocation latency by reducing the
amount of migration that is necessary.  This is useful for CMA because
the latency of migration matters more than preserving the background
processes' working set.  In addition, as pages are reclaimed, fewer
free pages are needed as migration targets, so we avoid having to
reclaim memory just to obtain free pages, which is a contributing
factor to increased latency.

I measured the elapsed time of __alloc_contig_migrate_range(),
migrating 10MB within a 40MB movable zone on a QEMU machine.

Before: 146ms.  After: 7ms.
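
A sketch of the idea in __alloc_contig_migrate_range() (abridged;
reclaim_clean_pages_from_list() is the helper this patch adds, and
cc here is the local compact_control):

	/* drop clean file pages up front; migrate only what remains */
	nr_reclaimed = reclaim_clean_pages_from_list(cc.zone,
						     &amp;cc.migratepages);
	cc.nr_migratepages -= nr_reclaimed;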

[akpm@linux-foundation.org: fix nommu build]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Minchan Kim &lt;minchan@kernel.org&gt;
Reviewed-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Tested-by: Kyungmin Park &lt;kyungmin.park@samsung.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: add CONFIG_DEBUG_VM_RB build option</title>
<updated>2012-10-09T07:22:42Z</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2012-10-08T23:31:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed8ea8150182f8d715fceb3b175ef0a9ebacd872'/>
<id>urn:sha1:ed8ea8150182f8d715fceb3b175ef0a9ebacd872</id>
<content type='text'>
Add a CONFIG_DEBUG_VM_RB build option for the previously existing
DEBUG_MM_RB code.  Now that Andi Kleen modified it to avoid using
recursive algorithms, we can expose it a bit more.

Also extend this code to run validate_mm() after stack expansion, and
to check that each vma's start and last pgoffs have not changed since
its nodes were inserted into the anon vma interval tree (it is
important that the nodes be reindexed after each such update).
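
In this header, the addition is roughly (a sketch of the
include/linux/rmap.h hunk):

	#ifdef CONFIG_DEBUG_VM_RB
	void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
	#endif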

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Daniel Santos &lt;daniel.santos@pobox.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
