| Age | Commit message | Author |
|
Add remove_from_swap
remove_from_swap() restores, for anonymous pages, the pte entries that existed
before page migration, by walking the reverse maps. This reduces swap use and
re-establishes regular ptes without the need for page faults.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
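
A minimal userspace sketch of the idea (not the kernel code; every type and
helper below is a simplified stand-in): an anonymous page's anon_vma lists the
vmas that may map it, so walking that list and computing the page's one
possible address in each vma lets matching swap entries be turned back into
regular ptes without waiting for faults.

/* Toy model of restoring ptes via a reverse-map walk (not kernel code). */
#include <stdio.h>

#define PAGE_SHIFT 12

enum pte_kind { PTE_NONE, PTE_SWAP, PTE_PRESENT };

struct toy_pte { enum pte_kind kind; unsigned long val; };

struct toy_vma {
    unsigned long vm_start, vm_end;  /* virtual range covered */
    unsigned long vm_pgoff;          /* page offset of vm_start */
    struct toy_pte *ptes;            /* one entry per page in range */
    struct toy_vma *next;            /* next vma sharing the anon_vma */
};

struct toy_anon_vma { struct toy_vma *head; };

struct toy_page {
    unsigned long index;             /* page offset key within the vmas */
    unsigned long swap_entry;        /* swap slot holding the old contents */
    unsigned long pfn;               /* frame to map back in */
    struct toy_anon_vma *anon_vma;
};

/* Address at which 'page' would live in 'vma', if it lies in range. */
static int page_address_in_vma(struct toy_page *page, struct toy_vma *vma,
                               unsigned long *addr)
{
    unsigned long a = vma->vm_start +
                      ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
    if (a < vma->vm_start || a >= vma->vm_end)
        return -1;
    *addr = a;
    return 0;
}

/* Walk the reverse map: for every vma that may map the page, turn a
 * matching swap entry back into a present pte, avoiding later faults. */
static void remove_from_swap(struct toy_page *page)
{
    for (struct toy_vma *vma = page->anon_vma->head; vma; vma = vma->next) {
        unsigned long addr;
        if (page_address_in_vma(page, vma, &addr))
            continue;
        struct toy_pte *pte = &vma->ptes[(addr - vma->vm_start) >> PAGE_SHIFT];
        if (pte->kind == PTE_SWAP && pte->val == page->swap_entry) {
            pte->kind = PTE_PRESENT;
            pte->val = page->pfn;    /* regular pte again: no fault needed */
        }
    }
}

int main(void)
{
    struct toy_pte ptes[4] = { [2] = { PTE_SWAP, 77 } };
    struct toy_vma vma = { 0x10000, 0x14000, 0x10, ptes, NULL };
    struct toy_anon_vma av = { &vma };
    struct toy_page page = { .index = 0x12, .swap_entry = 77, .pfn = 1234,
                             .anon_vma = &av };
    remove_from_swap(&page);
    printf("pte kind=%d pfn=%lu\n", ptes[2].kind, ptes[2].val);
    return 0;
}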
|
|
Add direct migration support with fall back to swap.
This adds direct migration support on top of the swap-based page migration
facility. It allows the direct migration of anonymous pages, and the migration
of file-backed pages by dropping the associated buffers (which requires
writeout). If that is not possible, fall back to swapping the page out.
The patch is based on lots of patches from the hotplug project but the code
was restructured, documented and simplified as much as possible.
Note that an additional patch that defines the migrate_page() method for
filesystems is necessary in order to avoid writeback for anonymous and
file-backed pages.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
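
A hedged control-flow sketch of the fallback, with invented helper names:
attempt to move the page contents directly to a new page, and if that cannot
be done (for example, buffers that cannot be dropped without writeout), fall
back to the existing swap-out path.

/* Sketch of "direct migration, else swap out" (illustrative names only). */
#include <stdbool.h>
#include <stdio.h>

struct page { bool anonymous; bool buffers_droppable; };

/* Pretend helper standing in for the real migration machinery. */
static bool migrate_page_direct(struct page *p)
{
    /* Anonymous pages and file pages whose buffers can be dropped
     * (possibly after writeout) can be copied straight to a new page. */
    return p->anonymous || p->buffers_droppable;
}

static void swap_out(struct page *p) { (void)p; puts("fell back to swap"); }

static void migrate_one(struct page *p)
{
    if (migrate_page_direct(p))
        puts("migrated directly");
    else
        swap_out(p);          /* fallback keeps migration always possible */
}

int main(void)
{
    struct page a = { true, false }, f = { false, false };
    migrate_one(&a);          /* anonymous: direct */
    migrate_one(&f);          /* stubborn file page: swap fallback */
    return 0;
}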
|
|
Optimise rmap functions by minimising atomic operations when we know there
will be no concurrent modifications.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
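
One common shape of this optimisation, sketched with toy types (the helpers
only loosely model the kind of split the patch makes): a page that is brand
new and not yet visible to any other CPU can have its map count set with a
plain store, while the atomic read-modify-write is kept for pages that may
already be shared.

/* Sketch: plain init for a page nobody else can see yet, atomic inc later. */
#include <stdatomic.h>
#include <stdio.h>

struct toy_page { atomic_int mapcount; };    /* -1 means "not mapped" */

/* New, not yet visible to other CPUs: no atomic RMW needed. */
static void page_add_new_anon_rmap(struct toy_page *p)
{
    atomic_store_explicit(&p->mapcount, 0, memory_order_relaxed);
}

/* Already visible (e.g. second mapping from fork): must be atomic. */
static void page_add_anon_rmap(struct toy_page *p)
{
    atomic_fetch_add(&p->mapcount, 1);
}

int main(void)
{
    struct toy_page p = { ATOMIC_VAR_INIT(-1) };
    page_add_new_anon_rmap(&p);   /* cheap path for a freshly allocated page */
    page_add_anon_rmap(&p);       /* later, concurrent-safe path */
    printf("mapcount=%d\n", atomic_load(&p.mapcount));
    return 0;
}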
|
|
Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory: the system does an OOM kill, even
when there is still a lot of swap free.
The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.
Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.
This patch resolves the problem Zwane ran into.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
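
A rough sketch of the take-the-token-away idea, with invented names and a
userspace clock standing in for the kernel's: when reclaim becomes desperate,
the token is taken from the current holder and that mm is refused the token
again for a short hold-off interval.

/* Sketch of stealing the swap token from a memory hog (illustrative only). */
#include <stdio.h>
#include <time.h>

struct mm { const char *name; time_t token_banned_until; };

static struct mm *token_holder;           /* mm currently owning the token */
#define TOKEN_HOLDOFF_SECS 2

static void grab_swap_token(struct mm *mm)
{
    time_t now = time(NULL);
    if (now < mm->token_banned_until)
        return;                           /* recently punished hog: no token */
    token_holder = mm;
}

/* Called when reclaim has reached its lowest priority and is desperate. */
static void punish_token_holder(void)
{
    if (!token_holder)
        return;
    token_holder->token_banned_until = time(NULL) + TOKEN_HOLDOFF_SECS;
    token_holder = NULL;                  /* hog's pages become reclaimable */
}

int main(void)
{
    struct mm hog = { "hog", 0 };
    grab_swap_token(&hog);
    punish_token_holder();
    grab_swap_token(&hog);                /* denied during the hold-off */
    printf("holder=%s\n", token_holder ? token_holder->name : "none");
    return 0;
}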
|
|
rmap's page_check_address now descends without page_table_lock. First it just
does pte_offset_map, in case there's no pte present worth locking for; then it
takes page_table_lock for the full check, and passes ptl back to the caller in
the same style as pte_offset_map_lock. __xip_unmap, page_referenced_one and
try_to_unmap_one use pte_unmap_unlock; so does try_to_unmap_cluster.
page_check_address is reformatted to avoid progressive indentation. No use is
made of its one error code, so it now simply returns NULL when it fails.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
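
A condensed sketch of the lock-late pattern, using toy types and a pthread
mutex in place of the page_table_lock: peek without the lock, bail out if
there is nothing worth locking for, then lock, re-check, and return to the
caller with the lock still held, in the spirit of pte_offset_map_lock.

/* Sketch: check cheaply without the lock, then lock and re-check,
 * returning with the lock held only on success (toy types throughout). */
#include <pthread.h>
#include <stdio.h>

struct toy_pte { unsigned long pfn; int present; };

struct toy_mm {
    pthread_mutex_t page_table_lock;
    struct toy_pte pte;                     /* pretend one-entry page table */
};

/* On success: returns the pte with *ptlp pointing to the held lock.
 * On failure: returns NULL with the lock not held. */
static struct toy_pte *page_check_address(struct toy_mm *mm, unsigned long pfn,
                                          pthread_mutex_t **ptlp)
{
    struct toy_pte *pte = &mm->pte;
    if (!pte->present || pte->pfn != pfn)   /* unlocked peek: nothing there */
        return NULL;

    pthread_mutex_lock(&mm->page_table_lock);
    if (pte->present && pte->pfn == pfn) {  /* still true under the lock */
        *ptlp = &mm->page_table_lock;       /* caller unlocks, as with
                                               pte_unmap_unlock */
        return pte;
    }
    pthread_mutex_unlock(&mm->page_table_lock);
    return NULL;
}

int main(void)
{
    struct toy_mm mm = { PTHREAD_MUTEX_INITIALIZER, { 42, 1 } };
    pthread_mutex_t *ptl;
    struct toy_pte *pte = page_check_address(&mm, 42, &ptl);
    if (pte) {
        printf("found pfn %lu with lock held\n", pte->pfn);
        pthread_mutex_unlock(ptl);
    }
    return 0;
}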
|
|
- generic_file* file operations no longer have an xip/non-xip split
- filemap_xip.c implements a new set of fops that require the get_xip_page
aop to work properly. All new fops are exported GPL-only (we don't want
non-GPL modules using them)
- __xip_unmap now uses page_check_address, which is no longer static
in rmap.c and is declared in linux/rmap.h
- mm/filemap.h is now much cleaner, containing just Linus' inline funcs
moved there from filemap.c
- fix includes in filemap_xip to make it build cleanly on i386
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The token-based thrashing control patches introduced a problem: when a task
which doesn't hold the token tries to run direct-reclaim, that task is told
that pages which belong to the token-holding mm are referenced, even though
they are not. This means that it is possible for a huge number of a
non-token-holding mm's pages to be scanned to no effect. Eventually, we give
up and go and oom-kill something.
So the patch arranges for the thrashing control logic to be defeated if the
caller has reached the highest level of scanning priority.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
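
A tiny sketch of the escape hatch (invented names): the bias that treats the
token holder's pages as recently referenced is simply skipped once the caller
has scanned all the way down to the last priority level.

/* Sketch: ignore the swap-token bias once reclaim is at its last priority. */
#include <stdbool.h>
#include <stdio.h>

struct mm { int id; };

static struct mm *swap_token_mm;      /* current token holder, if any */

/* Would this mm's pages be treated as recently referenced? */
static bool protected_by_token(struct mm *mm, int priority)
{
    if (priority == 0)
        return false;                 /* desperate: the token no longer shields */
    return mm == swap_token_mm;
}

int main(void)
{
    struct mm hog = { 1 };
    swap_token_mm = &hog;
    printf("priority 6: %d, priority 0: %d\n",
           protected_by_token(&hog, 6), protected_by_token(&hog, 0));
    return 0;
}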
|
|
With this we're back to the times when changing skbuff.h only triggers a
rebuild of _net_-related stuff 8)
This uncovered a bug in rmap.h, which was not including mm.h to get the
definition of struct vm_area_struct and was only working by luck.
Signed-off-by: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Swapoff can make good use of a page's anon_vma and index, while it's still
left in swapcache, or once it's brought back in and the first pte mapped back:
unuse_vma can go directly to just one address in each of only those vmas with
the same anon_vma. And unuse_process can skip any vmas without an anon_vma
(extending the hugetlb check: hugetlb vmas have no anon_vma).
This just hacks in on top of the existing procedure, still going through all
the vmas of all the mms in mmlist. A more elegant procedure might replace
mmlist by a list of anon_vmas: but that would be more work to implement, with
apparently more overhead in the common paths.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
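
A small sketch of the narrowing, with toy types: a vma with no anon_vma, or a
different anon_vma, cannot hold the page and is skipped outright, and a
matching vma has only one address worth probing, derived from the page's
index.

/* Sketch: swapoff narrowing its search using the page's anon_vma/index. */
#include <stdio.h>

#define PAGE_SHIFT 12

struct toy_anon_vma { int id; };

struct toy_vma {
    unsigned long vm_start, vm_end, vm_pgoff;
    struct toy_anon_vma *anon_vma;      /* NULL for hugetlb/file-only vmas */
};

struct toy_page { unsigned long index; struct toy_anon_vma *anon_vma; };

/* Return the single candidate address, or 0 if this vma can be skipped. */
static unsigned long unuse_candidate(struct toy_vma *vma, struct toy_page *page)
{
    if (!vma->anon_vma || vma->anon_vma != page->anon_vma)
        return 0;                       /* cannot hold this anon page: skip */
    unsigned long addr = vma->vm_start +
                         ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
    if (addr < vma->vm_start || addr >= vma->vm_end)
        return 0;
    return addr;                        /* the only address worth probing */
}

int main(void)
{
    struct toy_anon_vma av = { 1 }, other = { 2 };
    struct toy_page page = { .index = 3, .anon_vma = &av };
    struct toy_vma match = { 0x20000, 0x30000, 0, &av };
    struct toy_vma skip  = { 0x40000, 0x50000, 0, &other };
    printf("match: %#lx  skip: %#lx\n",
           unuse_candidate(&match, &page), unuse_candidate(&skip, &page));
    return 0;
}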
|
|
The pte_chains rmap used pte_chain_lock (bit_spin_lock on PG_chainlock) to
lock its pte_chains. We kept this (as page_map_lock: bit_spin_lock on
PG_maplock) when we moved to objrmap. But the file objrmap locks its vma tree
with mapping->i_mmap_lock, and the anon objrmap locks its vma list with
anon_vma->lock: so isn't the page_map_lock superfluous?
Pretty much, yes. The mapcount was protected by it, and needs to become
atomic: starting at -1 like the page _count, so that nr_mapped can be tracked
precisely up and down. The last page_remove_rmap can't clear the anon page
mapping any more, because of races with page_add_rmap; some BUG_ONs must go
for the same reason, but they've served their purpose.
vmscan decisions are naturally racy, so there is little change there beyond
removing page_map_lock/unlock. But to stabilize the file-backed page->mapping
against truncation while acquiring i_mmap_lock, page_referenced_file now needs
the page lock to be held even for refill_inactive_zone. There's a similar
issue in acquiring anon_vma->lock, where the page lock doesn't help: this
patch pretends to handle it, but it actually needs the next one.
Roughly 10% is cut off lmbench fork numbers on my 2*HT*P4. I must confess my
testing failed to show the races even while they were knowingly exposed: this
would benefit from testing on racier equipment.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
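
A sketch of the counting convention with toy types (names are illustrative):
mapcount becomes an atomic starting at -1, so the add that raises it to 0 and
the remove that drops it back below 0 are the exact points at which nr_mapped
is adjusted, with no page_map_lock involved.

/* Sketch: atomic mapcount starting at -1, tracking nr_mapped precisely. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_long nr_mapped;                    /* pages with >=1 mapping */

struct toy_page { atomic_int mapcount; };        /* -1 == not mapped */

static void page_add_rmap(struct toy_page *p)
{
    if (atomic_fetch_add(&p->mapcount, 1) == -1) /* first mapping appears */
        atomic_fetch_add(&nr_mapped, 1);
}

static void page_remove_rmap(struct toy_page *p)
{
    if (atomic_fetch_sub(&p->mapcount, 1) == 0)  /* last mapping went away */
        atomic_fetch_sub(&nr_mapped, 1);
}

static int page_mapped(struct toy_page *p)
{
    return atomic_load(&p->mapcount) >= 0;
}

int main(void)
{
    struct toy_page p = { ATOMIC_VAR_INIT(-1) };
    page_add_rmap(&p);
    page_add_rmap(&p);
    page_remove_rmap(&p);
    printf("mapped=%d nr_mapped=%ld\n", page_mapped(&p),
           atomic_load(&nr_mapped));
    return 0;
}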
|
|
From: Hugh Dickins <hugh@veritas.com>
Andrea Arcangeli's anon_vma object-based reverse mapping scheme for anonymous
pages. Instead of tracking anonymous pages by pte_chains or by mm, this
tracks them by vma. But because vmas are frequently split and merged
(particularly by mprotect), a page cannot point directly to its vma(s), but
instead to an anon_vma list of those vmas likely to contain the page - a list
on which vmas can easily be linked and unlinked as they come and go. The vmas
on one list are all related, either by forking or by splitting.
This has three particular advantages over anonmm: it can cope effortlessly
with mremap moves; it no longer needs page_table_lock to protect an mm's vma
tree, since try_to_unmap finds vmas via page -> anon_vma -> vma instead of
using find_vma; and it should use less cpu for swapout, since it can locate
its anonymous vmas more quickly.
It does have disadvantages too: a lot more change in mmap.c to deal with
anon_vmas, though these are small, straightforward additions now that the vma
merging has been refactored there; more lowmem needed for each anon_vma and
vma structure; and an additional restriction on the merging of vmas (they
cannot be merged if already assigned different anon_vmas, since their pages
would then be pointing to different heads).
(There would be no need to enlarge the vma structure if anonymous pages
belonged only to anonymous vmas; but private file mappings accumulate
anonymous pages by copy-on-write, so need to be listed in both anon_vma and
prio_tree at the same time. A different implementation could avoid that by
using anon_vmas only for purely anonymous vmas, and use the existing prio_tree
to locate cow pages - but that would involve a long search for each single
private copy, probably not a good idea.)
Where before the vm_pgoff of a purely anonymous (not file-backed) vma was
meaningless, now it represents the virtual start address at which that vma is
mapped - which the standard file pgoff manipulations treat linearly as vmas
are split and merged. But if mremap moves the vma, then it generally carries
its original vm_pgoff to the new location, so pages shared with the old
location can still be found. Magic.
Hugh has massaged it somewhat: building on the earlier rmap patches, this
patch is a fifth of the size of Andrea's original anon_vma patch. Please note
that this posting will be his first sight of this patch, which he may or may
not approve.
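
A small sketch of the vm_pgoff trick for anonymous vmas (toy types; the
arithmetic is the point): vm_pgoff records the virtual start in page units, a
split advances the right-hand piece's vm_pgoff linearly, and a page's index
still resolves to the same address in whichever piece now covers it.

/* Sketch: anon vma vm_pgoff behaving like a file offset under split. */
#include <stdio.h>

#define PAGE_SHIFT 12

struct toy_vma { unsigned long vm_start, vm_end, vm_pgoff; };

/* Standard file-style pgoff -> address mapping, reused for anon vmas. */
static unsigned long vma_address(struct toy_vma *vma, unsigned long index)
{
    return vma->vm_start + ((index - vma->vm_pgoff) << PAGE_SHIFT);
}

/* Split at 'addr', adjusting the second half's pgoff linearly. */
static void split_vma(struct toy_vma *vma, unsigned long addr,
                      struct toy_vma *hi)
{
    hi->vm_start = addr;
    hi->vm_end = vma->vm_end;
    hi->vm_pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
    vma->vm_end = addr;
}

int main(void)
{
    /* Anonymous vma starting at 0x40000: vm_pgoff = 0x40000 >> 12 = 0x40. */
    struct toy_vma vma = { 0x40000, 0x48000, 0x40 }, hi;
    unsigned long index = 0x45;              /* page originally at 0x45000 */
    unsigned long before = vma_address(&vma, index);

    split_vma(&vma, 0x44000, &hi);           /* page now lives in 'hi' */
    unsigned long after = vma_address(&hi, index);

    printf("before=%#lx after=%#lx\n", before, after);
    return 0;
}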
|
|
From: Hugh Dickins <hugh@veritas.com>
Before moving on to anon_vma rmap, now remove what's peculiar to anonmm rmap:
the anonmm handling and the mremap move cows. Temporarily reduce
page_referenced_anon and try_to_unmap_anon to stubs, so a kernel built with
this patch will not swap anonymous pages at all.
|
|
From: Hugh Dickins <hugh@veritas.com>
Silly final patch for anonmm rmap: change page_add_anon_rmap's mm arg to vma
arg like anon_vma rmap, to smooth the transition between them.
|
|
From: Hugh Dickins <hugh@veritas.com>
I like CONFIG_REGPARM, even when it's forced on: because it's easy to force
off for debugging - easier than editing out scattered fastcalls. Plus I've
never understood why we make function foo a fastcall, but function bar not.
Remove fastcall directives from rmap. And fix comment about mremap_moved
race: it only applies to anon pages.
|
|
Having a semaphore in there causes modest performance regressions on heavily
mmap-intensive workloads on some hardware. Specifically, up to 30% in SDET on
NUMAQ and big PPC64.
So switch it back to being a spinlock. This does mean that unmap_vmas() needs
to be told whether or not it is allowed to schedule away; that's simple to do
via the zap_details structure.
This change means that there will be high scheduling latencies when someone
truncates a large file which is currently mmapped, but nobody does that
anyway. The scheduling points in unmap_vmas() are mainly for munmap() and
exit(), and they still will work OK for that.
From: Hugh Dickins <hugh@veritas.com>
Sorry, my premature optimizations (trying to pass down NULL zap_details
except when needed) have caught you out doubly: unmap_mapping_range_list was
NULLing the details even though atomic was set; and if it hadn't, then
zap_pte_range would have missed free_swap_and_cache and pte_clear when pte
not present. Moved the optimization into zap_pte_range itself. Plus
massive documentation update.
From: Hugh Dickins <hugh@veritas.com>
Here's a second patch to add to the first: mremap's cows can't come home
without releasing the i_mmap_lock, so better to move the whole "Subtle point"
locking from move_vma into move_page_tables. And it's possible for the file
that was behind an anonymous page to be truncated while we drop that lock;
we don't want to abort mremap because of VM_FAULT_SIGBUS.
(Eek, should we be checking do_swap_page of a vm_file area against the
truncate_count sequence? Technically yes, but I doubt we need bother.)
- We cannot hold i_mmap_lock across move_one_page() because
move_one_page() needs to perform __GFP_WAIT allocations of pagetable pages.
- Move the cond_resched() out so we test it once per page rather than only
when move_one_page() returns -EAGAIN.
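
A hedged sketch of telling the unmap loop whether it may schedule, with
made-up names: the caller passes an 'atomic' flag in a details structure, and
the inner loop only reschedules when that flag is clear.

/* Sketch: a details struct telling the unmap path whether it may sleep. */
#include <stdbool.h>
#include <stdio.h>

struct zap_details_sketch {
    bool atomic;                     /* caller holds a spinlock: do not sleep */
};

static void cond_resched_sketch(void) { puts("  ...rescheduled"); }

static void unmap_range(unsigned long start, unsigned long end,
                        struct zap_details_sketch *details)
{
    for (unsigned long addr = start; addr < end; addr += 0x1000) {
        /* zap one page here ... */
        if (!details || !details->atomic)
            cond_resched_sketch();   /* safe to give up the CPU */
    }
}

int main(void)
{
    struct zap_details_sketch atomic_details = { .atomic = true };

    puts("munmap/exit path (may sleep):");
    unmap_range(0, 0x2000, NULL);

    puts("truncate path under i_mmap_lock (must not sleep):");
    unmap_range(0, 0x2000, &atomic_details);
    return 0;
}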
|
|
From: Hugh Dickins <hugh@veritas.com>
A weakness of the anonmm scheme is its difficulty in tracking pages shared
between two or more mms (one being an ancestor of the other), when mremap has
been used to move a range of pages in one of those mms. mremap move is not
very common anyway, and it's more often used on a page range exclusive to the
mm; but uncommon though it may be, we must not allow unlocked pages to become
unswappable.
This patch follows Linus' suggestion, simply to take a private copy of the
page in such a case: early C-O-W. My previous implementation was daft with
respect to pages currently on swap: it insisted on swapping them in to copy
them. No need for that: just take the copy when a page is brought in from
swap, and its intended address is found to clash with what rmap has already
noted.
If do_swap_page has to make this copy in the mremap moved case (simply a call
to do_wp_page), we might as well do so also when it's a write access but the
page is not exclusive; it has always seemed a little odd that swapin needed a
second fault for that. A bug, even: get_user_pages force imagines that a
single call to handle_mm_fault must break C-O-W. Another bugfix: swapoff's
unuse_process didn't check is_vm_hugetlb_page.
Andrea's anon_vma has no such problem with mremap moved pages, handling them
with elegant use of vm_pgoff - though at some cost to vma merging. How
important is it to handle them efficiently? For now there's a message:
printk(KERN_WARNING "%s: mremap moved %d cows\n", current->comm, cows);
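
A rough sketch of the early-copy decision at swapin (toy flags and helpers,
not the real fault path): if rmap already knows this page at a different
address because mremap moved the range, or the fault is a write and the page
is not exclusive, take the private copy immediately instead of waiting for a
second fault.

/* Sketch: decide at swapin whether to break COW right away. */
#include <stdbool.h>
#include <stdio.h>

struct swapin {
    bool write_access;        /* fault was a write */
    bool page_exclusive;      /* only this mapping uses the swap page */
    bool address_clashes;     /* rmap noted it at a different address
                                 (mremap moved the range) */
};

static bool should_copy_at_swapin(const struct swapin *s)
{
    if (s->address_clashes)
        return true;          /* moved case: must take a private copy */
    if (s->write_access && !s->page_exclusive)
        return true;          /* would fault again anyway: copy now */
    return false;
}

int main(void)
{
    struct swapin moved = { false, false, true };
    struct swapin shared_write = { true, false, false };
    struct swapin plain_read = { false, true, false };
    printf("%d %d %d\n", should_copy_at_swapin(&moved),
           should_copy_at_swapin(&shared_write),
           should_copy_at_swapin(&plain_read));
    return 0;
}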
|
|
From: Hugh Dickins <hugh@veritas.com>
Hugh's anonmm object-based reverse mapping scheme for anonymous pages. We
have not yet decided whether to adopt this scheme, or Andrea's more advanced
anon_vma scheme. anonmm is easier for me to merge quickly, to replace the
pte_chain rmap taken out in the previous patch; a patch to install Andrea's
anon_vma will follow in due course.
Why build up and tear down chains of pte pointers for anonymous pages, when a
page can only appear at one particular address, in a restricted group of mms
that might share it? (Except: see next patch on mremap.)
Introduce struct anonmm per mm to track anonymous pages, all forks from one
exec sharing the same bundle of linked anonmms. Anonymous pages originate in
one mm, but may be forked into another mm of the bundle later on. Callouts
from fork.c to allocate, dup and exit the anonmm structure private to rmap.c.
From: Hugh Dickins <hugh@veritas.com>
Consider two concurrent exits (of the last two mms sharing the anonhd). The
first exit_rmap brings anonhd->count down to 2, then gets preempted (at the
spin_unlock) by the second, which brings anonhd->count down to 1, sees that
it's 1, and frees the anonhd (without making any change to anonhd->count
itself). That cpu goes on to do something new which reallocates the old
anonhd as a new struct anonmm (probably not a head, in which case count will
start at 1). The first exit then resumes after its spin_unlock, sees
anonhd->count at 1, and frees "anonhd" again; it's reused for something else,
and a later exit_rmap list_del finds the list corrupt.
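
The race above is a check-then-free on a shared count. A minimal sketch of one
standard way to close that kind of race (toy types; not necessarily how the
fix was actually done): base the decision to free on the value returned by the
decrement itself, so two racing exits cannot both see a stale count.

/* Sketch: freeing a shared head safely, deciding from the decrement itself
 * rather than re-reading the count afterwards (toy types). */
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_anonhd { atomic_int count; };

static struct toy_anonhd *anonhd_alloc(int users)
{
    struct toy_anonhd *hd = malloc(sizeof(*hd));
    atomic_init(&hd->count, users);
    return hd;
}

/* Each exiting mm drops one reference; only the caller whose decrement
 * takes the count to zero may free, so two racing exits cannot both
 * observe a stale value and double-free the head. */
static void exit_rmap(struct toy_anonhd *hd)
{
    if (atomic_fetch_sub(&hd->count, 1) == 1)
        free(hd);
}

int main(void)
{
    struct toy_anonhd *hd = anonhd_alloc(2);
    exit_rmap(hd);            /* 2 -> 1: not freed */
    exit_rmap(hd);            /* 1 -> 0: freed exactly once */
    puts("freed once");
    return 0;
}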
|
|
From: Hugh Dickins <hugh@veritas.com>
Lots of deletions: the next patch will put in the new anon rmap, which
should look clearer if first we remove all of the old pte-pointer-based
rmap from the core in this patch - which therefore leaves anonymous rmap
totally disabled, with anon pages locked in memory until the process frees them.
Leave arch files (and page table rmap) untouched for now, clean them up in
a later batch. A few constructive changes amidst all the deletions:
Choose names (e.g. page_add_anon_rmap) and args (e.g. no more pteps) now
so we need not revisit so many files in the next patch. Inline function
page_dup_rmap for fork's copy_page_range, simply bumps mapcount under lock.
cond_resched_lock in copy_page_range. Struct page rearranged: no pte
union, just mapcount moved next to atomic count, so two ints can occupy one
long on 64-bit; i386 struct page now 32 bytes even with PAE. Never pass
PageReserved to page_remove_rmap, only do_wp_page did so.
From: Hugh Dickins <hugh@veritas.com>
Move page_add_anon_rmap's BUG_ON(page_mapping(page)) inside the rmap_lock
(well, might as well just check mapping if !mapcount then): if this page is
being mapped or unmapped on another cpu at the same time, page_mapping's
PageAnon(page) and page->mapping are volatile.
But page_mapping(page) is used more widely: I've a nasty feeling that
clear_page_anon, page_add_anon_rmap and/or page_mapping need barriers added
(also in 2.6.6 itself).
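
A small sketch of the page_dup_rmap idea mentioned above (toy types): fork's
copy of a page range does not need the full add-rmap path, it only has to bump
the existing map count for each page it duplicates a pte for.

/* Sketch: fork duplicating ptes only needs to bump each page's mapcount. */
#include <stdatomic.h>
#include <stdio.h>

struct toy_page { atomic_int mapcount; };        /* -1 == not mapped */

/* The parent already mapped the page; the child just adds one more mapping. */
static inline void page_dup_rmap(struct toy_page *page)
{
    atomic_fetch_add(&page->mapcount, 1);
}

static void copy_page_range(struct toy_page **pages, int n)
{
    for (int i = 0; i < n; i++)
        if (pages[i])                            /* pte present in parent */
            page_dup_rmap(pages[i]);
}

int main(void)
{
    struct toy_page p = { ATOMIC_VAR_INIT(0) };  /* mapped once by parent */
    struct toy_page *range[2] = { &p, NULL };
    copy_page_range(range, 2);
    printf("mapcount=%d\n", atomic_load(&p.mapcount));  /* now 1: two ptes */
    return 0;
}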
|
|
From: Hugh Dickins <hugh@veritas.com>
Dave McCracken's object-based reverse mapping scheme for file pages: why
build up and tear down chains of pte pointers for file pages, when
page->mapping has i_mmap and i_mmap_shared lists of all the vmas which
might contain that page, and it appears at one deterministic position
within the vma (unless vma is nonlinear - see next patch)?
It has some drawbacks: more work to locate the ptes from page_referenced and
try_to_unmap, especially if the i_mmap lists contain a lot of vmas covering
different ranges; and it has to down_trylock the i_shared_sem and hope that
doesn't fail too often. But it is attractive in that it uses less lowmem, and
shifts the rmap burden away from the hot paths, to swapout.
Hybrid scheme for the moment: carry on with pte_chains for anonymous pages,
that's unchanged; but file pages keep mapcount in the pte union of struct
page, where anonymous pages keep chain pointer or direct pte address: so
page_mapped(page) works on both.
Hugh massaged it a little: distinct page_add_file_rmap entry point; list
searches check rss so as not to waste time on mms fully swapped out; check
mapcount to terminate once all ptes have been found; and a WARN_ON if
page_referenced should have but couldn't find all the ptes.
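
A compact sketch of the file-page walk (toy types and list, invented helper
name): visit each vma on the mapping's i_mmap list, compute the page's single
possible address in it from page->index and vm_pgoff, and stop early once as
many ptes have been found as the mapcount says exist.

/* Sketch: object-based rmap walk for a file page over its mapping's vmas. */
#include <stdio.h>

#define PAGE_SHIFT 12

struct toy_vma {
    unsigned long vm_start, vm_end, vm_pgoff;
    int maps_page;                        /* pretend pte-present check */
    struct toy_vma *i_mmap_next;          /* next vma mapping this file */
};

struct toy_page { unsigned long index; int mapcount; };

static int page_referenced_file(struct toy_page *page, struct toy_vma *i_mmap)
{
    int found = 0;
    for (struct toy_vma *vma = i_mmap; vma; vma = vma->i_mmap_next) {
        unsigned long addr = vma->vm_start +
                             ((page->index - vma->vm_pgoff) << PAGE_SHIFT);
        if (addr < vma->vm_start || addr >= vma->vm_end)
            continue;                     /* vma covers a different range */
        if (vma->maps_page)
            found++;                      /* would test the pte at 'addr' */
        if (found >= page->mapcount)
            break;                        /* all ptes accounted for: stop */
    }
    return found;
}

int main(void)
{
    struct toy_vma b = { 0x80000, 0x90000, 0, 1, NULL };
    struct toy_vma a = { 0x10000, 0x20000, 0, 1, &b };
    struct toy_page page = { .index = 4, .mapcount = 2 };
    printf("referenced by %d ptes\n", page_referenced_file(&page, &a));
    return 0;
}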
|
|
Sync this up with Andrea's patches.
|
|
From: Hugh Dickins <hugh@veritas.com>
First of a batch of three rmap patches, paving the way for a move to some form
of object-based rmap (probably Andrea's, but drawing from mine too), and
making almost no functional change by itself. A few days will intervene before
the next batch, to give the struct page changes in the second patch some
exposure before proceeding.
rmap 1 create include/linux/rmap.h
Start small: linux/rmap-locking.h has already gathered some declarations
unrelated to locking, and the rest of the rmap declarations were over in
linux/swap.h: gather them all together in linux/rmap.h, and rename the
pte_chain_lock to rmap_lock.
|