summaryrefslogtreecommitdiff
path: root/include/linux/pagevec.h
AgeCommit message (Collapse)Author
2009-04-01mm: remove pagevec_swap_free()KOSAKI Motohiro
pagevec_swap_free() is now unused. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: get rid of pagevec_release_nonlru()KOSAKI Motohiro
speculative page references patch (commit: e286781d5f2e9c846e012a39653a166e9d31777d) removed last pagevec_release_nonlru() caller. So this function can be removed now. This patch doesn't have any functional change. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20Unevictable LRU InfrastructureLee Schermerhorn
When the system contains lots of mlocked or otherwise unevictable pages, the pageout code (kswapd) can spend lots of time scanning over these pages. Worse still, the presence of lots of unevictable pages can confuse kswapd into thinking that more aggressive pageout modes are required, resulting in all kinds of bad behaviour. Infrastructure to manage pages excluded from reclaim--i.e., hidden from vmscan. Based on a patch by Larry Woodman of Red Hat. Reworked to maintain "unevictable" pages on a separate per-zone LRU list, to "hide" them from vmscan. Kosaki Motohiro added the support for the memory controller unevictable lru list. Pages on the unevictable list have both PG_unevictable and PG_lru set. Thus, PG_unevictable is analogous to and mutually exclusive with PG_active--it specifies which LRU list the page is on. The unevictable infrastructure is enabled by a new mm Kconfig option [CONFIG_]UNEVICTABLE_LRU. A new function 'page_evictable(page, vma)' in vmscan.c tests whether or not a page may be evictable. Subsequent patches will add the various !evictable tests. We'll want to keep these tests light-weight for use in shrink_active_list() and, possibly, the fault path. To avoid races between tasks putting pages [back] onto an LRU list and tasks that might be moving the page from non-evictable to evictable state, the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()' -- tests the "evictability" of a page after placing it on the LRU, before dropping the reference. If the page has become unevictable, putback_lru_page() will redo the 'putback', thus moving the page to the unevictable list. This way, we avoid "stranding" evictable pages on the unevictable list. [akpm@linux-foundation.org: fix fallout from out-of-order merge] [riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build] [nishimura@mxp.nes.nec.co.jp: remove redundant mapping check] [kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework] [kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c] [kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure] [kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch] [kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch] Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20vmscan: split LRU lists into anon & file setsRik van Riel
Split the LRU lists in two, one set for pages that are backed by real file systems ("file") and one for pages that are backed by memory and swap ("anon"). The latter includes tmpfs. The advantage of doing this is that the VM will not have to scan over lots of anonymous pages (which we generally do not want to swap out), just to find the page cache pages that it should evict. This patch has the infrastructure and a basic policy to balance how much we scan the anon lists and how much we scan the file lists. The big policy changes are in separate patches. [lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset] [kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru] [kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page] [hugh@veritas.com: memcg swapbacked pages active] [hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED] [akpm@linux-foundation.org: fix /proc/vmstat units] [nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration] [kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo] [kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20vmscan: free swap space on swap-in/activationRik van Riel
If vm_swap_full() (swap space more than 50% full), the system will free swap space at swapin time. With this patch, the system will also free the swap space in the pageout code, when we decide that the page is not a candidate for swapout (and just wasting swap space). Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: MinChan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20swap: use an array for the LRU pagevecsKOSAKI Motohiro
Turn the pagevecs into an array just like the LRUs. This significantly cleans up the source code and reduces the size of the kernel by about 13kB after all the LRU lists have been created further down in the split VM patch series. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-01-08[PATCH] Permit multiple inclusion of linux/pagevec.hDavid Howells
Make it possible to include linux/pagevec.h multiple times without incurring errors due to duplicate definitions. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-25[PATCH] Change pagevec counters back to unsigned long and cacheline alignMarcelo Tosatti
Change pagevec "nr" and "cold" back to "unsigned long", because <4 byte accesses can be slow on architectures < Pentium III (additional "data16" operand on instruction). This still honours the cacheline alignment, making the size of "pagevec" structure a power of two (either 64 or 128 bytes). Haven't been able to see any significant change on performance on my limited testing. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18[PATCH] Adjust alignment of pagevec structureMarcelo Tosatti
We can shrink the pagevec structure to cacheline align it. It is used all over VM reclaiming and mpage pagecache read code. Right now it is 140 bytes on 64-bit and 72 bytes on 32-bit. Thats just a little bit more than a power of 2 (which will cacheline align), so shrink it to be aligned: 64 bytes on 32bit and 124bytes on 64-bit. It now occupies two cachelines most of the time instead of three. I changed nr and cold to "unsigned short" because they'll never reach 2 ^ 16. Did some reaim benchmarking on 4way PIII (32byte cacheline), with 512MB RAM: #### stock 2.6.9-rc1-mm4 #### Peak load Test: Maximum Jobs per Minute 4144.44 (average of 3 runs) Quick Convergence Test: Maximum Jobs per Minute 4007.86 (average of 3 runs) Peak load Test: Maximum Jobs per Minute 4207.48 (average of 3 runs) Quick Convergence Test: Maximum Jobs per Minute 3999.28 (average of 3 runs) #### shrink-pagevec ##### Peak load Test: Maximum Jobs per Minute 4717.88 (average of 3 runs) Quick Convergence Test: Maximum Jobs per Minute 4360.59 (average of 3 runs) Peak load Test: Maximum Jobs per Minute 4493.18 (average of 3 runs) Quick Convergence Test: Maximum Jobs per Minute 4327.77 (average of 3 runs) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-04-11[PATCH] stop using the address_space dirty_pages listAndrew Morton
Move everything over to walking the radix tree via the PAGECACHE_TAG_DIRTY tag. Remove address_space.dirty_pages.
2002-11-21[PATCH] Remove mapping->vm_writebackAndrew Morton
The vm_writeback address_space operation was designed to provide the VM with a "clustered writeout" capability. It allowed the filesystem to perform more intelligent writearound decisions when the VM was trying to clean a particular page. I can't say I ever saw any real benefit from this - not much writeout actually happens on that path - quite a lot of work has gone into minimising it actually. The default ->vm_writeback a_op which I provided wrote back the pages in ->dirty_pages order. But there is one scenario in which this causes problems - writing a single 4G file with mem=4G. We end up with all of ZONE_NORMAL full of dirty pages, but all writeback effort is against highmem pages. (Because there is about 1.5G of dirty memory total). Net effect: the machine stalls ZONE_NORMAL allocation attempts until the ->dirty_pages writeback advances onto ZONE_NORMAL pages. This can be fixed most sweetly with additional radix-tree infrastructure which will be quite complex. Later. So this patch dumps it all, and goes back to using writepage against individual pages as they come off the LRU.
2002-10-31[PATCH] empty the deferred lru-addition buffers in swapin_readaheadAndrew Morton
If we're about to return to userspace after performing some swap readahead, the pages in the deferred-addition LRU queues could stay there for some time. So drain them after performing readahead.
2002-10-31[PATCH] lru_add_active(): for starting pages on the active listAndrew Morton
This is the first in a series of patches which tune up the 2.5 performance under heavy swap loads. Throughput on stupid swapstormy tests is increased by 1.5x to 3x. Still about 20% behind 2.4 with multithreaded tests. That is not easily fixable - the virtual scan tends to apply a form of load control: particular processes are heavily swapped out so the others can get ahead. With 2.5 all processes make very even progress and much more swapping is needed. It's on par with 2.4 for single-process swapstorms. In this patch: The code which tries to start mapped pages out on the active list doesn't work very well. It uses an "is it mapped into pagetables" test. Which doesn't work for, say, swap readahead pages. They are not mapped into pagetables when they are spilled onto the LRU. So create a new `lru_cache_add_active()' function for deferred addition of pages to their active list. Also move mark_page_accessed() from filemap.c to swap.c where all similar functions live. And teach it to not try to move pages which are in the deferred-addition list onto the active list. That won't work, and it's bogusly clearing PageReferenced in that case. The deferred-addition lists are a pest. But lru_cache_add used to be really expensive in sime workloads on some machines. Must persist.
2002-10-29[PATCH] hot-n-cold pages: free and allocate hintsAndrew Morton
Add a `cold' hint to struct pagevec, and teach truncate and page reclaim to use it. Empirical testing showed that truncate's pages tend to be hot. And page reclaim's are certainly cold.
2002-10-02[PATCH] truncate/invalidate_inode_pages rewriteAndrew Morton
Rewrite these functions to use gang lookup. - This probably has similar performance to the old code in the common case. - It will be vastly quicker than current code for the worst case (single-page truncate). - invalidate_inode_pages() has been changed. It used to use page_count(page) as the "is it mapped into pagetables" heuristic. It now uses the (page->pte.direct != 0) heuristic. - Removes the worst cause of scheduling latency in the kernel. - It's a big code cleanup. - invalidate_inode_pages() has been changed to take an address_space *, not an inode *. - the maximum hold times for mapping->page_lock are enormously reduced, making it quite feasible to turn this into an irq-safe lock. Which, it seems, is a requirement for sane AIO<->direct-io integration, as well as possibly other AIO things. (Thanks Hugh for fixing a bug in this one as well). (Christoph added some stuff too)
2002-09-09[PATCH] buffer_head takedown for bighighmem machinesAndrew Morton
This patch addresses the excessive consumption of ZONE_NORMAL by buffer_heads on highmem machines. The algorithms which decide which buffers to shoot down are fairly dumb, but they only cut in on machines with large highmem:lowmem ratios and the code footprint is tiny. The buffer.c change implements the buffer_head accounting - it sets the upper limit on buffer_head memory occupancy to 10% of ZONE_NORMAL. A possible side-effect of this change is that the kernel will perform more calls to get_block() to map pages to disk. This will only be observed when a file is being repeatadly overwritten - this is the only case in which the "cached get_block result" in the buffers is useful. I did quite some testing of this back in the delalloc ext2 days, and was not able to come up with a test in which the cached get_block result was measurably useful. That's for ext2, which has a fast get_block(). A desirable side effect of this patch is that the kernel will be able to cache much more blockdev pagecache in ZONE_NORMAL, so there are more ext2/3 indirect blocks in cache, so with some workloads, less I/O will be performed. In mpage_writepage(): if the number of buffer_heads is excessive then buffers are stripped from pages as they are submitted for writeback. This change is only useful for filesystems which are using the mpage code. That's ext2 and ext3-writeback and JFS. An mpage patch for reiserfs was floating about but seems to have got lost. There is no need to strip buffers for reads because the mpage code does not attach buffers for reads. These are perhaps not the most appropriate buffer_heads to toss away. Perhaps something smarter should be done to detect file overwriting, or to toss the 'oldest' buffer_heads first. In refill_inactive(): if the number of buffer_heads is excessive then strip buffers from pages as they move onto the inactive list. This change is useful for all filesystems. This approach is good because pages which are being repeatedly overwritten will remain on the active list and will retain their buffers, whereas pages which are not being overwritten will be stripped.
2002-08-30[PATCH] remove pagevec_lru_del()Andrew Morton
it was only being used in invalidate_inode_pages(), and from there, pagevec_release() does the same thing.
2002-08-14[PATCH] deferred and batched addition of pages to the LRUAndrew Morton
The remaining source of page-at-a-time activity against pagemap_lru_lock is the anonymous pagefault path, which cannot be changed to operate against multiple pages at a time. But what we can do is to batch up just its adding of pages to the LRU, via buffering and deferral. This patch is based on work from Bill Irwin. The patch changes lru_cache_add to put the pages into a per-CPU pagevec. They are added to the LRU 16-at-a-time. And in the page reclaim code, purge the local CPU's buffer before starting. This is mainly to decrease the chances of pages staying off the LRU for very long periods: if the machine is under memory pressure, CPUs will spill their pages onto the LRU promptly. A consequence of this change is that we can have up to 15*num_cpus pages which are not on the LRU. Which could have a slight effect on VM accuracy, but I find that doubtful. If the system is under memory pressure the pages will be added to the LRU promptly, and these pages are the most-recently-touched ones - the VM isn't very interested in them anyway. This optimisation could be made SMP-specific, but I felt it best to turn it on for UP as well for consistency and better testing coverage.
2002-08-14[PATCH] pagevec infrastructureAndrew Morton
This is the first patch in a series of eight which address pagemap_lru_lock contention, and which simplify the VM locking hierarchy. Most testing has been done with all eight patches applied, so it would be best not to cherrypick, please. The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six disks, six filesystems, six processes each flat-out writing a large file onto one of the disks. ie: heavy page replacement load. The frequency with which pagemap_lru_lock is taken is reduced by 90%. Lockmeter claims that pagemap_lru_lock contention on the 4-way has been reduced by 98%. Total amount of system time lost to lock spinning went from 2.5% to 0.85%. Anton ran a similar test on 8-way PPC, the reduction in system time was around 25%, and the reduction in time spent playing with pagemap_lru_lock was 80%. http://samba.org/~anton/linux/2.5.30/standard/ versus http://samba.org/~anton/linux/2.5.30/akpm/ Throughput changes on uniprocessor are modest: a 1% speedup with this workload due to shortened code paths and improved cache locality. The patches do two main things: 1: In almost all places where the kernel was doing something with lots of pages one-at-a-time, convert the code to do the same thing sixteen-pages-at-a-time. Take the lock once rather than sixteen times. Take the lock for the minimum possible time. 2: Multithread the pagecache reclaim function: don't hold pagemap_lru_lock while reclaiming pagecache pages. That function was massively expensive. One fallout from this work is that we never take any other locks while holding pagemap_lru_lock. So this lock conceptually disappears from the VM locking hierarchy. So. This is all basically a code tweak to improve kernel scalability. It does it by optimising the existing design, rather than by redesign. There is little conceptual change to how the VM works. This is as far as I can tweak it. It seems that the results are now acceptable on SMP. But things are still bad on NUMA. It is expected that the per-zone LRU and per-zone LRU lock patches will fix NUMA as well, but that has yet to be tested. This first patch introduces `struct pagevec', which is the basic unit of batched work. It is simply: struct pagevec { unsigned nr; struct page *pages[16]; }; pagevecs are used in the following patches to get the VM away from page-at-a-time operations. This patch includes all the pagevec library functions which are used in later patches.