This patch addresses the excessive consumption of ZONE_NORMAL by
buffer_heads on highmem machines. The algorithms which decide which
buffers to shoot down are fairly dumb, but they only cut in on machines
with large highmem:lowmem ratios and the code footprint is tiny.
The buffer.c change implements the buffer_head accounting - it sets the
upper limit on buffer_head memory occupancy to 10% of ZONE_NORMAL.
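As a rough sketch of that calculation (the helper name bh_limit_init()
is invented for illustration; nr_free_buffer_pages() and
max_buffer_heads follow the usual fs/buffer.c naming):

	#include <linux/mm.h>
	#include <linux/buffer_head.h>

	static unsigned long max_buffer_heads;

	/* Cap buffer_head occupancy at roughly 10% of lowmem (ZONE_NORMAL). */
	static void __init bh_limit_init(void)
	{
		unsigned long nrpages;

		nrpages = (nr_free_buffer_pages() * 10) / 100;
		max_buffer_heads = nrpages * (PAGE_SIZE / sizeof(struct buffer_head));
	}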
A possible side-effect of this change is that the kernel will perform
more calls to get_block() to map pages to disk. This will only be
observed when a file is being repeatedly overwritten - this is the only
case in which the "cached get_block result" in the buffers is useful.
I did quite some testing of this back in the delalloc ext2 days, and
was not able to come up with a test in which the cached get_block
result was measurably useful. That's for ext2, which has a fast
get_block().
A desirable side effect of this patch is that the kernel will be able
to cache much more blockdev pagecache in ZONE_NORMAL, so there are more
ext2/3 indirect blocks in cache, so with some workloads, less I/O will
be performed.
In mpage_writepage(): if the number of buffer_heads is excessive then
buffers are stripped from pages as they are submitted for writeback.
This change is only useful for filesystems which are using the mpage
code. That's ext2 and ext3-writeback and JFS. An mpage patch for
reiserfs was floating about but seems to have got lost.
There is no need to strip buffers for reads because the mpage code does
not attach buffers for reads.
These are perhaps not the most appropriate buffer_heads to toss away.
Perhaps something smarter should be done to detect file overwriting, or
to toss the 'oldest' buffer_heads first.
In refill_inactive(): if the number of buffer_heads is excessive then
strip buffers from pages as they move onto the inactive list. This
change is useful for all filesystems. This approach is good because
pages which are being repeatedly overwritten will remain on the active
list and will retain their buffers, whereas pages which are not being
overwritten will be stripped.
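Both stripping sites reduce to the same test; a minimal sketch
(maybe_strip_buffers() is a made-up name for illustration, while
buffer_heads_over_limit, page_has_buffers() and try_to_release_page()
follow the mainline names):

	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/buffer_head.h>

	/*
	 * Drop a page's buffer_heads once the system-wide limit has been
	 * exceeded.  The pagecache page itself is untouched; only the
	 * cached get_block() results are lost.  The real call sites hold
	 * the page lock here.
	 */
	static void maybe_strip_buffers(struct page *page)
	{
		if (buffer_heads_over_limit && page_has_buffers(page))
			try_to_release_page(page, GFP_NOIO);
	}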
|
It was only being used in invalidate_inode_pages(), and from there,
pagevec_release() does the same thing.
|
The remaining source of page-at-a-time activity against
pagemap_lru_lock is the anonymous pagefault path, which cannot be
changed to operate against multiple pages at a time.
But what we can do is to batch up just its adding of pages to the LRU,
via buffering and deferral.
This patch is based on work from Bill Irwin.
The patch changes lru_cache_add to put the pages into a per-CPU
pagevec. They are added to the LRU 16-at-a-time.
And in the page reclaim code, purge the local CPU's buffer before
starting. This is mainly to decrease the chances of pages staying off
the LRU for very long periods: if the machine is under memory pressure,
CPUs will spill their pages onto the LRU promptly.
A consequence of this change is that we can have up to 15*num_cpus
pages which are not on the LRU. This could have a slight effect on VM
accuracy, but I find that doubtful. If the system is under memory
pressure the pages will be added to the LRU promptly, and these pages
are the most-recently-touched ones - the VM isn't very interested in
them anyway.
This optimisation could be made SMP-specific, but I felt it best to
turn it on for UP as well for consistency and better testing coverage.
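A sketch of that batching, loosely following what ended up in mm/swap.c
(the per-CPU plumbing shown here, DEFINE_PER_CPU and get_cpu()/put_cpu(),
is assumed infrastructure rather than text from this patch):

	#include <linux/pagevec.h>
	#include <linux/pagemap.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(struct pagevec, lru_add_pvecs);

	void lru_cache_add(struct page *page)
	{
		struct pagevec *pvec = &per_cpu(lru_add_pvecs, get_cpu());

		page_cache_get(page);
		/* pagevec_add() returns 0 once all 16 slots are in use. */
		if (!pagevec_add(pvec, page))
			__pagevec_lru_add(pvec);	/* one lock round trip for 16 pages */
		put_cpu();
	}

	/* Called before page reclaim starts so queued pages reach the LRU. */
	void lru_add_drain(void)
	{
		struct pagevec *pvec = &per_cpu(lru_add_pvecs, get_cpu());

		if (pagevec_count(pvec))
			__pagevec_lru_add(pvec);
		put_cpu();
	}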
|
This is the first patch in a series of eight which address
pagemap_lru_lock contention, and which simplify the VM locking
hierarchy.
Most testing has been done with all eight patches applied, so it would
be best not to cherrypick, please.
The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks. ie: heavy page replacement load.
The frequency with which pagemap_lru_lock is taken is reduced by 90%.
Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
reduced by 98%. Total amount of system time lost to lock spinning went
from 2.5% to 0.85%.
Anton ran a similar test on 8-way PPC, the reduction in system time was
around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.
http://samba.org/~anton/linux/2.5.30/standard/
versus
http://samba.org/~anton/linux/2.5.30/akpm/
Throughput changes on uniprocessor are modest: a 1% speedup with this
workload due to shortened code paths and improved cache locality.
The patches do two main things:
1: In almost all places where the kernel was doing something with
lots of pages one-at-a-time, convert the code to do the same thing
sixteen-pages-at-a-time. Take the lock once rather than sixteen
times. Take the lock for the minimum possible time.
2: Multithread the pagecache reclaim function: don't hold
pagemap_lru_lock while reclaiming pagecache pages. That function
was massively expensive.
One fallout from this work is that we never take any other locks while
holding pagemap_lru_lock. So this lock conceptually disappears from
the VM locking hierarchy.
So. This is all basically a code tweak to improve kernel scalability.
It does it by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.
This is as far as I can tweak it. It seems that the results are now
acceptable on SMP. But things are still bad on NUMA. It is expected
that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
well, but that has yet to be tested.
This first patch introduces `struct pagevec', which is the basic unit
of batched work. It is simply:
	struct pagevec {
		unsigned nr;
		struct page *pages[16];
	};
pagevecs are used in the following patches to get the VM away from
page-at-a-time operations.
This patch includes all the pagevec library functions which are used in
later patches.
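For illustration, a typical consumer in the later patches looks roughly
like this (the loop and the function name are assumptions;
pagevec_add() and pagevec_release() are among the library helpers this
patch provides):

	#include <linux/pagevec.h>

	/* Release references on an array of pages, sixteen at a time. */
	static void release_pages_in_batches(struct page **pages, int nr)
	{
		struct pagevec pvec = { .nr = 0 };
		int i;

		for (i = 0; i < nr; i++) {
			/* pagevec_add() returns 0 once all 16 slots are in use. */
			if (!pagevec_add(&pvec, pages[i]))
				pagevec_release(&pvec);	/* batched put of 16 pages */
		}
		pagevec_release(&pvec);		/* flush whatever is left over */
	}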
|