| field | value | date |
|---|---|---|
| author | Andrew Morton <akpm@zip.com.au> | 2002-08-14 21:20:44 -0700 |
| committer | Linus Torvalds <torvalds@home.transmeta.com> | 2002-08-14 21:20:44 -0700 |
| commit | 6a95284048359fb4e1c96e02c6be0be9bdc71d6c (patch) | |
| tree | 66fb6ceaeee413185a82a0a738e35b1c3e6c2649 /include | |
| parent | ecc9d325a607ad198c7e1b6aada94ff4daefef91 (diff) | |
[PATCH] pagevec infrastructure
This is the first patch in a series of eight which address
pagemap_lru_lock contention, and which simplify the VM locking
hierarchy.
Most testing has been done with all eight patches applied, so it would
be best not to cherrypick, please.
The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks, i.e. a heavy page-replacement load.
The frequency with which pagemap_lru_lock is taken is reduced by 90%.
Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
reduced by 98%. Total amount of system time lost to lock spinning went
from 2.5% to 0.85%.
Anton ran a similar test on 8-way PPC, the reduction in system time was
around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.
http://samba.org/~anton/linux/2.5.30/standard/
versus
http://samba.org/~anton/linux/2.5.30/akpm/
Throughput changes on uniprocessor are modest: a 1% speedup with this
workload due to shortened code paths and improved cache locality.
The patches do two main things:
1: In almost all places where the kernel was doing something with
lots of pages one-at-a-time, convert the code to do the same thing
sixteen-pages-at-a-time. Take the lock once rather than sixteen
times. Take the lock for the minimum possible time.
2: Multithread the pagecache reclaim function: don't hold
pagemap_lru_lock while reclaiming pagecache pages. That function
was massively expensive.
One fallout from this work is that we never take any other locks while
holding pagemap_lru_lock. So this lock conceptually disappears from
the VM locking hierarchy.
So. This is all basically a code tweak to improve kernel scalability.
It does it by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.
This is as far as I can tweak it. It seems that the results are now
acceptable on SMP. But things are still bad on NUMA. It is expected
that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
well, but that has yet to be tested.
This first patch introduces `struct pagevec', which is the basic unit
of batched work. It is simply:
```c
struct pagevec {
	unsigned nr;
	struct page *pages[16];
};
```
pagevecs are used in the following patches to get the VM away from
page-at-a-time operations.
This patch includes all the pagevec library functions which are used in
later patches.
Diffstat (limited to 'include')
| -rw-r--r-- | include/linux/pagevec.h | 76 |
1 files changed, 76 insertions, 0 deletions
```diff
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
new file mode 100644
index 000000000000..7d091aea7543
--- /dev/null
+++ b/include/linux/pagevec.h
@@ -0,0 +1,76 @@
+/*
+ * include/linux/pagevec.h
+ *
+ * In many places it is efficient to batch an operation up against multiple
+ * pages.  A pagevec is a multipage container which is used for that.
+ */
+
+#define PAGEVEC_SIZE	16
+
+struct page;
+
+struct pagevec {
+	unsigned nr;
+	struct page *pages[PAGEVEC_SIZE];
+};
+
+void __pagevec_release(struct pagevec *pvec);
+void __pagevec_release_nonlru(struct pagevec *pvec);
+void __pagevec_free(struct pagevec *pvec);
+void __pagevec_lru_add(struct pagevec *pvec);
+void __pagevec_lru_del(struct pagevec *pvec);
+void pagevec_deactivate_inactive(struct pagevec *pvec);
+
+static inline void pagevec_init(struct pagevec *pvec)
+{
+	pvec->nr = 0;
+}
+
+static inline unsigned pagevec_count(struct pagevec *pvec)
+{
+	return pvec->nr;
+}
+
+static inline unsigned pagevec_space(struct pagevec *pvec)
+{
+	return PAGEVEC_SIZE - pvec->nr;
+}
+
+/*
+ * Add a page to a pagevec.  Returns the number of slots still available.
+ */
+static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
+{
+	pvec->pages[pvec->nr++] = page;
+	return pagevec_space(pvec);
+}
+
+static inline void pagevec_release(struct pagevec *pvec)
+{
+	if (pagevec_count(pvec))
+		__pagevec_release(pvec);
+}
+
+static inline void pagevec_release_nonlru(struct pagevec *pvec)
+{
+	if (pagevec_count(pvec))
+		__pagevec_release_nonlru(pvec);
+}
+
+static inline void pagevec_free(struct pagevec *pvec)
+{
+	if (pagevec_count(pvec))
+		__pagevec_free(pvec);
+}
+
+static inline void pagevec_lru_add(struct pagevec *pvec)
+{
+	if (pagevec_count(pvec))
+		__pagevec_lru_add(pvec);
+}
+
+static inline void pagevec_lru_del(struct pagevec *pvec)
+{
+	if (pagevec_count(pvec))
+		__pagevec_lru_del(pvec);
+}
```
