| author | Andrew Morton <akpm@zip.com.au> | 2002-04-09 21:29:32 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@penguin.transmeta.com> | 2002-04-09 21:29:32 -0700 |
| commit | 8fa498462272fec2c16a92a9a7f67d005225b640 (patch) | |
| tree | e29b46a5009e785a91243ef5d9905c23557b8375 /include/linux/blkdev.h | |
| parent | 3d30a6cc3af49ca0b668a2cbbc9d43def619567c (diff) | |
[PATCH] readahead
I'd like to be able to claim amazing speedups, but the best benchmark I could find was diffing two 256-megabyte files, which is about 10% quicker. And that is probably due to the window size being effectively 50% larger.
Fact is, any disk worth owning nowadays has a segmented 2-megabyte cache, and OS-level readahead mainly seems to save on CPU cycles rather than overall throughput. Once you start reading more streams than there are segments in the disk cache, though, readahead starts to win.
Still. The main motivation for this work is to clean the code up, and to create a central point at which many pages are marshalled together so that they can all be encapsulated into the smallest possible number of BIOs, and injected into the request layer.
A number of filesystems were poking around inside the readahead state variables. I'm not really sure what they were up to, but I took all that out. The readahead code manages its own state autonomously and should not need any hints.
- Unifies the current three readahead functions (mmap reads, read(2) and sys_readahead) into a single implementation.
- More aggressive in building up the readahead windows.
- More conservative in tearing them down.
- Special start-of-file heuristics.
- Preallocates the readahead pages, to avoid the (never demonstrated,
but potentially catastrophic) scenario where allocation of readahead
pages causes the allocator to perform VM writeout.
- Gets all the readahead pages gathered together in
one spot, so they can be marshalled into big BIOs.
- Reinstates the readahead ioctls, so hdparm(8) and blockdev(8) work again. The readahead settings are now per-request-queue, and the drivers never have to know about them. I use blockdev(8); it works in units of 512 bytes (a userspace sketch of the ioctl interface follows this list).
- Identifies readahead thrashing, and attempts to handle it. Certainly the changes here delay the onset of catastrophic readahead thrashing by quite a lot, and decrease its seriousness as we get more deeply into it, but it's still pretty bad.
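
For reference, the reinstated per-queue setting is reachable from userspace through the long-standing BLKRAGET/BLKRASET block-device ioctls, which is also what blockdev --getra and --setra drive. A minimal sketch, assuming /dev/hda purely as an example device:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKRAGET, BLKRASET */

int main(void)
{
	long ra;
	int fd = open("/dev/hda", O_RDONLY);	/* example device */

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* BLKRAGET takes a pointer to a long */
	if (ioctl(fd, BLKRAGET, &ra) < 0) {
		perror("BLKRAGET");
		return 1;
	}
	printf("readahead: %ld sectors (%ld bytes)\n", ra, ra * 512);

	/* BLKRASET takes the new value directly, not a pointer:
	 * 256 sectors of 512 bytes = 128kB of readahead */
	if (ioctl(fd, BLKRASET, 256UL) < 0)
		perror("BLKRASET");
	close(fd);
	return 0;
}
```

Both ioctls work in 512-byte sectors, matching the per-queue ra_sectors field added below.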
Diffstat (limited to 'include/linux/blkdev.h')
| -rw-r--r-- | include/linux/blkdev.h | 12 |
1 file changed, 8 insertions, 4 deletions
```diff
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7a43ff774fe0..914498e8e4b9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -153,6 +153,12 @@ struct request_queue
 	prep_rq_fn		*prep_rq_fn;
 
 	/*
+	 * The VM-level readahead tunable for this device.  In
+	 * units of 512-byte sectors.
+	 */
+	unsigned ra_sectors;
+
+	/*
 	 * The queue owner gets to use this for whatever they like.
 	 * ll_rw_blk doesn't touch it.
 	 */
@@ -308,6 +314,8 @@ extern void blk_queue_hardsect_size(request_queue_t *q, unsigned short);
 extern void blk_queue_segment_boundary(request_queue_t *q, unsigned long);
 extern void blk_queue_assign_lock(request_queue_t *q, spinlock_t *);
 extern void blk_queue_prep_rq(request_queue_t *q, prep_rq_fn *pfn);
+extern int blk_set_readahead(kdev_t dev, unsigned sectors);
+extern unsigned blk_get_readahead(kdev_t dev);
 extern int blk_rq_map_sg(request_queue_t *, struct request *, struct scatterlist *);
 extern void blk_dump_rq_flags(struct request *, char *);
 
@@ -322,10 +330,6 @@ extern int * blksize_size[MAX_BLKDEV];
 
 #define MAX_SEGMENT_SIZE	65536
 
-/* read-ahead in pages.. */
-#define MAX_READAHEAD	31
-#define MIN_READAHEAD	3
-
 #define blkdev_entry_to_request(entry) list_entry((entry), struct request, queuelist)
 
 extern void drive_stat_acct(struct request *, int, int);
```
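
The header only declares the new accessors; their bodies live in the ll_rw_blk.c part of the patch, which falls outside this diffstat. A plausible sketch of what they amount to, assuming the existing blk_get_queue() lookup and a hypothetical RA_SECTORS_DEFAULT fallback (the constant's name, value, and the exact error handling are assumptions, not taken from the patch):

```c
#include <linux/blkdev.h>

/* Hypothetical fallback for queues whose driver never set ra_sectors:
 * 128kB expressed in 512-byte sectors.  Name and value are assumed. */
#define RA_SECTORS_DEFAULT	256

int blk_set_readahead(kdev_t dev, unsigned sectors)
{
	request_queue_t *q = blk_get_queue(dev);

	if (!q)
		return -EINVAL;
	q->ra_sectors = sectors;	/* per-queue, in 512-byte sectors */
	return 0;
}

unsigned blk_get_readahead(kdev_t dev)
{
	request_queue_t *q = blk_get_queue(dev);

	/* Fall back to a default if the driver never configured one */
	if (!q || q->ra_sectors == 0)
		return RA_SECTORS_DEFAULT;
	return q->ra_sectors;
}
```

Keeping the tunable on the request queue is what lets the drivers stay ignorant of it: the VM asks the queue, and blockdev(8)/hdparm(8) poke the queue, with no per-driver plumbing.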
