author		Andrew Morton <akpm@digeo.com>	2002-09-22 08:16:37 -0700
committer	Linus Torvalds <torvalds@home.transmeta.com>	2002-09-22 08:16:37 -0700
commit		4cef1b04dfd642a1335a8c3abdb00872b892fb7e (patch)
tree		3700f612fafe089fd82f8d541698ac243a61b269 /include/linux
parent		b574273304f5fa6cbdabe201ff9717d00c6f9eae (diff)
[PATCH] infrastructure for monitoring queue congestion state
The patch provides a means for the VM to determine whether a request queue is in a "congested" state.  If it is congested, then a write to (or a read from) the queue may block in get_request_wait().  So the VM can do:

	if (!bdi_write_congested(page->mapping->backing_dev_info))
		writepage(page);

This is not exact.  The code assumes that if the request queue still has 1/4 of its capacity (queue_nr_requests) available, then a request will be non-blocking.  There is a small chance that another CPU could zoom in and consume those requests.  But on the rare occasions where that may happen, the result will merely be some unexpected latency - it's not worth doing anything elaborate to prevent this.

The patch decreases the size of `batch_requests'.  batch_requests is positively harmful - when a "heavy" writer and a "light" writer are both writing to the same queue, batch_requests provides a means for the heavy writer to massively stall the light writer: instead of waiting for one or two requests to come free, the light writer has to wait for 32 requests to complete.  Plus batch_requests generally makes things harder to tune, understand and predict.  I wanted to kill it altogether, but Jens says that it is important for some hardware - it allows decent-sized requests to be submitted.

The VM changes which go along with this code make batch_requests not so painful anyway - the only processes which sleep in get_request_wait() are the ones which we elect, by design, to wait in there - typically heavy writers.

The patch changes the meaning of `queue_nr_requests'.  It used to mean "total number of requests per queue"; half of these are for reads, and half are for writes.  This always confused the heck out of me, and the code needs to divide queue_nr_requests by two all over the place.  So queue_nr_requests now means "the number of write requests per queue" and "the number of read requests per queue" - i.e., I halved it.  Also, queue_nr_requests was converted to static scope; nothing else uses it.

The accuracy of bdi_read_congested() and bdi_write_congested() depends upon the accuracy of mapping->backing_dev_info.  With complex block-stacking arrangements it is possible that ->backing_dev_info is pointing at the wrong queue; I don't know.  But the cost of getting this wrong is merely latency, and if it becomes a problem we can fix it up in the block layer, by getting stacking devices to communicate their congestion state upwards in some manner.
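To make the 1/4 rule above concrete, here is a minimal standalone model of the threshold arithmetic.  The constant QUEUE_NR_REQUESTS and the free-request counter are hypothetical stand-ins for the queue's real bookkeeping; this is an illustration, not code from this patch:

	/* Standalone model of the congestion threshold described above. */
	#include <stdio.h>

	#define QUEUE_NR_REQUESTS 128	/* write requests per queue (post-patch meaning) */

	/* A queue counts as write-congested once fewer than a quarter of its
	 * write requests remain free, so a new request is likely to block. */
	static int write_congested(int nr_free_write_requests)
	{
		return nr_free_write_requests < QUEUE_NR_REQUESTS / 4;
	}

	int main(void)
	{
		printf("100 free -> congested? %d\n", write_congested(100));	/* 0 */
		printf(" 16 free -> congested? %d\n", write_congested(16));	/* 1 */
		return 0;
	}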
Diffstat (limited to 'include/linux')
-rw-r--r--	include/linux/backing-dev.h	14
-rw-r--r--	include/linux/blkdev.h		1
2 files changed, 15 insertions, 0 deletions
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 898f8e1814ef..94c93c9c5f66 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -8,11 +8,15 @@
#ifndef _LINUX_BACKING_DEV_H
#define _LINUX_BACKING_DEV_H
+#include <asm/atomic.h>
+
/*
* Bits in backing_dev_info.state
*/
enum bdi_state {
BDI_pdflush, /* A pdflush thread is working this device */
+ BDI_write_congested, /* The write queue is getting full */
+ BDI_read_congested, /* The read queue is getting full */
BDI_unused, /* Available bits start here */
};
@@ -28,4 +32,14 @@ int writeback_acquire(struct backing_dev_info *bdi);
int writeback_in_progress(struct backing_dev_info *bdi);
void writeback_release(struct backing_dev_info *bdi);
+static inline int bdi_read_congested(struct backing_dev_info *bdi)
+{
+ return test_bit(BDI_read_congested, &bdi->state);
+}
+
+static inline int bdi_write_congested(struct backing_dev_info *bdi)
+{
+ return test_bit(BDI_write_congested, &bdi->state);
+}
+
#endif /* _LINUX_BACKING_DEV_H */
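For context, a kernel-style sketch (not compilable on its own) of how the block layer could flip these bits when the free request count crosses the 1/4 threshold.  The helper names and the rw convention here are assumptions; the corresponding drivers/block/ll_rw_blk.c changes are not part of this header diff:

	#include <linux/backing-dev.h>
	#include <linux/fs.h>		/* READ/WRITE */

	/* Hypothetical helpers: mark or clear congestion on the queue's
	 * backing_dev_info when its free request count crosses 1/4 of
	 * queue_nr_requests.  Illustrative only. */
	static void set_queue_congested(struct backing_dev_info *bdi, int rw)
	{
		set_bit(rw == WRITE ? BDI_write_congested : BDI_read_congested,
			&bdi->state);
	}

	static void clear_queue_congested(struct backing_dev_info *bdi, int rw)
	{
		clear_bit(rw == WRITE ? BDI_write_congested : BDI_read_congested,
			  &bdi->state);
	}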
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fa0798452e77..255001f6f433 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -345,6 +345,7 @@ extern void blk_queue_end_tag(request_queue_t *, struct request *);
extern int blk_queue_init_tags(request_queue_t *, int);
extern void blk_queue_free_tags(request_queue_t *);
extern void blk_queue_invalidate_tags(request_queue_t *);
+extern void blk_congestion_wait(int rw, long timeout);
#define MAX_PHYS_SEGMENTS 128
#define MAX_HW_SEGMENTS 128
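The new blk_congestion_wait() declaration gives callers that skipped congested queues a way to back off instead of blocking in get_request_wait().  A hedged kernel-style sketch of such a caller follows; the function name and the HZ/10 timeout are illustrative choices, not part of this patch:

	#include <linux/blkdev.h>

	/* Illustrative caller: after a writeback pass that skipped every
	 * congested queue, sleep until some queue signals progress or the
	 * timeout expires. */
	static void writeback_backoff(int all_queues_congested)
	{
		if (all_queues_congested)
			blk_congestion_wait(WRITE, HZ / 10);
	}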