diff options
| author | Andrew Morton <akpm@zip.com.au> | 2002-04-29 23:54:18 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@home.transmeta.com> | 2002-04-29 23:54:18 -0700 |
| commit | a2bcb3a084f4312844639e18cbe7eb7256c7c23c (patch) | |
| tree | 7341b21bf48c391bb18cb5c9d0ee698d65167a95 /fs/block_dev.c | |
| parent | f15fe42437fb78d9ffae53a7c40be1c5939330e9 (diff) | |
[PATCH] page writeback locking update
- Fixes a performance problem - callers of
prepare_write/commit_write, etc are locking pages, which synchronises
them behind writeback, which also locks these pages. Significant
slowdowns for some workloads.
- So pages are no longer locked while under writeout. Introduce a
new PG_writeback and associated infrastructure to support this design
change.
- Pages which are under read I/O still use PageLocked. Pages which
are under write I/O have PageWriteback() true.
I considered creating Page_IO instead of PageWriteback, and marking
both readin and writeout pages as PageIO(). So pages are unlocked
during both read and write. There just doesn't seem a need to do
this - nobody ever needs unblocking access to a page which is under
read I/O.
- Pages under swapout (brw_page) are PageLocked, not PageWriteback.
So their treatment is unchangeded.
It's not obvious that pages which are under swapout actually need
the more asynchronous behaviour of PageWriteback.
I was setting the swapout pages PageWriteback and unlocking them
prior to submitting the buffers in brw_page(). This led to deadlocks
on the exit_mmap->zap_page_range->free_swap_and_cache path. These
functions call block_flushpage under spinlock. If the page is
unlocked but has locked buffers, block_flushpage->discard_buffer()
sleeps. Under spinlock. So that will need fixing if for some reason
we want swapout to use PageWriteback.
Kernel has called block_flushpage() under spinlock for a long time.
It is assuming that a locked page will never have locked buffers.
This appears to be true, but it's ugly.
- Adds new function wait_on_page_writeback(). Renames wait_on_page()
to wait_on_page_locked() to remind people that they need to call the
appropriate one.
- Renames filemap_fdatasync() to filemap_fdatawrite(). It's more
accurate - "sync" implies, if anything, writeout and wait. (fsync,
msync) Or writeout. it's not clear.
- Subtly changes the filemap_fdatawrite() internals - this function
used to do a lock_page() - it waited for any other user of the page
to let go before submitting new I/O against a page. It has been
changed to simply skip over any pages which are currently under
writeback.
This is the right thing to do for memory-cleansing reasons.
But it's the wrong thing to do for data consistency operations (eg,
fsync()). For those operations we must ensure that all data which
was dirty *at the time of the system call* are tight on disk before
the call returns.
So all places which care about this have been converted to do:
filemap_fdatawait(mapping); /* Wait for current writeback */
filemap_fdatawrite(mapping); /* Write all dirty pages */
filemap_fdatawait(mapping); /* Wait for I/O to complete */
- Fixes a truncate_inode_pages problem - truncate currently will
block when it hits a locked page, so it ends up getting into lockstep
behind writeback and all of the file is pointlessly written back.
One fix for this is for truncate to simply walk the page list in the
opposite direction from writeback.
I chose to use a separate cleansing pass. It is more
CPU-intensive, but it is surer and clearer. This is because there is
no reason why the per-address_space ->vm_writeback and
->writeback_mapping functions *have* to perform writeout in
->dirty_pages order. They may choose to do something totally
different.
(set_page_dirty() is an a_op now, so address_spaces could almost
privatise the whole dirty-page handling thing. Except
truncate_inode_pages and invalidate_inode_pages assume that the pages
are on the address_space lists. hmm. So making truncate_inode_pages
and invalidate_inode_pages a_ops would make some sense).
Diffstat (limited to 'fs/block_dev.c')
| -rw-r--r-- | fs/block_dev.c | 15 |
1 files changed, 1 insertions, 14 deletions
diff --git a/fs/block_dev.c b/fs/block_dev.c index c811529e7993..5add2e4911d0 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -180,22 +180,9 @@ static loff_t block_llseek(struct file *file, loff_t offset, int origin) return retval; } -/* - * AKPM: fixme. unneeded stuff here. - */ static int __block_fsync(struct inode * inode) { - int ret, err; - - ret = filemap_fdatasync(inode->i_mapping); - err = sync_buffers(inode->i_bdev, 1); - if (err && !ret) - ret = err; - err = filemap_fdatawait(inode->i_mapping); - if (err && !ret) - ret = err; - - return ret; + return sync_buffers(inode->i_bdev, 1); } /* |
