| author | Andrew Morton <akpm@zip.com.au> | 2002-08-30 01:49:22 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@home.transmeta.com> | 2002-08-30 01:49:22 -0700 |
| commit | ec12ac49e86d508e0ce7065f44889fbb67e60601 (patch) | |
| tree | 61b30e346858b0df219f162904de87093d3544ba /include/linux | |
| parent | 8fd3d4584142a0d68eaeabecff9fa99831c9451a (diff) | |

[PATCH] writeback correctness and efficiency changes

This is a performance and correctness fix against the writeback paths.

The writeback code has competing requirements. Sometimes it is used
for "memory cleansing": kupdate, bdflush, writer throttling, page
allocator writeback, etc. And sometimes this same code is used for
data integrity purposes: fsync, msync, fdatasync, sync, umount, various
other kernel-internal uses.

The problem is: how to handle a dirty buffer or page which is currently
under writeback.

For memory cleansing, we just want to skip that buffer/page and go on to
the next one. But for sync, we must wait on the old writeback and then
start new writeback.

mpage_writepages() is currently correct for cleansing, but incorrect for
sync. block_write_full_page() is currently correct for sync, but
inefficient for cleansing.
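
The two policies can be modelled in a few lines of userspace C. This is only a sketch: `struct mock_page`, `enum wb_mode` and `writeback_pass()` are hypothetical names invented for illustration, not kernel interfaces, and "waiting" on old writeback is simulated by simply clearing a flag.

```c
#include <stddef.h>

/* Hypothetical model of a page; not the kernel's struct page. */
struct mock_page {
	int dirty;		/* needs writing */
	int under_writeback;	/* older I/O still in flight */
};

enum wb_mode { WB_CLEANSING, WB_SYNC };

/*
 * Walk the pages once and return how many had writeback started.
 * Cleansing skips pages with I/O in flight; sync waits for the old
 * I/O (simulated here by clearing the flag) and then writes again.
 */
static int writeback_pass(struct mock_page *pages, size_t n, enum wb_mode mode)
{
	int started = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct mock_page *p = &pages[i];

		if (!p->dirty)
			continue;
		if (p->under_writeback) {
			if (mode == WB_CLEANSING)
				continue;	/* leave it dirty, move on */
			p->under_writeback = 0;	/* stand-in for waiting */
		}
		p->dirty = 0;
		p->under_writeback = 1;		/* new I/O submitted */
		started++;
	}
	return started;
}
```

In the model, a cleansing pass leaves the busy page dirty for a later pass, while a sync pass does not return until every dirty page has had fresh writeback started.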

The fix is fairly simple.

- In mpage_writepages(), don't skip the page if it's a sync
  operation.

- In block_write_full_page(), skip the buffer if it is not a sync
  operation (that is, when cleansing). And return -EAGAIN to tell the
  caller that the writeout didn't work out. The caller must then set
  the page dirty again and move it onto mapping->dirty_pages.

  This is an extension of the writepage API: writepage can now return
  -EAGAIN. There are only three callers, and they have been updated.

  fail_writepage() and ext3_writepage() were actually doing this by
  hand. They have been changed to return -EAGAIN. NTFS will want to
  be able to return -EAGAIN from its writepage as well.

- A sticky question is: how to tell the writeout code which mode it
  is operating in? Cleansing or sync?

  It's such a tiny code change that I didn't have the heart to go and
  propagate a `mode' argument down every instance of writepages() and
  writepage() in the kernel. So I passed it in via current->flags.

  Incidentally, the occurrence of a locked-and-dirty buffer in
  block_write_full_page() is fairly rare: normally the collision avoidance
  happens at the address_space level, via PageWriteback. But some
  mappings (blockdevs, ext3 files, etc) have their dirty buffers written
  out via submit_bh(). It is these buffers which can stall
  block_write_full_page().

  This wart will be pretty intrusive to fix. ext3 needs to become fully
  page-based (ugh. It's a block-based journalling filesystem, and pages
  are unnatural). blockdev mappings are still written out by buffers
  because that's how filesystems use them. Putting _all_ metadata
  (indirects, inodes, superblocks, etc) into standalone address_spaces
  would fix that up.

- filemap_fdatawrite() sets PF_SYNC. So filemap_fdatawrite() is the
  kernel function which will start writeback against a mapping for
  "data integrity" purposes, whereas the unexported, internal-only
  do_writepages() is the writeback function which is used for memory
  cleansing. This difference is the reason why I didn't consolidate
  those functions ages ago...

- Lots of code paths had a bogus extra call to filemap_fdatawait(),
  which I previously added in a moment of weak-headedness. They have
  all been removed.
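
A userspace sketch of the resulting contract, covering the -EAGAIN return and the flag-based mode selection described in the list above. Everything here is a mock: `mock_task_flags` stands in for current->flags, and `mock_writepage()`, `mock_write_one()` and `mock_sync_write_one()` are hypothetical helpers, not the functions touched by this patch. Only the PF_SYNC value is taken from the diff. Restoring the previous flag state instead of clearing it unconditionally is a choice made for this sketch, not a statement about what the patch does.

```c
#include <errno.h>

#define PF_SYNC 0x00080000	/* value from the sched.h hunk */

static unsigned long mock_task_flags;	/* stand-in for current->flags */

struct mock_page {
	int dirty;
	int buffer_locked;	/* buffer already under I/O via another path */
};

/* Stand-in for a ->writepage() implementation. */
static int mock_writepage(struct mock_page *page)
{
	if (page->buffer_locked) {
		if (!(mock_task_flags & PF_SYNC))
			return -EAGAIN;		/* cleansing: don't stall */
		page->buffer_locked = 0;	/* sync: stand-in for waiting */
	}
	return 0;				/* writeout started */
}

/* Caller side: on -EAGAIN the page must be made dirty again. */
static int mock_write_one(struct mock_page *page)
{
	int ret;

	page->dirty = 0;
	ret = mock_writepage(page);
	if (ret == -EAGAIN)
		page->dirty = 1;	/* and back onto the dirty list */
	return ret;
}

/* Data-integrity entry point: set the flag, restore the old state after. */
static int mock_sync_write_one(struct mock_page *page)
{
	unsigned long old = mock_task_flags & PF_SYNC;
	int ret;

	mock_task_flags |= PF_SYNC;
	ret = mock_write_one(page);
	if (!old)
		mock_task_flags &= ~(unsigned long)PF_SYNC;
	return ret;
}
```

The point of the model is that a cleansing caller never loses track of the page: a refused writeout leaves it dirty, so a later pass or an explicit sync picks it up again.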
Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/sched.h | 1 |
| -rw-r--r-- | include/linux/writeback.h | 9 |
2 files changed, 10 insertions, 0 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e46b8b84cad4..e8251036b026 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -405,6 +405,7 @@ do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
 #define PF_FREEZE 0x00010000 /* this task should be frozen for suspend */
 #define PF_IOTHREAD 0x00020000 /* this thread is needed for doing I/O to swap */
 #define PF_FROZEN 0x00040000 /* frozen for system suspend */
+#define PF_SYNC 0x00080000 /* performing fsync(), etc */
 
 /*
  * Ptrace flags
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 7b1ae2718f3e..5de884cd6a7c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -72,4 +72,13 @@
 extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
  read-only. */
 
+/*
+ * Tell the writeback paths that they are being called for a "data integrity"
+ * operation such as fsync().
+ */
+static inline int called_for_sync(void)
+{
+ return current->flags & PF_SYNC;
+}
+
 #endif /* WRITEBACK_H */
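
One detail of the new helper worth noting for callers: it returns the masked flag word rather than a normalised 0/1, so it should be tested for truth and not compared against 1. A minimal userspace illustration follows, where `struct mock_task`, `mock_current` and `called_for_sync_bool()` are hypothetical names; only PF_SYNC and the body of called_for_sync() come from the diff.

```c
#define PF_SYNC 0x00080000	/* value from the sched.h hunk */

/* Hypothetical stand-in for the kernel's task_struct and `current`. */
struct mock_task {
	unsigned long flags;
};

static struct mock_task mock_task0;
static struct mock_task *mock_current = &mock_task0;

/* Same body as the helper in the writeback.h hunk, against the mock. */
static inline int called_for_sync(void)
{
	return mock_current->flags & PF_SYNC;
}

/* Callers wanting a strict 0/1 can normalise with a double negation. */
static inline int called_for_sync_bool(void)
{
	return !!called_for_sync();
}
```

Used as `if (called_for_sync())`, the helper behaves as expected; the distinction only matters if the result is stored or compared.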
