summaryrefslogtreecommitdiff
path: root/include/linux
diff options
context:
space:
mode:
authorAndrew Morton <akpm@zip.com.au>2002-08-30 01:49:22 -0700
committerLinus Torvalds <torvalds@home.transmeta.com>2002-08-30 01:49:22 -0700
commitec12ac49e86d508e0ce7065f44889fbb67e60601 (patch)
tree61b30e346858b0df219f162904de87093d3544ba /include/linux
parent8fd3d4584142a0d68eaeabecff9fa99831c9451a (diff)
[PATCH] writeback correctness and efficiency changes
This is a performance and correctness fix against the writeback paths. The writeback code has competing requirements. Sometimes it is used for "memory cleansing": kupdate, bdflush, writer throttling, page allocator writeback, etc. And sometimes this same code is used for data integrity pruposes: fsync, msync, fdatasync, sync, umount, various other kernel-internal uses. The problem is: how to handle a dirty buffer or page which is currently under writeback. For memory cleansing, we just want to skip that buffer/page and go onto the next one. But for sync, we must wait on the old writeback and then start new writeback. mpage_writepages() is current correct for cleansing, but incorrect for sync. block_write_full_page() is currently correct for sync, but inefficient for cleansing. The fix is fairly simple. - In mpage_writepages(), don't skip the page is it's a sync operation. - In block_write_full_page(), skip the buffer if it is a sync operation. And return -EAGAIN to tell the caller that the writeout didn't work out. The caller must then set the page dirty again and move it onto mapping->dirty_pages. This is an extension of the writepage API: writepage can now return EAGAIN. There are only three callers, and they have been updated. fail_writepage() and ext3_writepage() were actually doing this by hand. They have been changed to return -EAGAIN. NTFS will want to be able to return -EAGAIN from its writepage as well. - A sticky question is: how to tell the writeout code which mode it is operating in? Cleansing or sync? It's such a tiny code change that I didn't have the heart to go and propagate a `mode' argument down every instance of writepages() and writepage() in the kernel. So I passed it in via current->flags. Incidentally, the occurrence of a locked-and-dirty buffer in block_write_full_page() is fairly rare: normally the collision avoidance happens at the address_space level, via PageWriteback. But some mappings (blockdevs, ext3 files, etc) have their dirty buffers written out via submit_bh(). It is these buffers which can stall block_write_full_page(). This wart will be pretty intrusive to fix. ext3 needs to become fully page-based (ugh. It's a block-based journalling filesystem, and pages are unnatural). blockdev mappings are still written out by buffers because that's how filesystems use them. Putting _all_ metadata (indirects, inodes, superblocks, etc) into standalone address_spaces would fix that up. - filemap_fdatawrite() sets PF_SYNC. So filemap_fdatawrite() is the kernel function which will start writeback against a mapping for "data integrity" purposes, whereas the unexported, internal-only do_writepages() is the writeback function which is used for memory cleansing. This difference is the reason why I didn't consolidate those functions ages ago... - Lots of code paths had a bogus extra call to filemap_fdatawait(), which I previously added in a moment of weak-headedness. They have all been removed.
Diffstat (limited to 'include/linux')
-rw-r--r--include/linux/sched.h1
-rw-r--r--include/linux/writeback.h9
2 files changed, 10 insertions, 0 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e46b8b84cad4..e8251036b026 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -405,6 +405,7 @@ do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
#define PF_FREEZE 0x00010000 /* this task should be frozen for suspend */
#define PF_IOTHREAD 0x00020000 /* this thread is needed for doing I/O to swap */
#define PF_FROZEN 0x00040000 /* frozen for system suspend */
+#define PF_SYNC 0x00080000 /* performing fsync(), etc */
/*
* Ptrace flags
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 7b1ae2718f3e..5de884cd6a7c 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -72,4 +72,13 @@ extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+/*
+ * Tell the writeback paths that they are being called for a "data integrity"
+ * operation such as fsync().
+ */
+static inline int called_for_sync(void)
+{
+ return current->flags & PF_SYNC;
+}
+
#endif /* WRITEBACK_H */