| author | Andrew Morton <akpm@digeo.com> | 2003-01-14 00:20:46 -0800 |
|---|---|---|
| committer | Linus Torvalds <torvalds@home.transmeta.com> | 2003-01-14 00:20:46 -0800 |
| commit | 2a6cb303d76059edf35254438375acd95862dc4c (patch) | |
| tree | 2c2d1c4a7d80d842d818892237f0ef9db1ef0af7 /include/linux/ext3_jbd.h | |
| parent | daebe5ee58fa9985c257a1ce5df29fcb258f06d3 (diff) | |
[PATCH] fix ext3 memory leak
This is the leak which Con found. Long story...
- If a dirty page is fed into ext3_writepage() during truncate,
block_write_full_page() will return -EIO (it's outside i_size) and will
leave the buffers dirty, in the expectation that discard_buffer() will
clean them.
- ext3_writepage() then adds the still-dirty buffers to the journal's
"async data list". These are buffers which are known to have had IO
started. All we need to do is to wait on them in commit.
- Meanwhile, truncate will chop the pages off the address_space. But
truncate cannot invalidate the buffers (in journal_unmap_buffer()) because
the buffers are attached to the committing transaction. (Hm. This
behaviour in journal_unmap_buffer() is bogus. We just never need to write
these buffers.)
- ext3 commit will "wait on writeout" of these writepage buffers (even
though it was never started) and will then release them from the
journalling system.
So we end up with pages which are attached to no mapping, which are clean and
which have dirty buffers. These are unreclaimable.
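
The end state described above can be written down as a predicate. What follows is only a toy userspace sketch: `toy_page`, `toy_buffer` and both helper functions are hypothetical stand-ins invented for illustration, not the kernel's struct page or struct buffer_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a buffer; NOT the kernel's struct buffer_head. */
struct toy_buffer {
	bool dirty;		/* contents are newer than what is on disk */
	bool on_journal_list;	/* still attached to a transaction */
};

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
	bool has_mapping;	/* false once truncate chopped it off the address_space */
	bool dirty;		/* page-level dirty flag */
	struct toy_buffer *bufs;
	size_t nbufs;
};

/* Count the buffers on this page that are still dirty. */
static size_t toy_dirty_buffers(const struct toy_page *p)
{
	size_t n = 0;

	assert(p->bufs != NULL || p->nbufs == 0);
	for (size_t i = 0; i < p->nbufs; i++)
		if (p->bufs[i].dirty)
			n++;
	return n;
}

/* The leaked state: no mapping, clean page, at least one dirty buffer. */
static bool toy_page_is_leaked(const struct toy_page *p)
{
	return !p->has_mapping && !p->dirty && toy_dirty_buffers(p) > 0;
}
```

In this model, cleaning the last dirty buffer is enough to make the predicate false, which is the property the fix relies on.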
Aside:
ext3-ordered has two buffer lists: the "sync data list" and the "async
data list".
The sync list consists of probably-dirty buffers which were dirtied in
commit_write(). Transaction commit must write all these out and wait on
them.
The async list supposedly consists of clean buffers which were attached
to the journal in ->writepage. These have had IO started (by writepage) so
commit merely needs to wait on them.
This is all designed for the 2.4 VM really. In 2.5, tons of writeback
goes via writepage (instead of the buffer lru) and these buffers end up
madly hopping between the async and sync lists.
Plus it's arguably incorrect to just wait on the writes in commit - if
the buffers were set dirty again (say, by zap_pte_range()) then perhaps we
should write them again before committing.
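
To make the aside concrete, here is a toy model of the two-list commit as described: write and wait on the sync list, wait only on the async list. It is a sketch under simplifying assumptions, not the JBD code. `old_buf`, `old_submit`, `old_wait` and `old_commit` are hypothetical names, and IO is assumed to complete at wait time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy buffer; not the kernel's struct buffer_head. */
struct old_buf {
	bool dirty;		/* contents are newer than what is on disk */
	bool io_started;	/* a write has been submitted */
};

/* Submit a write. In this toy model the IO completes at wait time. */
static void old_submit(struct old_buf *b)
{
	b->io_started = true;
}

/* Wait for IO. Only a buffer whose write was submitted becomes clean. */
static void old_wait(struct old_buf *b)
{
	if (b->io_started) {
		b->dirty = false;
		b->io_started = false;
	}
}

/*
 * Two-list commit: sync buffers are written and waited on, async
 * buffers are only waited on. Returns how many async buffers are
 * still dirty afterwards.
 */
static size_t old_commit(struct old_buf *sync, size_t nsync,
			 struct old_buf *async, size_t nasync)
{
	size_t still_dirty = 0;

	assert(sync != NULL || nsync == 0);
	assert(async != NULL || nasync == 0);

	for (size_t i = 0; i < nsync; i++)
		if (sync[i].dirty)
			old_submit(&sync[i]);
	for (size_t i = 0; i < nsync; i++)
		old_wait(&sync[i]);

	for (size_t i = 0; i < nasync; i++) {
		old_wait(&async[i]);	/* no write is ever started here */
		if (async[i].dirty)
			still_dirty++;
	}
	return still_dirty;
}
```

A buffer that lands on the async list while dirty and with no IO in flight, as in the -EIO case, comes out of this commit still dirty.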
So what the patch does is to remove the async list. All ordered-data buffers
are now attached to the single "sync data list". So when we come to commit,
those buffers which are dirty will have IO started and all buffers are waited
upon.
This means that the dirty buffers against a clean page which came about from
block_write_full_page()'s -EIO will be written to disk in commit - this
cleans them, and the page is now reclaimable. No leak.
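
The single-list behaviour can be sketched the same way. Again this is a toy userspace model rather than the patched commit code: `new_buf` and `new_commit` are hypothetical names, and a write is assumed to complete when it is waited upon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy buffer; not the kernel's struct buffer_head. */
struct new_buf {
	bool dirty;		/* contents are newer than what is on disk */
	bool io_started;	/* a write has been submitted */
};

/*
 * Single-list commit: start IO on every dirty buffer that has none
 * in flight, then wait on all buffers. Returns the number of writes
 * this commit had to start itself.
 */
static size_t new_commit(struct new_buf *list, size_t n)
{
	size_t started = 0;

	assert(list != NULL || n == 0);

	/* Pass 1: write out whatever is still dirty. */
	for (size_t i = 0; i < n; i++) {
		if (list[i].dirty && !list[i].io_started) {
			list[i].io_started = true;
			started++;
		}
	}

	/* Pass 2: wait on everything; completed IO leaves the buffer clean. */
	for (size_t i = 0; i < n; i++) {
		if (list[i].io_started) {
			list[i].dirty = false;
			list[i].io_started = false;
		}
	}
	return started;
}
```

Because every dirty buffer gets a write in pass 1, no buffer in this model can leave the commit dirty, whichever path originally attached it to the list.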
It seems bogus to write these buffers in commit, and indeed it is. But ext3
will not allow those blocks to be reused until the commit has ended so there
is no corruption risk. And the amount of data involved is low - it only
comes about as a race between truncate and writepage().
Diffstat (limited to 'include/linux/ext3_jbd.h')
| -rw-r--r-- | include/linux/ext3_jbd.h | 12 |
1 files changed, 0 insertions, 12 deletions
diff --git a/include/linux/ext3_jbd.h b/include/linux/ext3_jbd.h
index 1985ecee6a3f..13508f6053b9 100644
--- a/include/linux/ext3_jbd.h
+++ b/include/linux/ext3_jbd.h
@@ -132,16 +132,6 @@ __ext3_journal_get_write_access(const char *where,
 	return err;
 }
 
-static inline int
-__ext3_journal_dirty_data(const char *where,
-			  handle_t *handle, struct buffer_head *bh, int async)
-{
-	int err = journal_dirty_data(handle, bh, async);
-	if (err)
-		ext3_journal_abort_handle(where, __FUNCTION__, bh, handle,err);
-	return err;
-}
-
 static inline void
 ext3_journal_forget(handle_t *handle, struct buffer_head *bh)
 {
@@ -183,8 +173,6 @@ __ext3_journal_dirty_metadata(const char *where,
 	__ext3_journal_get_undo_access(__FUNCTION__, (handle), (bh))
 #define ext3_journal_get_write_access(handle, bh) \
 	__ext3_journal_get_write_access(__FUNCTION__, (handle), (bh))
-#define ext3_journal_dirty_data(handle, bh, async) \
-	__ext3_journal_dirty_data(__FUNCTION__, (handle), (bh), (async))
 #define ext3_journal_revoke(handle, blocknr, bh) \
 	__ext3_journal_revoke(__FUNCTION__, (handle), (blocknr), (bh))
 #define ext3_journal_get_create_access(handle, bh) \
