<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/fs/buffer.c, branch v2.6.30.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.30.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.30.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2009-09-09T03:33:19Z</updated>
<entry>
<title>Re-introduce page mapping check in mark_buffer_dirty()</title>
<updated>2009-09-09T03:33:19Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-08-22T00:40:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=006fb349ae9699ccc9a27483b6a94be0bd87cdd0'/>
<id>urn:sha1:006fb349ae9699ccc9a27483b6a94be0bd87cdd0</id>
<content type='text'>
commit 8e9d78edea3ce5c0036f85b93091483f2f15443a upstream.

In commit a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5 ("Fix race in
create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
for a NULL page mapping unintentionally when some of the code inside
__set_page_dirty() was moved to the callers.

That removal generally didn't matter, since a filesystem would serialize
truncation (which clears the page mapping) against writing (which marks
the buffer dirty), so locking at a higher level (either per-page or an
inode at a time) should mean that the buffer page would be stable.  And
indeed, nothing bad seemed to happen.

Except it turns out that apparently reiserfs does something odd when
under load and writing out the journal, and we have a number of bugzilla
entries that look similar:

	http://bugzilla.kernel.org/show_bug.cgi?id=13556
	http://bugzilla.kernel.org/show_bug.cgi?id=13756
	http://bugzilla.kernel.org/show_bug.cgi?id=13876

and it looks like reiserfs depended on that check (the common theme
seems to be "data=journal", and a journal writeback during a truncate).

I suspect reiserfs should have some additional locking, but in the
meantime this should get us back to the pre-2.6.29 behavior.

Pattern-pointed-out-by: Roland Kletzing &lt;devzero@web.de&gt;
Cc: Jeff Mahoney &lt;jeffm@suse.com&gt;
Cc: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>Fix nobh_truncate_page() to not pass stack garbage to get_block()</title>
<updated>2009-06-06T10:17:25Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2009-05-12T11:37:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=460bcf57b128ce1c0dd553d905fedc097f9955c6'/>
<id>urn:sha1:460bcf57b128ce1c0dd553d905fedc097f9955c6</id>
<content type='text'>
The nobh_truncate_page() function is used by ext2, exofs, and jfs.  Of
these three, only ext2 and jfs's get_block() function pays attention
to bh-&gt;b_size --- which is normally always the filesystem blocksize
except when the get_block() function is called by either
mpage_readpage(), mpage_readpages(), or the direct I/O routines in
fs/direct_io.c.

Unfortunately, nobh_truncate_page() does not initialize map_bh before
calling the filesystem-supplied get_block() function.  So ext2 and jfs
will try to calculate the number of blocks to map by taking stack
garbage and shifting it left by inode-&gt;i_blkbits.  This should be
*mostly* harmless (except the filesystem will do some unnneeded work)
unless the stack garbage is less than filesystem's blocksize, in which
case maxblocks will be zero, and the attempt to find out whether or
not the filesystem has a hole at a given logical block will fail, and
the page cache entry might not get zero'ed out.

Also if the stack garbage in in map_bh-&gt;state happens to have the
BH_Mapped bit set, there could be an attempt to call readpage() on a
non-existent page, which could cause nobh_truncate_page() to return an
error when it should not.

Fix this by initializing map_bh-&gt;state and map_bh-&gt;size.

Fortunately, it's probably fairly unlikely that ext2 and jfs users
mount with nobh these days.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
Cc: Dave Kleikamp &lt;shaggy@linux.vnet.ibm.com&gt;
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: close page_mkwrite races</title>
<updated>2009-05-02T22:36:09Z</updated>
<author>
<name>Nick Piggin</name>
<email>npiggin@suse.de</email>
</author>
<published>2009-04-30T22:08:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b827e496c893de0c0f142abfaeb8730a2fd6b37f'/>
<id>urn:sha1:b827e496c893de0c0f142abfaeb8730a2fd6b37f</id>
<content type='text'>
Change page_mkwrite to allow implementations to return with the page
locked, and also change it's callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its -&gt;page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
-&gt;set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into -&gt;fault, and thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL -&gt;mapping]
Cc: Sage Weil &lt;sage@newdream.net&gt;
Cc: Trond Myklebust &lt;trond.myklebust@fys.uio.no&gt;
Signed-off-by: Nick Piggin &lt;npiggin@suse.de&gt;
Cc: Valdis Kletnieks &lt;Valdis.Kletnieks@vt.edu&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Add block_write_full_page_endio for passing endio handler</title>
<updated>2009-04-16T14:47:49Z</updated>
<author>
<name>Chris Mason</name>
<email>chris.mason@oracle.com</email>
</author>
<published>2009-04-15T17:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=35c80d5f400f68f2eccf3069d1c068e154bde9c9'/>
<id>urn:sha1:35c80d5f400f68f2eccf3069d1c068e154bde9c9</id>
<content type='text'>
block_write_full_page doesn't allow the caller to control what happens
when the IO is over.  This adds a new call named block_write_full_page_endio
so the buffer head end_io handler can be provided by the caller.

This will be used by the ext3 data=guarded mode to do i_size updates in
a workqueue based end_io handler.  end_buffer_async_write is also
exported so it can be called to do the dirty work of managing page
writeback for the higher level end_io handler.

Signed-off-by: Chris Mason &lt;chris.mason@oracle.com&gt;
Acked-by: Theodore Tso &lt;tytso@mit.edu&gt;
Acked-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>buffer: switch do_emergency_thaw() away from pdflush_operation()</title>
<updated>2009-04-15T06:28:12Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-08T11:44:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=053c525fcf976810f023d96472f414c0d5e6339b'/>
<id>urn:sha1:053c525fcf976810f023d96472f414c0d5e6339b</id>
<content type='text'>
This is (again) a preparatory patch similar to commit
a2a9537ac0b37a5da6fbe7e1e9cb06c524d2a9c4. It open codes a simple
async way of executing do_thaw_all() out of context, so we can get
rid of pdflush.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG</title>
<updated>2009-04-08T17:15:09Z</updated>
<author>
<name>Theodore Ts'o</name>
<email>tytso@mit.edu</email>
</author>
<published>2009-04-07T22:12:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6e34eeddf7deec1444bbddab533f03f520d8458c'/>
<id>urn:sha1:6e34eeddf7deec1444bbddab533f03f520d8458c</id>
<content type='text'>
Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
the block device I/O queue between each page that gets flushed out.

Otherwise, when we run sync() or fsync() and we need to write out a
large number of pages, the block device queue will get unplugged
between for every page that is flushed out, which will be a pretty
serious performance regression caused by commit a64c8610.

Signed-off-by: "Theodore Ts'o" &lt;tytso@mit.edu&gt;
</content>
</entry>
<entry>
<title>block: switch sync_dirty_buffer() over to WRITE_SYNC</title>
<updated>2009-04-06T15:04:54Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-06T12:48:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1aa2a7cc6fd7b5c86681a6ae9dfd1072c261a435'/>
<id>urn:sha1:1aa2a7cc6fd7b5c86681a6ae9dfd1072c261a435</id>
<content type='text'>
We should now have the logic in place to handle this properly
without regressing on the write performance, so re-enable
the sync writes.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>block: fsync_buffers_list() should use SWRITE_SYNC_PLUG</title>
<updated>2009-04-06T15:04:53Z</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2009-04-06T12:48:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9cf6b720f84d6999ff9a514d0a939dd183846aaf'/>
<id>urn:sha1:9cf6b720f84d6999ff9a514d0a939dd183846aaf</id>
<content type='text'>
Then it can submit all the buffers without unplugging for each one.
We will kick off the pending IO if we come across a new address space.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2009-04-03T18:10:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T18:10:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=20bec8ab1458c24bed0d5492ee15d87807fc415a'/>
<id>urn:sha1:20bec8ab1458c24bed0d5492ee15d87807fc415a</id>
<content type='text'>
* 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext3: Add replace-on-rename hueristics for data=writeback mode
  ext3: Add replace-on-truncate hueristics for data=writeback mode
  ext3: Use WRITE_SYNC for commits which are caused by fsync()
  block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6</title>
<updated>2009-04-03T04:09:10Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-04-03T04:09:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8fe74cf053de7ad2124a894996f84fa890a81093'/>
<id>urn:sha1:8fe74cf053de7ad2124a894996f84fa890a81093</id>
<content type='text'>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
</content>
</entry>
</feed>
