| Age | Commit message | Author |
|
Commit 29a814d2ee0e43c2980f33f91c1311ec06c0aa35 (vfs: add hooks for
ext4's delayed allocation support) exported the following functions
    mpage_bio_submit()
    __mpage_writepage()
for the benefit of ext4's delayed allocation support. Since commit
a1d6cc563bfdf1bf2829d3e6ce4d8b774251796b (ext4: Rework the
ext4_da_writepages() function), these functions are not used by the
ext4 driver anymore. However, the now-unnecessary exports still
remain. This patch removes them and makes both functions static
again.
The issue was spotted by namespacecheck.
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Export mpage_bio_submit() and __mpage_writepage() for the benefit of
ext4's delayed allocation support. Also change __block_write_full_page
so that, for buffers that have the BH_Delay flag set, it calls
get_block() to get the physical block allocated, just as in the
!BH_Mapped case.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|
Clean up massive code duplication between mpage_writepages() and
generic_writepages().
The new generic function, write_cache_pages(), takes a function pointer
argument, which is called for each page to be written.
Maybe cifs_writepages() too can use this infrastructure, but I'm not
touching that with a ten-foot pole.
The upcoming page writeback support in fuse will also want this.
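The callback pattern described above can be sketched in userspace C. This is only an illustration of the idea: `page_sketch`, `writepage_fn`, `write_cache_pages_sketch()` and `count_writepage()` are hypothetical names invented here, and the real write_cache_pages() works on the page cache with locking and writeback_control handling that this sketch omits.

```c
#include <stddef.h>

/* Hypothetical stand-in for a page cache page; not the kernel's struct page. */
struct page_sketch {
    int dirty;   /* non-zero if the page still needs to be written */
    int index;   /* position of the page within the file */
};

/* Per-page callback supplied by the caller (assumed signature). */
typedef int (*writepage_fn)(struct page_sketch *page, void *data);

/*
 * Walk all pages, invoke the callback on each dirty one and stop at the
 * first error. The page walk lives in one place; only the callback differs
 * between callers.
 */
static int write_cache_pages_sketch(struct page_sketch *pages, size_t nr,
                                    writepage_fn writepage, void *data)
{
    for (size_t i = 0; i < nr; i++) {
        if (!pages[i].dirty)
            continue;
        int ret = writepage(&pages[i], data);
        if (ret)
            return ret;
        pages[i].dirty = 0;
    }
    return 0;
}

/* Example callback: only counts how many pages it was asked to write. */
static int count_writepage(struct page_sketch *page, void *data)
{
    (void)page;
    (*(int *)data)++;
    return 0;
}
```

In this sketch, a BIO-building caller and a plain ->writepage caller would differ only in the function they pass in, which is the duplication the patch removes.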
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make it possible to disable the block layer. Not all embedded devices require
it; some can make do with just JFFS2, NFS, ramfs, etc., none of which require
the block layer to be present.
This patch does the following:
(*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
support.
(*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
an item that uses the block layer. This includes:
    (*) Block I/O tracing.
    (*) Disk partition code.
    (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
    (*) The SCSI layer. As far as I can tell, even SCSI chardevs use the
    block layer to do scheduling. Some drivers that use SCSI facilities -
    such as USB storage - end up disabled indirectly from this.
    (*) Various block-based device drivers, such as IDE and the old CDROM
    drivers.
    (*) MTD blockdev handling and FTL.
    (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
    taking a leaf out of JFFS2's book.
(*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is,
however, still used in places, and so is still available.
(*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
parts of linux/fs.h.
(*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
(*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
(*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
is not enabled.
(*) fs/no-block.c is created to hold out-of-line stubs and things that are
required when CONFIG_BLOCK is not set:
    (*) Default blockdev file operations (to give error ENODEV on opening).
(*) Makes some /proc changes:
    (*) /proc/devices does not list any blockdevs.
    (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
(*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
(*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
given a command other than Q_SYNC or if a special device is specified.
(*) In init/do_mounts.c, no reference is made to the blockdev routines if
CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2.
(*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
error ENOSYS by way of cond_syscall if so).
(*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
CONFIG_BLOCK is not set, since they can't then happen.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Dissociate the generic_writepages() function from the mpage stuff, moving its
declaration to linux/mm.h and actually emitting a full implementation into
mm/page-writeback.c.
The implementation is a partial duplicate of mpage_writepages() with all BIO
references removed.
It is used by NFS to do writeback.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This had a fatal lock ranking bug: we do journal_start outside
mpage_writepages()'s lock_page().
Revert the whole thing, think again.
Credit-to: Jan Kara <jack@suse.cz>
For identifying the bug.
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add writepages support for ext3 writeback mode.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add nobh_writepage() support for the filesystems which use
nobh_prepare_write/nobh_commit_write().
The idea here is to reduce unnecessary bufferhead creation/attachment to the
page through pageout()->block_write_full_page(). nobh_writepage() tries to
operate by directly creating bios, but it falls back to
__block_write_full_page() if it can't make progress.
Note that this is not really a generic routine and can't be used for
filesystems which use page->private for anything other than buffer heads.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The writeback code paths which walk the superblocks and inodes are
getting an increasing number of arguments passed to them.
The patch wraps those args into the new `struct writeback_control',
and uses that instead. There is no functional change.
The new writeback_control structure is passed down through the
writeback paths in the place where the old `nr_to_write' pointer used
to be.
writeback_control will be used to pass new information up and down the
writeback paths, such as whether the writeback should be non-blocking
and whether queue congestion was encountered.
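A minimal userspace sketch of the idea follows. `wbc_sketch` and `writeback_pages_sketch()` are hypothetical names made up for this illustration, and the field set is an assumption based only on the description above (a write budget, a non-blocking request going down, a congestion report coming back up); it is not the kernel's struct definition.

```c
/* Hypothetical, simplified control block passed down the writeback path. */
struct wbc_sketch {
    long nr_to_write;            /* in/out: pages the caller still wants written */
    int nonblocking;             /* in: the caller does not want to block */
    int encountered_congestion;  /* out: set when a congested queue was hit */
};

/*
 * Write up to wbc->nr_to_write of nr_dirty pages. One pointer replaces the
 * old bare nr_to_write pointer and also carries the extra flags in both
 * directions. Returns the number of pages "written".
 */
static int writeback_pages_sketch(struct wbc_sketch *wbc, int nr_dirty,
                                  int queue_congested)
{
    int written = 0;

    while (nr_dirty > 0 && wbc->nr_to_write > 0) {
        if (queue_congested && wbc->nonblocking) {
            /* Report congestion upward instead of blocking. */
            wbc->encountered_congestion = 1;
            break;
        }
        nr_dirty--;
        wbc->nr_to_write--;
        written++;
    }
    return written;
}
```

Adding another piece of state later then means adding a field to the structure rather than changing every function signature along the path.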
|
|
generic_writepages() is just a wrapper around mpage_writepages(), so
inline it.
|
|
Spot the difference:
    aops.readpage
    aops.readpages
    aops.writepage
    aops.writeback_mapping
The patch renames `writeback_mapping' to `writepages'.
|
|
Multipage BIO writeout from the pagecache.
It's pretty much the same as multipage reads. It falls back to buffers
if things get complex.
The write case is a little more complex because it handles pages which
have buffers and pages which do not. If a page does not have buffers,
this code does not add them.
|
|
Implements BIO-based multipage reads into the pagecache, and turns this
on for ext2.
CPU load for `cat large_file > /dev/null' is reduced by approximately
15%. Similar reductions for tiobench with a single thread. (Earlier
claims of 25% were exaggerated - they were measured with slab debug
enabled. But 15% isn't bad for a load which is dominated by copy_*_user
costs).
With 2, 4 and 8 tiobench threads, throughput is increased as well, which was
unexpected. It's due to request queue weirdness. (Generally the
request queueing is doing bad things under certain workloads - that's a
separate issue.)
BIOs of up to 64 kbytes are assembled and submitted for readahead and
for single-page reads. So the work involved in reading 32 pages has gone
from:
- allocate and attach 32 buffer_heads
- submit 32 buffer_heads
- allocate 32 bios
- submit 32 bios
to:
- allocate 2 bios
- submit 2 bios
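The arithmetic behind those counts can be checked with a small helper. `bios_needed()` is a hypothetical function written for this illustration. It assumes 4 KB pages (which the 32-page, 2-BIO figures imply for 64 KB BIOs) and pages that are contiguous on disk, so each BIO can be filled completely.

```c
/*
 * Number of BIOs needed to cover nr_pages pages when one BIO can carry at
 * most bio_max_size bytes. Assumes the pages are contiguous on disk.
 */
static unsigned long bios_needed(unsigned long nr_pages,
                                 unsigned long page_size,
                                 unsigned long bio_max_size)
{
    unsigned long pages_per_bio = bio_max_size / page_size;

    if (pages_per_bio == 0)
        pages_per_bio = 1;  /* a BIO always carries at least one page */

    /* Round up: a partially filled BIO still has to be submitted. */
    return (nr_pages + pages_per_bio - 1) / pages_per_bio;
}
```

With 4096-byte pages and a 65536-byte limit, 32 pages need 2 BIOs. Limiting a BIO to a single page brings the count back to 32, one per page.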
These pages never have buffers attached. Buffers will be attached
later if the application writes to these pages (file overwrite).
The first version of this code (in the "delayed allocation" patches)
tries to handle everything - bios which start mid-page, bios which end
mid-page and pages which are covered by multiple bios. It is very
complex code and in fact appears to be incorrect: out-of-order BIO
completion could cause a page to come unlocked at the wrong time.
This implementation is much simpler: if things get complex, it just
falls back to the buffer-based block_read_full_page(), which isn't
going away, and which understands all that complexity. There's no
point in doing this in two places.
This code will bypass the buffer layer for
- fully-mapped pages which are on-disk contiguous.
- fully unmapped pages (holes)
- partially unmapped pages, where the unmappedness is at the end of
the page (end-of-file).
and everything else falls back to buffers.
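The three cases above can be expressed as a small classifier. This is a sketch under stated assumptions, not the code in fs/mpage.c: `classify_page()`, `BLOCK_UNMAPPED` and `enum read_path` are hypothetical names, and a page is modelled simply as an array holding the on-disk block number of each of its blocks.

```c
#include <stddef.h>

#define BLOCK_UNMAPPED (-1L)  /* hypothetical marker for a hole */

enum read_path { READ_VIA_BIO, READ_VIA_BUFFERS };

/*
 * Decide whether a page can bypass the buffer layer. blocks[i] is the
 * on-disk block number backing the i-th block of the page, or
 * BLOCK_UNMAPPED. Accepted layouts: a contiguous mapped run covering the
 * whole page, a page that is all holes, or a contiguous mapped run
 * followed only by holes. Anything else goes to the buffer-based path.
 */
static enum read_path classify_page(const long *blocks, size_t nr_blocks)
{
    size_t i = 0;

    /* The leading mapped run must be contiguous on disk. */
    while (i < nr_blocks && blocks[i] != BLOCK_UNMAPPED) {
        if (i > 0 && blocks[i] != blocks[i - 1] + 1)
            return READ_VIA_BUFFERS;
        i++;
    }

    /* After the first hole, every remaining block must be a hole too. */
    for (; i < nr_blocks; i++) {
        if (blocks[i] != BLOCK_UNMAPPED)
            return READ_VIA_BUFFERS;
    }
    return READ_VIA_BIO;
}
```

For a page of four 1k blocks, `{100, 101, 102, 103}`, `{-1, -1, -1, -1}` and `{100, 101, -1, -1}` take the BIO path, while `{100, -1, 102, 103}` falls back to buffers. The sketch ignores pages that are already partially uptodate, which the message says also fall back.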
This means that with blocksize == PAGE_CACHE_SIZE, 100% of pages are
handed direct to BIO. With a heavy 10-minute dbench run on 4k
PAGE_CACHE_SIZE and 1k blocks, 95% of pages were handed direct to BIO.
Almost all of the other 5% were passed to block_read_full_page()
because they were already partially uptodate from an earlier sub-page
write(). This ratio will fall if PAGE_CACHE_SIZE/blocksize is greater
than four. But if that's the case, CPU efficiency is far from the main
concern - there are significant seek and bandwidth problems just at 4
blocks per page.
This code will stress out the block layer somewhat - RAID0 doesn't like
multipage BIOs, and there are probably others. RAID0 seems to struggle
along - readahead fails but read falls back to single-page reads, which
succeed. Such problems may be worked around by setting MPAGE_BIO_MAX_SIZE
to PAGE_CACHE_SIZE in fs/mpage.c.
It is trivial to enable multipage reads for many other filesystems. We
can do that after completion of external testing of ext2.
|