| Age | Commit message (Collapse) | Author |
|
This is a rewrite of the ramdisk block device driver.
The old one is really difficult because it effectively implements a block
device which serves data out of its own buffer cache. It relies on the dirty
bit being set, to pin its backing store in cache, however there are non
trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
which had recently lead to data corruption. And in general it is completely
wrong for a block device driver to do this.
The new one is more like a regular block device driver. It has no idea about
vm/vfs stuff. It's backing store is similar to the buffer cache (a simple
radix-tree of pages), but it doesn't know anything about page cache (the pages
in the radix tree are not pagecache pages).
There is one slight downside -- direct block device access and filesystem
metadata access goes through an extra copy and gets stored in RAM twice.
However, this downside is only slight, because the real buffercache of the
device is now reclaimable (because we're not playing crazy games with it), so
under memory intensive situations, footprint should effectively be the same --
maybe even a slight advantage to the new driver because it can also reclaim
buffer heads.
The fact that it now goes through all the regular vm/fs paths makes it
much more useful for testing, too.
text data bss dec hex filename
2837 849 384 4070 fe6 drivers/block/rd.o
3528 371 12 3911 f47 drivers/block/brd.o
Text is larger, but data and bss are smaller, making total size smaller.
A few other nice things about it:
- Similar structure and layout to the new loop device handlinag.
- Dynamic ramdisk creation.
- Runtime flexible buffer head size (because it is no longer part of the
ramdisk code).
- Boot / load time flexible ramdisk size, which could easily be extended
to a per-ramdisk runtime changeable size (eg. with an ioctl).
- Can use highmem for the backing store.
[akpm@linux-foundation.org: fix build]
[byron.bbradley@gmail.com: make rd_size non-static]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Byron Bradley <byron.bbradley@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping them
dirty all the time.
It turns out that there is a case, where the VM makes a ramdisk page clean,
without telling the ramdisk driver. On memory pressure shrink_zone runs
and it starts to run shrink_active_list. There is a check for
buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no releasepage
callback, try_to_free_buffers is called. try_to_free_buffers has now a
special logic for some file systems to make a dirty page clean, if all
buffers are clean. Thats what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
|
|
Since the "ramdisk" kernel parameter has been officially deprecated
since at least 2.6.18, might as well finally get rid of it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
provide BDI constructor/destructor hooks
[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Some of the code has been gradually transitioned to using the proper
struct request_queue, but there's lots left. So do a full sweet of
the kernel and get rid of this typedef and replace its uses with
the proper type.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't
been used in 6 years (so akpm says).
find * -name \*.[ch] | xargs grep -l invalidate_bdev |
while read file; do
quilt add $file;
sed -ie 's/invalidate_bdev(\([^,]*\),[^)]*)/invalidate_bdev(\1)/g' $file;
done
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If RAM disk driver initialization fails due to blk_alloc_queue() faulure, the
gendisk structs stored in rd_disks[] will not be freed completely.
This patch resolves that memory leak case by doing alloc_disk() and
blk_alloc_queue() at the same time.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Make the ramdisk blocksize configurable at kernel compilation time rather
than only at boot or module load time, like a couple of the other ramdisk
options. I found this handy awhile back but thought little of it, until
recently asked by a few of the testing folks here to be able to do the same
thing for their automated test setups.
The Kconfig comment is largely lifted from comments in rd.c, and hopefully
this will increase the chances of making folks aware that the default value
often isn't a great choice here (for increasing values of PAGE_SIZE, even
moreso).
Signed-off-by: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/devfs-2.6: (22 commits)
[PATCH] devfs: Remove it from the feature_removal.txt file
[PATCH] devfs: Last little devfs cleanups throughout the kernel tree.
[PATCH] devfs: Rename TTY_DRIVER_NO_DEVFS to TTY_DRIVER_DYNAMIC_DEV
[PATCH] devfs: Remove the tty_driver devfs_name field as it's no longer needed
[PATCH] devfs: Remove the line_driver devfs_name field as it's no longer needed
[PATCH] devfs: Remove the videodevice devfs_name field as it's no longer needed
[PATCH] devfs: Remove the gendisk devfs_name field as it's no longer needed
[PATCH] devfs: Remove the miscdevice devfs_name field as it's no longer needed
[PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree
[PATCH] devfs: Remove devfs_remove() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_bdev() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_symlink() function from the kernel tree
[PATCH] devfs: Remove devfs_mk_dir() function from the kernel tree
[PATCH] devfs: Remove devfs_*_tape() functions from the kernel tree
[PATCH] devfs: Remove devfs support from the sound subsystem
[PATCH] devfs: Remove devfs support from the ide subsystem.
[PATCH] devfs: Remove devfs support from the serial subsystem
[PATCH] devfs: Remove devfs from the init code
[PATCH] devfs: Remove devfs from the partition code
...
|
|
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
And remove the now unneeded number field.
Also fixes all drivers that set these fields.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Also fixes up all files that #include it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Removes the devfs_remove() function and all callers of it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Removes the devfs_mk_dir() function and all callers of it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
We need set_page_dirty() to return true if it actually transitioned the page
from a clean to dirty state. This wasn't right in a couple of places. Do a
kernel-wide audit, fix things up.
This leaves open the possibility of returning a negative errno from
set_page_dirty() sometime in the future. But we don't do that at present.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.
WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I somehow missed that there is external usage of rd_size on some
architectures.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes some needlessly global identifiers static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Arjan van de Ven <arjanv@infradead.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch replaces backing_dev_info::memory_backed with capabilitied
bitmap. The capabilities available include:
(*) BDI_CAP_NO_ACCT_DIRTY
Set if the pages associated with this backing device should not be
tracked by the dirty page accounting.
(*) BDI_CAP_NO_WRITEBACK
Set if dirty pages associated with this backing device should not have
writepage() or writepages() invoked upon them to clean them.
(*) Capability markers that indicate what a backing device is capable of
with regard to memory mapping facilities. These flags indicate whether a
device can be mapped directly, whether it can be copied for a mapping,
and whether direct mappings can be read, written and/or executed. This
information is primarily aimed at improving no-MMU private mapping
support.
The patch also provides convenience functions for determining the dirty-page
capabilities available on backing devices directly or on the backing devices
associated with a mapping. These are provided to keep line length down when
checking for the capabilities.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is a megarollup of ~60 patches which give various things static scope.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Convert Ramdisk block device to module_param to get rid of warning. Add
module alias to cause correct autoloading.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The ramdisk_blocksize option has been broken for quite a while in 2.6.
Making an initrd with a 4K ext2 filesystem impossible to use.
After digging into this, the problem turned out to that rd.c was not
setting the hard sector size. There were a few secondary problems like
i_blkbits was not being set, and the number KiB in uncompressed ext2 images
was not taking into account the block size.
I have also corrected the surrounding comments as they were not just
incorrect but misleading.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I waffled over this for ages. On balance, I think it's best to mark those
bh's as uptodate.
And on reflection, I'm not sure why we go bringing ramdisk blockdev pages
uptodate all over the place anyway. But ramdisk is weird and it passes
testing. Let those dogs sleep.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
From: a.othieno@bluewin.ch (Arthur Othieno)
From: Vinay K Nallamothu <vinay-rc@naturesoft.net>
Remove various duplicated #includes
From: Vinay K Nallamothu <vinay-rc@naturesoft.net>
Use mod_timer in drivers_block_floppy98.c
From: carbonated beverage <ramune@net-ronin.org>
doc update for bk usage
bk://... appears to be dead, use http://... instead.
|
|
This re-applies the separation of the ramdisk blockdev inode
backing store and the files that live in the ramdisk.
Cset exclude: torvalds@ppc970.osdl.org|ChangeSet|20040521210025|21437
|
|
The code was incomplete. Let's re-do it when the other pieces
are in place.
Cset exclude: akpm@osdl.org[torvalds]|ChangeSet|20040521202003|24125
|
|
Give appropriate and separate backing_dev_info's to both the ramdisk blockdev
inode and to the files which live atop the ramdisk.
Everything works now.
|
|
Implement an empty ->writepages() so that attempts to write back ramdisk pages
have less work to do.
|
|
When a filesystem does getblk() to get a buffer_head against the ramdisk the
VFS will allocate a new not-uptodate pagecache page and will attach buffers to
it.
The filesystem will then bring certain buffer_heads uptodate. But not the
whole page.
Later, various ramdisk a_ops see the not-uptodate page and wipe the whole
thing out, including the parts to which the filesystem wrote!
Fix that up by only zapping those parts of the page which are covered by
non-uptodate buffers.
|
|
We don't actualy need to kmap the blockdev inode's pages at all, because
they're ~__GFP_HIGHMEM. But it's future-safe, and cheap.
|
|
There's a race: one CPU writes a 1k block into a ramdisk page which isn't in
the blockdev pagecache yet. It memsets the locked page to zeroes.
While this is happening, another CPU comes in and tries to write a different
1k block to the "disk". But it doesn't lock the page so it races with the
memset and can have its data scribbled over.
Fix this up by locking the page even if it already existed in pagecache.
Locking a pagecache page in a make_request_fn sounds deadlocky but it is not,
because:
a) ramdisk_writepage() does nothing but a set_bit(), and cannot recur onto
the same page.
b) Any higher-level code which holds a page lock is supposed to be
allocating its memory with GFP_NOFS, and in 2.6 kernels that's equivalent
to GFP_NOIO.
(The distinction between GFP_NOIO and GFP_NOFS basically disappeared
with the buffer_head LRU, although it was reused for writes to swap).
|
|
Allocating pagecache pages within the disk request_fn is deadlocky and prone
to page allocation failures, causing write I/O errors.
Attempt to improve things by fiddling with gfp masks.
|
|
- Remove the ramdisk special-case in fs-writeback.c - it will soon be
unneeded.
- Fix rd_ioctl() to return -ENOTTY on invalid ioctl types, not -EINVAL.
- Make ramdisk Kconfig friendlier.
|
|
From: Jens Axboe <axboe@suse.de>,
Chris Mason,
me, others.
The global unplug list causes horrid spinlock contention on many-disk
many-CPU setups - throughput is worse than halved.
The other problem with the global unplugging is of course that it will cause
the unplugging of queues which are unrelated to the I/O upon which the caller
is about to wait.
So what we do to solve these problems is to remove the global unplug and set
up the infrastructure under which the VFS can tell the block layer to unplug
only those queues which are relevant to the page or buffer_head whcih is
about to be waited upon.
We do this via the very appropriate address_space->backing_dev_info structure.
Most of the complexity is in devicemapper, MD and swapper_space, because for
these backing devices, multiple queues may need to be unplugged to complete a
page/buffer I/O. In each case we ensure that data structures are in place to
permit us to identify all the lower-level queues which contribute to the
higher-level backing_dev_info. Each contributing queue is told to unplug in
response to a higher-level unplug.
To simplify things in various places we also introduce the concept of a
"synchronous BIO": it is tagged with BIO_RW_SYNC. The block layer will
perform an immediate unplug when it sees one of these go past.
|
|
Free the request queues on the rd_init() error recovery path.
|
|
Noticed by me, fixed by Jens.
|
|
Fairly pointless coding-style cleanups which I've been sitting on for ages.
The ramdisk driver is still buggy: it drops pagecache when unmounted. I
still need to fix this.
Apparently it also displays data corruption under load even when not
unmounted.
|
|
The recent change to /proc/partitions which prevents it from displaying
removeable media accidentally caused 128 NBD and 16 ramdisk partitions to
appear in /proc/partitions instead.
So add a specific gendisk flag which says "don't show me in /proc/partitions"
and use that.
|
|
These no longer do anything.
This patch changes modules API. It was acked by Arjan@RH and Hubert@Suse.
|
|
new helper - iminor(inode); defined as minor(inode->i_rdev); lots and
lots of places in drivers had been switched to it.
|
|
struct block_device made the private part of bdevfs inodes; bd_count
is gone, we use ->i_count of inode now; separate hash is also gone and we
are using iget5_locked()/igrab()/iput() instead.
|
|
To be able to properly be able to keep references to block queues,
we make blk_init_queue() return the queue that it initialized, and
let it be independently allocated and then cleaned up on the last
reference.
I have grepped high and low, and there really shouldn't be any broken
uses of blk_init_queue() in the kernel drivers left. The added bonus
being blk_init_queue() error checking is explicit now, most of the
drivers were broken in this regard (even IDE/SCSI).
No drivers have embedded request queue structures. Drivers that don't
use blk_init_queue() but blk_queue_make_request(), should allocate the
queue with blk_alloc_queue(gfp_mask). I've converted all of them to do
that, too. They can call blk_cleanup_queue() now too, using the define
blk_put_queue() is probably cleaner though.
|