Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
- ->releasepage() annotated (s/int/gfp_t), instances updated
- missing gfp_t in fs/* added
- fixed misannotation from the original sweep caught by bitwise checks:
XFS used __nocast both for gfp_t and for flags used by XFS allocator.
The latter left with unsigned int __nocast; we might want to add a
different type for those but for now let's leave them alone. That,
BTW, is a case when __nocast use had been actively confusing - it had
been used in the same code for two different and similar types, with
no way to catch misuses. Switch of gfp_t to bitwise had caught that
immediately...
One tricky bit is left alone to be dealt with later - mapping->flags is
a mix of gfp_t and error indications. Left alone for now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
"extern inline" doesn't make much sense.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
Jens:
->bi_set is totally unnecessary bloat of struct bio. Just define a proper
destructor for the bio and it already knows what bio_set it belongs to.
Peter:
Fixed the bugs.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
Add blk_rq_map_kern which takes a kernel buffer and maps it into
a request and bio. This can be used by the dm hw_handlers, old
sg_scsi_ioctl, and one day scsi special requests so all requests
coming into scsi will have bios. All requests having bios
should allow scsi to use scatter lists for all IO and allow it
to use block layer functions.
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
This makes it hard(er) to mix argument orders by mistake for things like
kmalloc() and friends, since silent integer promotion is now caught by
sparse.
|
|
I've had this patch reviewed by Jens, and incorporated his recommended
fixes.
The patch adds new interfaces to bio.c that support the creation of local
bio and bvec pools. This is important for layered drivers that need to
allocate new bio and bvec structures in response to bio's submitted to it
from higher up. The layered drivers can allocate local pools of bio
structures to preclude deadlock under global bio pool exhaustion.
The device mapper source files have been modified to remove duplicate bio
code, and to use the new interfaces to create local bio pools.
From: Dave Olien <dmo@osdl.org>
Change bio_clone() to use the global bio_set pool instead of the bio_set pool
associated with the bio argument. This is because raid5 and raid6 bio's are
not allocated from a bio_set and have no bio_set associated with them. This
patch along with the patch Linus just accepted allows raid5 and raid6 to
function.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch is from the Xen crew - it allows overriding the decision of
whether two given pages can be considered physically contiguous. This is
similar to how we handle iommu and virtual merging.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
IDE disk barrier core.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
blk_rq_map_user() is a bit of a hack currently, since it drops back to
kmalloc() if bio_map_user() fails. This is unfortunate since it means we
do no real segment or size checking (and the request segment counts contain
crap, already found one bug in a scsi lld). It's also pretty nasty for >
PAGE_SIZE requests, as we attempt to do higher order page allocations.
Even worse still, ide-cd will drop back to PIO for non-sg/bio requests.
All in all, very suboptimal.
This patch adds bio_copy_user() which simply sets up a bio with kernel
pages and copies data as needed for reads and writes. It also changes
bio_map_user() to return an error pointer like bio_copy_user(), so we can
return something sane to the user instead of always -ENOMEM.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch is from James, I've changed it slightly only.
The problem is that some IOMMU implementations have a maximum limit to the
size of the number of contiguously mappable pages (admittedly, this limit
is mostly in the resource management algorithms rather than the IOMMUs
themselves).
This patch adds this concept to the bio layer via the parameter
BIO_VMERGE_MAX_SIZE
which architectures can define in asm/io.h (if undefined, we assume it to
be infinite, which is current behaviour).
While adding this, I noticed several places where bio was making incorrect
assumptions about virtual mergeability (none of which was a bug: bio was
overestimating rather than underestimating).
- The worst offender was bio_add_page(), which seemed never to check for
virtual mergeability
- I also fixed blk_hw_contig_segments() not to check the QUEUE_CLUSTER
flag, and not to check the phys segment boundary.
In order to track the hw segment size across bios, I had to introduce two
extra bio parameters: bi_hw_front_size and bi_hw_back_size which store the
sizes of the front and back hw contiguous segments (and which will be equal
if there's only one hw segment). When the bio is merged into a request,
these fields are updated with the total hw contig size so they can always
be used to assess if the merger would violate the BIO_VMERGE_MAX_SIZE
parameter.
Signed-Off-By: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Jens Axboe <axboe@suse.de>,
Chris Mason,
me, others.
The global unplug list causes horrid spinlock contention on many-disk
many-CPU setups - throughput is worse than halved.
The other problem with the global unplugging is of course that it will cause
the unplugging of queues which are unrelated to the I/O upon which the caller
is about to wait.
So what we do to solve these problems is to remove the global unplug and set
up the infrastructure under which the VFS can tell the block layer to unplug
only those queues which are relevant to the page or buffer_head which is
about to be waited upon.
We do this via the very appropriate address_space->backing_dev_info structure.
Most of the complexity is in devicemapper, MD and swapper_space, because for
these backing devices, multiple queues may need to be unplugged to complete a
page/buffer I/O. In each case we ensure that data structures are in place to
permit us to identify all the lower-level queues which contribute to the
higher-level backing_dev_info. Each contributing queue is told to unplug in
response to a higher-level unplug.
To simplify things in various places we also introduce the concept of a
"synchronous BIO": it is tagged with BIO_RW_SYNC. The block layer will
perform an immediate unplug when it sees one of these go past.
|
|
into home.osdl.org:/home/torvalds/v2.5/linux
|
|
include/linux/bio.h:234: sorry, unimplemented: inlining failed in call to 'bio_phys_segments': function body not available
|
|
This patch against a recent bk 2.6 changes scsi_cmd_ioctl to take a
gendisk as an argument instead of a request_queue_t. This allows scsi char
devices to use the scsi_cmd_ioctl interface.
In turn, change bio_map_user to also pass a request_queue_t, and add a
__bio_add_page helper that takes a request_queue_t.
Tested ide cd burning with no problems.
If the scsi upper level scsi_cmd_ioctl usage were consolidated in
scsi_prep_fn, we could pass a request_queue_t instead of a gendisk to
scsi_cmd_ioctl.
|
|
From: Adrian Bunk <bunk@fs.tum.de>
Four months ago, Rolf Eike Beer <eike-kernel@sf-tec.de> sent a patch
against 2.6.0-test5-bk1 that converted several if ... BUG() calls to BUG_ON().
This might in some cases result in slightly faster code because BUG_ON()
uses unlikely().
|
|
From: Mark Haverkamp <markh@osdl.org>
This fixes a problem similar to the patch I submitted on 11/20
http://marc.theaimsgroup.com/?l=linux-kernel&m=106936439707962&w=2
In this case, though, the result is an:
"Incorrect number of segments after building list" message.
The macro __BVEC_START assumes a bi_idx of zero when the dm code can
submit a bio with a non-zero bi_idx.
The code has been tested on an 8 way / 8gb OSDL STP machine with a 197G
lvm volume running dbt2 test.
|
|
From: Christoph Hellwig <hch@lst.de>
now that kdev_t is gone, very few places still need this; the only header
of those being fs.h.
|
|
Here's the patch to enable failfast flag in the bio submission code, and
use it for multipath and readahead.
|
|
From: Hugh Dickins <hugh@veritas.com>
bio_copy is used only by the loop driver, which already has to walk the bio
segments itself: so it makes sense to change it from bio.c export to loop.c
static, as prelude to working upon it there.
bio_copy itself is unchanged by this patch, with one exception. On oom
failure it must use bio_put, instead of mempool_free to static bio_pool:
which it should have been doing all along - it was leaking the veclist.
(Grudgingly acked by Jens)
|
|
|
|
So here it is, easy split support for md and dm. Neil, the changes over
your version are merely:
- Make a global bio split pool instead of requiring device setup of one.
Will waste 8 * sizeof(struct bio_pair) of RAM, but... For 2.6 at least
it has to be a core functionality.
- Various style changes to follow the kernel guide lines.
|
|
Add bio traversal functionality. This is a prereq for doing ide
multiwrites safely and sanely. Patch was originally done by Suparna,
Bartlomiej picked it up and changed the design somewhat. From Bart:
Main idea is now reversed - instead of introducing rq->hard_bio as
pointer for bio to be completed and using rq->bio as pointer for bio
to be submitted, rq->cbio is introduced for submissions and rq->bio
is used for completions.
This minimizes changes to block layer and assures that all existing
block users are not affected by this patch.
|
|
RAID5 is calling copy_data() under sh->lock. But copy_data() does kmap(),
which can sleep.
The best fix is to use kmap_atomic() in there. It is faster than kmap() and
does not block.
The patch removes the unused bio_kmap() and replaces __bio_kmap() with
__bio_kmap_atomic(). I think it's best to withdraw the sleeping-and-slow
bio_kmap() from the kernel API before someone else tries to use it.
Also, I notice that bio_kmap_irq() was using local_save_flags(). This is a
bug - local_save_flags() does not disable interrupts. Converted that to
local_irq_save(). These names are terribly chosen.
This patch was acked by Jens and Neil.
|
|
|
|
In two cases (AIO-for-direct-IO and some CDROM DMA stuff which Jens
did), we need to run set_page_dirty() in interrupt context. After DMA
hardware has altered userspace pages for direct-IO reads.
But mapping->page_lock, mapping->private_lock and inode_lock are not
irq-safe. And really, we don't want to convert those locks just for this
problem.
So what we do is to dirty the pages *before* starting I/O. Then, in
interrupt completion context, check to see that they are still dirty.
If so then there is nothing to do. But if the pages were cleaned while
the direct-IO is in progress we need to redirty them. The code uses
schedule_work() for that.
Of course, we could use schedule_work() for all BIOs and pages. The
speculative dirty-before-starting-IO is just an optimisation. It's
quite unlikely that any of the pages will be cleaned during the direct
IO.
This code is basically untestable under normal circumstances, because the
relevant pages are pinned via get_user_pages(). This makes
is_page_cache_freeable() evaluate false and the VM doesn't try to write them
out anyway. But if the pages happen to be MAP_SHARED file pages, pdflush
could clean them. The way I tested this code was to disable the call to
bio_set_pages_dirty() in fs/direct-io.c.
|
|
This adds bio_map_user and bio_unmap_user to aid drivers in mapping user
space memory into a bio suitable for block io.
|
|
Sometimes we don't even need a bio->bi_end_io, so make it optional. This
also encourages users to _use_ bio_endio()! I like that, since it means
they don't have to remember to decrement bi_size themselves.
Also clear bi_private in bio_init(), and switch to subsys_initcall().
|
|
o Split blk_queue_bounce() into a slow and fast path. The fast path is
inlined, only if we actually need to check the bio for possible
bounces (and bounce) do we enter __blk_queue_bounce() slow path.
o Fix a nasty bug that could cause corruption for file systems not
using PAGE_CACHE_SIZE block size! We were not setting the
'to' bv_offset correctly.
o Add BIO_BOUNCE flag. Later patches will use this for debug checking.
|
|
This changes the way we do pool lookups when freeing a bio. Right now
we use bi_max as a handle into bvec_array[], to find the pool where it
came from. This used to be just fine, because if you had a private bio,
you could specify your own destructor. But now we have bio_add_page()
which also needs to know where the bio came from, or more precisely, it
needs to know how many entries the bio can hold.
So I've changed bi_max to bi_max_vecs, it now contains the number of vec
entries in the bio. Privately allocated bio's (or on stack) can now just
set bio->bi_max_vecs to reflect the max size. The pool index for the
default destructor is stored in the top bits of bi_flags.
|
|
Add bio_get_nr_vecs(). It returns an approximate number of pages that
can be added to a block device. It's just a ballpark number, but I think
this is quite fine for the type of thing it is needed for: mpage etc
need to know an approx size of a bio that they need to allocate. It
would be silly to continuously allocate 64-page sized bio_vec entries, if
the target cannot do more than 8, for example.
|
|
Make bio->bi_end_io() take bytes_done and actual error as argument. This
enables partial completion of bio's, which is important for latency
reasons (bio can be huge, for slow media we want page-by-page
completions).
I think I got most of the bi_end_io() functions out there, but I might
have missed a few. For the record, if you don't care about partial
completions and just want to be notified when the entire bio completes,
add a
if (bio->bi_size)
return 1;
to the top of your bi_end_io(). It should return 0 on completion.
bio_endio() will decrement bio->bi_size appropriately, it's recommended
for people to go through that. Otherwise they will have to control
BIO_UPTODATE and bi_size decrement themselves, there's really no reason
to do that. I've deliberately avoided doing any functional changes to
any of the end_io functions, as I think that would only make the patch
more complex. It's simple right now, but this being i/o paths I prefer
(as usual) to be careful and take small steps. The mpage_end_io_read()
do-vecs-at-the-time change can come right after this, for instance.
|
|
This is bio_add_page(), 100% identical to the version I sent out for
comments earlier this week. With the previous queue restriction patch,
this guarantees that we can always add a page worth of data to the bio.
bio_add_page() returns 0 on success, and 1 on failure. Either the page
is added completely, or the attempt is aborted.
bio_add_page() uses the normal queue restrictions to determine whether
we can add the page or not. If a queue has further restrictions, it can
define a q->merge_bvec_fn() to further impose limits.
Patch also includes changes to ll_rw_kio(), if for nothing else to
demonstrate how to use this piece of infrastructure.
|
|
Clean up the bio_kmap_irq() thing properly. Remove the micro-optimization of _not_ calling kmap_atomic() if this isn't a highmem page. We could keep that and do the inc_preempt_count() ourselves, but I'm not sure it's worth it and this is cleaner.
|
|
Make people use the proper cli/sti replacements
|
|
* since the last caller of is_read_only() is gone, the function
itself is removed.
* destroy_buffers() is not used anymore; gone.
* fsync_dev() is gone; the only user is (broken) lvm.c and first
step in fixing lvm.c will consist of propagating struct block_device *
anyway; at that point we'll just use fsync_bdev() in there.
* prototype of bio_ioctl() removed - function doesn't exist
anymore.
|
|
highmem.h includes bio.h, so just about every compilation unit in the
kernel gets to process bio.h.
The patch moves the BIO-related functions out of highmem.h and into
bio-related headers. The nested include is removed and all files which
need to include bio.h now do so.
|
|
This removes <linux/mm.h> from <linux/vmalloc.h>.
This then goes and fixes all of the files (x86 and PPC) which relied on
implicit includes which don't happen anymore. This also takes
<linux/kdev_t.h> out of fs/mpage.c and puts it into include/linux/bio.h
where it belongs since <linux/bio.h> references 'kdev_t' directly.
A quick summary of the of the added includes:
arch/i386/kernel/microcode.c: needs extern for num_physpages, in linux/mm.h
include/linux/spinlock.h: local_irq* is defined in <asm/system.h> but
this was never directly included.
|
|
- *NOW* all places that (re)assign ->bi_dev have relevant struct
block_device *. ->bi_bdev (struct block_device * equivalent of
->bi_dev) introduced, ->bi_dev removed, users updated.
|
|
from when completion was potentially called more than once to indicate
partial end I/O. These days bio->bi_end_io is _only_ called when I/O
has completed on the entire bio.
|
|
- Christoph Hellwig: scsi_register_module cleanup
- Mikael Pettersson: apic.c LVTERR fixes
- Russell King: ARM update (including bio update for icside)
- Jens Axboe: more bio updates
- Al Viro: make ready to switch bread away from kdev_t..
- Davide Libenzi: scheduler cleanups
- Anders Gustafsson: LVM fixes for bio
- Richard Gooch: devfs update
|
|
- Al Viro: floppy_eject cleanup, mount cleanups
- Jens Axboe: bio updates
- Ingo Molnar: mempool fixes
- GOTO Masanori: Fix O_DIRECT error handling
|
|
- Jens Axboe: more bio stuff
- Ingo Molnar: mempool for bio
- Niibe Yutaka: Super-H update
|
|
- Jeff Garzik: separate out handling of older tulip chips
- Jens Axboe: more bio stuff
- Anton Altaparmakov: NTFS 1.1.21 update
|
|
- Greg KH: USB updates
- Jens Axboe: more bio updates
- Christoph Rohland: fix up proper shmat semantics
|