|
It is equivalent to conf->raid_disks - conf->mddev->degraded.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Both R1BIO_Barrier and R1BIO_Returned are 4 !!!!
This means that barrier requests don't get returned (i.e. bi_end_io called)
because it looks like they already have been.
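For reference, these are plain bit numbers used with set_bit()/test_bit() on r1_bio->state, so two names sharing a value means one state shadows the other. A minimal illustration of the collision and the obvious fix (the exact renumbering in raid1.h may differ):

	/* before: two distinct states share bit 4 */
	#define R1BIO_Barrier	4
	#define R1BIO_Returned	4	/* BUG: collides with R1BIO_Barrier */

	/* after: give each state its own bit (the value here is illustrative) */
	#define R1BIO_Barrier	4
	#define R1BIO_Returned	6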
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
On a read error we suspend the array, then synchronously read the block from
other devices in the array until we find one where we can read it. Then we try
writing the good data back everywhere and make sure it sticks. Only if any write
or subsequent read fails do we fail the device out of the array.
To be able to suspend the array, we need to also keep track of how many
requests are queued for handling by raid1d.
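The repair path amounts to roughly the loop below; a hedged sketch, assuming the era's conf_t/mdk_rdev_t types and a sync_page_io() helper, not the literal patch.

static void repair_read_error(conf_t *conf, sector_t sect, struct page *page)
{
	int good = -1, d;

	/* 1: find one mirror that can still read the block */
	for (d = 0; d < conf->raid_disks; d++) {
		mdk_rdev_t *rdev = conf->mirrors[d].rdev;
		if (rdev && sync_page_io(rdev->bdev, sect, PAGE_SIZE, page, READ)) {
			good = d;
			break;
		}
	}
	if (good < 0)
		return;					/* nothing readable: give up */

	/* 2: rewrite and re-read everywhere else; fail only what won't take it */
	for (d = 0; d < conf->raid_disks; d++) {
		mdk_rdev_t *rdev = conf->mirrors[d].rdev;
		if (d == good || !rdev)
			continue;
		if (!sync_page_io(rdev->bdev, sect, PAGE_SIZE, page, WRITE) ||
		    !sync_page_io(rdev->bdev, sect, PAGE_SIZE, page, READ))
			md_error(conf->mddev, rdev);
	}
}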
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
raid1 needs to put up a barrier to new requests while it does resync or other
background recovery. The code for this is currently open-coded, slightly
obscured by its use of two waitqueues, and not documented.
This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.
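In rough terms the four functions pair up as below (a sketch of the intended usage; the real helpers also track pending/waiting counts and use the resync wait queues mentioned above):

	/* resync/recovery side: exclude normal I/O for the window being synced */
	raise_barrier(conf);
	/* ... submit resync requests ... */
	lower_barrier(conf);

	/* make_request() side: wait while a barrier is up, then count ourselves in */
	wait_barrier(conf);
	/* ... submit the normal read or write ... */
	allow_barrier(conf);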
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....
So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.
We initially assume barriers are OK.
When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers. This will usually clear the flags fairly quickly.
If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag (a sketch of
this fallback follows the numbered list below).
When writing to the raid1 itself, write requests which were BIO_RW_BARRIER but
which aren't supported need to be retried. So raid1d is enhanced to do this,
and when any bio write completes (i.e. no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.
We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
1/ the device used to support BARRIER, but now doesn't. Few devices
change like this, though raid1 can!
or
2/ the array has no persistent superblock, so there was no opportunity to
pre-test for barriers when writing the superblock.
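As a hedged sketch of the superblock-write fallback (the flag and the submission helper are illustrative stand-ins, not the real md.c code):

static int write_sb_with_fallback(mddev_t *mddev, mdk_rdev_t *rdev)
{
	int rw = WRITE;
	int err;

	if (mddev->barriers_work)			/* illustrative "array handles barriers" flag */
		rw |= (1 << BIO_RW_BARRIER);

	err = submit_sb_page_sync(rdev, rw);		/* hypothetical synchronous submit helper */
	if (err == -ENOTSUPP && (rw & (1 << BIO_RW_BARRIER))) {
		mddev->barriers_work = 0;		/* remember: no barriers from now on */
		err = submit_sb_page_sync(rdev, WRITE);	/* retry without the barrier bit */
	}
	return err;
}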
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If a device is flagged 'WriteMostly' and the array has a bitmap, and the
bitmap superblock indicates that write_behind is allowed, then write_behind is
enabled for WriteMostly devices.
Write requests will be acknowledged as complete to the caller (via bi_end_io)
when all non-WriteMostly devices have completed the write, but will not be
cleared from the bitmap until all devices complete.
This requires memory allocation to make a local copy of the data being
written. If there is insufficient memory, then we fall back to normal write
semantics.
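A hedged sketch of the local-copy step and its fallback (the helper and the exact bio-walking details are illustrative):

static struct page **alloc_behind_copy(struct bio *bio)
{
	int i, n = bio->bi_vcnt;
	struct page **pages = kzalloc(n * sizeof(struct page *), GFP_NOIO);

	if (!pages)
		return NULL;				/* caller falls back to normal writes */

	for (i = 0; i < n; i++) {
		struct bio_vec *bv = &bio->bi_io_vec[i];

		pages[i] = alloc_page(GFP_NOIO);
		if (!pages[i])
			goto fail;
		/* copy the caller's data so the original bio can complete early */
		memcpy(kmap(pages[i]) + bv->bv_offset,
		       kmap(bv->bv_page) + bv->bv_offset,
		       bv->bv_len);
		kunmap(bv->bv_page);
		kunmap(pages[i]);
	}
	return pages;
fail:
	while (i--)
		__free_page(pages[i]);
	kfree(pages);
	return NULL;					/* insufficient memory: normal semantics */
}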
Signed-Off-By: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Both raid1 and multipath have a "retry_list" which is global, so all raid1
arrays (for example) use the same list. This is rather ugly, and it is simple
enough to make it per-array, so this patch does that.
It also changes the multipath code to use list.h lists instead of
roll-your-own.
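A hedged sketch of what "per-array" means in practice: the list head and lock live in the array's own conf rather than at file scope. Names mirror the raid1 code of the time but are illustrative here.

static void reschedule_retry_sketch(conf_t *conf, r1bio_t *r1_bio)
{
	unsigned long flags;

	spin_lock_irqsave(&conf->device_lock, flags);
	list_add(&r1_bio->retry_list, &conf->retry_list);	/* per-array, not global */
	spin_unlock_irqrestore(&conf->device_lock, flags);

	md_wakeup_thread(conf->mddev->thread);			/* let this array's raid1d process it */
}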
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This allows the number of "raid_disks" in a raid1 to be changed.
This requires allocating a new pool of "r1bio" structures with a different
number of bios, suspending IO, and swapping the new pool in place of the old.
(and a few other related changes).
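A rough outline of the swap, assuming the usual mempool API and raid1's existing r1bio pool constructors (error handling and the exact quiesce mechanism are elided):

	mempool_t *newpool, *oldpool;

	/* newpoolinfo: pool_info describing the new raid_disks count (illustrative) */
	newpool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc,
				 r1bio_pool_free, newpoolinfo);
	if (!newpool)
		return -ENOMEM;

	/* suspend I/O, swap the pool and the disk count, then resume */
	oldpool = conf->r1bio_pool;
	conf->r1bio_pool = newpool;
	conf->raid_disks = raid_disks;

	mempool_destroy(oldpool);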
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: NeilBrown <neilb@cse.unsw.edu.au>
For each resync request, we allocate a "r1_bio" which has a bio "master_bio"
attached that goes largely unused. We also allocate a read_bio which is
used. This patch removes the read_bio and just uses the master_bio instead.
This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was
being used.
We also introduce a new "sectors" field into the r1_bio as we can no longer
rely on master_bio->bi_sectors.
|
|
From: NeilBrown <neilb@cse.unsw.edu.au>
next_r1 is never used, so it can just go.
read_bio isn't needed as we can easily use one of the pointers in the
write_bios array - write_bios[->read_disk]. So rename "write_bios" to "bios"
and store the pointer to the read bio in there.
|
|
From: NeilBrown <neilb@cse.unsw.edu.au>
The only time it is really needed is to differentiate a retry-on-fail from a
write-after-read-for-resync request to raid1d. So we use a bit in 'state'
for that.
|
|
raid1 uses MD_SB_DISKS to size two data structures,
but the new version-1 superblock allows for more than
this number of disks (and most actual arrays use many
fewer).
This patch sizes the two arrays dynamically.
One becomes a separate kmalloced array.
The other is moved to the end of the containing structure
and appropriate extra space is allocated.
Also, change r1buf_pool_alloc (which allocates buffers for
a mempool for doing re-sync) to not get r1bio structures
from the r1bio pool (which could exhaust the pool) but instead
to allocate them separately.
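For illustration, the "moved to the end of the containing structure" trick looks roughly like this (the struct and helper names are made up for the example):

	struct r1_sketch {
		int		raid_disks;
		/* ... other fields ... */
		struct bio	*bios[0];	/* trailing array, sized at allocation time */
	};

	static struct r1_sketch *alloc_r1_sketch(int raid_disks)
	{
		return kzalloc(sizeof(struct r1_sketch) +
			       raid_disks * sizeof(struct bio *), GFP_NOIO);
	}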
|
|
The mdrecoveryd thread is responsible for initiating and cleaning
up resync threads.
This job can be equally well done by the per-array threads
for those arrays which might need it.
So the mdrecoveryd thread is gone and the core code that
it ran is now run by raid5d, raid1d or multipathd.
We add an MD_RECOVERY_NEEDED flag so those daemons don't have
to bother trying to lock the md array unless it is likely
that something needs to be done.
Also modify the names of all threads to include the number of the
md device.
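The hand-off, sketched in kernel terms (a hedged illustration; the common routine here stands for the core code the per-array threads now run):

	/* personality side: something may need attention */
	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
	md_wakeup_thread(mddev->thread);

	/* common code, run from raid1d/raid5d/multipathd: */
	if (!test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
		return;			/* cheap exit: don't even try to lock the array */
	/* ... otherwise lock the mddev and start/clean up resync threads ... */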
|
|
This allows the thread to be easily identified and signalled.
The point of signalling will appear in the next patch.
|
|
raid1, raid5 and multipath maintain their own
'operational' flag. This is equivalent to
!rdev->faulty
and so isn't needed.
Similarly raid1 and raid5 maintain a "write_only" flag
that is equivalent to
!rdev->in_sync
so it isn't needed either.
As part of implementing this change, we introduce some extra
flag bits in raid5 that are meaningful only inside 'handle_stripe'.
Some of these replace the "action" array which recorded what
actions were required (and would be performed after the stripe
spinlock was released). This has the advantage of reducing our
dependence on MD_SB_DISKS, which personalities shouldn't need
to know about.
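As a hedged illustration of the new style, each per-device flags word in the stripe gains bits that only handle_stripe() interprets; the bit names follow the later raid5 code and are examples, not the patch itself.

	/* inside handle_stripe(), under the stripe lock: */
	set_bit(R5_Wantread,  &sh->dev[i].flags);	/* disk i needs a read issued  */
	set_bit(R5_Wantwrite, &sh->dev[j].flags);	/* disk j needs a write issued */

	/* after the stripe spinlock is dropped: */
	if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
		/* build and submit the read bio for sh->dev[i] */ ;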
|
|
1/ Personalities only know about raid_disks devices.
Some might not be in_sync and so cannot be read from,
but must be written to.
- change MD_SB_DISKS to ->raid_disks
- add tests for .write_only
2/ rdev->raid_disk is now -1 for spares. desc_nr is maintained
by analyse_sbs and sync_sbs.
3/ spare_inactive method is subsumed into hot_remove_disk
spare_writable is subsumed into hot_add_disk.
hot_add_disk decides which slot a new device will hold.
4/ spare_active now finds all non-in_sync devices and marks them
in_sync.
5/ faulty devices are removed by the md recovery thread as soon
as they are idle. Any spares that are available are then added.
|
|
This is equivalent to ->rdev != NULL, so it isn't needed.
|
|
device on an MD array
This will allow us to know, in the event of a device failure, when the
device is completely unused and so can be disconnected from the
array. Currently this isn't a problem as drives aren't normally disconnected
until after a replacement has been rebuilt, which is a LONG TIME, but that
will change shortly...
We always increment the count under a spinlock after checking that
it hasn't been disconnected already (rdev != NULL).
We disconnect under the same spinlock after checking that the
count is zero.
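A hedged sketch of the two sides of that rule (lock and field names are illustrative):

	/* user side: take a reference only while the slot is still connected */
	spin_lock_irq(&conf->device_lock);
	rdev = conf->mirrors[i].rdev;
	if (rdev)
		atomic_inc(&rdev->nr_pending);
	spin_unlock_irq(&conf->device_lock);

	/* ... use rdev, then atomic_dec(&rdev->nr_pending) when finished ... */

	/* removal side: disconnect only once the count has drained to zero */
	spin_lock_irq(&conf->device_lock);
	if (atomic_read(&rdev->nr_pending) == 0)
		conf->mirrors[i].rdev = NULL;
	spin_unlock_irq(&conf->device_lock);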
|
|
Holding the rdev instead of the bdev does cause an extra
de-reference, but it is conceptually cleaner and will allow
lots more tidying up.
|
|
Remove number and raid_disk from personality arrays
These are redundant: number is not needed any more, and
raid_disk never was, since it is simply the index.
|
|
nr_disks is gone from multipath/raid1
Never used.
|
|
* a bunch of callers of partition_name() are calling
bdev_partition_name(),
* the last users of raid1 and multipath ->dev are gone; so are
the fields in question.
|
|
Previously each raid personality (Well, 1 and 5) started their
own thread to do resync, but md.c had a single common thread to do
reconstruction. Apart from being untidy, this means that you cannot
have two arrays reconstructing at the same time, though you can have
two arrays resyncing at the same time.
This patch changes the personalities so they don't start the resync,
but just leave a flag to say that it is needed.
The common thread (mdrecoveryd) now just monitors things and starts a
separate per-array thread whenever resync or recovery (or both) is
needed.
When the recovery finishes, mdrecoveryd will be woken up to re-lock
the device and activate the spares or whatever.
raid1 needs to know when resync/recovery starts and ends so it can
allocate and release resources.
It allocates when a resync request for stripe 0 is received.
Previously it deallocated for resync in its own thread, and
deallocated for recovery when the spare was made active or inactive
(depending on success).
As raid1 no longer owns the resync thread, this needed to change. So to
match the "alloc on 0", md_do_resync now calls sync_request one
last time asking to sync one block past the end. This is a signal to
release any resources.
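A hedged sketch of how a personality can recognise that final call (the names and the sector arithmetic are illustrative):

static int sync_request_sketch(mddev_t *mddev, sector_t sector_nr)
{
	conf_t *conf = mddev->private;
	sector_t max_sector = mddev->size << 1;	/* mddev->size is in KB, so 2 sectors per KB */

	if (sector_nr >= max_sector) {
		/* the "one block past the end" call: free per-resync resources */
		close_sync(conf);		/* hypothetical teardown helper */
		return 0;
	}
	/* ... normal case: build and submit resync requests ... */
	return 0;
}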
|
|
- md/raid1.c - bring struct block_device * into private data.
|
|
- me: revert the "kill(-1..)" change. POSIX isn't that clear on the
issue anyway, and the new behaviour breaks things.
- Jens Axboe: more bio updates
- Al Viro: rd_load cleanups. hpfs mount fix, mount cleanups
- Ingo Molnar: more raid updates
- Jakub Jelinek: fix Linux/x86 confusion about arg passing of "save_v86_state" and "do_signal"
- Trond Myklebust: fix NFS client race conditions
|
|
- Jeff Garzik: no longer support old cards in tulip driver
(see separate driver for old tulip chips)
- Pat Mochel: driverfs/device model documentation
- Ballabio Dario: update eata driver to new IO locking
- Ingo Molnar: raid resync with new bio structures (much more efficient)
and mempool_resize()
- Jens Axboe: bio queue locking
|
|
- Rui Sousa: emu10k1 module fixes, remove joystick part.
- Alan Cox: driver merges
- Andrea Arcangeli: alpha updates
- David Woodhouse: up_and_exit -> complete_and_exit
- David Miller: sparc and network update
- Andrew Morton: update 3c59x driver
- Neil Brown: NFS export VFAT, knfsd cleanups, raid fixes
- Ben Collins: ieee1394 updates
- Paul Mackerras: PPC update
- me: make sure we don't lose position bits in "filldir()"
|
|
|