|
from there.
Add a new field to the md superblock, in an unused area, to record where
resync was up to on a clean shutdown while resync is active. Restart from
this point.
The extra field is verified by having a second copy of the event counter.
If the second event counter is wrong, we ignore the extra field.
This patch thanks to Angus Sawyer <angus.sawyer@dsl.pipex.com>
|
|
Define an interface for interpreting and updating superblocks
so we can more easily define new formats.
With this patch, (almost) all superblock layout information is
located in a small set of routines dedicated to superblock
handling. This will allow us to provide a similar set for
a different format.
The two exceptions are:
1/ autostart_array where the devices listed in the superblock
are searched for.
2/ raid5 'knows' the maximum number of devices for
compute_parity.
These will be addressed in a later patch.
|
|
|
|
From Peter Chubb
Compaq Smart array sector_t cleanup: prepare for possible 64-bit sector_t
Clean up loop device to allow huge backing files.
MD transition to 64-bit sector_t.
- Hold sizes and offsets as sector_t not int;
- use 64-bit arithmetic if necessary to map block-in-raid to zone
and block-in-zone
|
|
partition_name() moved from md.c to partitions/check.c; disk_name() is not
exported anymore; partition_name() takes dev_t instead of kdev_t.
|
|
* we remove partition 0 from ->part[] and put the old
contents of ->part[0] into gendisk itself; indexes are shifted, obviously.
* ->part is allocated at add_gendisk() time and freed at del_gendisk()
according to the value of ->minor_shift; static arrays of hd_struct are gone
from drivers, ditto for manual allocations a-la ide. As a matter of fact,
none of the drivers know about struct hd_struct now.
|
|
raid1, raid5 and multipath maintain their own
'operational' flag. This is equivalent to
!rdev->faulty
and so isn't needed.
Similarly raid1 and raid5 maintain a "write_only" flag
that is equivalent to
!rdev->in_sync
so it isn't needed either.
As part of implementing this change, we introduce some extra
flag bits in raid5 that are meaningful only inside 'handle_stripe'.
Some of these replace the "action" array which recorded what
actions were required (and would be performed after the stripe
spinlock was released). This has the advantage of reducing our
dependence on MD_SB_DISKS which personalities shouldn't need
to know about.
|
|
This flag was used by multipath to make sure only
one superblock was written, as there is only one
real device.
The relevant test is now more explicitly dependent on multipath,
and the flag is gone.
|
|
1/ Personalities only know about raid_disks devices.
Some might not be in_sync and so cannot be read from,
but must be written to.
- change MD_SB_DISKS to ->raid_disks
- add tests for .write_only
2/ rdev->raid_disk is now -1 for spares. desc_nr is maintained
by analyse_sbs and sync_sbs.
3/ spare_inactive method is subsumed into hot_remove_disk
spare_writable is subsumed into hot_add_disk.
hot_add_disk decides which slot a new device will hold.
4/ spare_active now finds all non-in_sync devices and marks them
in_sync.
5/ faulty devices are removed by the md recovery thread as soon
as they are idle. Any spares that are available are then added.
|
|
This is equivalent to ->rdev != NULL, so it isn't needed.
|
|
device on an MD array
This will allow us to know, in the event of a device failure, when the
device is completely unused and so can be disconnected from the
array. Currently this isn't a problem as drives aren't normally disconnected
until after a replacement has been rebuilt, which is a LONG TIME, but that
will change shortly...
We always increment the count under a spinlock after checking that
it hasn't been disconnected already (rdev != NULL).
We disconnect under the same spinlock after checking that the
count is zero.
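The check-then-count scheme above can be sketched in userspace C, with a mutex standing in for the kernel spinlock. The names (nr_pending, rdev_get, rdev_put, rdev_disconnect) are illustrative, not the actual md symbols:

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

struct rdev {
    int nr_pending;             /* in-flight requests on this device */
};

static pthread_mutex_t rdev_lock = PTHREAD_MUTEX_INITIALIZER;
static struct rdev one_dev = { 0 };
static struct rdev *rdev = &one_dev;   /* NULL once disconnected */

/* Take a reference, but only if the device is still attached. */
int rdev_get(void)
{
    int ok = 0;
    pthread_mutex_lock(&rdev_lock);
    if (rdev != NULL) {          /* check under the lock */
        rdev->nr_pending++;
        ok = 1;
    }
    pthread_mutex_unlock(&rdev_lock);
    return ok;
}

/* Drop a reference, under the same lock. */
void rdev_put(void)
{
    pthread_mutex_lock(&rdev_lock);
    one_dev.nr_pending--;
    pthread_mutex_unlock(&rdev_lock);
}

/* Disconnect only succeeds when no references remain. */
int rdev_disconnect(void)
{
    int ok = 0;
    pthread_mutex_lock(&rdev_lock);
    if (rdev != NULL && rdev->nr_pending == 0) {
        rdev = NULL;
        ok = 1;
    }
    pthread_mutex_unlock(&rdev_lock);
    return ok;
}
```

Because both the check and the update happen under the one lock, a disconnect can never race with a reference being taken.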
|
|
This simplifies the error handlers slightly, but allows for even more
simplification later.
|
|
Holding the rdev instead of the bdev does cause an extra
de-reference, but it is conceptually cleaner and will allow
lots more tidying up.
|
|
get_spare recently became static and no-one told md_k.h
|
|
Get rid of dev in rdev and use bdev exclusively.
There is an awkwardness here in that userspace sometimes
passes down a dev_t (e.g. hot_add_disk) and sometimes
a major and a minor (e.g. add_new_disk). Should we convert
both to kdev_t as the uniform standard?
That is what was being done, but it seemed very clumsy and
things were getting converted back and forth a lot.
As bdget uses a dev_t, I felt safe in staying with dev_t once I
had one rather than converting to kdev_t and back.
|
|
Remove the sb from the mddev
Now that all the important information is in mddev, we don't need
to have an sb off the mddev. We only keep the per-device ones.
Previously we determined if "set_array_info" had been run by checking
mddev->sb. Now we check mddev->raid_disks on the assumption that
any valid array MUST have a non-zero number of devices.
|
|
Remove dependence on superblock
All the remaining fields of interest in the superblock
get duplicated in the mddev structure and this is treated as
authoritative. The superblock gets completely generated at
write time, and all useful information extracted at read time.
This means that we can slot in different superblock formats
without affecting the bulk of the code.
|
|
Move persistent from superblock to mddev
Tidy up calc_dev_sboffset and calc_dev_size on the way
|
|
Remove number and raid_disk from personality arrays
These are redundant: number is not needed any more,
and raid_disk never was, as it is simply the index.
|
|
nr_disks is gone from multipath/raid1
Never used.
|
|
Remove old_dev field.
We used to monitor the previous device number of a
component device for superblock maintenance. This is
not needed any more.
|
|
Don't maintain disk status in superblock.
The state is now in rdev so we don't maintain it
in the superblock any more.
We also no longer test the content of the superblock for
disk status.
mddev->spare is now an rdev and not a superblock fragment.
|
|
Add "degraded" field to md device
This is used to determine if a spare should be added
without relying on the superblock.
|
|
Add in_sync flag to each rdev
This currently mirrors the MD_DISK_SYNC superblock flag,
but soon it will be authoritative and the superblock will
only be consulted at start time.
|
|
Add raid_disk field to rdev
Also change find_rdev_nr to find based on position
in array (raid_disk) not position in superblock (number).
|
|
Improve handling of spares in md
- hot_remove_disk is given the raid_disk rather than descriptor number
so that it can find the device in the internal array directly, without searching.
- spare_inactive now uses mddev->spare->raid_disk instead of
mddev->spare->number so it can find the device directly without searching
- spare_write does not need number. It can use mddev->spare->raid_disk as above.
- spare_active does not need &mddev->spare. It finds the descriptor directly
and fixes it without this pointer
|
|
Remove concept of 'spare' drive for multipath.
Multipath now treats all working devices as
active and does io to the first working one.
|
|
Move md_update_sb calls
When a change which requires a superblock update happens
at interrupt time, we currently set a flag (sb_dirty) and
wake up the per-array thread (raid1d/raid5d/multipathd) to
do the actual update.
This patch centralises this. The sb_update is now done
by the mdrecoveryd thread. As this is always woken up after
the error handler is called, we don't need the call to wakeup
the local thread any more.
With this, we don't need "md_update_sb" to lock the array
any more and only use __md_update_sb which is local to md.c
So we rename __md_update_sb back to md_update_sb and stop
exporting it.
|
|
Pass the correct bdev to md_error
After a call to generic_make_request, bio->bi_bdev can have changed
(e.g. by a re-mapping driver like raid0), so we cannot trust it for
reporting the source of an error. This patch takes care to find the
correct bdev.
|
|
Rdev list cleanups.
An "rdev" can be on three different lists.
- the list of all rdevs
- the list of pending rdevs
- the list of rdevs for a given mddev
The first list is now only used to list "unused" devices in
/proc/mdstat, and only pending rdevs can be unused, so this list
isn't necessary.
An rdev cannot be both pending and in an mddev, so we know an rdev will
only be on one list at a time.
This patch discards the all_raid_disks list, and changes the
pending list to use "same_set" in the rdev. It also changes
/proc/mdstat to iterate through pending devices, rather than through
all devices.
So now an rdev is only on one list, either the pending list
or the list of rdevs for a given mddev. This means that
ITERATE_RDEV_GENERIC doesn't need to be told which field
to walk down: there is only one.
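The single-list arrangement can be sketched like this; the macro, list helper, and field names are illustrative stand-ins for the kernel's, not the real definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly-linked list, in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

struct rdev {
    int desc_nr;
    struct list_head same_set;   /* the one and only list linkage */
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Walks either the pending list or an mddev's device list; because an
 * rdev is only ever on one list, the field is hard-coded as same_set
 * and no longer needs to be a macro parameter. */
#define ITERATE_RDEV(head, r)                                         \
    for (r = container_of((head)->next, struct rdev, same_set);       \
         &r->same_set != (head);                                      \
         r = container_of(r->same_set.next, struct rdev, same_set))

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

int count_rdevs(struct list_head *head)
{
    struct rdev *r;
    int n = 0;
    ITERATE_RDEV(head, r)
        n++;
    return n;
}
```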
|
|
Use symbolic names for multipath (-4) and linear (-1)
Also, a variable called "level" was being used to store a
"level" and a "personality" number. This is potentially
confusing, so it is now two variables.
|
|
Embed bio in mp_bh rather than separate allocation.
multipath currently allocates an mp_bh and a bio for each
request. With this patch, the bio is made to be part of the
mp_bh so there is only one allocation, and it comes from a private
pool (the bio was allocated from a shared pool).
Also remove "remaining" and "cmd" from mp_bh which aren't used.
And remove spare (unused) from multipath_private_data.
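The embedding can be sketched as follows; the struct members and helper names are placeholders, not the real multipath definitions. The point is that one allocation now covers both objects, and the containing mp_bh can be recovered from the embedded bio by pointer arithmetic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct bio { long bi_sector; };

/* After the change: the bio is part of mp_bh, so a single allocation
 * from the private pool covers both. */
struct mp_bh {
    struct bio bio;      /* embedded, not pointed-to */
    int retries;
};

struct mp_bh *alloc_mp_bh(void)
{
    return calloc(1, sizeof(struct mp_bh));  /* stands in for the mempool */
}

/* Recover the containing mp_bh from its embedded bio. */
struct mp_bh *bio_to_mp_bh(struct bio *b)
{
    return (struct mp_bh *)((char *)b - offsetof(struct mp_bh, bio));
}
```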
|
|
Remove state field from multipath mp_bh structure.
The MPBH_Uptodate flag is set but never used,
and the MPBH_SyncPhase flag was never used.
These are both legacy from the copying of raid1.c.
MPBH_PreAlloc is no longer needed due to the use of
mempools, so the state field can go...
|
|
Get multipath to use mempool
... rather than maintaining its own mempool
|
|
|
|
* ->dev killed for md/linear.c (same as previous parts)
|
|
* a bunch of callers of partition_name() are calling
bdev_partition_name(),
* the last users of raid1 and multipath ->dev are gone; so are
the fields in question.
|
|
* ->diskop() split into individual methods; prototypes cleaned
up. In particular, handling of hot_add_disk() gets mdk_rdev_t * of
the component we are adding as an argument instead of playing the games
with major/minor. Code cleaned up.
|
|
* ->error_handler() switched to struct block_device *.
* md_sync_acct() switched to struct block_device *.
* raid5 struct disk_info ->dev is gone - we use ->bdev everywhere.
* bunch of kdev_same() when we have corresponding struct block_device *
and can simply compare them is removed from drivers/md/*.c
|
|
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
|
|
Previously each raid personality (Well, 1 and 5) started their
own thread to do resync, but md.c had a single common thread to do
reconstruction. Apart from being untidy, this means that you cannot
have two arrays reconstructing at the same time, though you can have
two arrays resyncing at the same time.
This patch changes the personalities so they don't start the resync,
but just leave a flag to say that it is needed.
The common thread (mdrecoveryd) now just monitors things and starts a
separate per-array thread whenever resync or recovery (or both) is
needed.
When the recovery finishes, mdrecoveryd will be woken up to re-lock
the device and activate the spares or whatever.
raid1 needs to know when resync/recovery starts and ends so it can
allocate and release resources.
It allocates when a resync request for stripe 0 is received.
Previously it deallocated for resync in its own thread, and
deallocated for recovery when the spare was made active or inactive
(depending on success).
As raid1 doesn't own a thread any more, this needed to change. So to
match the "alloc on 0", md_do_resync now calls sync_request one
last time asking to sync one block past the end. This is a signal to
release any resources.
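The "one block past the end" convention can be sketched as below; the function name, the sector constants, and the resource flag are all illustrative, not raid1's actual code:

```c
#include <assert.h>

#define MAX_SECTORS 1024   /* hypothetical array size in blocks */

static int resources_held;

/* Sector 0 marks the start of a resync and grabs resources; a sector
 * at or past the end is the agreed signal to release them. */
int example_sync_request(long sector)
{
    if (sector == 0)
        resources_held = 1;        /* resync starting: allocate */
    if (sector >= MAX_SECTORS) {
        resources_held = 0;        /* past the end: release signal */
        return 0;                  /* nothing actually synced */
    }
    return 1;                      /* one block handled */
}
```

Using an out-of-range request as a sentinel avoids adding a separate "resync finished" callback to the personality interface.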
|
|
1/ don't free the rdev->sb on an error -- it might be
accessed again later. Just wait for the device to be
exported.
2/ Change md_update_sb to __md_update_sb and have it
clear the sb_dirty flag.
New md_update_sb locks the device and calls __md_update_sb
if sb_dirty. This avoids any possible races around
updating the superblock
|
|
Provide SMP safe locking for all_mddevs list.
The all_mddevs_lock is added to protect all_mddevs and mddev_map.
ITERATE_MDDEV is moved to md.c (it isn't needed elsewhere) and enhanced
to take the lock appropriately and always have a refcount on the object
that is given to the body of the loop.
mddev_find is changed so that the structure is allocated outside a lock,
but test-and-set is done inside the lock.
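The allocate-outside, test-and-set-inside pattern can be sketched in userspace C; the names mirror the commit text, but the map layout and lock are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <pthread.h>

#define MAX_MD 8

struct mddev { int unit; };

static pthread_mutex_t all_mddevs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mddev *mddev_map[MAX_MD];

struct mddev *mddev_find(int unit)
{
    struct mddev *new = NULL;

    for (;;) {
        pthread_mutex_lock(&all_mddevs_lock);
        if (mddev_map[unit]) {                /* someone else won the race */
            struct mddev *found = mddev_map[unit];
            pthread_mutex_unlock(&all_mddevs_lock);
            free(new);                        /* discard our spare copy */
            return found;
        }
        if (new) {                            /* install our allocation */
            mddev_map[unit] = new;
            pthread_mutex_unlock(&all_mddevs_lock);
            return new;
        }
        pthread_mutex_unlock(&all_mddevs_lock);

        new = malloc(sizeof(*new));           /* allocate with no lock held */
        if (!new)
            return NULL;
        new->unit = unit;
    }
}
```

Allocating with the lock dropped keeps a potentially sleeping allocation out of the critical section; the retry loop handles the case where another caller installed an mddev in the meantime.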
|
|
More mddev tidyup - remove recovery_sem and resync_sem
recovery_sem and resync_sem get replaced by careful use
of recovery_running protected by reconfig_sem.
As part of this, the creative:
down(&mddev->recovery_sem);
up(&mddev->recovery_sem);
when stopping an array gets replaced by a more obvious
wait_event(resync_wait, mddev->recovery_running <= 0);
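The wait_event() idiom above can be sketched with POSIX primitives; the symbol names follow the commit text, but the code is an illustration, not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resync_wait = PTHREAD_COND_INITIALIZER;
static int recovery_running = 1;

/* Equivalent of wait_event(resync_wait, recovery_running <= 0):
 * sleep until the condition becomes true. */
void wait_for_recovery(void)
{
    pthread_mutex_lock(&lock);
    while (recovery_running > 0)
        pthread_cond_wait(&resync_wait, &lock);
    pthread_mutex_unlock(&lock);
}

/* The recovery thread clears the flag and wakes all waiters. */
void recovery_done(void)
{
    pthread_mutex_lock(&lock);
    recovery_running = 0;
    pthread_cond_broadcast(&resync_wait);
    pthread_mutex_unlock(&lock);
}
```

The condition is re-tested in a loop after each wakeup, which is what makes this more obviously correct than pairing a down() with an up() on a semaphore.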
|
|
Strengthen the locking of mddev.
mddev is only ever locked in md.c, so we move {,un}lock_mddev
out of the header and into md.c, and rename to mddev_{,un}lock
for consistency with mddev_{get,put,find}.
When building arrays (typically at boot time) we now lock, and unlock
as it is the "right" thing to do. The lock should never fail.
When generating /proc/mdstat, we lock each array before inspecting it.
In md_ioctl, we lock the mddev early and unlock at the end, rather than
locking in two different places.
In md_open we make sure we can get a lock before completing the open. This
ensures that we sync with do_md_stop properly.
In md_do_recovery, we lock each mddev before checking its status.
md_do_recovery must unlock while recovery happens, and a do_md_stop at this
point will deadlock when md_do_recovery tries to regain the lock. This will be
fixed in a later patch.
|
|
The mapping from minor number to mddev structure allows for a
'data' that is never used. This patch removes that and explicitly
inlines some inline functions that become trivial.
mddev_map also becomes completely local to md.c
|
|
The nb_dev field is not needed.
Most uses are tests of whether it is zero or not, and they can be replaced
by tests on the emptiness of the disks list.
Other uses are for iterating through devices in numerical order and
it makes the code clearer (IMO) to unroll the devices into an array first
(which has to be done at some stage anyway) and then walk that array.
This makes ITERATE_RDEV_ORDERED unnecessary.
Also remove the "name" field which is never used.
|
|
make_request functions.
As we now have per-device queues, we don't need a common make_request
function that dispatches, we can dispatch directly.
Each *_make_request function is changed to take a request_queue_t
from which it extract the mddev that it needs, and to deduce the
"rw" flag directly from the bio.
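The shape of the change can be sketched as below; the types are simplified stand-ins for the kernel's, and the function and flag names are illustrative:

```c
#include <assert.h>

#define BIO_RW 0x1   /* illustrative: bit set means write */

struct bio { unsigned long bi_rw; };
struct mddev { int nr_requests; };
struct request_queue { void *queuedata; };   /* points at the mddev */

/* The make_request function now receives the queue, not the mddev,
 * and derives both the mddev and the direction itself. */
int example_make_request(struct request_queue *q, struct bio *bio)
{
    struct mddev *mddev = q->queuedata;
    int rw = bio->bi_rw & BIO_RW;            /* 1 = write, 0 = read */

    mddev->nr_requests++;                    /* stand-in for real work */
    return rw;
}
```

With per-device queues, each personality's function can be registered directly as the queue's make_request handler, so the common dispatcher disappears.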
|
|
It isn't needed. Only the chunksize is used, and it
can be found in the superblock.
|
|
We embed a request_queue_t in the mddev structure and so
have a separate one for each mddev.
This is used for plugging (in raid5).
Given this embedded request_queue_t, md_make_request no longer
needs to map from device number to mddev, but can map from
the queue to the mddev instead.
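Mapping from an embedded queue back to its mddev is plain pointer arithmetic, the container_of idiom; the struct layout here is illustrative, not the real mddev definition:

```c
#include <assert.h>
#include <stddef.h>

struct request_queue { int plugged; };

struct mddev {
    int unit;
    struct request_queue queue;   /* embedded: one queue per mddev */
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Subtract the member offset to recover the containing mddev. */
struct mddev *queue_to_mddev(struct request_queue *q)
{
    return container_of(q, struct mddev, queue);
}
```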
|