summaryrefslogtreecommitdiff
path: root/include/linux/raid
AgeCommit message (Collapse)Author
2005-06-21[PATCH] md: allow md to update multiple superblocks in parallel.NeilBrown
currently, md updates all superblocks (one on each device) in series. It waits for one write to complete before starting the next. This isn't a big problem as superblock updates don't happen that often. However it is neater to do it in parallel, and if the drives in the array have gone to "sleep" after a period of idleness, then waking them is parallel is faster (and someone else should be worrying about power drain). Futher, we will need parallel superblock updates for a future patch which keeps the intent-logging bitmap near the superblock. Also remove the silly code that retired superblock updates 100 times. This simply never made sense. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: allow md intent bitmap to be stored near the superblock.NeilBrown
This provides an alternate to storing the bitmap in a separate file. The bitmap can be stored at a given offset from the superblock. Obviously the creator of the array must make sure this doesn't intersect with data.... After is good for version-0.90 superblocks. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: fix deadlock due to md thread processing delayed requests.NeilBrown
Before completing a 'write' the md superblock might need to be updated. This is best done by the md_thread. The current code schedules this up and queues the write request for later handling by the md_thread. However some personalities (Raid5/raid6) will deadlock if the md_thread tries to submit requests to its own array. So this patch changes things so the processes submitting the request waits for the superblock to be written and then submits the request itself. This fixes a recently-created deadlock in raid5/raid6 Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: optimise reconstruction when re-adding a recently failed drive.NeilBrown
When an array is degraded, bit in the intent-bitmap are never cleared. So if a recently failed drive is re-added, we only need to reconstruct the block that are still reflected in the bitmap. This patch adds support for this re-adding. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: raid1 support for bitmap intent loggingNeilBrown
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: enable the bitmap write-back daemon and wait for it.NeilBrown
Currently we don't wait for updates to the bitmap to be flushed to disk properly. The infrastructure all there, but it isn't being used.... A separate kernel thread (bitmap_writeback_daemon) is needed to wait for each page as we cannot get callbacks when a page write completes. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: optimised resync using Bitmap based intent loggingNeilBrown
With this patch, the intent to write to some block in the array can be logged to a bitmap file. Each bit represents some number of sectors and is set before any update happens, and only cleared when all writes relating to all sectors are complete. After an unclean shutdown, information in this bitmap can be used to optimise resync - only sectors which could be out-of-sync need to be updated. Also if a drive is removed and then added back into an array, the recovery can make use of the bitmap to optimise reconstruction. This is not implemented in this patch. Currently the bitmap is stored in a file which must (obviously) be stored on a separate device. The patch only provided infrastructure. It does not update any personalities to bitmap intent logging. Md arrays can still be used with no bitmap file. This patch has minimal impact on such arrays. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: improve the interface to sync_requestNeilBrown
1/ change the return value (which is number-of-sectors synced) from 'int' to 'sector_t'. The number of sectors is usually easily small enough to fit in an int, but if resync needs to abort, it may want to return the total number of remaining sectors, which could be large. Also errors cannot be returned as negative numbers now, so use 0 instead 2/ Add a 'skipped' return parameter to allow the array to report that it skipped the sectors. This allows md to take this into account in the speed calculations. Currently there is no important skipping, but the bitmap-based-resync that is coming will use this. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: improve locking on 'safemode' and move superblock writesNeilBrown
When md marks the superblock dirty before a write, it calls generic_make_request (to write the superblock) from within generic_make_request (to write the first dirty block), which could cause problems later. With this patch, the superblock write is always done by the helper thread, and write request are delayed until that write completes. Also, the locking around marking the array dirty and writing the superblock is improved to avoid possible races. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-02-07[PATCH] raid5 overlapping read hackNeil Brown
If we detect an overlap, we set a flag and wait for a wakeup. When requests are handled, if the flag was set, we perform the wakeup. Note that the code currently in -mm is badly broken. With this patch applied, it passes tests the use O_DIRECT to cause lots of overlapping requests. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-07[PATCH] md: improve 'hash' code in linear.cNeil Brown
The hashtable that linear uses to find the right device stores two pointers for every entry. The second is always one of: The first plus 1 NULL When NULL, it is never accessed, so any value can be stored. Thus it could always be "first plus 1", and so we don't need to store it as it is trivial to calculate. This patch halves the size of this table, which results in some simpler code as well. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-11-10[PATCH] md: "Faulty" personalityNeil Brown
The 'faulty' personality provides a layer over any block device in which errors may be synthesised. A variety of errors are possible including transient and persistent read and write errors, and read errors that persist until the next write. There error mode can be changed on a live array. Accessing this personality requires mdadm 2.8.0 or later. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-11-10[PATCH] md: fix problem with md/linear for devices larger than 2 terabytesNeil Brown
Some size fields were "int" instead of "sector_t". Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-25[PATCH] md: fixes to make version-1 superblocks work in md driverNeil Brown
Add some missing data_offset additions and some le_to_cpu convertions and fix a few other little mistakes. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-25[PATCH] md: make retry_list non-global in raid1 and multipathNeil Brown
Both raid1 and multipath have a "retry_list" which is global, so all raid1 arrays (for example) us the same list. This is rather ugly, and it is simple enough to make it per-array, so this patch does that. It also changes to multipath code to use list.h lists instead of roll-your-own. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-13[PATCH] mark md_interrupt_thread staticChristoph Hellwig
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-23[PATCH] md: RAID10 moduleNeil Brown
This patch adds a 'raid10' module which provides features similar to both raid0 and raid1 in the one array. Various combinations of layout are supported. This code is still "experimental", but appears to work. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-23[PATCH] md: assorted fixes/improvemnet to generic md resync code.Neil Brown
1/ Introduce "mddev->resync_max_sectors" so that an md personality can ask for resync to cover a different address range than that of a single drive. raid10 will use this. 2/ fix is_mddev_idle so that if there seem to be a negative number of events, it doesn't immediately assume activity. 3/ make "sync_io" (the count of IO sectors used for array resync) an atomic_t to avoid SMP races. 4/ Pass md_sync_acct a "block_device" rather than the containing "rdev", as the whole rdev isn't needed. Also make this an inline function. 5/ Make sure recovery gets interrupted on any error. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04[PATCH] md: support reshaping raid1 arrays - adding or removing drives.Neil Brown
This allows the number of "raid_disks" in a raid1 to be changed. This requires allocating a new pool of "r1bio" structures which a different number of bios, suspending IO, and swapping the new pool in place of the old. (and a few other related changes). Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04[PATCH] md: allow md arrays to be resized if devices are large enough.Neil Brown
It is possible to have raid1/4/5/6 arrays that do not use all the space on the drive. This can be done explicitly, or can happen info you, one by one, replace all the drives with larger devices. This patch extends the "SET_ARRAY_INFO" ioctl (which previously invalid on active arrays) allow some attributes of the array to be changed and implements changing of the "size" attribute. "size" is the amount of each device that is actually used. If "size" is increased, the new space will immediately be "resynced". Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04[PATCH] md: make sure md_check_recovery will remove a faulty device when ↵Neil Brown
->nr_pending hits 0 md_check_recovery only locks a device and does stuff when it thinks there is a real likelyhood that something needs doing. So the test at the top must cover all possibilities. But it didn't cover the possibility that the last outstanding request on a failed device had finished and so the device needed to be removed. As a result, a failed drive might not get removed from the personalities perspective on the array, and so it could never be removed from the array as a whole. With this patch, whenever ->nr_pending hits zero on a faulty device, MD_RECOVERY_NEEDED is set so that md_check_recovery will do stuff. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-19[PATCH] Remove blk_run_queues() remnantsAndrew Morton
It no longer exists.
2004-04-12[PATCH] unplugging: md updateAndrew Morton
From: Neil Brown <neilb@cse.unsw.edu.au> I've made a bunch of changes to the 'md' bits - largely moving the unplugging into the individual personalities which know more about which drives are actually in use.
2004-04-12[PATCH] per-backing dev unpluggingAndrew Morton
From: Jens Axboe <axboe@suse.de>, Chris Mason, me, others. The global unplug list causes horrid spinlock contention on many-disk many-CPU setups - throughput is worse than halved. The other problem with the global unplugging is of course that it will cause the unplugging of queues which are unrelated to the I/O upon which the caller is about to wait. So what we do to solve these problems is to remove the global unplug and set up the infrastructure under which the VFS can tell the block layer to unplug only those queues which are relevant to the page or buffer_head whcih is about to be waited upon. We do this via the very appropriate address_space->backing_dev_info structure. Most of the complexity is in devicemapper, MD and swapper_space, because for these backing devices, multiple queues may need to be unplugged to complete a page/buffer I/O. In each case we ensure that data structures are in place to permit us to identify all the lower-level queues which contribute to the higher-level backing_dev_info. Each contributing queue is told to unplug in response to a higher-level unplug. To simplify things in various places we also introduce the concept of a "synchronous BIO": it is tagged with BIO_RW_SYNC. The block layer will perform an immediate unplug when it sees one of these go past.
2004-03-28[PATCH] md: Convert a number or "unsigned long"s to "sector_t"sAndrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> This helps raid5 work on at least 1 very large array.. Thanks to Evan Felix <evan.felix@pnl.gov>
2004-02-18[PATCH] md: Allow partitioning of MD devices.Andrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> With this patch, md used two major numbers for arrays. One Major is number 9 with name 'md' have unpartitioned md arrays, one per minor number. The other Major is allocated dynamically with name 'mdp' and had on array for every 64 minors, allowing for upto 63 partitions. The arrays under one major are completely separate from the arrays under the other. The preferred name for devices with the new major are of the form: /dev/md/d1p3 # partion 3 of device 1 - minor 67 When a paritioned md device is assembled, the partitions are not recognised until after the whole-array device is opened again. A future version of mdadm will perform this open so that the need will be transparent.
2004-02-18[PATCH] md: Avoid unnecessary bio allocation during raid1 resyncAndrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> For each resync request, we allocate a "r1_bio" which has a bio "master_bio" attached that goes largely unused. We also allocate a read_bio which is used. This patch removes the read_bio and just uses the master_bio instead. This fixes a bug wherein bi_bdev of the master_bio wasn't being set, but was being used. We also introduce a new "sectors" field into the r1_bio as we can no-longer rely in master_bio->bi_sectors.
2004-02-18[PATCH] md: Remove some un-needed fields from r1bio_sAndrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> next_r1 is never used, so it can just go. read_bio isn't needed as we can easily use one of the pointers in the write_bios array - write_bios[->read_disk]. So rename "write_bios" to "bios" and store the pointer to the read bio in there.
2004-02-18[PATCH] md: Discard the cmd field from r1_bio structureAndrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> The only time it is really needed is to differentiate a retry-on-fail from a write-after-read-for-resync request to raid1d. So we use a bit in 'state' for that.
2004-02-03[PATCH] md: Change the way the name of an md device is printed in error ↵Andrew Morton
messages. From: NeilBrown <neilb@cse.unsw.edu.au> Instead of using ("md%d", mdidx(mddev)), we now use ("%s", mdname(mddev)) where mdname is the disk_name field in the associated gendisk structure. This allows future flexability in naming.
2004-01-20[PATCH] md: Remove the 'disks' array from md which holds the gendisk structures.Andrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> Move the pointers into mddev. The reduces dependance on MAX_MD_DEVS.
2004-01-20[PATCH] md: Fix typo in commentAndrew Morton
From: NeilBrown <neilb@cse.unsw.edu.au> Thanks dann frazier <dannf@hp.com>
2004-01-20[PATCH] RAID-6Andrew Morton
From: "H. Peter Anvin" <hpa@zytor.com> RAID6 implementation. See Kconfig help for usage details. The next release of `mdadm' has raid6 userspace support.
2003-09-22[PATCH] 32-bit dev_t: internal useAlexander Viro
Starting the conversion: * internal dev_t made 32bit. * new helpers - new_encode_dev(), new_decode_dev(), huge_encode_dev(), huge_decode_dev(), new_valid_dev(). They do encoding/decoding of 32bit and 64bit values; for now huge_... are aliases for new_... and new_valid_dev() is always true. We do 12:20 for 32bit; representation is compatible with 16bit one - we have major in bits 19--8 and minor in 31--20,7--0. That's what the userland sees; internally we have (major << 20)|minor, of course. * MKDEV(), MAJOR() and MINOR() updated. * several places used to handle Missed'em'V dev_t (14:18 split) manually; that stuff had been taken into common helpers. Now we can start replacing old_... with new_... and huge_..., depending on the width available. MKDEV() callers should (for now) make sure that major and minor are within 12:20. That's what the next chunk will do.
2003-08-06[PATCH] Proper block queue reference countingJens Axboe
To be able to properly be able to keep references to block queues, we make blk_init_queue() return the queue that it initialized, and let it be independently allocated and then cleaned up on the last reference. I have grepped high and low, and there really shouldn't be any broken uses of blk_init_queue() in the kernel drivers left. The added bonus being blk_init_queue() error checking is explicit now, most of the drivers were broken in this regard (even IDE/SCSI). No drivers have embedded request queue structures. Drivers that don't use blk_init_queue() but blk_queue_make_request(), should allocate the queue with blk_alloc_queue(gfp_mask). I've converted all of them to do that, too. They can call blk_cleanup_queue() now too, using the define blk_put_queue() is probably cleaner though.
2003-05-26[PATCH] md: Replace bdev_partition_name with calls to bdevnameNeil Brown
2003-05-26[PATCH] md: Remove dependance on MD_SB_DISKS in linear personalityNeil Brown
Linear uses one array sized by MD_SB_DISKS inside a structure. We move it to the end of the structure, declare it as size 0, and arrange for approprate extra space to be allocated on structure allocation.
2003-05-26[PATCH] md: Remove MD_SB_DISKS limits from raid1Neil Brown
raid1 uses MD_SB_DISKS to size two data structures, but the new version-1 superblock allows for more than this number of disks (and most actual arrays use many fewer). This patch sizes to two arrays dynamically. One becomes a separate kmalloced array. The other is moved to the end of the containing structure and appropriate extra space is allocated. Also, change r1buf_pool_alloc (which allocates buffers for a mempool for doing re-sync) to not get r1bio structures from the r1bio pool (which could exhaust the pool) but instead to allocate them separately.
2003-05-26[PATCH] md: Remove dependancy on MD_SB_DISKS from raid0Neil Brown
Arrays with type-1 superblock can have more than MD_SB_DISKS, so we remove the dependancy on that number from raid0, replacing several fixed sized arrays with one dynamically allocated array.
2003-05-26[PATCH] md: Remove dependancy on MD_SB_DISKS from raid5Neil Brown
One embeded array gets moved to end of structure and sized dynamically.
2003-05-26[PATCH] md: Remove dependancy on MD_SB_DISKS from multipathNeil Brown
Multipath has a dependancy on MD_SB_DISKS which is no longer authoritative. We change it to use a separately allocated array.
2003-05-26[PATCH] md: Improve raid0 mapping code to simplify and reduce mem usage.Neil Brown
To cope with a raid0 array with differing sized devices, raid0 divides an array into "strip zones". The first zone covers the start of all devices, upto an offset equal to the size of the smallest device. The second strip zone covers the remaining devices upto the size of the next smallest size, etc. In order to determing which strip zone a given address is in, the array is logically divided into slices the size of the smallest zone, and a 'hash' table is created listing the first and, if relevant, second zone in each slice. As the smallest slice can be very small (imagine an array with a 76G drive and a 75.5G drive) this hash table can be rather large. With this patch, we limit the size of the hash table to one page, at the possible cost of making several probes into the zone list before we find the correct zone. We also cope with the possibility that a zone could be larger than a 32bit sector address would allow.
2003-05-26[PATCH] md: Use new single page bio splitting for raid0 and linearNeil Brown
Sometimes raid0 and linear are required to take a single page bio that spans two devices. We use bio_split to split such a bio into two. The the same time, bio.h is included by linux/raid/md.h so we don't included it elsewhere anymore. We also modify the mergeable_bvec functions to allow a bvec that doesn't fit if it is the first bvec to be added to the bio, and be careful never to return a negative length from a bvec_mergable funciton.
2003-05-07[PATCH] remove partition_name()Andrew Morton
From: Christoph Hellwig <hch@lst.de> partition_name() is a variant of __bdevname() that caches results and returns a pointrer to kmalloc()ed data instead of printing into a buffer. Due to it's caching it gets utterly confused when the name for a dev_t changes (can happen easily now with device mapper and probably in the future with dynamic dev_t users). It's only used by the raid code and most calls are through a wrapper, bdev_partition_name() which takes a struct block_device * that maybe be NULL. The patch below changes the bdev_partition_name() to call bdevname() if possible and the other calls where we really have nothing more than a dev_t to __bdevname. Btw, it would be nice if someone who knows the md code a bit better than me could remove bdev_partition_name() in favour of direct calls to bdevname() where possible - that would also get rid of the returns pointer to string on stack issue that this patch can't fix yet.
2003-04-03[PATCH] md: Cleanups for md to move device size calculations into personalitiesNeil Brown
2003-03-26[PATCH] md: Convert md personalities to new module interfaceNeil Brown
Thanks to Angus Sawyer <angus.sawyer@dsl.pipex.com> and Daniel McNeil <daniel@osdl.org>
2003-03-14[PATCH] md: Add new superblock format for mdNeil Brown
Superblock format '1' resolves a number of issues with superblock format '0'. It is more dense and can support many more sub-devices. It does not contains un-needed redundancy. It adds a few new useful fields
2003-03-14[PATCH] md: Allow components of MD raid array to have data start at offset ↵Neil Brown
from start of device. Normally the data stored on a component of a RAID array is stored from the start of the device. This patch allows a per-device data_offset so the data can start elsewhere. This will allow RAID arrays where the metadata is at the head of the device rather than the tail.
2003-03-14[PATCH] md: Fulltime delayed 'safe_mode' for mdNeil Brown
From: Angus Sawyer <angus.sawyer@dsl.pipex.com> If there are no writes for 20 milliseconds, write out superblock to mark array as clean. Write out superblock with dirty flag before allowing any further write to succeed. If an md thread gets signaled with SIGKILL, reduce the delay to 0. Also tidy up some printk's and make sure writing the superblock isn't noisy.
2003-03-14[PATCH] md: Remove md_recoveryd thread for mdNeil Brown
The md_recoveryd thread is responsible for initiating and cleaning up resync threads. This job can be equally well done by the per-array threads for those arrays which might need it. So the mdrecoveryd thread is gone and the core code that it ran is now run by raid5d, raid1d or multipathd. We add an MD_RECOVERY_NEEDED flag so those daemon don't have to bother trying to lock the md array unless it is likely that something needs to be done. Also modify the names of all threads to have the number of md device.