user/sven/linux.git/drivers/md, branch v3.14.16

dm cache: fix race affecting dirty block count

2014-08-07T21:52:37Z

commit 44fa816bb778edbab6b6ddaaf24908dd6295937e upstream. nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm bufio: fully initialize shrinker

2014-08-07T21:52:37Z

commit d8c712ea471ce7a4fd1734ad2211adf8469ddddc upstream. 1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen Acked-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm cache metadata: do not allow the data block size to change

2014-07-28T15:06:03Z

commit 048e5a07f282c57815b3901d4a68a77fa131ce0a upstream. The block size for the dm-cache's data device must remained fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman

dm thin metadata: do not allow the data block size to change

2014-07-28T15:06:03Z

commit 9aec8629ec829fc9403788cd959e05dd87988bd1 upstream. The block size for the thin-pool's data device must remained fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Greg Kroah-Hartman

dm: allocate a special workqueue for deferred device removal

2014-07-17T23:21:05Z

commit acfe0ad74d2e1bfc81d1d7bf5e15b043985d3650 upstream. The commit 2c140a246dc ("dm: allow remove to be deferred") introduced a deferred removal feature for the device mapper. When this feature is used (by passing a flag DM_DEFERRED_REMOVE to DM_DEV_REMOVE_CMD ioctl) and the user tries to remove a device that is currently in use, the device will be removed automatically in the future when the last user closes it. Device mapper used the system workqueue to perform deferred removals. However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items scheduled for the system workqueue from their destructor. If the destructor itself is called from the system workqueue during deferred removal, it introduces a possible deadlock - the workqueue tries to flush itself. Fix this possible deadlock by introducing a new workqueue for deferred removals. We allocate just one workqueue for all dm targets. The ability of dm targets to process IOs isn't dependent on deferred removal of unused targets, so a deadlock due to shared workqueue isn't possible. Also, cleanup local_init() to eliminate potential for returning success on failure. Signed-off-by: Mikulas Patocka Signed-off-by: Dan Carpenter Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm io: fix a race condition in the wake up code for sync_io

2014-07-17T23:21:05Z

commit 10f1d5d111e8aed46a0f1179faf9a3cf422f689e upstream. There's a race condition between the atomic_dec_and_test(&io->count) in dec_count() and the waking of the sync_io() thread. If the thread is spuriously woken immediately after the decrement it may exit, making the on stack io struct invalid, yet the dec_count could still be using it. Fix this race by using a completion in sync_io() and dec_count(). Reported-by: Minfei Huang Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Acked-by: Mikulas Patocka Signed-off-by: Greg Kroah-Hartman

md: flush writes before starting a recovery.

2014-07-09T18:18:28Z

commit 133d4527eab8d199a62eee6bd433f0776842df2e upstream. When we write to a degraded array which has a bitmap, we make sure the relevant bit in the bitmap remains set when the write completes (so a 're-add' can quickly rebuilt a temporarily-missing device). If, immediately after such a write starts, we incorporate a spare, commence recovery, and skip over the region where the write is happening (because the 'needs recovery' flag isn't set yet), then that write will not get to the new device. Once the recovery finishes the new device will be trusted, but will have incorrect data, leading to possible corruption. We cannot set the 'needs recovery' flag when we start the write as we do not know easily if the write will be "degraded" or not. That depends on details of the particular raid level and particular write request. This patch fixes a corruption issue of long standing and so it suitable for any -stable kernel. It applied correctly to 3.0 at least and will minor editing to earlier kernels. Reported-by: Bill Tested-by: Bill Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

dm thin: update discard_granularity to reflect the thin-pool blocksize

2014-07-09T18:18:26Z

commit 09869de57ed2728ae3c619803932a86cb0e2c4f8 upstream. DM thinp already checks whether the discard_granularity of the data device is a factor of the thin-pool block size. But when using the dm-thin-pool's discard passdown support, DM thinp was not selecting the max of the underlying data device's discard_granularity and the thin-pool's block size. Update set_discard_limits() to set discard_granularity to the max of these values. This enables blkdev_issue_discard() to properly align the discards that are sent to the DM thin device on a full block boundary. As such each discard will now cover an entire DM thin-pool block and the block will be reclaimed. Reported-by: Zdenek Kabelac Signed-off-by: Lukas Czerner Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

md: always set MD_RECOVERY_INTR when interrupting a reshape thread.

2014-06-11T18:54:11Z

commit 2ac295a544dcae9299cba13ce250419117ae7fd1 upstream. Commit 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 md: fix problem when adding device to read-only array with bitmap. added a called to md_reap_sync_thread() which cause a reshape thread to be interrupted (in particular, it could cause md_thread() to never even call md_do_sync()). However it didn't set MD_RECOVERY_INTR so ->finish_reshape() would not know that the reshape didn't complete. This only happens when mddev->ro is set and normally reshape threads don't run in that situation. But raid5 and raid10 can start a reshape thread during "run" is the array is in the middle of a reshape. They do this even if ->ro is set. So it is best to set MD_RECOVERY_INTR before abortingg the sync thread, just in case. Though it rare for this to trigger a problem it can cause data corruption because the reshape isn't finished properly. So it is suitable for any stable which the offending commit was applied to. (3.2 or later) Fixes: 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman

md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".

2014-06-11T18:54:11Z

commit 3991b31ea072b070081ca3bfa860a077eda67de5 upstream. If mddev->ro is set, md_to_sync will (correctly) abort. However in that case MD_RECOVERY_INTR isn't set. If a RESHAPE had been requested, then ->finish_reshape() will be called and it will think the reshape was successful even though nothing happened. Normally a resync will not be requested if ->ro is set, but if an array is stopped while a reshape is on-going, then when the array is started, the reshape will be restarted. If the array is also set read-only at this point, the reshape will instantly appear to success, resulting in data corruption. Consequently, this patch is suitable for any -stable kernel. Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman