user/sven/linux.git/drivers/md, branch v4.12.9

MD: not clear ->safemode for external metadata array

2017-08-25T00:15:02Z

commit afc1f55ca44e257f69da8f43e0714a76686ae8d1 upstream. ->safemode should be triggered by mdadm for external metadaa array, otherwise array's state confuses mdadm. Fixes: 33182d15c6bf(md: always clear ->safemode when md_check_recovery gets the mddev lock.) Cc: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md: always clear ->safemode when md_check_recovery gets the mddev lock.

2017-08-25T00:15:01Z

commit 33182d15c6bf182f7ae32a66ea4a547d979cd6d7 upstream. If ->safemode == 1, md_check_recovery() will try to get the mddev lock and perform various other checks. If mddev->in_sync is zero, it will call set_in_sync, and clear ->safemode. However if mddev->in_sync is not zero, ->safemode will not be cleared. When md_check_recovery() drops the mddev lock, the thread is woken up again. Normally it would just check if there was anything else to do, find nothing, and go to sleep. However as ->safemode was not cleared, it will take the mddev lock again, then wake itself up when unlocking. This results in an infinite loop, repeatedly calling md_check_recovery(), which RCU or the soft-lockup detector will eventually complain about. Prior to commit 4ad23a976413 ("MD: use per-cpu counter for writes_pending"), safemode would only be set to one when the writes_pending counter reached zero, and would be cleared again when writes_pending is incremented. Since that patch, safemode is set more freely, but is not reliably cleared. So in md_check_recovery() clear ->safemode before checking ->in_sync. Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending") Reported-by: Dominik Brodowski Reported-by: David R Signed-off-by: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md: fix test in md_write_start()

2017-08-25T00:15:01Z

commit 81fe48e9aa00bdd509bd3c37a76d1132da6b9f09 upstream. md_write_start() needs to clear the in_sync flag is it is set, or if there might be a race with set_in_sync() such that the later will set it very soon. In the later case it is sufficient to take the spinlock to synchronize with set_in_sync(), and then set the flag if needed. The current test is incorrect. It should be: if "flag is set" or "race is possible" "flag is set" is trivially "mddev->in_sync". "race is possible" should be tested by "mddev->sync_checkers". If sync_checkers is 0, then there can be no race. set_in_sync() will wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period, and as md_write_start() holds the rcu_read_lock(), set_in_sync() will be sure ot see the update to writes_pending. If sync_checkers is > 0, there could be race. If md_write_start() happened entirely between if (!mddev->in_sync && percpu_ref_is_zero(&mddev->writes_pending)) { and mddev->in_sync = 1; in set_in_sync(), then it would not see that is_sync had been set, and set_in_sync() would not see that writes_pending had been incremented. This bug means that in_sync is sometimes not set when it should be. Consequently there is a small chance that the array will be marked as "clean" when in fact it is inconsistent. Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending") Signed-off-by: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md/raid5: add thread_group worker async_tx_issue_pending_all

2017-08-06T16:21:09Z

commit 7e96d559634b73a8158ee99a7abece2eacec2668 upstream. Since thread_group worker and raid5d kthread are not in sync, if worker writes stripe before raid5d then requests will be waiting for issue_pendig. Issue observed when building raid5 with ext4, in some build runs jbd2 would get hung and requests were waiting in the HW engine waiting to be issued. Fix this by adding a call to async_tx_issue_pending_all in the raid5_do_work. Signed-off-by: Ofer Heifetz Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md/raid1: fix writebehind bio clone

2017-08-06T16:21:09Z

commit 16d56e2fcc1fc15b981369653c3b41d7ff0b443d upstream. After bio is submitted, we should not clone it as its bi_iter might be invalid by driver. This is the case of behind_master_bio. In certain situration, we could dispatch behind_master_bio immediately for the first disk and then clone it for other disks. https://bugzilla.kernel.org/show_bug.cgi?id=196383 Reported-and-tested-by: Markus Reviewed-by: Ming Lei Fix: 841c1316c7da(md: raid1: improve write behind) Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

md: remove 'idx' from 'struct resync_pages'

2017-08-06T16:21:09Z

commit 022e510fcbda79183fd2cdc01abb01b4be80d03f upstream. bio_add_page() won't fail for resync bio, and the page index for each bio is same, so remove it. More importantly the 'idx' of 'struct resync_pages' is initialized in mempool allocator function, the current way is wrong since mempool is only responsible for allocation, we can't use that for initialization. Suggested-by: NeilBrown Reported-by: NeilBrown Reported-and-tested-by: Patrick Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages) Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages) Signed-off-by: Ming Lei Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

dm integrity: test for corrupted disk format during table load

2017-08-06T16:21:09Z

commit bc86a41e96c5b6f07453c405e036d95acc673389 upstream. If the dm-integrity superblock was corrupted in such a way that the journal_sections field was zero, the integrity target would deadlock because it would wait forever for free space in the journal. Detect this situation and refuse to activate the device. Signed-off-by: Mikulas Patocka Fixes: 7eada909bfd7 ("dm: add integrity target") Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

dm integrity: fix inefficient allocation of journal space

2017-08-06T16:21:09Z

commit 9dd59727dbc21d33a9add4c5b308a5775cd5a6ef upstream. When using a block size greater than 512 bytes, the dm-integrity target allocates journal space inefficiently. It allocates one journal entry for each 512-byte chunk of data, fills an entry for each block of data and leaves the remaining entries unused. This issue doesn't cause data corruption, but all the unused journal entries degrade performance severely. For example, with 4k blocks and an 8k bio, it would allocate 16 journal entries but only use 2 entries. The remaining 14 entries were left unused. Fix this by adding the missing 'log2_sectors_per_block' shifts that are required to have each journal entry map to a full block. Signed-off-by: Mikulas Patocka Fixes: 7eada909bfd7 ("dm: add integrity target") Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman

Raid5 should update rdev->sectors after reshape

2017-07-27T22:10:13Z

commit b5d27718f38843a74552e9a93d32e2391fd3999f upstream. The raid5 md device is created by the disks which we don't use the total size. For example, the size of the device is 5G and it just uses 3G of the devices to create one raid5 device. Then change the chunksize and wait reshape to finish. After reshape finishing stop the raid and assemble it again. It fails. mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean mdadm /dev/md0 --grow --chunk=64 wait reshape to finish mdadm -S /dev/md0 mdadm -As The error messages: [197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing! [197519.821686] md: md_import_device returned -22 After reshape the data offset is changed. It selects backwards direction in this condition. In function super_1_load it compares the available space of the underlying device with sb->data_size. The new data offset gets bigger after reshape. So super_1_load returns -EINVAL. rdev->sectors is updated in md_finish_reshape. Then sb->data_size is set in super_1_sync based on rdev->sectors. So add md_finish_reshape in end_reshape. Signed-off-by: Xiao Ni Acked-by: Guoqing Jiang Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman

dm raid: stop using BUG() in __rdev_sectors()

2017-07-27T22:10:12Z

commit 4d49f1b4a1fcab16b6dd1c79ef14f2b6531d50a6 upstream. Return 0 rather than BUG() if __rdev_sectors() fails and catch invalid rdev size in the constructor. Reported-by: Hannes Reinecke Signed-off-by: Heinz Mauelshagen Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman