user/sven/linux.git/drivers/md, branch v3.18.27

dm snapshot: fix hung bios when copy error occurs

2016-02-10T03:56:33Z

[ Upstream commit 385277bfb57faac44e92497104ba542cdd82d5fe ]

When there is an error copying a chunk dm-snapshot can incorrectly hold associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was an error copying the chunk, and then calls complete_exception. complete_exception calls pending_complete on error, otherwise it calls commit_exception with commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls to prepare_exception and commit_exception are paired. persistent_prepare_exception increases ps->pending_count and persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but persistent_commit_exception is not. This results in the variable ps->pending_count never returning to zero and that causes some pending exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of whether the copy was successful. A new "valid" parameter is added to commit_exception -- when the copy fails this parameter is set to zero so that the chunk that failed to copy (and all following chunks) is not recorded in the snapshot store. Also, remove commit_callback now that it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka
Signed-off-by: Mike Snitzer
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin
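
The pairing rule described in this message can be illustrated with a small userspace model. This is only a sketch, not the dm-snapshot code: the toy_store structure and all toy_* functions are hypothetical names invented here, and the bookkeeping is reduced to a counter. It shows that calling the commit step unconditionally, with a "valid" argument of zero for a failed copy, lets the pending count return to zero while the failed chunk and all following chunks are left unrecorded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the persistent store's bookkeeping. */
struct toy_store {
    int pending_count; /* prepared but not yet committed exceptions */
    int recorded;      /* exceptions recorded in the store */
    bool valid;        /* cleared once any copy has failed */
};

/* Every prepare must later be matched by exactly one commit. */
static void toy_prepare_exception(struct toy_store *ps)
{
    ps->pending_count++;
}

/* Called after every copy; valid is 0 when the copy failed. */
static void toy_commit_exception(struct toy_store *ps, int valid)
{
    assert(ps->pending_count > 0);
    if (!valid)
        ps->valid = false;   /* this chunk and all later ones are skipped */
    if (ps->valid)
        ps->recorded++;
    ps->pending_count--;     /* always balance the earlier prepare */
}

/*
 * Run n copies in order; copy number fail_at (0-based) fails,
 * or none fails if fail_at is negative. Returns the pending count.
 */
static int toy_run_copies(struct toy_store *ps, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        toy_prepare_exception(ps);
        toy_commit_exception(ps, i != fail_at);
    }
    return ps->pending_count;
}
```

With five copies where the third fails, toy_run_copies() returns 0 and only the first two chunks are counted as recorded. Skipping the commit call for the failed copy would instead leave the counter stuck at 1, which is the hang the commit describes.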

bcache: Change refill_dirty() to always scan entire disk if necessary

2016-02-10T03:56:17Z

[ Upstream commit 627ccd20b4ad3ba836472468208e2ac4dfadbf03 ]

Previously, it would only scan the entire disk if it was starting from the very start of the disk - i.e. if the previous scan got to the end.

This was broken by refill_full_stripes(), which updates last_scanned so that refill_dirty was never triggering the searched_from_start path.

But if we change refill_dirty() to always scan the entire disk if necessary, regardless of what last_scanned was, the code gets cleaner and we fix that bug too.

Signed-off-by: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
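
The idea of scanning the whole disk regardless of the saved position can be sketched as a circular scan. This is a simplified model under stated assumptions, not the bcache implementation: toy_refill_dirty() and its array-of-flags view of the disk are hypothetical, and the real code works on btree keys rather than on an array. The scan starts at the saved position, wraps around the end, and stops only when the output is full or every position has been visited once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Collect up to max dirty positions from dirty[0..n-1], starting at
 * *last_scanned and wrapping around, so the whole range is covered
 * no matter where the previous scan stopped.
 * Updates *last_scanned to the position where the next scan should start.
 * Returns the number of positions written to out[].
 */
static size_t toy_refill_dirty(const bool *dirty, size_t n,
                               size_t *last_scanned,
                               size_t *out, size_t max)
{
    size_t found = 0;
    size_t start, i;

    if (n == 0)
        return 0;
    assert(*last_scanned < n);
    start = *last_scanned;

    /* Visit each position at most once, beginning at the saved offset. */
    for (i = 0; i < n && found < max; i++) {
        size_t pos = (start + i) % n;

        if (dirty[pos])
            out[found++] = pos;
    }

    *last_scanned = (start + i) % n;
    return found;
}
```

Because the loop bound is the number of positions rather than the end of the array, no separate "searched from start" case is needed in this model.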

bcache: prevent crash on changing writeback_running

2016-02-10T03:56:17Z

[ Upstream commit 8d16ce540c94c9d366eb36fc91b7154d92d6397b ]

Added a safeguard in the shutdown case. At least while not being attached it is also possible to trigger a kernel bug by writing into writeback_running. This change adds the same check before trying to wake up the thread for that case.

Signed-off-by: Stefan Bader
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

bcache: allows use of register in udev to avoid "device_busy" error.

2016-02-10T03:56:17Z

[ Upstream commit d7076f21629f8f329bca4a44dc408d94670f49e2 ]

Allow using register, not register_quiet, in udev to avoid the "device_busy" error. The initial patch proposed at https://lkml.org/lkml/2013/8/26/549 by Gabriel de Perthuis does not unlock the mutex and hangs the kernel.

See http://thread.gmane.org/gmane.linux.kernel.bcache.devel/2594 for the discussion.

Cc: Denis Bychkov
Cc: Kent Overstreet
Cc: Eric Wheeler
Cc: Gabriel de Perthuis
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

bcache: unregister reboot notifier if bcache fails to unregister device

2016-02-10T03:56:16Z

[ Upstream commit 2ecf0cdb2b437402110ab57546e02abfa68a716b ]

The bcache_init() function forgot to unregister the reboot notifier if bcache fails to unregister a block device. This commit fixes this.

Signed-off-by: Zheng Liu
Tested-by: Joshua Schmid
Tested-by: Eric Wheeler
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

bcache: fix a leak in bch_cached_dev_run()

2016-02-10T03:56:16Z

[ Upstream commit 4d4d8573a8451acc9f01cbea24b7e55f04a252fe ]

Signed-off-by: Al Viro
Tested-by: Joshua Schmid
Tested-by: Eric Wheeler
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device

2016-02-10T03:56:15Z

[ Upstream commit fecaee6f20ee122ad75402c53d8278f9bb142ddc ]

This bug can be reproduced by the following script:

    #!/bin/bash

    bcache_sysfs="/sys/fs/bcache"

    function clear_cache()
    {
        if [ ! -e $bcache_sysfs ]; then
            echo "no bcache sysfs"
            exit
        fi

        cset_uuid=$(ls -l $bcache_sysfs|head -n 2|tail -n 1|awk '{print $9}')
        sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/detach"
        sleep 5
        sudo sh -c "echo $cset_uuid > /sys/block/sdb/sdb1/bcache/attach"
    }

    for ((i=0;i<10;i++)); do
        clear_cache
    done

The warning messages look like below:

    [ 275.948611] ------------[ cut here ]------------
    [ 275.963840] WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xb8/0xd0() (Tainted: P W --------------- )
    [ 275.979253] Hardware name: Tecal RH2285
    [ 275.994106] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:09.0/0000:08:00.0/host4/target4:2:1/4:2:1:0/block/sdb/sdb1/bcache/cache'
    [ 276.024105] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
    [ 276.072643] Pid: 2765, comm: sh Tainted: P W --------------- 2.6.32 #1
    [ 276.089315] Call Trace:
    [ 276.105801] [] ? warn_slowpath_common+0x87/0xc0
    [ 276.122650] [] ? warn_slowpath_fmt+0x46/0x50
    [ 276.139361] [] ? sysfs_add_one+0xb8/0xd0
    [ 276.156012] [] ? sysfs_do_create_link+0x12b/0x170
    [ 276.172682] [] ? sysfs_create_link+0x13/0x20
    [ 276.189282] [] ? bcache_device_link+0xc1/0x110 [bcache]
    [ 276.205993] [] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
    [ 276.222794] [] ? bch_cached_dev_store+0x627/0x780 [bcache]
    [ 276.239680] [] ? alloc_pages_current+0xaa/0x110
    [ 276.256594] [] ? sysfs_write_file+0xe5/0x170
    [ 276.273364] [] ? vfs_write+0xb8/0x1a0
    [ 276.290133] [] ? sys_write+0x51/0x90
    [ 276.306368] [] ? system_call_fastpath+0x16/0x1b
    [ 276.322301] ---[ end trace 9f5d4fcdd0c3edfb ]---
    [ 276.338241] ------------[ cut here ]------------
    [ 276.354109] WARNING: at /home/wenqing.lz/bcache/bcache/super.c:720 bcache_device_link+0xdf/0x110 [bcache]() (Tainted: P W --------------- )
    [ 276.386017] Hardware name: Tecal RH2285
    [ 276.401430] Couldn't create device <-> cache set symlinks
    [ 276.401759] Modules linked in: bcache tcp_diag inet_diag ipmi_devintf ipmi_si ipmi_msghandler bonding 8021q garp stp llc ipv6 ext3 jbd loop sg iomemory_vsl(P) bnx2 microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 jbd2 mbcache megaraid_sas pata_acpi ata_generic ata_piix dm_mod [last unloaded: scsi_wait_scan]
    [ 276.465477] Pid: 2765, comm: sh Tainted: P W --------------- 2.6.32 #1
    [ 276.482169] Call Trace:
    [ 276.498610] [] ? warn_slowpath_common+0x87/0xc0
    [ 276.515405] [] ? warn_slowpath_fmt+0x46/0x50
    [ 276.532059] [] ? bcache_device_link+0xdf/0x110 [bcache]
    [ 276.548808] [] ? bch_cached_dev_attach+0x478/0x4f0 [bcache]
    [ 276.565569] [] ? bch_cached_dev_store+0x627/0x780 [bcache]
    [ 276.582418] [] ? alloc_pages_current+0xaa/0x110
    [ 276.599341] [] ? sysfs_write_file+0xe5/0x170
    [ 276.616142] [] ? vfs_write+0xb8/0x1a0
    [ 276.632607] [] ? sys_write+0x51/0x90
    [ 276.648671] [] ? system_call_fastpath+0x16/0x1b
    [ 276.664756] ---[ end trace 9f5d4fcdd0c3edfc ]---

We forget to clear the BCACHE_DEV_UNLINK_DONE flag in the bcache_device_attach() function when we attach a backing device for the first time. After detaching this backing device, this flag will be true and sysfs_remove_link() isn't called in bcache_device_unlink(). Then when we attach this backing device again, sysfs_create_link() will return an EEXIST error in bcache_device_link().

So the fix is trivial: we clear this flag in bcache_device_link().

Signed-off-by: Zheng Liu
Tested-by: Joshua Schmid
Tested-by: Eric Wheeler
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
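
The stale-flag problem described in this message can be modelled in a few lines of userspace C. This is a toy sketch, not the bcache code: struct toy_dev, TOY_UNLINK_DONE, toy_link() and toy_unlink() are hypothetical names, the sysfs link is reduced to a boolean, and the clear_flag parameter exists only so that the behaviour with and without the fix can be compared. In this simplified model the duplicate-link error shows up on the attach that follows the second detach.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_UNLINK_DONE (1u << 0)

/* Hypothetical, much reduced view of a device and its sysfs link. */
struct toy_dev {
    unsigned int flags;
    bool link_present;
};

/*
 * Create the link on attach. With clear_flag set, the "unlink done"
 * marker is reset so that a later detach removes the link again.
 */
static int toy_link(struct toy_dev *d, bool clear_flag)
{
    if (d->link_present)
        return -EEXIST;      /* duplicate link, as in the warning above */
    d->link_present = true;
    if (clear_flag)
        d->flags &= ~TOY_UNLINK_DONE;
    return 0;
}

/* Remove the link on detach, but only once per "unlink done" marker. */
static void toy_unlink(struct toy_dev *d)
{
    if (d->flags & TOY_UNLINK_DONE)
        return;              /* stale marker: removal is skipped */
    d->flags |= TOY_UNLINK_DONE;
    d->link_present = false;
}
```

Repeating attach and detach with clear_flag set always succeeds, while with clear_flag unset the stale marker eventually makes a detach skip the removal and the next attach fail with -EEXIST.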

bcache: Add a cond_resched() call to gc

2016-02-10T03:56:15Z

[ Upstream commit c5f1e5adf956e3ba82d204c7c141a75da9fa449a ]

Signed-off-by: Takashi Iwai
Tested-by: Eric Wheeler
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

bcache: fix a livelock when we cause a huge number of cache misses

2016-02-10T03:56:14Z

[ Upstream commit 2ef9ccbfcb90cf84bdba320a571b18b05c41101b ]

Subject : [PATCH v2] bcache: fix a livelock in btree lock
Date : Wed, 25 Feb 2015 20:32:09 +0800 (02/25/2015 04:32:09 AM)

This commit tries to fix a livelock in bcache. This livelock might happen when we cause a huge number of cache misses simultaneously.

When we get a cache miss, bcache will execute the following path.

    ->cached_dev_make_request()
      ->cached_dev_read()
        ->cached_lookup()
          ->bch->btree_map_keys()
            ->btree_root()  <------------------------
              ->bch_btree_map_keys_recurse()        |
                ->cache_lookup_fn()                 |
                  ->cached_dev_cache_miss()         |
                    ->bch_btree_insert_check_key() -|
                      [If btree->seq is not equal to seq + 1, we should
                       return EINTR and traverse btree again.]

In the bch_btree_insert_check_key() function we first need to check the upgrade flag (op->lock == -1), and when this flag is true we need to release the read btree->lock and try to take the write btree->lock. While this write lock is taken and released, btree->seq is monotonically increased in order to prevent other threads from modifying the btree during a cache miss (see btree.h:74). But if several requests cause cache misses at the same time, we could hit a livelock because btree->seq is always being changed by others. Thus no one can make progress.

This commit will try to take the write btree->lock if it encounters a race when we traverse the btree. Although this sacrifices scalability, it ensures that only one thread can modify the btree.

Signed-off-by: Zheng Liu
Tested-by: Joshua Schmid
Tested-by: Eric Wheeler
Cc: Joshua Schmid
Cc: Zhu Yanhai
Cc: Kent Overstreet
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

dm thin: fix race condition when destroying thin pool workqueue

2016-02-02T18:57:28Z

[ Upstream commit 18d03e8c25f173f4107a40d0b8c24defb6ed69f3 ]

When a thin pool is being destroyed delayed work items are cancelled using cancel_delayed_work(), which doesn't guarantee that on return the delayed item isn't running. This can cause the work item to requeue itself on an already destroyed workqueue. Fix this by using cancel_delayed_work_sync() which guarantees that on return the work item is not running anymore.

Fixes: 905e51b39a555 ("dm thin: commit outstanding data every second")
Fixes: 85ad643b7e7e5 ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
Signed-off-by: Nikolay Borisov
Signed-off-by: Mike Snitzer
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin