<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v4.12.5</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-08-06T16:21:09Z</updated>
<entry>
<title>md/raid5: add thread_group worker async_tx_issue_pending_all</title>
<updated>2017-08-06T16:21:09Z</updated>
<author>
<name>Ofer Heifetz</name>
<email>oferh@marvell.com</email>
</author>
<published>2017-07-24T06:17:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9425c1fdc48d3a05ea332b19454cc690faec3ee8'/>
<id>urn:sha1:9425c1fdc48d3a05ea332b19454cc690faec3ee8</id>
<content type='text'>
commit 7e96d559634b73a8158ee99a7abece2eacec2668 upstream.

Since the thread_group worker and the raid5d kthread are not in sync, if
the worker writes a stripe before raid5d does, the requests will be left
waiting for issue_pending.

The issue was observed when building ext4 on raid5: in some build runs
jbd2 would hang, with requests sitting in the HW engine waiting to be
issued.

Fix this by adding a call to async_tx_issue_pending_all() in
raid5_do_work().
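
A pseudocode-style sketch of where the call lands, based on the
description above (not the verbatim patch; the stripe-handling loop is
elided):

	static void raid5_do_work(struct work_struct *work)
	{
		...
		/* handle batched stripes ... */

		/* Flush DMA descriptors queued by this worker instead of
		 * relying on the raid5d kthread to issue them. */
		async_tx_issue_pending_all();
		...
	}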

Signed-off-by: Ofer Heifetz &lt;oferh@marvell.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md/raid1: fix writebehind bio clone</title>
<updated>2017-08-06T16:21:09Z</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-17T21:33:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=270c1bc38fc531662bddd64a835f4afd83a9562d'/>
<id>urn:sha1:270c1bc38fc531662bddd64a835f4afd83a9562d</id>
<content type='text'>
commit 16d56e2fcc1fc15b981369653c3b41d7ff0b443d upstream.

After a bio is submitted, we should not clone it, as its bi_iter might be
invalidated by the driver. This is the case for behind_master_bio. In
certain situations, we could dispatch behind_master_bio immediately for
the first disk and then clone it for the other disks.
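
A pseudocode-style sketch of the reordering described above
(illustrative, not the verbatim patch):

	Before: submit behind_master_bio for the first disk,
	        then clone it for the other disks    (bi_iter may be invalid)

	After:  clone behind_master_bio for every disk that needs a copy,
	        then submit                          (clone only before submit)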

https://bugzilla.kernel.org/show_bug.cgi?id=196383

Reported-and-tested-by: Markus &lt;m4rkusxxl@web.de&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Fixes: 841c1316c7da ("md: raid1: improve write behind")
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md: remove 'idx' from 'struct resync_pages'</title>
<updated>2017-08-06T16:21:09Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2017-07-14T08:14:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b70f86cedc1b5378815083db1d8f31fb9073ced0'/>
<id>urn:sha1:b70f86cedc1b5378815083db1d8f31fb9073ced0</id>
<content type='text'>
commit 022e510fcbda79183fd2cdc01abb01b4be80d03f upstream.

bio_add_page() won't fail for a resync bio, and the page index is the
same for each bio, so remove it.

More importantly, 'idx' of 'struct resync_pages' is currently initialized
in the mempool allocator function. That is wrong: a mempool is only
responsible for allocation (it may hand back a previously freed element
without calling the allocator), so it can't be used for initialization.

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-and-tested-by: Patrick &lt;dto@gmx.net&gt;
Fixes: f0250618361d ("md: raid10: don't use bio's vec table to manage resync pages")
Fixes: 98d30c5812c3 ("md: raid1: don't use bio's vec table to manage resync pages")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm integrity: test for corrupted disk format during table load</title>
<updated>2017-08-06T16:21:09Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-07-21T15:58:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bce721912a18fcffda3a19947f5ec4d99996667e'/>
<id>urn:sha1:bce721912a18fcffda3a19947f5ec4d99996667e</id>
<content type='text'>
commit bc86a41e96c5b6f07453c405e036d95acc673389 upstream.

If the dm-integrity superblock was corrupted in such a way that the
journal_sections field was zero, the integrity target would deadlock
because it would wait forever for free space in the journal.

Detect this situation and refuse to activate the device.
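
A pseudocode-style sketch of the check described above (field and
error-string names are assumed, not the verbatim patch):

	if (!le32_to_cpu(ic-&gt;sb-&gt;journal_sections)) {
		ti-&gt;error = "Corrupted superblock, journal_sections is 0";
		return -EINVAL;
	}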

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Fixes: 7eada909bfd7 ("dm: add integrity target")
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm integrity: fix inefficient allocation of journal space</title>
<updated>2017-08-06T16:21:09Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-07-19T15:23:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2df849cda98d053cc29021898c4bcda6159de4f'/>
<id>urn:sha1:d2df849cda98d053cc29021898c4bcda6159de4f</id>
<content type='text'>
commit 9dd59727dbc21d33a9add4c5b308a5775cd5a6ef upstream.

When using a block size greater than 512 bytes, the dm-integrity target
allocates journal space inefficiently.  It allocates one journal entry
for each 512-byte chunk of data, fills an entry for each block of data
and leaves the remaining entries unused.

This issue doesn't cause data corruption, but all the unused journal
entries degrade performance severely.

For example, with 4k blocks and an 8k bio, it would allocate 16 journal
entries but only use 2 entries.  The remaining 14 entries were left
unused.

Fix this by adding the missing 'log2_sectors_per_block' shifts that are
required to have each journal entry map to a full block.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Fixes: 7eada909bfd7 ("dm: add integrity target")
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Raid5 should update rdev-&gt;sectors after reshape</title>
<updated>2017-07-27T22:10:13Z</updated>
<author>
<name>Xiao Ni</name>
<email>xni@redhat.com</email>
</author>
<published>2017-07-05T09:34:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=25b43a867f56eba19c23cae8e296f8dd3b7fe951'/>
<id>urn:sha1:25b43a867f56eba19c23cae8e296f8dd3b7fe951</id>
<content type='text'>
commit b5d27718f38843a74552e9a93d32e2391fd3999f upstream.

A raid5 md device can be created from disks of which only part of the capacity is used.
For example, each device is 5G but only 3G of each is used to create the raid5 array.
If the chunk size is then changed and the reshape is allowed to finish, stopping the
array and assembling it again fails:
mdadm -CR /dev/md0 -l5 -n3 /dev/loop[0-2] --size=3G --chunk=32 --assume-clean
mdadm /dev/md0 --grow --chunk=64
(wait for the reshape to finish)
mdadm -S /dev/md0
mdadm -As
The error messages:
[197519.814302] md: loop1 does not have a valid v1.2 superblock, not importing!
[197519.821686] md: md_import_device returned -22

The reshape changes the data offset (the backwards direction is selected in this
condition). In super_1_load the available space of the underlying device is compared
with sb-&gt;data_size, and because the new data offset is bigger after the reshape,
super_1_load returns -EINVAL. rdev-&gt;sectors is updated in md_finish_reshape, and
sb-&gt;data_size is then set in super_1_sync based on rdev-&gt;sectors. So call
md_finish_reshape in end_reshape.

Signed-off-by: Xiao Ni &lt;xni@redhat.com&gt;
Acked-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm raid: stop using BUG() in __rdev_sectors()</title>
<updated>2017-07-27T22:10:12Z</updated>
<author>
<name>Heinz Mauelshagen</name>
<email>heinzm@redhat.com</email>
</author>
<published>2017-06-30T13:45:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=47f1b42a07b1e26c2d61a0785559d0f8434110f0'/>
<id>urn:sha1:47f1b42a07b1e26c2d61a0785559d0f8434110f0</id>
<content type='text'>
commit 4d49f1b4a1fcab16b6dd1c79ef14f2b6531d50a6 upstream.

Return 0 rather than BUG() if __rdev_sectors() fails and catch invalid
rdev size in the constructor.

Reported-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md: fix deadlock between mddev_suspend() and md_write_start()</title>
<updated>2017-07-27T22:10:11Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-06-05T06:49:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2bfc67530653eca8484b393d71b7042efc26e1c'/>
<id>urn:sha1:a2bfc67530653eca8484b393d71b7042efc26e1c</id>
<content type='text'>
commit cc27b0c78c79680d128dbac79de0d40556d041bb upstream.

If mddev_suspend() races with md_write_start() we can deadlock
with mddev_suspend() waiting for the request that is currently
in md_write_start() to complete the -&gt;make_request() call,
and md_write_start() waiting for the metadata to be updated
to mark the array as 'dirty'.
As metadata updates done by md_check_recovery() only happen when
the mddev_lock() can be claimed, and as mddev_suspend() is often
called with the lock held, these threads wait indefinitely for each
other.

We fix this by having md_write_start() abort if mddev_suspend()
is happening, and -&gt;make_request() aborts if md_write_start()
aborted.
md_make_request() can detect this abort, decrease the -&gt;active_io
count, and wait for mddev_suspend().

Reported-by: Nix &lt;nix@esperi.org.uk&gt;
Fixes: 68866e425be2 ("MD: no sync IO while suspended")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>md: don't use flush_signals in userspace processes</title>
<updated>2017-07-27T22:10:11Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2017-06-07T23:05:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d73fe66b5a640b793fdf13491b5cac12882be7e'/>
<id>urn:sha1:8d73fe66b5a640b793fdf13491b5cac12882be7e</id>
<content type='text'>
commit f9c79bc05a2a91f4fba8bfd653579e066714b1ec upstream.

The function flush_signals clears all pending signals for the process. It
may be used by kernel threads when we need to prepare a kernel thread for
responding to signals. However, using this function for userspace
processes is incorrect - clearing signals without the program expecting
it can cause misbehavior.

The raid1 and raid5 code uses flush_signals in its request routine because
it wants to prepare for an interruptible wait. This patch drops
flush_signals and uses sigprocmask instead to block all signals (including
SIGKILL) around the schedule() call. The signals are not lost, but the
schedule() call won't respond to them.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Acked-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>dm thin: do not queue freed thin mapping for next stage processing</title>
<updated>2017-06-27T19:14:34Z</updated>
<author>
<name>Vallish Vaidyeshwara</name>
<email>vallish@amazon.com</email>
</author>
<published>2017-06-23T18:53:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00a0ea33b495ee6149bf5a77ac5807ce87323abb'/>
<id>urn:sha1:00a0ea33b495ee6149bf5a77ac5807ce87323abb</id>
<content type='text'>
process_prepared_discard_passdown_pt1() should clean up the
dm_thin_new_mapping in cases of error.

dm_pool_inc_data_range() can fail trying to get a block reference:

metadata operation 'dm_pool_inc_data_range' failed: error = -61

When dm_pool_inc_data_range() fails, dm thin aborts the current metadata
transaction and marks the pool as PM_READ_ONLY. The memory for the thin
mapping is released as well. However, the current thin mapping is still
queued onto the next stage as part of queue_passdown_pt2() or
passdown_endio(). When this dangling thin mapping is processed and
accessed in the next stage, device mapper crashes.

Code flow without fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage
   -&gt; dm_pool_inc_data_range() fails, frees memory m
            but does not remove it from next stage queue

-&gt; process_prepared_discard_passdown_pt2(m)
   -&gt; processes freed memory m and crashes

One such stack:

Call Trace:
[&lt;ffffffffa037a46f&gt;] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
[&lt;ffffffffa039b6dc&gt;] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
[&lt;ffffffffa039b88b&gt;] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
[&lt;ffffffffa0399611&gt;] process_prepared+0x81/0xa0 [dm_thin_pool]
[&lt;ffffffffa039e735&gt;] do_worker+0xc5/0x820 [dm_thin_pool]
[&lt;ffffffff8152bf54&gt;] ? __schedule+0x244/0x680
[&lt;ffffffff81087e72&gt;] ? pwq_activate_delayed_work+0x42/0xb0
[&lt;ffffffff81089f53&gt;] process_one_work+0x153/0x3f0
[&lt;ffffffff8108a71b&gt;] worker_thread+0x12b/0x4b0
[&lt;ffffffff8108a5f0&gt;] ? rescuer_thread+0x350/0x350
[&lt;ffffffff8108fd6a&gt;] kthread+0xca/0xe0
[&lt;ffffffff8108fca0&gt;] ? kthread_park+0x60/0x60
[&lt;ffffffff81530b45&gt;] ret_from_fork+0x25/0x30

The fix is to first take the block ref count for the discarded block and
then do a passdown discard of that block. If taking the block ref count
fails, bail out: abort the current metadata transaction, mark the pool as
PM_READ_ONLY and free the current thin mapping memory (existing error
handling code) without queueing this thin mapping onto the next stage of
processing. If taking the block ref count succeeds, the discard of the
block is passed down, and the discard callback passdown_endio() queues
this thin mapping onto the next stage of processing.

Code flow with fix:
-&gt; process_prepared_discard_passdown_pt1(m)
   -&gt; dm_thin_remove_range()
   -&gt; dm_pool_inc_data_range()
      --&gt; if fails, free memory m and bail out
   -&gt; discard passdown
      --&gt; passdown_endio(m) queues m onto next stage

Cc: stable &lt;stable@vger.kernel.org&gt; # v4.9+
Reviewed-by: Eduardo Valentin &lt;eduval@amazon.com&gt;
Reviewed-by: Cristian Gafton &lt;gafton@amazon.com&gt;
Reviewed-by: Anchal Agarwal &lt;anchalag@amazon.com&gt;
Signed-off-by: Vallish Vaidyeshwara &lt;vallish@amazon.com&gt;
Reviewed-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
</entry>
</feed>
