<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v3.2.78</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2016-02-13T10:34:07Z</updated>
<entry>
<title>dm snapshot: fix hung bios when copy error occurs</title>
<updated>2016-02-13T10:34:07Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2016-01-09T00:07:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=07b8f29203c5fc3ea82e1c8819d10ff10b66e056'/>
<id>urn:sha1:07b8f29203c5fc3ea82e1c8819d10ff10b66e056</id>
<content type='text'>
commit 385277bfb57faac44e92497104ba542cdd82d5fe upstream.

When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe-&gt;error if there was error copying the
chunk, and then calls complete_exception.  complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps-&gt;pending_count and
persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not.  This results in the variable
ps-&gt;pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful.  A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store.  Also, remove commit_callback now that
it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>dm btree: fix bufio buffer leaks in dm_btree_del() error path</title>
<updated>2015-12-30T02:26:01Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2015-12-10T14:37:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=715b56f753f736b6b701e402f22e8d5a41471f1b'/>
<id>urn:sha1:715b56f753f736b6b701e402f22e8d5a41471f1b</id>
<content type='text'>
commit ed8b45a3679eb49069b094c0711b30833f27c734 upstream.

If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>dm btree: fix leak of bufio-backed block in btree_split_sibling error path</title>
<updated>2015-12-30T02:25:59Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2015-11-23T21:24:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b777326182726e55759f48ae39af5b33eec3b292'/>
<id>urn:sha1:b777326182726e55759f48ae39af5b33eec3b292</id>
<content type='text'>
commit 30ce6e1cc5a0f781d60227e9096c86e188d2c2bd upstream.

The block allocated at the start of btree_split_sibling() is never
released if later insert_at() fails.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:23:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=965d8d1de4c6c0dddc2f031bd09d8415f0521f3f'/>
<id>urn:sha1:965d8d1de4c6c0dddc2f031bd09d8415f0521f3f</id>
<content type='text'>
commit c340702ca26a628832fade4f133d8160a55c29cc upstream.

When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fixes: bd870a16c594 ("md/raid10:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: ensure device failure recorded before write request returns.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-08-14T01:26:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1fba1c813b9135565aa01bd8470b94f315078ac'/>
<id>urn:sha1:c1fba1c813b9135565aa01bd8470b94f315078ac</id>
<content type='text'>
commit 95af587e95aacb9cfda4a9641069a5244a540dc8 upstream.

When a write to one of the legs of a RAID10 fails, the failure is
recorded in the metadata of the other legs so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again (maybe a cable was unplugged).

Currently there is no interlock between the write request completing
and the metadata update.  So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.

This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.

So:
 - set MD_CHANGE_PENDING when requesting a metadata update for a
   failed device, so we can know with certainty when it completes
 - queue requests that experienced an error on a new queue which
   is only processed after the metadata update completes
 - call raid_end_bio_io() on bios in that queue when the time comes.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid1: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:02:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1f6c748a9d212708e2e0b5a43570fc22fc856689'/>
<id>urn:sha1:1f6c748a9d212708e2e0b5a43570fc22fc856689</id>
<content type='text'>
commit bd8688a199b864944bf62eebed0ca13b46249453 upstream.

When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-and-tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Cc: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Fixes: cd5ff9a16f08 ("md/raid1:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid1: ensure device failure recorded before write request returns.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-08-14T01:11:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6a1281c38ae60b71e308d818c2668a6b8bd4897b'/>
<id>urn:sha1:6a1281c38ae60b71e308d818c2668a6b8bd4897b</id>
<content type='text'>
commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac upstream.

When a write to one of the legs of a RAID1 fails, the failure is
recorded in the metadata of the other leg(s) so that after a restart
the data on the failed drive wont be trusted even if that drive seems
to be working again  (maybe a cable was unplugged).

Similarly when we record a bad-block in response to a write failure,
we must not let the write complete until the bad-block update is safe.

Currently there is no interlock between the write request completing
and the metadata update.  So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.

This is an extremely small hole for a racy to fit in, but it is
theoretically possible and so should be closed.

So:
 - set MD_CHANGE_PENDING when requesting a metadata update for a
   failed device, so we can know with certainty when it completes
 - queue requests that experienced an error on a new queue which
   is only processed after the metadata update completes
 - call raid_end_bio_io() on bios in that queue when the time comes.

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>dm btree: fix leak of bufio-backed block in btree_split_beneath error path</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2015-10-22T14:56:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=12d1c67b7b482bea0503fb9e42e9d257498b5c32'/>
<id>urn:sha1:12d1c67b7b482bea0503fb9e42e9d257498b5c32</id>
<content type='text'>
commit 4dcb8b57df3593dcb20481d9d6cf79d1dc1534be upstream.

btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.

Fix this by releasing the previously allocated bufio block using
unlock_block().

Reported-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Acked-by: Joe Thornber &lt;thornber@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>dm btree remove: fix a bug when rebalancing nodes after removal</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>Joe Thornber</name>
<email>ejt@redhat.com</email>
</author>
<published>2015-10-21T17:36:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=11020754df47e783aa3908c2fbf33318b05b744e'/>
<id>urn:sha1:11020754df47e783aa3908c2fbf33318b05b744e</id>
<content type='text'>
commit 2871c69e025e8bc507651d5a9cf81a8a7da9d24b upstream.

Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().

The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them.  If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.

Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.

Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c

Signed-off-by: Joe Thornber &lt;ejt@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid0: apply base queue limits *before* disk_stack_limits</title>
<updated>2015-11-17T15:54:41Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-09-24T05:47:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e1263d46df6f5f96bc6abd039684b7778a72bab3'/>
<id>urn:sha1:e1263d46df6f5f96bc6abd039684b7778a72bab3</id>
<content type='text'>
commit 66eefe5de11db1e0d8f2edc3880d50e7c36a9d43 upstream.

Calling e.g. blk_queue_max_hw_sectors() after calls to
disk_stack_limits() discards the settings determined by
disk_stack_limits().
So we need to make those calls first.

Fixes: 199dc6ed5179 ("md/raid0: update queue parameter in a safer location.")
Reported-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
[bwh: Backported to 3.2: the code being moved looks a little different]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
