<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md/raid10.c, branch v3.2.78</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-11-17T15:54:45Z</updated>
<entry>
<title>md/raid10: don't clear bitmap bit when bad-block-list write fails.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-10-24T05:23:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=965d8d1de4c6c0dddc2f031bd09d8415f0521f3f'/>
<id>urn:sha1:965d8d1de4c6c0dddc2f031bd09d8415f0521f3f</id>
<content type='text'>
commit c340702ca26a628832fade4f133d8160a55c29cc upstream.

When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data.  If
this succeeds then it is OK to clear the relevant bitmap-bit as
no further 'sync' of the block is needed.

However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit.  Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced.  This leads to data corruption.

We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.

So: delay that until the write really is safe.
i.e. move the call to close_write() to just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
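
A minimal, self-contained sketch of that ordering (the names
sketch_r10bio, bbl_write_ok and the helpers are invented for
illustration; this is not the raid10.c code itself):

    #include &lt;stdbool.h&gt;

    struct sketch_r10bio {
            bool write_error;    /* stands in for R10BIO_WriteError */
            bool bbl_write_ok;   /* did the bad-block-list update succeed? */
    };

    /* stand-ins for close_write(), which clears the bitmap bit,
     * and for bio_endio() */
    static void close_write_sketch(struct sketch_r10bio *r10) { (void)r10; }
    static void bio_endio_sketch(struct sketch_r10bio *r10)   { (void)r10; }

    static void finish_write(struct sketch_r10bio *r10)
    {
            /* the bitmap bit is cleared only here, immediately before
             * the write is returned, once the fate of the
             * bad-block-list update is already known */
            if (!r10-&gt;write_error || r10-&gt;bbl_write_ok)
                    close_write_sketch(r10);
            bio_endio_sketch(r10);
    }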

This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.

Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well.  I'll send that to 'stable' separately.

Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed.  However doing it this way makes the
patch more obviously correct.  I will tidy the code up in a
future merge window.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fixes: bd870a16c594 ("md/raid10:  Handle write errors by updating badblock log.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: ensure device failure recorded before write request returns.</title>
<updated>2015-11-17T15:54:45Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2015-08-14T01:26:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1fba1c813b9135565aa01bd8470b94f315078ac'/>
<id>urn:sha1:c1fba1c813b9135565aa01bd8470b94f315078ac</id>
<content type='text'>
commit 95af587e95aacb9cfda4a9641069a5244a540dc8 upstream.

When a write to one of the legs of a RAID10 fails, the failure is
recorded in the metadata of the other legs so that after a restart
the data on the failed drive won't be trusted even if that drive seems
to be working again (maybe a cable was unplugged).

Currently there is no interlock between the write request completing
and the metadata update.  So it is possible that the write will
complete, the app will confirm success in some way, and then the
machine will crash before the metadata update completes.

This is an extremely small hole for a race to fit in, but it is
theoretically possible and so should be closed.

So:
 - set MD_CHANGE_PENDING when requesting a metadata update for a
   failed device, so we can know with certainty when it completes
 - queue requests that experienced an error on a new queue which
   is only processed after the metadata update completes
 - call raid_end_bio_io() on bios in that queue when the time comes.
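
As a rough, self-contained model of that interlock (invented names,
not the md code; the real work is spread over the error path, the
write-completion path and the metadata writer):

    #include &lt;stdbool.h&gt;
    #include &lt;stddef.h&gt;

    struct sketch_bio {
            struct sketch_bio *next;
    };

    static struct sketch_bio *bio_end_io_list;   /* failed writes parked here */
    static bool metadata_update_pending;         /* ~ MD_CHANGE_PENDING */

    static void note_write_error(struct sketch_bio *bio)
    {
            /* request a metadata update and remember it is outstanding */
            metadata_update_pending = true;
            /* park the failed write instead of completing it now */
            bio-&gt;next = bio_end_io_list;
            bio_end_io_list = bio;
    }

    static void metadata_written(void)
    {
            /* only now is it safe to complete the parked writes
             * (raid_end_bio_io() in the real code) */
            metadata_update_pending = false;
            while (bio_end_io_list) {
                    struct sketch_bio *bio = bio_end_io_list;
                    bio_end_io_list = bio-&gt;next;
                    (void)bio;
            }
    }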

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid1,raid10: always abort recover on write error.</title>
<updated>2014-09-13T22:41:40Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-07-31T00:16:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9a28f9e6ec28037666ce0c0fb55165a05c8d01f0'/>
<id>urn:sha1:9a28f9e6ec28037666ce0c0fb55165a05c8d01f0</id>
<content type='text'>
commit 2446dba03f9dabe0b477a126cbeb377854785b47 upstream.

Currently we don't abort recovery on a write error if the write error
to the recovering device was triggered by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't be.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggered the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processed will be cancelled
(bit not cleared) and the whole region will be retried.
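
Schematically (invented names, not the actual patch), the decision
becomes:

    #include &lt;stdbool.h&gt;

    enum io_origin { NORMAL_IO, RECOVERY_IO };

    /* returns true when the in-flight region must be retried */
    static bool must_abort_recovery(enum io_origin origin, bool write_failed,
                                    bool target_is_recovering)
    {
            if (!write_failed || !target_is_recovering)
                    return false;
            /* old behaviour: only origin == RECOVERY_IO aborted;
             * new behaviour: either origin aborts, so the bitmap bit
             * for the region in progress is never cleared prematurely */
            (void)origin;
            return true;
    }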

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui &lt;jiaohui@bwstor.com.cn&gt;
Reported-and-tested-by: jiao hui &lt;jiaohui@bwstor.com.cn&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: fix bug when raid10 recovery fails to recover a block.</title>
<updated>2014-02-15T19:20:17Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-01-05T23:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ea69324fe0ff7b7a7d0424b8518c63358971f68'/>
<id>urn:sha1:8ea69324fe0ff7b7a7d0424b8518c63358971f68</id>
<content type='text'>
commit e8b849158508565e0cd6bc80061124afc5879160 upstream.

commit e875ecea266a543e643b19e44cf472f1412708f9
    md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This will crash with a null dereference.

So move the freeing of r10bio down where it is safe.
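
The shape of the bug, as a generic stand-alone sketch (invented
names, not the recovery code itself):

    #include &lt;stdlib.h&gt;

    struct sketch_r10bio {
            long long sector;    /* still needed by the bad-block record */
    };

    static void record_bad_block(struct sketch_r10bio *r10) { (void)r10; }

    static void cannot_recover_block(struct sketch_r10bio *r10)
    {
            /* buggy order:  free(r10);  record_bad_block(r10);  */
            record_bad_block(r10);   /* last use of r10 ...       */
            free(r10);               /* ... so the free moves down */
    }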

Fixes: e875ecea266a543e643b19e44cf472f1412708f9
Reported-by: Damian Nowak &lt;spam@nowaker.net&gt;
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: fix two bugs in handling of known-bad-blocks.</title>
<updated>2014-02-15T19:20:16Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-01-13T23:38:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=11bbcdfc5e8d7983403952f37543d085be5d6f58'/>
<id>urn:sha1:11bbcdfc5e8d7983403952f37543d085be5d6f58</id>
<content type='text'>
commit b50c259e25d9260b9108dc0c2964c26e5ecbe1c1 upstream.

If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
   seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash if a RAID10
ever had known bad blocks.
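
For bug 1 the fix is simply pairing the variants; schematically (this
is the idiom, not the literal patch hunk, and conf-&gt;device_lock
stands for whichever lock the split-read path actually holds):

    spin_lock_irq(&amp;conf-&gt;device_lock);
    /* ... requeue the remainder of the split request ... */
    spin_unlock_irq(&amp;conf-&gt;device_lock);   /* the buggy code used plain
                                            * spin_unlock(), leaving
                                            * interrupts disabled */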

Fixes: 856e08e23762dfb92ffc68fd0a8d228f9e152160
Reported-by: Damian Nowak &lt;spam@nowaker.net&gt;
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.</title>
<updated>2013-06-19T01:17:02Z</updated>
<author>
<name>Alex Lyakas</name>
<email>alex@zadarastorage.com</email>
</author>
<published>2013-06-04T17:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=58e06b1d5e77f6cbed3cddc39e7d782d462e1883'/>
<id>urn:sha1:58e06b1d5e77f6cbed3cddc39e7d782d462e1883</id>
<content type='text'>
commit 3056e3aec8d8ba61a0710fb78b2d562600aa2ea7 upstream.

Without that fix, the following scenario could happen:

- RAID1 with drives A and B; drive B was freshly-added and is rebuilding
- Drive A fails
- WRITE request arrives to the array. It is failed by drive A, so
r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
succeeds in writing it, so the same r1_bio is marked as
R1BIO_Uptodate.
- r1_bio arrives to handle_write_finished, badblocks are disabled,
md_error()-&gt;error() does nothing because we don't fail the last drive
of raid1
- raid_end_bio_io()  calls call_bio_endio()
- As a result, in call_bio_endio():
        if (!test_bit(R1BIO_Uptodate, &amp;r1_bio-&gt;state))
                clear_bit(BIO_UPTODATE, &amp;bio-&gt;bi_flags);
this code doesn't clear the BIO_UPTODATE flag, and the whole master
WRITE succeeds, back to the upper layer.

So we returned success to the upper layer, even though we had written
the data onto the rebuilding drive only. But when we want to read the
data back, we would not read from the rebuilding drive, so this data
is lost.
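
The shape of the fix, as a stand-alone sketch (invented names; the
real change is in the raid1/raid10 write-completion paths):

    #include &lt;stdbool.h&gt;

    struct sketch_leg {
            bool faulty;    /* ~ test_bit(Faulty, &amp;rdev-&gt;flags) */
            bool in_sync;   /* ~ test_bit(In_sync, &amp;rdev-&gt;flags) */
    };

    static void leg_write_done(const struct sketch_leg *leg, bool write_ok,
                               bool *master_uptodate)
    {
            /* only a working, fully-synced leg may mark the master
             * request up to date; a still-rebuilding leg no longer
             * counts as overall success on its own */
            if (write_ok &amp;&amp; leg-&gt;in_sync &amp;&amp; !leg-&gt;faulty)
                    *master_uptodate = true;   /* ~ set_bit(R1BIO_Uptodate, ...) */
    }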

[neilb - applied identical change to raid10 as well]

This bug can result in lost data, so it is suitable for any
-stable kernel.

Signed-off-by: Alex Lyakas &lt;alex@zadarastorage.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
[bwh: Backported to 3.2: for raid10, s/rdev/conf-&gt;mirrors[dev].rdev/]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: use correct limit variable</title>
<updated>2012-10-30T23:26:43Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2012-10-11T03:20:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=69cc003f594c8697e623c78f800c4c7bcaf7c900'/>
<id>urn:sha1:69cc003f594c8697e623c78f800c4c7bcaf7c900</id>
<content type='text'>
commit 91502f099dfc5a1e8812898e26ee280713e1d002 upstream.

Clang complains that we are assigning a variable to itself.  This should
be using bad_sectors like the similar earlier check does.

Bug has been present since 3.1-rc1.  It is minor but could
conceivably cause corruption or other bad behaviour.
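
Generic shape of the fix (invented names, not the actual hunk):

    /* sketch only: clamp the request to the run of known-bad sectors */
    static long long clamp_to_bad_range(long long max_sectors,
                                        long long bad_sectors)
    {
            if (bad_sectors &lt; max_sectors)
                    max_sectors = bad_sectors;   /* the buggy line assigned a
                                                  * variable to itself instead */
            return max_sectors;
    }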

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: fix "enough" function for detecting if array is failed.</title>
<updated>2012-10-10T02:31:06Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2012-09-27T02:35:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4e19de3be14c9390e63271effb5b95ab50f298f4'/>
<id>urn:sha1:4e19de3be14c9390e63271effb5b95ab50f298f4</id>
<content type='text'>
commit 80b4812407c6b1f66a4f2430e69747a13f010839 upstream.

The 'enough' function is written to work with 'near' arrays only
in that it implicitly assumes that the offset from one 'group' of
devices to the next is the same as the number of copies.
In reality it is the number of 'near' copies.

So change it to make this number explicit.
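
A rough, self-contained model of the corrected loop (simplified
types; the real function is enough() in raid10.c):

    #include &lt;stdbool.h&gt;

    struct sketch_conf {
            int raid_disks;     /* total devices */
            int copies;         /* devices in one group */
            int near_copies;    /* offset from one group to the next */
            bool working[32];   /* working[i]: device i present and usable */
    };

    static bool enough_sketch(const struct sketch_conf *c, int ignore)
    {
            int first = 0;

            do {
                    int this = first, cnt = 0;
                    for (int n = c-&gt;copies; n; n--) {
                            if (this != ignore &amp;&amp; c-&gt;working[this])
                                    cnt++;
                            this = (this + 1) % c-&gt;raid_disks;
                    }
                    if (cnt == 0)
                            return false;    /* a whole group has failed */
                    /* step to the next group by near_copies; the buggy
                     * code effectively stepped by 'copies' */
                    first = (first + c-&gt;near_copies) % c-&gt;raid_disks;
            } while (first != 0);
            return true;
    }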

This bug makes it possible to run arrays without enough drives
present, which is dangerous.
It is appropriate for an -stable kernel, but will almost certainly
need to be modified for some of them.

Reported-by: Jakub Husák &lt;jakub@gooseman.cz&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
[bwh: Backported to 3.2: s/geo-&gt;/conf-&gt;/]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: fix failure when trying to repair a read error.</title>
<updated>2012-07-12T03:32:06Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2012-07-03T05:55:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0cc5b5c40fccdbf7b872ae167db30a4aeb413d89'/>
<id>urn:sha1:0cc5b5c40fccdbf7b872ae167db30a4aeb413d89</id>
<content type='text'>
commit 055d3747dbf00ce85c6872ecca4d466638e80c22 upstream.

commit 58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf
     md/raid10: handle further errors during fix_read_error better.

in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
But we were passing the IO size in bytes!!!
This resulted in bio_add_page failing, an empty request being sent
down, and a consequent BUG_ON in scsi_lib.
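
The unit mix-up, in sketch form (invented helper names; a sector is
512 bytes, so bytes = sectors &lt;&lt; 9):

    /* sketch only: the helper wants its length in sectors */
    static int sync_page_io_sketch(long long start_sector, int nr_sectors)
    {
            (void)start_sector;
            return nr_sectors &gt; 0;   /* would issue nr_sectors * 512 bytes */
    }

    static void rewrite_bad_range(long long sect, int s)
    {
            /* the buggy call passed s &lt;&lt; 9 (bytes) here, making the
             * request 512 times too large, so bio_add_page() refused it
             * and an empty request reached the SCSI layer */
            sync_page_io_sketch(sect, s);
    }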

[fix missing space in error message at same time]

This fix is suitable for 3.1.y and later.

Reported-by: Christian Balzer &lt;chibi@gol.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>md/raid10: Don't try to recover unmatched (and unused) chunks.</title>
<updated>2012-07-12T03:32:05Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2012-07-03T00:37:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2e9e8f43671774f754629eda8489946bf09fe4fc'/>
<id>urn:sha1:2e9e8f43671774f754629eda8489946bf09fe4fc</id>
<content type='text'>
commit fc448a18ae6219af9a73257b1fbcd009efab4a81 upstream.

If a RAID10 has an odd number of chunks - as might happen when there
are an odd number of devices - the last chunk has no pair and so is
not mirrored.  We don't store data there, but when recovering the last
device in an array we try to recover that last chunk from a
non-existent location.  This results in an error, and the recovery
aborts.

When we get to that last chunk we should just stop - there is nothing
more to do anyway.
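
Schematically (stand-in names, not the patch itself): the virtual
sector being rebuilt is simply checked against the end of the
mirrored area and skipped when it falls beyond it.

    #include &lt;stdbool.h&gt;

    /* sketch only: is there anything to rebuild at this virtual sector
     * of the device being re-added? */
    static bool should_recover_sector(long long virt_sector,
                                      long long mirrored_sectors)
    {
            /* the trailing, unpaired chunk lies past the mirrored area
             * and holds no data, so there is nothing to rebuild there */
            return virt_sector &lt; mirrored_sectors;
    }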

This bug has been present since the introduction of RAID10, so the
patch is appropriate for any -stable kernel.

Reported-by: Christian Balzer &lt;chibi@gol.com&gt;
Tested-by: Christian Balzer &lt;chibi@gol.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
</feed>
