<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v2.6.33</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.33</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.33'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2010-02-16T18:43:04Z</updated>
<entry>
<title>dm: sysfs revert add empty release function to avoid debug warning</title>
<updated>2010-02-16T18:43:04Z</updated>
<author>
<name>Alasdair G Kergon</name>
<email>agk@redhat.com</email>
</author>
<published>2010-02-16T18:43:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9307f6b19ac4f5887552b5b2992f391b866f7633'/>
<id>urn:sha1:9307f6b19ac4f5887552b5b2992f391b866f7633</id>
<content type='text'>
Revert commit d2bb7df8cac647b92f51fb84ae735771e7adbfa7 at Greg's request.

    Author: Milan Broz &lt;mbroz@redhat.com&gt;
    Date:   Thu Dec 10 23:51:53 2009 +0000

    dm: sysfs add empty release function to avoid debug warning

    This patch just removes an unnecessary warning:
     kobject: 'dm': does not have a release() function,
     it is broken and must be fixed.

    The kobject is embedded in mapped device struct, so
    code does not need to release memory explicitly here.

Cc: Greg KH &lt;gregkh@suse.de&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm mpath: fix stall when requeueing io</title>
<updated>2010-02-16T18:43:01Z</updated>
<author>
<name>Kiyoshi Ueda</name>
<email>k-ueda@ct.jp.nec.com</email>
</author>
<published>2010-02-16T18:43:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9eef87da2a8ea4920e0d913ff977cac064b68ee0'/>
<id>urn:sha1:9eef87da2a8ea4920e0d913ff977cac064b68ee0</id>
<content type='text'>
This patch fixes a problem where the system may stall if the target's -&gt;map_rq
returns DM_MAPIO_REQUEUE in map_request().
E.g. a stall happens on a 1-CPU box when a dm-mpath device with queue_if_no_path
     bounces between all-paths-down and paths-up under I/O load.

When target's -&gt;map_rq returns DM_MAPIO_REQUEUE, map_request() requeues
the request and returns to dm_request_fn().  Then, dm_request_fn()
doesn't exit the I/O dispatching loop and continues processing
the requeued request again.
This map-and-requeue loop can run with interrupts disabled,
so a 1-CPU system can stall if this situation happens.

For example, the commands below can stall my 1-CPU box within 1 minute or so:
  # dmsetup table mp
  mp: 0 2097152 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 2 8:144 1 1
  # while true; do dd if=/dev/mapper/mp of=/dev/null bs=1M count=100; done &amp;
  # while true; do
  &gt; dmsetup message mp 0 "fail_path 8:144"
  &gt; dmsetup suspend --noflush mp
  &gt; dmsetup resume mp
  &gt; dmsetup message mp 0 "reinstate_path 8:144"
  &gt; done

To fix the problem above, this patch changes dm_request_fn() to exit
the I/O dispatching loop when a request is requeued in map_request().
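
As a rough illustration of the change described above, the toy Python model
below (not kernel code) shows a dispatch loop that stops as soon as a request
is requeued.  The names dispatch, REMAPPED and REQUEUE are hypothetical
stand-ins chosen for this sketch; only the exit-on-requeue idea comes from
the description above.

```python
from collections import deque

# Hypothetical result codes for this toy model, not the kernel's constants.
REMAPPED = "remapped"
REQUEUE = "requeue"


def dispatch(queue, map_rq):
    """Toy dispatch loop: returns how many map attempts were made."""
    attempts = 0
    while queue:
        request = queue.popleft()
        attempts += 1
        if map_rq(request) == REQUEUE:
            # Put the request back and leave the loop, so the caller
            # is not stuck mapping and requeueing the same request.
            queue.appendleft(request)
            break
    return attempts
```

With a mapper that always requeues, the loop makes a single attempt and
returns, leaving the request at the head of the queue instead of spinning
on it.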

Signed-off-by: Kiyoshi Ueda &lt;k-ueda@ct.jp.nec.com&gt;
Signed-off-by: Jun'ichi Nomura &lt;j-nomura@ce.jp.nec.com&gt;
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm raid1: fix null pointer dereference in suspend</title>
<updated>2010-02-16T18:42:58Z</updated>
<author>
<name>Takahiro Yasui</name>
<email>tyasui@redhat.com</email>
</author>
<published>2010-02-16T18:42:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=558569aa9d83e016295bac77d900342908d7fd85'/>
<id>urn:sha1:558569aa9d83e016295bac77d900342908d7fd85</id>
<content type='text'>
When suspending a failed mirror, bios are completed by mirror_end_io() and
__rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
required by design.  Fix this by not changing the state of the recovery failed
region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().

Issue

On a 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
mirror with the dmsetup command.

BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [&lt;f94f38e2&gt;] dm_rh_dec+0x35/0xa1 [dm_region_hash]
...
EIP: 0060:[&lt;f94f38e2&gt;] EFLAGS: 00010046 CPU: 0
EIP is at dm_rh_dec+0x35/0xa1 [dm_region_hash]
EAX: 00000286 EBX: 00000000 ECX: 00000286 EDX: 00000000
ESI: eff79eac EDI: eff79e80 EBP: f6915cd4 ESP: f6915cc4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process dmsetup (pid: 2849, ti=f6914000 task=eff03e80 task.ti=f6914000)
 ...
Call Trace:
 [&lt;f9530af6&gt;] ? mirror_end_io+0x53/0x1b1 [dm_mirror]
 [&lt;f9413104&gt;] ? clone_endio+0x4d/0xa2 [dm_mod]
 [&lt;f9530aa3&gt;] ? mirror_end_io+0x0/0x1b1 [dm_mirror]
 [&lt;f94130b7&gt;] ? clone_endio+0x0/0xa2 [dm_mod]
 [&lt;c02d6bcb&gt;] ? bio_endio+0x28/0x2b
 [&lt;f952f303&gt;] ? hold_bio+0x2d/0x62 [dm_mirror]
 [&lt;f952f942&gt;] ? mirror_presuspend+0xeb/0xf7 [dm_mirror]
 [&lt;c02aa3e2&gt;] ? vmap_page_range+0xb/0xd
 [&lt;f9414c8d&gt;] ? suspend_targets+0x2d/0x3b [dm_mod]
 [&lt;f9414ca9&gt;] ? dm_table_presuspend_targets+0xe/0x10 [dm_mod]
 [&lt;f941456f&gt;] ? dm_suspend+0x4d/0x150 [dm_mod]
 [&lt;f941767d&gt;] ? dev_suspend+0x55/0x18a [dm_mod]
 [&lt;c0343762&gt;] ? _copy_from_user+0x42/0x56
 [&lt;f9417fb0&gt;] ? dm_ctl_ioctl+0x22c/0x281 [dm_mod]
 [&lt;f9417628&gt;] ? dev_suspend+0x0/0x18a [dm_mod]
 [&lt;f9417d84&gt;] ? dm_ctl_ioctl+0x0/0x281 [dm_mod]
 [&lt;c02c3c4b&gt;] ? vfs_ioctl+0x22/0x85
 [&lt;c02c422c&gt;] ? do_vfs_ioctl+0x4cb/0x516
 [&lt;c02c42b7&gt;] ? sys_ioctl+0x40/0x5a
 [&lt;c0202858&gt;] ? sysenter_do_call+0x12/0x28

Analysis

When the recovery process of a region fails, the dm_rh_recovery_end() function
changes the state of the region from DM_RH_RECOVERING to DM_RH_NOSYNC.
When recovery_complete() is executed between dm_rh_update_states() and
do_writes() in do_mirror(), bios are processed with the region state
DM_RH_NOSYNC. However, the region data is freed without checking its
pending count the next time dm_rh_update_states() is called.

When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
returns NULL even though a valid return value is expected.

Solution

Remove the state change of the recovery failed region from DM_RH_RECOVERING
to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
because:

  - If the region data has been released by dm_rh_update_states(),
    a new region data is created with the state of DM_RH_NOSYNC, and
    bios are processed according to the DM_RH_NOSYNC state.

  - If the region data has not been released by dm_rh_update_states(),
    the state of the region is DM_RH_RECOVERING and bios are put in the
    delayed_bio list.

The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
was added in the following commit:
  dm raid1: handle resync failures
  author  Jonathan Brassow &lt;jbrassow@redhat.com&gt;
    Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
  http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724

Signed-off-by: Takahiro Yasui &lt;tyasui@redhat.com&gt;
Reviewed-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm raid1: fail writes if errors are not handled and log fails</title>
<updated>2010-02-16T18:42:55Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2010-02-16T18:42:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5528d17de1cf1462f285c40ccaf8e0d0e4c64dc0'/>
<id>urn:sha1:5528d17de1cf1462f285c40ccaf8e0d0e4c64dc0</id>
<content type='text'>
If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device.  This patch
completes them with EIO instead.

This code path is taken:
do_writes:
	bio_list_merge(&amp;ms-&gt;failures, &amp;sync);
do_failures:
	if (!get_valid_mirror(ms)) (false)
	else if (errors_handled(ms)) (false)
	else bio_endio(bio, 0);

The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.

Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm log: userspace fix overhead_size calculations</title>
<updated>2010-02-16T18:42:53Z</updated>
<author>
<name>Jonathan Brassow</name>
<email>jbrassow@redhat.com</email>
</author>
<published>2010-02-16T18:42:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ebfd32bba9b518d684009d9d21a56742337ca1b3'/>
<id>urn:sha1:ebfd32bba9b518d684009d9d21a56742337ca1b3</id>
<content type='text'>
This patch fixes two bugs that revolve around the miscalculation and
misuse of the variable 'overhead_size'.  'overhead_size' is the size of
the various header structures used during communication.

The first bug is the use of 'sizeof' with the pointer of a structure
instead of the structure itself - resulting in the wrong size being
computed.  This is then used in a check to see if the payload
(data_size) would be too large for the preallocated structure.  Since the
bug produces a smaller value for the overhead, it was possible for the
structure to be breached.  (Although the current users of the code do
not currently send enough data to trigger this bug.)

The second bug is that the 'overhead_size' value is used to compute how
much of the preallocated space should be cleared before populating it
with fresh data.  This should have simply been 'sizeof(struct cn_msg)'
not overhead_size.  The fact that 'overhead_size' was computed
incorrectly made this problem "less bad" - leaving only a pointer's
worth of space at the end uncleared.  Thus, this bug was never producing
a bad result, but still needs to be fixed - especially now that the
value is computed correctly.
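
The first bug can be illustrated outside the kernel.  The Python sketch below
uses ctypes to compare the size of a structure with the size of a pointer to
it.  The Header layout, the 1024-byte buffer and the payload_fits helper are
hypothetical and chosen only for illustration; they are not the structures
used by this code.

```python
import ctypes


class Header(ctypes.Structure):
    # Hypothetical 64-byte header; a stand-in, not the kernel's real layout.
    _fields_ = [
        ("seq", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("reserved", ctypes.c_uint8 * 56),
    ]


# Size of the structure itself: what the overhead should be.
struct_size = ctypes.sizeof(Header)
# Size of a pointer to the structure: what the buggy pattern measures.
pointer_size = ctypes.sizeof(ctypes.POINTER(Header))


def payload_fits(data_size, overhead, buffer_size=1024):
    # True when the header overhead plus the payload fits in the buffer.
    return data_size in range(buffer_size - overhead + 1)
```

On a typical 64-bit system pointer_size is 8 while struct_size is 64, so a
check that uses the pointer size accepts payloads that would overrun the
buffer by up to the difference.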

Cc: stable@kernel.org
Signed-off-by: Jonathan Brassow &lt;jbrassow@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm snapshot: persistent annotate work_queue as on stack</title>
<updated>2010-02-16T18:42:51Z</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2010-02-16T18:42:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=55f67f2dedec1e3049abc30b6d82b999a14cafb7'/>
<id>urn:sha1:55f67f2dedec1e3049abc30b6d82b999a14cafb7</id>
<content type='text'>
chunk_io() declares its 'struct mdata_req' on the stack and then
initializes its 'struct work_struct' member.  Annotate the
initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>dm stripe: avoid divide by zero with invalid stripe count</title>
<updated>2010-02-16T18:42:47Z</updated>
<author>
<name>Nikanth Karthikesan</name>
<email>knikanth@suse.de</email>
</author>
<published>2010-02-16T18:42:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=781248c1b50c776a9ef4be1130f84ced1cba42fe'/>
<id>urn:sha1:781248c1b50c776a9ef4be1130f84ced1cba42fe</id>
<content type='text'>
If a table containing zero as stripe count is passed into stripe_ctr
the code attempts to divide by zero.

This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
is zero.
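
A minimal sketch of this kind of guard, written in Python rather than kernel
C.  The function name stripe_width and the choice of dividing the device
length by the stripe count are assumptions made for illustration; the text
above only states that a zero stripe count led to a division by zero and
that -EINVAL is now returned.

```python
import errno


def stripe_width(device_sectors, stripes):
    # Reject a zero stripe count up front instead of dividing by it.
    if stripes == 0:
        return -errno.EINVAL
    # Hypothetical calculation: split the device evenly across stripes.
    return device_sectors // stripes
```

Calling stripe_width(2097152, 0) returns the negative error code instead of
raising ZeroDivisionError.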

We now get the following error messages:
  device-mapper: table: 253:0: striped: Invalid stripe count
  device-mapper: ioctl: error adding target to table

Signed-off-by: Nikanth Karthikesan &lt;knikanth@suse.de&gt;
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
</content>
</entry>
<entry>
<title>md: fix some lockdep issues between md and sysfs.</title>
<updated>2010-02-10T00:26:09Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-02-09T05:34:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ef286f6fa673cd7fb367e1b145069d8dbfcc6081'/>
<id>urn:sha1:ef286f6fa673cd7fb367e1b145069d8dbfcc6081</id>
<content type='text'>
======
This fix is related to
    http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======

sysfs does not like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.

As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.

This patch addresses 3 issues in md that could lead to this deadlock.

Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.

In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock.  This call is
only present to ensure that -&gt;gendisk is set.  However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written).  This is because do_md_run is called in three
different contexts:
  1/ from md_ioctl.  This requires that md_open has succeeded, and it
     fails if -&gt;gendisk is not set.
  2/ from writing a sysfs attribute.  This can only happen if the
     mddev has been registered in sysfs which happens in md_alloc
     after -&gt;gendisk has been set.
  3/ from autorun_array which is only called by autorun_devices, which
     checks for -&gt;gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
-&gt;gendisk can also go.


In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete.  However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.

The third issue is that raid5-&gt;stop() removes some attribute names
while the lock is held.  There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work).  So extend that infrastructure to remove the
raid5_attrs_group.

This does not address all lockdep issues related to the sysfs
"s_active" lock.  The rest can be addressed by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>md: fix 'degraded' calculation when starting a reshape.</title>
<updated>2010-02-09T05:34:29Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-02-09T01:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9eb07c259207d048e3ee8be2a77b2a4680b1edd4'/>
<id>urn:sha1:9eb07c259207d048e3ee8be2a77b2a4680b1edd4</id>
<content type='text'>
This code was written long ago when it was not possible to
reshape a degraded array.  Now it is, so the current level of
degradedness needs to be taken into account.  Also, newly added
devices should only reduce degradedness if they are deemed to be
in-sync.

In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5-&gt;6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.

If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".

This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.

Cc: stable@kernel.org
Reported-by: Michael Evans &lt;mjevans1983@gmail.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
</entry>
<entry>
<title>DM: Fix device mapper topology stacking</title>
<updated>2010-01-11T13:29:20Z</updated>
<author>
<name>Martin K. Petersen</name>
<email>martin.petersen@oracle.com</email>
</author>
<published>2010-01-11T08:21:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b27d7f16d3c6c27345d4280a739809c1c2c4c0b5'/>
<id>urn:sha1:b27d7f16d3c6c27345d4280a739809c1c2c4c0b5</id>
<content type='text'>
Make DM use bdev_stack_limits() function so that partition offsets get
taken into account when calculating alignment.  Clarify stacking
warnings.

Also remove obsolete clearing of final alignment_offset and misalignment
flag.

Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Alasdair G. Kergon &lt;agk@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
</feed>
