<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v6.9.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.9.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-06-16T11:50:57Z</updated>
<entry>
<title>md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING</title>
<updated>2024-06-16T11:50:57Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2024-03-22T08:10:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e332a12f65d8fed8cf63bedb4e9317bb872b9ac7'/>
<id>urn:sha1:e332a12f65d8fed8cf63bedb4e9317bb872b9ac7</id>
<content type='text'>
commit 151f66bb618d1fd0eeb84acb61b4a9fa5d8bb0fa upstream.

Xiao reported that lvm2 test lvconvert-raid-takeover.sh can hang with
small possibility, the root cause is exactly the same as commit
bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")

However, Dan reported another hang after that, and junxiao investigated
the problem and found out that this is caused by plugged bio can't issue
from raid5d().

Current implementation in raid5d() has a weird dependence:

1) md_check_recovery() from raid5d() must hold 'reconfig_mutex' to clear
   MD_SB_CHANGE_PENDING;
2) raid5d() handles IO in a deadloop, until all IO are issued;
3) IO from raid5d() must wait for MD_SB_CHANGE_PENDING to be cleared;

This behaviour is introduce before v2.6, and for consequence, if other
context hold 'reconfig_mutex', and md_check_recovery() can't update
super_block, then raid5d() will waste one cpu 100% by the deadloop, until
'reconfig_mutex' is released.

Refer to the implementation from raid1 and raid10, fix this problem by
skipping issue IO if MD_SB_CHANGE_PENDING is still set after
md_check_recovery(), daemon thread will be woken up when 'reconfig_mutex'
is released. Meanwhile, the hang problem will be fixed as well.

Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
Cc: stable@vger.kernel.org # v5.19+
Reported-and-tested-by: Dan Moulding &lt;dan@danm.net&gt;
Closes: https://lore.kernel.org/all/20240123005700.9302-1-dan@danm.net/
Investigated-by: Junxiao Bi &lt;junxiao.bi@oracle.com&gt;
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20240322081005.1112401-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fix variable length array abuse in btree_iter</title>
<updated>2024-06-16T11:50:55Z</updated>
<author>
<name>Matthew Mirvish</name>
<email>matthew@mm12.xyz</email>
</author>
<published>2024-05-09T01:11:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c31344e22dd8d6b1394c6e4c41d639015bdc671'/>
<id>urn:sha1:0c31344e22dd8d6b1394c6e4c41d639015bdc671</id>
<content type='text'>
commit 3a861560ccb35f2a4f0a4b8207fa7c2a35fc7f31 upstream.

btree_iter is used in two ways: either allocated on the stack with a
fixed size MAX_BSETS, or from a mempool with a dynamic size based on the
specific cache set. Previously, the struct had a fixed-length array of
size MAX_BSETS which was indexed out-of-bounds for the dynamically-sized
iterators, which causes UBSAN to complain.

This patch uses the same approach as in bcachefs's sort_iter and splits
the iterator into a btree_iter with a flexible array member and a
btree_iter_stack which embeds a btree_iter as well as a fixed-length
data array.

Cc: stable@vger.kernel.org
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039368
Signed-off-by: Matthew Mirvish &lt;matthew@mm12.xyz&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20240509011117.2697-3-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm-delay: fix max_delay calculations</title>
<updated>2024-05-30T07:44:35Z</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2024-05-06T21:55:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ff499d5a3fb15db07788c4d4d3e4199f21c78760'/>
<id>urn:sha1:ff499d5a3fb15db07788c4d4d3e4199f21c78760</id>
<content type='text'>
[ Upstream commit 64eb88d6caee2c8eb806a68dab3f184f14f818a4 ]

delay_ctr() pointlessly compared max_delay in cases where multiple delay
classes were initialized identically. Also, when write delays were
configured different than read delays, delay_ctr() never compared their
value against max_delay. Fix these issues.

Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm-delay: fix hung task introduced by kthread mode</title>
<updated>2024-05-30T07:44:35Z</updated>
<author>
<name>Joel Colledge</name>
<email>joel.colledge@linbit.com</email>
</author>
<published>2024-05-06T07:25:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa9d9cbc69d60dc7505331354f99458537df278e'/>
<id>urn:sha1:aa9d9cbc69d60dc7505331354f99458537df278e</id>
<content type='text'>
[ Upstream commit d14646f23300a5fc85be867bafdc0702c2002789 ]

If the worker thread is not woken due to a bio, then it is not woken at
all. This causes the hung task check to trigger. This occurs, for
instance, when no bios are submitted. Also when a delay of 0 is
configured, delay_bio() returns without waking the worker.

Prevent the hung task check from triggering by creating the thread with
kthread_run() instead of using kthread_create() directly.

Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Joel Colledge &lt;joel.colledge@linbit.com&gt;
Reviewed-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm-delay: fix workqueue delay_timer race</title>
<updated>2024-05-30T07:44:35Z</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2024-05-07T21:16:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d23afe03255dc5e3648c7da04da86237c961d740'/>
<id>urn:sha1:d23afe03255dc5e3648c7da04da86237c961d740</id>
<content type='text'>
[ Upstream commit 8d24790ed08ab4e619ce58ed4a1b353ab77ffdc5 ]

delay_timer could be pending when delay_dtr() is called. It needs to be
shut down before kdelayd_wq is destroyed, so it won't try queueing more
work to kdelayd_wq while that's getting destroyed.

Also the del_timer_sync() call in delay_presuspend() doesn't protect
against the timer getting immediately rearmed by the queued call to
flush_delayed_bios(), but there's no real harm if that does happen.
timer_delete() is less work, and is basically just as likely to stop a
pointless call to flush_delayed_bios().

Fixes: 26b9f228703f ("dm: delay target")
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: fix resync softlockup when bitmap size is less than array size</title>
<updated>2024-05-30T07:44:10Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2024-04-22T06:58:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8bbc71315e0ae4bb7e37f8d43b915e1cb01a481b'/>
<id>urn:sha1:8bbc71315e0ae4bb7e37f8d43b915e1cb01a481b</id>
<content type='text'>
[ Upstream commit f0e729af2eb6bee9eb58c4df1087f14ebaefe26b ]

Is is reported that for dm-raid10, lvextend + lvchange --syncaction will
trigger following softlockup:

kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976]
CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1
RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
Call Trace:
 &lt;TASK&gt;
 md_bitmap_start_sync+0x6b/0xf0
 raid10_sync_request+0x25c/0x1b40 [raid10]
 md_do_sync+0x64b/0x1020
 md_thread+0xa7/0x170
 kthread+0xcf/0x100
 ret_from_fork+0x30/0x50
 ret_from_fork_asm+0x1a/0x30

And the detailed process is as follows:

md_do_sync
 j = mddev-&gt;resync_min
 while (j &lt; max_sectors)
  sectors = raid10_sync_request(mddev, j, &amp;skipped)
   if (!md_bitmap_start_sync(..., &amp;sync_blocks))
    // md_bitmap_start_sync set sync_blocks to 0
    return sync_blocks + sectors_skippe;
  // sectors = 0;
  j += sectors;
  // j never change

Root cause is that commit 301867b1c168 ("md/raid10: check
slab-out-of-bounds in md_bitmap_get_counter") return early from
md_bitmap_get_counter(), without setting returned blocks.

Fix this problem by always set returned blocks from
md_bitmap_get_counter"(), as it used to be.

Noted that this patch just fix the softlockup problem in kernel, the
case that bitmap size doesn't match array size still need to be fixed.

Fixes: 301867b1c168 ("md/raid10: check slab-out-of-bounds in md_bitmap_get_counter")
Reported-and-tested-by: Nigel Croxon &lt;ncroxon@redhat.com&gt;
Closes: https://lore.kernel.org/all/71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com/
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20240422065824.2516-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-6.9/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm</title>
<updated>2024-04-26T18:17:24Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-04-26T18:17:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08f0677dfc1a0e4648eca650be5b32f1a40e93ad'/>
<id>urn:sha1:08f0677dfc1a0e4648eca650be5b32f1a40e93ad</id>
<content type='text'>
Pull device mapper fixes from Mike Snitzer:

 - Fix 6.9 regression so that DM device removal is performed
   synchronously by default.

   Asynchronous removal has always been possible but it isn't the
   default. It is important that synchronous removal be preserved,
   otherwise it is an interface change that breaks lvm2.

 - Remove errant semicolon in drivers/md/dm-vdo/murmurhash3.c

* tag 'for-6.9/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: restore synchronous close of device mapper block device
  dm vdo murmurhash: remove unneeded semicolon
</content>
</entry>
<entry>
<title>dm: restore synchronous close of device mapper block device</title>
<updated>2024-04-16T15:32:07Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-04-16T00:56:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=48ef0ba12e6b77a1ce5d09c580c38855b090ae7c'/>
<id>urn:sha1:48ef0ba12e6b77a1ce5d09c580c38855b090ae7c</id>
<content type='text'>
'dmsetup remove' and 'dmsetup remove_all' require synchronous bdev
release. Otherwise dm_lock_for_deletion() may return -EBUSY if the open
count is &gt; 0, because the open count is dropped in dm_blk_close()
which occurs after fput() completes.

So if dm_blk_close() is delayed because of asynchronous fput(), this
device mapper device is skipped during remove, which is a regression.

Fix the issue by using __fput_sync().

Also, DM device removal has long supported being made asynchronous by
setting the DMF_DEFERRED_REMOVE flag on the DM device. So leverage
using async fput() in close_table_device() if DMF_DEFERRED_REMOVE flag
is set.

Reported-by: Zhong Changhui &lt;czhong@redhat.com&gt;
Fixes: a28d893eb327 ("md: port block device access to file")
Suggested-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
[snitzer: editted commit header, use fput() if DMF_DEFERRED_REMOVE set]
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm vdo murmurhash: remove unneeded semicolon</title>
<updated>2024-04-10T15:46:40Z</updated>
<author>
<name>Matthew Sakai</name>
<email>msakai@redhat.com</email>
</author>
<published>2024-04-05T19:26:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f141dde5dc51ecab18e8b12b76eb416cda0d6798'/>
<id>urn:sha1:f141dde5dc51ecab18e8b12b76eb416cda0d6798</id>
<content type='text'>
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202404050327.4ebVLBD3-lkp@intel.com/
Signed-off-by: Matthew Sakai &lt;msakai@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'md-6.9-20240408' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.9</title>
<updated>2024-04-09T03:49:27Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-04-09T03:49:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=013ee5a6234d4c574dedd60c4887a4bcc9ecc749'/>
<id>urn:sha1:013ee5a6234d4c574dedd60c4887a4bcc9ecc749</id>
<content type='text'>
Pull MD fix from Song:

"This change, by Yu Kuai, fixes a UAF in a corner case."

* tag 'md-6.9-20240408' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  raid1: fix use-after-free for original bio in raid1_write_request()
</content>
</entry>
</feed>
