<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v6.1.72</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.72</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.72'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-01-01T12:39:07Z</updated>
<entry>
<title>dm-integrity: don't modify bio's immutable bio_vec in integrity_metadata()</title>
<updated>2024-01-01T12:39:07Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2023-12-05T15:39:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f7efa30fa87e60ae43574edf0ff1d63bd1dcee2'/>
<id>urn:sha1:7f7efa30fa87e60ae43574edf0ff1d63bd1dcee2</id>
<content type='text'>
commit b86f4b790c998afdbc88fe1aa55cfe89c4068726 upstream.

__bio_for_each_segment assumes that the first struct bio_vec argument
doesn't change - it calls "bio_advance_iter_single((bio), &amp;(iter),
(bvl).bv_len)" to advance the iterator. Unfortunately, the dm-integrity
code changes the bio_vec with "bv.bv_len -= pos". When this code path
is taken, the iterator would be out of sync and dm-integrity would
report errors. This happens if the machine is out of memory and
"kmalloc" fails.

Fix this bug by making a copy of "bv" and changing the copy instead.

Fixes: 7eada909bfd7 ("dm: add integrity target")
Cc: stable@vger.kernel.org	# v4.12+
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm thin metadata: Fix ABBA deadlock by resetting dm_bufio_client</title>
<updated>2024-01-01T12:39:05Z</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2023-06-05T07:03:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=93da3d8af9ee2ae6c93badd48539aafba3251a01'/>
<id>urn:sha1:93da3d8af9ee2ae6c93badd48539aafba3251a01</id>
<content type='text'>
[ Upstream commit d48300120627a1cb98914738fff38b424625b8ad ]

As described in commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between
shrink_slab and dm_pool_abort_metadata"), ABBA deadlocks will be
triggered because shrinker_rwsem currently needs to held by
dm_pool_abort_metadata() as a side-effect of thin-pool metadata
operation failure.

The following three problem scenarios have been noticed:

1) Described by commit 8111964f1b85 ("dm thin: Fix ABBA deadlock between
   shrink_slab and dm_pool_abort_metadata")

2) shrinker_rwsem and throttle-&gt;lock
          P1(drop cache)                        P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&amp;shrinker_rwsem)  - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
         ext4_clear_inode
          ext4_discard_preallocations
           ext4_mb_load_buddy_gfp
            ext4_mb_init_cache
             ext4_wait_block_bitmap
              __ext4_error
               ext4_handle_error
                ext4_commit_super
                 ...
                 dm_submit_bio
                                     do_worker
                                      throttle_work_update
                                       down_write(&amp;t-&gt;lock) -- LOCK B
                                      process_deferred_bios
                                       commit
                                        metadata_operation_failed
                                         dm_pool_abort_metadata
                                          dm_block_manager_create
                                           dm_bufio_client_create
                                            register_shrinker
                                             down_write(&amp;shrinker_rwsem)
                                             -- LOCK A
                 thin_map
                  thin_bio_map
                   thin_defer_bio_with_throttle
                    throttle_lock
                     down_read(&amp;t-&gt;lock)  - LOCK B

3) shrinker_rwsem and wait_on_buffer
          P1(drop cache)                            P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&amp;shrinker_rwsem)  - LOCK A
   do_shrink_slab
   ...
    ext4_wait_block_bitmap
     __ext4_error
      ext4_handle_error
       jbd2_journal_abort
        jbd2_journal_update_sb_errno
         jbd2_write_superblock
          submit_bh
           // LOCK B
           // RELEASE B
                             do_worker
                              throttle_work_update
                               down_write(&amp;t-&gt;lock) - LOCK B
                              process_deferred_bios
                               process_bio
                               commit
                                metadata_operation_failed
                                 dm_pool_abort_metadata
                                  dm_block_manager_create
                                   dm_bufio_client_create
                                    register_shrinker
                                     register_shrinker_prepared
                                      down_write(&amp;shrinker_rwsem)  - LOCK A
                               bio_endio
      wait_on_buffer
       __wait_on_buffer

Fix these by resetting dm_bufio_client without holding shrinker_rwsem.

Fixes: 8111964f1b85 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata")
Cc: stable@vger.kernel.org
Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: avoid NULL checking to c-&gt;root in run_cache_set()</title>
<updated>2023-12-20T16:00:22Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2023-11-20T05:25:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=02a4b14d17abdf76a88001ad924b44d75b14ef75'/>
<id>urn:sha1:02a4b14d17abdf76a88001ad924b44d75b14ef75</id>
<content type='text'>
[ Upstream commit 3eba5e0b2422aec3c9e79822029599961fdcab97 ]

In run_cache_set() after c-&gt;root returned from bch_btree_node_get(), it
is checked by IS_ERR_OR_NULL(). Indeed it is unncessary to check NULL
because bch_btree_node_get() will not return NULL pointer to caller.

This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc()</title>
<updated>2023-12-20T16:00:22Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2023-11-20T05:25:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d3f72efc77dbfceda277bc0487e927d583f0e31'/>
<id>urn:sha1:3d3f72efc77dbfceda277bc0487e927d583f0e31</id>
<content type='text'>
[ Upstream commit 31f5b956a197d4ec25c8a07cb3a2ab69d0c0b82f ]

This patch adds code comments to bch_btree_node_get() and
__bch_btree_node_alloc() that NULL pointer will not be returned and it
is unnecessary to check NULL pointer by the callers of these routines.

Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: remove redundant assignment to variable cur_idx</title>
<updated>2023-12-20T16:00:22Z</updated>
<author>
<name>Colin Ian King</name>
<email>colin.i.king@gmail.com</email>
</author>
<published>2023-11-20T05:24:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc17ec4215e257939a54d1a5fb04187216e2d8f9'/>
<id>urn:sha1:bc17ec4215e257939a54d1a5fb04187216e2d8f9</id>
<content type='text'>
[ Upstream commit be93825f0e6428c2d3f03a6e4d447dc48d33d7ff ]

Variable cur_idx is being initialized with a value that is never read,
it is being re-assigned later in a while-loop. Remove the redundant
assignment. Cleans up clang scan build warning:

drivers/md/bcache/writeback.c:916:2: warning: Value stored to 'cur_idx'
is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King &lt;colin.i.king@gmail.com&gt;
Reviewed-by: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-4-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: avoid oversize memory allocation by small stripe_size</title>
<updated>2023-12-20T16:00:22Z</updated>
<author>
<name>Coly Li</name>
<email>colyli@suse.de</email>
</author>
<published>2023-11-20T05:24:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=be0e2a28e06acf4fb83ee6e0c4495be58b8f6070'/>
<id>urn:sha1:be0e2a28e06acf4fb83ee6e0c4495be58b8f6070</id>
<content type='text'>
[ Upstream commit baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c ]

Arraies bcache-&gt;stripe_sectors_dirty and bcache-&gt;full_dirty_stripes are
used for dirty data writeback, their sizes are decided by backing device
capacity and stripe size. Larger backing device capacity or smaller
stripe size make these two arraies occupies more dynamic memory space.

Currently bcache-&gt;stripe_size is directly inherited from
queue-&gt;limits.io_opt of underlying storage device. For normal hard
drives, its limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1&lt;&lt;31 sectors), it works fine 10+ years. But for
devices do declare value for queue-&gt;limits.io_opt, small stripe_size
(comparing to 1TB) becomes an issue for oversize memory allocations of
bcache-&gt;stripe_sectors_dirty and bcache-&gt;full_dirty_stripes, while the
capacity of hard drives gets much larger in recent decade.

For example a raid5 array assembled by three 20TB hardrives, the raid
device capacity is 40TB with typical 512KB limits.io_opt. After the math
calculation in bcache code, these two arraies will occupy 400MB dynamic
memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is
declared on a new 2TB hard drive, then these two arraies request 2GB and
512MB dynamic memory from kzalloc(). The result is that bcache device
always fails to initialize on his system.

To avoid the oversize memory allocation, bcache-&gt;stripe_size should not
directly inherited by queue-&gt;limits.io_opt from the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size
and set bcache device's stripe size against the declared limits.io_opt
value from the underlying storage device,
- If the declared limits.io_opt &gt; BCH_MIN_STRIPE_SZ, bcache device will
  set its stripe size directly by this limits.io_opt value.
- If the declared limits.io_opt &lt; BCH_MIN_STRIPE_SZ, bcache device will
  set its stripe size by a value multiplying limits.io_opt and euqal or
  large than BCH_MIN_STRIPE_SZ.

Then the minimal stripe size of a bcache device will always be &gt;= 4MB.
For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
bcache-&gt;stripe_sectors_dirty and bcache-&gt;full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
by these two arraies will be 2.5MB in total.

Such mount of memory allocated for bcache-&gt;stripe_sectors_dirty and
bcache-&gt;full_dirty_stripes is reasonable for most of storage devices.

Reported-by: Andrea Tomassetti &lt;andrea.tomassetti-opensource@devo.com&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Reviewed-by: Eric Wheeler &lt;bcache@lists.ewheeler.net&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>md/raid6: use valid sector values to determine if an I/O should wait on the reshape</title>
<updated>2023-12-13T17:39:21Z</updated>
<author>
<name>David Jeffery</name>
<email>djeffery@redhat.com</email>
</author>
<published>2023-11-28T18:11:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=515d971cd26a40f710490d1566783f9c62b46d61'/>
<id>urn:sha1:515d971cd26a40f710490d1566783f9c62b46d61</id>
<content type='text'>
commit c467e97f079f0019870c314996fae952cc768e82 upstream.

During a reshape or a RAID6 array such as expanding by adding an additional
disk, I/Os to the region of the array which have not yet been reshaped can
stall indefinitely. This is from errors in the stripe_ahead_of_reshape
function causing md to think the I/O is to a region in the actively
undergoing the reshape.

stripe_ahead_of_reshape fails to account for the q disk having a sector
value of 0. By not excluding the q disk from the for loop, raid6 will always
generate a min_sector value of 0, causing a return value which stalls.

The function's max_sector calculation also uses min() when it should use
max(), causing the max_sector value to always be 0. During a backwards
rebuild this can cause the opposite problem where it allows I/O to advance
when it should wait.

Fixing these errors will allow safe I/O to advance in a timely manner and
delay only I/O which is unsafe due to stripes in the middle of undergoing
the reshape.

Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: David Jeffery &lt;djeffery@redhat.com&gt;
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231128181233.6187-1-djeffery@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()</title>
<updated>2023-12-13T17:39:17Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-12-05T09:42:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c796895b4e24015a3768ac1fb70c85e6b13826f'/>
<id>urn:sha1:3c796895b4e24015a3768ac1fb70c85e6b13826f</id>
<content type='text'>
[ Upstream commit c9f7cb5b2bc968adcdc686c197ed108f47fd8eb0 ]

If md_set_readonly() failed, the array could still be read-write, however
'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
abnormal state that sync or recovery can't continue anymore.
Hence make sure the flag is cleared after md_set_readonly() returns.

Fixes: 88724bfa68be ("md: wait for pending superblock updates before switching to read-only")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Acked-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>md: introduce md_ro_state</title>
<updated>2023-12-13T17:39:16Z</updated>
<author>
<name>Ye Bin</name>
<email>yebin10@huawei.com</email>
</author>
<published>2022-09-20T02:39:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5255ded034227e93f77028613bcde4d32a02c1d9'/>
<id>urn:sha1:5255ded034227e93f77028613bcde4d32a02c1d9</id>
<content type='text'>
[ Upstream commit f97a5528b21eb175d90dce2df9960c8d08e1be82 ]

Introduce md_ro_state for mddev-&gt;ro, so it is easy to understand.

Signed-off-by: Ye Bin &lt;yebin10@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Stable-dep-of: c9f7cb5b2bc9 ("md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: revert replacing IS_ERR_OR_NULL with IS_ERR</title>
<updated>2023-12-08T07:51:15Z</updated>
<author>
<name>Markus Weippert</name>
<email>markus@gekmihesg.de</email>
</author>
<published>2023-11-24T15:14:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0b48970ce102eb77251bf8b14607f6c6dd19f4ef'/>
<id>urn:sha1:0b48970ce102eb77251bf8b14607f6c6dd19f4ef</id>
<content type='text'>
commit bb6cc253861bd5a7cf8439e2118659696df9619f upstream.

Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in
node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a
NULL pointer dereference.

BUG: kernel NULL pointer dereference, address: 0000000000000080
Call Trace:
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xd2/0x2b0
 ? exc_page_fault+0x70/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? btree_node_free+0xf/0x160 [bcache]
 ? up_write+0x32/0x60
 btree_gc_coalesce+0x2aa/0x890 [bcache]
 ? bch_extent_bad+0x70/0x170 [bcache]
 btree_gc_recurse+0x130/0x390 [bcache]
 ? btree_gc_mark_node+0x72/0x230 [bcache]
 bch_btree_gc+0x5da/0x600 [bcache]
 ? cpuusage_read+0x10/0x10
 ? bch_btree_gc+0x600/0x600 [bcache]
 bch_gc_thread+0x135/0x180 [bcache]

The relevant code starts with:

    new_nodes[0] = NULL;

    for (i = 0; i &lt; nodes; i++) {
        if (__bch_keylist_realloc(&amp;keylist, bkey_u64s(&amp;r[i].b-&gt;key)))
            goto out_nocoalesce;
    // ...
out_nocoalesce:
    // ...
    for (i = 0; i &lt; nodes; i++)
        if (!IS_ERR(new_nodes[i])) {  // IS_ERR_OR_NULL before
028ddcac477b
            btree_node_free(new_nodes[i]);  // new_nodes[0] is NULL
            rw_unlock(true, new_nodes[i]);
        }

This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.

Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations")
Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/
Cc: stable@vger.kernel.org
Cc: Zheng Wang &lt;zyytlz.wz@163.com&gt;
Cc: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Markus Weippert &lt;markus@gekmihesg.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
