<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/drivers/md, branch v6.6.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-12-13T17:45:25Z</updated>
<entry>
<title>md/raid6: use valid sector values to determine if an I/O should wait on the reshape</title>
<updated>2023-12-13T17:45:25Z</updated>
<author>
<name>David Jeffery</name>
<email>djeffery@redhat.com</email>
</author>
<published>2023-11-28T18:11:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ce431c297558e30baa9226243a15d818320742b'/>
<id>urn:sha1:4ce431c297558e30baa9226243a15d818320742b</id>
<content type='text'>
commit c467e97f079f0019870c314996fae952cc768e82 upstream.

During a reshape or a RAID6 array such as expanding by adding an additional
disk, I/Os to the region of the array which have not yet been reshaped can
stall indefinitely. This is from errors in the stripe_ahead_of_reshape
function causing md to think the I/O is to a region in the actively
undergoing the reshape.

stripe_ahead_of_reshape fails to account for the q disk having a sector
value of 0. By not excluding the q disk from the for loop, raid6 will always
generate a min_sector value of 0, causing a return value which stalls.

The function's max_sector calculation also uses min() when it should use
max(), causing the max_sector value to always be 0. During a backwards
rebuild this can cause the opposite problem where it allows I/O to advance
when it should wait.

Fixing these errors will allow safe I/O to advance in a timely manner and
delay only I/O which is unsafe due to stripes in the middle of undergoing
the reshape.

Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: David Jeffery &lt;djeffery@redhat.com&gt;
Tested-by: Laurence Oberman &lt;loberman@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231128181233.6187-1-djeffery@redhat.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()</title>
<updated>2023-12-13T17:45:19Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-12-05T09:42:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49b79af00d24e5b27049725aa3bc11f4175c3c2c'/>
<id>urn:sha1:49b79af00d24e5b27049725aa3bc11f4175c3c2c</id>
<content type='text'>
[ Upstream commit c9f7cb5b2bc968adcdc686c197ed108f47fd8eb0 ]

If md_set_readonly() failed, the array could still be read-write, however
'MD_RECOVERY_FROZEN' could still be set, which leave the array in an
abnormal state that sync or recovery can't continue anymore.
Hence make sure the flag is cleared after md_set_readonly() returns.

Fixes: 88724bfa68be ("md: wait for pending superblock updates before switching to read-only")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Acked-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20231205094215.1824240-3-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dm-crypt: start allocating with MAX_ORDER</title>
<updated>2023-12-13T17:45:01Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2023-11-17T17:38:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dfa1898cef4c33fefb402d3aec1cca8df146e34c'/>
<id>urn:sha1:dfa1898cef4c33fefb402d3aec1cca8df146e34c</id>
<content type='text'>
[ Upstream commit 13648e04a9b831b3dfa5cf3887dfa6cf8fe5fe69 ]

Commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely")
changed the meaning of MAX_ORDER from exclusive to inclusive. So, we
can allocate compound pages with up to 1 &lt;&lt; MAX_ORDER pages.

Reflect this change in dm-crypt and start trying to allocate compound
pages with MAX_ORDER.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bcache: revert replacing IS_ERR_OR_NULL with IS_ERR</title>
<updated>2023-12-08T07:52:19Z</updated>
<author>
<name>Markus Weippert</name>
<email>markus@gekmihesg.de</email>
</author>
<published>2023-11-24T15:14:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0daf3b0fe4a731f17635d703e0befdbfe9d494de'/>
<id>urn:sha1:0daf3b0fe4a731f17635d703e0befdbfe9d494de</id>
<content type='text'>
commit bb6cc253861bd5a7cf8439e2118659696df9619f upstream.

Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in
node allocations") replaced IS_ERR_OR_NULL by IS_ERR. This leads to a
NULL pointer dereference.

BUG: kernel NULL pointer dereference, address: 0000000000000080
Call Trace:
 ? __die_body.cold+0x1a/0x1f
 ? page_fault_oops+0xd2/0x2b0
 ? exc_page_fault+0x70/0x170
 ? asm_exc_page_fault+0x22/0x30
 ? btree_node_free+0xf/0x160 [bcache]
 ? up_write+0x32/0x60
 btree_gc_coalesce+0x2aa/0x890 [bcache]
 ? bch_extent_bad+0x70/0x170 [bcache]
 btree_gc_recurse+0x130/0x390 [bcache]
 ? btree_gc_mark_node+0x72/0x230 [bcache]
 bch_btree_gc+0x5da/0x600 [bcache]
 ? cpuusage_read+0x10/0x10
 ? bch_btree_gc+0x600/0x600 [bcache]
 bch_gc_thread+0x135/0x180 [bcache]

The relevant code starts with:

    new_nodes[0] = NULL;

    for (i = 0; i &lt; nodes; i++) {
        if (__bch_keylist_realloc(&amp;keylist, bkey_u64s(&amp;r[i].b-&gt;key)))
            goto out_nocoalesce;
    // ...
out_nocoalesce:
    // ...
    for (i = 0; i &lt; nodes; i++)
        if (!IS_ERR(new_nodes[i])) {  // IS_ERR_OR_NULL before
028ddcac477b
            btree_node_free(new_nodes[i]);  // new_nodes[0] is NULL
            rw_unlock(true, new_nodes[i]);
        }

This patch replaces IS_ERR() by IS_ERR_OR_NULL() to fix this.

Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations")
Link: https://lore.kernel.org/all/3DF4A87A-2AC1-4893-AE5F-E921478419A9@suse.de/
Cc: stable@vger.kernel.org
Cc: Zheng Wang &lt;zyytlz.wz@163.com&gt;
Cc: Coly Li &lt;colyli@suse.de&gt;
Signed-off-by: Markus Weippert &lt;markus@gekmihesg.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm verity: don't perform FEC for failed readahead IO</title>
<updated>2023-12-08T07:52:18Z</updated>
<author>
<name>Wu Bo</name>
<email>bo.wu@vivo.com</email>
</author>
<published>2023-11-22T03:51:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ec7652671502d99e9774af581b2a99c25fc8ab2'/>
<id>urn:sha1:7ec7652671502d99e9774af581b2a99c25fc8ab2</id>
<content type='text'>
commit 0193e3966ceeeef69e235975918b287ab093082b upstream.

We found an issue under Android OTA scenario that many BIOs have to do
FEC where the data under dm-verity is 100% complete and no corruption.

Android OTA has many dm-block layers, from upper to lower:
dm-verity
dm-snapshot
dm-origin &amp; dm-cow
dm-linear
ufs

DM tables have to change 2 times during Android OTA merging process.
When doing table change, the dm-snapshot will be suspended for a while.
During this interval, many readahead IOs are submitted to dm_verity
from filesystem. Then the kverity works are busy doing FEC process
which cost too much time to finish dm-verity IO. This causes needless
delay which feels like system is hung.

After adding debugging it was found that each readahead IO needed
around 10s to finish when this situation occurred. This is due to IO
amplification:

dm-snapshot suspend
erofs_readahead     // 300+ io is submitted
	dm_submit_bio (dm_verity)
		dm_submit_bio (dm_snapshot)
		bio return EIO
		bio got nothing, it's empty
	verity_end_io
	verity_verify_io
	forloop range(0, io-&gt;n_blocks)    // each io-&gt;nblocks ~= 20
		verity_fec_decode
		fec_decode_rsb
		fec_read_bufs
		forloop range(0, v-&gt;fec-&gt;rsn) // v-&gt;fec-&gt;rsn = 253
			new_read
			submit_bio (dm_snapshot)
		end loop
	end loop
dm-snapshot resume

Readahead BIOs get nothing while dm-snapshot is suspended, so all of
them will cause verity's FEC.
Each readahead BIO needs to verify ~20 (io-&gt;nblocks) blocks.
Each block needs to do FEC, and every block needs to do 253
(v-&gt;fec-&gt;rsn) reads.
So during the suspend interval(~200ms), 300 readahead BIOs trigger
~1518000 (300*20*253) IOs to dm-snapshot.

As readahead IO is not required by userspace, and to fix this issue,
it is best to pass readahead errors to upper layer to handle it.

Cc: stable@vger.kernel.org
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Signed-off-by: Wu Bo &lt;bo.wu@vivo.com&gt;
Reviewed-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm verity: initialize fec io before freeing it</title>
<updated>2023-12-08T07:52:18Z</updated>
<author>
<name>Wu Bo</name>
<email>bo.wu@vivo.com</email>
</author>
<published>2023-11-22T03:51:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=590df5e4e86551263702cf65f609918a5e485449'/>
<id>urn:sha1:590df5e4e86551263702cf65f609918a5e485449</id>
<content type='text'>
commit 7be05bdfb4efc1396f7692562c7161e2b9f595f1 upstream.

If BIO error, verity_end_io() can call verity_finish_io() before
verity_fec_init_io(). Therefore, fec_io-&gt;rs is not initialized and
may crash when doing memory freeing in verity_fec_finish_io().

Crash call stack:
 die+0x90/0x2b8
 __do_kernel_fault+0x260/0x298
 do_bad_area+0x2c/0xdc
 do_translation_fault+0x3c/0x54
 do_mem_abort+0x54/0x118
 el1_abort+0x38/0x5c
 el1h_64_sync_handler+0x50/0x90
 el1h_64_sync+0x64/0x6c
 free_rs+0x18/0xac
 fec_rs_free+0x10/0x24
 mempool_free+0x58/0x148
 verity_fec_finish_io+0x4c/0xb0
 verity_end_io+0xb8/0x150

Cc: stable@vger.kernel.org      # v6.0+
Fixes: 5721d4e5a9cd ("dm verity: Add optional "try_verify_in_tasklet" feature")
Signed-off-by: Wu Bo &lt;bo.wu@vivo.com&gt;
Reviewed-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm-verity: align struct dm_verity_fec_io properly</title>
<updated>2023-12-08T07:52:16Z</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2023-11-28T13:50:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=99277dd29051e2c28d865c5100e355d32803a997'/>
<id>urn:sha1:99277dd29051e2c28d865c5100e355d32803a997</id>
<content type='text'>
commit 38bc1ab135db87577695816b190e7d6d8ec75879 upstream.

dm_verity_fec_io is placed after the end of two hash digests. If the hash
digest has unaligned length, struct dm_verity_fec_io could be unaligned.

This commit fixes the placement of struct dm_verity_fec_io, so that it's
aligned.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
Fixes: a739ff3f543a ("dm verity: add support for forward error correction")
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fixup lock c-&gt;root error</title>
<updated>2023-12-03T06:33:09Z</updated>
<author>
<name>Mingzhe Zou</name>
<email>mingzhe.zou@easystack.cn</email>
</author>
<published>2023-11-20T05:24:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b163173dbfd7da017426f20f890a0e6aa89ae32f'/>
<id>urn:sha1:b163173dbfd7da017426f20f890a0e6aa89ae32f</id>
<content type='text'>
commit e34820f984512b433ee1fc291417e60c47d56727 upstream.

We had a problem with io hung because it was waiting for c-&gt;root to
release the lock.

crash&gt; cache_set.root -l cache_set.list ffffa03fde4c0050
  root = 0xffff802ef454c800
crash&gt; btree -o 0xffff802ef454c800 | grep rw_semaphore
  [ffff802ef454c858] struct rw_semaphore lock;
crash&gt; struct rw_semaphore ffff802ef454c858
struct rw_semaphore {
  count = {
    counter = -4294967297
  },
  wait_list = {
    next = 0xffff00006786fc28,
    prev = 0xffff00005d0efac8
  },
  wait_lock = {
    raw_lock = {
      {
        val = {
          counter = 0
        },
        {
          locked = 0 '\000',
          pending = 0 '\000'
        },
        {
          locked_pending = 0,
          tail = 0
        }
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  owner = 0xffffa03fdc586603
}

The "counter = -4294967297" means that lock count is -1 and a write lock
is being attempted. Then, we found that there is a btree with a counter
of 1 in btree_cache_freeable.

crash&gt; cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
  [ffffa03fde4c1140] struct list_head btree_cache;
  [ffffa03fde4c1150] struct list_head btree_cache_freeable;
  [ffffa03fde4c1160] struct list_head btree_cache_freed;
  [ffffa03fde4c1170] unsigned int btree_cache_used;
  [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
  [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
crash&gt; list -H ffffa03fde4c1140|wc -l
973
crash&gt; list -H ffffa03fde4c1150|wc -l
1123
crash&gt; cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
  btree_cache_used = 2097
crash&gt; list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" &gt; btree_cache.txt
crash&gt; list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" &gt; btree_cache_freeable.txt
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
/var/crash/127.0.0.1-2023-08-04-16:40:28
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
      counter = 1

We found that this is a bug in bch_sectors_dirty_init() when locking c-&gt;root:
    (1). Thread X has locked c-&gt;root(A) write.
    (2). Thread Y failed to lock c-&gt;root(A), waiting for the lock(c-&gt;root A).
    (3). Thread X bch_btree_set_root() changes c-&gt;root from A to B.
    (4). Thread X releases the lock(c-&gt;root A).
    (5). Thread Y successfully locks c-&gt;root(A).
    (6). Thread Y releases the lock(c-&gt;root B).

        down_write locked ---(1)----------------------┐
                |                                     |
                |   down_read waiting ---(2)----┐     |
                |           |               ┌-------------┐ ┌-------------┐
        bch_btree_set_root ===(3)========&gt;&gt; | c-&gt;root   A | | c-&gt;root   B |
                |           |               └-------------┘ └-------------┘
            up_write ---(4)---------------------┘     |            |
                            |                         |            |
                    down_read locked ---(5)-----------┘            |
                            |                                      |
                        up_read ---(6)-----------------------------┘

Since c-&gt;root may change, the correct steps to lock c-&gt;root should be
the same as bch_root_usage(), compare after locking.

static unsigned int bch_root_usage(struct cache_set *c)
{
        unsigned int bytes = 0;
        struct bkey *k;
        struct btree *b;
        struct btree_iter iter;

        goto lock_root;

        do {
                rw_unlock(false, b);
lock_root:
                b = c-&gt;root;
                rw_lock(false, b, b-&gt;level);
        } while (b != c-&gt;root);

        for_each_key_filter(&amp;b-&gt;keys, k, &amp;iter, bch_ptr_bad)
                bytes += bkey_bytes(k);

        rw_unlock(false, b);

        return (bytes * 100) / btree_bytes(c);
}

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou &lt;mingzhe.zou@easystack.cn&gt;
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-7-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: fixup init dirty data errors</title>
<updated>2023-12-03T06:33:09Z</updated>
<author>
<name>Mingzhe Zou</name>
<email>mingzhe.zou@easystack.cn</email>
</author>
<published>2023-11-20T05:24:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=19aff881b80e457f413a0405857495f8bc4e64fa'/>
<id>urn:sha1:19aff881b80e457f413a0405857495f8bc4e64fa</id>
<content type='text'>
commit 7cc47e64d3d69786a2711a4767e26b26ba63d7ed upstream.

We found that after long run, the dirty_data of the bcache device
will have errors. This error cannot be eliminated unless re-register.

We also found that reattach after detach, this error can accumulate.

In bch_sectors_dirty_init(), all inode &lt;= d-&gt;id keys will be recounted
again. This is wrong, we only need to count the keys of the current
device.

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou &lt;mingzhe.zou@easystack.cn&gt;
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-6-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bcache: prevent potential division by zero error</title>
<updated>2023-12-03T06:33:09Z</updated>
<author>
<name>Rand Deeb</name>
<email>rand.sec96@gmail.com</email>
</author>
<published>2023-11-20T05:24:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ff253108ede353bfa9071a89c7e4b69e096a8b4'/>
<id>urn:sha1:8ff253108ede353bfa9071a89c7e4b69e096a8b4</id>
<content type='text'>
commit 2c7f497ac274a14330208b18f6f734000868ebf9 upstream.

In SHOW(), the variable 'n' is of type 'size_t.' While there is a
conditional check to verify that 'n' is not equal to zero before
executing the 'do_div' macro, concerns arise regarding potential
division by zero error in 64-bit environments.

The concern arises when 'n' is 64 bits in size, greater than zero, and
the lower 32 bits of it are zeros. In such cases, the conditional check
passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to
'uint32_t,' effectively truncating it to its lower 32 bits.
Consequently, the 'n' value becomes zero.

To fix this potential division by zero error and ensure precise
division handling, this commit replaces the 'do_div' macro with
div64_u64(). div64_u64() is designed to work with 64-bit operands,
guaranteeing that division is performed correctly.

This change enhances the robustness of the code, ensuring that division
operations yield accurate results in all scenarios, eliminating the
possibility of division by zero, and improving compatibility across
different 64-bit environments.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Signed-off-by: Rand Deeb &lt;rand.sec96@gmail.com&gt;
Cc:  &lt;stable@vger.kernel.org&gt;
Signed-off-by: Coly Li &lt;colyli@suse.de&gt;
Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
