<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block, branch v6.16.2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.16.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.16.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-08-20T16:41:35Z</updated>
<entry>
<title>block: Introduce bio_needs_zone_write_plugging()</title>
<updated>2025-08-20T16:41:35Z</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2025-06-25T09:33:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5486888fbd0aa39a9ebebccf265893d1dfc1fe85'/>
<id>urn:sha1:5486888fbd0aa39a9ebebccf265893d1dfc1fe85</id>
<content type='text'>
commit f70291411ba20d50008db90a6f0731efac27872c upstream.

In preparation for fixing device mapper zone write handling, introduce
the inline helper function bio_needs_zone_write_plugging() to test if a
BIO requires handling through zone write plugging using the function
blk_zone_plug_bio(). This function returns true for any write
(op_is_write(bio) == true) operation directed at a zoned block device
using zone write plugging, that is, a block device with a disk that has
a zone write plug hash table.

This helper allows simplifying the check on entry to blk_zone_plug_bio()
and used in to protect calls to it for blk-mq devices and DM devices.

Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20250625093327.548866-3-dlemoal@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: fix kobject double initialization in add_disk</title>
<updated>2025-08-20T16:41:33Z</updated>
<author>
<name>Zheng Qixing</name>
<email>zhengqixing@huawei.com</email>
</author>
<published>2025-08-08T05:36:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1090a272cf334a769101f9a084fab4d14a34cdc4'/>
<id>urn:sha1:1090a272cf334a769101f9a084fab4d14a34cdc4</id>
<content type='text'>
[ Upstream commit 343dc5423bfe876c12bb80c56f5e44286e442a07 ]

Device-mapper can call add_disk() multiple times for the same gendisk
due to its two-phase creation process (dm create + dm load). This leads
to kobject double initialization errors when the underlying iSCSI devices
become temporarily unavailable and then reappear.

However, if the first add_disk() call fails and is retried, the queue_kobj
gets initialized twice, causing:

kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
something is seriously wrong.
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0x5b/0x80
  kobject_init.cold+0x43/0x51
  blk_register_queue+0x46/0x280
  add_disk_fwnode+0xb5/0x280
  dm_setup_md_queue+0x194/0x1c0
  table_load+0x297/0x2d0
  ctl_ioctl+0x2a2/0x480
  dm_ctl_ioctl+0xe/0x20
  __x64_sys_ioctl+0xc7/0x110
  do_syscall_64+0x72/0x390
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix this by separating kobject initialization from sysfs registration:
 - Initialize queue_kobj early during gendisk allocation
 - add_disk() only adds the already-initialized kobject to sysfs
 - del_gendisk() removes from sysfs but doesn't destroy the kobject
 - Final cleanup happens when the disk is released

Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
Reported-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
Signed-off-by: Zheng Qixing &lt;zhengqixing@huawei.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20250808053609.3237836-1-zhengqixing@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: avoid possible overflow for chunk_sectors check in blk_stack_limits()</title>
<updated>2025-08-20T16:41:32Z</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2025-07-29T09:14:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14beeef4aafecc8a41de534e31fb5be94739392f'/>
<id>urn:sha1:14beeef4aafecc8a41de534e31fb5be94739392f</id>
<content type='text'>
[ Upstream commit 448dfecc7ff807822ecd47a5c052acedca7d09e8 ]

In blk_stack_limits(), we check that the t-&gt;chunk_sectors value is a
multiple of the t-&gt;physical_block_size value.

However, by finding the chunk_sectors value in bytes, we may overflow
the unsigned int which holds chunk_sectors, so change the check to be
based on sectors.

Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Link: https://lore.kernel.org/r/20250729091448.1691334-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>lib/sbitmap: convert shallow_depth from one word to the whole sbitmap</title>
<updated>2025-08-20T16:41:31Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2025-08-07T03:24:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=537287e89705a77358e00547d304c7848e618c7f'/>
<id>urn:sha1:537287e89705a77358e00547d304c7848e618c7f</id>
<content type='text'>
[ Upstream commit 42e6c6ce03fd3e41e39a0f93f9b1a1d9fa664338 ]

Currently elevators will record internal 'async_depth' to throttle
asynchronous requests, and they both calculate shallow_dpeth based on
sb-&gt;shift, with the respect that sb-&gt;shift is the available tags in one
word.

However, sb-&gt;shift is not the availbale tags in the last word, see
__map_depth:

if (index == sb-&gt;map_nr - 1)
  return sb-&gt;depth - (index &lt;&lt; sb-&gt;shift);

For consequence, if the last word is used, more tags can be get than
expected, for example, assume nr_requests=256 and there are four words,
in the worst case if user set nr_requests=32, then the first word is
the last word, and still use bits per word, which is 64, to calculate
async_depth is wrong.

One the ohter hand, due to cgroup qos, bfq can allow only one request
to be allocated, and set shallow_dpeth=1 will still allow the number
of words request to be allocated.

Fix this problems by using shallow_depth to the whole sbitmap instead
of per word, also change kyber, mq-deadline and bfq to follow this,
a new helper __map_depth_with_shallow() is introduced to calculate
available bits in each word.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: ensure discard_granularity is zero when discard is not supported</title>
<updated>2025-08-15T14:39:24Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2025-07-31T15:22:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4632f6a75543004712441221778f7fb2b48b41a3'/>
<id>urn:sha1:4632f6a75543004712441221778f7fb2b48b41a3</id>
<content type='text'>
[ Upstream commit fad6551fcf537375702b9af012508156a16a1ff7 ]

Documentation/ABI/stable/sysfs-block states:

  What: /sys/block/&lt;disk&gt;/queue/discard_granularity
  [...]
  A discard_granularity of 0 means that the device does not support
  discard functionality.

but this got broken when sorting out the block limits updates.  Fix this
by setting the discard_granularity limit to zero when the combined
max_discard_sectors is zero.

Fixes: 3c407dc723bb ("block: default the discard granularity to sector size")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20250731152228.873923-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: restore two stage elevator switch while running nr_hw_queue update</title>
<updated>2025-08-15T14:38:24Z</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-07-24T10:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aea08fc350f1e1d4e7b89c8f5a5ff4e2f659f86d'/>
<id>urn:sha1:aea08fc350f1e1d4e7b89c8f5a5ff4e2f659f86d</id>
<content type='text'>
[ Upstream commit 5989bfe6ac6bf230c2c84e118c786be0ed4be3f4 ]

The kmemleak reports memory leaks related to elevator resources that
were originally allocated in the -&gt;init_hctx() method. The following
leak traces are observed after running blktests block/040:

unreferenced object 0xffff8881b82f7400 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f6000 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f5800 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70

    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

The issue arises while we run nr_hw_queue update,  Specifically, we first
reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
then later invoke elevator_switch() (assuming q-&gt;elevator is not NULL).
The elevator switch code would first exit old elevator (elevator_exit)
and then switches to the new elevator. The elevator_exit loops through
each hctx and invokes the elevator’s per-hctx exit method -&gt;exit_hctx(),
which releases resources allocated during -&gt;init_hctx().

This memleak manifests when we reduce the num of h/w queues - for example,
when the initial update sets the number of queues to X, and a later update
reduces it to Y, where Y &lt; X. In this case, we'd loose the access to old
hctxs while we get to elevator exit code because __blk_mq_realloc_hw_ctxs
would have already released the old hctxs. As we don't now have any
reference left to the old hctxs, we don't have any way to free the
scheduler resources (which are allocate in -&gt;init_hctx()) and kmemleak
complains about it.

This issue was caused due to the commit 596dce110b7d ("block: simplify
elevator reattachment for updating nr_hw_queues"). That change unified
the two-stage elevator teardown and reattachment into a single call that
occurs after __blk_mq_realloc_hw_ctxs() has already freed the hctxs.

This patch restores the previous two-stage elevator switch logic during
nr_hw_queues updates. First, the elevator is switched to 'none', which
ensures all scheduler resources are properly freed. Then, the hardware
contexts (hctxs) are reallocated, and the software-to-hardware queue
mappings are updated. Finally, the original elevator is reattached. This
sequence prevents loss of references to old hctxs and avoids the scheduler
resource leaks reported by kmemleak.

Reported-by : Yi Zhang &lt;yi.zhang@redhat.com&gt;

Fixes: 596dce110b7d ("block: simplify elevator reattachment for updating nr_hw_queues")
Closes: https://lore.kernel.org/all/CAHj4cs8oJFvz=daCvjHM5dYCNQH4UXwSySPPU4v-WHce_kZXZA@mail.gmail.com/
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20250724102540.1366308-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: sanitize chunk_sectors for atomic write limits</title>
<updated>2025-08-15T14:38:23Z</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2025-07-11T10:52:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c7ffc33443e6ef4462eb55745cc4b7b06b5f7345'/>
<id>urn:sha1:c7ffc33443e6ef4462eb55745cc4b7b06b5f7345</id>
<content type='text'>
[ Upstream commit 1de67e8e28fc47d71ee06ffa0185da549b378ffb ]

Currently we just ensure that a non-zero value in chunk_sectors aligns
with any atomic write boundary, as the blk boundary functionality uses
both these values.

However it is also improper to have atomic write unit max &gt; chunk_sectors
(for non-zero chunk_sectors), as this would lead to splitting of atomic
write bios (which is disallowed).

Sanitize atomic write unit max against chunk_sectors to avoid any
potential problems.

Fixes: d00eea91deaf3 ("block: Add extra checks in blk_validate_atomic_write_limits()")
Reviewed-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20250711105258.3135198-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-6.16-20250725' of git://git.kernel.dk/linux</title>
<updated>2025-07-25T15:05:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-07-25T15:05:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=327579671a9becc43369f7c0637febe7e03d4003'/>
<id>urn:sha1:327579671a9becc43369f7c0637febe7e03d4003</id>
<content type='text'>
Pull block fix from Jens Axboe:
 "Just a single fix for regression in this release, where a module
  reference could be leaked"

* tag 'block-6.16-20250725' of git://git.kernel.dk/linux:
  block: fix module reference leak in mq-deadline I/O scheduler
</content>
</entry>
<entry>
<title>block: fix module reference leak in mq-deadline I/O scheduler</title>
<updated>2025-07-20T19:18:13Z</updated>
<author>
<name>Nilay Shroff</name>
<email>nilay@linux.ibm.com</email>
</author>
<published>2025-07-19T13:26:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1966554b2e82b89d4f6490f430ce76a379e23f1f'/>
<id>urn:sha1:1966554b2e82b89d4f6490f430ce76a379e23f1f</id>
<content type='text'>
During probe, when the block layer registers a request queue, it
defaults to the mq-deadline I/O scheduler if the device is single-queue
and the mq-deadline module is available. To determine availability, the
elevator_set_default() invokes elevator_find_get(), which increments the
module's reference count. However, this reference is never released,
resulting in a module reference leak that prevents the mq-deadline module
from being unloaded.

This patch fixes the issue by ensuring the acquired module reference is
properly released.

Fixes: 1e44bedbc921 ("block: unifying elevator change")
Signed-off-by: Nilay Shroff &lt;nilay@linux.ibm.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20250719132722.769536-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'block-6.16-20250718' of git://git.kernel.dk/linux</title>
<updated>2025-07-18T19:16:13Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-07-18T19:16:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e5ac874257bf06eba00282e5b627f23b72ac8f7d'/>
<id>urn:sha1:e5ac874257bf06eba00282e5b627f23b72ac8f7d</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - NVMe changes via Christoph:
     - revert the cross-controller atomic write size validation
       that caused regressions (Christoph Hellwig)
     - fix endianness of command word printout in
       nvme_log_err_passthru() (John Garry)
     - fix callback lock for TLS handshake (Maurizio Lombardi)
     - fix misaccounting of nvme-mpath inflight I/O (Yu Kuai)
     - fix inconsistent RCU list manipulation in
       nvme_ns_add_to_ctrl_list() (Zheng Qixing)

 - Fix for a kobject leak in queue unregistration

 - Fix for loop async file write start/end handling

* tag 'block-6.16-20250718' of git://git.kernel.dk/linux:
  loop: use kiocb helpers to fix lockdep warning
  nvmet-tcp: fix callback lock for TLS handshake
  nvme: fix misaccounting of nvme-mpath inflight I/O
  nvme: revert the cross-controller atomic write size validation
  nvme: fix endianness of command word prints in nvme_log_err_passthru()
  nvme: fix inconsistent RCU list manipulation in nvme_ns_add_to_ctrl_list()
  block: fix kobject leak in blk_unregister_queue
</content>
</entry>
</feed>
