<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/blkdev.h, branch v6.12.7</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.7</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.7'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-12-19T17:13:00Z</updated>
<entry>
<title>block: Prevent potential deadlocks in zone write plug error recovery</title>
<updated>2024-12-19T17:13:00Z</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-12-09T12:23:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7fa80134cf266325fa61139320091001c9b3c477'/>
<id>urn:sha1:7fa80134cf266325fa61139320091001c9b3c477</id>
<content type='text'>
commit fe0418eb9bd69a19a948b297c8de815e05f3cde1 upstream.

Zone write plugging for handling writes to zones of a zoned block
device always execute a zone report whenever a write BIO to a zone
fails. The intent of this is to ensure that the tracking of a zone write
pointer is always correct to ensure that the alignment to a zone write
pointer of write BIOs can be checked on submission and that we can
always correctly emulate zone append operations using regular write
BIOs.

However, this error recovery scheme introduces a potential deadlock if a
device queue freeze is initiated while BIOs are still plugged in a zone
write plug and one of these write operation fails. In such case, the
disk zone write plug error recovery work is scheduled and executes a
report zone. This in turn can result in a request allocation in the
underlying driver to issue the report zones command to the device. But
with the device queue freeze already started, this allocation will
block, preventing the report zone execution and the continuation of the
processing of the plugged BIOs. As plugged BIOs hold a queue usage
reference, the queue freeze itself will never complete, resulting in a
deadlock.

Avoid this problem by completely removing from the zone write plugging
code the use of report zones operations after a failed write operation,
instead relying on the device user to either execute a report zones,
reset the zone, finish the zone, or give up writing to the device (which
is a fairly common pattern for file systems which degrade to read-only
after write failures). This is not an unreasonnable requirement as all
well-behaved applications, FSes and device mapper already use report
zones to recover from write errors whenever possible by comparing the
current position of a zone write pointer with what their assumption
about the position is.

The changes to remove the automatic error recovery are as follows:
 - Completely remove the error recovery work and its associated
   resources (zone write plug list head, disk error list, and disk
   zone_wplugs_work work struct). This also removes the functions
   disk_zone_wplug_set_error() and disk_zone_wplug_clear_error().

 - Change the BLK_ZONE_WPLUG_ERROR zone write plug flag into
   BLK_ZONE_WPLUG_NEED_WP_UPDATE. This new flag is set for a zone write
   plug whenever a write opration targetting the zone of the zone write
   plug fails. This flag indicates that the zone write pointer offset is
   not reliable and that it must be updated when the next report zone,
   reset zone, finish zone or disk revalidation is executed.

 - Modify blk_zone_write_plug_bio_endio() to set the
   BLK_ZONE_WPLUG_NEED_WP_UPDATE flag for the target zone of a failed
   write BIO.

 - Modify the function disk_zone_wplug_set_wp_offset() to clear this
   new flag, thus implementing recovery of a correct write pointer
   offset with the reset (all) zone and finish zone operations.

 - Modify blkdev_report_zones() to always use the disk_report_zones_cb()
   callback so that disk_zone_wplug_sync_wp_offset() can be called for
   any zone marked with the BLK_ZONE_WPLUG_NEED_WP_UPDATE flag.
   This implements recovery of a correct write pointer offset for zone
   write plugs marked with BLK_ZONE_WPLUG_NEED_WP_UPDATE and within
   the range of the report zones operation executed by the user.

 - Modify blk_revalidate_seq_zone() to call
   disk_zone_wplug_sync_wp_offset() for all sequential write required
   zones when a zoned block device is revalidated, thus always resolving
   any inconsistency between the write pointer offset of zone write
   plugs and the actual write pointer position of sequential zones.

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20241209122357.47838-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>dm: Fix dm-zoned-reclaim zone write pointer alignment</title>
<updated>2024-12-19T17:13:00Z</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-12-09T12:23:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a4b656ea1b903da880341a264b8328a21a063179'/>
<id>urn:sha1:a4b656ea1b903da880341a264b8328a21a063179</id>
<content type='text'>
commit b76b840fd93374240b59825f1ab8e2f5c9907acb upstream.

The zone reclaim processing of the dm-zoned device mapper uses
blkdev_issue_zeroout() to align the write pointer of a zone being used
for reclaiming another zone, to write the valid data blocks from the
zone being reclaimed at the same position relative to the zone start in
the reclaim target zone.

The first call to blkdev_issue_zeroout() will try to use hardware
offload using a REQ_OP_WRITE_ZEROES operation if the device reports a
non-zero max_write_zeroes_sectors queue limit. If this operation fails
because of the lack of hardware support, blkdev_issue_zeroout() falls
back to using a regular write operation with the zero-page as buffer.
Currently, such REQ_OP_WRITE_ZEROES failure is automatically handled by
the block layer zone write plugging code which will execute a report
zones operation to ensure that the write pointer of the target zone of
the failed operation has not changed and to "rewind" the zone write
pointer offset of the target zone as it was advanced when the write zero
operation was submitted. So the REQ_OP_WRITE_ZEROES failure does not
cause any issue and blkdev_issue_zeroout() works as expected.

However, since the automatic recovery of zone write pointers by the zone
write plugging code can potentially cause deadlocks with queue freeze
operations, a different recovery must be implemented in preparation for
the removal of zone write plugging report zones based recovery.

Do this by introducing the new function blk_zone_issue_zeroout(). This
function first calls blkdev_issue_zeroout() with the flag
BLKDEV_ZERO_NOFALLBACK to intercept failures on the first execution
which attempt to use the device hardware offload with the
REQ_OP_WRITE_ZEROES operation. If this attempt fails, a report zone
operation is issued to restore the zone write pointer offset of the
target zone to the correct position and blkdev_issue_zeroout() is called
again without the BLKDEV_ZERO_NOFALLBACK flag. The report zones
operation performing this recovery is implemented using the helper
function disk_zone_sync_wp_offset() which calls the gendisk report_zones
file operation with the callback disk_report_zones_cb(). This callback
updates the target write pointer offset of the target zone using the new
function disk_zone_wplug_sync_wp_offset().

dmz_reclaim_align_wp() is modified to change its call to
blkdev_issue_zeroout() to a call to blk_zone_issue_zeroout() without any
other change needed as the two functions are functionnally equivalent.

Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Link: https://lore.kernel.org/r/20241209122357.47838-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: RCU protect disk-&gt;conv_zones_bitmap</title>
<updated>2024-12-14T19:03:35Z</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2024-11-07T06:42:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=493326c4f10cc71a42c27fdc97ce112182ee4cbc'/>
<id>urn:sha1:493326c4f10cc71a42c27fdc97ce112182ee4cbc</id>
<content type='text'>
[ Upstream commit d7cb6d7414ea1b33536fa6d11805cb8dceec1f97 ]

Ensure that a disk revalidation changing the conventional zones bitmap
of a disk does not cause invalid memory references when using the
disk_zone_is_conv() helper by RCU protecting the disk-&gt;conv_zones_bitmap
pointer.

disk_zone_is_conv() is modified to operate under the RCU read lock and
the function disk_set_conv_zones_bitmap() is added to update a disk
conv_zones_bitmap pointer using rcu_replace_pointer() with the disk
zone_wplugs_lock spinlock held.

disk_free_zone_resources() is modified to call
disk_update_zone_resources() with a NULL bitmap pointer to free the disk
conv_zones_bitmap. disk_set_conv_zones_bitmap() is also used in
disk_update_zone_resources() to set the new (revalidated) bitmap and
free the old one.

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Link: https://lore.kernel.org/r/20241107064300.227731-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: always verify unfreeze lock on the owner task</title>
<updated>2024-12-05T13:03:10Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-10-31T13:37:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b12cfcae8a83140d949dfaad7b19c37275069f81'/>
<id>urn:sha1:b12cfcae8a83140d949dfaad7b19c37275069f81</id>
<content type='text'>
commit 6a78699838a0ddeed3620ddf50c1521f1fe1e811 upstream.

commit f1be1788a32e ("block: model freeze &amp; enter queue as lock for
supporting lockdep") tries to apply lockdep for verifying freeze &amp;
unfreeze. However, the verification is only done the outmost freeze and
unfreeze. This way is actually not correct because q-&gt;mq_freeze_depth
still may drop to zero on other task instead of the freeze owner task.

Fix this issue by always verifying the last unfreeze lock on the owner
task context, and make sure both the outmost freeze &amp; unfreeze are
verified in the current task.

Fixes: f1be1788a32e ("block: model freeze &amp; enter queue as lock for supporting lockdep")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20241031133723.303835-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: return unsigned int from bdev_io_min</title>
<updated>2024-12-05T13:03:06Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2024-11-19T07:26:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5e15cc7a1d9335cc2e8b99ecffd975e22666c407'/>
<id>urn:sha1:5e15cc7a1d9335cc2e8b99ecffd975e22666c407</id>
<content type='text'>
[ Upstream commit 46fd48ab3ea3eb3bb215684bd66ea3d260b091a9 ]

The underlying limit is defined as an unsigned int, so return that from
bdev_io_min as well.

Fixes: ac481c20ef8f ("block: Topology ioctls")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: John Garry &lt;john.g.garry@oracle.com&gt;
Link: https://lore.kernel.org/r/20241119072602.1059488-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: model freeze &amp; enter queue as lock for supporting lockdep</title>
<updated>2024-12-05T13:03:05Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2024-10-25T00:37:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a6fc2ba1c7e57cc4ef5a36f8fe56d810fd7b1a65'/>
<id>urn:sha1:a6fc2ba1c7e57cc4ef5a36f8fe56d810fd7b1a65</id>
<content type='text'>
[ Upstream commit f1be1788a32e8fa63416ad4518bbd1a85a825c9d ]

Recently we got several deadlock report[1][2][3] caused by
blk_mq_freeze_queue and blk_enter_queue().

Turns out the two are just like acquiring read/write lock, so model them
as read/write lock for supporting lockdep:

1) model q-&gt;q_usage_counter as two locks(io and queue lock)

- queue lock covers sync with blk_enter_queue()

- io lock covers sync with bio_enter_queue()

2) make the lockdep class/key as per-queue:

- different subsystem has very different lock use pattern, shared lock
 class causes false positive easily

- freeze_queue degrades to no lock in case that disk state becomes DEAD
  because bio_enter_queue() won't be blocked any more

- freeze_queue degrades to no lock in case that request queue becomes dying
  because blk_enter_queue() won't be blocked any more

3) model blk_mq_freeze_queue() as acquire_exclusive &amp; try_lock
- it is exclusive lock, so dependency with blk_enter_queue() is covered

- it is trylock because blk_mq_freeze_queue() are allowed to run
  concurrently

4) model blk_enter_queue() &amp; bio_enter_queue() as acquire_read()
- nested blk_enter_queue() are allowed

- dependency with blk_mq_freeze_queue() is covered

- blk_queue_exit() is often called from other contexts(such as irq), and
it can't be annotated as lock_release(), so simply do it in
blk_enter_queue(), this way still covered cases as many as possible

With lockdep support, such kind of reports may be reported asap and
needn't wait until the real deadlock is triggered.

For example, lockdep report can be triggered in the report[3] with this
patch applied.

[1] occasional block layer hang when setting 'echo noop &gt; /sys/block/sda/queue/scheduler'
https://bugzilla.kernel.org/show_bug.cgi?id=219166

[2] del_gendisk() vs blk_queue_enter() race condition
https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/

[3] queue_freeze &amp; queue_enter deadlock in scsi
https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20241025003722.3630252-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Stable-dep-of: 3802f73bd807 ("block: fix uaf for flush rq while iterating tags")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: Remove unused blk_limits_io_{min,opt}</title>
<updated>2024-09-20T06:19:48Z</updated>
<author>
<name>Dr. David Alan Gilbert</name>
<email>linux@treblig.org</email>
</author>
<published>2024-09-20T00:48:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9ba5dcc722de4390a1d3211b2ee3c864f84f5461'/>
<id>urn:sha1:9ba5dcc722de4390a1d3211b2ee3c864f84f5461</id>
<content type='text'>
blk_limits_io_min and blk_limits_io_opt are unused since the
recent commit
  0a94a469a4f0 ("dm: stop using blk_limits_io_{min,opt}")

Remove them.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Link: https://lore.kernel.org/r/20240920004817.676216-1-linux@treblig.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>Merge tag 'v6.11' into for-6.12/block</title>
<updated>2024-09-17T14:32:53Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-09-17T14:32:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=42b16d3ac371a2fac9b6f08fd75f23f34ba3955a'/>
<id>urn:sha1:42b16d3ac371a2fac9b6f08fd75f23f34ba3955a</id>
<content type='text'>
Merge in 6.11 final to get the fix for preventing deadlocks on an
elevator switch, as there's a fixup for that patch.

* tag 'v6.11': (1788 commits)
  Linux 6.11
  Revert "KVM: VMX: Always honor guest PAT on CPUs that support self-snoop"
  pinctrl: pinctrl-cy8c95x0: Fix regcache
  cifs: Fix signature miscalculation
  mm: avoid leaving partial pfn mappings around in error case
  drm/xe/client: add missing bo locking in show_meminfo()
  drm/xe/client: fix deadlock in show_meminfo()
  drm/xe/oa: Enable Xe2+ PES disaggregation
  drm/xe/display: fix compat IS_DISPLAY_STEP() range end
  drm/xe: Fix access_ok check in user_fence_create
  drm/xe: Fix possible UAF in guc_exec_queue_process_msg
  drm/xe: Remove fence check from send_tlb_invalidation
  drm/xe/gt: Remove double include
  net: netfilter: move nf flowtable bpf initialization in nf_flow_table_module_init()
  PCI: Fix potential deadlock in pcim_intx()
  workqueue: Clear worker-&gt;pool in the worker thread context
  net: tighten bad gso csum offset check in virtio_net_hdr
  netlink: specs: mptcp: fix port endianness
  net: dpaa: Pad packets to ETH_ZLEN
  mptcp: pm: Fix uaf in __timer_delete_sync
  ...
</content>
</entry>
<entry>
<title>block: constify the lim argument to queue_limits_max_zone_append_sectors</title>
<updated>2024-08-29T10:32:32Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2024-08-26T17:37:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=379b122a3ec8033aa43cb70e8ecb6fb7f98aa68f'/>
<id>urn:sha1:379b122a3ec8033aa43cb70e8ecb6fb7f98aa68f</id>
<content type='text'>
queue_limits_max_zone_append_sectors doesn't change the lim argument,
so mark it as const.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Reviewed-by: Hans Holmberg &lt;hans.holmberg@wdc.com&gt;
Link: https://lore.kernel.org/r/20240826173820.1690925-3-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Drop NULL check in bdev_write_zeroes_sectors()</title>
<updated>2024-08-19T15:48:59Z</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2024-08-15T16:32:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=81475beb1b5996505a39cd1d9316ce1e668932a2'/>
<id>urn:sha1:81475beb1b5996505a39cd1d9316ce1e668932a2</id>
<content type='text'>
Function bdev_get_queue() must not return NULL, so drop the check in
bdev_write_zeroes_sectors().

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
Link: https://lore.kernel.org/r/20240815163228.216051-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
