<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block, branch v6.1.43</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.43</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.43'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-08-03T08:23:48Z</updated>
<entry>
<title>blk-mq: Fix stall due to recursive flush plug</title>
<updated>2023-08-03T08:23:48Z</updated>
<author>
<name>Ross Lagerwall</name>
<email>ross.lagerwall@citrix.com</email>
</author>
<published>2023-07-14T10:11:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=557ea2ff05193c2a04cf05233102a774d9e57e52'/>
<id>urn:sha1:557ea2ff05193c2a04cf05233102a774d9e57e52</id>
<content type='text'>
[ Upstream commit 70904263512a74a3b8941dd9e6e515ca6fc57821 ]

We have seen rare IO stalls as follows:

* blk_mq_plug_issue_direct() is entered with an mq_list containing two
requests.
* For the first request, it sets last == false and enters the driver's
queue_rq callback.
* The driver queue_rq callback indirectly calls schedule() which calls
blk_flush_plug(). This may happen if the driver has the
BLK_MQ_F_BLOCKING flag set and is allowed to sleep in -&gt;queue_rq.
* blk_flush_plug() handles the remaining request in the mq_list. mq_list
is now empty.
* The original call to queue_rq resumes (with last == false).
* The loop in blk_mq_plug_issue_direct() terminates because there are no
remaining requests in mq_list.

The IO is now stalled because the last request submitted to the driver
had last == false and there was no subsequent call to commit_rqs().

Fix this by returning early in blk_mq_flush_plug_list() if rq_count is 0
which it will be in the recursive case, rather than checking if the
mq_list is empty. At the same time, adjust one of the callers to skip
the mq_list empty check as it is not necessary.

Fixes: dc5fc361d891 ("block: attempt direct issue of plug list")
Signed-off-by: Ross Lagerwall &lt;ross.lagerwall@citrix.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20230714101106.3635611-1-ross.lagerwall@citrix.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-crypto: use dynamic lock class for blk_crypto_profile::lock</title>
<updated>2023-07-23T11:49:21Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2023-06-10T06:11:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=49f6ac6f1cde88bab4e530317510d1cf8870e404'/>
<id>urn:sha1:49f6ac6f1cde88bab4e530317510d1cf8870e404</id>
<content type='text'>
[ Upstream commit 2fb48d88e77f29bf9d278f25bcfe82cf59a0e09b ]

When a device-mapper device is passing through the inline encryption
support of an underlying device, calls to blk_crypto_evict_key() take
the blk_crypto_profile::lock of the device-mapper device, then take the
blk_crypto_profile::lock of the underlying device (nested).  This isn't
a real deadlock, but it causes a lockdep report because there is only
one lock class for all instances of this lock.

Lockdep subclasses don't really work here because the hierarchy of block
devices is dynamic and could have more than 2 levels.

Instead, register a dynamic lock class for each blk_crypto_profile, and
associate that with the lock.

This avoids false-positive lockdep reports like the following:

    ============================================
    WARNING: possible recursive locking detected
    6.4.0-rc5 #2 Not tainted
    --------------------------------------------
    fscryptctl/1421 is trying to acquire lock:
    ffffff80829ca418 (&amp;profile-&gt;lock){++++}-{3:3}, at: __blk_crypto_evict_key+0x44/0x1c0

                   but task is already holding lock:
    ffffff8086b68ca8 (&amp;profile-&gt;lock){++++}-{3:3}, at: __blk_crypto_evict_key+0xc8/0x1c0

                   other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&amp;profile-&gt;lock);
      lock(&amp;profile-&gt;lock);

                    *** DEADLOCK ***

     May be due to missing lock nesting notation

Fixes: 1b2628397058 ("block: Keyslot Manager for Inline Encryption")
Reported-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Link: https://lore.kernel.org/r/20230610061139.212085-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block/partition: fix signedness issue for Amiga partitions</title>
<updated>2023-07-19T14:22:17Z</updated>
<author>
<name>Michael Schmitz</name>
<email>schmitzmic@gmail.com</email>
</author>
<published>2023-07-04T23:38:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=899cc8f7989dcdcad43ba8a641ed957eaebba3bc'/>
<id>urn:sha1:899cc8f7989dcdcad43ba8a641ed957eaebba3bc</id>
<content type='text'>
commit 7eb1e47696aa231b1a567846bbe3a1e1befe1854 upstream.

Making 'blk' sector_t (i.e. 64 bit if LBD support is active) fails the
'blk&gt;0' test in the partition block loop if a value of (signed int) -1 is
used to mark the end of the partition block list.

Explicitly cast 'blk' to signed int to allow use of -1 to terminate the
partition block linked list.

Fixes: b6f3f28f604b ("block: add overflow checks for Amiga partition support")
Reported-by: Christian Zigotzky &lt;chzigotzky@xenosoft.de&gt;
Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
Signed-off-by: Michael Schmitz &lt;schmitzmic@gmail.com&gt;
Reviewed-by: Martin Steigerwald &lt;martin@lichtvoll.de&gt;
Tested-by: Christian Zigotzky &lt;chzigotzky@xenosoft.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: increment diskseq on all media change events</title>
<updated>2023-07-19T14:21:47Z</updated>
<author>
<name>Demi Marie Obenour</name>
<email>demi@invisiblethingslab.com</email>
</author>
<published>2023-06-07T17:08:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=defc91422764710d88a2485cafd495ae4d6651df'/>
<id>urn:sha1:defc91422764710d88a2485cafd495ae4d6651df</id>
<content type='text'>
commit b90ecc0379eb7bbe79337b0c7289390a98752646 upstream.

Currently, associating a loop device with a different file descriptor
does not increment its diskseq.  This allows the following race
condition:

1. Program X opens a loop device
2. Program X gets the diskseq of the loop device.
3. Program X associates a file with the loop device.
4. Program X passes the loop device major, minor, and diskseq to
   something.
5. Program X exits.
6. Program Y detaches the file from the loop device.
7. Program Y attaches a different file to the loop device.
8. The opener finally gets around to opening the loop device and checks
   that the diskseq is what it expects it to be.  Even though the
   diskseq is the expected value, the result is that the opener is
   accessing the wrong file.

From discussions with Christoph Hellwig, it appears that
disk_force_media_change() was supposed to call inc_diskseq(), but in
fact it does not.  Adding a Fixes: tag to indicate this.  Christoph's
Reported-by is because he stated that disk_force_media_change()
calls inc_diskseq(), which is what led me to discover that it should but
does not.

Reported-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Demi Marie Obenour &lt;demi@invisiblethingslab.com&gt;
Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event")
Cc: stable@vger.kernel.org # 5.15+
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20230607170837.1559-1-demi@invisiblethingslab.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: add overflow checks for Amiga partition support</title>
<updated>2023-07-19T14:21:47Z</updated>
<author>
<name>Michael Schmitz</name>
<email>schmitzmic@gmail.com</email>
</author>
<published>2023-06-20T20:17:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=40d6a1261a9c6247c9c00e695ff45f433e13339b'/>
<id>urn:sha1:40d6a1261a9c6247c9c00e695ff45f433e13339b</id>
<content type='text'>
commit b6f3f28f604ba3de4724ad82bea6adb1300c0b5f upstream.

The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.

Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.

Bail out if sector addresses overflow 32 bits on kernels without LBD
support.

This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow &lt;jdow@earthlink.net&gt;. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 in this series).
This patch adds additional error checking and warning messages.

Reported-by: Martin Steigerwald &lt;Martin@lichtvoll.de&gt;
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: &lt;201206192146.09327.Martin@lichtvoll.de&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 5.2
Signed-off-by: Michael Schmitz &lt;schmitzmic@gmail.com&gt;
Reviewed-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Link: https://lore.kernel.org/r/20230620201725.7020-4-schmitzmic@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: fix signed int overflow in Amiga partition support</title>
<updated>2023-07-19T14:21:47Z</updated>
<author>
<name>Michael Schmitz</name>
<email>schmitzmic@gmail.com</email>
</author>
<published>2023-06-20T20:17:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a4c79ea1e9ed5cc1cfb786112e4717e67ed5fa9e'/>
<id>urn:sha1:a4c79ea1e9ed5cc1cfb786112e4717e67ed5fa9e</id>
<content type='text'>
commit fc3d092c6bb48d5865fec15ed5b333c12f36288c upstream.

The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.

Use sector_t as type for sector address and size to allow using disks
up to 2 TB without LBD support, and disks larger than 2 TB with LBD.

This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow &lt;jdow@earthlink.net&gt;. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted. This patch differs from Joanne's patch only in its use of
sector_t instead of unsigned int. No checking for overflows is done
(see patch 3 of this series for that).

Reported-by: Martin Steigerwald &lt;Martin@lichtvoll.de&gt;
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: &lt;201206192146.09327.Martin@lichtvoll.de&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 5.2
Signed-off-by: Michael Schmitz &lt;schmitzmic@gmail.com&gt;
Tested-by: Martin Steigerwald &lt;Martin@lichtvoll.de&gt;
Reviewed-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20230620201725.7020-2-schmitzmic@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>block: fix blktrace debugfs entries leakage</title>
<updated>2023-07-19T14:20:58Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-10T02:20:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa07e56c6a9c7558165690d14eed4fe8babf34fb'/>
<id>urn:sha1:aa07e56c6a9c7558165690d14eed4fe8babf34fb</id>
<content type='text'>
[ Upstream commit dd7de3704af9989b780693d51eaea49a665bd9c2 ]

Commit 99d055b4fd4b ("block: remove per-disk debugfs files in
blk_unregister_queue") moves blk_trace_shutdown() from
blk_release_queue() to blk_unregister_queue(), this is safe if blktrace
is created through sysfs, however, there is a regression in corner
case.

blktrace can still be enabled after del_gendisk() through ioctl if
the disk is opened before del_gendisk(), and if blktrace is not shutdown
through ioctl before closing the disk, debugfs entries will be leaked.

Fix this problem by shutdown blktrace in disk_release(), this is safe
because blk_trace_remove() is reentrant.

Fixes: 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Link: https://lore.kernel.org/r/20230610022003.2557284-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix potential io hang by wrong 'wake_batch'</title>
<updated>2023-07-19T14:20:55Z</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-10T02:30:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=931bd6758bcc0f3d27d0492456798afae028e52f'/>
<id>urn:sha1:931bd6758bcc0f3d27d0492456798afae028e52f</id>
<content type='text'>
[ Upstream commit 4f1731df60f9033669f024d06ae26a6301260b55 ]

In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:

t1:			t2:
_blk_mq_tag_busy	blk_mq_tag_busy
inc active_queues
// assume 1-&gt;2
			inc active_queues
			// 2 -&gt; 3
			blk_mq_update_wake_batch
			// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */

Fix this problem by protecting them wih 'tags-&gt;lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.

Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost</title>
<updated>2023-07-19T14:20:55Z</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-05-27T09:19:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ceeb3fc86a83700bb1585c189006080a47e8506'/>
<id>urn:sha1:8ceeb3fc86a83700bb1585c189006080a47e8506</id>
<content type='text'>
[ Upstream commit 8d211554679d0b23702bd32ba04aeac0c1c4f660 ]

adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled
when unlock. DEADLOCK might happen if we have held other locks and disabled
IRQ before invoking it.

Fix it by using spin_lock_irqsave() instead, which can keep IRQ state
consistent with before when unlock.

  ================================
  WARNING: inconsistent lock state
  5.10.0-02758-g8e5f91fd772f #26 Not tainted
  --------------------------------
  inconsistent {IN-HARDIRQ-W} -&gt; {HARDIRQ-ON-W} usage.
  kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes:
  ffff888118c00c28 (&amp;bfqd-&gt;lock){?.-.}-{2:2}, at: spin_lock_irq
  ffff888118c00c28 (&amp;bfqd-&gt;lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
  {IN-HARDIRQ-W} state was registered at:
    __lock_acquire+0x3d7/0x1070
    lock_acquire+0x197/0x4a0
    __raw_spin_lock_irqsave
    _raw_spin_lock_irqsave+0x3b/0x60
    bfq_idle_slice_timer_body
    bfq_idle_slice_timer+0x53/0x1d0
    __run_hrtimer+0x477/0xa70
    __hrtimer_run_queues+0x1c6/0x2d0
    hrtimer_interrupt+0x302/0x9e0
    local_apic_timer_interrupt
    __sysvec_apic_timer_interrupt+0xfd/0x420
    run_sysvec_on_irqstack_cond
    sysvec_apic_timer_interrupt+0x46/0xa0
    asm_sysvec_apic_timer_interrupt+0x12/0x20
  irq event stamp: 837522
  hardirqs last  enabled at (837521): [&lt;ffffffff84b9419d&gt;] __raw_spin_unlock_irqrestore
  hardirqs last  enabled at (837521): [&lt;ffffffff84b9419d&gt;] _raw_spin_unlock_irqrestore+0x3d/0x40
  hardirqs last disabled at (837522): [&lt;ffffffff84b93fa3&gt;] __raw_spin_lock_irq
  hardirqs last disabled at (837522): [&lt;ffffffff84b93fa3&gt;] _raw_spin_lock_irq+0x43/0x50
  softirqs last  enabled at (835852): [&lt;ffffffff84e00558&gt;] __do_softirq+0x558/0x8ec
  softirqs last disabled at (835845): [&lt;ffffffff84c010ff&gt;] asm_call_irq_on_stack+0xf/0x20

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&amp;bfqd-&gt;lock);
    &lt;Interrupt&gt;
      lock(&amp;bfqd-&gt;lock);

   *** DEADLOCK ***

  3 locks held by kworker/2:3/388:
   #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0
   #1: ffff8881176bfdd8 ((work_completion)(&amp;td-&gt;dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0
   #2: ffff888118c00c28 (&amp;bfqd-&gt;lock){?.-.}-{2:2}, at: spin_lock_irq
   #2: ffff888118c00c28 (&amp;bfqd-&gt;lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390

  stack backtrace:
  CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  Workqueue: kthrotld blk_throtl_dispatch_work_fn
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x107/0x167
   print_usage_bug
   valid_state
   mark_lock_irq.cold+0x32/0x3a
   mark_lock+0x693/0xbc0
   mark_held_locks+0x9e/0xe0
   __trace_hardirqs_on_caller
   lockdep_hardirqs_on_prepare.part.0+0x151/0x360
   trace_hardirqs_on+0x5b/0x180
   __raw_spin_unlock_irq
   _raw_spin_unlock_irq+0x24/0x40
   spin_unlock_irq
   adjust_inuse_and_calc_cost+0x4fb/0x970
   ioc_rqos_merge+0x277/0x740
   __rq_qos_merge+0x62/0xb0
   rq_qos_merge
   bio_attempt_back_merge+0x12c/0x4a0
   blk_mq_sched_try_merge+0x1b6/0x4d0
   bfq_bio_merge+0x24a/0x390
   __blk_mq_sched_bio_merge+0xa6/0x460
   blk_mq_sched_bio_merge
   blk_mq_submit_bio+0x2e7/0x1ee0
   __submit_bio_noacct_mq+0x175/0x3b0
   submit_bio_noacct+0x1fb/0x270
   blk_throtl_dispatch_work_fn+0x1ef/0x2b0
   process_one_work+0x83e/0x13f0
   process_scheduled_works
   worker_thread+0x7e3/0xd80
   kthread+0x353/0x470
   ret_from_fork+0x1f/0x30

Fixes: b0853ab4a238 ("blk-iocost: revamp in-period donation snapbacks")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230527091904.3001833-1-linan666@huaweicloud.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: fix blk_mq_hw_ctx active request accounting</title>
<updated>2023-06-14T09:15:31Z</updated>
<author>
<name>Tian Lan</name>
<email>tian.lan@twosigma.com</email>
</author>
<published>2023-05-13T22:12:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b64bbe8b1a45a27b9ea3dccfd82dfa138e9c5277'/>
<id>urn:sha1:b64bbe8b1a45a27b9ea3dccfd82dfa138e9c5277</id>
<content type='text'>
[ Upstream commit ddad59331a4e16088468ca0ad228a9fe32d7955a ]

The nr_active counter continues to increase over time which causes the
blk_mq_get_tag to hang until the thread is rescheduled to a different
core despite there are still tags available.

kernel-stack

  INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
  Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
  "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
    Call Trace:
    &lt;TASK&gt;
    __schedule+0x351/0xa20
    scheduler+0x5d/0xe0
    io_schedule+0x42/0x70
    blk_mq_get_tag+0x11a/0x2a0
    ? dequeue_task_stop+0x70/0x70
    __blk_mq_alloc_requests+0x191/0x2e0

kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before
__blk_mq_free_request being called.

  320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
         b'__blk_mq_free_request+0x1 [kernel]'
         b'bt_iter+0x50 [kernel]'
         b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
         b'blk_mq_timeout_work+0x7c [kernel]'
         b'process_one_work+0x1c4 [kernel]'
         b'worker_thread+0x4d [kernel]'
         b'kthread+0xe6 [kernel]'
         b'ret_from_fork+0x1f [kernel]'

Signed-off-by: Tian Lan &lt;tian.lan@twosigma.com&gt;
Fixes: 2e315dc07df0 ("blk-mq: grab rq-&gt;refcount before calling -&gt;fn in blk_mq_tagset_busy_iter")
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Link: https://lore.kernel.org/r/20230513221227.497327-1-tilan7663@gmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
