<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/blk-mq.h, branch v5.4.249</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.249</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.249'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-09-06T01:52:34Z</updated>
<entry>
<title>block: Delay default elevator initialization</title>
<updated>2019-09-06T01:52:34Z</updated>
<author>
<name>Damien Le Moal</name>
<email>damien.lemoal@wdc.com</email>
</author>
<published>2019-09-05T09:51:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=737eb78e82d52d35df166d29af32bf61992de71d'/>
<id>urn:sha1:737eb78e82d52d35df166d29af32bf61992de71d</id>
<content type='text'>
When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, preventing to correctly select the default elevator
most suitable for the device.

This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:
1) elevator_init = false is used for calls to
   blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
   case, a call to elevator_init_mq() is added to __device_add_disk(),
   resulting in the delayed initialization of the queue elevator
   after the device driver finished probing the device information. This
   effectively allows elevator_init_mq() access to more information
   about the device.
2) elevator_init = true preserves the current behavior of initializing
   the elevator directly from blk_mq_init_allocated_queue(). This case
   is used for the special request based DM devices where the device
   gendisk is created before the queue initialization and device
   information (e.g. queue limits) is already known when the queue
   initialization is executed.

Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().

Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Damien Le Moal &lt;damien.lemoal@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Remove blk_mq_register_dev()</title>
<updated>2019-08-27T16:40:19Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2019-08-27T11:01:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9685b2270211628e27ea7880a02b52efd4524099'/>
<id>urn:sha1:9685b2270211628e27ea7880a02b52efd4524099</id>
<content type='text'>
This function has no callers. Hence remove it.

Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: add callback of .cleanup_rq</title>
<updated>2019-08-05T03:42:06Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-07-25T02:04:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=226b4fc75c78f9c497c5182d939101b260cfb9f3'/>
<id>urn:sha1:226b4fc75c78f9c497c5182d939101b260cfb9f3</id>
<content type='text'>
SCSI maintains its own driver private data hooked off of each SCSI
request, and the pridate data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().

This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.

So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq).  Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq.  A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.

Cc: Ewan D. Milne &lt;emilne@redhat.com&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: Hannes Reinecke &lt;hare@suse.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Cc: &lt;stable@vger.kernel.org&gt;
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: remove blk_mq_complete_request_sync</title>
<updated>2019-08-05T03:41:29Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-07-24T03:48:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a87ccce0b5a06ee546931859fa62e10f1bce54f9'/>
<id>urn:sha1:a87ccce0b5a06ee546931859fa62e10f1bce54f9</id>
<content type='text'>
blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.

Cc: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: introduce blk_mq_tagset_wait_completed_request()</title>
<updated>2019-08-05T03:41:29Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-07-24T03:48:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9934a80f91dba8c7029ba7601459e41ea7770aa'/>
<id>urn:sha1:f9934a80f91dba8c7029ba7601459e41ea7770aa</id>
<content type='text'>
blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &amp;
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: introduce blk_mq_request_completed()</title>
<updated>2019-08-05T03:41:29Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-07-24T03:48:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa306ab703e9452b1e25cc8e8f04b8df523d0bb8'/>
<id>urn:sha1:aa306ab703e9452b1e25cc8e8f04b8df523d0bb8</id>
<content type='text'>
NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy &lt;maxg@mellanox.com&gt;
Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Keith Busch &lt;keith.busch@intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: remove the bi_phys_segments field in struct bio</title>
<updated>2019-06-20T16:29:22Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2019-06-06T10:29:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14ccb66b3f585b2bc21e7256c96090abed5a512c'/>
<id>urn:sha1:14ccb66b3f585b2bc21e7256c96090abed5a512c</id>
<content type='text'>
We only need the number of segments in the blk-mq submission path.
Remove the field from struct bio, and return it from a variant of
blk_queue_split instead of that it can passed as an argument to
those functions that need the value.

This also means we stop recounting segments except for cloning
and partial segments.

To keep the number of arguments in this how path down remove
pointless struct request_queue arguments from any of the functions
that had it and grew a nr_segs argument.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: always free hctx after request queue is freed</title>
<updated>2019-05-04T13:24:08Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-30T01:52:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2f8f1336a48bd5186de3476da0a3e2ec06d0533a'/>
<id>urn:sha1:2f8f1336a48bd5186de3476da0a3e2ec06d0533a</id>
<content type='text'>
In normal queue cleanup path, hctx is released after request queue
is freed, see blk_mq_release().

However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queues shrinking. This way is easy to cause use-after-free,
because: one implicit rule is that it is safe to call almost all block
layer APIs if the request queue is alive; and one hctx may be retrieved
by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues();
finally use-after-free is triggered.

Fixes this issue by always freeing hctx after releasing request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to resuse these hctxs if numa
node is matched.

Cc: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen &lt;martin.petersen@oracle.com&gt;,
Cc: Christoph Hellwig &lt;hch@lst.de&gt;,
Cc: James E . J . Bottomley &lt;jejb@linux.vnet.ibm.com&gt;,
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Tested-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>blk-mq: introduce blk_mq_complete_request_sync()</title>
<updated>2019-04-10T15:57:33Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-08T22:31:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1b8f21b74c3c9c82fce5a751d7aefb7cc0b8d33d'/>
<id>urn:sha1:1b8f21b74c3c9c82fce5a751d7aefb7cc0b8d33d</id>
<content type='text'>
In NVMe's error handler, follows the typical steps of tearing down
hardware for recovering controller:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
	blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
	mark the request as abort
	blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be race between #3 and #4, because blk_mq_complete_request()
may run q-&gt;mq_ops-&gt;complete(rq) remotelly and asynchronously, and
-&gt;complete(rq) may be run after #4.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.

Cc: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Cc: Bart Van Assche &lt;bvanassche@acm.org&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>block: Unexport blk_mq_add_to_requeue_list()</title>
<updated>2019-03-20T20:19:36Z</updated>
<author>
<name>Bart Van Assche</name>
<email>bvanassche@acm.org</email>
</author>
<published>2019-03-20T20:14:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6c987120e24cb913cb7bd4e675129a30fa49e0d'/>
<id>urn:sha1:e6c987120e24cb913cb7bd4e675129a30fa49e0d</id>
<content type='text'>
This function is not used outside the block layer core. Hence unexport it.

Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
</feed>
