<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/block, branch v5.1.14</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.1.14</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.1.14'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-06-22T06:09:14Z</updated>
<entry>
<title>blk-mq: Fix memory leak in error handling</title>
<updated>2019-06-22T06:09:14Z</updated>
<author>
<name>Jes Sorensen</name>
<email>jsorensen@fb.com</email>
</author>
<published>2019-04-19T20:35:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c17461bff063b5fa2b71a4359cd83af8570cc29'/>
<id>urn:sha1:4c17461bff063b5fa2b71a4359cd83af8570cc29</id>
<content type='text'>
[ Upstream commit 41de54c64811bf087c8464fdeb43c6ad8be2686b ]

If blk_mq_init_allocated_queue() fails, make sure to free the poll
stat callback struct allocated.

Signed-off-by: Jes Sorensen &lt;jsorensen@fb.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block, bfq: increase idling for weight-raised queues</title>
<updated>2019-06-15T09:53:04Z</updated>
<author>
<name>Paolo Valente</name>
<email>paolo.valente@linaro.org</email>
</author>
<published>2019-03-12T08:59:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=404ee275baec931a5c405c99dedc5abf6bb1d342'/>
<id>urn:sha1:404ee275baec931a5c405c99dedc5abf6bb1d342</id>
<content type='text'>
[ Upstream commit 778c02a236a8728bb992de10ed1f12c0be5b7b0e ]

If a sync bfq_queue has a higher weight than some other queue, and
remains temporarily empty while in service, then, to preserve the
bandwidth share of the queue, it is necessary to plug I/O dispatching
until a new request arrives for the queue. In addition, a timeout
needs to be set, to avoid waiting for ever if the process associated
with the queue has actually finished its I/O.

Even with the above timeout, the device is however not fed with new
I/O for a while, if the process has finished its I/O. If this happens
often, then throughput drops and latencies grow. For this reason, the
timeout is kept rather low: 8 ms is the current default.

Unfortunately, such a low value may cause, on the opposite end, a
violation of bandwidth guarantees for a process that happens to issue
new I/O too late. The higher the system load, the higher the
probability that this happens to some process. This is a problem in
scenarios where service guarantees matter more than throughput. One
important case are weight-raised queues, which need to be granted a
very high fraction of the bandwidth.

To address this issue, this commit lower-bounds the plugging timeout
for weight-raised queues to 20 ms. This simple change provides
relevant benefits. For example, on a PLEXTOR PX-256M5S, with which
gnome-terminal starts in 0.6 seconds if there is no other I/O in
progress, the same applications starts in
- 0.8 seconds, instead of 1.2 seconds, if ten files are being read
  sequentially in parallel
- 1 second, instead of 2 seconds, if, in parallel, five files are
  being read sequentially, and five more files are being written
  sequentially

Tested-by: Holger Hoffstätte &lt;holger@applied-asynchrony.com&gt;
Tested-by: Oleksandr Natalenko &lt;oleksandr@natalenko.name&gt;
Signed-off-by: Paolo Valente &lt;paolo.valente@linaro.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: move cancel of requeue_work into blk_mq_release</title>
<updated>2019-06-15T09:52:58Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-30T01:52:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9e2c1b3d0d5820faa4c83badf30e40b5b7e90984'/>
<id>urn:sha1:9e2c1b3d0d5820faa4c83badf30e40b5b7e90984</id>
<content type='text'>
[ Upstream commit fbc2a15e3433058582e5635aabe48a3011a644a8 ]

With holding queue's kobject refcount, it is safe for driver
to schedule requeue. However, blk_mq_kick_requeue_list() may
be called after blk_sync_queue() is done because of concurrent
requeue activities, then requeue work may not be completed when
freeing queue, and kernel oops is triggered.

So moving the cancel of requeue_work into blk_mq_release() for
avoiding race between requeue and freeing queue.

Cc: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen &lt;martin.petersen@oracle.com&gt;,
Cc: Christoph Hellwig &lt;hch@lst.de&gt;,
Cc: James E . J . Bottomley &lt;jejb@linux.vnet.ibm.com&gt;,
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Johannes Thumshirn &lt;jthumshirn@suse.de&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: pass page to xen_biovec_phys_mergeable</title>
<updated>2019-05-31T13:43:44Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-03-29T07:07:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=36478bcfcf86cd91bd12572fa9740fd4fd1a4444'/>
<id>urn:sha1:36478bcfcf86cd91bd12572fa9740fd4fd1a4444</id>
<content type='text'>
[ Upstream commit 0383ad4374f7ad7edd925a2ee4753035c3f5508a ]

xen_biovec_phys_mergeable() only needs .bv_page of the 2nd bio bvec
for checking if the two bvecs can be merged, so pass page to
xen_biovec_phys_mergeable() directly.

No function change.

Cc: ris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: Juergen Gross &lt;jgross@suse.com&gt;
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval &lt;osandov@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Boris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: avoid to break XEN by multi-page bvec</title>
<updated>2019-05-31T13:43:43Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-03-17T10:01:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0aa9a8d4983dea88e181c519cd50544f62238ad8'/>
<id>urn:sha1:0aa9a8d4983dea88e181c519cd50544f62238ad8</id>
<content type='text'>
[ Upstream commit db5ebd6edd2627d7e81a031643cf43587f63e66c ]

XEN has special page merge requirement, see xen_biovec_phys_mergeable().
We can't merge pages into one bvec simply for XEN.

So move XEN's specific check on page merge into __bio_try_merge_page(),
then abvoid to break XEN by multi-page bvec.

Cc: ris Ostrovsky &lt;boris.ostrovsky@oracle.com&gt;
Cc: xen-devel@lists.xenproject.org
Cc: Omar Sandoval &lt;osandov@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Juergen Gross &lt;jgross@suse.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: sed-opal: fix IOC_OPAL_ENABLE_DISABLE_MBR</title>
<updated>2019-05-31T13:43:36Z</updated>
<author>
<name>David Kozub</name>
<email>zub@linux.fjfi.cvut.cz</email>
</author>
<published>2019-02-14T00:15:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7c1c65c50558d1cd7986acd4e8947d1064ea15e1'/>
<id>urn:sha1:7c1c65c50558d1cd7986acd4e8947d1064ea15e1</id>
<content type='text'>
[ Upstream commit 78bf47353b0041865564deeed257a54f047c2fdc ]

The implementation of IOC_OPAL_ENABLE_DISABLE_MBR handled the value
opal_mbr_data.enable_disable incorrectly: enable_disable is expected
to be one of OPAL_MBR_ENABLE(0) or OPAL_MBR_DISABLE(1). enable_disable
was passed directly to set_mbr_done and set_mbr_enable_disable where
is was interpreted as either OPAL_TRUE(1) or OPAL_FALSE(0). The end
result was that calling IOC_OPAL_ENABLE_DISABLE_MBR with OPAL_MBR_ENABLE
actually disabled the shadow MBR and vice versa.

This patch adds correct conversion from OPAL_MBR_DISABLE/ENABLE to
OPAL_FALSE/TRUE. The change affects existing programs using
IOC_OPAL_ENABLE_DISABLE_MBR but this is typically used only once when
setting up an Opal drive.

Acked-by: Jon Derrick &lt;jonathan.derrick@intel.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Scott Bauer &lt;sbauer@plzdonthack.me&gt;
Signed-off-by: David Kozub &lt;zub@linux.fjfi.cvut.cz&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>block: fix use-after-free on gendisk</title>
<updated>2019-05-31T13:43:28Z</updated>
<author>
<name>Yufen Yu</name>
<email>yuyufen@huawei.com</email>
</author>
<published>2019-04-02T12:06:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5090dd8f2804fa746e668fe5c047ad1664ef9528'/>
<id>urn:sha1:5090dd8f2804fa746e668fe5c047ad1664ef9528</id>
<content type='text'>
[ Upstream commit 2c88e3c7ec32d7a40cc7c9b4a487cf90e4671bdd ]

commit 2da78092dda "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev-&gt;devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.

However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:

Process1		Worker			Process2
md_free
						blkdev_open
del_gendisk
  add delete_partition_work_fn() to wq
  						__blkdev_get
						get_gendisk
put_disk
  disk_release
    kfree(disk)
    						find part from ext_devt_idr
						get_disk_and_module(disk)
    					  	cause use after free

    			delete_partition_work_fn
			put_device(part)
    		  	part_release
		    	remove part from ext_devt_idr

Before &lt;devt, hd_struct pointer&gt; is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().

We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.

Thanks to Jan Kara for providing the solution and more clear comments
for the code.

Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime")
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Keith Busch &lt;keith.busch@intel.com&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Yufen Yu &lt;yuyufen@huawei.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: grab .q_usage_counter when queuing request from plug code path</title>
<updated>2019-05-31T13:43:15Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-30T01:52:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1ba71e3046cb28990597d818f9f442066c645652'/>
<id>urn:sha1:1ba71e3046cb28990597d818f9f442066c645652</id>
<content type='text'>
[ Upstream commit e87eb301bee183d82bb3d04bd71b6660889a2588 ]

Just like aio/io_uring, we need to grab 2 refcount for queuing one
request, one is for submission, another is for completion.

If the request isn't queued from plug code path, the refcount grabbed
in generic_make_request() serves for submission. In theroy, this
refcount should have been released after the sumission(async run queue)
is done. blk_freeze_queue() works with blk_sync_queue() together
for avoiding race between cleanup queue and IO submission, given async
run queue activities are canceled because hctx-&gt;run_work is scheduled with
the refcount held, so it is fine to not hold the refcount when
running the run queue work function for dispatch IO.

However, if request is staggered into plug list, and finally queued
from plug code path, the refcount in submission side is actually missed.
And we may start to run queue after queue is removed because the queue's
kobject refcount isn't guaranteed to be grabbed in flushing plug list
context, then kernel oops is triggered, see the following race:

blk_mq_flush_plug_list():
        blk_mq_sched_insert_requests()
                insert requests to sw queue or scheduler queue
                blk_mq_run_hw_queue

Because of concurrent run queue, all requests inserted above may be
completed before calling the above blk_mq_run_hw_queue. Then queue can
be freed during the above blk_mq_run_hw_queue().

Fixes the issue by grab .q_usage_counter before calling
blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
safe because the queue is absolutely alive before inserting request.

Cc: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen &lt;martin.petersen@oracle.com&gt;,
Cc: Christoph Hellwig &lt;hch@lst.de&gt;,
Cc: James E . J . Bottomley &lt;jejb@linux.vnet.ibm.com&gt;,
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: split blk_mq_alloc_and_init_hctx into two parts</title>
<updated>2019-05-31T13:43:15Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-30T01:52:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c415ae66aa1fb617149d31edbee25d4b698cc276'/>
<id>urn:sha1:c415ae66aa1fb617149d31edbee25d4b698cc276</id>
<content type='text'>
[ Upstream commit 7c6c5b7c9186e3fb5b10afb8e5f710ae661144c6 ]

Split blk_mq_alloc_and_init_hctx into two parts, and one is
blk_mq_alloc_hctx() for allocating all hctx resources, another
is blk_mq_init_hctx() for initializing hctx, which serves as
counter-part of blk_mq_exit_hctx().

Cc: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org
Cc: Martin K . Petersen &lt;martin.petersen@oracle.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: James E . J . Bottomley &lt;jejb@linux.vnet.ibm.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>blk-mq: free hw queue's resource in hctx's release handler</title>
<updated>2019-05-25T16:16:18Z</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2019-04-30T01:52:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=94ba80afbef465ff756e0cf0b62555853c9f5ad5'/>
<id>urn:sha1:94ba80afbef465ff756e0cf0b62555853c9f5ad5</id>
<content type='text'>
commit c7e2d94b3d1634988a95ac4d77a72dc7487ece06 upstream.

Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.

However, that commit introduces another issue. Before 45a9c9d909b2,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().

We have invented ways for addressing this kind of issue before, such as:

	8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
	c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")

But still can't cover all cases, recently James reports another such
kind of issue:

	https://marc.info/?l=linux-scsi&amp;m=155389088124782&amp;w=2

This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.

Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.

This approach follows typical design pattern wrt. kobject's release handler.

Cc: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
Cc: James Smart &lt;james.smart@broadcom.com&gt;
Cc: Bart Van Assche &lt;bart.vanassche@wdc.com&gt;
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen &lt;martin.petersen@oracle.com&gt;,
Cc: Christoph Hellwig &lt;hch@lst.de&gt;,
Cc: James E . J . Bottomley &lt;jejb@linux.vnet.ibm.com&gt;,
Reported-by: James Smart &lt;james.smart@broadcom.com&gt;
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke &lt;hare@suse.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: James Smart &lt;james.smart@broadcom.com&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
