user/sven/linux.git/block, branch v4.16.14

block: do not use interruptible wait anywhere

2018-05-01T19:47:25Z

commit 1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 upstream. When blk_queue_enter() waits for a queue to unfreeze, or unset the PREEMPT_ONLY flag, do not allow it to be interrupted by a signal. The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI device is resumed asynchronously, i.e. after un-freezing userspace tasks. So that commit exposed the bug as a regression in v4.15. A mysterious SIGBUS (or -EIO) sometimes happened during the time the device was being resumed. Most frequently, there was no kernel log message, and we saw Xorg or Xwayland killed by SIGBUS.[1] [1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979 Without this fix, I get an IO error in this test: # dd if=/dev/sda of=/dev/null iflag=direct & \ while killall -SIGUSR1 dd; do sleep 0.1; done & \ echo mem > /sys/power/state ; \ sleep 5; killall dd # stop after 5 seconds The interruptible wait was added to blk_queue_enter in commit 3ef28e83ab15 ("block: generic request_queue reference counting"). Before then, the interruptible wait was only in blk-mq, but I don't think it could ever have been correct. Reviewed-by: Bart Van Assche Cc: stable@vger.kernel.org Signed-off-by: Alan Jenkins Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

bfq-iosched: ensure to clear bic/bfqq pointers when preparing request

2018-05-01T19:47:25Z

commit 72961c4e6082be79825265d9193272b8a1634dec upstream. Even if we don't have an IO context attached to a request, we still need to clear the priv[0..1] pointers, as they could be pointing to previously used bic/bfqq structures. If we don't do so, we'll either corrupt memory on dispatching a request, or cause an imbalance in counters. Inspired by a fix from Kees. Reported-by: Oleksandr Natalenko Reported-by: Kees Cook Cc: stable@vger.kernel.org Fixes: aee69d78dec0 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: start request gstate with gen 1

2018-05-01T19:47:25Z

commit f4560231ec42092c6662acccabb28c6cac9f5dfb upstream. rq->gstate and rq->aborted_gstate both are zero before rqs are allocated. If we have a small timeout, when the timer fires, there could be rqs that are never allocated, and also there could be rq that has been allocated but not initialized and started. At the moment, the rq->gstate and rq->aborted_gstate both are 0, thus the blk_mq_terminate_expired will identify the rq is timed out and invoke .timeout early. For scsi, this will cause scsi_times_out to be invoked before the scsi_cmnd is not initialized, scsi_cmnd->device is still NULL at the moment, then we will get crash. Cc: Bart Van Assche Cc: Tejun Heo Cc: Ming Lei Cc: Martin Steigerwald Cc: stable@vger.kernel.org Signed-off-by: Jianchao Wang Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: don't keep offline CPUs mapped to hctx 0

2018-04-19T06:54:09Z

commit bffa9909a6b48d8ca3398dec601bc9162a4020c4 upstream. From commit 4b855ad37194 ("blk-mq: Create hctx for each present CPU), blk-mq doesn't remap queue after CPU topo is changed, that said when some of these offline CPUs become online, they are still mapped to hctx 0, then hctx 0 may become the bottleneck of IO dispatch and completion. This patch sets up the mapping from the beginning, and aligns to queue mapping for PCI device (blk_mq_pci_map_queues()). Cc: Stefan Haberland Cc: Keith Busch Cc: stable@vger.kernel.org Fixes: 4b855ad37194 ("blk-mq: Create hctx for each present CPU) Tested-by: Christian Borntraeger Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: make sure that correct hctx->next_cpu is set

2018-04-19T06:54:09Z

commit a1c735fb790745f94a359df45c11df4a69760389 upstream. From commit 20e4d81393196 (blk-mq: simplify queue mapping & schedule with each possisble CPU), one hctx can be mapped from all offline CPUs, then hctx->next_cpu can be set as wrong. This patch fixes this issue by making hctx->next_cpu pointing to the first CPU in hctx->cpumask if all CPUs in hctx->cpumask are offline. Cc: Stefan Haberland Tested-by: Christian Borntraeger Reviewed-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Fixes: 20e4d81393196 ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: order getting budget and driver tag

2018-04-19T06:54:09Z

commit 0bca799b92807ee9be0890690f5dde7d8c6a8e25 upstream. This patch orders getting budget and driver tag by making sure to acquire driver tag after budget is got, this way can help to avoid the following race: 1) before dispatch request from scheduler queue, get one budget first, then dequeue a request, call it request A. 2) in another IO path for dispatching request B which is from hctx->dispatch, driver tag is got, then try to get budget in blk_mq_dispatch_rq_list(), unfortunately the budget is held by request A. 3) meantime blk_mq_dispatch_rq_list() is called for dispatching request A, and try to get driver tag first, unfortunately no driver tag is available because the driver tag is held by request B 4) both two IO pathes can't move on, and IO stall is caused. This issue can be observed when running dbench on USB storage. This patch fixes this issue by always getting budget before getting driver tag. Cc: stable@vger.kernel.org Fixes: de1482974080ec9e ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops") Cc: Christoph Hellwig Cc: Bart Van Assche Cc: Omar Sandoval Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

blk-mq: Directly schedule q->timeout_work when aborting a request

2018-04-19T06:54:09Z

commit bc6d65e6dc89c3b7ff78e4ad797117c122ffde8e upstream. Request abortion is performed by overriding deadline to now and scheduling timeout handling immediately. For the latter part, the code was using mod_timer(timeout, 0) which can't guarantee that the timer runs afterwards. Let's schedule the underlying work item directly instead. This fixes the hangs during probing reported by Sitsofe but it isn't yet clear to me how the failure can happen reliably if it's just the above described race condition. Signed-off-by: Tejun Heo Reported-by: Sitsofe Wheeler Reported-by: Meelis Roos Fixes: 358f70da49d7 ("blk-mq: make blk_abort_request() trigger timeout path") Cc: stable@vger.kernel.org # v4.16 Link: http://lkml.kernel.org/r/CALjAwxh-PVYFnYFCJpGOja+m5SzZ8Sa4J7ohxdK=r8NyOF-EMA@mail.gmail.com Link: http://lkml.kernel.org/r/alpine.LRH.2.21.1802261049140.4893@math.ut.ee Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}()

2018-04-19T06:54:08Z

commit 818e0fa293ca836eba515615c64680ea916fd7cd upstream. scsi_device_quiesce() uses synchronize_rcu() to guarantee that the effect of blk_set_preempt_only() will be visible for percpu_ref_tryget() calls that occur after the queue unfreeze by using the approach explained in https://lwn.net/Articles/573497/. The rcu read lock and unlock calls in blk_queue_enter() form a pair with the synchronize_rcu() call in scsi_device_quiesce(). Both scsi_device_quiesce() and blk_queue_enter() must either use regular RCU or RCU-sched. Since neither the RCU-protected code in blk_queue_enter() nor blk_queue_usage_counter_release() sleeps, regular RCU protection is sufficient. Note: scsi_device_quiesce() does not have to be modified since it already uses synchronize_rcu(). Reported-by: Tejun Heo Fixes: 3a0a529971ec ("block, scsi: Make SCSI quiesce and resume work reliably") Signed-off-by: Bart Van Assche Acked-by: Tejun Heo Cc: Tejun Heo Cc: Hannes Reinecke Cc: Ming Lei Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Oleksandr Natalenko Cc: Martin Steigerwald Cc: stable@vger.kernel.org # v4.15 Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

Fix slab name "biovec-(1<<(21-12))"

2018-04-08T12:29:52Z

commit bd5c4facf59648581d2f1692dad7b107bf429954 upstream. I'm getting a slab named "biovec-(1<<(21-12))". It is caused by unintended expansion of the macro BIO_MAX_PAGES. This patch renames it to biovec-max. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

Merge tag 'for-linus-20180302' of git://git.kernel.dk/linux-block

2018-03-02T17:35:36Z

Pull block fixes from Jens Axboe: "A collection of fixes for this series. This is a little larger than usual at this time, but that's mainly because I was out on vacation last week. Nothing in here is major in any way, it's just two weeks of fixes. This contains: - NVMe pull from Keith, with a set of fixes from the usual suspects. - mq-deadline zone unlock fix from Damien, fixing an issue with the SMR zone locking added for 4.16. - two bcache fixes sent in by Michael, with changes from Coly and Tang. - comment typo fix from Eric for blktrace. - return-value error handling fix for nbd, from Gustavo. - fix a direct-io case where we don't defer to a completion handler, making us sleep from IRQ device completion. From Jan. - a small series from Jan fixing up holes around handling of bdev references. - small set of regression fixes from Jiufei, mostly fixing problems around the gendisk pointer -> partition index change. - regression fix from Ming, fixing a boundary issue with the discard page cache invalidation. - two-patch series from Ming, fixing both a core blk-mq-sched and kyber issue around token freeing on a requeue condition" * tag 'for-linus-20180302' of git://git.kernel.dk/linux-block: (24 commits) block: fix a typo block: display the correct diskname for bio block: fix the count of PGPGOUT for WRITE_SAME mq-deadline: Make sure to always unlock zones nvmet: fix PSDT field check in command format nvme-multipath: fix sysfs dangerously created links nbd: fix return value in error handling path bcache: fix kcrashes with fio in RAID5 backend dev bcache: correct flash only vols (check all uuids) blktrace_api.h: fix comment for struct blk_user_trace_setup blockdev: Avoid two active bdev inodes for one device genhd: Fix BUG in blkdev_open() genhd: Fix use after free in __blkdev_get() genhd: Add helper put_disk_and_module() genhd: Rename get_disk() to get_disk_and_module() genhd: Fix leaked module reference for NVME devices direct-io: Fix sleep in atomic due to sync AIO nvme-pci: Fix nvme queue cleanup if IRQ setup fails block: kyber: fix domain token leak during requeue blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch ...