summaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)Author
2020-04-17xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()Juergen Gross
commit 3a169c0be75b59dd85d159493634870cdec6d3c4 upstream. Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") didn't fix the issue it was meant to, as the flags for allocating the memory are GFP_NOIO, which will lead the memory allocation falling back to kmalloc(). So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section. Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17null_blk: fix spurious IO errors after failed past-wp accessAlexey Dobriyan
[ Upstream commit ff77042296d0a54535ddf74412c5ae92cb4ec76a ] Steps to reproduce: BLKRESETZONE zone 0 // force EIO pwrite(fd, buf, 4096, 4096); [issue more IO including zone ioctls] It will start failing randomly including IO to unrelated zones because of ->error "reuse". Trigger can be partition detection as well if test is not run immediately which is even more entertaining. The fix is of course to clear ->error where necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17null_blk: Handle null_add_dev() failures properlyBart Van Assche
[ Upstream commit 9b03b713082a31a5b90e0a893c72aa620e255c26 ] If null_add_dev() fails then null_del_dev() is called with a NULL argument. Make null_del_dev() handle this scenario correctly. This patch fixes the following KASAN complaint: null-ptr-deref in null_del_dev+0x28/0x280 [null_blk] Read of size 8 at addr 0000000000000000 by task find/1062 Call Trace: dump_stack+0xa5/0xe6 __kasan_report.cold+0x65/0x99 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 null_del_dev+0x28/0x280 [null_blk] nullb_group_drop_item+0x7e/0xa0 [null_blk] client_drop_item+0x53/0x80 [configfs] configfs_rmdir+0x395/0x4e0 [configfs] vfs_rmdir+0xb6/0x220 do_rmdir+0x238/0x2c0 __x64_sys_unlinkat+0x75/0x90 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17null_blk: Fix the null_add_dev() error pathBart Van Assche
[ Upstream commit 2004bfdef945fe55196db6b9cdf321fbc75bb0de ] If null_add_dev() fails, clear dev->nullb. This patch fixes the following KASAN complaint: BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk] Read of size 8 at addr ffff88803280fc30 by task check/8409 Call Trace: dump_stack+0xa5/0xe6 print_address_description.constprop.0+0x26/0x260 __kasan_report.cold+0x7b/0x99 kasan_report+0x16/0x20 __asan_load8+0x58/0x90 nullb_device_submit_queues_store+0xcf/0x160 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7ff370926317 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317 RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001 RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001 R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002 R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0 Allocated by task 8409: save_stack+0x23/0x90 __kasan_kmalloc.constprop.0+0xcf/0xe0 kasan_kmalloc+0xd/0x10 kmem_cache_alloc_node_trace+0x129/0x4c0 null_add_dev+0x24a/0xe90 [null_blk] nullb_device_power_store+0x1b6/0x270 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 8409: save_stack+0x23/0x90 __kasan_slab_free+0x112/0x160 kasan_slab_free+0x12/0x20 kfree+0xdf/0x250 null_add_dev+0xaf3/0xe90 [null_blk] nullb_device_power_store+0x1b6/0x270 [null_blk] configfs_write_file+0x1c4/0x250 [configfs] __vfs_write+0x4c/0x90 vfs_write+0x145/0x2c0 ksys_write+0xd7/0x180 __x64_sys_write+0x47/0x50 do_syscall_64+0x6f/0x2f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 2984c8684f96 ("nullb: factor disk parameters") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Johannes Thumshirn <jth@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-03-18virtio-blk: fix hw_queue stopped on arbitrary errorHalil Pasic
commit f5f6b95c72f7f8bb46eace8c5306c752d0133daa upstream. Since nobody else is going to restart our hw_queue for us, the blk_mq_start_stopped_hw_queues() is in virtblk_done() is not sufficient necessarily sufficient to ensure that the queue will get started again. In case of global resource outage (-ENOMEM because mapping failure, because of swiotlb full) our virtqueue may be empty and we can get stuck with a stopped hw_queue. Let us not stop the queue on arbitrary errors, but only on -EONSPC which indicates a full virtqueue, where the hw_queue is guaranteed to get started by virtblk_done() before when it makes sense to carry on submitting requests. Let us also remove a stale comment. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Fixes: f7728002c1c7 ("virtio_ring: fix return code on DMA mapping fails") Link: https://lore.kernel.org/r/20200213123728.61216-2-pasic@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-28floppy: check FDC index for errors before assigning itLinus Torvalds
commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream. Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in wait_til_ready(). Which on the face of it can't happen, since as Willy Tarreau points out, the function does no particular memory access. Except through the FDCS macro, which just indexes a static allocation through teh current fdc, which is always checked against N_FDC. Except the checking happens after we've already assigned the value. The floppy driver is a disgrace (a lot of it going back to my original horrd "design"), and has no real maintainer. Nobody has the hardware, and nobody really cares. But it still gets used in virtual environment because it's one of those things that everybody supports. The whole thing should be re-written, or at least parts of it should be seriously cleaned up. The 'current fdc' index, which is used by the FDCS macro, and which is often shadowed by a local 'fdc' variable, is a prime example of how not to write code. But because nobody has the hardware or the motivation, let's just fix up the immediate problem with a nasty band-aid: test the fdc index before actually assigning it to the static 'fdc' variable. Reported-by: Jordy Zomer <jordy@simplyhacker.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24brd: check and limit max_part parZhiqiang Liu
[ Upstream commit c8ab422553c81a0eb070329c63725df1cd1425bc ] In brd_init func, rd_nr num of brd_device are firstly allocated and add in brd_devices, then brd_devices are traversed to add each brd_device by calling add_disk func. When allocating brd_device, the disk->first_minor is set to i * max_part, if rd_nr * max_part is larger than MINORMASK, two different brd_device may have the same devt, then only one of them can be successfully added. when rmmod brd.ko, it will cause oops when calling brd_exit. Follow those steps: # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576 # rmmod brd then, the oops will appear. Oops log: [ 726.613722] Call trace: [ 726.614175] kernfs_find_ns+0x24/0x130 [ 726.614852] kernfs_find_and_get_ns+0x44/0x68 [ 726.615749] sysfs_remove_group+0x38/0xb0 [ 726.616520] blk_trace_remove_sysfs+0x1c/0x28 [ 726.617320] blk_unregister_queue+0x98/0x100 [ 726.618105] del_gendisk+0x144/0x2b8 [ 726.618759] brd_exit+0x68/0x560 [brd] [ 726.619501] __arm64_sys_delete_module+0x19c/0x2a0 [ 726.620384] el0_svc_common+0x78/0x130 [ 726.621057] el0_svc_handler+0x38/0x78 [ 726.621738] el0_svc+0x8/0xc [ 726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260) Here, we add brd_check_and_reset_par func to check and limit max_part par. -- V5->V6: - remove useless code V4->V5:(suggested by Ming Lei) - make sure max_part is not larger than DISK_MAX_PARTS V3->V4:(suggested by Ming Lei) - remove useless change - add one limit of max_part V2->V3: (suggested by Ming Lei) - clear .minors when running out of consecutive minor space in brd_alloc - remove limit of rd_nr V1->V2: - add more checks in brd_check_par_valid as suggested by Ming Lei. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24rbd: work around -Wuninitialized warningArnd Bergmann
[ Upstream commit a55e601b2f02df5db7070e9a37bd655c9c576a52 ] gcc -O3 warns about a dummy variable that is passed down into rbd_img_fill_nodata without being initialized: drivers/block/rbd.c: In function 'rbd_img_fill_nodata': drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized] fctx->iter = *fctx->pos; Since this is a dummy, I assume the warning is harmless, but it's better to initialize it anyway and avoid the warning. Fixes: mmtom ("init/Kconfig: enable -O3 for all arches") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24drivers/block/zram/zram_drv.c: fix error return codes not being returned in ↵Colin Ian King
writeback_store [ Upstream commit 3b82a051c10143639a378dcd12019f2353cc9054 ] Currently when an error code -EIO or -ENOSPC in the for-loop of writeback_store the error code is being overwritten by a ret = len assignment at the end of the function and the error codes are being lost. Fix this by assigning ret = len at the start of the function and remove the assignment from the end, hence allowing ret to be preserved when error codes are assigned to it. Addresses Coverity ("Unused value") Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com Fixes: a939888ec38b ("zram: support idle/huge page writeback") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-02-24nbd: add a flush_workqueue in nbd_start_deviceSun Ke
[ Upstream commit 5c0dd228b5fc30a3b732c7ae2657e0161ec7ed80 ] When kzalloc fail, may cause trying to destroy the workqueue from inside the workqueue. If num_connections is m (2 < m), and NO.1 ~ NO.n (1 < n < m) kzalloc are successful. The NO.(n + 1) failed. Then, nbd_start_device will return ENOMEM to nbd_start_device_ioctl, and nbd_start_device_ioctl will return immediately without running flush_workqueue. However, we still have n recv threads. If nbd_release run first, recv threads may have to drop the last config_refs and try to destroy the workqueue from inside the workqueue. To fix it, add a flush_workqueue in nbd_start_device. Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-17Merge tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Three fixes that should go into this release: - The 32-bit segment size fix that I mentioned last week (Ming) - Use uint for the block size (Mikulas) - A null_blk zone write handling fix (Damien)" * tag 'block-5.5-2020-01-16' of git://git.kernel.dk/linux-block: block: fix an integer overflow in logical block size null_blk: Fix zone write handling block: fix get_max_segment_size() overflow on 32bit arch
2020-01-15null_blk: Fix zone write handlingDamien Le Moal
null_zone_write() only allows writing empty and implicitly opened zones. Writing to closed and explicitly opened zones must also be allowed and the zone condition must be transitioned to implicit open if the zone is not explicitly opened already. Fixes: da644b2cc1a4 ("null_blk: add zone open, close, and finish support") Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-03Merge tag 'block-5.5-20200103' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Three fixes in here: - Fix for a missing split on default memory boundary mask (4G) (Ming) - Fix for multi-page read bio truncate (Ming) - Fix for null_blk zone close request handling (Damien)" * tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block: null_blk: Fix REQ_OP_ZONE_CLOSE handling block: fix splitting segments on boundary masks block: add bio_truncate to fix guard_bio_eod
2019-12-30null_blk: Fix REQ_OP_ZONE_CLOSE handlingDamien Le Moal
In order to match ZBC defined behavior, closing an empty zone must result in the "empty" zone condition instead of the "closed" condition. Fixes: da644b2cc1a4 ("null_blk: add zone open, close, and finish support") Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-27Merge tag 'block-5.5-20191226' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Only thing here are the changes from Arnd from last week, which now have the appropriate header include to ensure they actually compile if COMPAT is enabled" * tag 'block-5.5-20191226' of git://git.kernel.dk/linux-block: compat_ioctl: block: handle Persistent Reservations compat_ioctl: block: handle add zone open, close and finish ioctl compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE pktcdvd: fix regression on 64-bit architectures
2019-12-22Merge tag 'block-5.5-20191221' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Let's try this one again, this time without the compat_ioctl changes. We've got those fixed up, but that can go out next week. This contains: - block queue flush lockdep annotation (Bart) - Type fix for bsg_queue_rq() (Bart) - Three dasd fixes (Stefan, Jan) - nbd deadlock fix (Mike) - Error handling bio user map fix (Yang) - iocost fix (Tejun) - sbitmap waitqueue addition fix that affects the kyber IO scheduler (David)" * tag 'block-5.5-20191221' of git://git.kernel.dk/linux-block: sbitmap: only queue kyber's wait callback if not already active block: fix memleak when __blk_rq_map_user_iov() is failed s390/dasd: fix typo in copyright statement s390/dasd: fix memleak in path handling error case s390/dasd/cio: Interpret ccw_device_get_mdc return value correctly block: Fix a lockdep complaint triggered by request queue flushing block: Fix the type of 'sts' in bsg_queue_rq() block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT nbd: fix shutdown and recv work deadlock v2 iocost: over-budget forced IOs should schedule async delay
2019-12-21pktcdvd: fix regression on 64-bit architecturesArnd Bergmann
The support for the compat ioctl did not actually do what it was supposed to do because of a typo, instead it broke native support for CDROM_LAST_WRITTEN and CDROM_SEND_PACKET on all architectures with CONFIG_COMPAT enabled. Fixes: 1b114b0817cc ("pktcdvd: add compat_ioctl handler") Signed-off-by: Arnd Bergmann <arnd@arndb.de> ---- Please apply for v5.5, I just noticed the regression while rebasing some of the patches I created on top. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-21Merge tag 'for-linus-5.5b-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two cleanup patches and a small series for supporting reloading the Xen block backend driver" * tag 'for-linus-5.5b-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/grant-table: remove multiple BUG_ON on gnttab_interface xen-blkback: support dynamic unbind/bind xen/interface: re-define FRONT/BACK_RING_ATTACH() xenbus: limit when state is forced to closed xenbus: move xenbus_dev_shutdown() into frontend code... xen/blkfront: Adjust indentation in xlvbd_alloc_gendisk
2019-12-20xen-blkback: support dynamic unbind/bindPaul Durrant
By simply re-attaching to shared rings during connect_ring() rather than assuming they are freshly allocated (i.e assuming the counters are zero) it is possible for vbd instances to be unbound and re-bound from and to (respectively) a running guest. This has been tested by running: while true; do fio --name=randwrite --ioengine=libaio --iodepth=16 \ --rw=randwrite --bs=4k --direct=1 --size=1G --verify=crc32; done in a PV guest whilst running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; echo vbd-$DOMID-$VBD >bind; echo bound; sleep 3; done in dom0 from /sys/bus/xen-backend/drivers/vbd to continuously unbind and re-bind its system disk image. This is a highly useful feature for a backend module as it allows it to be unloaded and re-loaded (i.e. updated) without requiring domUs to be halted. This was also tested by running: while true; do echo vbd-$DOMID-$VBD >unbind; echo unbound; sleep 5; rmmod xen-blkback; echo unloaded; sleep 1; modprobe xen-blkback; echo bound; cd $(pwd); sleep 3; done in dom0 whilst running the same loop as above in the (single) PV guest. Some (less stressful) testing has also been done using a Windows HVM guest with the latest 9.0 PV drivers installed. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-20xen/blkfront: Adjust indentation in xlvbd_alloc_gendiskNathan Chancellor
Clang warns: ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] nr_parts = PARTS_PER_DISK; ^ ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here if (err) ^ This is because there is a space at the beginning of this line; remove it so that the indentation is consistent according to the Linux kernel coding style and clang no longer warns. While we are here, the previous line has some trailing whitespace; clean that up as well. Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD") Link: https://github.com/ClangBuiltLinux/linux/issues/791 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-16nbd: fix shutdown and recv work deadlock v2Mike Christie
This fixes a regression added with: commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 Author: Mike Christie <mchristi@redhat.com> Date: Sun Aug 4 14:10:06 2019 -0500 nbd: fix max number of supported devs where we can deadlock during device shutdown. The problem occurs if the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has returned and the userspace app has droppped its reference via closing the device and running nbd_release. The recv_work nbd_config_put call would then drop the refcount to zero and try to destroy the config which would try to do destroy_workqueue from the recv work. This patch just has nbd_start_device_ioctl do a flush_workqueue when it wakes so we know after the ioctl returns running works have exited. This also fixes a possible race where we could try to reuse the device while old recv_works are still running. Cc: stable@vger.kernel.org Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-15Merge tag 'for-linus-5.5b-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fixes: one for a resource accounting bug in some configurations and a fix for another patch which went into rc1" * tag 'for-linus-5.5b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/balloon: fix ballooned page accounting without hotplug enabled xen-blkback: prevent premature module unload
2019-12-13xen-blkback: prevent premature module unloadPaul Durrant
Objects allocated by xen_blkif_alloc come from the 'blkif_cache' kmem cache. This cache is destoyed when xen-blkif is unloaded so it is necessary to wait for the deferred free routine used for such objects to complete. This necessity was missed in commit 14855954f636 "xen-blkback: allow module to be cleanly unloaded". This patch fixes the problem by taking/releasing extra module references in xen_blkif_alloc/free() respectively. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-07Merge tag 'for-linus-5.5b-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - a patch to fix a build warning - a cleanup of no longer needed code in the Xen event handling - a small series for the Xen grant driver avoiding high order allocations and replacing an insane global limit by a per-call one - a small series fixing Xen frontend/backend module referencing * tag 'for-linus-5.5b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-blkback: allow module to be cleanly unloaded xen/xenbus: reference count registered modules xen/gntdev: switch from kcalloc() to kvcalloc() xen/gntdev: replace global limit of mapped pages by limit per call xen/gntdev: remove redundant non-zero check on ret xen/events: remove event handling recursion detection
2019-12-06Merge tag 'for-linus-20191205' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block and io_uring updates from Jens Axboe: "I wasn't expecting this to be so big, and if I was, I would have used separate branches for this. Going forward I'll be doing separate branches for the current tree, just like for the next kernel version tree. In any case, this contains: - Series from Christoph that fixes an inherent race condition with zoned devices and revalidation. - null_blk zone size fix (Damien) - Fix for a regression in this merge window that caused busy spins by sending empty disk uevents (Eric) - Fix for a regression in this merge window for bfq stats (Hou) - Fix for io_uring creds allocation failure handling (me) - io_uring -ERESTARTSYS send/recvmsg fix (me) - Series that fixes the need for applications to retain state across async request punts for io_uring. This one is a bit larger than I would have hoped, but I think it's important we get this fixed for 5.5. - connect(2) improvement for io_uring, handling EINPROGRESS instead of having applications needing to poll for it (me) - Have io_uring use a hash for poll requests instead of an rbtree. This turned out to work much better in practice, so I think we should make the switch now. For some workloads, even with a fair amount of cancellations, the insertion sort is just too expensive. (me) - Various little io_uring fixes (me, Jackie, Pavel, LimingWu) - Fix for brd unaligned IO, and a warning for the future (Ming) - Fix for a bio integrity data leak (Justin) - bvec_iter_advance() improvement (Pavel) - Xen blkback page unmap fix (SeongJae) The major items in here are all well tested, and on the liburing side we continue to add regression and feature test cases. We're up to 50 topic cases now, each with anywhere from 1 to more than 10 cases in each" * tag 'for-linus-20191205' of git://git.kernel.dk/linux-block: (33 commits) block: fix memleak of bio integrity data io_uring: fix a typo in a comment bfq-iosched: Ensure bio->bi_blkg is valid before using it io_uring: hook all linked requests via link_list io_uring: fix error handling in io_queue_link_head io_uring: use hash table for poll command lookups io-wq: clear node->next on list deletion io_uring: ensure deferred timeouts copy necessary data io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUT null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONED brd: warn on un-aligned buffer brd: remove max_hw_sectors queue limit xen/blkback: Avoid unmapping unmapped grant pages io_uring: handle connect -EINPROGRESS like -EAGAIN block: set the zone size in blk_revalidate_disk_zones atomically block: don't handle bio based drivers in blk_revalidate_disk_zones block: allocate the zone bitmaps lazily block: replace seq_zones_bitmap with conv_zones_bitmap block: simplify blkdev_nr_zones block: remove the empty line at the end of blk-zoned.c ...
2019-12-05Merge tag 'ceph-for-5.5-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The two highlights are a set of improvements to how rbd read-only mappings are handled and a conversion to the new mount API (slightly complicated by the fact that we had a common option parsing framework that called out into rbd and the filesystem instead of them calling into it). Also included a few scattered fixes and a MAINTAINERS update for rbd, adding Dongsheng as a reviewer" * tag 'ceph-for-5.5-rc1' of git://github.com/ceph/ceph-client: libceph, rbd, ceph: convert to use the new mount API rbd: ask for a weaker incompat mask for read-only mappings rbd: don't query snapshot features rbd: remove snapshot existence validation code rbd: don't establish watch for read-only mappings rbd: don't acquire exclusive lock for read-only mappings rbd: disallow read-write partitions on images mapped read-only rbd: treat images mapped read-only seriously rbd: introduce RBD_DEV_FLAG_READONLY rbd: introduce rbd_is_snap() ceph: don't leave ino field in ceph_mds_request_head uninitialized ceph: tone down loglevel on ceph_mdsc_build_path warning rbd: update MAINTAINERS info ceph: fix geting random mds from mdsmap rbd: fix spelling mistake "requeueing" -> "requeuing" ceph: make several helper accessors take const pointers libceph: drop unnecessary check from dispatch() in mon_client.c
2019-12-04null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONEDJens Axboe
If BLK_DEV_ZONED isn't set, 'ret' isn't used. This makes gcc complain, rightfully. Move ret where it is used. Fixes: 979d54475e0b ("null_blk: cleanup null_gendisk_register") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04brd: warn on un-aligned bufferMing Lei
Queue dma alignment limit requires users(fs, target, ...) of block layer to pass aligned buffer. So far brd doesn't support un-aligned buffer, even though it is easy to support it. However, given brd is often used for debug purpose, and there are other drivers which can't support un-aligned buffer too. So add warning so that brd users know what to fix. Reported-by: Stephen Rust <srust@blockbridge.com> Cc: Stephen Rust <srust@blockbridge.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04brd: remove max_hw_sectors queue limitMing Lei
Now we depend on blk_queue_split() to respect most of queue limit (the only one exception could be dma alignment), however blk_queue_split() isn't used for brd, so this limit isn't respected since v4.3. Also max_hw_sectors limit doesn't play a big role for brd, which is added since brd is added to tree for unknown reason. So remove it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04xen-blkback: allow module to be cleanly unloadedPaul Durrant
Add a module_exit() to perform the necessary clean-up. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-03xen/blkback: Avoid unmapping unmapped grant pagesSeongJae Park
For each I/O request, blkback first maps the foreign pages for the request to its local pages. If an allocation of a local page for the mapping fails, it should unmap every mapping already made for the request. However, blkback's handling mechanism for the allocation failure does not mark the remaining foreign pages as unmapped. Therefore, the unmap function merely tries to unmap every valid grant page for the request, including the pages not mapped due to the allocation failure. On a system that fails the allocation frequently, this problem leads to following kernel crash. [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 [ 372.012562] Oops: 0002 [#1] SMP [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 ... [ 372.012746] Call Trace: [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] ... [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] ... Decompressing Linux... Parsing ELF... done. Booting the kernel. [ 0.000000] Initializing cgroup subsys cpuset This commit fixes this problem by marking the grant pages of the given request that didn't mapped due to the allocation failure as invalid. Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") Reviewed-by: David Woodhouse <dwmw@amazon.de> Reviewed-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: set the zone size in blk_revalidate_disk_zones atomicallyChristoph Hellwig
The current zone revalidation code has a major problem in that it doesn't update the zone size and q->nr_zones atomically, leading to a short window where an out of bounds access to the zone arrays is possible. To fix this move the setting of the zone size into the crticial sections blk_revalidate_disk_zones so that it gets updated together with the zone bitmaps and q->nr_zones. This also slightly simplifies the caller as it deducts the zone size from the report_zones. This change also allows to check for a power of two zone size in generic code. Reported-by: Hans Holmberg <hans@owltronix.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: don't handle bio based drivers in blk_revalidate_disk_zonesChristoph Hellwig
bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03null_blk: cleanup null_gendisk_registerChristoph Hellwig
Use a saner size calculation, and do a trivial cleanup on the zone revalidation to prepare to future changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03null_blk: fix zone size paramter checkDamien Le Moal
For zoned=1 mode, the zone size must be a power of 2. Check this not only when the zone size is specified during modprobe, but also when creating a zoned null_blk device using configfs. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-01Merge tag 'for-linus-20191129' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "I wasn't going to send this one off so soon, but unfortunately one of the fixes from the previous pull broke the build on some archs. So I'm sending this sooner rather than later. This contains: - Add highmem.h include for io_uring, because of the kmap() additions from last round. For some reason the build bot didn't spot this even though it sat for days. - Three minor ';' removals - Add support for the Beurer CD-on-a-chip device - Make io_uring work on MMU-less archs" * tag 'for-linus-20191129' of git://git.kernel.dk/linux-block: io_uring: fix missing kmap() declaration on powerpc ataflop: Remove unneeded semicolon block: sunvdc: Remove unneeded semicolon drbd: Remove unneeded semicolon io_uring: add mapping support for NOMMU archs sr_vendor: support Beurer GL50 evo CD-on-a-chip devices. cdrom: respect device capabilities during opening action
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-28ataflop: Remove unneeded semicolonzhengbin
Fixes coccicheck warning: drivers/block/ataflop.c:860:53-54: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-28block: sunvdc: Remove unneeded semicolonzhengbin
Fixes coccicheck warning: drivers/block/sunvdc.c:637:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-28drbd: Remove unneeded semicolonzhengbin
Fixes coccicheck warning: drivers/block/drbd/drbd_req.c:887:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-27libceph, rbd, ceph: convert to use the new mount APIDavid Howells
Convert the ceph filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. [ Numerous string handling, leak and regression fixes; rbd conversion was particularly broken and had to be redone almost from scratch. ] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-25Merge tag 'printk-for-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow to print symbolic error names via new %pe modifier. - Use pr_warn() instead of the remaining pr_warning() calls. Fix formatting of the related lines. - Add VSPRINTF entry to MAINTAINERS. * tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits) checkpatch: don't warn about new vsprintf pointer extension '%pe' MAINTAINERS: Add VSPRINTF tools lib api: Renaming pr_warning to pr_warn ASoC: samsung: Use pr_warn instead of pr_warning lib: cpu_rmap: Use pr_warn instead of pr_warning trace: Use pr_warn instead of pr_warning dma-debug: Use pr_warn instead of pr_warning vgacon: Use pr_warn instead of pr_warning fs: afs: Use pr_warn instead of pr_warning sh/intc: Use pr_warn instead of pr_warning scsi: Use pr_warn instead of pr_warning platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning platform/x86: asus-laptop: Use pr_warn instead of pr_warning platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning oprofile: Use pr_warn instead of pr_warning of: Use pr_warn instead of pr_warning macintosh: Use pr_warn instead of pr_warning idsn: Use pr_warn instead of pr_warning ide: Use pr_warn instead of pr_warning crypto: n2: Use pr_warn instead of pr_warning ...
2019-11-25Merge tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull disk revalidation updates from Jens Axboe: "This continues the work that Jan Kara started to thoroughly cleanup and consolidate how we handle rescans and revalidations" * tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block: block: move clearing bd_invalidated into check_disk_size_change block: remove (__)blkdev_reread_part as an exported API block: fix bdev_disk_changed for non-partitioned devices block: move rescan_partitions to fs/block_dev.c block: merge invalidate_partitions into rescan_partitions block: refactor rescan_partitions
2019-11-25Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull zoned block device update from Jens Axboe: "Enhancements and improvements to the zoned device support" * tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block: scsi: sd_zbc: Remove set but not used variable 'buflen' block: rework zone reporting scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() null_blk: Add zone_nr_conv to features null_blk: clean up report zones null_blk: clean up the block device operations block: Remove partition support for zoned block devices block: Simplify report zones execution block: cleanup the !zoned case in blk_revalidate_disk_zones block: Enhance blk_revalidate_disk_zones()
2019-11-25Merge tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull additional block driver updates from Jens Axboe: "Here's another block driver update, done to avoid conflicts with the zoned changes coming next. This contains: - Prepare SCSI sd for zone open/close/finish support - Small NVMe pull request - hwmon support (Akinobu) - add new co-maintainer (Christoph) - work-around for a discard issue on non-conformant drives (Eduard) - Small nbd leak fix" * tag 'for-5.5/drivers-post-20191122' of git://git.kernel.dk/linux-block: nbd: prevent memory leak nvme: hwmon: add quirk to avoid changing temperature threshold nvme: hwmon: provide temperature min and max values for each sensor nvmet: add another maintainer nvme: Discard workaround for non-conformant devices nvme: Add hardware monitoring support scsi: sd_zbc: add zone open, close, and finish support
2019-11-25Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "Here are the main block driver updates for 5.5. Nothing major in here, mostly just fixes. This contains: - a set of bcache changes via Coly - MD changes from Song - loop unmap write-zeroes fix (Darrick) - spelling fixes (Geert) - zoned additions cleanups to null_blk/dm (Ajay) - allow null_blk online submit queue changes (Bart) - NVMe changes via Keith, nothing major here either" * tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits) Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()" drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET bcache: don't export symbols bcache: remove the extra cflags for request.o bcache: at least try to shrink 1 node in bch_mca_scan() bcache: add idle_max_writeback_rate sysfs interface bcache: add code comments in bch_btree_leaf_dirty() bcache: fix deadlock in bcache_allocator bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() bcache: deleted code comments for dead code in bch_data_insert_keys() bcache: add more accurate error messages in read_super() bcache: fix static checker warning in bcache_device_free() bcache: fix a lost wake-up problem caused by mca_cannibalize_lock bcache: fix fifo index swapping condition in journal_pin_cmp() md/raid10: prevent access of uninitialized resync_pages offset md: avoid invalid memory access for array sb->dev_roles md/raid1: avoid soft lockup under high load null_blk: add zone open, close, and finish support dm: add zone open, close and finish support ...
2019-11-25rbd: ask for a weaker incompat mask for read-only mappingsIlya Dryomov
For a read-only mapping, ask for a set of features that make the image only unwritable rather than both unreadable and unwritable by a client that doesn't understand them. As of today, the difference between them for krbd is journaling (JOURNALING) and live migration (MIGRATING). get_features method supports read_only parameter since hammer, ceph.git commit 6176ec5fde2a ("librbd: differentiate between R/O vs R/W RBD features"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-11-25rbd: don't query snapshot featuresIlya Dryomov
Since infernalis, ceph.git commit 281f87f9ee52 ("cls_rbd: get_features on snapshots returns HEAD image features"), querying and checking that is pointless. Userspace support for manipulating image features after image creation came also in infernalis, so a snapshot with a different set of features wasn't ever possible. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-11-25rbd: remove snapshot existence validation codeIlya Dryomov
RBD_DEV_FLAG_EXISTS check in rbd_queue_workfn() is racy and leads to inconsistent behaviour. If the object (or its snapshot) isn't there, the OSD returns ENOENT. A read submitted before the snapshot removal notification is processed would be zero-filled and ended with status OK, while future reads would be failed with IOERR. It also doesn't handle a case when an image that is mapped read-only is removed. On top of this, because watch is no longer established for read-only mappings, we no longer get notifications, so rbd_exists_validate() is effectively dead code. While failing requests rather than returning zeros is a good thing, RBD_DEV_FLAG_EXISTS is not it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
2019-11-25rbd: don't establish watch for read-only mappingsIlya Dryomov
With exclusive lock out of the way, watch is the only thing left that prevents a read-only mapping from being used with read-only OSD caps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>