summaryrefslogtreecommitdiff
path: root/drivers/block/loop.c
AgeCommit message (Collapse)Author
2022-08-31loop: Check for overflow while configuring loopSiddh Raman Pant
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream. The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset. loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h). The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset; This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos); Thus, check for negative value during loop_set_status_from_info(). Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Siddh Raman Pant <code@siddh.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-08loop: use sysfs_emit() in the sysfs xxx show()Chaitanya Kulkarni
[ Upstream commit b27824d31f09ea7b4a6ba2c1b18bd328df3e8bed ] sprintf does not know the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content and it's possible to overrun the PAGE_SIZE buffer length. Use a generic sysfs_emit function that knows the size of the temporary buffer and ensures that no overrun is done for offset attribute in loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-08block: loop:use kstatfs.f_bsize of backing file to set discard granularityMing Lei
[ Upstream commit 06582bc86d7f48d35cd044098ca1e246e8c7c52e ] If backing file's filesystem has implemented ->fallocate(), we think the loop device can support discard, then pass sb->s_blocksize as discard_granularity. However, some underlying FS, such as overlayfs, doesn't set sb->s_blocksize, and causes discard_granularity to be set as zero, then the warning in __blkdev_issue_discard() is triggered. Christoph suggested to pass kstatfs.f_bsize as discard granularity, and this way is fine because kstatfs.f_bsize means 'Optimal transfer block size', which still matches with definition of discard granularity. So fix the issue by setting discard_granularity as kstatfs.f_bsize if it is available, otherwise claims discard isn't supported. Cc: Christoph Hellwig <hch@lst.de> Cc: Vivek Goyal <vgoyal@redhat.com> Reported-by: Pei Zhang <pezhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126035830.296465-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-12-17loop: Use pr_warn_once() for loop_control_remove() warningTetsuo Handa
[ Upstream commit e3f9387aea67742b9d1f4de8e5bb2fd08a8a4584 ] kernel test robot reported that RCU stall via printk() flooding is possible [1] when stress testing. Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1] Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-21loop: Use blk_validate_block_size() to validate block sizeXie Yongji
commit af3c570fb0df422b4906ebd11c1bf363d89961d5 upstream. Remove loop_validate_block_size() and use the block layer helper to validate block size. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20211026144015.188-4-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Cc: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03loop: reduce the loop_ctl_mutex scopeTetsuo Handa
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for commit a160c6159d4a0cf8 ("block: add an optional probe callback to major_names") is calling the module's probe function with major_names_lock held. Fortunately, since commit 990e78116d38059c ("block: loop: fix deadlock between open and remove") stopped holding loop_ctl_mutex in lo_open(), current role of loop_ctl_mutex is to serialize access to loop_index_idr and loop_add()/loop_remove(); in other words, management of id for IDR. To avoid holding loop_ctl_mutex during whole add/remove operation, use a bool flag to indicate whether the loop device is ready for use. loop_unregister_transfer() which is called from cleanup_cryptoloop() currently has possibility of use-after-free problem due to lack of serialization between kfree() from loop_remove() from loop_control_remove() and mutex_lock() from unregister_transfer_cb(). But since lo->lo_encryption should be already NULL when this function is called due to module unload, and commit 222013f9ac30b9ce ("cryptoloop: add a deprecation warning") indicates that we will remove this function shortly, this patch updates this function to emit warning instead of checking lo->lo_encryption. Holding loop_ctl_mutex in loop_exit() is pointless, for all users must close /dev/loop-control and /dev/loop$num (in order to drop module's refcount to 0) before loop_exit() starts, and nobody can open /dev/loop-control or /dev/loop$num afterwards. Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1] Reported-by: syzbot <syzbot+f61766d5763f9e7a118f@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/adb1e792-fc0e-ee81-7ea0-0906fc36419d@i-love.sakura.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-05loop: Select I/O scheduler 'none' from inside add_disk()Bart Van Assche
We noticed that the user interface of Android devices becomes very slow under memory pressure. This is because Android uses the zram driver on top of the loop driver for swapping, because under memory pressure the swap code alternates reads and writes quickly, because mq-deadline is the default scheduler for loop devices and because mq-deadline delays writes by five seconds for such a workload with default settings. Fix this by making the kernel select I/O scheduler 'none' from inside add_disk() for loop devices. This default can be overridden at any time from user space, e.g. via a udev rule. This approach has an advantage compared to changing the I/O scheduler from userspace from 'mq-deadline' into 'none', namely that synchronize_rcu() does not get called. This patch changes the default I/O scheduler for loop devices from 'mq-deadline' into 'none'. Additionally, this patch reduces the Android boot time on my test setup with 0.5 seconds compared to configuring the loop I/O scheduler from user space. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210805174200.3250718-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02loop: raise media_change eventMatteo Croce
Make the loop device raise a DISK_MEDIA_CHANGE event on attach or detach. # udevadm monitor -up |grep -e DISK_MEDIA_CHANGE -e DEVNAME & # losetup -f zero [ 7.454235] loop0: detected capacity change from 0 to 16384 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop0 DEVNAME=/dev/loop0 DEVNAME=/dev/loop0 # losetup -f zero [ 10.205245] loop1: detected capacity change from 0 to 16384 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop1 # losetup -f zero2 [ 13.532368] loop2: detected capacity change from 0 to 40960 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop2 DEVNAME=/dev/loop2 # losetup -D DEVNAME=/dev/loop1 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop1 DEVNAME=/dev/loop2 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop2 DEVNAME=/dev/loop0 DISK_MEDIA_CHANGE=1 DEVNAME=/dev/loop0 Signed-off-by: Matteo Croce <mcroce@microsoft.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Luca Boccassi <bluca@debian.org> Link: https://lore.kernel.org/r/20210712230530.29323-7-mcroce@linux.microsoft.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-02loop: don't grab a reference to the block deviceChristoph Hellwig
The whole device block device won't be removed while the disk is still alive, so don't bother to grab a reference to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Ming Lei <ming.lei@rehat.com> Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Link: https://lore.kernel.org/r/20210722075402.983367-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-23loop: reintroduce global lock for safe loop_validate_file() traversalTetsuo Handa
Commit 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") re-opened a race window for NULL pointer dereference at loop_validate_file() where commit 310ca162d779efee ("block/loop: Use global lock for ioctl() operation.") has closed. Although we need to guarantee that other loop devices will not change during traversal, we can't take remote "struct loop_device"->lo_mutex inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore, introduce a global lock dedicated for loop_validate_file() which is conditionally taken before local "struct loop_device"->lo_mutex is taken. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-02loop: remove unused variable in loop_set_status()Tetsuo Handa
Commit 0384264ea8a39bd9 ("block: pass a gendisk to bdev_disk_changed") changed to pass lo->lo_disk instead of lo->lo_device. Fixes: 0384264ea8a3 ("block: pass a gendisk to bdev_disk_changed") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/20210702152714.7978-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: rewrite loop_exit using idr_for_each_entryChristoph Hellwig
Use idr_for_each_entry to simplify removing all devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: split loop_lookupChristoph Hellwig
loop_lookup has two callers - one wants to do the a find by index and the other wants any unbound loop device. Open code the respective functionality in each caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: don't allow deleting an unspecified loop deviceChristoph Hellwig
Passing a negative index to loop_lookup while return any unbound device. Doing that for a delete does not make much sense, so add check to explicitly reject that case. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: move loop_ctl_mutex locking into loop_addChristoph Hellwig
Move acquiring and releasing loop_ctl_mutex from the callers into loop_add. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: split loop_control_ioctlChristoph Hellwig
Split loop_control_ioctl into a helper for each command. This keeps the code nicely separated for the upcoming locking changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: don't call loop_lookup before adding a loop deviceChristoph Hellwig
loop_add returns the right error if the slot wasn't available. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210623145908.92973-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: remove the l argument to loop_addChristoph Hellwig
None of the callers cares about the allocated struct loop_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: reduce loop_ctl_mutex coverage in loop_exitChristoph Hellwig
loop_ctl_mutex is only needed to iterate the IDR for removing the loop devices, so reduce the coverage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30loop: reorder loop_exitChristoph Hellwig
Unregister the misc and blockdevice first to prevent further access, and only then iterate to remove the devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210623145908.92973-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "Pretty calm round, mostly just NVMe and a bit of MD: - NVMe updates (via Christoph) - improve the APST configuration algorithm (Alexey Bogoslavsky) - look for StorageD3Enable on companion ACPI device (Mario Limonciello) - allow selecting the network interface for TCP connections (Martin Belanger) - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King, Christoph) - move the ACPI StorageD3 code to drivers/acpi/ and add quirks for certain AMD CPUs (Mario Limonciello) - zoned device support for nvmet (Chaitanya Kulkarni) - fix the rules for changing the serial number in nvmet (Noam Gottlieb) - various small fixes and cleanups (Dan Carpenter, JK Kim, Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert Uytterhoeven, Daniel Wagner) - MD updates (Via Song) - iostats rewrite (Guoqing Jiang) - raid5 lock contention optimization (Gal Ofri) - Fall through warning fix (Gustavo) - Misc fixes (Gustavo, Jiapeng)" * tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits) nvmet: use NVMET_MAX_NAMESPACES to set nn value loop: Fix missing discard support when using LOOP_CONFIGURE nvme.h: add missing nvme_lba_range_type endianness annotations nvme: remove zeroout memset call for struct nvme-pci: remove zeroout memset call for struct nvmet: remove zeroout memset call for struct nvmet: add ZBD over ZNS backend support nvmet: add Command Set Identifier support nvmet: add nvmet_req_bio put helper for backends nvmet: add req cns error complete helper block: export blk_next_bio() nvmet: remove local variable nvmet: use nvme status value directly nvmet: use u32 type for the local variable nsid nvmet: use u32 for nvmet_subsys max_nsid nvmet: use req->cmd directly in file-ns fast path nvmet: use req->cmd directly in bdev-ns fast path nvmet: make ver stable once connection established nvmet: allow mn change if subsys not discovered nvmet: make sn stable once connection was established ...
2021-06-30Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: - disk events cleanup (Christoph) - gendisk and request queue allocation simplifications (Christoph) - bdev_disk_changed cleanups (Christoph) - IO priority improvements (Bart) - Chained bio completion trace fix (Edward) - blk-wbt fixes (Jan) - blk-wbt enable/disable fix (Zhang) - Scheduler dispatch improvements (Jan, Ming) - Shared tagset scheduler improvements (John) - BFQ updates (Paolo, Luca, Pietro) - BFQ lock inversion fix (Jan) - Documentation improvements (Kir) - CLONE_IO block cgroup fix (Tejun) - Remove of ancient and deprecated block dump feature (zhangyi) - Discard merge fix (Ming) - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang) * tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits) block: fix discard request merge block/mq-deadline: Remove a WARN_ON_ONCE() call blk-mq: update hctx->dispatch_busy in case of real scheduler blk: Fix lock inversion between ioc lock and bfqd lock bfq: Remove merged request already in bfq_requests_merged() block: pass a gendisk to bdev_disk_changed block: move bdev_disk_changed block: add the events* attributes to disk_attrs block: move the disk events code to a separate file block: fix trace completion for chained bio block/partitions/msdos: Fix typo inidicator -> indicator block, bfq: reset waker pointer with shared queues block, bfq: check waker only for queues with no in-flight I/O block, bfq: avoid delayed merge of async queues block, bfq: boost throughput by extending queue-merging times block, bfq: consider also creation time in delayed stable merge block, bfq: fix delayed stable merge check block, bfq: let also stably merged queues enjoy weight raising blk-wbt: make sure throttle is enabled properly blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() ...
2021-06-29loop: charge i/o to mem and blk cgDan Schatzberg
The current code only associates with the existing blkcg when aio is used to access the backing file. This patch covers all types of i/o to the backing file and also associates the memcg so if the backing file is on tmpfs, memory is charged appropriately. This patch also exports cgroup_get_e_css and int_active_memcg so it can be used by the loop module. Link: https://lkml.kernel.org/r/20210610173944.1203706-4-schatzberg.dan@gmail.com Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Chris Down <chris@chrisdown.name> Cc: Michal Hocko <mhocko@suse.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-29loop: use worker per cgroup instead of kworkerDan Schatzberg
Patch series "Charge loop device i/o to issuing cgroup", v14. The loop device runs all i/o to the backing file on a separate kworker thread which results in all i/o being charged to the root cgroup. This allows a loop device to be used to trivially bypass resource limits and other policy. This patch series fixes this gap in accounting. A simple script to demonstrate this behavior on cgroupv2 machine: ''' #!/bin/bash set -e CGROUP=/sys/fs/cgroup/test.slice LOOP_DEV=/dev/loop0 if [[ ! -d $CGROUP ]] then sudo mkdir $CGROUP fi grep oom_kill $CGROUP/memory.events # Set a memory limit, write more than that limit to tmpfs -> OOM kill sudo unshare -m bash -c " echo \$\$ > $CGROUP/cgroup.procs; echo 0 > $CGROUP/memory.swap.max; echo 64M > $CGROUP/memory.max; mount -t tmpfs -o size=512m tmpfs /tmp; dd if=/dev/zero of=/tmp/file bs=1M count=256" || true grep oom_kill $CGROUP/memory.events # Set a memory limit, write more than that limit through loopback # device -> no OOM kill sudo unshare -m bash -c " echo \$\$ > $CGROUP/cgroup.procs; echo 0 > $CGROUP/memory.swap.max; echo 64M > $CGROUP/memory.max; mount -t tmpfs -o size=512m tmpfs /tmp; truncate -s 512m /tmp/backing_file losetup $LOOP_DEV /tmp/backing_file dd if=/dev/zero of=$LOOP_DEV bs=1M count=256; losetup -D $LOOP_DEV" || true grep oom_kill $CGROUP/memory.events ''' Naively charging cgroups could result in priority inversions through the single kworker thread in the case where multiple cgroups are reading/writing to the same loop device. This patch series does some minor modification to the loop driver so that each cgroup can make forward progress independently to avoid this inversion. With this patch series applied, the above script triggers OOM kills when writing through the loop device as expected. This patch (of 3): Existing uses of loop device may have multiple cgroups reading/writing to the same device. Simply charging resources for I/O to the backing file could result in priority inversion where one cgroup gets synchronously blocked, holding up all other I/O to the loop device. In order to avoid this priority inversion, we use a single workqueue where each work item is a "struct loop_worker" which contains a queue of struct loop_cmds to issue. The loop device maintains a tree mapping blk css_id -> loop_worker. This allows each cgroup to independently make forward progress issuing I/O to the backing file. There is also a single queue for I/O associated with the rootcg which can be used in cases of extreme memory shortage where we cannot allocate a loop_worker. The locking for the tree and queues is fairly heavy handed - we acquire a per-loop-device spinlock any time either is accessed. The existing implementation serializes all I/O through a single thread anyways, so I don't believe this is any worse. [colin.king@canonical.com: fixes] Link: https://lkml.kernel.org/r/20210610173944.1203706-1-schatzberg.dan@gmail.com Link: https://lkml.kernel.org/r/20210610173944.1203706-2-schatzberg.dan@gmail.com Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Chris Down <chris@chrisdown.name> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24block: pass a gendisk to bdev_disk_changedChristoph Hellwig
bdev_disk_changed can only operate on whole devices. Make that clear by passing a gendisk instead of the struct block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210624123240.441814-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-18loop: Fix missing discard support when using LOOP_CONFIGUREKristian Klausen
Without calling loop_config_discard() the discard flag and parameters aren't set/updated for the loop device and worst-case they could indicate discard support when it isn't the case (ex: if the LOOP_SET_STATUS ioctl was used with a different file prior to LOOP_CONFIGURE). Cc: <stable@vger.kernel.org> # 5.8.x- Fixes: 3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl") Signed-off-by: Kristian Klausen <kristian@klausen.dk> Link: https://lore.kernel.org/r/20210618115157.31452-1-kristian@klausen.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-16loop: fix order of cleaning up the queue and freeing the tagsetChristoph Hellwig
We must release the queue before freeing the tagset. Fixes: 1c99502fae35 ("loop: use blk_mq_alloc_disk and blk_cleanup_disk") Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-12Merge tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes that should go into 5.13: - Fix a regression deadlock introduced in this release between open and remove of a bdev (Christoph) - Fix an async_xor md regression in this release (Xiao) - Fix bcache oversized read issue (Coly)" * tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block: block: loop: fix deadlock between open and remove async_xor: check src_offs is not NULL before updating it bcache: avoid oversized read request in cache missing code path bcache: remove bcache device self-defined readahead
2021-06-11loop: use blk_mq_alloc_disk and blk_cleanup_diskChristoph Hellwig
Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-11block: loop: fix deadlock between open and removeChristoph Hellwig
Commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") adds disk->part0->bd_mutex in del_gendisk(), this way causes the following AB/BA deadlock between removing loop and opening loop: 1) loop_control_ioctl(LOOP_CTL_REMOVE) -> mutex_lock(&loop_ctl_mutex) -> del_gendisk -> mutex_lock(&disk->part0->bd_mutex) 2) blkdev_get_by_dev -> mutex_lock(&disk->part0->bd_mutex) -> lo_open -> mutex_lock(&loop_ctl_mutex) Add a new Lo_deleting state to remove the need for clearing ->private_data and thus holding loop_ctl_mutex in the ioctl LOOP_CTL_REMOVE path. Based on an analysis and earlier patch from Ming Lei <ming.lei@redhat.com>. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210605140950.5800-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-01block: move bd_mutex to struct gendiskChristoph Hellwig
Replace the per-block device bd_mutex with a per-gendisk open_mutex, thus simplifying locking wherever we deal with partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-06include: remove pagemap.h from blkdev.hMatthew Wilcox (Oracle)
My UEK-derived config has 1030 files depending on pagemap.h before this change. Afterwards, just 326 files need to be rebuilt when I touch pagemap.h. I think blkdev.h is probably included too widely, but untangling that dependency is harder and this solves my problem. x86 allmodconfig builds, but there may be implicit include problems on other architectures. Link: https://lkml.kernel.org/r/20210309195747.283796-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> [nvdimm] Acked-by: Jens Axboe <axboe@kernel.dk> [block] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> [scsi] Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-28Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block updates from Jens Axboe: "A few stragglers (and one due to me missing it originally), and fixes for changes in this merge window mostly. In particular: - blktrace cleanups (Chaitanya, Greg) - Kill dead blk_pm_* functions (Bart) - Fixes for the bio alloc changes (Christoph) - Fix for the partition changes (Christoph, Ming) - Fix for turning off iopoll with polled IO inflight (Jeffle) - nbd disconnect fix (Josef) - loop fsync error fix (Mauricio) - kyber update depth fix (Yang) - max_sectors alignment fix (Mikulas) - Add bio_max_segs helper (Matthew)" * tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits) block: Add bio_max_segs blktrace: fix documentation for blk_fill_rw() block: memory allocations in bounce_clone_bio must not fail block: remove the gfp_mask argument to bounce_clone_bio block: fix bounce_clone_bio for passthrough bios block-crypto-fallback: use a bio_set for splitting bios block: fix logging on capacity change blk-settings: align max_sectors on "logical_block_size" boundary block: reopen the device in blkdev_reread_part block: don't skip empty device in in disk_uevent blktrace: remove debugfs file dentries from struct blk_trace nbd: handle device refs for DESTROY_ON_DISCONNECT properly kyber: introduce kyber_depth_updated() loop: fix I/O error on fsync() in detached loop devices block: fix potential IO hang when turning off io_poll block: get rid of the trace rq insert wrapper blktrace: fix blk_rq_merge documentation blktrace: fix blk_rq_issue documentation blktrace: add blk_fill_rwbs documentation comment block: remove superfluous param in blk_fill_rwbs() ...
2021-02-27Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff pile - no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: whack-a-mole: don't open-code iminor/imajor 9p: fix misuse of sscanf() in v9fs_stat2inode() audit_alloc_mark(): don't open-code ERR_CAST() fs/inode.c: make inode_init_always() initialize i_ino to 0 vfs: don't unnecessarily clone write access for writable fds
2021-02-23whack-a-mole: don't open-code iminor/imajorAl Viro
several instances creeped back into the tree... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-02-22loop: fix I/O error on fsync() in detached loop devicesMauricio Faria de Oliveira
There's an I/O error on fsync() in a detached loop device if it has been previously attached. The issue is write cache is enabled in the attach path in loop_configure() but it isn't disabled in the detach path; thus it remains enabled in the block device regardless of whether it is attached or not. Now fsync() can get an I/O request that will just be failed later in loop_queue_rq() as device's state is not 'Lo_bound'. So, disable write cache in the detach path. Do so based on the queue flag, not the loop device flag for read-only (used to enable) as the queue flag can be changed via sysfs even on read-only loop devices (e.g., losetup -r.) Test-case: # DEV=/dev/loop7 # IMG=/tmp/image # truncate --size 1M $IMG # losetup $DEV $IMG # losetup -d $DEV Before: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = -1 EIO (Input/output error) Warning: Error fsyncing/closing /dev/loop7: Input/output error [ 982.529929] blk_update_request: I/O error, dev loop7, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0 After: # strace -e fsync parted -s $DEV print 2>&1 | grep fsync fsync(3) = 0 Co-developed-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Eric Desrochers <eric.desrochers@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-26loop: scale loop device by introducing per device lockPavel Tatashin
Currently, loop device has only one global lock: loop_ctl_mutex. This becomes hot in scenarios where many loop devices are used. Scale it by introducing per-device lock: lo_mutex that protects modifications of all fields in struct loop_device. Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup, loop_add. The new lock ordering requirement is that loop_ctl_mutex must be taken before lo_mutex. Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-16Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...
2020-12-12block: drop dead assignments in loop_init()Lukas Bulwahn
Commit 8410d38c2552 ("loop: use __register_blkdev to allocate devices on demand") simplified loop_init(); so computing the range of the block region is not required anymore and can be dropped. Drop dead assignments in loop_init(). As compilers will detect these unneeded assignments and optimize this, the resulting object code is identical before and after this change. No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove the nr_sects field in struct hd_structChristoph Hellwig
Now that the hd_struct always has a block device attached to it, there is no need for having two size field that just get out of sync. Additionally the field in hd_struct did not use proper serialization, possibly allowing for torn writes. By only using the block_device field this problem also gets fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: simplify the block device claiming interfaceChristoph Hellwig
Stop passing the whole device as a separate argument given that it can be trivially deducted and cleanup the !holder debug check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove ->bd_containsChristoph Hellwig
Now that each hd_struct has a reference to the corresponding block_device, there is no need for the bd_contains pointer. Add a bdev_whole() helper to look up the whole device block_device struture instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01block: remove i_bdevChristoph Hellwig
Switch the block device lookup interfaces to directly work with a dev_t so that struct block_device references are only acquired by the blkdev_get variants (and the blk-cgroup special case). This means that we now don't need an extra reference in the inode and can generally simplify handling of struct block_device to keep the lookups contained in the core block layer code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Coly Li <colyli@suse.de> [bcache] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01loop: do not call set_blocksizeChristoph Hellwig
set_blocksize is used by file systems to use their preferred buffer cache block size. Block drivers should not set it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16block: remove the update_bdev parameter to set_capacity_revalidate_and_notifyChristoph Hellwig
The update_bdev argument is always set to true, so remove it. Also rename the function to the slighly less verbose set_capacity_and_notify, as propagating the disk size to the block device isn't really revalidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: let set_capacity_revalidate_and_notify update the bdev sizeChristoph Hellwig
There is no good reason to call revalidate_disk_size separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: use __register_blkdev to allocate devices on demandChristoph Hellwig
Use the simpler mechanism attached to major_name to allocate a brd device when a currently unregistered minor is accessed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-16loop: use set_disk_roChristoph Hellwig
Use set_disk_ro instead of set_device_ro to match all other block drivers and to ensure all partitions mirror the read-only flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-12loop: Fix occasional uevent dropPetr Vorel
Commit 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") causes an occasional drop of loop device uevent, which are no longer triggered in loop_set_size() but in a different part of code. Bug is reproducible with LTP test uevent01 [1]: i=0; while true; do i=$((i+1)); echo "== $i ==" lsmod |grep -q loop && rmmod -f loop ./uevent01 || break done Put back triggering through code called in loop_set_size(). Fix required to add yet another parameter to set_capacity_revalidate_and_notify(). [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/uevents/uevent01.c [hch: rebased on a different change to the prototype of set_capacity_revalidate_and_notify] Cc: stable@vger.kernel.org # v5.9 Fixes: 716ad0986cbd ("loop: Switch to set_capacity_revalidate_and_notify") Reported-by: <ltp@lists.linux.it> Signed-off-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-01block: replace bd_set_size with bd_set_nr_sectorsChristoph Hellwig
Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>