user/sven/linux.git/block, branch v5.4.244

blk-iocost: fix divide by 0 error in calc_lcoefs()

2023-03-11T15:43:54Z

[ Upstream commit 984af1e66b4126cf145153661cc24c213e2ec231 ] echo max of u64 to cost.model can cause divide by 0 error. # echo 8:0 rbps=18446744073709551615 > /sys/fs/cgroup/io.cost.model divide error: 0000 [#1] PREEMPT SMP RIP: 0010:calc_lcoefs+0x4c/0xc0 Call Trace: ioc_refresh_params+0x2b3/0x4f0 ioc_cost_model_write+0x3cb/0x4c0 ? _copy_from_iter+0x6d/0x6c0 ? kernfs_fop_write_iter+0xfc/0x270 cgroup_file_write+0xa0/0x200 kernfs_fop_write_iter+0x17d/0x270 vfs_write+0x414/0x620 ksys_write+0x73/0x160 __x64_sys_write+0x1e/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd calc_lcoefs() uses the input value of cost.model in DIV_ROUND_UP_ULL, overflow would happen if bps plus IOC_PAGE_SIZE is greater than ULLONG_MAX, it can cause divide by 0 error. Fix the problem by setting basecost Signed-off-by: Li Nan Signed-off-by: Yu Kuai Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20230117070806.3857142-5-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: bio-integrity: Copy flags when bio_integrity_payload is cloned

2023-03-11T15:43:35Z

[ Upstream commit b6a4bdcda430e3ca43bbb9cb1d4d4d34ebe15c40 ] Make sure to copy the flags when a bio_integrity_payload is cloned. Otherwise per-I/O properties such as IP checksum flag will not be passed down to the HBA driver. Since the integrity buffer is owned by the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off to avoid a double free in the completion path. Fixes: aae7df50190a ("block: Integrity checksum flag") Fixes: b1f01388574c ("block: Relocate bio integrity flags") Reported-by: Saurav Kashyap Tested-by: Saurav Kashyap Signed-off-by: Martin K. Petersen Reviewed-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20230215171801.21062-1-martin.petersen@oracle.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: correct stale comment of .get_budget

2023-03-11T15:43:34Z

[ Upstream commit 01542f651a9f58a9b176c3d3dc3eefbacee53b78 ] Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget") remove BLK_STS_RESOURCE return value and we only check if we can get the budget from .get_budget() now. Correct stale comment that ".get_budget() returns BLK_STS_NO_RESOURCE" to ".get_budget() fails to get the budget". Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget") Signed-off-by: Kemeng Shi Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait

2023-03-11T15:43:34Z

[ Upstream commit 98b99e9412d0cde8c7b442bf5efb09528a2ede8b ] For shared queues case, we will only wait on bitmap_tags if we fail to get driver tag. However, rq could be from breserved_tags, then two problems will occur: 1. io hung if no tag is currently allocated from bitmap_tags. 2. unnecessary wakeup when tag is freed to bitmap_tags while no tag is freed to breserved_tags. Wait on the bitmap which rq from to fix this. Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags") Reviewed-by: Christoph Hellwig Signed-off-by: Kemeng Shi Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx

2023-03-11T15:43:34Z

[ Upstream commit c31e76bcc379182fe67a82c618493b7b8868c672 ] Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") remove handle of TAG_SHARED in restart, then shared_hctx_restart counted for how many hardware queues are marked for restart is removed too. Remove the stale comment that we still count hardware queues need restart. Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") Reviewed-by: Christoph Hellwig Signed-off-by: Kemeng Shi Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: Limit number of items taken from the I/O scheduler in one go

2023-03-11T15:43:34Z

[ Upstream commit 28d65729b050977d8a9125e6726871e83bd22124 ] Flushes bypass the I/O scheduler and get added to hctx->dispatch in blk_mq_sched_bypass_insert. This can happen while a kworker is running hctx->run_work work item and is past the point in blk_mq_sched_dispatch_requests where hctx->dispatch is checked. The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time, because the I/O scheduler can feed an arbitrary number of commands. Since we have only one hctx->run_work, the commands waiting in hctx->dispatch will wait an arbitrary length of time for run_work to be rerun. A similar phenomenon exists with dispatches from the software queue. The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and blk_mq_do_dispatch_ctx and return from the run_work handler and let it rerun. Signed-off-by: Salman Qazi Reviewed-by: Ming Lei Signed-off-by: Jens Axboe Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx") Signed-off-by: Sasha Levin

blk-cgroup: fix missing pd_online_fn() while activating policy

2023-02-06T06:52:48Z

[ Upstream commit e3ff8887e7db757360f97634e0d6f4b8e27a8c46 ] If the policy defines pd_online_fn(), it should be called after pd_init_fn(), like blkg_create(). Signed-off-by: Yu Kuai Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20230103112833.2013432-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: fix and cleanup bio_check_ro

2023-02-06T06:52:47Z

commit 57e95e4670d1126c103305bcf34a9442f49f6d6a upstream. Don't use a WARN_ON when printing a potentially user triggered condition. Also don't print the partno when the block device name already includes it, and use the %pg specifier to simplify printing the block device name. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220304180105.409765-2-hch@lst.de Signed-off-by: Jens Axboe Signed-off-by: Fedor Pchelkin Signed-off-by: Greg Kroah-Hartman

blk-mq: fix possible memleak when register 'hctx' failed

2023-01-18T10:41:38Z

[ Upstream commit 4b7a21c57b14fbcd0e1729150189e5933f5088e9 ] There's issue as follows when do fault injection test: unreferenced object 0xffff888132a9f400 (size 512): comm "insmod", pid 308021, jiffies 4324277909 (age 509.733s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 f4 a9 32 81 88 ff ff ...........2.... 08 f4 a9 32 81 88 ff ff 00 00 00 00 00 00 00 00 ...2............ backtrace: [<00000000e8952bb4>] kmalloc_node_trace+0x22/0xa0 [<00000000f9980e0f>] blk_mq_alloc_and_init_hctx+0x3f1/0x7e0 [<000000002e719efa>] blk_mq_realloc_hw_ctxs+0x1e6/0x230 [<000000004f1fda40>] blk_mq_init_allocated_queue+0x27e/0x910 [<00000000287123ec>] __blk_mq_alloc_disk+0x67/0xf0 [<00000000a2a34657>] 0xffffffffa2ad310f [<00000000b173f718>] 0xffffffffa2af824a [<0000000095a1dabb>] do_one_initcall+0x87/0x2a0 [<00000000f32fdf93>] do_init_module+0xdf/0x320 [<00000000cbe8541e>] load_module+0x3006/0x3390 [<0000000069ed1bdb>] __do_sys_finit_module+0x113/0x1b0 [<00000000a1a29ae8>] do_syscall_64+0x35/0x80 [<000000009cd878b0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fault injection context as follows: kobject_add blk_mq_register_hctx blk_mq_sysfs_register blk_register_queue device_add_disk null_add_dev.part.0 [null_blk] As 'blk_mq_register_hctx' may already add some objects when failed halfway, but there isn't do fallback, caller don't know which objects add failed. To solve above issue just do fallback when add objects failed halfway in 'blk_mq_register_hctx'. Signed-off-by: Ye Bin Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20221117022940.873959-1-yebin@huaweicloud.com Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

block: unhash blkdev part inode when the part is deleted

2022-12-19T11:24:16Z

v5.11 changes the blkdev lookup mechanism completely since commit 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"), and small part of the change is to unhash part bdev inode when deleting partition. Turns out this kind of change does fix one nasty issue in case of BLOCK_EXT_MAJOR: 1) when one partition is deleted & closed, disk_put_part() is always called before bdput(bdev), see blkdev_put(); so the part's devt can be freed & re-used before the inode is dropped 2) then new partition with same devt can be created just before the inode in 1) is dropped, then the old inode/bdev structurein 1) is re-used for this new partition, this way causes use-after-free and kernel panic. It isn't possible to backport the whole big patchset of "merge struct block_device and struct hd_struct v4" for addressing this issue. https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/ So fixes it by unhashing part bdev in delete_partition(), and this way is actually aligned with v5.11+'s behavior. Backported from the following 5.10.y commit: 5f2f77560591 ("block: unhash blkdev part inode when the part is deleted") Reported-by: Shiwei Cui Tested-by: Shiwei Cui Cc: Christoph Hellwig Cc: Jan Kara Signed-off-by: Ming Lei Signed-off-by: Greg Kroah-Hartman