user/sven/linux.git/include/linux/blkdev.h, branch v3.16.74

block: move bio_integrity_{intervals,bytes} into blkdev.h

2018-12-16T22:08:26Z

commit 359f642700f2ff05d9c94cd9216c97af7b8e9553 upstream. This allows bio_integrity_bytes() to be called from drivers instead of open coding it. Acked-by: Martin K. Petersen Signed-off-by: Greg Edwards Signed-off-by: Jens Axboe [bwh: Backported to 3.16: bio_integrity_intervals() was called bio_integrity_hw_sectors() and had a different implementation. Move it without renaming.] Signed-off-by: Ben Hutchings

block: Fix transfer when chunk sectors exceeds max

2018-11-20T18:05:29Z

commit 15bfd21fbc5d35834b9ea383dc458a1f0c9e3434 upstream. A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare Tested-by: Jitendra Bhivare Reviewed-by: Martin K. Petersen Signed-off-by: Keith Busch Signed-off-by: Jens Axboe Signed-off-by: Ben Hutchings

blk-mq: fix race between timeout and freeing request

2018-03-03T15:52:29Z

commit 0048b4837affd153897ed1222283492070027aa9 upstream. Inside timeout handler, blk_mq_tag_to_rq() is called to retrieve the request from one tag. This way is obviously wrong because the request can be freed any time and some fiedds of the request can't be trusted, then kernel oops might be triggered[1]. Currently wrt. blk_mq_tag_to_rq(), the only special case is that the flush request can share same tag with the request cloned from, and the two requests can't be active at the same time, so this patch fixes the above issue by updating tags->rqs[tag] with the active request(either flush rq or the request cloned from) of the tag. Also blk_mq_tag_to_rq() gets much simplified with this patch. Given blk_mq_tag_to_rq() is mainly for drivers and the caller must make sure the request can't be freed, so in bt_for_each() this helper is replaced with tags->rqs[tag]. [1] kernel oops log [ 439.696220] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158^M [ 439.697162] IP: [] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.700653] PGD 7ef765067 PUD 7ef764067 PMD 0 ^M [ 439.700653] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M [ 439.700653] Dumping ftrace buffer:^M [ 439.700653] (ftrace buffer empty)^M [ 439.700653] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw^M [ 439.700653] CPU: 6 PID: 2779 Comm: stress-ng-sigfd Not tainted 4.2.0-rc5-next-20150805+ #265^M [ 439.730500] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M [ 439.730500] task: ffff880605308000 ti: ffff88060530c000 task.ti: ffff88060530c000^M [ 439.730500] RIP: 0010:[] [] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.730500] RSP: 0018:ffff880819203da0 EFLAGS: 00010283^M [ 439.730500] RAX: ffff880811b0e000 RBX: ffff8800bb465f00 RCX: 0000000000000002^M [ 439.730500] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000^M [ 439.730500] RBP: ffff880819203db0 R08: 0000000000000002 R09: 0000000000000000^M [ 439.730500] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000202^M [ 439.730500] R13: ffff880814104800 R14: 0000000000000002 R15: ffff880811a2ea00^M [ 439.730500] FS: 00007f165b3f5740(0000) GS:ffff880819200000(0000) knlGS:0000000000000000^M [ 439.730500] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M [ 439.730500] CR2: 0000000000000158 CR3: 00000007ef766000 CR4: 00000000000006e0^M [ 439.730500] Stack:^M [ 439.730500] 0000000000000008 ffff8808114eed90 ffff880819203e00 ffffffff812dc104^M [ 439.755663] ffff880819203e40 ffffffff812d9f5e 0000020000000000 ffff8808114eed80^M [ 439.755663] Call Trace:^M [ 439.755663] ^M [ 439.755663] [] bt_for_each+0x6e/0xc8^M [ 439.755663] [] ? blk_mq_rq_timed_out+0x6a/0x6a^M [ 439.755663] [] ? blk_mq_rq_timed_out+0x6a/0x6a^M [ 439.755663] [] blk_mq_tag_busy_iter+0x55/0x5e^M [ 439.755663] [] ? blk_mq_bio_to_request+0x38/0x38^M [ 439.755663] [] blk_mq_rq_timer+0x5d/0xd4^M [ 439.755663] [] call_timer_fn+0xf7/0x284^M [ 439.755663] [] ? call_timer_fn+0x5/0x284^M [ 439.755663] [] ? blk_mq_bio_to_request+0x38/0x38^M [ 439.755663] [] run_timer_softirq+0x1ce/0x1f8^M [ 439.755663] [] __do_softirq+0x181/0x3a4^M [ 439.755663] [] irq_exit+0x40/0x94^M [ 439.755663] [] smp_apic_timer_interrupt+0x33/0x3e^M [ 439.755663] [] apic_timer_interrupt+0x84/0x90^M [ 439.755663] ^M [ 439.755663] [] ? _raw_spin_unlock_irq+0x32/0x4a^M [ 439.755663] [] finish_task_switch+0xe0/0x163^M [ 439.755663] [] ? finish_task_switch+0xa2/0x163^M [ 439.755663] [] __schedule+0x469/0x6cd^M [ 439.755663] [] schedule+0x82/0x9a^M [ 439.789267] [] signalfd_read+0x186/0x49a^M [ 439.790911] [] ? wake_up_q+0x47/0x47^M [ 439.790911] [] __vfs_read+0x28/0x9f^M [ 439.790911] [] ? __fget_light+0x4d/0x74^M [ 439.790911] [] vfs_read+0x7a/0xc6^M [ 439.790911] [] SyS_read+0x49/0x7f^M [ 439.790911] [] entry_SYSCALL_64_fastpath+0x12/0x6f^M [ 439.790911] Code: 48 89 e5 e8 a9 b8 e7 ff 5d c3 0f 1f 44 00 00 55 89 f2 48 89 e5 41 54 41 89 f4 53 48 8b 47 60 48 8b 1c d0 48 8b 7b 30 48 8b 53 38 <48> 8b 87 58 01 00 00 48 85 c0 75 09 48 8b 97 88 0c 00 00 eb 10 ^M [ 439.790911] RIP [] blk_mq_tag_to_rq+0x21/0x6e^M [ 439.790911] RSP ^M [ 439.790911] CR2: 0000000000000158^M [ 439.790911] ---[ end trace d40af58949325661 ]---^M Signed-off-by: Ming Lei Signed-off-by: Jens Axboe [bwh: Backported to 3.16: - Flush state is in struct request_queue, not struct blk_flush_queue - Flush request cloning is done in blk_mq_clone_flush_request() rather than blk_kick_flush() - Drop changes in bt{,_tags}_for_each() - Adjust filename, context] Signed-off-by: Ben Hutchings

blktrace: Fix potential deadlock between delete & sysfs ops

2018-02-13T18:42:21Z

commit 5acb3cc2c2e9d3020a4fee43763c6463767f1572 upstream. The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by: Christoph Hellwig Signed-off-by: Waiman Long Acked-by: Steven Rostedt (VMware) Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by: Jens Axboe Signed-off-by: Ben Hutchings

blk: rq_data_dir() should not return a boolean

2017-04-04T21:21:52Z

commit 10fbd36e362a0f367e34a7cd876a81295d8fc5ca upstream. rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not a boolean value. Now, admittedly the "!= 0" doesn't really change the value (0 stays as zero, 1 stays as one), but it's not only redundant, it confuses gcc, and causes gcc to warn about the construct switch (rq_data_dir(req)) { case READ: ... case WRITE: ... that we have in a few drivers. Now, the gcc warning is silly and stupid (it seems to warn not about the switch value having a different type from the case statements, but about _any_ boolean switch value), but in this case the code itself is silly and stupid too, so let's just change it, and get rid of warnings like this: drivers/block/hd.c: In function ‘hd_request’: drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool] switch (rq_data_dir(req)) { The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in commit 5953316dbf90 ("block: make rq->cmd_flags be 64-bit") and is presumably because the old code (that just did a logical 'and' with 1) would then end up making the type of rq_data_dir() be u64 too. But if we want to retain the old regular integer type, let's just cast the result to 'int' rather than use that rather odd '!= 0'. Signed-off-by: Linus Torvalds Signed-off-by: Ben Hutchings Cc: Arnd Bergmann

block: Always check queue limits for cloned requests

2016-01-05T11:22:20Z

commit bf4e6b4e757488dee1b6a581f49c7ac34cd217f8 upstream. When a cloned request is retried on other queues it always needs to be checked against the queue limits of that queue. Otherwise the calculations for nr_phys_segments might be wrong, leading to a crash in scsi_init_sgtable(). To clarify this the patch renames blk_rq_check_limits() to blk_cloned_rq_check_limits() and removes the symbol export, as the new function should only be used for cloned requests and never exported. Cc: Mike Snitzer Cc: Ewan Milne Cc: Jeff Moyer Signed-off-by: Hannes Reinecke Fixes: e2a60da74 ("block: Clean up special command handling logic") Acked-by: Mike Snitzer Signed-off-by: Jens Axboe Signed-off-by: Luis Henriques

block: fix alignment_offset math that assumes io_min is a power-of-2

2014-11-17T14:11:44Z

commit b8839b8c55f3fdd60dc36abcda7e0266aff7985c upstream. The math in both blk_stack_limits() and queue_limit_alignment_offset() assume that a block device's io_min (aka minimum_io_size) is always a power-of-2. Fix the math such that it works for non-power-of-2 io_min. This issue (of alignment_offset != 0) became apparent when testing dm-thinp with a thinp blocksize that matches a RAID6 stripesize of 1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data block size") unlocked the potential for alignment_offset != 0 due to the dm-thin-pool's io_min possibly being a non-power-of-2. Signed-off-by: Mike Snitzer Acked-by: Martin K. Petersen Signed-off-by: Jens Axboe Signed-off-by: Luis Henriques

Revert "block: all blk-mq requests are tagged"

2014-11-13T11:48:49Z

commit e999dbc254044e8d2a5818d92d205f65bae28f37 upstream. This reverts commit fb3ccb5da71273e7f0d50b50bc879e50cedd60e7. SCSI-2/SPI actually needs the tagged/untagged flag in the request to work properly. Revert this patch and add a follow on to set it in the right place. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: Jens Axboe Reported-by: Meelis Roos Tested-by: Meelis Roos Signed-off-by: Luis Henriques

block: add support for limiting gaps in SG lists

2014-06-24T22:22:24Z

Another restriction inherited for NVMe - those devices don't support SG lists that have "gaps" in them. Gaps refers to cases where the previous SG entry doesn't end on a page boundary. For NVMe, all SG entries must start at offset 0 (except the first) and end on a page boundary (except the last). Signed-off-by: Jens Axboe

block: blk_max_size_offset() should check ->max_sectors

2014-06-18T05:12:02Z

Commit 762380ad9322 inadvertently changed a check for max_sectors to max_hw_sectors. Revert that part, so we still compare against max_sectors. Signed-off-by: Jens Axboe