user/sven/linux.git/block, branch v3.5.7

block: uninitialized ioc->nr_tasks triggers WARN_ON

2012-08-15T14:52:46Z

commit 4638a83e8615de9c16c39dfed234951d0f468cf1 upstream. Hi, I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the below warning as of 3.5-rc1 and later (3.4 is fine): [ 10.886893] ------------[ cut here ]------------ [ 10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560() [ 10.886905] Hardware name: Bochs [ 10.886906] Modules linked in: [ 10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27 [ 10.886908] Call Trace: [ 10.886911] [] warn_slowpath_common+0x7a/0xb0 [ 10.886912] [] warn_slowpath_null+0x15/0x20 [ 10.886913] [] copy_process+0x1488/0x1560 [ 10.886914] [] do_fork+0xb4/0x340 [ 10.886918] [] ? recalc_sigpending+0x1a/0x50 [ 10.886919] [] ? __set_task_blocked+0x32/0x80 [ 10.886920] [] ? __set_current_blocked+0x3a/0x60 [ 10.886923] [] sys_clone+0x23/0x30 [ 10.886925] [] stub_clone+0x13/0x20 [ 10.886927] [] ? system_call_fastpath+0x16/0x1b [ 10.886928] ---[ end trace 32a14af7ee6a590b ]--- Reproducing is easy, I can hit it on a KVM system with a very basic config (x86_64 make defconfig + enable the drivers needed). To hit it, just install dump (on debian/ubuntu, not sure what the package might be called on Fedora), and: dump -o -f /tmp/foo / You'll see the warning in dmesg once it forks off the I/O process and starts dumping filesystem contents. I bisected it down to the following commit: commit f6e8d01bee036460e03bd4f6a79d014f98ba712e Author: Tejun Heo Date: Mon Mar 5 13:15:26 2012 -0800 block: add io_context->active_ref Currently ioc->nr_tasks is used to decide two things - whether an ioc is done issuing IOs and whether it's shared by multiple tasks. This patch separate out the first into ioc->active_ref, which is acquired and released using {get|put}_io_context_active() respectively. This will be used to associate bio's with a given task. This patch doesn't introduce any visible behavior change. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe It seems like the init of ioc->nr_tasks was removed in that patch, so it starts out at 0 instead of 1. Tejun, is the right thing here to add back the init, or should something else be done? The below patch removes the warning, but I haven't done any more extensive testing on it. Signed-off-by: Olof Johansson Acked-by: Tejun Heo Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

scsi: Silence unnecessary warnings about ioctl to partition

2012-06-15T10:52:46Z

Sometimes, warnings about ioctls to partition happen often enough that they form majority of the warnings in the kernel log and users complain. In some cases warnings are about ioctls such as SG_IO so it's not good to get rid of the warnings completely as they can ease debugging of userspace problems when ioctl is refused. Since I have seen warnings from lots of commands, including some proprietary userspace applications, I don't think disallowing the ioctls for processes with CAP_SYS_RAWIO will happen in the near future if ever. So lets just stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed. CC: Paolo Bonzini CC: James Bottomley CC: linux-scsi@vger.kernel.org Acked-by: Paolo Bonzini Signed-off-by: Jan Kara Signed-off-by: Jens Axboe

block: Drop dead function blk_abort_queue()

2012-06-15T06:46:23Z

This function was only used by btrfs code in btrfs_abort_devices() (seems in a wrong way). It was removed in commit d07eb9117050c9ed3f78296ebcc06128b52693be, So, Let's remove the dead code to avoid any confusion. Changes in v2: update commit log, btrfs_abort_devices() was removed already. Cc: Jens Axboe Cc: linux-kernel@vger.kernel.org Cc: Chris Mason Cc: linux-btrfs@vger.kernel.org Cc: David Sterba Signed-off-by: Asias He Signed-off-by: Jens Axboe

block: Mitigate lock unbalance caused by lock switching

2012-06-15T06:46:22Z

Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally supplied queue_lock before blk_drain_queue(). Switching the lock would introduce lock unbalance because theads which have taken the external lock might unlock the internal lock in the during the queue drain. This patch mitigate this by disconnecting the lock after the queue draining since queue draining makes a lot of request_queue users go away. However, please note, this patch only makes the problem less likely to happen. Anyone who still holds a ref might try to issue a new request on a dead queue after the blk_cleanup_queue() finishes draining, the lock unbalance might still happen in this case. ===================================== [ BUG: bad unlock balance detected! ] 3.4.0+ #288 Not tainted ------------------------------------- fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at: [] blk_queue_bio+0x2a2/0x380 but there are no more locks to release! other info that might help us debug this: 1 lock held by fio/17706: #0: (&(&vblk->lock)->rlock){......}, at: [] get_request_wait+0x19a/0x250 stack backtrace: Pid: 17706, comm: fio Not tainted 3.4.0+ #288 Call Trace: [] ? blk_queue_bio+0x2a2/0x380 [] print_unlock_inbalance_bug+0xf9/0x100 [] lock_release_non_nested+0x1df/0x330 [] ? dio_bio_end_aio+0x34/0xc0 [] ? bio_check_pages_dirty+0x85/0xe0 [] ? dio_bio_end_aio+0xb1/0xc0 [] ? blk_queue_bio+0x2a2/0x380 [] ? blk_queue_bio+0x2a2/0x380 [] lock_release+0xd9/0x250 [] _raw_spin_unlock_irq+0x23/0x40 [] blk_queue_bio+0x2a2/0x380 [] generic_make_request+0xca/0x100 [] submit_bio+0x76/0xf0 [] ? set_page_dirty_lock+0x3c/0x60 [] ? bio_set_pages_dirty+0x51/0x70 [] do_blockdev_direct_IO+0xbf8/0xee0 [] ? blkdev_get_block+0x80/0x80 [] __blockdev_direct_IO+0x55/0x60 [] ? blkdev_get_block+0x80/0x80 [] blkdev_direct_IO+0x57/0x60 [] ? blkdev_get_block+0x80/0x80 [] generic_file_aio_read+0x70e/0x760 [] ? __lock_acquire+0x215/0x5a0 [] ? aio_run_iocb+0x54/0x1a0 [] ? grab_cache_page_nowait+0xc0/0xc0 [] aio_rw_vect_retry+0x7c/0x1e0 [] ? aio_fsync+0x30/0x30 [] aio_run_iocb+0x66/0x1a0 [] do_io_submit+0x6f0/0xb80 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] sys_io_submit+0x10/0x20 [] system_call_fastpath+0x16/0x1b Changes since v2: Update commit log to explain how the code is still broken even if we delay the lock switching after the drain. Changes since v1: Update commit log as Tejun suggested. Acked-by: Tejun Heo Signed-off-by: Asias He Signed-off-by: Jens Axboe

block: Avoid missed wakeup in request waitqueue

2012-06-15T06:45:25Z

After hot-unplug a stressed disk, I found that rl->wait[] is not empty while rl->count[] is empty and there are theads still sleeping on get_request after the queue cleanup. With simple debug code, I found there are exactly nr_sleep - nr_wakeup of theads in D state. So there are missed wakeup. $ dmesg | grep nr_sleep [ 52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173 $ vmstat 1 1 173 0 712640 24292 96172 0 0 0 0 419 757 0 0 0 100 0 To quote Tejun: Ah, okay, freed_request() wakes up single waiter with the assumption that after the wakeup there will at least be one successful allocation which in turn will continue the wakeup chain until the wait list is empty - ie. waiter wakeup is dependent on successful request allocation happening after each wakeup. With queue marked dead, any woken up waiter fails the allocation path, so the wakeup chaining is lost and we're left with hung waiters. What we need is wake_up_all() after drain completion. This patch fixes the missed wakeup by waking up all the theads which are sleeping on wait queue after queue drain. Changes in v2: Drop waitqueue_active() optimization Acked-by: Tejun Heo Signed-off-by: Asias He Fixed a bug by me, where stacked devices would oops on calling blk_drain_queue() since ->rq.wait[] do not get initialized unless it's a full queue setup. Signed-off-by: Jens Axboe

blkcg: drop local variable @q from blkg_destroy()

2012-06-06T06:35:31Z

blkg_destroy() caches @blkg->q in local variable @q. While there are two places which needs @blkg->q, only lockdep_assert_held() used the local variable leading to unused local variable warning if lockdep is configured out. Drop the local variable and just use @blkg->q directly. Signed-off-by: Tejun Heo Reported-by: Rakesh Iyer Signed-off-by: Jens Axboe

blkcg: fix blkg_alloc() failure path

2012-06-04T08:03:21Z

When policy data allocation fails in the middle, blkg_alloc() invokes blkg_free() to destroy the half constructed blkg. This ends up calling pd_exit_fn() on policy datas which didn't go through pd_init_fn(). Fix it by making blkg_alloc() call pd_init_fn() immediately after each policy data allocation. Signed-off-by: Tejun Heo Acked-by: Vivek Goyal Signed-off-by: Jens Axboe

block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED

2012-06-04T08:02:29Z

cfq may be built w/ or w/o blkcg support depending on CONFIG_CFQ_CGROUP_IOSCHED. If blkcg support is disabled, most of related code is ifdef'd out but some part is left dangling - blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register() calls are made on it. Feeding zero filled policy to blkcg_policy_register() is incorrect and triggers the following WARN_ON() if CONFIG_BLK_CGROUP && !CONFIG_CFQ_GROUP_IOSCHED. ------------[ cut here ]------------ WARNING: at block/blk-cgroup.c:867 Modules linked in: Modules linked in: CPU: 3 Not tainted 3.4.0-09547-gfb21aff #1 Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8) Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3 Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000 000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048 000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8 00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70 Krnl Code: 00000000003d76be: a7280001 lhi %r2,1 00000000003d76c2: a7f4ffdf brc 15,3d7680 #00000000003d76c6: a7f40001 brc 15,3d76c8 >00000000003d76ca: a7c8ffea lhi %r12,-22 00000000003d76ce: a7f4ffce brc 15,3d766a 00000000003d76d2: a7f40001 brc 15,3d76d4 00000000003d76d6: a7c80000 lhi %r12,0 00000000003d76da: a7f4ffc2 brc 15,3d765e Call Trace: ([<0000000000b6f000>] initcall_debug+0x0/0x4) [<0000000000989e8a>] cfq_init+0x62/0xd4 [<00000000001000ba>] do_one_initcall+0x3a/0x170 [<000000000096fb60>] kernel_init+0x214/0x2bc [<0000000000623202>] kernel_thread_starter+0x6/0xc [<00000000006231fc>] kernel_thread_starter+0x0/0xc no locks held by swapper/0/1. Last Breaking-Event-Address: [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0 ---[ end trace b8ef4903fcbf9dd3 ]--- This patch fixes the problem by ensuring all blkcg support code is inside CONFIG_CFQ_GROUP_IOSCHED. * blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved inside the first CONFIG_CFQ_GROUP_IOSCHED block. __maybe_unused is dropped from blkcg_policy_cfq decl. * blkcg_deactivate_poilcy() invocation is moved inside ifdef. This also makes the activation logic match cfq_init_queue(). * All blkcg_policy_[un]register() invocations are moved inside ifdef. Signed-off-by: Tejun Heo Reported-by: Heiko Carstens LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com> Signed-off-by: Jens Axboe

block: fix return value on cfq_init() failure

2012-06-04T08:01:38Z

cfq_init() would return zero after kmem cache creation failure. Fix so that it returns -ENOMEM. Signed-off-by: Tejun Heo Signed-off-by: Jens Axboe

block: avoid infinite loop in get_task_io_context()

2012-05-31T11:39:05Z

Calling get_task_io_context() on a exiting task which isn't %current can loop forever. This triggers at boot time on my dev machine. BUG: soft lockup - CPU#3 stuck for 22s ! [mountall.1603] Fix this by making create_task_io_context() returns -EBUSY in this case to break the loop. Signed-off-by: Eric Dumazet Cc: Tejun Heo Cc: Andrew Morton Cc: Alan Cox Signed-off-by: Jens Axboe