| Age | Commit message (Collapse) | Author |
|
commit a207f5937630dd35bd2550620bef416937a1365e upstream.
The probe function is supposed to return NULL on failure (as we can see in
kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;
However, in loop and brd, it returns negative error from ERR_PTR.
This causes a crash if we simulate disk allocation failure and run
less -f /dev/loop0 because the negative number is interpreted as a pointer:
BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
IP: [<ffffffff8118b188>] __blkdev_get+0x28/0x450
PGD 23c677067 PUD 23d6d1067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
CPU: 1 PID: 6831 Comm: less Tainted: P W O 3.10.15-devel #18
Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
RIP: 0010:[<ffffffff8118b188>] [<ffffffff8118b188>] __blkdev_get+0x28/0x450
RSP: 0018:ffff88023e47dbd8 EFLAGS: 00010286
RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
FS: 00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
Call Trace:
[<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
[<ffffffff8118c10d>] blkdev_get+0x1dd/0x370
[<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
[<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
[<ffffffff8118c300>] ? blkdev_get_by_dev+0x60/0x60
[<ffffffff8118c365>] blkdev_open+0x65/0x80
[<ffffffff8114d12e>] do_dentry_open.isra.18+0x23e/0x2f0
[<ffffffff8114d214>] finish_open+0x34/0x50
[<ffffffff8115e122>] do_last.isra.62+0x2d2/0xc50
[<ffffffff8115eb58>] path_openat.isra.63+0xb8/0x4d0
[<ffffffff81115a8e>] ? might_fault+0x4e/0xa0
[<ffffffff8115f4f0>] do_filp_open+0x40/0x90
[<ffffffff813cea6c>] ? _raw_spin_unlock+0x2c/0x50
[<ffffffff8116db85>] ? __alloc_fd+0xa5/0x1f0
[<ffffffff8114e45f>] do_sys_open+0xef/0x1d0
[<ffffffff8114e559>] SyS_open+0x19/0x20
[<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 <48> 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5
a4 07 00 44 89
RIP [<ffffffff8118b188>] __blkdev_get+0x28/0x450
RSP <ffff88023e47dbd8>
CR2: 00000000000002b4
---[ end trace bb7f32dbf02398dc ]---
The brd change should be backported to stable kernels starting with 2.6.25.
The loop change should be backported to stable kernels starting with 2.6.22.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3ec981e30fae1f3c8728a05c730acaa1f627bcfb upstream.
loop: fix crash if blk_alloc_queue fails
If blk_alloc_queue fails, loop_add cleans up, but it doesn't clean up the
identifier allocated with idr_alloc. That causes crash on module unload in
idr_for_each(&loop_index_idr, &loop_exit_cb, NULL); where we attempt to
remove non-existed device with that id.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000380
IP: [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
PGD 43d399067 PUD 43d0ad067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: loop(-) dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_loop dm_mod ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_userspace cpufreq_stats cpufreq_ondemand cpufreq_conservative cpufreq_powersave spadfs fuse hid_generic usbhid hid raid0 md_mod dmi_sysfs nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc lm85 hwmon_vid snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore acpi_cpufreq ohci_hcd freq_table tg3 ehci_pci mperf ehci_hcd kvm_amd kvm sata_svw serverworks libphy libata ide_core k10temp usbcore hwmon microcode ptp pcspkr pps_core e100 skge mii usb_common i2c_piix4 floppy evdev rtc_cmos i2c_core processor but!
ton unix
CPU: 7 PID: 2735 Comm: rmmod Tainted: G W 3.10.15-devel #15
Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
task: ffff88043d38e780 ti: ffff88043d21e000 task.ti: ffff88043d21e000
RIP: 0010:[<ffffffff812057c9>] [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
RSP: 0018:ffff88043d21fe10 EFLAGS: 00010282
RAX: ffffffffa05102e0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88043ea82800 RDI: 0000000000000000
RBP: ffff88043d21fe48 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 00000000000000ff
R13: 0000000000000080 R14: 0000000000000000 R15: ffff88043ea82800
FS: 00007ff646534700(0000) GS:ffff880447000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000380 CR3: 000000043e9bf000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff8100aba4 0000000000000092 ffff88043d21fe48 ffff88043ea82800
00000000000000ff ffff88043d21fe98 0000000000000000 ffff88043d21fe60
ffffffffa05102b4 0000000000000000 ffff88043d21fe70 ffffffffa05102ec
Call Trace:
[<ffffffff8100aba4>] ? native_sched_clock+0x24/0x80
[<ffffffffa05102b4>] loop_remove+0x14/0x40 [loop]
[<ffffffffa05102ec>] loop_exit_cb+0xc/0x10 [loop]
[<ffffffff81217b74>] idr_for_each+0x104/0x190
[<ffffffffa05102e0>] ? loop_remove+0x40/0x40 [loop]
[<ffffffff8109adc5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[<ffffffffa05135dc>] loop_exit+0x34/0xa58 [loop]
[<ffffffff810a98ea>] SyS_delete_module+0x13a/0x260
[<ffffffff81221d5e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff813cff16>] system_call_fastpath+0x1a/0x1f
Code: f0 4c 8b 6d f8 c9 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 56 41 55 4c 8d af 80 00 00 00 41 54 53 48 89 fb 48 83 ec 18 <48> 83 bf 80 03 00
00 00 74 4d e8 98 fe ff ff 31 f6 48 c7 c7 20
RIP [<ffffffff812057c9>] del_gendisk+0x19/0x2d0
RSP <ffff88043d21fe10>
CR2: 0000000000000380
---[ end trace 64ec069ec70f1309 ]---
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c1681bf8a7b1b98edee8b862a42c19c4e53205fd upstream.
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit b1a6650406875b9097a032eed89af50682fe1160 upstream.
When loopdev is built as module and we pass an invalid parameter,
loop_init() will return directly without deregister misc device, which
will cause an oops when insert loop module next time because we left some
garbage in the misc device list.
Test case:
sudo modprobe loop max_part=1024
(failed due to invalid parameter)
sudo modprobe loop
(oops)
Clean up nicely to avoid such oops.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5370019dc2d2c2ff90e95d181468071362934f3a upstream.
bd_mutex and lo_ctl_mutex can be held in different order.
Path #1:
blkdev_open
blkdev_get
__blkdev_get (hold bd_mutex)
lo_open (hold lo_ctl_mutex)
Path #2:
blkdev_ioctl
lo_ioctl (hold lo_ctl_mutex)
lo_set_capacity (hold bd_mutex)
Lockdep does not report it, because path #2 actually holds a subclass of
lo_ctl_mutex. This subclass seems creep into the code by mistake. The
patch author actually just mentioned it in the changelog, see commit
f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:
http://marc.info/?l=linux-kernel&m=123806169129727&w=2
Path #2 hold bd_mutex to call bd_set_size(), I've protected it
with i_mutex in a previous patch, so drop bd_mutex at this site.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
discard_alignment is not relevant to the loop driver since it is
supposed to be set as a workaround for the old sector 63 alignments.
So set it to zero rather than block size.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The loop driver does not support discard if encryption is enabled,
fix the comment.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
1) Anyone who has read access to loopdev has permission to call set_status
and may change important parameters such as lo_offset, lo_sizelimit and
so on, which contradicts to read access pattern and definitely equals
to write access pattern.
2) Add lo_offset over i_size check to prevent blkdev_size overflow.
##Testcase_bagin
#dd if=/dev/zero of=./file bs=1k count=1
#losetup /dev/loop0 ./file
/* userspace_application */
struct loop_info64 loinf;
fd = open("/dev/loop0", O_RDONLY);
ioctl(fd, LOOP_GET_STATUS64, &loinf);
/* Set offset to any value which is bigger than i_size, and sizelimit
* to nonzero value*/
loinf.lo_offset = 4096*1024;
loinf.lo_sizelimit = 1024;
ioctl(fd, LOOP_SET_STATUS64, &loinf);
/* After this loop device will have size similar to 0x7fffffffffxxxx */
#blockdev --getsz /dev/loop0
##OUTPUT: 36028797018955968
##Testcase_end
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If read was not fully successful we have to fail whole bio to prevent
information leak of old pages
##Testcase_begin
dd if=/dev/zero of=./file bs=1M count=1
losetup /dev/loop0 ./file -o 4096
truncate -s 0 ./file
# OOps loop offset is now beyond i_size, so read will silently fail.
# So bio's pages would not be cleared, may which result in information leak.
hexdump -C /dev/loop0
##testcase_end
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
virtio-blk: use ida to allocate disk index
hpsa: add small delay when using PCI Power Management to reset for kump
cciss: add small delay when using PCI Power Management to reset for kump
xen/blkback: Fix two races in the handling of barrier requests.
xen/blkback: Check for proper operation.
xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
xen/blkback: Report VBD_WSECT (wr_sect) properly.
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
xen-blkfront: plug device number leak in xlblk_init() error path
xen-blkfront: If no barrier or flush is supported, use invalid operation.
xen-blkback: use kzalloc() in favor of kmalloc()+memset()
xen-blkback: fixed indentation and comments
xen-blkfront: fix a deadlock while handling discard response
xen-blkfront: Handle discard requests.
xen-blkback: Implement discard requests ('feature-discard')
xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
drivers/block/loop.c: emit uevent on auto release
drivers/block/cpqarray.c: use pci_dev->revision
loop: always allow userspace partitions and optionally support automatic scanning
...
Fic up trivial header file includsion conflict in drivers/block/loop.c
|
|
|
|
Conflicts:
block/blk-core.c
include/linux/blkdev.h
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently the loop device tries to call directly into write_begin/write_end
instead of going through ->write if it can. This is a fairly nasty shortcut
as write_begin and write_end are only callbacks for the generic write code
and expect to be called with filesystem specific locks held.
This code currently causes various issues for clustered filesystems as it
doesn't take the required cluster locks, and it also causes issues for XFS
as it doesn't properly lock against the swapext ioctl as called by the
defragmentation tools. This in case causes data corruption if
defragmentation hits a busy loop device in the wrong time window, as
reported by RH QA.
The reason why we have this shortcut is that it saves a data copy when
doing a transformation on the loop device, which is the technical term
for using cryptoloop (or an XOR transformation). Given that cryptoloop
has been deprecated in favour of dm-crypt my opinion is that we should
simply drop this shortcut instead of finding complicated ways to to
introduce a formal interface for this shortcut.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If the loop device is associated (lo->lo_state == Lo_bound), it will have
a valid bdev pointed to by lo->lo_device. There is no reason to ever pass
an additional block_device pointer.
Signed-off-by: Ayan George <ayan.george@canonical.com>
Cc: Phillip Susi <psusi@cfl.rr.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The loopback driver failed to emit the change uevent when auto releasing
the device. Fixed lo_release() to pass the bdev to loop_clr_fd() so it
can emit the event.
Signed-off-by: Phillip Susi <psusi@cfl.rr.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ayan George <ayan@ayan.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
scanning
Automatic partition scanning can be requested individually per loop
device during its setup by setting LO_FLAGS_PARTSCAN. By default, no
partition tables are scanned.
Userspace can now always add and remove partitions from all loop
devices, regardless if the in-kernel partition scanner is enabled or
not.
The needed partition minor numbers are allocated from the extended
minors space, the main loop device numbers will continue to match the
loop minors, regardless of the number of partitions used.
# grep . /sys/class/block/loop1/loop/*
/sys/block/loop1/loop/autoclear:0
/sys/block/loop1/loop/backing_file:/home/kay/data/stuff/part.img
/sys/block/loop1/loop/offset:0
/sys/block/loop1/loop/partscan:1
/sys/block/loop1/loop/sizelimit:0
# ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 Aug 14 20:22 /dev/loop0
brw-rw---- 1 root disk 7, 1 Aug 14 20:23 /dev/loop1
brw-rw---- 1 root disk 259, 0 Aug 14 20:23 /dev/loop1p1
brw-rw---- 1 root disk 259, 1 Aug 14 20:23 /dev/loop1p2
brw-rw---- 1 root disk 7, 99 Aug 14 20:23 /dev/loop99
brw-rw---- 1 root disk 259, 2 Aug 14 20:23 /dev/loop99p1
brw-rw---- 1 root disk 259, 3 Aug 14 20:23 /dev/loop99p2
crw------T 1 root root 10, 237 Aug 14 20:22 /dev/loop-control
Cc: Karel Zak <kzak@redhat.com>
Cc: Davidlohr Bueso <dave@gnu.org>
Acked-By: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This commit adds discard support for loop devices. Discard is usually
supported by SSD and thinly provisioned devices as a method for
reclaiming unused space. This is no different than trying to reclaim
back space which is not used by the file system on the image, but it
still occupies space on the host file system.
We can do the reclamation on file system which does support hole
punching. So when discard request gets to the loop driver we can
translate that to punch a hole to the underlying file, hence reclaim
the free space.
This is very useful for trimming down the size of the image to only what
is really used by the file system on that image. Fstrim may be used for
that purpose.
It has been tested on ext4, xfs and btrfs with the image file systems
ext4, ext3, xfs and btrfs. ext4, or ext6 image on ext4 file system has
some problems but it seems that ext4 punch hole implementation is
somewhat flawed and it is unrelated to this commit.
Also this is a very good method of validating file systems punch hole
implementation.
Note that when encryption is used, discard support is disabled, because
using it might leak some information useful for possible attacker.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs
files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD
waits for show() to finish to remove the sysfs file.
cat /sys/class/block/loop0/loop/backing_file
mutex_lock_nested+0x176/0x350
? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
? loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
loop_attr_do_show_backing_file+0x2f/0xd0 [loop]
dev_attr_show+0x1b/0x60
? sysfs_read_file+0x86/0x1a0
? __get_free_pages+0x12/0x50
sysfs_read_file+0xaf/0x1a0
ioctl(LOOP_CLR_FD):
wait_for_common+0x12c/0x180
? try_to_wake_up+0x2a0/0x2a0
wait_for_completion+0x18/0x20
sysfs_deactivate+0x178/0x180
? sysfs_addrm_finish+0x43/0x70
? sysfs_addrm_start+0x1d/0x20
sysfs_addrm_finish+0x43/0x70
sysfs_hash_and_remove+0x85/0xa0
sysfs_remove_group+0x59/0x100
loop_clr_fd+0x1dc/0x3f0 [loop]
lo_ioctl+0x223/0x7a0 [loop]
Instead of taking the lo_ctl_mutex from sysfs code, take the inner
lo->lo_lock, to protect the access to the backing_file data.
Thanks to Tejun for help debugging and finding a solution.
Cc: Milan Broz <mbroz@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
devices
Instead of unconditionally creating a fixed number of dead loop
devices which need to be investigated by storage handling services,
even when they are never used, we allow distros start with 0
loop devices and have losetup(8) and similar switch to the dynamic
/dev/loop-control interface instead of searching /dev/loop%i for free
devices.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Loop devices today have a fixed pre-allocated number of usually 8.
The number can only be changed at module init time. To find a free
device to use, /dev/loop%i needs to be scanned, and all devices need
to be opened until a free one is possibly found.
This adds a new /dev/loop-control device node, that allows to
dynamically find or allocate a free device, and to add and remove loop
devices from the running system:
LOOP_CTL_ADD adds a specific device. Arg is the number
of the device. It returns the device i or a negative
error code.
LOOP_CTL_REMOVE removes a specific device, Arg is the
number the device. It returns the device i or a negative
error code.
LOOP_CTL_GET_FREE finds the next unbound device or allocates
a new one. No arg is given. It returns the device i or a
negative error code.
The loop kernel module gets automatically loaded when
/dev/loop-control is accessed the first time. The alias
specified in the module, instructs udev to create this
'dead' device node, even when the module is not loaded.
Example:
cfd = open("/dev/loop-control", O_RDWR);
# add a new specific loop device
err = ioctl(cfd, LOOP_CTL_ADD, devnr);
# remove a specific loop device
err = ioctl(cfd, LOOP_CTL_REMOVE, devnr);
# find or allocate a free loop device to use
devnr = ioctl(cfd, LOOP_CTL_GET_FREE);
sprintf(loopname, "/dev/loop%i", devnr);
ffd = open("backing-file", O_RDWR);
lfd = open(loopname, O_RDWR);
err = ioctl(lfd, LOOP_SET_FD, ffd);
Cc: Tejun Heo <tj@kernel.org>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Replace the linked list, that keeps track of allocated devices, with an
idr index to allow a more efficient lookup of devices.
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Export 'max_loop' and 'max_part' parameters to sysfs so user can know
that how many devices are allowed and how many partitions are supported.
If 'max_loop' is 0, there is no restriction on the number of loop devices.
User can create/use the devices as many as minor numbers available. If
'max_part' is 0, it means simply the device doesn't support partitioning.
Also note that 'max_part' can be adjusted to power of 2 minus 1 form if
needed. User should check this value after the module loading if he/she
want to use that number correctly (i.e. fdisk, mknod, etc.).
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:
$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7, 16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7, 32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7, 48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7, 64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7, 80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7, 96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7, 128 2011-05-24 22:17 /dev/loop8
After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.
In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).
On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:
$ sudo modprobe loop max_part0000
------------[ cut here ]------------
kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
invalid opcode: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in: loop(+)
Pid: 43, comm: insmod Tainted: G W 2.6.39-qemu+ #155 Bochs Bochs
RIP: 0010:[<ffffffff8113ce61>] [<ffffffff8113ce61>] internal_create_group=
+0x2a/0x170
RSP: 0018:ffff880007b3fde8 EFLAGS: 00000246
RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
FS: 0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000=
00
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c=
0)
Stack:
ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
Call Trace:
[<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
[<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
[<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
[<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
[<ffffffff8116a090>] blk_register_queue+0x47/0xf7
[<ffffffff8116f527>] add_disk+0xdf/0x290
[<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
[<ffffffffa0006000>] ? 0xffffffffa0005fff
[<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
[<ffffffff81096804>] sys_init_module+0x9c/0x1e0
[<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb=
48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe =
48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
RIP [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
RSP <ffff880007b3fde8>
---[ end trace a123eb592043acad ]---
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Conflicts:
block/blk-core.c
block/blk-flush.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
fs/nilfs2/btnode.c
fs/nilfs2/mdt.c
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This merge creates two set of conflicts. One is simple context
conflicts caused by removal of throtl_scheduled_delayed_work() in
for-linus and removal of throtl_shutdown_timer_wq() in
for-2.6.39/core.
The other is caused by commit 255bb490c8 (block: blk-flush shouldn't
call directly into q->request_fn() __blk_run_queue()) in for-linus
crashing with FLUSH reimplementation in for-2.6.39/core. The conflict
isn't trivial but the resolution is straight-forward.
* __blk_run_queue() calls in flush_end_io() and flush_data_end_io()
should be called with @force_kblockd set to %true.
* elv_insert() in blk_kick_flush() should use
%ELEVATOR_INSERT_REQUEUE.
Both changes are to avoid invoking ->request_fn() directly from
request completion path and closely match the changes in the commit
255bb490c8.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Following steps lead to deadlock in kernel:
dd if=/dev/zero of=img bs=512 count=1000
losetup -f img
mkfs.ext2 /dev/loop0
mount -t ext2 -o loop /dev/loop0 mnt
umount mnt/
Stacktrace:
[<c102ec04>] irq_exit+0x36/0x59
[<c101502c>] smp_apic_timer_interrupt+0x6b/0x75
[<c127f639>] apic_timer_interrupt+0x31/0x38
[<c101df88>] mutex_spin_on_owner+0x54/0x5b
[<fe2250e9>] lo_release+0x12/0x67 [loop]
[<c10c4eae>] __blkdev_put+0x7c/0x10c
[<c10a4da5>] fput+0xd5/0x1aa
[<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop]
[<fe225110>] lo_release+0x39/0x67 [loop]
[<c10c4eae>] __blkdev_put+0x7c/0x10c
[<c10a59d9>] deactivate_locked_super+0x17/0x36
[<c10b6f37>] sys_umount+0x27e/0x2a5
[<c10b6f69>] sys_oldumount+0xb/0xe
[<c1002897>] sysenter_do_call+0x12/0x26
[<ffffffff>] 0xffffffff
Regression since 2a48fc0ab24241755dc9, which introduced the private
loop_mutex as part of the BKL removal process.
As per [1], the mutex can be safely removed.
[1] http://www.gossamer-threads.com/lists/linux/kernel/1341930
Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394
Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172
Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
Cc: stable@kernel.org
Reviewed-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
blk_cleanup_queue()
Now we initialize ->queue_lock at queue allocation time so driver does
not have to worry about initializing it before calling
blk_cleanup_queue().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Performing
$ sudo mount -o loop -o umask=0 /dev/sdb1 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
$ sudo modprobe -r loop
results in oops:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff812479d4>] do_raw_spin_lock+0x14/0x122
Process modprobe (pid: 6189, threadinfo ffff88009a898000, task ffff880154a88000)
Call Trace:
[<ffffffff81486788>] _raw_spin_lock_irq+0x4a/0x51
[<ffffffff8123404b>] ? blk_throtl_exit+0x3b/0xa0
[<ffffffff8105b120>] ? cancel_delayed_work_sync+0xd/0xf
[<ffffffff8123404b>] blk_throtl_exit+0x3b/0xa0
[<ffffffff81229bc8>] blk_release_queue+0x21/0x65
[<ffffffff8123bb06>] kobject_release+0x51/0x66
[<ffffffff8123bab5>] ? kobject_release+0x0/0x66
[<ffffffff8123ce1e>] kref_put+0x43/0x4d
[<ffffffff8123ba27>] kobject_put+0x47/0x4b
[<ffffffff8122717c>] blk_cleanup_queue+0x56/0x5b
[<ffffffffa01c3824>] loop_exit+0x68/0x844 [loop]
[<ffffffff8107cccc>] sys_delete_module+0x1e8/0x25b
[<ffffffff814864c9>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81002112>] system_call_fastpath+0x16/0x1b
because of an attempt to acquire NULL queue_lock.
I added the same lines as in blk_queue_make_request -
index 44e18c0..49e6a54 100644`fall back to embedded per-queue lock'.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Commit a8adbe3 forgot to remove the return variable, kill it.
drivers/block/loop.c: In function 'lo_splice_actor':
drivers/block/loop.c:398: warning: unused variable 'ret'
[...]
fs/nfsd/vfs.c: In function 'nfsd_splice_actor':
fs/nfsd/vfs.c:848: warning: unused variable 'ret'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This patch pulls calls to buf->ops->confirm() from all actors passed
(also indirectly) to splice_from_pipe_feed().
Is avoiding the call to buf->ops->confirm() while splice()ing to
/dev/null is an intentional optimization? No other user does that
and this will remove this special case.
Against current linux.git 6313e3c21743cc88bb5bd8aa72948ee1e83937b6.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:
- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
In autoclear mode bdev is NULL but the sysfs
entry should be destroyed otherwise this warning appears:
WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x82/0x95()
sysfs: cannot create duplicate filename '/devices/virtual/block/loop0/loop'
Fixes commit ee86273062cbb310665fe49e1f1937d2cf85b0b9
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Ensure kmap_atomic() usage is strictly nested
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...
|
|
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits)
cciss: fix PCI IDs for new Smart Array controllers
drbd: add race-breaker to drbd_go_diskless
drbd: use dynamic_dev_dbg to optionally log uuid changes
dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
drbd: cleanup: change "<= 0" to "== 0"
drbd: relax the grace period of the md_sync timer again
drbd: add some more explicit drbd_md_sync
drbd: drop wrong debug asserts, fix recently introduced race
drbd: cleanup useless leftover warn/error printk's
drbd: add explicit drbd_md_sync to drbd_resync_finished
drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
drbd: fix for possible deadlock on IO error during resync
drbd: fix unlikely access after free and list corruption
drbd: fix for spurious fullsync (uuids rotated too fast)
drbd: allow for explicit resync-finished notifications
drbd: preparation commit, using full state in receive_state()
drbd: drbd_send_ack_dp must not rely on header information
drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
drbd: Fixed a stupid copy and paste error
drbd: Allow larger values for c-fill-target.
...
Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
|
|
The block device drivers have all gained new lock_kernel
calls from a recent pushdown, and some of the drivers
were already using the BKL before.
This turns the BKL into a set of per-driver mutexes.
Still need to check whether this is safe to do.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Deprecate REQ_HARDBARRIER and implement REQ_FLUSH/FUA instead. Also,
instead of checking file->f_op->fsync() directly, look at the value of
vfs_fsync() and ignore -EINVAL return.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
-EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
blk_queue_flush().
blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
device has write cache and can flush it, it should set REQ_FLUSH. If
the device can handle FUA writes, it should also set REQ_FUA.
All blk_queue_ordered() users are converted.
* ORDERED_DRAIN is mapped to 0 which is the default value.
* ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
* ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
loop implements FLUSH using fsync but was incorrectly setting its
ordered mode to DRAIN. Change it to DRAIN_FLUSH. In practice, this
doesn't change anything as loop doesn't make use of the block layer
ordered implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Create /sys/block/loopX/loop directory and provide these attributes:
- backing_file
- autoclear
- offset
- sizelimit
This loop directory is present only if loop device is configured.
To be used in util-linux-ng (and possibly elsewhere like udev rules)
where code need to get loop attributes from kernel (and not store
duplicate info in userspace).
Moreover loop ioctls are not even able to provide full backing
file info because of buffer limits.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Return of the bi_rw tests is no longer bool after commit 74450be1. But
results of such tests are stored in bools. This doesn't fit in there
for some compilers (gcc 4.5 here), so either use !! magic to get real
bools or use ulong where the result is assigned somewhere.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.
This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.
The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.
Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
This removes q->prepare_flush_fn completely (changes the
blk_queue_ordered API).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.
The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Recent udev versions probe loop devices for filesystems meaning that
the /dev/disk hierarchy may contain useful entries such as
$ ls -l /dev/disk/by-label/Fedora-12-x86_64-Live
lrwxrwxrwx 1 root root 11 Mar 11 13:41 /dev/disk/by-label/Fedora-12-x86_64-Live -> ../../loop0
Unfortunately, no "change" uevent is generated when the loop device is
detached so the symlink persists. Additionally, no "change" uevent is
guaranteed to be generated when attaching an fd or changing capacity.
For example, user space could open the loop device O_RDONLY (in fact,
recent util-linux-ng does this) so udev's OPTIONS+="watch" machinery may
not trigger the "change" uevent.
This patch ensures that the "change" uevent is generated in all of
these cases. As a result, the /dev/disk hierarchy works as expected
for loop devices.
Signed-off-by: David Zeuthen <davidz@redhat.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
loop: Update mtime when writing using aops
block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
backing-dev: Handle class_create() failure
Block: Fix block/elevator.c elevator_get() off-by-one error
drbd: lc_element_by_index() never returns NULL
cciss: unlock on error path
cfq-iosched: Do not merge queues of BE and IDLE classes
cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
i2o: Remove the dangerous kobj_to_i2o_device macro
block: remove 16 bytes of padding from struct request on 64bits
cfq-iosched: fix a kbuild regression
block: make CONFIG_BLK_CGROUP visible
Remove GENHD_FL_DRIVERFS
block: Export max number of segments and max segment size in sysfs
block: Finalize conversion of block limits functions
block: Fix overrun in lcm() and move it to lib
vfs: improve writeback_inodes_wb()
paride: fix off-by-one test
drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
...
|