user/sven/linux.git/fs/ext4/inode.c, branch v3.2.52

ext4: fix overflow when counting used blocks on 32-bit architectures

2013-07-27T04:34:34Z

commit 8af8eecc1331dbf5e8c662022272cf667e213da5 upstream. The arithmetics adding delalloc blocks to the number of used blocks in ext4_getattr() can easily overflow on 32-bit archs as we first multiply number of blocks by blocksize and then divide back by 512. Make the arithmetics more clever and also use proper type (unsigned long long instead of unsigned long). Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Ben Hutchings

ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

2013-05-13T14:02:14Z

commit d76a3a77113db020d9bb1e894822869410450bd9 upstream. In the case where an inode has a very stale transaction id (tid) in i_datasync_tid or i_sync_tid, it's possible that after a very large (2**31) number of transactions, that the tid number space might wrap, causing tid_geq()'s calculations to fail. Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily", attempted to fix this problem, but it only avoided kjournald spinning forever by fixing the logic in jbd2_log_start_commit(). Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c that might call jbd2_log_start_commit() with a stale tid, those functions will subsequently call jbd2_log_wait_commit() with the same stale tid, and then wait for a very long time. To fix this, we replace the calls to jbd2_log_start_commit() and jbd2_log_wait_commit() with a call to a new function, jbd2_complete_transaction(), which will correctly handle stale tid's. As a bonus, jbd2_complete_transaction() will avoid locking j_state_lock for writing unless a commit needs to be started. This should have a small (but probably not measurable) improvement for ext4's scalability. Signed-off-by: "Theodore Ts'o" Reported-by: Ben Hutchings Reported-by: George Barnett [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings

ext4: fix data=journal fast mount/umount hang

2013-03-27T02:41:18Z

commit 2b405bfa84063bfa35621d2d6879f52693c614b0 upstream. In data=journal mode, if we unmount the file system before a transaction has a chance to complete, when the journal inode is being evicted, we can end up calling into jbd2_log_wait_commit() for the last transaction, after the journalling machinery has been shut down. Arguably we should adjust ext4_should_journal_data() to return FALSE for the journal inode, but the only place it matters is ext4_evict_inode(), and so to save a bit of CPU time, and to make the patch much more obviously correct by inspection(tm), we'll fix it by explicitly not trying to waiting for a journal commit when we are evicting the journal inode, since it's guaranteed to never succeed in this case. This can be easily replicated via: mount -t ext4 -o data=journal /dev/vdb /vdb ; umount /vdb ------------[ cut here ]------------ WARNING: at /usr/projects/linux/ext4/fs/jbd2/journal.c:542 __jbd2_log_start_commit+0xba/0xcd() Hardware name: Bochs JBD2: bad log_start_commit: 3005630206 3005630206 0 0 Modules linked in: Pid: 2909, comm: umount Not tainted 3.8.0-rc3 #1020 Call Trace: [] warn_slowpath_common+0x68/0x7d [] ? __jbd2_log_start_commit+0xba/0xcd [] warn_slowpath_fmt+0x2b/0x2f [] __jbd2_log_start_commit+0xba/0xcd [] jbd2_log_start_commit+0x24/0x34 [] ext4_evict_inode+0x71/0x2e3 [] evict+0x94/0x135 [] iput+0x10a/0x110 [] jbd2_journal_destroy+0x190/0x1ce [] ? bit_waitqueue+0x50/0x50 [] ext4_put_super+0x52/0x294 [] generic_shutdown_super+0x48/0xb4 [] kill_block_super+0x22/0x60 [] deactivate_locked_super+0x22/0x49 [] deactivate_super+0x30/0x33 [] mntput_no_expire+0x107/0x10c [] sys_umount+0x2cf/0x2e0 [] sys_oldumount+0x12/0x14 [] syscall_call+0x7/0xb ---[ end trace 6a954cc790501c1f ]--- jbd2_log_wait_commit: error: j_commit_request=-1289337090, tid=0 Signed-off-by: "Theodore Ts'o" Reviewed-by: Jan Kara Signed-off-by: Ben Hutchings

ext4: fix possible use-after-free with AIO

2013-03-06T03:23:45Z

commit 091e26dfc156aeb3b73bc5c5f277e433ad39331c upstream. Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Reviewed-by: Carlos Maiolino Acked-by: Jeff Moyer Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings

ext4: return ENOMEM if sb_getblk() fails

2013-03-06T03:22:36Z

commit 860d21e2c585f7ee8a4ecc06f474fdc33c9474f4 upstream. The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So ENOMEM is more appropriate than EIO. In addition, make sure that the file system is marked as being inconsistent if sb_getblk() fails. Signed-off-by: "Theodore Ts'o" [bwh: Backported to 3.2: - Adjust context - Drop change to inline.c - Call to ext4_ext_check() from ext4_ext_find_extent() is conditional] Signed-off-by: Ben Hutchings

ext4: init pagevec in ext4_da_block_invalidatepages

2013-01-03T03:33:09Z

commit 66bea92c69477a75a5d37b9bfed5773c92a3c4b4 upstream. ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner Reviewed-by: Carlos Maiolino Signed-off-by: Eric Sandeen Signed-off-by: "Theodore Ts'o" [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings

ext4: fix fdatasync() for files with only i_size changes

2012-10-17T02:49:04Z

commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream. Code tracking when transaction needs to be committed on fdatasync(2) forgets to handle a situation when only inode's i_size is changed. Thus in such situations fdatasync(2) doesn't force transaction with new i_size to disk and that can result in wrong i_size after a crash. Fix the issue by updating inode's i_datasync_tid whenever its size is updated. Reported-by: Kristian Nielsen Signed-off-by: Jan Kara [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings

ext4: fix potential deadlock in ext4_nonda_switch()

2012-10-17T02:48:38Z

commit 00d4e7362ed01987183e9528295de3213031309c upstream. In ext4_nonda_switch(), if the file system is getting full we used to call writeback_inodes_sb_if_idle(). The problem is that we can be holding i_mutex already, and this causes a potential deadlock when writeback_inodes_sb_if_idle() when it tries to take s_umount. (See lockdep output below). As it turns out we don't need need to hold s_umount; the fact that we are in the middle of the write(2) system call will keep the superblock pinned. Unfortunately writeback_inodes_sb() checks to make sure s_umount is taken, and the VFS uses a different mechanism for making sure the file system doesn't get unmounted out from under us. The simplest way of dealing with this is to just simply grab s_umount using a trylock, and skip kicking the writeback flusher thread in the very unlikely case that we can't take a read lock on s_umount without blocking. Also, we now check the cirteria for kicking the writeback thread before we decide to whether to fall back to non-delayed writeback, so if there are any outstanding delayed allocation writes, we try to get them resolved as soon as possible. [ INFO: possible circular locking dependency detected ] 3.6.0-rc1-00042-gce894ca #367 Not tainted ------------------------------------------------------- dd/8298 is trying to acquire lock: (&type->s_umount_key#18){++++..}, at: [] writeback_inodes_sb_if_idle+0x28/0x46 but task is already holding lock: (&sb->s_type->i_mutex_key#8){+.+...}, at: [] generic_file_aio_write+0x5f/0xd3 which lock already depends on the new lock. 2 locks held by dd/8298: #0: (sb_writers#2){.+.+.+}, at: [] generic_file_aio_write+0x56/0xd3 #1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [] generic_file_aio_write+0x5f/0xd3 stack backtrace: Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367 Call Trace: [] ? console_unlock+0x345/0x372 [] print_circular_bug+0x190/0x19d [] __lock_acquire+0x86d/0xb6c [] ? mark_held_locks+0x5c/0x7b [] lock_acquire+0x66/0xb9 [] ? writeback_inodes_sb_if_idle+0x28/0x46 [] down_read+0x28/0x58 [] ? writeback_inodes_sb_if_idle+0x28/0x46 [] writeback_inodes_sb_if_idle+0x28/0x46 [] ext4_nonda_switch+0xe1/0xf4 [] ext4_da_write_begin+0x27/0x193 [] generic_file_buffered_write+0xc8/0x1bb [] __generic_file_aio_write+0x1dd/0x205 [] generic_file_aio_write+0x78/0xd3 [] ext4_file_write+0x480/0x4a6 [] ? __lock_acquire+0x41e/0xb6c [] ? sched_clock_cpu+0x11a/0x13e [] ? trace_hardirqs_off+0xb/0xd [] ? local_clock+0x37/0x4e [] do_sync_write+0x67/0x9d [] ? wait_on_retry_sync_kiocb+0x44/0x44 [] vfs_write+0x7b/0xe6 [] sys_write+0x3b/0x64 [] syscall_call+0x7/0xb Signed-off-by: "Theodore Ts'o" Signed-off-by: Ben Hutchings

ext4: undo ext4_calc_metadata_amount if we fail to claim space

2012-08-02T13:37:58Z

commit 03179fe92318e7934c180d96f12eff2cb36ef7b6 upstream. The function ext4_calc_metadata_amount() has side effects, although it's not obvious from its function name. So if we fail to claim space, regardless of whether we retry to claim the space again, or return an error, we need to undo these side effects. Otherwise we can end up incorrectly calculating the number of metadata blocks needed for the operation, which was responsible for an xfstests failure for test #271 when using an ext2 file system with delalloc enabled. Reported-by: Brian Foster Signed-off-by: "Theodore Ts'o" Signed-off-by: Ben Hutchings

ext4: don't let i_reserved_meta_blocks go negative

2012-08-02T13:37:58Z

commit 97795d2a5b8d3c8dc4365d4bd3404191840453ba upstream. If we hit a condition where we have allocated metadata blocks that were not appropriately reserved, we risk underflow of ei->i_reserved_meta_blocks. In turn, this can throw sbi->s_dirtyclusters_counter significantly out of whack and undermine the nondelalloc fallback logic in ext4_nonda_switch(). Warn if this occurs and set i_allocated_meta_blocks to avoid this problem. This condition is reproduced by xfstests 270 against ext2 with delalloc enabled: Mar 28 08:58:02 localhost kernel: [ 171.526344] EXT4-fs (loop1): delayed block allocation failed for inode 14 at logical offset 64486 with max blocks 64 with error -28 Mar 28 08:58:02 localhost kernel: [ 171.526346] EXT4-fs (loop1): This should not happen!! Data will be lost 270 ultimately fails with an inconsistent filesystem and requires an fsck to repair. The cause of the error is an underflow in ext4_da_update_reserve_space() due to an unreserved meta block allocation. Signed-off-by: Brian Foster Signed-off-by: "Theodore Ts'o" Signed-off-by: Ben Hutchings