user/sven/linux.git/fs/jbd2, branch v3.16.52

jbd2: don't leak modified metadata buffers on an aborted journal

2017-06-05T20:17:03Z

commit e112666b4959b25a8552d63bc564e1059be703e8 upstream. If the journal has been aborted, we shouldn't mark the underlying buffer head as dirty, since that will cause the metadata block to get modified. And if the journal has been aborted, we shouldn't allow this since it will almost certainly lead to a corrupted file system. Signed-off-by: Theodore Ts'o Signed-off-by: Ben Hutchings

jbd2: fix incorrect unlock on j_list_lock

2017-02-23T03:54:14Z

commit 559cce698eaf4ccecb2213b2519ea3a0413e5155 upstream. When 'jh->b_transaction == transaction' (asserted by below) J_ASSERT_JH(jh, (jh->b_transaction == transaction || ... 'journal->j_list_lock' will be incorrectly unlocked, since the the lock is aquired only at the end of if / else-if statements (missing the else case). Signed-off-by: Taesoo Kim Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28 Signed-off-by: Ben Hutchings

jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

2016-04-30T22:05:57Z

commit c0a2ad9b50dd80eeccd73d9ff962234590d5ec93 upstream. On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: OGAWA Hirofumi Signed-off-by: Theodore Ts'o Signed-off-by: Ben Hutchings

jbd2: Fix unreclaimed pages after truncate in data=journal mode

2016-01-11T10:50:16Z

commit bc23f0c8d7ccd8d924c4e70ce311288cb3e61ea8 upstream. Ted and Namjae have reported that truncated pages don't get timely reclaimed after being truncated in data=journal mode. The following test triggers the issue easily: for (i = 0; i < 1000; i++) { pwrite(fd, buf, 1024*1024, 0); fsync(fd); fsync(fd); ftruncate(fd, 0); } The reason is that journal_unmap_buffer() finds that truncated buffers are not journalled (jh->b_transaction == NULL), they are part of checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have been already written out (!buffer_dirty(bh)). We clean such buffers but we leave them in the checkpoint list. Since checkpoint transaction holds a reference to the journal head, these buffers cannot be released until the checkpoint transaction is cleaned up. And at that point we don't call release_buffer_page() anymore so pages detached from mapping are lingering in the system waiting for reclaim to find them and free them. Fix the problem by removing buffers from transaction checkpoint lists when journal_unmap_buffer() finds out they don't have to be there anymore. Reported-and-tested-by: Namjae Jeon Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5 Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Luis Henriques

ext4, jbd2: ensure entering into panic after recording an error in superblock

2015-12-13T17:48:48Z

commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream. If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo Signed-off-by: Daeho Jeong Signed-off-by: Theodore Ts'o Signed-off-by: Luis Henriques

jbd2: avoid infinite loop when destroying aborted journal

2015-10-19T10:17:07Z

commit 841df7df196237ea63233f0f9eaa41db53afd70f upstream. Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal superblock fails" changed jbd2_cleanup_journal_tail() to return EIO when the journal is aborted. That makes logic in jbd2_log_do_checkpoint() bail out which is fine, except that jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make a progress in cleaning the journal. Without it jbd2_journal_destroy() just loops in an infinite loop. Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of jbd2_log_do_checkpoint() fails with error. Reported-by: Eryu Guan Tested-by: Eryu Guan Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o [ luis: backported to 3.16: used Jan's backport ] Signed-off-by: Luis Henriques

jbd2: fix ocfs2 corrupt when updating journal superblock fails

2015-07-15T09:00:35Z

commit 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a upstream. If updating journal superblock fails after journal data has been flushed, the error is omitted and this will mislead the caller as a normal case. In ocfs2, the checkpoint will be treated successfully and the other node can get the lock to update. Since the sb_start is still pointing to the old log block, it will rewrite the journal data during journal recovery by the other node. Thus the new updates will be overwritten and ocfs2 corrupts. So in above case we have to return the error, and ocfs2_commit_cache will take care of the error and prevent the other node to do update first. And only after recovering journal it can do the new updates. The issue discussion mail can be found at: https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html http://comments.gmane.org/gmane.comp.file-systems.ext4/48841 [ Fixed bug in patch which allowed a non-negative error return from jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this was causing xfstests ext4/306 to fail. -- Ted ] Reported-by: Yiwen Jiang Signed-off-by: Joseph Qi Signed-off-by: Theodore Ts'o Tested-by: Yiwen Jiang Cc: Junxiao Bi Signed-off-by: Luis Henriques

jbd2: use GFP_NOFS in jbd2_cleanup_journal_tail()

2015-07-15T09:00:34Z

commit b4f1afcd068f6e533230dfed00782cd8a907f96b upstream. jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start() So allocations should be done with GFP_NOFS [Full stack trace snipped from 3.10-rh7] [] dump_stack+0x19/0x1b [] warn_slowpath_common+0x61/0x80 [] warn_slowpath_null+0x1a/0x20 [] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17 [] kmem_cache_alloc+0x55/0x210 [] ? mempool_alloc_slab+0x15/0x20 [] mempool_alloc_slab+0x15/0x20 [] mempool_alloc+0x69/0x170 [] ? _raw_spin_unlock_irq+0xe/0x20 [] ? finish_task_switch+0x5d/0x150 [] bio_alloc_bioset+0x1be/0x2e0 [] blkdev_issue_flush+0x99/0x120 [] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL [] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2] [] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2] [] start_this_handle+0x2d8/0x550 [jbd2] [] ? __memcg_kmem_put_cache+0x29/0x30 [] ? kmem_cache_alloc+0x130/0x210 [] jbd2__journal_start+0xba/0x190 [jbd2] [] ? lru_cache_add+0xe/0x10 [] ? ext4_da_write_begin+0xf9/0x330 [ext4] [] __ext4_journal_start_sb+0x77/0x160 [ext4] [] ext4_da_write_begin+0xf9/0x330 [ext4] [] generic_file_buffered_write_iter+0x10c/0x270 [] __generic_file_write_iter+0x178/0x390 [] __generic_file_aio_write+0x8b/0xb0 [] generic_file_aio_write+0x5d/0xc0 [] ext4_file_write+0xa9/0x450 [ext4] [] ? pipe_read+0x379/0x4f0 [] do_sync_write+0x90/0xe0 [] vfs_write+0xbd/0x1e0 [] SyS_write+0x58/0xb0 [] system_call_fastpath+0x16/0x1b Signed-off-by: Dmitry Monakhov Signed-off-by: Theodore Ts'o Signed-off-by: Luis Henriques

jbd2: fix r_count overflows leading to buffer overflow in journal recovery

2015-05-28T09:00:03Z

commit e531d0bceb402e643a4499de40dd3fa39d8d2e43 upstream. The journal revoke block recovery code does not check r_count for sanity, which means that an evil value of r_count could result in the kernel reading off the end of the revoke table and into whatever garbage lies beyond. This could crash the kernel, so fix that. However, in testing this fix, I discovered that the code to write out the revoke tables also was not correctly checking to see if the block was full -- the current offset check is fine so long as the revoke table space size is a multiple of the record size, but this is not true when either journal_csum_v[23] are set. Signed-off-by: Darrick J. Wong Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Signed-off-by: Luis Henriques

ext4: fix NULL pointer dereference when journal restart fails

2015-05-28T09:00:03Z

commit 9d506594069355d1fb2de3f9104667312ff08ed3 upstream. Currently when journal restart fails, we'll have the h_transaction of the handle set to NULL to indicate that the handle has been effectively aborted. We handle this situation quietly in the jbd2_journal_stop() and just free the handle and exit because everything else has been done before we attempted (and failed) to restart the journal. Unfortunately there are a number of problems with that approach introduced with commit 41a5b913197c "jbd2: invalidate handle if jbd2_journal_restart() fails" First of all in ext4 jbd2_journal_stop() will be called through __ext4_journal_stop() where we would try to get a hold of the superblock by dereferencing h_transaction which in this case would lead to NULL pointer dereference and crash. In addition we're going to free the handle regardless of the refcount which is bad as well, because others up the call chain will still reference the handle so we might potentially reference already freed memory. Moreover it's expected that we'll get aborted handle as well as detached handle in some of the journalling function as the error propagates up the stack, so it's unnecessary to call WARN_ON every time we get detached handle. And finally we might leak some memory by forgetting to free reserved handle in jbd2_journal_stop() in the case where handle was detached from the transaction (h_transaction is NULL). Fix the NULL pointer dereference in __ext4_journal_stop() by just calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix the potential memory leak in jbd2_journal_stop() and use proper handle refcounting before we attempt to free it to avoid use-after-free issues. And finally remove all WARN_ON(!transaction) from the code so that we do not get random traces when something goes wrong because when journal restart fails we will get to some of those functions. Signed-off-by: Lukas Czerner Signed-off-by: Theodore Ts'o Reviewed-by: Jan Kara Signed-off-by: Luis Henriques