| Age | Commit message | Author |
|
Currently all quota block reservation macros contain a hardcoded "2",
i.e. the MAXQUOTAS value. This is no good because in some places it is not
obvious what this digit represents. Let's introduce a
new macro with a self-descriptive name.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
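A minimal sketch of the idea, not the patch itself: every `EXAMPLE_*` name and the per-type block count below are hypothetical, chosen only to show a bare "2" being replaced by a named multiplier. `MAXQUOTAS` is assumed to be 2, as the message states.

```c
#include <assert.h>

/* Number of quota types; the commit message equates this with "2". */
#define MAXQUOTAS 2

/* Hypothetical per-quota-type cost in journal blocks (illustrative value). */
#define EXAMPLE_QUOTA_BLOCKS_PER_TYPE 3

/* Before: the bare "2" gives no hint that it counts quota types. */
#define EXAMPLE_QUOTA_TRANS_BLOCKS_OLD (2 * EXAMPLE_QUOTA_BLOCKS_PER_TYPE)

/* After: same value, but the multiplier documents itself. */
#define EXAMPLE_QUOTA_TRANS_BLOCKS_NEW (MAXQUOTAS * EXAMPLE_QUOTA_BLOCKS_PER_TYPE)

/* Credits to reserve for quota updates in one transaction (sketch only). */
static int example_quota_credits(void)
{
    return EXAMPLE_QUOTA_TRANS_BLOCKS_NEW;
}
```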
|
|
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
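For illustration only, the substitution looks roughly like this in user-space C; `example_whoami` is a hypothetical function, not code from the patch.

```c
#include <assert.h>
#include <string.h>

/* Returns the name of the enclosing function. */
static const char *example_whoami(void)
{
    /* __func__ is standard C99; __FUNCTION__ is a GCC extension. */
    return __func__;
}
```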
|
|
Saves nearly 4kbytes on x86.
Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use improved credits estimates for quota operations. Also reserve space
for a quota operation in a transaction only if the filesystem was mounted with
some quota options.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix a credits leak in journal_release_buffer().
The idea is to charge a buffer in journal_dirty_metadata(), not in
journal_get_*_access(). Each buffer has a flag that the call to
journal_dirty_metadata() sets on the buffer.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
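A rough user-space sketch of the accounting described above, under the assumption that one flag per buffer records whether a credit has already been charged. The `charge_*` types and functions are hypothetical stand-ins, not the JBD structures.

```c
#include <assert.h>
#include <stdbool.h>

struct charge_handle { int credits; };   /* stand-in for a journal handle */
struct charge_buffer { bool charged; };  /* stand-in for the per-buffer flag */

/* Getting access to a buffer no longer consumes a credit. */
static void charge_get_write_access(struct charge_handle *h,
                                    struct charge_buffer *b)
{
    (void)h;
    (void)b;
}

/* Charge one credit the first time the buffer is dirtied; the flag
 * keeps a second call on the same buffer from charging again. */
static void charge_dirty_metadata(struct charge_handle *h,
                                  struct charge_buffer *b)
{
    if (!b->charged) {
        assert(h->credits > 0);
        h->credits--;
        b->charged = true;
    }
}
```

Because nothing is charged until the buffer is actually dirtied, a caller that gets access and then changes its mind has no credit to get back, so there is nothing to leak.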
|
|
- make some needlessly global code static
- super.c: remove the unused global function ext3_panic
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch improves ext3's ability to deal with corruption on-disk. If we
ever get a corrupt inode or indirect block, then an attempt to delete it
can end up trying to remove any block on the fs, including bitmap blocks.
This can cause ext3 to assert-fail as we end up trying to do an ext3_forget
on a buffer with b_committed_data set.
The fix is to downgrade this to an IO error and journal abort, so that we
take the filesystem readonly but don't bring down the whole kernel.
Make J_EXPECT_JH() return a value so it can be easily tested and yet still
retained as an assert failure if we build ext3 with full internal debugging
enabled. Make journal_forget() return an error code so that in this case
the error can be passed up to the caller.
This is easily reproduced with a sample ext3 fs image containing an inode
whose direct and indirect blocks refer to a block bitmap block. Allocating
new blocks and then deleting that inode will BUG() with:
Assertion failure in journal_forget() at fs/jbd/transaction.c:1228:
"!jh->b_committed_data"
With the fix, ext3 recovers gracefully.
Signed-off-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
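The shape of a value-returning expectation macro can be sketched as follows. This is a user-space approximation, not the kernel's J_EXPECT_JH(): `EXAMPLE_EXPECT`, `EXAMPLE_DEBUG`, `struct example_jh`, and `example_forget` are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Evaluates to 1 when the expectation holds and 0 otherwise.
 * With EXAMPLE_DEBUG defined, a failure is a hard assert instead. */
#ifdef EXAMPLE_DEBUG
#define EXAMPLE_EXPECT(cond, msg) (assert(cond), 1)
#else
#define EXAMPLE_EXPECT(cond, msg) \
    ((cond) ? 1 : (fprintf(stderr, "expectation failed: %s\n", (msg)), 0))
#endif

struct example_jh { void *committed_data; };  /* stand-in for a journal head */

/* Returns 0 on success, or -EIO so the caller can abort the journal
 * rather than crash when on-disk state is inconsistent. */
static int example_forget(struct example_jh *jh)
{
    if (!EXAMPLE_EXPECT(jh->committed_data == NULL,
                        "inconsistent data on disk"))
        return -EIO;
    return 0;
}
```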
|
|
Implementation of quota reading and writing functions for ext3.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch below adds online resize capability to ext3, based on Andreas'
patch for 2.4 as fixed up by Stephen.
The patch also removes s_debts:
s_debts is currently not used by ext3 (it is created, destroyed and checked
but never set). Remove it for now.
Resurrecting this will require adding it back in changed form. In existing
form it's already unsafe wrt. byte-tearing as it performs unlocked byte
increment/decrement on words which may be being accessed simultaneously on
other CPUs. It is also the only in-memory dynamic table which needs to be
extended by online-resize, so locking it will require care.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Jan Kara <jack@ucw.cz>
Journalled quota support for ext3: The patch consists of two parts - ext3
changes and changes in generic quota code. The main idea of the changes is
that a transaction is always started before any operation which changes the quota
file, and dirtying the quota causes it to be written to disk. These two changes
assure that quota change is journalled into the same transaction as the file
change and hence after journal replay quota is consistent with the filesystem
state. As during journal replay inodes from orphan list are deleted/truncated
we have to do quota_on before the replay of the orphan list - this problem is
solved by additional mount options to ext3 with quota file names and format.
Some changes in generic code were also needed to assure that quota structure
in file is always allocated and so ordinary quota operations (like
adding/deleting a block/inode) need only a few blocks from the transaction.
|
|
The test for whether an inode is using journalled, ordered or writeback data
is incorrect and can lead to ext3_set_aops() giving the inode the wrong set
of address_space_operations. Fix. (Spotted by Jan Kara).
|
|
From: Andreas Gruenbacher <agruen@suse.de>
The xattr and acl code are not properly reserving credits for quotas.
EXT3_DATA_TRANS_BLOCKS is an overestimate of the credits required including
quotas. Make it a little more tight, and use it in the xattr and acl code
to be quota safe.
|
|
There are various places in which JBD is starting a commit against a
transaction without sufficient locking in place to ensure that that
transaction is still alive.
Change it so that log_start_commit() takes a transaction ID instead. Make
the caller take a copy of that ID inside the appropriate locks.
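A simplified sketch of why an ID is safer than a pointer: a stale ID can be compared and ignored, whereas a stale pointer would be dereferenced. Locking is omitted, and the `example_*` types and fields are hypothetical rather than the JBD definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int example_tid_t;

struct example_journal {
    example_tid_t running_tid;     /* ID of the running transaction */
    example_tid_t commit_request;  /* newest ID a commit was requested for */
};

/* Wraparound-tolerant "a is newer than b" test on sequence numbers. */
static bool example_tid_gt(example_tid_t a, example_tid_t b)
{
    return (int)(a - b) > 0;
}

/* Request a commit by ID. Returns true if a new request was recorded;
 * an ID that is already covered is ignored, so no dead transaction
 * is ever touched. */
static bool example_log_start_commit(struct example_journal *j,
                                     example_tid_t target)
{
    if (!example_tid_gt(target, j->commit_request))
        return false;
    j->commit_request = target;
    return true;
}
```

In this model the caller would copy `running_tid` while holding whatever lock protects the running transaction, then pass that copy in after dropping the lock.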
|
|
There's a bug: a caller tries to journal a buffer and then decides he didn't
want to after all. He calls journal_release_buffer().
But journal_release_buffer() is only allowed to give the caller a buffer
credit back if it was the caller who added the buffer in the first place.
journal_release_buffer() currently looks at the buffer state to work that
out, but gets it wrong: if the buffer has been moved onto a different list by
some other part of ext3 the credit is bogusly not returned to the caller and
the fs can later go BUG due to handle credit exhaustion.
The fix:
Change journal_get_undo_access() to return the number of buffers which the
caller actually added to the journal. (one or zero).
When the caller later calls journal_release_buffer(), he passes in that
count, to tell journal_release_buffer() how many credits the caller should
get back.
For API consistency this change should also be made to
journal_get_create_access() and journal_get_write_access(). But there is no
requirement for that in ext3 at this time.
The remaining bug:
This logic effectively gives another transaction handle a free buffer credit.
These could conceivably accumulate and cause a journal overflow. This is a
separate problem and needs changes to the t_outstanding_credits accounting
and the logic in start_this_handle.
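The fix can be modelled in a few lines. This is a sketch under simplifying assumptions (one flag per buffer, no locking); the `undo_*` names are hypothetical and the signatures are not those of the JBD functions.

```c
#include <assert.h>

struct undo_handle { int credits; };     /* stand-in for a journal handle */
struct undo_buffer { int on_journal; };  /* nonzero once journalled */

/* Reports through *credits how many buffers this call added to the
 * journal: 1 if the caller added it, 0 if it was already there. */
static int undo_get_access(struct undo_handle *h, struct undo_buffer *b,
                           int *credits)
{
    *credits = 0;
    if (!b->on_journal) {
        b->on_journal = 1;
        h->credits--;
        *credits = 1;
    }
    return 0;
}

/* Gives back exactly the count the caller was told about, instead of
 * guessing from the buffer's current state. */
static void undo_release_buffer(struct undo_handle *h, struct undo_buffer *b,
                                int credits)
{
    (void)b;
    h->credits += credits;
}
```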
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
This patch weans ext3 off lock_super()-based protection for the inode and
block allocators.
It's basically the same as the ext2 changes.
1) each group has its own spinlock, which is used for group counter
modifications
2) sb->s_free_blocks_count isn't used any more. ext2_statfs() and
find_group_orlov() loop over groups to count free blocks
3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time
in order to check consistency and to avoid fsck warnings
4) reserved blocks are distributed over last groups
5) ext3_new_block() tries to use non-reserved blocks and if it fails then
tries to use reserved blocks
6) ext3_new_block() and ext3_free_blocks do not modify sb->s_free_blocks,
therefore they do not call mark_buffer_dirty() for superblock's
buffer_head. this should reduce I/O a bit
Also fix orlov allocator boundary case:
In the interests of SMP scalability the ext2 free blocks and free inodes
counters are "approximate". But there is a piece of code in the Orlov
allocator which fails due to boundary conditions on really small
filesystems.
Fix that up via a final allocation pass which simply uses first-fit for
allocation of a directory inode.
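Points 1 and 2 above can be sketched in user space with C11 atomics. This is an illustrative model only: the group count, the `example_*` names, and the spinlock built on `atomic_flag` are assumptions, not the ext3 code.

```c
#include <assert.h>
#include <stdatomic.h>

#define EXAMPLE_NGROUPS 4

struct example_group {
    atomic_flag lock;           /* per-group spinlock */
    unsigned long free_blocks;  /* protected by lock */
};

static struct example_group example_groups[EXAMPLE_NGROUPS];

static void example_init(unsigned long per_group)
{
    for (int i = 0; i < EXAMPLE_NGROUPS; i++) {
        atomic_flag_clear(&example_groups[i].lock);
        example_groups[i].free_blocks = per_group;
    }
}

static void example_group_lock(struct example_group *g)
{
    while (atomic_flag_test_and_set_explicit(&g->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void example_group_unlock(struct example_group *g)
{
    atomic_flag_clear_explicit(&g->lock, memory_order_release);
}

/* Allocate one block; only this group's lock is taken, so allocations
 * in different groups do not contend. Returns 1 on success, 0 if full. */
static int example_alloc_block(int group)
{
    struct example_group *g = &example_groups[group];
    int ok = 0;

    example_group_lock(g);
    if (g->free_blocks > 0) {
        g->free_blocks--;
        ok = 1;
    }
    example_group_unlock(g);
    return ok;
}

/* statfs-style total: sum the per-group counters. Groups are locked
 * one at a time, so the sum is approximate under concurrent updates. */
static unsigned long example_count_free(void)
{
    unsigned long total = 0;

    for (int i = 0; i < EXAMPLE_NGROUPS; i++) {
        example_group_lock(&example_groups[i]);
        total += example_groups[i].free_blocks;
        example_group_unlock(&example_groups[i]);
    }
    return total;
}
```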
|
|
ext3_writepage() calls ext3_journal_stop(), which dereferences the affected
inode.
It does this _after_ writing the page out, which is illegal. The IO can
complete, the page can be released from the inode and the inode can be freed
up.
It's a long-standing bug. It has been reported happening on preemptible
kernels, where the timing window is larger.
Fix that up by teaching ext3_journal_stop to locate the superblock via the
journal structure, not via the inode.
This means that ext3_journal_stop() does not need the inode argument at all.
Also uninline the affected functions. It saves 5.5 kbytes.
Also remove the setting of sb->s_dirt in ext3_journal_stop(). That was an
awkward way of telling sys_sync() that the filesystem needs a commit, and
with the ext3_sync_fs() that is no longer needed.
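The lookup path can be pictured like this. The struct and field names are hypothetical stand-ins that only model the pointer chain from a handle to the superblock; they are not the kernel definitions.

```c
#include <assert.h>

struct stop_sb { int id; };                        /* stand-in superblock */
struct stop_journal { struct stop_sb *private_sb; };
struct stop_transaction { struct stop_journal *journal; };
struct stop_handle { struct stop_transaction *transaction; };

/* Find the superblock from the handle alone. No inode is touched, so
 * it does not matter if the inode was freed once the IO completed. */
static struct stop_sb *stop_sb_from_handle(const struct stop_handle *h)
{
    return h->transaction->journal->private_sb;
}
```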
|
|
journal_try_start() is a function which nonblockingly attempts to open a JBD
transaction handle. It was added a long time ago when there were concerns
that ext3_writepage() could block kswapd for too long.
It was never clearly necessary.
So the patch throws it all away and just calls the blocking journal_start()
from ext3_writepage().
|
|
Patch from "Stephen C. Tweedie" <sct@redhat.com>
Fix "h_buffer_credits<0" assert failure during truncate.
The bug occurs when the "i_blocks" count in the file's inode overflows
past 2^31. That works fine most of the time, because i_blocks is an
unsigned long, and should go up to 2^32; but there's a place in truncate
where ext3 calculates the size of the next transaction chunk for the
delete, and that mistakenly uses a signed long instead. Because the
huge i_blocks gets cast to a negative value, ext3 does not reserve
enough credits for the transaction and the above error results.
This is usually only possible on filesystems corrupted for other
reasons, but it is reproducible if you create a single, non-sparse file
larger than 1TB on ext3 and then try to delete it.
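The sign problem is easy to reproduce in isolation. The sketch below uses fixed 32-bit types so that it behaves the same on 64-bit hosts, and assumes i_blocks counts 512-byte units, which puts 2^31 units at 1 TiB; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: treats the 32-bit block count as signed. The
 * conversion is implementation-defined in C; on the usual
 * two's-complement targets it wraps to a negative number. */
static int64_t example_blocks_signed(uint32_t i_blocks)
{
    int32_t n = (int32_t)i_blocks;
    return n;
}

/* Fixed variant: keeps the count unsigned before widening. */
static int64_t example_blocks_unsigned(uint32_t i_blocks)
{
    return (int64_t)i_blocks;
}
```

A negative count fed into a credit estimate yields too few credits, which is consistent with the "h_buffer_credits<0" assertion described above.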
|
|
This is the leak which Con found. Long story...
- If a dirty page is fed into ext3_writepage() during truncate,
block_write_full_page() will return -EIO (it's outside i_size) and will
leave the buffers dirty, in the expectation that discard_buffer() will
clean them.
- ext3_writepage() then adds the still-dirty buffers to the journal's
"async data list". These are buffers which are known to have had IO
started. All we need to do is to wait on them in commit.
- meanwhile, truncate will chop the pages off the address_space. But
truncate cannot invalidate the buffers (in journal_unmap_buffer()) because
the buffers are attached to the committing transaction. (hm. This
behaviour in journal_unmap_buffer() is bogus. We just never need to write
these buffers.)
- ext3 commit will "wait on writeout" of these writepage buffers (even
though it was never started) and will then release them from the
journalling system.
So we end up with pages which are attached to no mapping, which are clean and
which have dirty buffers. These are unreclaimable.
Aside:
ext3-ordered has two buffer lists: the "sync data list" and the "async
data list".
The sync list consists of probably-dirty buffers which were dirtied in
commit_write(). Transaction commit must write all these out and wait on
them.
The async list supposedly consists of clean buffers which were attached
to the journal in ->writepage. These have had IO started (by writepage) so
commit merely needs to wait on them.
This is all designed for the 2.4 VM really. In 2.5, tons of writeback
goes via writepage (instead of the buffer lru) and these buffers end up
madly hopping between the async and sync lists.
Plus it's arguably incorrect to just wait on the writes in commit - if
the buffers were set dirty again (say, by zap_pte_range()) then perhaps we
should write them again before committing.
So what the patch does is to remove the async list. All ordered-data buffers
are now attached to the single "sync data list". So when we come to commit,
those buffers which are dirty will have IO started and all buffers are waited
upon.
This means that the dirty buffers against a clean page which came about from
block_write_full_page()'s -EIO will be written to disk in commit - this
cleans them, and the page is now reclaimable. No leak.
It seems bogus to write these buffers in commit, and indeed it is. But ext3
will not allow those blocks to be reused until the commit has ended so there
is no corruption risk. And the amount of data involved is low - it only
comes about as a race between truncate and writepage().
|
|
This patch adds extended attribute support to the ext3 filesystem. This
uses the generic extended attribute patch which was developed by Andreas
Gruenbacher and the XFS team. As a result, the user space utilities
which work for XFS will also work with these patches.
|
|
Daniel Phillips' indexed directory support. Ported from ext2 by
Christopher Li. Contributions from Andreas Dilger, Stephen Tweedie,
lots from Ted.
It requires e2fsprogs-1.29; I've updated the Changes file to reflect
that.
|
|
Turn on direct-to-BIO writeback for ext3 in data=writeback mode.
|
|
- Al Viro: VFS inode allocation moved down to filesystem, trim inodes
- Greg KH: USB update, hotplug documentation
- Kai Germaschewski: ISDN update
- Ingo Molnar: scheduler tweaking ("J2")
- Arnaldo: emu10k kdev_t updates
- Ben Collins: firewire updates
- Björn Wesen: cris arch update
- Hal Duston: ps2esdi driver bio/kdev_t fixes
- Jean Tourrilhes: move wireless drivers into drivers/net/wireless,
update wireless API #1
- Richard Gooch: devfs race fix
- OGAWA Hirofumi: FATFS update
|
|
- Dave Jones: more merging, fix up last merge..
- release to sync with Dave
|
|
- Ivan Kokshaysky: fix alpha dec_and_lock with modules, for alpha config entry
- Kai Germaschewski: ISDN updates
- Jeff Garzik: network driver updates, sysv fs update
- Kai Mäkisara: SCSI tape update
- Alan Cox: large drivers merge
- Nikita Danilov: reiserfs procfs information
- Andrew Morton: ext3 merge
- Christoph Hellwig: vxfs livelock fix
- Trond Myklebust: NFS updates
- Jens Axboe: cpqarray + cciss dequeue fix
- Tim Waugh: parport_serial base_baud setting
- Matthew Dharm: usb-storage Freecom driver fixes
- Dave McCracken: wait4() thread group race fix
|