|
A patch that stores inode flags such as S_IMMUTABLE, S_APPEND, etc. from
i_flags to EXT3_I(inode)->i_flags when inode is written to disk. The same
thing is done on GETFLAGS ioctl.
Quota code changes these flags on quota files (to make it harder for
sysadmin to screw himself) and these changes were not correctly propagated
into the filesystem (especially, lsattr did not show them and users were
wondering...).
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext3-specific i_flags. Hence, when someone sets these flags via a
different interface than ioctl, they are stored correctly.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
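The propagation described above can be sketched in user-space C. This is only an illustration under stated assumptions: the `VFS_*` and `DISK_*` constants and the `propagate_flags()` helper are hypothetical stand-ins for the kernel's S_* bits and the EXT3_*_FL on-disk bits, not the code from this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the in-memory (VFS) inode flag bits. */
#define VFS_SYNC      0x01u
#define VFS_NOATIME   0x02u
#define VFS_APPEND    0x04u
#define VFS_IMMUTABLE 0x08u
#define VFS_DIRSYNC   0x40u

/* Hypothetical stand-ins for the filesystem-specific (on-disk) flag bits. */
#define DISK_SYNC_FL      0x00000008u
#define DISK_IMMUTABLE_FL 0x00000010u
#define DISK_APPEND_FL    0x00000020u
#define DISK_NOATIME_FL   0x00000080u
#define DISK_DIRSYNC_FL   0x00010000u

/* Rebuild the filesystem-specific flags from the VFS flags, leaving
 * unrelated on-disk bits untouched. */
static uint32_t propagate_flags(uint32_t vfs_flags, uint32_t disk_flags)
{
    static const struct { uint32_t vfs, disk; } map[] = {
        { VFS_SYNC,      DISK_SYNC_FL },
        { VFS_NOATIME,   DISK_NOATIME_FL },
        { VFS_APPEND,    DISK_APPEND_FL },
        { VFS_IMMUTABLE, DISK_IMMUTABLE_FL },
        { VFS_DIRSYNC,   DISK_DIRSYNC_FL },
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        disk_flags &= ~map[i].disk;        /* drop the stale copy */
        if (vfs_flags & map[i].vfs)
            disk_flags |= map[i].disk;     /* mirror the current state */
    }
    return disk_flags;
}
```

Calling such a helper both when the inode is written out and when the flags are queried keeps the two copies in agreement, whichever interface changed them.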
|
|
Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
[try #6]
Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3
driver so that the Ext3 header file doesn't need to be included.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move common FS-specific ioctls from linux/ext2_fs.h to linux/fs.h as FS_IOC_*
and FS_IOC32_* and have the users of them use those as a base.
Also move the GETFLAGS/SETFLAGS flags to linux/fs.h as FS_*_FL macros, and then
have the other users use them as a base.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Jeff Garzik <jeff@garzik.org>
|
|
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.
So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.
[akpm@osdl.org: fix off-by-one error]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: <stable@kernel.org>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
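A range check of the kind described above might look like the following sketch. The `sb_limits` structure and `inum_is_valid()` are hypothetical names for illustration, and the check is simplified (the real filesystem has further special inodes). Inode numbers are 1-based, so the total inode count is itself a valid number, which is where an off-by-one can creep in.

```c
#include <stdbool.h>
#include <stdint.h>

#define ROOT_INO 2u   /* root directory inode number in ext2/ext3 */

/* Hypothetical subset of the superblock fields needed for the check. */
struct sb_limits {
    uint32_t first_ino;      /* first non-reserved inode number */
    uint32_t inodes_count;   /* total number of inodes */
};

/* Return true if an untrusted inode number (e.g. from an NFS file handle)
 * may be looked up: either the root inode or an ordinary inode within
 * [first_ino, inodes_count], both ends inclusive. */
static bool inum_is_valid(const struct sb_limits *sb, uint32_t ino)
{
    if (ino == ROOT_INO)
        return true;
    return ino >= sb->first_ino && ino <= sb->inodes_count;
}
```

Rejecting a bad number up front lets the caller return an ordinary error to the NFS client instead of treating it as filesystem corruption.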
|
|
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert all
remaining in-kernel filesystem block variables of type unsigned long to
ext3_fsblk_t, and update the printk format strings correspondingly.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Some of the in-kernel ext3 block variables are treated as signed 4-byte ints,
which limits an ext3 filesystem to 8TB (based on a 4k block size). While
trying to fix them, the ext3 code turned out to be quite confusing: some
blocks are filesystem-wide blocks, and some are group-relative offsets that
need to be signed values (as -1 has a special meaning). So it seems saner to
define two types of physical blocks: one for filesystem-wide blocks, another
for group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit the current 32-bit
ext3 filesystem to 8TB.
With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend the ext3 filesystem limit to 16TB.
This work is also a prerequisite for the recent >32-bit ext3 work, and makes
it a lot easier for the kernel to address 48-bit ext3 blocks: simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem blocks correspondingly.
Two RFCs, each with a series of patches, have been posted to the ext2-devel
list and have been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2
http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2
The patches were tested on both 32-bit and 64-bit machines, on <8TB and >8TB
ext3 filesystems (with the latest, to-be-released e2fsprogs-1.39). The tests
include overnight fsx, tiobench, dbench and fsstress runs.
This patch:
Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem-wide blocks.
This patch distinguishes block-group-relative blocks from ext3_fsblk_t blocks
where both occur in the same function, which used to be confusing. It also
includes kernel bug fixes for filesystem-wide in-kernel block variables. Some
filesystem-wide blocks are currently treated as int/unsigned int in the
kernel, especially in the ext3 block allocation and reservation code. This
patch fixes those bugs by converting those variables to the ext3_fsblk_t
(unsigned long) type.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
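The 8TB and 16TB figures, and the split into two block types, can be illustrated with a small user-space sketch. The `fsblk_t` and `grpblk_t` typedefs here are fixed-width stand-ins chosen for the example (the patch itself uses unsigned long for ext3_fsblk_t), and `max_fs_bytes()` and `split_block()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t fsblk_t;    /* filesystem-wide block number, unsigned */
typedef int32_t  grpblk_t;   /* group-relative offset, signed so -1 can mean "none" */

/* Largest filesystem addressable with `bits` usable block-number bits. */
static uint64_t max_fs_bytes(unsigned bits, uint64_t block_size)
{
    assert(bits < 64);
    return ((uint64_t)1 << bits) * block_size;
}

/* Split a filesystem-wide block number into (group, offset within group). */
static void split_block(fsblk_t blk, fsblk_t first_data_block,
                        uint32_t blocks_per_group,
                        uint32_t *group, grpblk_t *offset)
{
    assert(blocks_per_group > 0 && blk >= first_data_block);
    fsblk_t rel = blk - first_data_block;

    *group = rel / blocks_per_group;
    *offset = (grpblk_t)(rel % blocks_per_group);
}
```

With 4096-byte blocks, 31 usable bits (a signed int) give 8 TiB and 32 bits give 16 TiB. A block number such as 3000000000 only survives in the unsigned type; stored in a signed 32-bit variable it would turn negative.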
|
|
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.
The goal is both to increase correctness (harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read-only and thus
cache clean).
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
allocate the requested number of blocks on a best-effort basis: after
allocating the first block, it will always attempt to allocate the next few
adjacent blocks (up to the requested size and not beyond the reservation
window) at the same time.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
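The best-effort behaviour can be sketched over a plain in-memory bitmap. This is a simplified illustration, not the kernel routine: `try_to_allocate()` and its parameters are hypothetical, the bitmap is an array of bool, and journalling and locking are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Find the first free block at or after `goal`, then claim it plus as many
 * adjacent free blocks as possible, up to `want` blocks and never reaching
 * `window_end` (exclusive). Returns the number of blocks claimed and stores
 * the first claimed block in *first. */
static size_t try_to_allocate(bool *in_use, size_t goal, size_t window_end,
                              size_t want, size_t *first)
{
    size_t start = goal;
    size_t got = 0;

    if (want == 0)
        return 0;
    while (start < window_end && in_use[start])
        start++;
    if (start >= window_end)
        return 0;                      /* nothing free inside the window */

    while (got < want && start + got < window_end && !in_use[start + got]) {
        in_use[start + got] = true;    /* claim the block */
        got++;
    }
    *first = start;
    return got;
}
```

The caller may receive fewer blocks than requested. That is the point of "best effort": the first block is allocated as before, and anything contiguous beyond it is a bonus.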
|
|
Currently ext3_get_block() only maps or allocates one block at a time. This
is quite inefficient for sequential IO workloads.
I have posted an early implementation of simple multiple-block mapping and
allocation for current ext3. The basic idea is to allocate the first block in
the existing way, and to attempt to allocate the next adjacent blocks on a
best-effort basis. More description of the implementation can be found here:
http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2
The following is the latest version of the patch: it breaks the original patch
into 5 patches, reworks some logic, and fixes some bugs. The breakup is:
[patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
[patch 2] Extend ext3_get_blocks() to support multiple block allocation
[patch 3] Implement multiple block allocation in ext3-try-to-allocate
          (called via ext3_new_block()).
[patch 4] Proper accounting updates in ext3_new_blocks()
[patch 5] Adjust reservation window size properly (by the given number
          of blocks to allocate) before block allocation to increase the
          possibility of allocating multiple blocks in a single call.
Tests done so far include fsx, tiobench and dbench. The following numbers,
collected from Direct IO tests (1G file creation/read), show that the system
time has been greatly reduced (more than 50% on my 8-cpu system) with the
patches.
1G file DIO write:
          2.6.15       2.6.15+patches
  real    0m31.275s    0m31.161s
  user    0m0.000s     0m0.000s
  sys     0m3.384s     0m0.564s
1G file DIO read:
          2.6.15       2.6.15+patches
  real    0m30.733s    0m30.624s
  user    0m0.000s     0m0.004s
  sys     0m0.748s     0m0.380s
Some previous tests we did on buffered IO using multiple-block allocation
and delayed allocation showed a noticeable improvement in throughput and
system time.
This patch:
Add support for mapping multiple blocks in one call.
This is useful for DIO reads and re-writes (where blocks are already
allocated), and is also in line with Christoph's proposal of using getblocks()
in mpage_readpage() or mpage_readpages().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Linus points out that ext3_readdir's readahead only cuts in when
ext3_readdir() is operating at the very start of the directory. So for large
directories we end up performing no readahead at all and we suck.
So take it all out and use the core VM's page_cache_readahead(). This means
that ext3 directory reads will use all of readahead's dynamic sizing goop.
Note that we're using the directory's filp->f_ra to hold the readahead state,
but readahead is actually being performed against the underlying blockdev's
address_space. Fortunately the readahead code is all set up to handle this.
Tested with printk. It works. I was struggling to find a real workload which
actually cared.
(The patch also exports page_cache_readahead() to GPL modules)
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If /etc/mtab is a regular file all of the mount options (of a file system)
are written to /etc/mtab by the mount command. The quota tools look there
for the quota strings for their operation. If, however, /etc/mtab is a
symlink to /proc/mounts (a "good thing" in some environments) the tools
don't write anything - they assume the kernel will take care of things.
While the quota options are sent down to the kernel via the mount system
call and the file system codes handle them properly, unfortunately there is
no code to echo the quota strings into /proc/mounts, and the quota tools
fail in the symlink case.
The attached patches modify the EXT[2|3] and JFS codes to add the necessary
hooks. The show_options function of each file system in these patches
currently deals with only those things that seemed related to quotas;
especially in the EXT3 case more can be done (later?).
Jan Kara also noted the difficulty in moving these changes above the FS
codes, responding similarly to myself to Andrew's comment about possible
VFS migration. Issue summary:
- FS codes have to process the entire string of options anyway.
- Only FS codes that use quotas must have a show_options function (for
  quotas to work properly); however, quotas are only used in a small number
  of FS.
- Since most of the quota-using FS support other options, these FS codes
  should have a show_options function to show those options - and the
  quota echoing becomes virtually negligible.
Based on feedback I have modified my patches from the original:
JFS a missing patch has been restored to the posting
EXT[2|3] and JFS always use the show_options function
  - Each FS has at least one FS specific option displayed
  - QUOTA output is under a CONFIG_QUOTA ifdef
  - a follow-on patch will add a multitude of options for each FS
EXT[2|3] and JFS "quota" is treated as "usrquota"
EXT3 journalled data check for journalled quota removed
EXT[2|3] mount when quota specified but not compiled in
  - no changes from my original patch. I tested the patch and the codes
    warn but still mount. With all due respect I believe the comments
    otherwise were a misread of the patch. Please reread/test and comment.
XFS patch removed - the XFS team already made the necessary changes
EXT3 mixing old and new quotas are handled differently (not purely exclusive)
  - if old and new quotas for the same type are used together the old
    type is silently deprecated for compatibility (e.g. usrquota and
    usrjquota)
  - mixing of old and new quotas is an error (e.g. usrjquota and
    grpquota)
Signed-off-by: Mark Bellon <mbellon@mvista.com>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix a problem with ext3 mount option parsing. When remount of a filesystem
fails, old options are now restored.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
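The save-and-restore pattern behind this fix can be sketched as follows. Everything here is a toy for illustration: `struct mount_opts`, the two option names handled, and the `parse_options()` and `remount()` helpers are assumptions, not the ext3 parser.

```c
#include <stdbool.h>
#include <string.h>

#define OPT_NOATIME 0x1u
#define OPT_BARRIER 0x2u

struct mount_opts {
    unsigned flags;
};

/* Toy parser: applies each comma-separated token as it is read, so a
 * failure on a later token leaves `o` partially modified. */
static bool parse_options(const char *s, struct mount_opts *o)
{
    while (*s) {
        size_t len = strcspn(s, ",");

        if (len == 7 && strncmp(s, "noatime", len) == 0)
            o->flags |= OPT_NOATIME;
        else if (len == 7 && strncmp(s, "barrier", len) == 0)
            o->flags |= OPT_BARRIER;
        else
            return false;              /* unknown or empty option */
        s += len;
        if (*s == ',')
            s++;
    }
    return true;
}

/* Remount: snapshot the old options first and put them back on failure. */
static bool remount(struct mount_opts *o, const char *s)
{
    struct mount_opts saved = *o;

    if (!parse_options(s, o)) {
        *o = saved;                    /* restore the pre-remount state */
        return false;
    }
    return true;
}
```

Without the snapshot, a failed remount with "noatime,bogus" would leave noatime switched on even though the operation reported an error.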
|
|
Use improved credits estimates for quota operations. Also reserve a space
for a quota operation in a transaction only if filesystem was mounted with
some quota options.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Moved i_next_alloc_block and i_next_goal_block out of ext3_inode_info, and
put them together with the reservation structure into the
ext3_block_alloc_info structure, and dynamically allocate that structure
whenever a block needs to be allocated. This also applies to the
noreservation mount. Also clean up the ext3_find_goal() code.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Right now the ext3 reservation structure (ext3_reserve_window_node) is part of
the ext3 inode itself. This information is only needed for files that need
to allocate blocks on disk. So, the attached patches reduce the ext3 inode
size by dynamically allocating the block allocation/reservation info
structure (called struct ext3_block_alloc_info) when it is needed (i.e. only
for files which need to allocate blocks).
The reservation structure is allocated and linked to the ext3 inode in
ext3_get_block_handle(), and freed and unlinked in
iput_final->ext3_clear_inode().
The ei->truncate_sem, which is currently used to protect concurrent
ext3_get_block() and ext3_truncate, is used to protect reservation structure
allocation and deallocation.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add a `nobh' mount option to ext3 in writeback mode: avoid attaching
buffer_head to data pages, like ext2.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This started off as a patch by Alex Tomas <alex@clusterfs.com> and got an
overhaul by me. The on-disk structure used is the same as in Alex's
original patch.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The in_mem optimization in ext3_get_inode_loc avoids a disk read when only
the requested inode in the block group is allocated: In that case
ext3_get_inode_loc assumes that it can recreate the inode from the
in-memory inode. This is incorrect with in-inode extended attributes,
which don't have a shadow copy in memory. Hide the in_mem option and
clarify comments; the subsequent ea-in-inode changes the in_mem check as
required.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Revert the recently-added (post-2.6.10) ea-in-inode speedup patch. We have a
new one.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- make some needlessly global code static
- super.c: remove the unused global function ext3_panic
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
ext3_put_inode has been removed a while ago.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
1) intent of the patch is to make it possible to store EAs in the body of a
   large inode. it saves space and improves performance in some cases
2) the patch is quite simple: it works the same way the original xattr does,
   but using other storage (inode body). body has priority over separate block.
   original routines (ext3_xattr_get, ext3_xattr_list, ext3_xattr_set) are
   renamed to ext3_xattr_block_*. new routines that handle inode storage are
   added (ext3_xattr_ibody_get, ext3_xattr_ibody_list, ext3_xattr_ibody_set).
   routines ext3_xattr_get, ext3_xattr_list and ext3_xattr_set allow the user
   to access both storages transparently
3) the change makes sense on filesystems with inode size >= 256 bytes only.
   2.4 kernels don't support such filesystems, AFAIK. 2.6 kernels do support
   them and ignore EAs stored in the body w/o the patch
4) debugfs and e2fsck need to be patched to deal with EAs in inode
   the patch will be sent later
5) testing results:
   a) Andrew Samba Master (tridge) has done successful tests
   b) we've been using ea-in-inode feature in Lustre for many months
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
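The "body has priority over separate block" lookup order from point 2 can be sketched like this. The structures and functions are hypothetical simplifications: attributes are plain name/value string pairs, and the two stores are in-memory arrays rather than on-disk layouts.

```c
#include <stddef.h>
#include <string.h>

struct xattr {
    const char *name;
    const char *value;
};

/* One attribute store: either the inode body or the separate block. */
struct xattr_store {
    const struct xattr *entries;
    size_t count;
};

/* Look a name up in a single store; NULL means "not found here". */
static const char *store_get(const struct xattr_store *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->entries[i].name, name) == 0)
            return s->entries[i].value;
    return NULL;
}

/* Transparent lookup: the inode body is consulted first, and the separate
 * block only when the body does not hold the attribute. */
static const char *xattr_get(const struct xattr_store *ibody,
                             const struct xattr_store *block,
                             const char *name)
{
    const char *v = store_get(ibody, name);

    return v ? v : store_get(block, name);
}
```

Callers only ever see the combined `xattr_get()`, which mirrors how the renamed block routines and the new body routines sit behind the original entry points.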
|
|
The patch below renames struct reserve_window_node* and rsv_window_add()
function to struct ext3_reserve_window_node* and ext3_rsv_window_add().
This eases the task of having several ext3-derived filesystem drivers (with
different capabilities) in kernel.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch below adds online resize capability to ext3 based on Andreas
patch for 2.4 and fixed up by Stephen.
The patch also removes s_debts:
s_debts is currently not used by ext3 (it is created, destroyed and checked
but never set). Remove it for now.
Resurrecting this will require adding it back in changed form. In existing
form it's already unsafe wrt. byte-tearing as it performs unlocked byte
increment/decrement on words which may be being accessed simultaneously on
other CPUs. It is also the only in-memory dynamic table which needs to be
extended by online-resize, so locking it will require care.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
rbtree implementation and other changes From: Stephen Tweedie <sct@redhat.com>
contributions From: Badari Pulavarty <pbadari@us.ibm.com> and probably me.
This is the ext3 block reservation patch. It improves the layout of ext3
files by establishing, for each inode, reserved areas of the disk in which
only that file can allocate blocks. Those reserved areas are managed in an
rbtree, via the in-core inode.
It's a bit like ext2 preallocation only stronger in that it can span
already-allocated blocks, including the per-blockgroup inode tables and
bitmaps.
The patch fixes ext3's worst performance problem: disastrous layout when
multiple files are being concurrently grown.
It increases the size of the inode by rather a lot. A todo item is to
dynamically allocate the `struct reserve_window_node', so we don't need to
carry this storage for inodes which aren't opened for writing.
The feature is enabled by mounting with the "reservation" mount option.
Reservations default to "off".
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Cleans up the old ext3 preallocation code carried from ext2 but turned off.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* missing le32_to_cpu() in a bunch of printks
* on big-endian boxen ext3_error() failed to set EXT3_ERROR_FS in
->s_state (cpu_to_le32() instead of cpu_to_le16())
Signed-off-by: Al Viro <viro@parcelfarce.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
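Why the wrong conversion width loses the error flag on big-endian machines can be shown with explicit byte swaps. The sketch below simulates what a big-endian CPU does, so it behaves the same on any host; `swap16()`, `swap32()` and the two `set_error_*()` helpers are illustrative names, while the flag value 0x0002 is the EXT3_ERROR_FS bit.

```c
#include <stdint.h>

#define ERROR_FS 0x0002u   /* EXT3_ERROR_FS */

/* On a big-endian CPU, converting to little-endian is a byte swap. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swap32(uint32_t v)
{
    return (v << 24) | ((v & 0xff00u) << 8) | ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Buggy variant: the 32-bit swap moves the flag into the top byte, which
 * is then cut off when the result is stored into the 16-bit field. */
static uint16_t set_error_wrong(uint16_t s_state_le)
{
    return (uint16_t)(s_state_le | swap32(ERROR_FS));
}

/* Fixed variant: swap at the width of the field being updated. */
static uint16_t set_error_right(uint16_t s_state_le)
{
    return (uint16_t)(s_state_le | swap16(ERROR_FS));
}
```

Starting from a state of 0, the buggy variant yields 0 (the flag is silently dropped), while the fixed variant yields the little-endian encoding 0x0200.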
|
|
Currently metadata writing errors are ignored and not returned from
sys_fsync on ext2 and ext3 filesystems. That is, at least ext2 and ext3.
Both ext2 and ext3 resort to sync_inode() in their ->sync_inode method,
which in turn calls ->write_inode. ->write_inode method has void type, and
any IO errors happening inside are lost.
Make ->write_inode return the error code?
Signed-off-by: Andrey Savochkin <saw@saw.sw.com.sg>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Mount with "mount -o barrier=1" to enable barriers.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Here is a reworked version of my patch to ext3 to retry certain filesystem
operations after an ENOSPC error. The ext3_should_retry_alloc() function will
not wait on the currently running transaction if there is a currently active
handle; hence this should avoid deadlocks in the Lustre use case. The patch
is versus BK-recent.
I've also included a simple, reliable test case which demonstrates the problem
this patch is intended to fix. (Note that BK-recent is not sufficient to
address this test case, and waiting on the committing transaction in
ext3_new_block is also not sufficient. Been there, tried that, didn't work.
We need to do the full-bore retry from the top level. The
ext3_should_retry_alloc() will only wait on the committing transaction if
there is an active handle; hence Lustre will probably also need to use
ext3_should_retry_alloc() if it wants to reliably avoid this particular
problem.)
#!/bin/sh
#
#
TEST_DIR=/tmp
IMAGE=$TEST_DIR/retry.img
MNTPT=$TEST_DIR/retry.mnt
TEST_SRC=/usr/projects/e2fsprogs/e2fsprogs/build
MKE2FS_OPTS=""
IMAGE_SIZE=8192
umount $MNTPT
dd if=/dev/zero of=$IMAGE bs=4k count=$IMAGE_SIZE
mke2fs -j -F $MKE2FS_OPTS $IMAGE
# Echo a message and also send it to syslog.
test_log ()
{
    echo "$*"
    logger -p local4.notice "$*"
}
mkdir -p $MNTPT
mount -o loop -t ext3 $IMAGE $MNTPT
test_log Retry test: BEGIN
for i in `seq 1 3`
do
    test_log "Retry test: Loop $i"
    echo 2 > /proc/sys/fs/jbd-debug
    while ! mkdir -p $MNTPT/foo/bar
    do
        test_log "Retry test: mkdir failed"
        sleep 1
    done
    echo 0 > /proc/sys/fs/jbd-debug
    cp -r $TEST_SRC $MNTPT/foo/bar 2> /dev/null
    rm -rf $MNTPT/*
done
umount $MNTPT
test_log "Retry test: END"
akpm@osdl.org
Rework the code to make it a formal JBD API entry point.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
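The "full-bore retry from the top level" can be sketched as a loop around the whole operation. This is a user-space illustration under assumptions: `should_retry_alloc()` is a hypothetical stand-in that only counts attempts (the real helper also has to wait for a transaction commit so that freed blocks become usable), and `fail_n_times()` is a fake operation for demonstration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the retry predicate: allow a bounded number of
 * retries. A real implementation would also wait for a commit here. */
static bool should_retry_alloc(int *retries)
{
    return (*retries)++ < 3;
}

typedef int (*fs_op_t)(void *ctx);

/* Run a whole top-level operation, restarting it from scratch while it
 * fails with -ENOSPC and the predicate still permits another attempt. */
static int run_with_retry(fs_op_t op, void *ctx)
{
    int retries = 0;
    int err;

    do {
        err = op(ctx);
    } while (err == -ENOSPC && should_retry_alloc(&retries));
    return err;
}

/* Fake operation: fails with -ENOSPC until the counter in ctx reaches 0. */
static int fail_n_times(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -ENOSPC;
    }
    return 0;
}
```

An operation that runs out of space twice and then succeeds returns 0 to the caller. One that keeps failing is attempted four times in total (the first try plus three retries) before -ENOSPC is reported.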
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
ext3_get_inode_loc() reads the inode's block only if:
1) this inode has no copy in memory
2) the inode's block has other valid inode(s)
this optimization avoids needless I/O in two cases:
1) a just-allocated inode is the first valid one in the inode's block
2) the kernel wants to write the inode, but the buffer to which the inode
belongs has been freed by the VM
|
|
follow by splitting it into two functions: one that calculates
the position, and the other that actually reads the inode
block off the disk.
|
|
The ext3 version number hasn't been updated since ext3 was merged.
We track ext3 via the kernel release ID. Remove the ext3 version
number.
|
|
From: Peter Chubb <peter@chubb.wattle.id.au>
Add two new system calls, statfs64 and fstatfs64. This has been needed
since the 64-bit sector_t merge - the current structures will overflow.
- Use a common interface (vfs_statfs) with the rest of the kernel,
- convert to 32-bit at (f)statfs time.
- New field f_frsize gives underlying fragment size for the filesystem.
(Solaris has this, and the Open Group describe it).
- The old statfs syscalls will now return -EOVERFLOW if the device was
too large to be represented in the old data structures.
The new system calls take a size_t argument, which is the size of the
structure to be filled in (as requested by Ben LaHaise), to `futureproof' the
interface.
Has been reviewed by the arch maintainers and by Ulrich Drepper.
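The "convert to 32-bit at (f)statfs time" step with its -EOVERFLOW case can be sketched as below. The two structures are cut-down, hypothetical versions for illustration and do not match the kernel or libc layouts.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical 64-bit internal result, as a common statfs helper might
 * fill it in. */
struct kstatfs64 {
    uint64_t f_blocks;
    uint64_t f_bfree;
    uint64_t f_files;
    uint32_t f_bsize;
    uint32_t f_frsize;   /* underlying fragment size */
};

/* Hypothetical old-style structure with 32-bit counters. */
struct statfs32 {
    uint32_t f_blocks;
    uint32_t f_bfree;
    uint32_t f_files;
    uint32_t f_bsize;
    uint32_t f_frsize;
};

/* Narrow the 64-bit result for the old interface; refuse instead of
 * silently truncating when any counter needs more than 32 bits. */
static int statfs_to_32(const struct kstatfs64 *in, struct statfs32 *out)
{
    if ((in->f_blocks | in->f_bfree | in->f_files) >> 32)
        return -EOVERFLOW;

    out->f_blocks = (uint32_t)in->f_blocks;
    out->f_bfree  = (uint32_t)in->f_bfree;
    out->f_files  = (uint32_t)in->f_files;
    out->f_bsize  = in->f_bsize;
    out->f_frsize = in->f_frsize;
    return 0;
}
```

Callers of the old interface on a very large device get an explicit error and can switch to the 64-bit calls, rather than seeing wrapped-around block counts.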
|
|
ext3's fully data-journalled mode has been broken for a year. This patch
fixes it up.
The prepare_write/commit_write/writepage implementations have been split up.
Instead of having each function handle all three journalling modes we now have
three separate sets of address_space_operations.
The problematic part of data=journal is MAP_SHARED writepage traffic: pages
which don't have buffers. In 2.4 these were cheatingly treated as
data-ordered buffers and that caused several nasty problems.
Here we do it properly: writepage traffic is fully journalled. This means
that the various workarounds for the 2.4 scheme can be removed, when I
remember where they all are.
The PG_checked flag has been borrowed: it is set in the atomic set_page_dirty
a_op to tell the subsequent writepage() that this page needs to have buffers
attached, dirtied and journalled.
This rather defines PG_checked as "fs-private info in page->flags" and it
should be renamed sometime.
|
|
From: Alex Tomas <bzzz@tmi.comex.ru>
This patch weans ext3 off lock_super()-based protection for the inode and
block allocators.
It's basically the same as the ext2 changes.
1) each group has own spinlock, which is used for group counter
modifications
2) sb->s_free_blocks_count isn't used any more. ext2_statfs() and
find_group_orlov() loop over groups to count free blocks
3) sb->s_free_blocks_count is recalculated at mount/umount/sync_super time
in order to check consistency and to avoid fsck warnings
4) reserved blocks are distributed over last groups
5) ext3_new_block() tries to use non-reserved blocks and if it fails then
tries to use reserved blocks
6) ext3_new_block() and ext3_free_blocks do not modify sb->s_free_blocks,
therefore they do not call mark_buffer_dirty() for superblock's
buffer_head. this should reduce I/O a bit
Also fix orlov allocator boundary case:
In the interests of SMP scalability the ext2 free blocks and free inodes
counters are "approximate". But there is a piece of code in the Orlov
allocator which fails due to boundary conditions on really small
filesystems.
Fix that up via a final allocation pass which simply uses first-fit for
allocation of a directory inode.
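Point 2 above, counting free blocks by walking the groups instead of reading one global counter, can be sketched as follows. `struct group_desc` and `count_free_blocks()` are hypothetical simplifications, and the per-group spinlocks are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group counters; in the scheme described above each
 * group's counters are protected by that group's own lock. */
struct group_desc {
    uint16_t free_blocks;
    uint16_t free_inodes;
};

/* Sum the per-group free block counts. Without a global lock the result
 * is only approximate while other CPUs are allocating. */
static uint64_t count_free_blocks(const struct group_desc *groups,
                                  size_t ngroups)
{
    uint64_t total = 0;

    for (size_t i = 0; i < ngroups; i++)
        total += groups[i].free_blocks;
    return total;
}
```

The design trades an exact, heavily contended superblock counter for cheap per-group updates, at the cost of a loop whenever a total is needed.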
|
|
Patch from "Theodore Ts'o" <tytso@mit.edu>
We now use 0x7fffffff as the EOF cookie, because Linux NFS stupidly interprets
the cookie (which is supposed to be a bag of bits without necessarily any
semantic value) as a signed 64 bit integer, and then converts it to an
unsigned integer, and then blows up if it cannot be expressed as
a 32-bit value!!
In order to do this, we have to fold the hash value 0x7fffffff into the hash
value 0x7ffffffe. This is relatively safe; the only time we will lose is if
the directory contains filenames that hash to both 0x7ffffffe and 0x7fffffff
(under the original hash), and the last directory entry which hashes to
0x7ffffffe is at the end of a leaf block, and the first directory entry which
hashes to 0x7fffffff is at the beginning of a leaf block.
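The folding step can be sketched in a few lines. This is a simplified illustration that treats the hash as a plain 31-bit value; `fold_hash()` and `cookie_fits_signed32()` are hypothetical helper names, and the real hash code may shift or mask the value differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define HTREE_EOF 0x7fffffffu   /* cookie value reserved for end-of-directory */

/* Make sure no real directory entry ever carries the EOF cookie by folding
 * that one hash value into its neighbour. */
static uint32_t fold_hash(uint32_t hash)
{
    return hash == HTREE_EOF ? HTREE_EOF - 1 : hash;
}

/* The EOF cookie must survive being read as a signed 32-bit value. */
static bool cookie_fits_signed32(uint64_t cookie)
{
    return cookie <= (uint64_t)INT32_MAX;
}
```

After folding, two names that originally hashed to 0x7ffffffe and 0x7fffffff share one value, which is the rare collision case the message describes.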
|
|
Patch from "Theodore Ts'o" <tytso@mit.edu>
I recently noticed a bug in ext2/3; newly created inodes which inherit
the noatime flag from their containing directory do not respect noatime
until the inode is flushed from the inode cache and then re-read later.
This is because the code which checks the ext2 no-atime attribute and
then sets the S_NOATIME in inode->i_flags is present in
ext2_read_inode(), but not in ext2_new_inode().
I fixed this in 2.4, and then found an even worse bug in the 2.5 code;
the DIRSYNC flag is completely ignored *except* in the case where a
directory is newly created using mkdir and its parent directory has the
DIRSYNC flag. S_DIRSYNC doesn't get set in the ext2_new_inode() or the
ext2_ioctl() paths (which is used by chattr).
This patch centralizes the code which translates the ext2 flags in the
raw ext2 inode to the appropriate flag values in inode->i_flags in a
single location. This fixes the bug, makes things cleaner, and also
removes 30 lines of code and 128 bytes of compiled x86 text in the
bargain.
|
|
Patch from Andreas Dilger <adilger@clusterfs.com>
This patch against 2.5.53 removes my erroneous use of ino_t in a couple of
places in the ext3 code. This has been replaced with unsigned long (the same
as is used for inode->i_ino). This patch matches the fix submitted to 2.4
for fixing 64-bit compiler warnings, and also replaces a couple of %ld with
%lu to forestall output weirdness with filesystems with a few billion inodes.
|
|
fs.h only needs the forward-declaration of struct statfs
|
|
Don't include the following headers implicitly through fs.h:
stddef.h, string.h, bitops.h, pipe_fs_i.h, ext3_fs_i.h, efs_fs_i.h
and fix up the fallout.
|
|
The algorithm for finding the block group descriptor blocks for the
future on-line resizable ext2/3 format change got out of sync with
what was actually shipped in e2fsprogs 1.30. (And what is in e2fsprogs
1.30 is better since it avoids a free block fragmentation at the
beginning of the block group.) This change is safe, since no one is
actually using the new meta_bg block group layout just yet.
|
|
This patch checks for a failed kmalloc() in ext3_htree_store_dirent(),
and passes the error up to its caller, ext3_htree_fill_tree().
|
|
Here's the ext3 version.
|