| Age | Commit message (Collapse) | Author |
|
commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream.
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
vfs: add d_prune dentry operation
vfs: protect i_nlink
filesystems: add set_nlink()
filesystems: add missing nlink wrappers
logfs: remove unnecessary nlink setting
ocfs2: remove unnecessary nlink setting
jfs: remove unnecessary nlink setting
hypfs: remove unnecessary nlink setting
vfs: ignore error on forced remount
readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
vfs: fix dentry leak in simple_fill_super()
|
|
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
For a long time now orlov is the default block allocator in the ext3. It
performs better than the old one and no one seems to claim otherwise so
we can safely drop it and make oldalloc and orlov mount option
deprecated.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
This commit adds fixed tracepoints to the ext3 code. It is based on ext4
tracepoints, however due to the differences of both file systems, there
are some tracepoints missing (those for delaloc and for multi-block
allocator) and there are some ext3 specific as well (for reservation
windows).
Here is a list:
ext3_free_inode
ext3_request_inode
ext3_allocate_inode
ext3_evict_inode
ext3_drop_inode
ext3_mark_inode_dirty
ext3_write_begin
ext3_ordered_write_end
ext3_writeback_write_end
ext3_journalled_write_end
ext3_ordered_writepage
ext3_writeback_writepage
ext3_journalled_writepage
ext3_readpage
ext3_releasepage
ext3_invalidatepage
ext3_discard_blocks
ext3_request_blocks
ext3_allocate_blocks
ext3_free_blocks
ext3_sync_file_enter
ext3_sync_file_exit
ext3_sync_fs
ext3_rsv_window_add
ext3_discard_reservation
ext3_alloc_new_reservation
ext3_reserved
ext3_forget
ext3_read_block_bitmap
ext3_direct_IO_enter
ext3_direct_IO_exit
ext3_unlink_enter
ext3_unlink_exit
ext3_truncate_enter
ext3_truncate_exit
ext3_get_blocks_enter
ext3_get_blocks_exit
ext3_load_inode
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
SELinux would like to implement a new labeling behavior of newly created
inodes. We currently label new inodes based on the parent and the creating
process. This new behavior would also take into account the name of the
new object when deciding the new label. This is not the (supposed) full path,
just the last component of the path.
This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations. We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly. This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook. If no such name
exists it is fine to pass NULL.
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|
If a filesystem has inode size > 128 and someone deletes lost+found and
reuses inode 11 for some other file, extented attributes set for this
inode before umount will get lost after remounting the filesystem. This
is because extended attributes will get stored in an inode but ext3_iget
will ignore them due to workaround of a bug in an old mkfs.
Fix the problem by initializing i_extra_isize to 0 for freshly allocated
inodes where mkfs workaround in ext3_iget applies. This way these inodes
will always store extended attributes in a special block and no problems
occur.
The bug was spotted and a reproduction test provided by:
Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In commit 9df93939b735 ("ext3: Use bitops to read/modify
EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
use bitops for its state handling. However, unline the same ext4
change, it didn't actually change the name of the field when it changed
the semantics of it.
As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
initialized the field to EXT3_STATE_NEW. And that does not work
_at_all_ when we're now working with individually named bits rather than
values that get masked. So the code tried to mark the state to be new,
but in actual fact set the field to EXT3_STATE_JDATA. Which makes no
sense at all, and screws up all the code that checks whether the inode
was newly allocated.
In particular, it made the xattr code unhappy, and caused various random
behavior, like apparently
https://bugzilla.redhat.com/show_bug.cgi?id=577911
So fix the initialization, and rename the field to match ext4 so that we
don't have this happen again.
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Daniel J Walsh <dwalsh@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Get rid of the drop dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.
Rename the now static low-level dquot_drop helper to __dquot_drop
and vfs_dq_drop to dquot_drop to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Get rid of the alloc_inode and free_inode dquot operations - they are
always called from the filesystem and if a filesystem really needs
their own (which none currently does) it can just call into it's
own routine directly.
Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
call the lowlevel dquot_alloc_inode / dqout_free_inode routines
directly, which now lose the number argument which is always 1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: linux-ext4@vger.kernel.org
|
|
At the moment there are few restrictions on which flags may be set on
which inodes. Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links. Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.
Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
At present INDEX is the only flag that new ext3 inodes do NOT inherit from
their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC and
TOPDIR from being inherited. List inheritable flags explicitly to prevent
future flags from accidentally being inherited.
This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ext3 analog of the previous patch
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: adilger@sun.com
Cc: linux-ext4@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
|
|
If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext3_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext3_orphan_get function.
This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: printk warning fix]
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Spelling fix: prefered -> preferred
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
|
|
replace all:
little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) +
expression_in_cpu_byteorder);
with:
leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
sparse didn't generate any new warning with this patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Stop the EXT3 filesystem from using iget() and read_inode(). Replace
ext3_read_inode() with ext3_iget(), and call that instead of iget().
ext3_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext3_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ext[234]_get_group_desc never tests the bh argument, and only sets it if it
is passed in; it is perfectly happy with a NULL bh argument. But, many
callers send one in and never use it. May as well call with NULL like
other callers who don't use the bh.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This eliminates the i_blksize field from struct inode. Filesystems that want
to provide a per-inode st_blksize can do so by providing their own getattr
routine instead of using the generic_fillattr() function.
Note that some filesystems were providing pretty much random (and incorrect)
values for i_blksize.
[bunk@stusta.de: cleanup]
[akpm@osdl.org: generic_fillattr() fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based). While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning). So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.
With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.
This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.
Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2
http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2
Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and
>8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests
includes overnight fsx, tiobench, dbench and fsstress.
This patch:
Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.
This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before. Also include
kernel bug fixes for filesystem wide in-kernel block variables. There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code. This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch contains the following cleanups:
- there's no need for ext3_count_free() #ifndef EXT3FS_DEBUG
- having prototypes for ext3_count_free() in two different headers is
nonsense
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Remove the trailing newlines in calls to ext3_warning(). This function
already adds a trailing newline to the end of messages.
Signed-off-by: Glauber de Oliveira Costa <glommer@br.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The CONFIG_EXT{2,3}_CHECK options where were never available, and all they
did was to implement a subset of e2fsck in the kernel.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix warnings from sparse due to un-declared functions that should either
have a header file or have been declared static
fs/ext2/bitmap.c:14:15: warning: symbol 'ext2_count_free' was not declared. Should it be static?
fs/ext2/namei.c:92:15: warning: symbol 'ext2_get_parent' was not declared. Should it be static?
fs/ext3/bitmap.c:15:15: warning: symbol 'ext3_count_free' was not declared. Should it be static?
fs/ext3/namei.c:1013:15: warning: symbol 'ext3_get_parent' was not declared. Should it be static?
fs/ext3/xattr.c:214:1: warning: symbol 'ext3_xattr_block_get' was not declared. Should it be static?
fs/ext3/xattr.c:358:1: warning: symbol 'ext3_xattr_block_list' was not declared. Should it be static?
fs/ext3/xattr.c:630:1: warning: symbol 'ext3_xattr_block_find' was not declared. Should it be static?
fs/ext3/xattr.c:863:1: warning: symbol 'ext3_xattr_ibody_find' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I noticed some problems while running ext3 with the debug flag set on.
More precisely, I was unable to umount the filesystem. Some investigation
took me to the patch that follows.
At a first glance , the lock/unlock I've taken out seems really not
necessary, as the main code (outside debug) does not lock the super. The
only additional danger operations that debug code introduces seems to be
related to bitmap, but bitmap operations tends to be all atomic anyway.
I also took the opportunity to fix 2 spelling errors.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix failure paths in ext3_new_inode() and clean up duplicated code: -
DQUOT_DROP() was not being called if ext3_init_security() failed.
Signed-off-by: Chris Sykes <chris@sigsegv.plus.com>
Cc: Stephen Smalley <sds@epoch.ncsc.mil>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch modifies ext3 to call the inode_init_security LSM hook to obtain
the security attribute for a newly created inode and to set the resulting
attribute on the new inode as part of the same transaction. This parallels
the existing processing for setting ACLs on newly created inodes.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We must drop references to quota structures before releasing the inode.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Moved i_next_alloc_block and i_next_goal_block out from ext3_inod_info, and
put it together with the reservation structure into the
ext3_block_alloc_info structure, and dynamically allocate that structure
whenever need to allocation a block. This is also apply for noreservation
mount. Also cleanup ext3_find_goal() code.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Right now the ext3 reservation structure(ext3_reserve_window_node) is part of
the ext3 inode itself. This part of information is only needed for files that
need allocate blocks on disk. So, the attached patches reduce the ext3 inode
size by dynamically allocating the block allocation/reservation info
structure(called struct ext3_block_alloc_info) when it is needed(i.e. only
for files who need to allocate blocks)
The reservation structure is being allocated and linked to the ext3 inode at
ext3_get_block_handle(), and being freed and unlinked at the
iput_final->ext3_clear_inode().
The ei->truncate_sem which is currently used to protect concurrent
ext3_get_block() and ext3_truncate is used to protect reservation structure
allocation and deallocation.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There's a missing failure handling check here that would possibly lead to a
null dereference later on, I'm not sure about the correct return value
however. I haven't tried it as I'm not sure how to trigger the case ;)
Found by the Coverity tool.
Signed-off-by: Alexander Nyberg <alexn@dsv.su.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
fix against credits leak in journal_release_buffer()
The idea is to charge a buffer in journal_dirty_metadata(), not in
journal_get_*_access()). Each buffer has flag call
journal_dirty_metadata() sets on the buffer.
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This started of as a patch by Alex Tomas <alex@clusterfs.com> and got an
overhaul by me. The on-disk structure used is the same as in Alex's
original patch.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Revert the recently-added (post-2.6.10) ea-in-inode speedup patch. We have a
new one.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch fixes long scheduling latencies in the ext3 code, and it
also cleans up the existing lock-break functionality to use the new
primitives.
This patch has been in the -VP patchset for quite some time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch corrects a problem that was originally added with the nanosecond
timestamps in stat patch. The problem is that some file systems don't have
enough space in their on disk inode to save nanosecond timestamps, so they
truncate the c/a/mtime to seconds when flushing an dirty node. In core the
inode would have full jiffies granuality.
This can be observed by programs as a timestamp that jumps backwards under
specific loads when an inode is flushed and then reloaded from disk.
The problem was already known when the original patch went in, but it
wasn't deemed important enough at that time. So far there has been only
one report of it causing problems. Now Tridge is worried that it will
break running Excel over samba4 because Excel seems to do very anal
timestamp checking and samba4 will supply 100ns timestamps over the
network.
This patch solves it by putting the time resolution into the superblock of
a fs and always rounding the in core timestamps to that granuality.
This also supercedes some previous ext2/3 hacks to flush the inode less
often when only the subsecond timestamp changes.
I tried to keep the overhead low, in particular it tries to keep divisions
out of fast paths as far as possible.
The patch is quite big but 99% of it is just relatively straight forward
search'n'replace in a lot of fs. Unconverted filesystems will default to a
1ns granuality, but may still show the problem if they continue to use
CURRENT_TIME. I converted all in tree fs.
One possible future extension of this would be to have two time
granualities per superblock - one that specifies the visible resolution,
and the other to specify how often timestamps should be flushed to disk,
which could be tuned with a mount option per fs (e.g. often m/atimes don't
need to be flushed every second). Would be easy to do as an addon if
someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
1) intent of the patch is to get possibility to store EAs in the body of large
inode. it saves space and improves performance in some cases
2) the patch is quite simple: it works the same way original xattr does, but
using other storage (inode body). body has priority over separate block.
original routines (ext3_xattr_get, ext3_xattr_list, ext3_xattr_set) are
renamed to ext3_xattr_block_*. new routines that handle inode storate are
added (ext3_xattr_ibody_get, ext3_xattr_ibody_list, ext3_xattr_ibody_set).
routines ext3_xattr_get, ext3_xattr_list and ext3_xattr_set allow user to
accesss both the storages transparently
3) the change makes sense on filesystem with inode size >= 256 bytes only.
2.4 kernels don't support such a filesystems, AFAIK. 2.6 kernels do support
and ignore EAs stored in a body w/o the patch
4) debugfs and e2fsck need to be patched to deal with EAs in inode
the patch will be sent later
5) testing results:
a) Andrew Samba Master (tridge) has done successful tests
b) we've been using ea-in-inode feature in Lustre for many months
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Alex Tomas <alex@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch below adds online resize capability to ext3 based on Andreas
patch for 2.4 and fixed up by Stephen.
The patch also removes s_debts:
s_debts is currently not used by ext3 (it is created, destroyed and checked
but never set). Remove it for now.
Resurrecting this will require adding it back in changed form. In existing
form it's already unsafe wrt. byte-tearing as it performs unlocked byte
increment/decrement on words which may be being accessed simultaneously on
other CPUs. It is also the only in-memory dynamic table which needs to be
extended by online-resize, so locking it will require care.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
rbtree implementation and other changes From: Stephen Tweedie <sct@redhat.com>
contributions From: Badari Pulavarty <pbadari@us.ibm.com> and probably me.
This is the ext3 block reservation patch. It improves the layout of ext3
files by establishing, for each inode, reserved areas of the disk in which
only that file can allocate blocks. Those reserved areas are managed in an
rbtree, via the in-core inode.
It's a bit like ext2 preallocation only stronger in that it can span
already-allocated blocks, including the per-blockgroup inode tables and
bitmaps.
The patch fixes ext3's worst performance problem: disastrous layout when
multiple files are being concurrently grown.
It increases the size of the inode by rather a lot. A todo item is to
dynamically allocate the `struct reserve_window_node', so we don't need to
carry this storage for inodes which aren't opened for writing.
The feature is enabled by mounting with the "reservation" mount option.
Reservations default to "off".
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Cleans up the old ext3 preallocation code carried from ext2 but turned off.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There's no reason to directly #include <asm/bitops.h> since it's
available on all architectures and also included by
#include <linux/bitops.h>.
This patch changes #include <asm/bitops.h> to #include <linux/bitops.h>.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|