path: root/fs/ext2/ialloc.c
Age | Commit message | Author
2011-11-02 | filesystems: add missing nlink wrappers | Miklos Szeredi
Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-02-01 | fs/vfs/security: pass last path component to LSM on inode creation | Eric Paris
SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and then userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a separate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>
2010-08-09 | merge ext2 delete_inode and clear_inode, switch to ->evict_inode() | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 | ext2: replace inode uid,gid,mode init with helper | Dmitry Monakhov
Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 | ext2: remove useless call to brelse() in ext2_free_inode() | Francis Moreau
This patch removes a useless call to brelse(bitmap_bh) since at that point bitmap_bh is NULL and slightly cleans up bitmap_bh handling. Signed-off-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 | dquot: cleanup dquot initialize routine | Christoph Hellwig
Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs its own (which none currently does) it can just call into its own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 | dquot: cleanup dquot drop routine | Christoph Hellwig
Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs its own (which none currently does) it can just call into its own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-05 | dquot: cleanup inode allocation / freeing routines | Christoph Hellwig
Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs its own (which none currently does) it can just call into its own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the low-level dquot_alloc_inode / dquot_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2009-03-26 | ext2: Use lowercase names of quota functions | Jan Kara
Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> CC: linux-ext4@vger.kernel.org
2009-01-08 | ext2: tighten restrictions on inode flags | Duane Griffin
At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and to allow only NODUMP and NOATIME on objects that are neither regular files nor directories. Introduce a flags masking function which masks flags based on mode, and use it during inode creation and when flags are set via the ioctl, to facilitate future consistency. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
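The masking rule described in this commit can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: the `MASK_*` names, the flag values, and the `mask_kind` enum (standing in for the inode mode) are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not taken from the kernel headers. */
#define MASK_NODUMP_FL   0x00000040u
#define MASK_NOATIME_FL  0x00000080u
#define MASK_DIRSYNC_FL  0x00010000u
#define MASK_TOPDIR_FL   0x00020000u

/* Regular files may keep everything except the directory-only flags. */
#define MASK_REG_FLMASK   (~(MASK_DIRSYNC_FL | MASK_TOPDIR_FL))
/* Anything that is neither a regular file nor a directory keeps only these. */
#define MASK_OTHER_FLMASK (MASK_NODUMP_FL | MASK_NOATIME_FL)

/* Stand-in for the file type encoded in the inode mode (an assumption). */
enum mask_kind { MASK_KIND_DIR, MASK_KIND_REG, MASK_KIND_OTHER };

/* Return the subset of flags that is allowed for the given object kind. */
uint32_t mask_flags(enum mask_kind kind, uint32_t flags)
{
    if (kind == MASK_KIND_DIR)
        return flags;                       /* directories keep all flags */
    if (kind == MASK_KIND_REG)
        return flags & MASK_REG_FLMASK;     /* drop DIRSYNC and TOPDIR */
    return flags & MASK_OTHER_FLMASK;       /* keep only NODUMP and NOATIME */
}
```

Routing both inode creation and the ioctl path through one such function is what keeps the two from drifting apart.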
2009-01-08 | ext2: don't inherit inappropriate inode flags from parent | Duane Griffin
At present BTREE/INDEX is the only flag that new ext2 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, INDEX, IMAGIC and TOPDIR from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
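A minimal sketch of the allow-list idea, assuming hypothetical `INH_*` names and flag values: the child only receives flags that are named explicitly, so a flag added later is not inherited by accident.

```c
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not taken from the kernel headers. */
#define INH_SYNC_FL     0x00000008u
#define INH_NOATIME_FL  0x00000080u
#define INH_INDEX_FL    0x00001000u
#define INH_TOPDIR_FL   0x00020000u

/* Allow-list: only flags listed here are ever copied from parent to child. */
#define INH_FL_INHERITED (INH_SYNC_FL | INH_NOATIME_FL)

/* Compute the flags a newly created inode takes from its parent directory. */
uint32_t inh_child_flags(uint32_t parent_flags)
{
    return parent_flags & INH_FL_INHERITED;
}
```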
2008-12-31 | nfsd race fixes: ext2 | Al Viro
* make ext2_new_inode() put the inode into icache in locked state
* do not unlock until the inode is fully set up; otherwise nfsd might pick it in half-baked state.
* make sure that ext2_new_inode() does *not* lead to two inodes with the same inumber hashed at the same time; otherwise a bogus fhandle coming from nfsd might race with inode creation:
    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    ext2_new_inode(): allocate inode with that inumber
    ext2_new_inode(): insert it into icache, set it up and dirty
    ext2_write_inode(): get the relevant part of inode table in cache, set the entry for our inode (and start writing to disk)
    nfsd: get CPU again, look into inode table, see nice and sane on-disk inode, set the in-core inode from it
  oops - we have two in-core inodes with the same inumber live in icache, both used for IO. Welcome to fs corruption...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-11-14 | CRED: Wrap task credential accesses in the Ext2 filesystem | David Howells
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: linux-ext4@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2008-04-28 | ext2: le*_add_cpu conversion | Marcin Slusarz
replace all:
    little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder);
with:
    leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);
generated with semantic patch
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
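The conversion can be illustrated with a userspace sketch. All `demo_*` names below are hypothetical stand-ins for the kernel helpers; the little-endian value is modelled as a byte pair so the example behaves the same on any host.

```c
#include <stdint.h>

/* A 16-bit value stored in little-endian byte order, like an on-disk field. */
typedef struct { uint8_t b[2]; } demo_le16;

/* Decode the little-endian bytes into a host-order integer. */
uint16_t demo_le16_to_cpu(demo_le16 v)
{
    return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

/* Encode a host-order integer as little-endian bytes. */
demo_le16 demo_cpu_to_le16(uint16_t x)
{
    demo_le16 v = { { (uint8_t)(x & 0xff), (uint8_t)(x >> 8) } };
    return v;
}

/* Helper form: convert, add, convert back, naming the variable only once. */
void demo_le16_add_cpu(demo_le16 *var, uint16_t val)
{
    *var = demo_cpu_to_le16((uint16_t)(demo_le16_to_cpu(*var) + val));
}

/* Hypothetical group descriptor with one little-endian counter. */
struct demo_group_desc { demo_le16 free_inodes_count; };

/* Call site after the conversion: one short line instead of a nested expression. */
void demo_put_inode_back(struct demo_group_desc *gd)
{
    demo_le16_add_cpu(&gd->free_inodes_count, 1);
}
```

The helper does the same work as the open-coded expression; the gain is that the field is written once, which removes the chance of reading one variable and storing to another.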
2008-04-21 | ext*: spelling fix prefered -> preferred | Benoit Boissinot
Spelling fix: prefered -> preferred Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
2007-10-17 | ext2 reservations | Martin J. Bligh
Val's cross-port of the ext3 reservations code into ext2.
[mbligh@mbligh.org: Small type error for printk]
[akpm@linux-foundation.org: fix types, sync with ext3]
[mbligh@mbligh.org: Bring ext2 reservations code in line with latest ext3]
[akpm@linux-foundation.org: kill noisy printk]
[akpm@linux-foundation.org: remember to dirty the gdp's block]
[akpm@linux-foundation.org: cross-port the missed 5dea5176e5c32ef9f0d1a41d28427b3bf6881b3a]
[akpm@linux-foundation.org: cross-port e6022603b9aa7d61d20b392e69edcdbbc1789969]
[akpm@linux-foundation.org: Port the omitted 08fb306fe63d98eb86e3b16f4cc21816fa47f18e]
[akpm@linux-foundation.org: Backport the missed 20acaa18d0c002fec180956f87adeb3f11f635a6]
[akpm@linux-foundation.org: fixes]
[cmm@us.ibm.com: fix reservation extension]
[bunk@stusta.de: make ext2_get_blocks() static]
[hugh@veritas.com: fix hang]
[hugh@veritas.com: ext2_new_blocks should reset the reservation window size]
[hugh@veritas.com: ext2 balloc: fix off-by-one against rsv_end]
[hugh@veritas.com: grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such]
[hugh@veritas.com: rbtree usage cleanup]
[pbadari@us.ibm.com: Fix for ext2 reservation]
[bunk@kernel.org: remove fs/ext2/balloc.c:reserve_blocks()]
[hugh@veritas.com: ext2 balloc: use io_error label]
Cc: "Martin J. Bligh" <mbligh@mbligh.org> Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 | remove unused bh in calls to ext234_get_group_desc | Eric Sandeen
ext[234]_get_group_desc never tests the bh argument, and only sets it if it is passed in; it is perfectly happy with a NULL bh argument. But, many callers send one in and never use it. May as well call with NULL like other callers who don't use the bh. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 | lib: percpu_counter_add | Peter Zijlstra
s/percpu_counter_mod/percpu_counter_add/ Because it's a better name; _mod implies modulo. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-09-27 | [PATCH] inode-diet: Eliminate i_blksize from the inode structure | Theodore Ts'o
This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-19 | [PATCH] EXT2: Remove superblock lock contention in ext2_statfs | Dave Kleikamp
Fix a performance degradation introduced in 2.6.17. (30% degradation running dbench with 16 threads) Commit 21730eed11de42f22afcbd43f450a1872a0b5ea1, which claims to make EXT2_DEBUG work again, moves the taking of the kernel lock out of debug-only code in ext2_count_free_inodes and ext2_count_free_blocks and into ext2_statfs. The same problem was fixed in ext3 by removing the lock completely (commit 5b11687924e40790deb0d5f959247ade82196665) Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30 | Remove obsolete #include <linux/config.h> | Jörn Engel
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-25 | [PATCH] Make EXT2_DEBUG work again | Valerie Henson
This patch makes EXT2_DEBUG work again. First, due to the lack of a proper include file, EXT2_DEBUG was undefined in bitmap.c and ext2_count_free() was left out. Moved it to balloc.c and removed bitmap.c entirely. Second, the debug versions of ext2_count_free_{inodes/blocks} reacquire the superblock lock. Moved the lock into the callers. Signed-off-by: Val Henson <val_henson@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03 | [PATCH] quota: fix error code for ext2_new_inode() | Herbert Poetzl
The quota check in ext2_new_inode() returns ENOSPC where it should return EDQUOT instead. Signed-off-by: Herbert Pötzl <herbert@13thfloor.at> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 | [PATCH] remove CONFIG_EXT{2,3}_CHECK | Adrian Bunk
The CONFIG_EXT{2,3}_CHECK options were never available, and all they did was to implement a subset of e2fsck in the kernel. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 | [PATCH] Fix ext2_new_inode() failure paths | Chris Sykes
Fix failure paths in ext2_new_inode() and clean up duplicated code: - DQUOT_DROP() was not being called if ext2_init_security() failed. Signed-off-by: Chris Sykes <chris@sigsegv.plus.com> Cc: Stephen Smalley <sds@epoch.ncsc.mil> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 | [PATCH] ext2: Enable atomic inode security labeling | Stephen Smalley
This patch modifies ext2 to call the inode_init_security LSM hook to obtain the security attribute for a newly created inode and to set the resulting attribute on the new inode. This parallels the existing processing for setting ACLs on newly created inodes. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27 | [PATCH] ext2: drop quota reference before releasing inode | Jan Kara
We must drop references to quota structures before releasing the inode. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04 | [PATCH] Sync in core time granularity with filesystems | Andi Kleen
This patch corrects a problem that was originally added with the nanosecond timestamps in stat patch. The problem is that some file systems don't have enough space in their on disk inode to save nanosecond timestamps, so they truncate the c/a/mtime to seconds when flushing a dirty inode. In core the inode would have full jiffies granularity. This can be observed by programs as a timestamp that jumps backwards under specific loads when an inode is flushed and then reloaded from disk.
The problem was already known when the original patch went in, but it wasn't deemed important enough at that time. So far there has been only one report of it causing problems. Now Tridge is worried that it will break running Excel over samba4 because Excel seems to do very strict timestamp checking and samba4 will supply 100ns timestamps over the network.
This patch solves it by putting the time resolution into the superblock of a fs and always rounding the in core timestamps to that granularity. This also supersedes some previous ext2/3 hacks to flush the inode less often when only the subsecond timestamp changes.
I tried to keep the overhead low, in particular it tries to keep divisions out of fast paths as far as possible. The patch is quite big but 99% of it is just relatively straightforward search'n'replace in a lot of fs. Unconverted filesystems will default to a 1ns granularity, but may still show the problem if they continue to use CURRENT_TIME. I converted all in tree fs.
One possible future extension of this would be to have two time granularities per superblock - one that specifies the visible resolution, and the other to specify how often timestamps should be flushed to disk, which could be tuned with a mount option per fs (e.g. often m/atimes don't need to be flushed every second). Would be easy to do as an addon if someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
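The rounding step can be sketched as below, assuming a hypothetical `demo_time` struct and a per-filesystem granularity in nanoseconds. The special cases for 1 ns and for whole seconds avoid the division on the common paths, which is in the spirit of the overhead concern mentioned above; this is not the kernel implementation.

```c
/* Hypothetical timestamp type for this sketch: seconds plus nanoseconds. */
struct demo_time {
    long long sec;
    long nsec;
};

/*
 * Truncate t to a multiple of gran_ns nanoseconds.
 * gran_ns is assumed to be between 1 and 1000000000 (one second).
 */
struct demo_time demo_time_trunc(struct demo_time t, long gran_ns)
{
    if (gran_ns <= 1)
        return t;                       /* full resolution: nothing to do */
    if (gran_ns >= 1000000000L)
        t.nsec = 0;                     /* second resolution: no division needed */
    else
        t.nsec -= t.nsec % gran_ns;     /* drop the sub-granularity remainder */
    return t;
}
```

Applying the same truncation in core as the on-disk format applies when the inode is written means a reload from disk can no longer make a timestamp appear to jump backwards.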
2004-09-04 | [PATCH] ext2 endianness fixes | Alexander Viro
Several places printk a little-endian number without any conversions. Ones in super.c are particularly unpleasant - there we are getting told that fs couldn't be mounted because of the following set of incompat features and it would be nice to have the printed number matching what one could find in headers... Signed-off-by: Al Viro <viro@parcelfarce.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-24 | [PATCH] ext2: fix build with DEBUG=y | Andrew Morton
From: FabF <fabian.frederick@skynet.be>
2004-03-24 | [PATCH] ext2&3: use the right i_flags in find_group_orlov() | Andrew Morton
Spotted by Jorn Engel <joern@wohnheim.fh-wedel.de>: both the generic and fs-specific parts of the inode have an i_flags. find_group_orlov() is using the wrong one.
2004-03-06 | [PATCH] ext2/ext3 -ENOSPC bug | Andrew Morton
From: Chris Mason <mason@suse.com>
find_group_other looks buggy for ext2 and ext3 in 2.6, it can cause -ENOSPC errors when the fs has plenty of free room.
To hit the bug, you need a filesystem where:
    parent_group has no free blocks (but might have free inodes)
    Every other group with free inodes has no free blocks.
That gets you down to the final linear search in find_group_other. The linear search has two bugs:
    group = parent_group + 1;
means we start searching at parent_group + 2 because the loop increments group before using it.
    for(i = 2 ; i < ngroups ; i++)
means we don't search through all the groups.
The end result is that parent_group and parent_group + 1 are not checked for free inodes in the final linear search. ext3 has the same problem.
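The two loop bugs can be reproduced in a small userspace sketch. Both functions are reconstructions from the description in this message, not the kernel source; the `demo_*` names, the `free_inodes` array, and the corrected loop bounds are assumptions.

```c
/*
 * Buggy form as described: starts one group too far and runs two
 * iterations short, so parent_group and parent_group + 1 are skipped.
 * Returns a group index with free inodes, or -1 if none was found.
 */
int demo_linear_search_buggy(const unsigned *free_inodes, int ngroups,
                             int parent_group)
{
    int group = parent_group + 1;

    for (int i = 2; i < ngroups; i++) {
        if (++group >= ngroups)     /* increment happens before use */
            group = 0;
        if (free_inodes[group])
            return group;
    }
    return -1;
}

/*
 * One possible correction: start at parent_group and run ngroups
 * iterations, so every group is visited once, parent_group last.
 */
int demo_linear_search_fixed(const unsigned *free_inodes, int ngroups,
                             int parent_group)
{
    int group = parent_group;

    for (int i = 0; i < ngroups; i++) {
        if (++group >= ngroups)
            group = 0;
        if (free_inodes[group])
            return group;
    }
    return -1;
}
```

With four groups, parent group 1, and free inodes only in group 2, the buggy search visits groups 3 and 0 and gives up, while the corrected search finds group 2 on its first step.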
2004-01-19 | [PATCH] ext2: fix build when EXT2_DEBUG is set | Andrew Morton
Fix warnings and build errors under EXT2_DEBUG.
2004-01-19 | [PATCH] ext2: s_next_generation locking | Andrew Morton
There is no locking around the increment of this per-filesystem counter. Create a new lock, just for this.
2004-01-19 | [PATCH] ext2_new_inode nanocleanup | Andrew Morton
We've cached EXT2_SB(sb) in local variable `sbi'. Use it.
2003-11-18 | [PATCH] remove ext2_reserve_inode() | Andrew Morton
It now has no callers.
2003-11-18 | [PATCH] Fix bugs in ext2_new_inode() | Andrew Morton
From: Mingming Cao <cmm@us.ibm.com>
I found several bugs/issues in the ext2_new_inode() code:
1) The for loop variable "i" is used to save the inode offset. In the case of failure, the loop variable could be clobbered, so it is possible to quit searching before looking at every block group.
2) The number of free inodes in the selected group is possibly being miscalculated. The counter is only decreased in the find_group_xx() functions for the initially selected group. If the initial try fails and a free inode is then found in another group, the counter for that group will not be decreased.
3) In the concurrent case, going back to the find_group_xx() functions is unnecessary; it will only return the same group as before.
The following patch fixes those issues. Ideas are stolen from ext3_new_inode().
2003-07-10 | [PATCH] misc fixes | Andrew Morton
- remove accidental debug code from ext3 commit.
- /proc/profile documentation fix (Randy Dunlap)
- use sb_breadahead() in ext2_preread_inode()
- unused var in mpage_writepages()
2003-07-02 | [PATCH] ext2: inode allocation race fix | Andrew Morton
ext2's inode allocator will call find_group_orlov(), which will return a suitable blockgroup in which the inode should be allocated. But by the time we actually try to allocate an inode in the blockgroup, other CPUs could have used them all up. ext2 will bogusly fail with "ext2_new_inode: Free inodes count corrupted in group NN". To fix this we just advance onto the next blockgroup if the rare race happens. If we've scanned all blockgroups then return -ENOSPC. (This is a bit inaccurate: after we've scanned all blockgroups, there may still be available inodes due to inode freeing activity in other blockgroups. This cannot be fixed without fs-wide locking. The effect is a slightly early ENOSPC in a nearly-full filesystem).
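A single-threaded sketch of the "advance to the next blockgroup" behaviour, assuming one hypothetical 8-inode bitmap byte per group. A full group here stands in for a group that another CPU emptied after it was selected; all `demo_*` names are illustrative.

```c
#include <errno.h>

#define DEMO_INODES_PER_GROUP 8

/* Claim the first clear bit in a group's bitmap; return its index or -1. */
int demo_claim_in_group(unsigned char *bitmap)
{
    for (int bit = 0; bit < DEMO_INODES_PER_GROUP; bit++) {
        if (!(*bitmap & (1u << bit))) {
            *bitmap |= (unsigned char)(1u << bit);
            return bit;
        }
    }
    return -1;  /* group turned out to be full */
}

/*
 * Try the suggested group first; if it has no free inode left, move on
 * to the next group instead of reporting corruption. Only after every
 * group has been tried once is -ENOSPC returned.
 * Returns a 1-based inode number on success.
 */
long demo_new_inode(unsigned char *bitmaps, int ngroups, int start_group)
{
    int group = start_group;

    for (int i = 0; i < ngroups; i++) {
        int bit = demo_claim_in_group(&bitmaps[group]);

        if (bit >= 0)
            return (long)group * DEMO_INODES_PER_GROUP + bit + 1;
        if (++group == ngroups)
            group = 0;
    }
    return -ENOSPC;
}
```

As the message notes, a scan like this can report ENOSPC slightly early, because an inode freed in an already-visited group is not seen.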
2003-04-16 | [PATCH] Fix orlov allocator boundary case | Andrew Morton
In the interests of SMP scalability the ext2 free blocks and free inodes counters are "approximate". But there is a piece of code in the Orlov allocator which fails due to boundary conditions on really small filesystems. Fix that up via a final allocation pass which simply uses first-fit for allocation of a directory inode.
2003-04-12 | [PATCH] use spinlocking in the ext2 inode allocator | Andrew Morton
From Alex Tomas and myself
It is identical in concept to the block allocator change. It uses the same hashed spinlock.
2003-04-12 | [PATCH] use spinlocking in the ext2 block allocator | Andrew Morton
From Alex Tomas and myself
ext2 currently uses lock_super() to protect the filesystem's in-core block allocation bitmaps. On big SMP machines the contention on that semaphore is causing high context switch rates, large amounts of idle time and reduced throughput.
The context switch rate can also worsen block allocation: if several tasks are trying to allocate blocks inside the same blockgroup for different files, madly rotating between those tasks will cause the files' blocks to be intermingled.
On SDET and dbench-style workloads (lots of tasks doing lots of allocation) this patch (and a similar one for the inode allocator) improves throughput on an 8-way by ~15%. On 16-way NUMAQ the speedup is 150%.
What we do is to remove the lock altogether and just rely on the atomic semantics of test_and_set_bit(): if the allocator sees a block was free it runs test_and_set_bit(). If that fails, then we raced and the allocator will go and look for another block.
Of course, we don't really use test_and_set_bit() because that isn't endian-independent. New atomic endian-independent functions are introduced: ext2_set_bit_atomic() and ext2_clear_bit_atomic(). We do not need ext2_test_bit_atomic(), since even if ext2_test_bit() returns the wrong result, that error will be detected and naturally handled in the subsequent ext2_set_bit_atomic().
For little-endian machines the new atomic ops map directly onto the test_and_set_bit(), etc.
For big-endian machines we provide the architecture's implementation with the address of a spinlock which can be taken around the nonatomic ext2_set_bit(). The spinlocks are hashed, and the hash is scaled according to the machine size. Architectures are free to implement optimised versions of ext2_set_bit_atomic() and ext2_clear_bit_atomic().
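The lock-free claim can be sketched in userspace with C11 atomics. This swaps in a different technique from the one described above: instead of the kernel's bitops or a hashed spinlock, it does byte-wise `atomic_fetch_or`, which gives the same bit layout on any host. The `demo_*` names are hypothetical.

```c
#include <stdatomic.h>

/* Atomically set bit nr in map; return nonzero if it was already set. */
int demo_set_bit_atomic(unsigned nr, _Atomic unsigned char *map)
{
    unsigned char mask = (unsigned char)(1u << (nr & 7));
    unsigned char old = atomic_fetch_or(&map[nr >> 3], mask);

    return (old & mask) != 0;
}

/* Atomically clear bit nr in map; return nonzero if it was set before. */
int demo_clear_bit_atomic(unsigned nr, _Atomic unsigned char *map)
{
    unsigned char mask = (unsigned char)(1u << (nr & 7));
    unsigned char old = atomic_fetch_and(&map[nr >> 3], (unsigned char)~mask);

    return (old & mask) != 0;
}

/*
 * Allocate the first free bit below nbits, or return -1.
 * The plain load is only a hint: if another thread takes the bit between
 * the load and the set, the set reports it and the scan simply continues.
 */
long demo_alloc_bit(_Atomic unsigned char *map, unsigned nbits)
{
    for (unsigned nr = 0; nr < nbits; nr++) {
        if (atomic_load(&map[nr >> 3]) & (1u << (nr & 7)))
            continue;                       /* looks used, skip */
        if (!demo_set_bit_atomic(nr, map))
            return (long)nr;                /* we won the bit */
        /* lost the race for this bit; keep looking */
    }
    return -1;
}
```

This is why no atomic test operation is needed: a stale test result is caught by the atomic set that follows it.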
2003-03-16 | [PATCH] Ext2/3 noatime and dirsync fixes | Andrew Morton
Patch from "Theodore Ts'o" <tytso@mit.edu>
I recently noticed a bug in ext2/3; newly created inodes which inherit the noatime flag from their containing directory do not respect noatime until the inode is flushed from the inode cache and then re-read later. This is because the code which checks the ext2 no-atime attribute and then sets the S_NOATIME in inode->i_flags is present in ext2_read_inode(), but not in ext2_new_inode().
I fixed this in 2.4, and then found an even worse bug in the 2.5 code; the DIRSYNC flag is completely ignored *except* in the case where a directory is newly created using mkdir and its parent directory has the DIRSYNC flag. S_DIRSYNC doesn't get set in the ext2_new_inode() or the ext2_ioctl() paths (which is used by chattr).
This patch centralizes the code which translates the ext2 flags in the raw ext2 inode to the appropriate flag values in inode->i_flags in a single location. This fixes the bug, makes things cleaner, and also removes 30 lines of code and 128 bytes of compiled x86 text in the bargain.
2003-03-02 | [PATCH] ext2: clear ext3 htree flag on directories | Andrew Morton
Forward port of a change which Ted made to 2.4's ext2. HTREE backwards compatibility patch. "I thought (and assumed) this patch had been applied to both the ext2 and ext3 filesystems in the 2.4 kernel. It turns out it had only made it into the ext3 filesystem code. This means that if an HTREE-enabled filesystem is mounted using ext2, it will corrupt the filesystem as far as e2fsck and an ext3 htree-enabled kernel are concerned. (The corruption won't cause any data loss, but it will cause e2fsck and an ext3-htree kernel to emit a lot of warning messages.)"
2003-02-10 | [PATCH] Fix synchronous writers to wait properly for the result | Andrew Morton
Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> points out a bug in ll_rw_block() usage. Typical usage is:
    mark_buffer_dirty(bh);
    ll_rw_block(WRITE, 1, &bh);
    wait_on_buffer(bh);
the problem is that if the buffer was locked on entry to this code sequence (due to in-progress I/O), ll_rw_block() will not wait, and start new I/O. So this code will wait on the _old_ I/O, and will then continue execution, leaving the buffer dirty.
It turns out that all callers were only writing one buffer, and they were all waiting on that writeout. So I added a new sync_dirty_buffer() function:
    void sync_dirty_buffer(struct buffer_head *bh)
    {
        lock_buffer(bh);
        if (test_clear_buffer_dirty(bh)) {
            get_bh(bh);
            bh->b_end_io = end_buffer_io_sync;
            submit_bh(WRITE, bh);
        } else {
            unlock_buffer(bh);
        }
    }
which allowed a fair amount of code to be removed, while adding the desired data-integrity guarantees.
UFS has its own wrappers around ll_rw_block() which got in the way, so this operation was open-coded in that case.
2002-12-21 | [PATCH] ext2/3: better starting group for S_ISREG files | Andrew Morton
ext2 places non-directory objects into the same blockgroup as their directory, as long as that directory has free inodes. It does this even if there are no free blocks in that blockgroup (!). This means that if there are lots of files being created at a common point in the tree, they _all_ have the same starting blockgroup. For each file we do a big search forwards for the first block and the allocations end up getting intermingled. So this patch will avoid placing new inodes in block groups which have no free blocks. So far so good. But this means that if a lot of new files are being created under a directory (or multiple directories) which are in the same blockgroup, all the new inodes will overflow into the same blockgroup. No improvement at all. So the patch arranges for the new inode locations to be "spread out" across different blockgroups if they are not going to be placed in their directory's block group. This is done by adding parent->i_ino into the starting point for the quadratic hash. i_ino was chosen so that files which are in the same directory will tend to all land in the same new blockgroup.
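The placement rule can be sketched as follows, assuming a hypothetical `demo_group` array of free counts. The parent's group is used when it has both free inodes and free blocks; otherwise the search starts from a point offset by the parent's inode number and probes with doubling steps. This is an illustration of the description, not the kernel function.

```c
/* Hypothetical per-group free counts for this sketch. */
struct demo_group {
    unsigned free_inodes;
    unsigned free_blocks;
};

/*
 * Pick a block group for a new non-directory inode.
 * Returns a group index, or -1 if the probe sequence finds nothing
 * (a real allocator would fall back to a full linear search here).
 */
int demo_find_group_other(const struct demo_group *g, int ngroups,
                          int parent_group, unsigned long parent_ino)
{
    int group = parent_group;

    /* Prefer the parent's group, but only if it also has free blocks. */
    if (g[group].free_inodes && g[group].free_blocks)
        return group;

    /* Mix in the parent's inode number so different directories spread out. */
    group = (int)(((unsigned long)group + parent_ino) % (unsigned long)ngroups);

    /* Quadratic-style probe: step by 1, 2, 4, ... with wraparound. */
    for (int i = 1; i < ngroups; i <<= 1) {
        group += i;
        if (group >= ngroups)
            group -= ngroups;
        if (g[group].free_inodes && g[group].free_blocks)
            return group;
    }
    return -1;
}
```

Because the starting point depends only on the parent's group and inode number, files created in one directory tend to land in the same overflow group, while two directories that share a full group are sent to different ones.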
2002-11-21 | [PATCH] ext2/ext3 Orlov directory accounting fix | Andrew Morton
Patch from Stephen Tweedie "In looking at the fix for the ext3 Orlov double-accounting bug, I noticed a change to the sb->s_dir_count accounting, restoring a missing s_dir_count++ when we allocate a new directory. However, I can't find anywhere in the code where we decrement this again on directory deletion, neither in ext2 nor in ext3, in 2.4 nor in 2.5." Locking is via lock_super().
2002-11-01 | [PATCH] Fixup Orlov block allocator for ext2 | Theodore Y. Ts'o
I finally had time to look at the Orlov patches, and found a memory leak; sbi->s_debts wasn't getting freed when the filesystem was getting unmounted, or in the error path. This patch also makes the following cleanups/changes:
1) Use sbi->s_debts instead of sbi->debts; all other fields in struct ext2_sb_info are prefixed by "s_", so this makes things consistent.
2) Add support for a new inode flag, EXT2_TOPDIR_FL, which tells the Orlov allocator to treat that directory as the top of directory hierarchies, so that new subdirectories created in that directory should be spread apart. System administrators should set this flag on directories like /usr/src, /usr/home, etc.
3) Add a mount-time flag, -o oldalloc, which forces the use of the old (pre-Orlov) inode allocator. This makes it easier to do comparison benchmarks, and is there in case people want to use the old algorithm.
2002-10-31 | [PATCH] Orlov block allocator for ext2 | Andrew Morton
This is Al's implementation of the Orlov block allocator for ext2. At least doubles the throughput for the traverse-a-kernel-tree test and is well tested. I still need to do the ext3 version. No effort has been put into tuning it at this time, so more gains are probably possible.
2002-10-30 | Port of (bugfixed) 0.8.50 acl-ext2 to 2.5 | Theodore Y. Ts'o
This patch adds ACL support to the ext2 filesystem.