| Age | Commit message (Collapse) | Author |
|
I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench. Tested on both x86_64 and powerpc.
The way we do file struct accounting is not very suitable for batched
freeing. For scalability reasons, file accounting was
constructor/destructor based. This meant that nr_files was decremented
only when the object was removed from the slab cache. This is susceptible
to slab fragmentation. With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -
llm22:~ # cat /proc/sys/fs/file-nr
587730 0 758844
At the same time, I see only a 2000+ objects in filp cache. The following
patch I fixes this problem.
This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api. In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.
Counting files as an when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The file_lock spinlock sits close to mostly read fields of 'struct
files_struct'
In SMP (and NUMA) environments, each time a thread wants to open or close
a file, it has to acquire the spinlock, thus invalidating the cache line
containing this spinlock on other CPUS. So other threads doing
read()/write()/... calls that use RCU to access the file table are going
to ask further memory (possibly NUMA) transactions to read again this
memory line.
Move the spinlock to another cache line, so that concurrent threads can
share the cache line containing 'count' and 'fdt' fields.
It's worth up to 9% on a microbenchmark using a 4-thread 2-package x86
machine. See
http://marc.theaimsgroup.com/?l=linux-kernel&m=112680448713342&w=2
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch renames struct kmem_cache_s to kmem_cache so we can start using
it instead of kmem_cache_t typedef.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Patch to eliminate struct files_struct.file_lock spinlock on the reader side
and use rcu refcounting rcuref_xxx api for the f_count refcounter. The
updates to the fdtable are done by allocating a new fdtable structure and
setting files->fdt to point to the new structure. The fdtable structure is
protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that
are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A
global list of fdtable to be freed is not scalable, so we use a per-cpu list.
If keventd is already handling the current cpu's work, we use a timer to defer
queueing of that work.
Since the last publication, this patch has been re-written to avoid using
explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
premitives instead. This required that the fd information is kept in a
separate structure (fdtable) and updated atomically.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically. Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure. This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded withing files_struct. It also changes all
the users to refer to the file table using files_fdtable() macro. Subsequent
applciation of RCU becomes easier after this.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
expand_files() cleanup: Make expand_files() common code for
fork.c:fork/copy_files(), open.c:open/get_unused_fd() and
fcntl.c:dup/locate_fd().
expand_files() does both expand_fd_array and expand_fd_set based on the
need. This is used in dup(). open() and fork() duplicates the work of
expand files. At all places we check for expanding fd array, we also check
for expanding fdset. There is no need of checking and calling them
seperately.
This change also moves the expand_files to file.c from fcntl.c, and makes
the expand_fd_array and expand_fd_set local to that file.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Russell King <rmk@arm.linux.org.uk>
Cleanup the 4 duplicate "get_files_struct" implementations into one
get_files_struct() function to compliment put_files_struct().
|
|
From: Dipankar Sarma <dipankar@in.ibm.com>
fget() shows up on profiles, especially on SMP. Dipankar's patch
special-cases the situation wherein there are no sharers of current->files.
In this situation we know that no other process can close this file, so it
is not necessary to increment the file's refcount.
It's ugly as sin, but makes a substantial difference.
The test is
dd if=/dev/zero of=foo bs=1 count=1M
On 4CPU P3 xeon with 1MB L2 cache and 512MB ram:
kernel sys time std-dev
------------ -------- -------
UP - vanilla 2.104 0.028
UP - file 1.867 0.019
SMP - vanilla 2.976 0.023
SMP - file 2.719 0.026
|
|
Time to write a 2M file, one byte at a time:
Before:
1.09s user 4.92s system 99% cpu 6.014 total
0.74s user 5.28s system 99% cpu 6.023 total
1.03s user 4.97s system 100% cpu 5.991 total
After:
0.79s user 5.17s system 99% cpu 5.993 total
0.79s user 5.17s system 100% cpu 5.957 total
0.84s user 5.11s system 100% cpu 5.942 total
|
|
Patch from Manfred Spraul <manfred@colorfullife.com>
Fixes the lock contention over file_list_lock by simply removing the unneeded
anon_list. So filp allocation does not need to take a global lock any more.
The use of a spinlock to protect files_stat.nr_files is a bit awkward, but
the alternative is a custom sysctl handler, and isn't much more efficient
anyway.
|
|
A minor removal of 6 function definitions from sched.h. They clearly
fit better in file.h. All users of these functions already include file.h.
And none of them included sched.h directly...
|
|
This patch splits fput into fput and __fput. __fput is needed by aio to
construct a mechanism for performing a deferred fput during io
completion, which typically occurs during interrupt context.
|
|
Big bits first, I'll redo the smaller bits tomorrow after some sleep.
Same as last time, rediffed against pre5
|
|
- David Howells: abtract out "current->need_resched" as "need_resched()"
- Frank Davis: ide-tape update for bio
- various: header file fixups
- Jens Axboe: fix up bio/ide/highmem issues
- Kai Germaschewski: ISDN update
- Tim Waugh: parport update
- Patrik Mochel: initcall update
- Greg KH: USB and Compaq PCI hotplug updates
|
|
- Al Viro: fix up silly problem in swapfile filp cleanups in 2.5.2
- Tachino Nobuhiro: fix another error return for swapfile filp code
- Robert Love: merge some of Ingo's scheduler fixes
- David Miller: networking, sparc and some scsi driver fixes
- Tim Waugh: parport update
- OGAWA Hirofumi: fatfs cleanups and bugfixes
- Roland Dreier: fix vsscanf buglets.
- Ben LaHaise: include file cleanup
- Andre Hedrick: IDE taskfile update
|
|
- Jens Axboe: more bio updates, fix some request list bogosity under load
- Al Viro: export seq_xxx functions
- Manfred Spraul: include file cleanups, pc110pad compile fix
- David Woodhouse: fix JFFS2 write error handling
- Dave Jones: start merging up with 2.4.x patches
- Manfred Spraul: coredump fixes, FS event counter cleanups
- me: fix SCSI CD-ROM sectorsize BIO breakage
|
|
|