| Age | Commit message | Author |
|
idle_cpu had the same botched move from kernel/ksyms.c to kernel/sched.c
that __wake_up_sync() had.
|
|
from kernel/ksyms.c to kernel/sched.c.
Noted by Richard Henderson <rth@twiddle.net>
|
|
From: Jonathan Corbet <corbet@lwn.net>
Nobody told me that the failure to export these (like their block
counterparts) was anything but an oversight; modules will not be able to
use larger device numbers without them. So...this patch exports the new
char device functions.
|
|
cdevname() killed, there was only one remaining user
(tty_paranoia_check()) and in that case cdevname() was worse
than plain major:minor (basically, it's "you've got corrupted
inode that was supposed to belong to tty device; here's what
I'd found in ->i_rdev")
|
|
|
|
This removes EXPORT_SYMBOL(add_timer) since add_timer() became inline
recently.
|
|
From: Oliver Xymoron <oxymoron@waste.org>
These patches add the infrastructure for reporting asynchronous write errors
to block devices to userspace. Errors which are detected due to pdflush or VM
writeout are reported at the next fsync, fdatasync, or msync on the given
file, and on close if the error occurs in time.
We do this by propagating any errors into page->mapping->error when they are
detected. In fsync(), msync(), fdatasync() and close() we return that error
and zero it out.
The Open Group say close() _may_ fail if an I/O error occurred while reading
from or writing to the file system. Well, in this implementation close() can
return -EIO or -ENOSPC. And in that case it will succeed, not fail - perhaps
that is what they meant.
There are three patches in this series and testing has only been performed
with all three applied.
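The record-then-report scheme described above can be sketched in plain C. This is only a userspace model under stated assumptions: `struct mapping_model`, `record_write_error()` and `report_and_clear_error()` are hypothetical names for illustration, not the functions the patches add.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's address_space; only the error
 * field mentioned in the message is modelled. */
struct mapping_model {
    int error;   /* 0, or a negative errno such as -EIO or -ENOSPC */
};

/* Modelled writeback path: remember an asynchronous write failure. */
static void record_write_error(struct mapping_model *m, int err)
{
    if (err)
        m->error = err;
}

/* Modelled fsync()/msync()/fdatasync()/close() path: return the pending
 * error and zero it out, so it is reported exactly once. */
static int report_and_clear_error(struct mapping_model *m)
{
    int err = m->error;

    m->error = 0;
    return err;
}
```

In this model a second call after an error has been reported returns 0 again, which mirrors the "return that error and zero it out" wording.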
|
|
From: Ingo Molnar <mingo@elte.hu>
It unifies the functionality of add_timer() and mod_timer(), and makes any
combination of the timer API calls completely SMP-safe. del_timer() is still
not using the timer lock.
this patch fixes the only timer bug in 2.6 i'm aware of: the del_timer_sync()
+ add_timer() combination in kernel/itimer.c is buggy. This was correct code
in 2.4, because there it was safe to do an add_timer() from the timer handler
itself, parallel to a del_timer_sync().
If we want to make this safe in 2.6 too (which i think we want to) then we
have to make add_timer() almost equivalent to mod_timer(), locking-wise. And
once we are at this point i think it's much cleaner to actually make
add_timer() a variant of mod_timer(). (There's no locking cost for
add_timer(), only the cost of an extra branch. And we've removed another
commonly used function from the icache.)
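A minimal sketch of "add_timer() as a variant of mod_timer()", assuming a simplified timer with only an expiry and a pending flag. The `timer_model` type and the `model_*` names are hypothetical; the locking the message talks about is only indicated in a comment.

```c
/* Hypothetical, simplified timer: no list linkage, no handler. */
struct timer_model {
    unsigned long expires;
    int pending;
};

/* Modelled mod_timer(): (re)arm the timer for a new expiry.
 * Returns 1 if the timer was already pending, 0 if it was inactive. */
static int model_mod_timer(struct timer_model *t, unsigned long expires)
{
    int was_pending = t->pending;

    /* The real code would hold the timer lock around this update. */
    t->expires = expires;
    t->pending = 1;
    return was_pending;
}

/* Modelled add_timer(): the same operation, using the expiry that the
 * caller already stored in the timer. */
static inline void model_add_timer(struct timer_model *t)
{
    model_mod_timer(t, t->expires);
}
```

Because both entry points go through one update path in this model, any locking added to that path covers both of them.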
|
|
From: jbarnes@sgi.com (Jesse Barnes)
hwgfs needs lookup_create(), and intermezzo has already copied it.
Document it, export it to modules and fix intermezzo.
|
|
install_page() is a library function which we expect will be used by all
drivers which implement vm_operations.populate(). Therefore it should be
exported to kernel modules.
Petr Vandrovec has a project which involves sparse mappings of device memory
which can use remap_file_pages(). It needs install_page().
|
|
From: "Paul E. McKenney" <paulmck@us.ibm.com>
The patch reworks and generalises vmtruncate_list() a bit to create an API
which invalidates a specified portion of an address_space, permitting
distributed filesystems to maintain POSIX semantics when a file mmap()ed on
one client is modified on another client.
|
|
This solves the unresolved symbol problem with modular intermezzo. Also
update the MAINTAINERS entry.
|
|
From: Andreas Gruenbacher <agruen@suse.de>
Without acls, when creating files the umask is applied directly in the vfs.
ACLs require that the umask is applied at the file system level, depending on
whether or not the containing directory has a default acl. The daemonize()
function makes kernel threads share their fs_struct structure with the init
process. Among other things, fs_struct contains the umask, so all kernel
threads share their umask with init.
The kernel nfsd needs to create files with a umask of 0. Init's umask cannot
simply be changed to 0 --- this would have side effects on init, and init
would have side effects on nfsd. So this patch recreates a fs_struct
structure for nfsd kernel threads, and sets its umask to 0.
This fixes bug #721, <http://www.osdl.net/show_bug.cgi?id=721>.
|
|
The function cpu_raise_softirq() takes a softirq number, and a cpu number,
but cannot be used with cpu != smp_processor_id(), because there's no
locking around the pending softirq lists. Since no one does this, remove
that arg.
As per Linus' suggestion, names changed:
    raise_softirq(int nr)
    cpu_raise_softirq(int cpu, int nr) -> raise_softirq_irqoff(int nr)
    __cpu_raise_softirq(int cpu, int nr) -> __raise_softirq_irqoff(int nr)
|
|
With the recent fixes, io_schedule needs to be exported for modular dm
to work.
|
|
|
|
init_thread_union doesn't need to be exported to modules.
We haven't exported the symbol on ia64 for ages, and we should be able
to make the init_thread_union local to arch/ARCH/kernel/init_task.c and
that in turn would let us remove its declaration from
include/linux/sched.h altogether (i.e., no more ugly #ifdefs).
|
|
Try to trap some more state when an assertion which cannot happen happens.
|
|
From: Christoph Hellwig <hch@lst.de>
Currently only x86_64 and ia64 don't use the generic irq_cpustat code
and both have to work around its brokenness for the non-default case.
x86_64 defines an empty irq_cpustat_t even if it doesn't need one and
ia64 adds CONFIG_IA64 ifdefs around all users. What about this patch
instead to make __ARCH_IRQ_STAT usable?
|
|
From: Manfred Spraul and Brian Gerst
The patch performs the kmalloc cache lookup for constant kmalloc calls at
compile time. The idea is that the loop in kmalloc takes a significant
amount of time, and for kmalloc(4096,GFP_KERNEL), that lookup can happen
entirely at compile time.
A problem has been seen with gcc-3.2.2-5 from RedHat. This code:
    if (__builtin_constant_p(size)) {
        if (size < 32) return kmem_cache_alloc(...);
        if (size < 64) return kmem_cache_alloc(...);
        if (size < 96) return kmem_cache_alloc(...);
        if (size < 128) return kmem_cache_alloc(...);
        ...
    }
doesn't work, because gcc only optimizes the first two or three comparisons,
and then suddenly generates code for the remaining ones.
But we did it that way anyway. Apparently it's fixed in later compilers.
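The compile-time lookup can be illustrated with a small userspace sketch. The size classes, the `class_index*()` names, and the use of `<=` comparisons are assumptions made for the example; it returns a class index rather than calling kmem_cache_alloc(). It relies on the GCC/Clang builtin `__builtin_constant_p()`.

```c
#include <stddef.h>

/* Hypothetical size classes; the real kmalloc caches may differ. */
static const size_t size_classes[] = {
    32, 64, 96, 128, 256, 512, 1024, 2048, 4096
};
#define NUM_CLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

/* Run-time path: the loop whose cost the patch tries to avoid. */
static int class_index_loop(size_t size)
{
    for (size_t i = 0; i < NUM_CLASSES; i++)
        if (size <= size_classes[i])
            return (int)i;
    return -1;   /* too large for any class */
}

/* For a constant size the compiler can fold this comparison chain down
 * to a single constant; otherwise fall back to the loop. Both paths
 * return the same index. */
static inline int class_index(size_t size)
{
    if (__builtin_constant_p(size)) {
        if (size <= 32)   return 0;
        if (size <= 64)   return 1;
        if (size <= 96)   return 2;
        if (size <= 128)  return 3;
        if (size <= 256)  return 4;
        if (size <= 512)  return 5;
        if (size <= 1024) return 6;
        if (size <= 2048) return 7;
        if (size <= 4096) return 8;
        return -1;
    }
    return class_index_loop(size);
}
```

Whether the chain is actually folded depends on the compiler and optimization level, which is the kind of behaviour the gcc-3.2.2-5 report above is about.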
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Several tweaks to the kmalloc_percpu()/kfree_percpu() interface, to
allow future implementations to be more flexible, and make it easier to
use now we can see how it's actually being used.
1) No flags argument: GFP_ATOMIC doesn't make much sense,
2) Explicit alignment argument, so we don't have to give SMP_CACHE_BYTES
alignment always,
3) Zeros memory, since most callers want that and it's not entirely
trivial,
4) Convenient type-safe wrapper which takes a typename, and
5) Rename to alloc_percpu/__alloc_percpu, since usage no longer matches
kmalloc.
|
|
From: Oleg Drokin <green@namesys.com>
With the current 'one block at a time' algorithm, writes past the end of a
file are slow because each new file block is separately added into the tree,
causing shifting of other items, which is CPU-expensive.
With this new implementation, if you write into a file in big enough chunks,
it uses half as much CPU. Also this version is more SMP-friendly than the
current one.
There are some known-bad applications that break with this patch (i.e. start
to run very slowly or even hang).
This is because the filesystem returns a large value in the stat.st_blksize
hint (128k instead of 4k). This tickles a small number of application bugs.
One is KDE's kmail 3.04 (fixed by upgrading to 3.1+) and the other is
sleepycat's database from before 1997.
If you hit a slowdown problem that you believe is related to the increased
"recommended i/o size" value, try to mount your fs with nolargeio=1 mount
option (remount should work too).
This patch exports block_commit_write(), generic_osync_inode() and
remove_suid() to modules.
|
|
Don't depend on undefined preprocessor symbols evaluating to zero.
|
|
|
|
From: Christoph Hellwig <hch@lst.de>
partition_name() is a variant of __bdevname() that caches results and
returns a pointer to kmalloc()ed data instead of printing into a buffer.
Due to its caching it gets utterly confused when the name for a dev_t
changes (can happen easily now with device mapper and probably in the
future with dynamic dev_t users).
It's only used by the raid code and most calls are through a wrapper,
bdev_partition_name(), which takes a struct block_device * that may be
NULL.
The patch below changes bdev_partition_name() to call bdevname() if
possible, and changes the other calls, where we really have nothing more
than a dev_t, to __bdevname().
Btw, it would be nice if someone who knows the md code a bit better than me
could remove bdev_partition_name() in favour of direct calls to bdevname()
where possible - that would also get rid of the "returns pointer to string
on stack" issue that this patch can't fix yet.
|
|
|
|
|
|
New helper - bdget_disk(gendisk, partition)
invalidate_device() replaced with invalidate_partition(disk, part)
|
|
New helper - open_by_devnum(). Opens block_device by device number;
for use in situations when we really have nothing better than dev_t (i.e.
had received it from stupid userland API).
|
|
A couple of helpers - simple_pin_fs() and simple_release_fs().
My fault - that code should've been put into libfs.c from the very
beginning. As it is, it got copied all over the place (binfmt_misc,
capifs, usbfs, usbdevfs, rpc_pipefs).
Taken to libfs.c and cleaned up.
|
|
New libfs.c helper - simple_fill_super(). Abstracted from
nfsd/nfsctl.c, couple of filesystems converted to it (nfsctl, binfmt_misc).
Function takes an array of triples (name, file_operations, mode),
superblock and value for its ->s_magic. It acts as fill_super() - populates
superblock or fails. We get a ramfs-style flat tree - root directory and
a bunch of files in it.
That animal lets you put together a simple filesystem without
touching any directory-related stuff - now it's as easy as implementing
file_operations for files you want to have and telling what to call them.
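As a rough illustration of the "array of triples" idea, here is a userspace model. All `*_model` names, the fixed-size file table, and the NULL-name terminator are assumptions made for this sketch; they are not the helper's actual types or conventions.

```c
#include <stddef.h>

struct file_ops_model;   /* opaque stand-in for file_operations */

/* One (name, file_operations, mode) triple. */
struct tree_descr_model {
    const char *name;
    const struct file_ops_model *ops;
    int mode;
};

#define MODEL_MAX_FILES 16

/* Stand-in for a superblock with a flat root directory. */
struct super_model {
    unsigned long magic;
    int nr_files;
    struct tree_descr_model files[MODEL_MAX_FILES];
};

/* Populate the superblock from the table, or fail. In this model the
 * table ends at the first entry whose name is NULL. */
static int model_fill_super(struct super_model *sb, unsigned long magic,
                            const struct tree_descr_model *files)
{
    sb->magic = magic;
    sb->nr_files = 0;
    for (; files->name; files++) {
        if (sb->nr_files == MODEL_MAX_FILES) {
            sb->nr_files = 0;   /* leave nothing half-populated */
            return -1;
        }
        sb->files[sb->nr_files++] = *files;
    }
    return 0;
}
```

The caller only supplies the table and a magic number; no directory handling appears on the caller's side, which is the convenience the message describes.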
|
|
* bogus calls of invalidate_buffers() gone from floppy_open()
* invalidate_buffers() killed.
* new helper - __invalidate_device(bdev, do_sync). invalidate_device()
  is calling it.
* fixed races between floppy_open()/floppy_open() and
  floppy_open()/set_geometry():
    a) floppy_open()/floppy_release() is done under a semaphore. That
       closes the races between simultaneous open() on /dev/fd0foo and
       /dev/fd0bar.
    b) pointer to struct block_device is kept as long as floppy is
       opened (per-drive, non-NULL when number of openers is non-zero,
       does not contribute to block_device refcount).
    c) set_geometry() grabs the same semaphore and invalidates the
       devices directly instead of messing with setting fake "it had
       changed" and calling __check_disk_change().
* __check_disk_change() killed - no remaining callers
* full_check_disk_change() killed - ditto.
|
|
Several places in ext2 and ext3 are using filesystem-wide counters which use
global locking. Mainly for the orlov allocator's heuristics.
To solve the contention which this causes we can trade off accuracy against
speed.
This patch introduces a "percpu_counter" library type in which the counts are
per-cpu and are periodically spilled into a global counter. Readers only
read the global counter.
These objects are *large*. On a 32 CPU P4, they are 4 kbytes. On a 4 way
p3, 128 bytes.
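The accuracy-for-speed trade-off can be sketched as follows. This is a single-threaded userspace model: the `*_model` names, the CPU count, and the spill threshold are assumptions for illustration, and the locking around the global counter is only noted in a comment.

```c
#define MODEL_NR_CPUS 4
#define MODEL_BATCH   32   /* hypothetical spill threshold */

struct percpu_counter_model {
    long count;                  /* global, approximate value */
    long local[MODEL_NR_CPUS];   /* per-CPU deltas not yet spilled */
};

/* Adjust the counter from a given CPU. The delta stays local until it
 * reaches the threshold, then it is spilled into the global count. */
static void counter_mod(struct percpu_counter_model *c, int cpu, long amount)
{
    long v = c->local[cpu] + amount;

    if (v >= MODEL_BATCH || v <= -MODEL_BATCH) {
        c->count += v;   /* real code would take a lock here */
        v = 0;
    }
    c->local[cpu] = v;
}

/* Readers look only at the global counter, so the value they see can
 * lag behind by the unspilled per-CPU deltas. */
static long counter_read(const struct percpu_counter_model *c)
{
    return c->count;
}
```

With these assumed numbers, a reader can be off by up to 31 per CPU in either direction, which is the inaccuracy traded for less contention on the global counter.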
|
|
The big SMP machines are seeing quite some contention in dnotify_parent()
(via vfs_write). This function is hammering the global dparent_lock.
However we don't actually need a global dparent_lock for pinning down
dentry->d_parent. We can use dentry->d_lock for this. That is already being
held across d_move.
This patch speeds up SDET on the 16-way by 5% and wipes dnotify_parent() off
the profiles.
It also uninlines dnotify_parent().
It also uses spin_lock(), which is faster than read_lock().
I'm not sure that we need to take both the source and target dentry's d_lock
in d_move.
The patch also does lots of s/__inline__/inline/ in dcache.h
|
|
|