|
To make spinlock/rwlock initialization consistent all over the kernel,
this patch converts explicit lock-initializers into spin_lock_init() and
rwlock_init() calls.
Currently, spinlocks and rwlocks are initialized in two different ways:
    lock = SPIN_LOCK_UNLOCKED
    spin_lock_init(&lock)
    rwlock = RW_LOCK_UNLOCKED
    rwlock_init(&rwlock)
This patch converts all explicit lock initializations to
spin_lock_init() or rwlock_init(). (Besides consistency, this also helps
automatic lock validators and debugging code.)
The conversion was done with a script, it was verified manually and it
was reviewed, compiled and tested as far as possible on x86, ARM, PPC.
There is no runtime overhead or actual code change resulting out of this
patch, because spin_lock_init() and rwlock_init() are macros and are
thus equivalent to the explicit initialization method.
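A minimal sketch of the conversion; the lock names here are hypothetical, not taken from the patch itself:

```c
#include <linux/spinlock.h>

/* Before: explicit static initializers */
static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;
static rwlock_t my_rwlock = RW_LOCK_UNLOCKED;

/* After: the init macros, typically called from an init function */
static spinlock_t my_lock2;
static rwlock_t my_rwlock2;

static int __init my_init(void)
{
	spin_lock_init(&my_lock2);
	rwlock_init(&my_rwlock2);
	return 0;
}
```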
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch uses the rcu_assign_pointer() API to eliminate a number of explicit
memory barriers from the SysV IPC code that uses RCU. It also restructures
the ipc_ids structure so that the array size is stored in the same memory
block as the array itself (see the new struct ipc_id_ary). This prevents the
race that the earlier code was subject to, where a reader could see a mismatch
between the size and the actual array. With the size stored with the array,
the possibility of mismatch is eliminated -- without the need for careful
ordering and explicit memory barriers. This has been tested successfully on
i386 and ppc64.
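The restructuring can be sketched as follows; the field names are illustrative approximations of the new struct ipc_id_ary, not the exact patch:

```c
struct ipc_id_ary {
	int size;                    /* stored in the same block as... */
	struct kern_ipc_perm *p[0];  /* ...the array itself */
};

/* Writer side: fill in the new block first, then publish it.
 * rcu_assign_pointer() supplies the write barrier, so a reader that
 * sees the new pointer also sees the matching size and contents. */
rcu_assign_pointer(ids->entries, new);
```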
Signed-off-by: Paul McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use abstracted RCU API to dereference RCU protected data. Hides barrier
details. Patch from Paul McKenney.
This patch introduced an rcu_dereference() macro that replaces most uses of
smp_read_barrier_depends(). The new macro has the advantage of explicitly
documenting which pointers are protected by RCU -- in contrast, it is
sometimes difficult to figure out which pointer is being protected by a given
smp_read_barrier_depends() call.
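A hedged sketch of the reader-side change (identifiers are illustrative):

```c
/* Before: a raw load followed by an unlabelled dependency barrier */
entries = ids->entries;
smp_read_barrier_depends();

/* After: the macro both orders the access and documents that
 * ids->entries is RCU-protected */
entries = rcu_dereference(ids->entries);
```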
Signed-off-by: Paul McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The lifetime of the ipc objects (sem array, msg queue, shm mapping) is
controlled by kern_ipc_perm->lock - a spinlock. There is no simple way to
reacquire this spinlock after it has been dropped in order to call
schedule(), kmalloc(), copy_{to,from}_user(), or similar.
The attached patch adds a reference count as a preparation to get rid of
sem_revalidate().
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Dipankar Sarma <dipankar@in.ibm.com>
This patch changes the call_rcu() API and avoids passing an argument to the
callback function as suggested by Rusty. Instead, it is assumed that the
user has embedded the rcu head into a structure that is useful in the
callback and the rcu_head pointer is passed to the callback. The callback
can use container_of() to get the pointer to its structure and work with
it. Together with the rcu-singly-link patch, it reduces the rcu_head size
by 50%. Considering that we use these in things like struct dentry and
struct dst_entry, this is good savings in space.
An example:
    struct my_struct {
        struct rcu_head rcu;
        int x;
        int y;
    };

    void my_rcu_callback(struct rcu_head *head)
    {
        struct my_struct *p = container_of(head, struct my_struct, rcu);
        kfree(p);
    }

    void my_delete(struct my_struct *p)
    {
        ...
        call_rcu(&p->rcu, my_rcu_callback);
        ...
    }
Signed-Off-By: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
Below is a patch that tries to sanitize the dropping of unneeded system-call
stubs in generic code. In some instances, it would be possible to move the
optional system-call stubs into a library routine which would avoid the need
for #ifdefs, but in many cases, doing so would require making several
functions global (and possibly exporting additional data-structures in
header-files). Furthermore, it would inhibit (automatic) inlining in the
cases where the stubs are needed. For these reasons, the patch
keeps the #ifdef-approach.
This has been tested on ia64 and there were no objections from the
arch-maintainers (and one positive response). The patch should be safe but
arch-maintainers may want to take a second look to see if some __ARCH_WANT_foo
macros should be removed for their architecture (I'm quite sure that's the
case, but I wanted to play it safe and only preserved the status-quo in that
regard).
|
|
From: Manfred Spraul <manfred@colorfullife.com>
Cleanup of SysV IPC as a preparation for POSIX message queues:
- replace !CONFIG_SYSVIPC wrappers for copy_semundo and exit_sem with
static inline wrappers. Now the whole ipc/util.c file is only used if
CONFIG_SYSVIPC is set, use makefile magic instead of #ifdef.
- remove the prototypes for copy_semundo and exit_sem from kernel/fork.c
- they belong into a header file.
- create a new msgutil.c with the helper functions for message queues.
- cleanup the helper functions: run Lindent, add __user tags.
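The first item can be sketched as follows; the prototypes approximate the real ones:

```c
/* header-file wrappers (sketch) */
#ifdef CONFIG_SYSVIPC
extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
extern void exit_sem(struct task_struct *tsk);
#else
static inline int copy_semundo(unsigned long clone_flags,
			       struct task_struct *tsk)
{
	return 0;
}
static inline void exit_sem(struct task_struct *tsk)
{
}
#endif
```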
|
|
From: Manfred Spraul <manfred@colorfullife.com>
Attached is a patch that replaces the #ifndef CONFIG_SYSVIPC syscall stubs
with cond_syscall() stubs.
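A cond_syscall() stub declares a weak alias so that the syscall resolves to sys_ni_syscall ("not implemented") when the real implementation isn't built in; a sketch:

```c
/* kernel/sys_ni.c-style stubs (sketch) */
cond_syscall(sys_semget);
cond_syscall(sys_msgget);
cond_syscall(sys_shmget);
```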
|
|
PA-RISC also uses the 64-bit version of the IPC structs.
|
|
This fixes CONFIG_UID16 problems on x86-64 as discussed earlier.
CONFIG_UID16 now only selects the inclusion of kernel/uid16.c, all
conversions are triggered dynamically based on type sizes. This allows
x86-64 to include uid16.c for emulation purposes without truncating
uids to 16 bits in sys_newstat.
- Replace the old macros from linux/highuid.h with new SET_UID/SET_GID
macros that do type checking. Based on Linus' proposal.
- Fix everybody to use them.
- Clean up some cruft in the x86-64 32bit emulation allowed by this
(other 32bit emulations could be cleaned too, but I'm too lazy for
that right now)
- Add one missing EOVERFLOW check in x86-64 32bit sys_newstat while
I was at it.
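The type-checking trick is roughly the following, a sketch after linux/highuid.h (the exact helper names may differ):

```c
/* sketch: convert only when the destination type is too small,
 * sizeof(var) gives the macro its type-awareness */
#define __convert_uid(size, uid) \
	((size) >= sizeof(uid) ? (uid) : high2lowuid(uid))

#define SET_UID(var, uid) \
	do { (var) = __convert_uid(sizeof(var), (uid)); } while (0)
```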
|
|
From: Andrea Arcangeli <andrea@suse.de>
aka: "vmalloc allocations in ipc needs smp initialized (and vm must be
allowed to schedule in 2.6)"
In short, if you change SEMMNI to 8192 the kernel will crash at boot, because
it tries to call vmalloc before SMP is initialized. The reason is that
vmalloc calls into the pte alloc code, and the fast pte alloc is tried
first, but that reads into the pte_quicklist, that requires the cpu_data to
be initialized (and that happens in smp_init()).
The patch is obviously safe, since no piece of the kernel (especially the
code in the check_bugs and smp_init paths ;) calls into the ipc subsystem.
The reason this started to trigger wasn't really that we increased SEMMNI;
what happened is that some IPC data structure grew, and for some reason
the corruption due to the uninitialized pte_quicklist triggers only on SMP
boxes with less than 1G (not very common anymore ;). So it wasn't
immediately reproducible on all setups.
2.6 doesn't suffer from the same problem, simply because 2.6 isn't using
the quicklist anymore, but I think it would be much more correct to make
the same change in 2.6 too: any cond_resched() in the vm paths (and
they're definitely allowed to call it) would lead to a crash, since the
init task isn't initialized and the scheduler can't be invoked yet. (And
2.6 already has the bigger data structures that should trigger the vmalloc
path all the time on all setups.)
|
|
AMD64, like IA64, needs to force IPC_64 in the IPC functions. This makes
2.5 compatible with 2.4 again.
|
|
From: Manfred Spraul <manfred@colorfullife.com>
The CLONE_SYSVSEM implementation is racy: it uses (atomic_read(->refcnt)
== 1) checks instead of atomic_dec_and_test() calls in the exit handling.
The patch fixes that.
Additionally, the patch contains the following changes:
- lock_undo() locks the list of undo structures. The lock is held
throughout the semop() syscall, but that's unnecessary - we can drop it
immediately after the lookup.
- undo structures are only allocated when necessary. The need for undo
structures is only noticed in the middle of the semop operation, while
holding the semaphore array spinlock. The result is a convoluted
unlock&revalidate implementation. I've reordered the code, and now the
undo allocation can happen before acquiring the semaphore array spinlock.
As a bonus, less code runs under the semaphore array spinlock.
- sysvsem.sleep_list looks like code to handle oopses: if an oops kills a
thread that sleeps in sys_timedsemop(), then sem_exit tries to recover.
I've removed that - too fragile.
|
|
Patch from Mark Fasheh <mark.fasheh@oracle.com> (plus a few cleanups
and a speedup from yours truly)
Adds the semtimedop() function - semop with a timeout. Solaris has
this. It's apparently worth a couple of percent to Oracle throughput,
and given the simplicity, that is sufficient benefit for inclusion IMO.
This patch hooks up semtimedop() only for ia64 and ia32.
|
|
Patch from Mingming Cao <cmm@us.ibm.com>
- ipc_lock() needs a read_barrier_depends() to prevent indexing an
uninitialized new array on the read side. This corresponds to
the write memory barrier added in grow_ary() by Dipankar's patch to
prevent indexing an uninitialized array.
- Replaced "wmb()" in the IPC code with "smp_wmb()". "wmb()" produces a
full write memory barrier in both UP and SMP kernels, while
"smp_wmb()" provides a full write memory barrier in an SMP kernel,
but only a compiler directive in a UP kernel. The same change is
made for "rmb()".
- Removed rmb() in ipc_get(). We do not need a read memory barrier
there since ipc_get() is protected by ipc_ids.sem semaphore.
- Added more comments about why write barriers and read barriers are
needed (or not needed) here or there.
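The barrier pairing described above can be sketched as follows (identifiers are illustrative):

```c
/* writer, grow_ary(): copy first, then publish the new array */
memcpy(new, old, sizeof(struct kern_ipc_perm *) * size);
smp_wmb();			/* contents visible before the pointer */
ids->entries = new;

/* reader, ipc_lock(): load the pointer, then order the indexing */
entries = ids->entries;
smp_read_barrier_depends();	/* pairs with the smp_wmb() above */
out = entries[lid];
```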
|
|
|
|
From Dipankar Sarma.
Before setting ids->entries to the new array, there must be a wmb()
to make sure that the memcpy'ed contents of the new array are visible
before the new array becomes visible.
|
|
Uninlines some large functions in the ipc code.
Before:
text data bss dec hex filename
30226 224 192 30642 77b2 ipc/built-in.o
After:
text data bss dec hex filename
20274 224 192 20690 50d2 ipc/built-in.o
|
|
Patch from Mingming, Rusty, Hugh, Dipankar, me:
- It greatly reduces the lock contention by having one lock per id.
The global spinlock is removed and a spinlock is added in
kern_ipc_perm structure.
- Uses Read-Copy Update (RCU) in grow_ary() for lock-free resizing.
- In the places where ipc_rmid() is called, delay calling ipc_free()
to RCU callbacks. This is to prevent ipc_lock() returning an invalid
pointer after ipc_rmid(). In addition, use the workqueue to enable
RCU freeing vmalloced entries.
Also some other changes:
- Remove redundant ipc_lockall/ipc_unlockall
- Now ipc_unlock() directly takes the IPC ID pointer as its argument,
avoiding an extra lookup in the array.
The changes are made based on input from Hugh Dickins, Manfred
Spraul and Dipankar Sarma. In addition, Cliff White has run OSDL's
dbt1 test on a 2-way against the earlier version of this patch.
Results show about 2-6% improvement on the average number of
transactions per second. Here is the summary of his tests:
2.5.42-mm2 2.5.42-mm2-ipclock
-----------------------------
Average over 5 runs 85.0 BT 89.8 BT
Std Deviation 5 runs 7.4 BT 1.0 BT
Average over 4 best 88.15 BT 90.2 BT
Std Deviation 4 best 2.8 BT 0.5 BT
Also, another test today from Bill Hartner:
I tested Mingming's RCU ipc lock patch using a *new* microbenchmark - semopbench.
semopbench was written to test the performance of Mingming's patch.
I also ran a 3 hour stress and it completed successfully.
Explanation of the microbenchmark is below the results.
Here is a link to the microbenchmark source.
http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/semopbench.c
SUT: 8-way 700 MHz PIII
I tested 2.5.44-mm2 and 2.5.44-mm2 + RCU ipc patch
>semopbench -g 64 -s 16 -n 16384 -r > sem.results.out
>readprofile -m /boot/System.map | sort -n +0 -r > sem.profile.out
The metric is seconds / per repetition. Lower is better.
kernel run 1 run 2
seconds seconds
================== ======= =======
2.5.44-mm2 515.1 515.4
2.5.44-mm2+rcu-ipc 46.7 46.7
With Mingming's patch, the test completes 10X faster.
|
|
|
|
The patch below adds the base set of LSM hooks for System V IPC to the
2.5.41 kernel. These hooks permit a security module to label
semaphore sets, message queues, and shared memory segments and to
perform security checks on these objects that parallel the existing
IPC access checks. Additional LSM hooks for labeling and controlling
individual messages sent on a single message queue and for providing
fine-grained distinctions among IPC operations will be submitted
separately after this base set of LSM IPC hooks has been accepted.
|
|
As we discussed some time ago, here is a patch for the SEM_UNDO change
that can be applied to linux-2.5.9.
|
|
- me: fix forgotten nfsd usage of filldir off_t -> loff_t change
- Alan Cox: more driver merges
|
|
- sync up more with Alan
- Urban Widmark: smbfs and HIGHMEM fix
- Chris Mason: reiserfs tail unpacking fix ("null bytes in reiserfs files")
- Adam Richter: new cpia usb ID
- Hugh Dickins: misc small sysv ipc fixes
- Andries Brouwer: remove overly restrictive sector size check for
SCSI cd-roms
|
|
- Jens: better ordering of requests when unable to merge
- Neil Brown: make md work as a module again (we cannot autodetect
in modules, not enough background information)
- Neil Brown: raid5 SMP locking cleanups
- Neil Brown: nfsd: handle Irix NFS clients named pipe behavior and
dentry leak fix
- maestro3 shutdown fix
- fix dcache hash calculation that could cause bad hashes under certain
circumstances (Dean Gaudet)
- David Miller: networking and sparc updates
- Jeff Garzik: include file cleanups
- Andy Grover: ACPI update
- Coda-fs error return fixes
- rth: alpha Jensen update
|
|
|