user/sven/linux.git/kernel/futex.c, branch v2.6.14.2

[PATCH] futex: remove duplicate code

2005-09-07T23:57:33Z

This patch cleans up the error path of futex_fd() by removing duplicate code.

Signed-off-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

[PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup

2005-09-07T23:57:17Z

At the moment, pthread_cond_signal is unnecessarily slow, because it wakes one waiter (which, at least on UP, usually means an immediate context switch to one of the waiter threads). This waiter wakes up and, after a few instructions, attempts to acquire the cv-internal lock, but that lock is still held by the thread calling pthread_cond_signal. So the waiter goes back to sleep. Eventually the signalling thread is scheduled in, unlocks the internal lock and wakes the waiter again.

Before 2003-09-21, NPTL used FUTEX_REQUEUE in pthread_cond_signal to avoid this performance issue. It was removed when locks were redesigned to the three-state scheme (unlocked, locked uncontended, locked contended).

The following scenario shows why simply using FUTEX_REQUEUE in pthread_cond_signal, together with lll_mutex_unlock_force in place of lll_mutex_unlock, is not enough. It is probably also why FUTEX_REQUEUE was disabled at that time. The first column is the value of cv->__data.__lock after each step:

    lock  thread  call
    0     thr1    pthread_cond_wait
    1     thr1      lll_mutex_lock (cv->__data.__lock)
    0     thr1      lll_mutex_unlock (cv->__data.__lock)
    0     thr1      lll_futex_wait (&cv->__data.__futex, futexval)
    0     thr2    pthread_cond_signal
    1     thr2      lll_mutex_lock (cv->__data.__lock)
    1     thr3    pthread_cond_signal
    2     thr3      lll_mutex_lock (cv->__data.__lock)
    2     thr3        lll_futex_wait (&cv->__data.__lock, 2)
    2     thr2      lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                      # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
    2     thr2      lll_mutex_unlock_force (cv->__data.__lock)
    0     thr2        cv->__data.__lock = 0
    0     thr2        lll_futex_wake (&cv->__data.__lock, 1)
    1     thr1      lll_mutex_lock (cv->__data.__lock)
    0     thr1      lll_mutex_unlock (cv->__data.__lock)
                      # Here, lll_mutex_unlock doesn't know there are threads
                      # waiting on the internal cv's lock

Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal, but it will cost us not one but two extra syscalls. Worse, one of these extra syscalls will be made on every single waiting loop in pthread_cond_*wait.
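A single-threaded C11 model of the scenario above may make the lost wakeup easier to see. It assumes the three-state lock semantics described in the text (0 unlocked, 1 locked uncontended, 2 locked contended). It only counts sleeping threads and never runs them. struct lll_model and its helper functions are hypothetical names, not NPTL's lowlevellock code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single-threaded model of cv->__data.__lock.
   Values: 0 = unlocked, 1 = locked uncontended, 2 = locked contended.
   Blocked threads are only counted; nothing actually sleeps. */
struct lll_model {
    atomic_int word;     /* the lock word */
    int lock_sleepers;   /* threads in futex_wait on the lock word */
    int cv_sleepers;     /* threads in futex_wait on cv->__data.__futex */
};

/* lll_mutex_lock fast path: 0 -> 1; returns nonzero on success. */
static int lock_fast(struct lll_model *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&m->word, &expected, 1);
}

/* lll_mutex_lock slow path: mark the word contended (2) and sleep
   unless it turned out to be free. Returns nonzero if acquired. */
static int lock_slow(struct lll_model *m)
{
    if (atomic_exchange(&m->word, 2) == 0)
        return 1;
    m->lock_sleepers++;          /* lll_futex_wait (&lock, 2) */
    return 0;
}

/* lll_mutex_unlock: wakes one sleeper only if the old value was 2. */
static void unlock(struct lll_model *m)
{
    if (atomic_exchange(&m->word, 0) == 2 && m->lock_sleepers > 0)
        m->lock_sleepers--;      /* lll_futex_wake (&lock, 1) */
}

/* lll_mutex_unlock_force: stores 0 and always wakes one sleeper. */
static void unlock_force(struct lll_model *m)
{
    atomic_store(&m->word, 0);
    if (m->lock_sleepers > 0)
        m->lock_sleepers--;
}

/* FUTEX_REQUEUE of one cv waiter onto the lock word (no value check). */
static void requeue_one(struct lll_model *m)
{
    if (m->cv_sleepers > 0) {
        m->cv_sleepers--;
        m->lock_sleepers++;
    }
}

/* Replays the thr1/thr2/thr3 table. Returns how many threads are still
   asleep on the lock word at the end; nonzero means a lost wakeup. */
static int replay_scenario(struct lll_model *m)
{
    atomic_init(&m->word, 0);
    m->lock_sleepers = 0;
    m->cv_sleepers = 0;

    lock_fast(m);                /* thr1: 0 -> 1 */
    unlock(m);                   /* thr1: 1 -> 0 */
    m->cv_sleepers++;            /* thr1: lll_futex_wait on __futex */
    lock_fast(m);                /* thr2: 0 -> 1 */
    if (!lock_fast(m))           /* thr3: fast path fails ... */
        lock_slow(m);            /* ... 1 -> 2, thr3 sleeps */
    requeue_one(m);              /* thr2: thr1 now waits on the lock */
    unlock_force(m);             /* thr2: 2 -> 0, wakes one (thr1) */
    lock_fast(m);                /* thr1: 0 -> 1 */
    unlock(m);                   /* thr1: 1 -> 0, old value 1: no wake */
    return m->lock_sleepers;
}
```

After replay_scenario() returns, the lock word is 0 but one thread (thr3 in the table) is still counted as asleep on it. The final unlock saw the value 1, so it had no reason to issue a wake.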
We would need to use lll_mutex_unlock_force in pthread_cond_signal after the requeue, and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.

Another alternative is to do in the kernel the unlocking that pthread_cond_signal needs. The lock can't be unlocked before lll_futex_wake, as that is racy.

I have implemented both variants: futex-requeue-glibc.patch is the first one, and futex-wake_op{,-glibc}.patch does the unlocking inside the kernel. The kernel interface allows userland to specify exactly what the unlocking operation should look like. That is an atomic arithmetic operation with an optional constant argument, plus a comparison of the previous futex value with another constant.

It has been implemented only for ppc*, x86_64 and i?86. For other architectures I'm including just a stub header, which maintainers can use as a starting point to add support for their arches. For now the stub simply returns -ENOSYS for FUTEX_WAKE_OP.

The requeue patch has been (lightly) tested only on x86_64. The wake_op patch has been tested on a ppc64 kernel running 32-bit and 64-bit NPTL and on an x86_64 kernel running 32-bit and 64-bit NPTL.
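The kernel interface described above packs the operation and the comparison into one integer. The sketch below mirrors the FUTEX_OP() layout and constant values from <linux/futex.h>: a 4-bit op, a 4-bit cmp, two 12-bit signed arguments, and the FUTEX_OP_OPARG_SHIFT flag. futex_op_encode() and futex_wake_op_apply() are hypothetical helpers. They model, in plain non-atomic C, what the kernel does to the second futex word. The kernel then wakes waiters on the first futex unconditionally, and waiters on the second only when the comparison holds.

```c
#include <assert.h>
#include <stdint.h>

/* Operation and comparison codes, same values as in <linux/futex.h>. */
enum { FUTEX_OP_SET = 0, FUTEX_OP_ADD = 1, FUTEX_OP_OR = 2,
       FUTEX_OP_ANDN = 3, FUTEX_OP_XOR = 4, FUTEX_OP_OPARG_SHIFT = 8 };
enum { FUTEX_OP_CMP_EQ = 0, FUTEX_OP_CMP_NE = 1, FUTEX_OP_CMP_LT = 2,
       FUTEX_OP_CMP_LE = 3, FUTEX_OP_CMP_GT = 4, FUTEX_OP_CMP_GE = 5 };

/* Packs op (4 bits), cmp (4 bits), oparg (12 bits) and cmparg (12 bits)
   the same way as the FUTEX_OP() macro. */
static uint32_t futex_op_encode(unsigned op, unsigned oparg,
                                unsigned cmp, unsigned cmparg)
{
    return ((op & 0xfu) << 28) | ((cmp & 0xfu) << 24) |
           ((oparg & 0xfffu) << 12) | (cmparg & 0xfffu);
}

/* Sign-extends the low 12 bits of v. */
static int sext12(uint32_t v)
{
    v &= 0xfffu;
    return (v & 0x800u) ? (int)v - 0x1000 : (int)v;
}

/* Hypothetical, non-atomic model of the kernel side: applies the
   operation to *uaddr2, then compares the OLD value with cmparg.
   Returns 1 if waiters on uaddr2 should be woken too, 0 if not, and -1
   for an unsupported encoding (the kernel answers -ENOSYS). */
static int futex_wake_op_apply(int *uaddr2, uint32_t encoded)
{
    unsigned op = (encoded >> 28) & 7u;
    unsigned cmp = (encoded >> 24) & 15u;
    int oparg = sext12(encoded >> 12);
    int cmparg = sext12(encoded);
    int oldval = *uaddr2;

    if (encoded & ((uint32_t)FUTEX_OP_OPARG_SHIFT << 28)) {
        if (oparg < 0 || oparg > 30)
            return -1;           /* keep the shift well defined */
        oparg = 1 << oparg;
    }

    switch (op) {
    case FUTEX_OP_SET:  *uaddr2 = oparg; break;
    case FUTEX_OP_ADD:  *uaddr2 = (int)((unsigned)oldval + (unsigned)oparg); break;
    case FUTEX_OP_OR:   *uaddr2 = oldval | oparg; break;
    case FUTEX_OP_ANDN: *uaddr2 = oldval & ~oparg; break;
    case FUTEX_OP_XOR:  *uaddr2 = oldval ^ oparg; break;
    default:            return -1;
    }

    switch (cmp) {
    case FUTEX_OP_CMP_EQ: return oldval == cmparg;
    case FUTEX_OP_CMP_NE: return oldval != cmparg;
    case FUTEX_OP_CMP_LT: return oldval < cmparg;
    case FUTEX_OP_CMP_LE: return oldval <= cmparg;
    case FUTEX_OP_CMP_GT: return oldval > cmparg;
    case FUTEX_OP_CMP_GE: return oldval >= cmparg;
    default:              return -1;
    }
}
```

Consider the pthread_cond_signal case, with the cv's internal lock as the second futex. The operation "set to 0, then compare the old value GT 1" releases the lock and wakes a lock waiter only when the lock was contended (old value 2). Encoded, it is futex_op_encode(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1) == 0x04000001.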
With the following benchmark on UP x86-64 I get:

    for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
      for j in 1 2; do ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done

    time elf/ld.so --library-path .:nptl-orig /tmp/bench
    real 0m0.655s  user 0m0.253s  sys 0m0.403s
    real 0m0.657s  user 0m0.269s  sys 0m0.388s
    time elf/ld.so --library-path .:nptl-requeue /tmp/bench
    real 0m0.496s  user 0m0.225s  sys 0m0.271s
    real 0m0.531s  user 0m0.242s  sys 0m0.288s
    time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
    real 0m0.380s  user 0m0.176s  sys 0m0.204s
    real 0m0.382s  user 0m0.175s  sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
The older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
The older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt

I will post a new version to the libc-hacker ml soon. It contains just x86-64 fixes so that the patch applies against pthread_cond_signal.S.

Attached is the kernel FUTEX_WAKE_OP patch, as well as a simple-minded testcase. The testcase does not test the atomicity of the operation. It does check whether the threads that should have been woken up are woken up, and whether the arithmetic operation in the kernel gave the expected results.

Acked-by: Ingo Molnar
Cc: Ulrich Drepper
Cc: Jamie Lokier
Cc: Rusty Russell
Signed-off-by: Yoichi Yuasa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

[PATCH] convert code that currently tests _NSIG directly to use valid_signal()

2005-05-01T15:59:14Z

Convert most of the current code that uses _NSIG directly to use valid_signal() instead. This avoids gcc -W warnings and off-by-one errors.

Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
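Below is a user-space restatement of the valid_signal() check. It assumes 64 signals as on x86. MODEL_NSIG is a hypothetical stand-in for _NSIG, and buggy_check() is an illustrative hand-written test, not code from the patch. Because the parameter is an unsigned long, a single upper-bound comparison also rejects negative values. This removes the need for a `sig < 0` test, which gcc -W flags as always false on an unsigned type.

```c
#include <assert.h>

/* Assumption: 64 signals, numbered 1..64, as on x86. */
#define MODEL_NSIG 64

/* Signal numbers run from 1 to _NSIG, and 0 also passes.
   A negative int argument wraps to a huge unsigned long and is rejected
   by the same comparison. */
static inline int valid_signal(unsigned long sig)
{
    return sig <= MODEL_NSIG ? 1 : 0;
}

/* Hypothetical hand-written test showing the off-by-one the helper
   avoids: it wrongly rejects the highest signal, MODEL_NSIG itself. */
static inline int buggy_check(int sig)
{
    return sig >= 0 && sig < MODEL_NSIG;
}
```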

[PATCH] Futex: make futex_wait() atomic again

2005-03-28T12:00:54Z

Call get_futex_value_locked() in futex_wait() with the futex hash bucket locked, and only enqueue the futex if it still has the expected value. Simplify futex_requeue.

Signed-off-by: Jakub Jelinek
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
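Here is a user-space sketch of the ordering this patch restores. It assumes a C11 atomic_flag spinlock as a stand-in for the hash-bucket lock. The names struct bucket, wait_prepare() and wake() are hypothetical; this is not the kernel's get_futex_value_locked()/queue_me() code. The value check and the enqueue happen in one critical section. A waker that takes the same lock therefore either finds the waiter already queued, or ran before it, in which case the waiter sees the changed value.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical model of one futex hash bucket. */
struct bucket {
    atomic_flag lock;   /* stands in for the bucket spinlock */
    int nr_queued;      /* waiters currently queued */
};

static void bucket_lock(struct bucket *hb)
{
    while (atomic_flag_test_and_set_explicit(&hb->lock, memory_order_acquire))
        ;               /* spin */
}

static void bucket_unlock(struct bucket *hb)
{
    atomic_flag_clear_explicit(&hb->lock, memory_order_release);
}

/* Waiter side: read the futex word with the bucket locked and enqueue
   only if it still holds val. A real futex_wait would then sleep until
   dequeued; this model stops after queueing. */
static int wait_prepare(struct bucket *hb, atomic_int *uaddr, int val)
{
    int ret = 0;

    bucket_lock(hb);
    if (atomic_load(uaddr) != val)
        ret = -EWOULDBLOCK;     /* never queued, so nothing to undo */
    else
        hb->nr_queued++;
    bucket_unlock(hb);
    return ret;
}

/* Waker side: userspace has already changed *uaddr; dequeue up to nr
   waiters under the same lock and return how many were woken. */
static int wake(struct bucket *hb, int nr)
{
    int woken;

    bucket_lock(hb);
    woken = hb->nr_queued < nr ? hb->nr_queued : nr;
    hb->nr_queued -= woken;
    bucket_unlock(hb);
    return woken;
}
```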

[PATCH] base-small: shrink futex queues

2005-03-08T02:04:09Z

Reduce the size of the futex hash table when CONFIG_BASE_SMALL is set.

Signed-off-by: Matt Mackall
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
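The sketch below assumes the change has the shape `FUTEX_HASHBITS = CONFIG_BASE_SMALL ? 4 : 8`, which gives 16 buckets instead of 256. futex_bucket() is a hypothetical multiplicative hash, not the kernel's hash function. It only shows how the number of hash bits bounds the bucket index. The trade-off is memory against contention: with fewer buckets, more unrelated futexes share a bucket lock and chain.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CONFIG_BASE_SMALL is 0 or 1; default it to 1 here. */
#ifndef CONFIG_BASE_SMALL
#define CONFIG_BASE_SMALL 1
#endif
#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
#define FUTEX_HASHSIZE (1u << FUTEX_HASHBITS)

/* Hypothetical bucket index: keep the top FUTEX_HASHBITS bits of a
   64-bit multiplicative hash of the futex key. */
static unsigned futex_bucket(uintptr_t key)
{
    uint64_t h = (uint64_t)key * 0x9E3779B97F4A7C15ull;
    return (unsigned)(h >> (64 - FUTEX_HASHBITS));
}
```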

[PATCH] Fix possible futex mmap_sem deadlock

2005-02-23T05:56:33Z

Some futex functions make get_user calls while holding mmap_sem for reading. Suppose get_user() faults while another thread happens to be in mmap, or somewhere else waiting on down_write for the same semaphore. Then do_page_fault will deadlock: its down_read queues behind the waiting writer, and the writer is waiting for our read lock to be released. Most architectures seem to be exposed to this.

To avoid it, make sure the page is available. If it is not, release the semaphore, fault the page in and retry.

I also found another exposure by inspection; moving some of the code around avoids the possible deadlock there.

Signed-off-by: Olof Johansson
Signed-off-by: Linus Torvalds
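The following is a user-space model of the retry pattern described above. struct mm_model, struct page_model and the model_* helpers are hypothetical, and a plain reader count stands in for mmap_sem. The assertion in model_fault_in() encodes the invariant the patch is after: a page fault is only ever taken with the semaphore released.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical models: a reader count stands in for mmap_sem; a page
   may be present, mapped but not present, or not mapped at all. */
struct mm_model { int readers; };
struct page_model { bool present; bool mapped; int value; };

static void model_down_read(struct mm_model *mm) { mm->readers++; }
static void model_up_read(struct mm_model *mm) { mm->readers--; }

/* A read that must not fault: it fails instead of faulting. */
static int model_get_user_nofault(const struct page_model *pg, int *val)
{
    if (!pg->present)
        return -EFAULT;
    *val = pg->value;
    return 0;
}

/* Page fault handling takes mmap_sem itself, so it must never run while
   this thread already holds it. */
static int model_fault_in(struct mm_model *mm, struct page_model *pg)
{
    assert(mm->readers == 0);
    if (!pg->mapped)
        return -EFAULT;
    pg->present = true;
    return 0;
}

/* Read the futex word with mmap_sem held; on a would-be fault, drop the
   semaphore, fault the page in, and start over. */
static int read_futex_value(struct mm_model *mm, struct page_model *pg,
                            int *val, int *retries)
{
    for (;;) {
        model_down_read(mm);
        if (model_get_user_nofault(pg, val) == 0) {
            /* ... the rest of the futex operation runs here ... */
            model_up_read(mm);
            return 0;
        }
        model_up_read(mm);
        if (model_fault_in(mm, pg) != 0)
            return -EFAULT;
        (*retries)++;
    }
}
```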

[PATCH] Remove Futex Warning

2004-11-29T12:24:39Z

If we're waiting on a futex and we are woken up, it's either because someone did a FUTEX_WAKE, we timed out, or we have been signalled. However, the WARN_ON(!signal_pending(current)) test is overzealous.

With threads (a common use of futexes), we share the signal handler, and another thread might get to the signal before us. In addition, exit_notify() can do a recalc_sigpending_tsk() on us. That clears our TIF_SIGPENDING bit, making signal_pending(current) return false.

Returning EINTR is a little strange in this case, since this thread hasn't handled a signal. However, with threads it's the best we can do: there's always a race where another thread could have been the one to actually handle the signal.

Signed-off-by: Rusty Russell
Signed-off-by: Linus Torvalds
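Here is a small sketch of how a woken waiter picks its return value once the warning is gone. futex_wait_result() and its parameters are hypothetical. dequeued_by_waker models the unqueue step finding that a FUTEX_WAKE already removed the waiter, and remaining models the timeout left after sleeping (0 means expired).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of futex_wait()'s exit path. */
static int futex_wait_result(bool dequeued_by_waker, long remaining)
{
    if (dequeued_by_waker)
        return 0;               /* a FUTEX_WAKE got us */
    if (remaining == 0)
        return -ETIMEDOUT;      /* the timeout expired */
    /* No WARN_ON(!signal_pending(current)) any more: a sibling thread
       may already have taken the shared signal, or exit_notify() may
       have cleared TIF_SIGPENDING, so -EINTR is returned regardless. */
    return -EINTR;
}
```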

[PATCH] revert recent futex_wait fix

2004-11-14T10:57:08Z

The patch was wrong. Back it out, and add some commentary explaining why we need to run queue_me() prior to the get_user().

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

[PATCH] futex_wait hang fix

2004-11-11T05:40:33Z

NPTL has three control counters (total/wake/woken), so NPTL can know:
"how many threads have entered the wait" (total),
"how many threads have received a wake signal" (wake), and
"how many threads have exited the wait" (woken).

The abstractions of pthread_cond_wait and pthread_cond_signal are:

    A01 pthread_cond_wait {
    A02   timeout = 0;
    A03   lock(counters);
    A04   total++;
    A05   val = get_from(futex);
    A06   unlock(counters);
    A07
    A08   sys_futex(futex, FUTEX_WAIT, val, timeout);
    A09
    A10   lock(counters);
    A11   woken++;
    A12   unlock(counters);
    A13 }

    B01 pthread_cond_signal {
    B02   lock(counters);
    B03   if(total>wake) { /* if there is waiter */
    B04     wake++;
    B05     update_val(futex);
    B06     sys_futex(futex, FUTEX_WAKE, 1);
    B07   }
    B08   unlock(counters);
    B09 }

Note that FUTEX_WAKE could be called before FUTEX_WAIT has been called (at A07). In that case, FUTEX_WAKE will fail if there is no thread in the waitqueue. However, pthread_cond_signal does update_val(futex) as well as wake++. The next FUTEX_WAIT will therefore fail with -EWOULDBLOCK, because the val passed to WAIT no longer equals the updated val. As a result, it appears as if the WAKE woke the WAIT.

===

The bug appears if two pairs of wait & wake are called at (nearly) the same time:

* Assume 4 threads: wait_A, wait_B, wake_X, and wake_Y.
* The counters start from [total/wake/woken]=[0/0/0].
* The val of the futex starts from (0); "update" means an increment of the val.
* There is no thread in the waitqueue on the futex.

[simulation]

wait_A: calls pthread_cond_wait:
  total++, prepares to call FUTEX_WAIT with val=0.
  # status: [1/0/0] (0) queue={}(empty)
wake_X: calls pthread_cond_signal:
  no one in the waitqueue, just wake++ and update the futex val.
  # status: [1/1/0] (1) queue={}(empty)
wait_B: calls pthread_cond_wait:
  total++, prepares to call FUTEX_WAIT with val=1.
  # status: [2/1/0] (1) queue={}(empty)
wait_A: calls FUTEX_WAIT with val=0:
  after queueing, compares val. 0!=1 ... this should be blocked...
  # status: [2/1/0] (1) queue={A}
wait_B: calls FUTEX_WAIT with val=1:
  after queueing, compares val. 1==1 ... OK, let's schedule()...
  # status: [2/1/0] (1) queue={A,B} (B=sleeping)
wake_Y: calls pthread_cond_signal:
  A is in the waitqueue ... dequeue A, wake++ and update the futex val.
  # status: [2/2/0] (2) queue={B} (B=sleeping)
wait_A: end of FUTEX_WAIT with val=0:
  tries to dequeue but is already dequeued, returns anyway.
  # status: [2/2/0] (2) queue={B} (B=sleeping)
wait_A: end of pthread_cond_wait:
  woken++.
  # status: [2/2/1] (2) queue={B} (B=sleeping)

This is the bug:
  wait_A: woken up
  wait_B: sleeping
  wake_X: woke A
  wake_Y: woke A again

If a subsequent wake_Z tries to wake B:

wake_Z: calls pthread_cond_signal:
  since total==wake, does nothing.
  # status: [2/2/1] (2) queue={B} (B=sleeping)

If wait_C comes, B can then be woken, but C...

This bug causes the waitqueue to trap some threads in it forever.

====

> - According to man of futex:
>   "If the futex was not equal to the expected value, the operation
>   returns -EWOULDBLOCK."
>   but now, here is no description about the rare case:
>   "returns 0 if the futex was not equal to the expected value, but
>   the process was woken by a FUTEX_WAKE call."
>   this behavior on rare case causes the hang which I found.

So, to avoid this problem, my patch closes the window that you described:

> The patch certainly looks sensible - I can see that without the patch,
> there is a window in which this process is pointlessly queued up on the
> futex and that in this window a wakeup attempt might do a bad thing.

=====

In short: futex_wait has an undocumented behavior. This behavior misleads NPTL into waking a thread twice, and as a result causes an application hang.

Signed-off-by: Hidetoshi Seto
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
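The trace above can be replayed deterministically, with each function standing for one step rather than a real thread. struct fx, signal_step(), old_wait_queue(), old_wait_check() and new_wait() are hypothetical names. The old variant queues before comparing, and on a mismatch reports 0 if a waker has already dequeued it. The fixed variant compares first and never queues a waiter whose value no longer matches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAXQ 4

/* Hypothetical deterministic model: one futex word and its FIFO wait
   queue. Each call is one step of the trace; no real threads run. */
struct fx {
    int val;        /* the futex word */
    int q[MAXQ];    /* ids of queued waiters, oldest first */
    int n;          /* number of queued waiters */
};

static void enqueue(struct fx *f, int id)
{
    assert(f->n < MAXQ);
    f->q[f->n++] = id;
}

/* Removes id from the queue; returns true if it was still queued. */
static bool unqueue(struct fx *f, int id)
{
    for (int i = 0; i < f->n; i++) {
        if (f->q[i] == id) {
            for (int j = i + 1; j < f->n; j++)
                f->q[j - 1] = f->q[j];
            f->n--;
            return true;
        }
    }
    return false;
}

/* B05-B06: update_val(futex), then FUTEX_WAKE of the oldest waiter. */
static void signal_step(struct fx *f)
{
    f->val++;
    if (f->n > 0)
        unqueue(f, f->q[0]);
}

/* Old order, step 1: queue first. */
static void old_wait_queue(struct fx *f, int id) { enqueue(f, id); }

/* Old order, step 2: compare; on a mismatch unqueue, but if a waker has
   already dequeued us, report 0 although the value did not match.
   Returns 1 where the real code would schedule() and sleep. */
static int old_wait_check(struct fx *f, int id, int val)
{
    if (f->val == val)
        return 1;
    return unqueue(f, id) ? -EWOULDBLOCK : 0;
}

/* Fixed order: compare first, and queue only if the value matches. */
static int new_wait(struct fx *f, int id, int val)
{
    if (f->val != val)
        return -EWOULDBLOCK;
    enqueue(f, id);
    return 1;
}

/* Replays wait_A/wake_X/wait_B/wake_Y; returns how many waiters are
   still queued after both signals (1 means B is stranded). */
static int replay(bool fixed)
{
    struct fx f = { .val = 0, .n = 0 };
    enum { A = 1, B = 2 };
    int a_val, b_val;

    a_val = f.val;                    /* wait_A: val = 0 */
    signal_step(&f);                  /* wake_X: queue empty, val = 1 */
    b_val = f.val;                    /* wait_B: val = 1 */

    if (fixed) {
        new_wait(&f, A, a_val);       /* 0 != 1: never queued */
        new_wait(&f, B, b_val);       /* 1 == 1: B queued */
        signal_step(&f);              /* wake_Y: wakes B */
    } else {
        old_wait_queue(&f, A);        /* A queued, compare pending */
        old_wait_queue(&f, B);
        old_wait_check(&f, B, b_val); /* 1 == 1: B sleeps */
        signal_step(&f);              /* wake_Y: dequeues A, not B */
        old_wait_check(&f, A, a_val); /* mismatch, already gone: 0 */
    }
    return f.n;
}
```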

[PATCH] Lock initializer unifying (Core)

2004-10-28T01:33:04Z

To make spinlock/rwlock initialization consistent all over the kernel, this patch converts explicit lock initializers into spin_lock_init() and rwlock_init() calls.

Currently, spinlocks and rwlocks are initialized in two different ways:

    lock = SPIN_LOCK_UNLOCKED
    spin_lock_init(&lock)

    rwlock = RW_LOCK_UNLOCKED
    rwlock_init(&rwlock)

This patch converts all explicit lock initializations to spin_lock_init() or rwlock_init(). Besides consistency, this also helps automatic lock validators and debugging code.

The conversion was done with a script. It was verified manually, and it was reviewed, compiled and tested as far as possible on x86, ARM and PPC.

This patch adds no runtime overhead and causes no actual code change. spin_lock_init() and rwlock_init() are macros and are thus equivalent to the explicit initialization method.

Signed-off-by: Thomas Gleixner
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds