<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/futex.c, branch v2.6.14.2</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.14.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.14.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2005-09-07T23:57:33Z</updated>
<entry>
<title>[PATCH] futex: remove duplicate code</title>
<updated>2005-09-07T23:57:33Z</updated>
<author>
<name>Pekka Enberg</name>
<email>penberg@cs.helsinki.fi</email>
</author>
<published>2005-09-06T22:17:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=39ed3fdeec1290dd246dcf1da6b278566987a084'/>
<id>urn:sha1:39ed3fdeec1290dd246dcf1da6b278566987a084</id>
<content type='text'>
This patch cleans up the error path of futex_fd() by removing duplicate
code.

Signed-off-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup</title>
<updated>2005-09-07T23:57:17Z</updated>
<author>
<name>Jakub Jelinek</name>
<email>jakub@redhat.com</email>
</author>
<published>2005-09-06T22:16:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4732efbeb997189d9f9b04708dc26bf8613ed721'/>
<id>urn:sha1:4732efbeb997189d9f9b04708dc26bf8613ed721</id>
<content type='text'>
ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
(which at least on UP usually means an immediate context switch to one of
the waiter threads).  This waiter wakes up and after a few instructions it
attempts to acquire the cv internal lock, but that lock is still held by
the thread calling pthread_cond_signal.  So it goes to sleep and eventually
the signalling thread is scheduled in, unlocks the internal lock and wakes
the waiter again.

Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
to avoid this performance issue, but it was removed when locks were
redesigned to the 3 state scheme (unlocked, locked uncontended, locked
contended).

The following scenario shows why simply using FUTEX_REQUEUE in
pthread_cond_signal together with using lll_mutex_unlock_force in place of
lll_mutex_unlock is not enough, and probably why it was disabled at
that time:

The number is the value of cv-&gt;__data.__lock.
        thr1            thr2            thr3
0       pthread_cond_wait
1       lll_mutex_lock (cv-&gt;__data.__lock)
0       lll_mutex_unlock (cv-&gt;__data.__lock)
0       lll_futex_wait (&amp;cv-&gt;__data.__futex, futexval)
0                       pthread_cond_signal
1                       lll_mutex_lock (cv-&gt;__data.__lock)
1                                       pthread_cond_signal
2                                       lll_mutex_lock (cv-&gt;__data.__lock)
2                                         lll_futex_wait (&amp;cv-&gt;__data.__lock, 2)
2                       lll_futex_requeue (&amp;cv-&gt;__data.__futex, 0, 1, &amp;cv-&gt;__data.__lock)
                          # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
2                       lll_mutex_unlock_force (cv-&gt;__data.__lock)
0                         cv-&gt;__data.__lock = 0
0                         lll_futex_wake (&amp;cv-&gt;__data.__lock, 1)
1       lll_mutex_lock (cv-&gt;__data.__lock)
0       lll_mutex_unlock (cv-&gt;__data.__lock)
          # Here, lll_mutex_unlock doesn't know there are threads waiting
          # on the internal cv's lock

Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
but it will cost us not one, but 2 extra syscalls and, what's worse, one of
these extra syscalls will be done for every single waiting loop in
pthread_cond_*wait.

We would need to use lll_mutex_unlock_force in pthread_cond_signal after
requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.

Another alternative is to do the unlocking pthread_cond_signal needs to do
(the lock can't be unlocked before lll_futex_wake, as that is racy) in the
kernel.

I have implemented both variants: futex-requeue-glibc.patch is the first
one, and futex-wake_op{,-glibc}.patch is the unlocking inside the kernel.
The kernel interface allows userland to specify exactly what an unlocking
operation should look like (some atomic arithmetic operation with an optional
constant argument, and a comparison of the previous futex value with another
constant).
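
A minimal user-space sketch of that interface, assuming the semantics later
documented for FUTEX_WAKE_OP in futex(2): the second futex word is updated
with the arithmetic operation, waiters on the first word are woken, and
waiters on the second word are woken only if the comparison on the old value
holds.  The names wake_op, OPS and CMPS and the dict-based "memory" are
hypothetical, and the atomicity of the real operation is not modelled.

```python
import operator

# Hypothetical tables: symbolic operation name to Python callable.
OPS = {
    "set": lambda old, arg: arg,
    "add": operator.add,
    "or": operator.or_,
    "andn": lambda old, arg: operator.and_(old, operator.invert(arg)),
    "xor": operator.xor,
}
CMPS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def wake_op(words, queues, addr1, addr2, nr_wake, nr_wake2,
            op, oparg, cmp, cmparg):
    """Model one wake-op call and return the names of the woken waiters."""
    old = words[addr2]
    words[addr2] = OPS[op](old, oparg)          # arithmetic step on word 2
    count = min(nr_wake, len(queues[addr1]))
    woken = [queues[addr1].pop(0) for _ in range(count)]
    if CMPS[cmp](old, cmparg):                  # test the previous value
        count = min(nr_wake2, len(queues[addr2]))
        woken += [queues[addr2].pop(0) for _ in range(count)]
    return woken


# State from the scenario above: thr1 waits on the cv futex and thr3 waits
# on the contended internal lock (value 2).
words = {"cv_futex": 7, "cv_lock": 2}
queues = {"cv_futex": ["thr1"], "cv_lock": ["thr3"]}
woken = wake_op(words, queues, "cv_futex", "cv_lock", 1, 1,
                "set", 0, "gt", 1)
print(woken, words["cv_lock"])
```

In this model a single call releases the lock and wakes both the cv waiter
and the lock waiter, so the lock waiter is not forgotten.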

It has been implemented just for ppc*, x86_64 and i?86; for other
architectures I'm including just a stub header which can be used as a
starting point by maintainers to write support for their arches and which ATM
will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue patch has been
(lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.

With the following benchmark on UP x86-64 I get:

for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2&gt;&amp;1; done; done
time elf/ld.so --library-path .:nptl-orig /tmp/bench
real 0m0.655s user 0m0.253s sys 0m0.403s
real 0m0.657s user 0m0.269s sys 0m0.388s
time elf/ld.so --library-path .:nptl-requeue /tmp/bench
real 0m0.496s user 0m0.225s sys 0m0.271s
real 0m0.531s user 0m0.242s sys 0m0.288s
time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
real 0m0.380s user 0m0.176s sys 0m0.204s
real 0m0.382s user 0m0.175s sys 0m0.207s

The benchmark is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
Older futex-requeue-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
Older futex-wake_op-glibc.patch version is at:
http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
Will post a new version (just x86-64 fixes so that the patch
applies against pthread_cond_signal.S) to libc-hacker ml soon.

Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
testcase that will not test the atomicity of the operation, but at least
check if the threads that should have been woken up are woken up and
whether the arithmetic operation in the kernel gave the expected results.

Acked-by: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Ulrich Drepper &lt;drepper@redhat.com&gt;
Cc: Jamie Lokier &lt;jamie@shareable.org&gt;
Cc: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Yoichi Yuasa &lt;yuasa@hh.iij4u.or.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] convert code that currently tests _NSIG directly to use valid_signal()</title>
<updated>2005-05-01T15:59:14Z</updated>
<author>
<name>Jesper Juhl</name>
<email>juhl-lkml@dif.dk</email>
</author>
<published>2005-05-01T15:59:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7ed20e1ad521b5f5df61bf6559ae60738e393741'/>
<id>urn:sha1:7ed20e1ad521b5f5df61bf6559ae60738e393741</id>
<content type='text'>
Convert most of the current code that uses _NSIG directly to instead use
valid_signal().  This avoids gcc -W warnings and off-by-one errors.

Signed-off-by: Jesper Juhl &lt;juhl-lkml@dif.dk&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Futex: make futex_wait() atomic again</title>
<updated>2005-03-28T12:00:54Z</updated>
<author>
<name>Jakub Jelínek</name>
<email>jakub@redhat.com</email>
</author>
<published>2005-03-28T12:00:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=58aceba09b4f67abd309d199de8f2da375f45e88'/>
<id>urn:sha1:58aceba09b4f67abd309d199de8f2da375f45e88</id>
<content type='text'>
Call get_futex_value_locked in futex_wait with futex hash bucket locked and
only enqueue the futex if futex has the expected value.  Simplify
futex_requeue.
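
A minimal sketch of the check-then-enqueue ordering described above, written
as a user-space toy model rather than kernel code.  Bucket,
futex_wait_enqueue and futex_wake are hypothetical names, and a
threading.Lock stands in for the hash bucket lock.

```python
import threading


class Bucket:
    """Toy stand-in for one futex hash bucket: lock, futex word, waiters."""

    def __init__(self, value=0):
        self.lock = threading.Lock()
        self.value = value
        self.queue = []


def futex_wait_enqueue(bucket, expected, waiter):
    """Queue waiter only if the word still holds the expected value.

    Returns True when queued and False for the EWOULDBLOCK case.
    """
    with bucket.lock:
        if bucket.value != expected:
            return False
        bucket.queue.append(waiter)
        return True


def futex_wake(bucket, nr):
    """Dequeue and return up to nr waiters while holding the bucket lock."""
    with bucket.lock:
        woken = bucket.queue[:nr]
        del bucket.queue[:nr]
        return woken


bucket = Bucket(value=1)
print(futex_wait_enqueue(bucket, 0, "A"))   # stale expected value
print(futex_wait_enqueue(bucket, 1, "B"))   # current value
print(futex_wake(bucket, 1))
```

Because the comparison and the enqueue happen under the same lock, a waiter
with a stale expected value is never visible to a concurrent wake.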

Signed-off-by: Jakub Jelinek &lt;jakub@redhat.com&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] base-small: shrink futex queues</title>
<updated>2005-03-08T02:04:09Z</updated>
<author>
<name>Matt Mackall</name>
<email>mpm@selenic.com</email>
</author>
<published>2005-03-08T02:04:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=731184f54cc774f11e7569877c00102ad5bdc6f8'/>
<id>urn:sha1:731184f54cc774f11e7569877c00102ad5bdc6f8</id>
<content type='text'>
Reduce the size of the futex hash table when CONFIG_BASE_SMALL is set.

Signed-off-by: Matt Mackall &lt;mpm@selenic.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Fix possible futex mmap_sem deadlock</title>
<updated>2005-02-23T05:56:33Z</updated>
<author>
<name>Olof Johansson</name>
<email>olof@austin.ibm.com</email>
</author>
<published>2005-02-23T05:56:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5f23ec8d572816c7ff9c6e5eb0c209c9faa008e'/>
<id>urn:sha1:f5f23ec8d572816c7ff9c6e5eb0c209c9faa008e</id>
<content type='text'>
Some futex functions do get_user() calls while holding mmap_sem for
reading.  If get_user() faults, and another thread happens to be in mmap
(or somewhere else, waiting on down_write() for the same semaphore),
then do_page_fault will deadlock.  Most architectures seem
to be exposed to this.

To avoid it, make sure the page is available.  If not, release the
semaphore, fault it in and retry.

I also found another exposure by inspection; moving some of the code
around avoids the possible deadlock there.

Signed-off-by: Olof Johansson &lt;olof@austin.ibm.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Remove Futex Warning</title>
<updated>2004-11-29T12:24:39Z</updated>
<author>
<name>Rusty Russell</name>
<email>rusty@rustcorp.com.au</email>
</author>
<published>2004-11-29T12:24:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e5f39047bfa57e93d5bf6657823c909dfd5c7564'/>
<id>urn:sha1:e5f39047bfa57e93d5bf6657823c909dfd5c7564</id>
<content type='text'>
If we're waiting on a futex and we are woken up, it's either because
someone did FUTEX_WAKE, we timed out, or have been signalled.  However, the
WARN_ON(!signal_pending(current)) test is overzealous: with threads (a
common use of futexes), we share the signal handler and the other
thread might get to the signal before us.  In addition, exit_notify()
can do a recalc_sigpending_tsk() on us, which will then clear our
TIF_SIGPENDING bit, making signal_pending(current) return false.

Returning EINTR is a little strange in this case, since this thread
hasn't handled a signal.  However, with threads it's the best we can
do: there's always a race where another thread could have been the
actual one to handle the signal.

Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] revert recent futex_wait fix</title>
<updated>2004-11-14T10:57:08Z</updated>
<author>
<name>Jamie Lokier</name>
<email>jamie@shareable.org</email>
</author>
<published>2004-11-14T10:57:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aebeba096859f0e6c9b35e04a2b4b9e9a56736b2'/>
<id>urn:sha1:aebeba096859f0e6c9b35e04a2b4b9e9a56736b2</id>
<content type='text'>
The patch was wrong.  Back it out, and add some commentary explaining why we
need to run queue_me() prior to the get_user().

Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] futex_wait hang fix</title>
<updated>2004-11-11T05:40:33Z</updated>
<author>
<name>Hidetoshi Seto</name>
<email>seto.hidetoshi@jp.fujitsu.com</email>
</author>
<published>2004-11-11T05:40:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4ee9f9196a5539321b7887afc6358b4ff9ec2179'/>
<id>urn:sha1:4ee9f9196a5539321b7887afc6358b4ff9ec2179</id>
<content type='text'>
NPTL has 3 control counters (total/wake/woken),
so NPTL can know:
  "how many threads have entered the wait" (total),
  "how many threads have received a wake signal" (wake),
  and "how many threads have exited the wait" (woken).

An abstraction of pthread_cond_wait and pthread_cond_signal is:

A01 pthread_cond_wait {
A02   timeout = 0;
A03   lock(counters);
A04     total++;
A05     val = get_from(futex);
A06   unlock(counters);
A07
A08   sys_futex(futex, FUTEX_WAIT, val, timeout);
A09
A10   lock(counters);
A11     woken++;
A12   unlock(counters);
A13 }

B01 pthread_cond_signal {
B02   lock(counters);
B03   if(total&gt;wake) { /* if there is waiter */
B04     wake++;
B05     update_val(futex);
B06     sys_futex(futex, FUTEX_WAKE, 1);
B07   }
B08   unlock(counters);
B09 }

What we have to notice is:
    FUTEX_WAKE could be called before FUTEX_WAIT has been called (at A07).
In such a case, FUTEX_WAKE will fail if there is no thread in the waitqueue.

However, since pthread_cond_signal does not only do wake++ but also
update_val(futex), the next FUTEX_WAIT will fail with -EWOULDBLOCK because the
val passed to WAIT is no longer equal to the updated val.  Therefore, as a
result, it seems that the WAKE wakes the WAIT.

===

The bug appears if 2 pairs of wait &amp; wake are called at (nearly) the same time:

   * Assume 4 threads, wait_A, wait_B, wake_X, and wake_Y
   * counters start from [total/wake/woken]=[0/0/0]
   * the val of futex starts from (0); update means increment of the val.
   * there is no thread in the waitqueue on the futex.

[simulation]

wait_A: calls pthread_cond_wait:
    total++, prepare to call FUTEX_WAIT with val=0.
    # status: [1/0/0] (0) queue={}(empty) #

wake_X: calls pthread_cond_signal:
    no one in waitqueue, just wake++ and update futex val.
    # status: [1/1/0] (1) queue={}(empty) #

wait_B: calls pthread_cond_wait:
    total++, prepare to call FUTEX_WAIT with val=1.
    # status: [2/1/0] (1) queue={}(empty) #

wait_A: calls FUTEX_WAIT with val=0:
    after queueing, compare val. 0!=1 ... this should be blocked...
    # status: [2/1/0] (1) queue={A} #

wait_B: calls FUTEX_WAIT with val=1:
    after queueing, compare val. 1==1 ... OK, let's schedule()...
    # status: [2/1/0] (1) queue={A,B} (B=sleeping) #

wake_Y: calls pthread_cond_signal:
    A is in waitqueue ... dequeue A, wake++ and update futex val.
    # status: [2/2/0] (2) queue={B} (B=sleeping) #

wait_A: end of FUTEX_WAIT with val=0:
    try to dequeue but already dequeued, return anyway.
    # status: [2/2/0] (2) queue={B} (B=sleeping) #

wait_A: end of pthread_cond_wait:
    woken++.
    # status: [2/2/1] (2) queue={B} (B=sleeping) #

This is the bug:
   wait_A: wakeup
   wait_B: sleeping
   wake_X: wake A
   wake_Y: wake A again

If a subsequent wake_Z tries to wake B:

wake_Z: calls pthread_cond_signal:
    since total==wake, do nothing.
    # status: [2/2/1] (2) queue={B} (B=sleeping) #

If wait_C comes, B can be woken again, but C...

This bug leaves some threads trapped in the waitqueue indefinitely.
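
The simulation above can be replayed with a small single-threaded model.
This is only a sketch under stated assumptions: CondModel and replay are
hypothetical names, the interleaving is hard-coded in the order listed in
the simulation, and "update" is modelled as an increment of the futex val.

```python
class CondModel:
    """Counters, futex val and waitqueue from the abstraction above."""

    def __init__(self):
        self.total = self.wake = self.woken = 0
        self.val = 0
        self.queue = []

    def wait_prepare(self):              # A03-A06
        self.total += 1
        return self.val

    def wait_enqueue(self, name):        # FUTEX_WAIT queues before comparing
        self.queue.append(name)

    def wait_finish(self, name):         # end of FUTEX_WAIT, then A10-A12
        if name in self.queue:
            self.queue.remove(name)
        self.woken += 1

    def signal(self):                    # B02-B08
        if self.total > self.wake:
            self.wake += 1
            self.val += 1
            if self.queue:
                return self.queue.pop(0)
        return None

    def status(self):
        return (self.total, self.wake, self.woken, self.val, list(self.queue))


def replay():
    m = CondModel()
    val_a = m.wait_prepare()             # wait_A: [1/0/0] (0)
    m.signal()                           # wake_X: [1/1/0] (1), queue empty
    val_b = m.wait_prepare()             # wait_B: [2/1/0] (1)
    m.wait_enqueue("A")                  # A is queued although val_a is stale
    m.wait_enqueue("B")                  # B is queued and sleeps
    taken = m.signal()                   # wake_Y dequeues A instead of B
    m.wait_finish("A")                   # A returns anyway; woken++
    lost = m.signal()                    # wake_Z: total == wake, no wakeup
    return val_a, val_b, taken, lost, m.status()


print(replay())
```

In this model wake_Y consumes its wakeup on A, and wake_Z returns without
waking anyone while B is still queued.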

====

&gt;  - According to man of futex:
&gt;      "If the futex was not equal to the expected value, the operation
&gt;       returns -EWOULDBLOCK."
&gt;    but now, here is no description about the rare case:
&gt;      "returns 0 if the futex was not equal to the expected value, but
&gt;       the process was woken by a FUTEX_WAKE call."
&gt;    this behavior on rare case causes the hang which I found.

So to avoid this problem, my patch closes the window that you described:

 &gt; The patch certainly looks sensible - I can see that without the patch,
 &gt; there is a window in which this process is pointlessly queued up on the
 &gt; futex and that in this window a wakeup attempt might do a bad thing.

=====

In short:
There is an undocumented behavior of futex_wait.  This behavior misleads
NPTL into waking the same thread twice, which results in an application hang.

Signed-off-by: Hidetoshi Seto &lt;seto.hidetoshi@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
<entry>
<title>[PATCH] Lock initializer unifying (Core)</title>
<updated>2004-10-28T01:33:04Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2004-10-28T01:33:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c83ff1d2b7be6651f369b3687f88f3cf9236b4f6'/>
<id>urn:sha1:c83ff1d2b7be6651f369b3687f88f3cf9236b4f6</id>
<content type='text'>
To make spinlock/rwlock initialization consistent all over the kernel,
this patch converts explicit lock-initializers into spin_lock_init() and
rwlock_init() calls.

Currently, spinlocks and rwlocks are initialized in two different ways:

  lock = SPIN_LOCK_UNLOCKED
  spin_lock_init(&amp;lock)

  rwlock = RW_LOCK_UNLOCKED
  rwlock_init(&amp;rwlock)

this patch converts all explicit lock initializations to
spin_lock_init() or rwlock_init(). (Besides consistency this also helps
automatic lock validators and debugging code.)

The conversion was done with a script; it was verified manually and it
was reviewed, compiled and tested as far as possible on x86, ARM, PPC.

There is no runtime overhead or actual code change resulting from this
patch, because spin_lock_init() and rwlock_init() are macros and are
thus equivalent to the explicit initialization method.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@osdl.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@osdl.org&gt;
</content>
</entry>
</feed>
