<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/futex.c, branch v2.6.27.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.27.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v2.6.27.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2008-06-23T11:31:15Z</updated>
<entry>
<title>futexes: fix fault handling in futex_lock_pi</title>
<updated>2008-06-23T11:31:15Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-06-23T09:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1b7558e457ed0de61023cfc913d2c342c7c3d9f2'/>
<id>urn:sha1:1b7558e457ed0de61023cfc913d2c342c7c3d9f2</id>
<content type='text'>
This patch addresses a very sporadic pi-futex related failure in
highly threaded java apps on large SMP systems.

David Holmes reported that the pi_state consistency check in
lookup_pi_state triggered with his test application. This means that
the kernel internal pi_state and the user space futex variable are out
of sync. First we assumed that this is a user space data corruption,
but deeper investigation revieled that the problem happend because the
pi-futex code is not handling a fault in the futex_lock_pi path when
the user space variable needs to be fixed up.

The fault happens when a fork mapped the anon memory which contains
the futex readonly for COW or the page got swapped out exactly between
the unlock of the futex and the return of either the new futex owner
or the task which was the expected owner but failed to acquire the
kernel internal rtmutex. The current futex_lock_pi() code drops out
with an inconsistent in case it faults and returns -EFAULT to user
space. User space has no way to fixup that state.

When we wrote this code we thought that we could not drop the hash
bucket lock at this point to handle the fault.

After analysing the code again it turned out to be wrong because there
are only two tasks involved which might modify the pi_state and the
user space variable:

 - the task which acquired the rtmutex
 - the pending owner of the pi_state which did not get the rtmutex

Both tasks drop into the fixup_pi_state() function before returning to
user space. The first task which acquired the hash bucket lock faults
in the fixup of the user space variable, drops the spinlock and calls
futex_handle_fault() to fault in the page. Now the second task could
acquire the hash bucket lock and tries to fixup the user space
variable as well. It either faults as well or it succeeds because the
first task already faulted the page in.

One caveat is to avoid a double fixup. After returning from the fault
handling we reacquire the hash bucket lock and check whether the
pi_state owner has been modified already.

Reported-by: David Holmes &lt;david.holmes@sun.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: David Holmes &lt;david.holmes@sun.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;

 kernel/futex.c |   93 ++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 73 insertions(+), 20 deletions(-)
</content>
</entry>
<entry>
<title>Removal of FUTEX_FD</title>
<updated>2008-05-05T15:18:45Z</updated>
<author>
<name>Eric Sesterhenn</name>
<email>snakebyte@gmx.de</email>
</author>
<published>2008-01-25T09:40:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82af7aca56c67061420d618cc5a30f0fd4106b80'/>
<id>urn:sha1:82af7aca56c67061420d618cc5a30f0fd4106b80</id>
<content type='text'>
Since FUTEX_FD was scheduled for removal in June 2007 lets remove it.

Google Code search found no users for it and NGPT was abandoned in 2003
according to IBM.  futex.h is left untouched to make sure the id does
not get reassigned.  Since queue_me() has no users left it is commented
out to avoid a warning, i didnt remove it completely since it is part of
the internal api (matching unqueue_me())

Signed-off-by: Eric Sesterhenn &lt;snakebyte@gmx.de&gt;
Signed-off-by: Rusty Russell &lt;rusty@rustcorp.com.au&gt; (removed rest)
Acked-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>add hrtimer specific debugobjects code</title>
<updated>2008-04-30T15:29:53Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-04-30T07:55:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=237fc6e7a35076f584b9d0794a5204fe4bd9b9e5'/>
<id>urn:sha1:237fc6e7a35076f584b9d0794a5204fe4bd9b9e5</id>
<content type='text'>
hrtimers have now dynamic users in the network code.  Put them under
debugobjects surveillance as well.

Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Cc: Randy Dunlap &lt;randy.dunlap@oracle.com&gt;
Cc: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>NULL noise: fs/*, mm/*, kernel/*</title>
<updated>2008-03-30T21:18:41Z</updated>
<author>
<name>Al Viro</name>
<email>viro@ftp.linux.org.uk</email>
</author>
<published>2008-03-29T03:07:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9dce07f1a441b77a15631cf0ed0238e0baa7ed64'/>
<id>urn:sha1:9dce07f1a441b77a15631cf0ed0238e0baa7ed64</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Give futex init a proper name</title>
<updated>2008-03-27T15:02:13Z</updated>
<author>
<name>Benjamin Herrenschmidt</name>
<email>benh@ozlabs.org</email>
</author>
<published>2008-03-27T03:52:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f6d107fb10def502522b10bfb7af9533afbb8274'/>
<id>urn:sha1:f6d107fb10def502522b10bfb7af9533afbb8274</id>
<content type='text'>
The futex init function is called init(). This is a pain in the neck
when debugging when you code dies in ... init :-)

This renames it to futex_init().

Signed-off-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: runtime enable pi and robust functionality</title>
<updated>2008-02-24T01:12:15Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-23T23:23:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a0c1e9073ef7428a14309cba010633a6cd6719ea'/>
<id>urn:sha1:a0c1e9073ef7428a14309cba010633a6cd6719ea</id>
<content type='text'>
Not all architectures implement futex_atomic_cmpxchg_inatomic().  The default
implementation returns -ENOSYS, which is currently not handled inside of the
futex guts.

Futex PI calls and robust list exits with a held futex result in an endless
loop in the futex code on architectures which have no support.

Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
add a fair amount of extra if/else constructs to the already complex code.  It
is also not possible to disable the robust feature before user space tries to
register robust lists.

Compile time disabling is not a good idea either, as there are already
architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.

Detect the functionality at runtime instead by calling
cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
code.  This is guaranteed to fail, but the call of
futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.

On architectures, which use the asm-generic implementation or have a runtime
CPU feature detection, a -ENOSYS return value disables the PI/robust features.

On architectures with a working implementation the call returns -EFAULT and
the PI/robust features are enabled.

The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
when the detection fails.

Fixes http://lkml.org/lkml/2008/2/11/149
Originally reported by: Lennart Buytenhek

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Lennert Buytenhek &lt;buytenh@wantstofly.org&gt;
Cc: Riku Voipio &lt;riku.voipio@movial.fi&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>futex: fix init order</title>
<updated>2008-02-24T01:12:15Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-23T23:23:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e4ab747efa8e78562ec6782b08bbf21a00aba1b'/>
<id>urn:sha1:3e4ab747efa8e78562ec6782b08bbf21a00aba1b</id>
<content type='text'>
When the futex init code fails to initialize the futex pseudo file system it
returns early without initializing the hash queues.  Should the boot succeed
then a futex syscall which tries to enqueue a waiter on the hashqueue will
crash due to the unitilialized plist heads.

Initialize the hash queues before the filesystem.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Lennert Buytenhek &lt;buytenh@wantstofly.org&gt;
Cc: Riku Voipio &lt;riku.voipio@movial.fi&gt;
Cc: &lt;stable@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>hrtimer: check relative timeouts for overflow</title>
<updated>2008-02-14T21:08:30Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-13T08:20:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5a7780e725d1bb4c3094fcc12f1c5c5faea1e988'/>
<id>urn:sha1:5a7780e725d1bb4c3094fcc12f1c5c5faea1e988</id>
<content type='text'>
Various user space callers ask for relative timeouts. While we fixed
that overflow issue in hrtimer_start(), the sites which convert
relative user space values to absolute timeouts themself were uncovered.

Instead of putting overflow checks into each place add a function
which does the sanity checking and convert all affected callers to use
it.

Thanks to Frans Pop, who reported the problem and tested the fixes.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Tested-by: Frans Pop &lt;elendil@planet.nl&gt;

</content>
</entry>
<entry>
<title>futex: Add bitset conditional wait/wakeup functionality</title>
<updated>2008-02-01T16:45:14Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-01T16:45:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cd689985cf49f6ff5c8eddc48d98b9d581d9475d'/>
<id>urn:sha1:cd689985cf49f6ff5c8eddc48d98b9d581d9475d</id>
<content type='text'>
To allow the implementation of optimized rw-locks in user space, glibc
needs a possibility to select waiters for wakeup depending on a bitset
mask.

This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
expecting an absolute timeout value instead of the relative one, which
is used for the FUTEX_WAIT OP.

FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
stored in the futex_q structure, which is used to enqueue the waiter
into the hashed futex waitqueue.

FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
function logically ANDs the bitset with the bitset stored in each
waiters futex_q structure. If the result is zero (i.e. none of the set
bits in the bitsets is matching), then the waiter is not woken up. If
the result is not zero (i.e. one of the set bits in the bitsets is
matching), then the waiter is woken.

The bitset provided by the caller must be non zero. In case the
provided bitset is zero the kernel returns EINVAL.

Internaly the new OPs are only extensions to the existing FUTEX_WAIT
and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
set into the futex_wait() and futex_wake() functions.

Signed-off-by: Thomas Gleixner &lt;tgxl@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>futex: Remove warn on in return fixup path</title>
<updated>2008-02-01T16:45:14Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2008-02-01T16:45:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=83e96c604e781098a2f61b8a294919bcf3abfab4'/>
<id>urn:sha1:83e96c604e781098a2f61b8a294919bcf3abfab4</id>
<content type='text'>
The WARN_ON() in the fixup return path of futex_lock_pi() can
trigger with false positives.

The following scenario happens:
t1 holds the futex and t2 and t3 are blocked on the kernel side rt_mutex.
t1 releases the futex (and the rt_mutex) and assigned t2 to be the next
owner of the futex.

t2 is interrupted and returns w/o acquiring the rt_mutex, before t1 can
release the rtmutex.

t1 releases the rtmutex and t3 becomes the pending owner of the rtmutex.

t2 notices that it is the designated owner (user space variable) and
fails to acquire the rt_mutex via trylock, because it is not allowed to
steal the rt_mutex from t3. Now it looks at the rt_mutex pending owner (t3)
and assigns the futex and the pi_state to it.

During the fixup t4 steals the rtmutex from t3.

t2 returns from the fixup and the owner of the rt_mutex has changed from
t3 to t4.

There is no need to do another round of fixups from t2. The important
part (t2 is not returning as the user space visible owner) is
done. The further fixups are done, before either t3 or t4 return to
user space.

For the user space it is not relevant which task (t3 or t4) is the real
owner, as long as those are both in the kernel, which is guaranteed by
the serialization of the hash bucket lock. Both tasks (which ever returns
first to userspace - t4 because it locked the rt_mutex or t3 due to a signal)
are going through the lock_futex_pi() return path where the ownership is
fixed before the return to user space.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;

Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
