<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/futex.c, branch v5.15.73</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.73</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.73'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2021-09-03T21:00:22Z</updated>
<entry>
<title>futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic()</title>
<updated>2021-09-03T21:00:22Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-09-03T20:47:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d66e3edee7af87fe212df611ab9846b987a5070f'/>
<id>urn:sha1:d66e3edee7af87fe212df611ab9846b987a5070f</id>
<content type='text'>
The recent bug fix left the variable 'vpid' and an assignment to it around,
but the variable is otherwise unused.

clang does not complain even with W=1, but gcc exposed this.

Fixes: 4f07ec0d76f2 ("futex: Prevent inconsistent state and exit race")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
</entry>
<entry>
<title>futex: Avoid redundant task lookup</title>
<updated>2021-09-02T20:07:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-09-02T09:48:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=340576590dac4bb58d532a8ad5bfa806d8ab473c'/>
<id>urn:sha1:340576590dac4bb58d532a8ad5bfa806d8ab473c</id>
<content type='text'>
No need to do the full VPID based task lookup and validation of the top
waiter when the user space futex was acquired on its behalf during the
requeue_pi operation. The task is known already and it cannot go away
before requeue_pi_wake_futex() has been invoked.

Split out the actual attach code from attach_pi_state_owner() and use that
instead of the full blown variant.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210902094414.676104881@linutronix.de


</content>
</entry>
<entry>
<title>futex: Clarify comment for requeue_pi_wake_futex()</title>
<updated>2021-09-02T20:07:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-09-02T09:48:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=249955e51c8136189b3c66f54e212981a1350a0f'/>
<id>urn:sha1:249955e51c8136189b3c66f54e212981a1350a0f</id>
<content type='text'>
The existing comment is slightly confusing.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210902094414.618613025@linutronix.de


</content>
</entry>
<entry>
<title>futex: Prevent inconsistent state and exit race</title>
<updated>2021-09-02T20:07:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-09-02T09:48:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4f07ec0d76f242d4ca0f0c0c6f7293c28254a554'/>
<id>urn:sha1:4f07ec0d76f242d4ca0f0c0c6f7293c28254a554</id>
<content type='text'>
The recent rework of the requeue PI code introduced a possibility for
going back to user space in an inconsistent state:

CPU 0				CPU 1

requeue_futex()
  if (lock_pifutex_user()) {
      dequeue_waiter();
      wake_waiter(task);
				sched_in(task);
     				return_from_futex_syscall();

  ---&gt; Inconsistent state because PI state is not established

It becomes worse if the woken up task immediately exits:

				sys_exit();
				
      attach_pistate(vpid);	&lt;--- FAIL


Attach the pi state before dequeuing and waking the waiter. If the waiter
gets a spurious wakeup before the dequeue operation it will wait in
futex_requeue_pi_wakeup_sync() and therefore cannot return and exit.
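
Illustration only, not part of the patch: a minimal Python sketch of the
ordering described above. The Waiter class, its fields and the function
names are hypothetical, and a threading.Event stands in for the wakeup; the
dequeue step is not modelled. Because the PI state is attached before the
wakeup is issued, the woken waiter always observes it.

```python
import threading


class Waiter:
    """Hypothetical stand-in for a task blocked in futex_wait_requeue_pi()."""

    def __init__(self):
        self.pi_state_owner = None      # set once the PI state is attached
        self.woken = threading.Event()  # models the wakeup
        self.saw_pi_state = None        # what the waiter observed on return

    def run(self):
        self.woken.wait()
        # On return to "user space" the PI state must already be in place.
        self.saw_pi_state = self.pi_state_owner is not None


def requeue_attach_then_wake(waiter, owner):
    waiter.pi_state_owner = owner  # 1. attach the PI state
    waiter.woken.set()             # 2. only then wake the waiter


def demo():
    waiter = Waiter()
    thread = threading.Thread(target=waiter.run)
    thread.start()
    requeue_attach_then_wake(waiter, owner="task")
    thread.join()
    return waiter.saw_pi_state
```

Swapping the two statements in requeue_attach_then_wake() would reintroduce
the window in which the waiter can run before the PI state exists.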

Fixes: 07d91ef510fb ("futex: Prevent requeue_pi() lock nesting issue on RT")
Reported-by: syzbot+4d1bd0725ef09168e1a0@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210902094414.558914045@linutronix.de


</content>
</entry>
<entry>
<title>futex: Return error code instead of assigning it without effect</title>
<updated>2021-09-02T20:07:18Z</updated>
<author>
<name>Colin Ian King</name>
<email>colin.king@canonical.com</email>
</author>
<published>2021-08-18T13:18:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a974b54036f79dd5e395e9f6c80c3decb4661a14'/>
<id>urn:sha1:a974b54036f79dd5e395e9f6c80c3decb4661a14</id>
<content type='text'>
The check on the rt_waiter and top_waiter-&gt;pi_state is assigning an error
return code to ret but this later gets re-assigned, hence the check is
ineffective.

Return -EINVAL rather than assigning it to ret which was the original
intent.
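
Illustration only, not the kernel code: a small Python sketch of the bug
pattern, assuming hypothetical function names and a placeholder success
value of 1. In the first variant the error code is overwritten before it
is returned; in the second it is returned immediately.

```python
import errno


def trylock_assign_only(waiter_ok):
    ret = 0
    if not waiter_ok:
        ret = -errno.EINVAL  # assigned, but overwritten below
    ret = 1                  # later re-assignment hides the error
    return ret


def trylock_return_early(waiter_ok):
    if not waiter_ok:
        return -errno.EINVAL  # the check now takes effect
    return 1
```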

Fixes: dc7109aaa233 ("futex: Validate waiter correctly in futex_proxy_trylock_atomic()")
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King &lt;colin.king@canonical.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: André Almeida &lt;andrealmeid@collabora.com&gt;
Link: https://lore.kernel.org/r/20210818131840.34262-1-colin.king@canonical.com

</content>
</entry>
<entry>
<title>futex: Prevent requeue_pi() lock nesting issue on RT</title>
<updated>2021-08-17T17:05:59Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-15T21:29:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=07d91ef510fb16a2e0ca7453222105835b7ba3b8'/>
<id>urn:sha1:07d91ef510fb16a2e0ca7453222105835b7ba3b8</id>
<content type='text'>
The requeue_pi() operation on RT kernels creates a problem versus the
task::pi_blocked_on state when a waiter is woken early (signal, timeout)
and that early wake up interleaves with the requeue_pi() operation.

When the requeue manages to block the waiter on the rtmutex which is
associated with the second futex, then a concurrent early wakeup of that
waiter faces the problem that it has to acquire the hash bucket spinlock,
which is not an issue on non-RT kernels, but on RT kernels spinlocks are
substituted by 'sleeping' spinlocks based on rtmutex. If the hash bucket
lock is contended then blocking on that spinlock would result in a
impossible situation: blocking on two locks at the same time (the hash
bucket lock and the rtmutex representing the PI futex).

It was considered to make the hash bucket locks raw_spinlocks, but
especially requeue operations with a large number of waiters can introduce
significant latencies, so that's not an option for RT.

The RT tree carried a solution which (ab)used task::pi_blocked_on to store
the information about an ongoing requeue and an early wakeup which worked,
but required adding checks for these special states all over the place.

The disentangling of an early wakeup of a waiter for a requeue_pi() operation
is already looking at quite some different states and the task::pi_blocked_on
magic just expanded that to a hard to understand 'state machine'.

This can be avoided by keeping track of the waiter/requeue state in the
futex_q object itself.

Add a requeue_state field to struct futex_q with the following possible
states:

	Q_REQUEUE_PI_NONE
	Q_REQUEUE_PI_IGNORE
	Q_REQUEUE_PI_IN_PROGRESS
	Q_REQUEUE_PI_WAIT
	Q_REQUEUE_PI_DONE
	Q_REQUEUE_PI_LOCKED

The waiter starts with state = NONE and the following state transitions are
valid:

On the waiter side:
  Q_REQUEUE_PI_NONE		-&gt; Q_REQUEUE_PI_IGNORE
  Q_REQUEUE_PI_IN_PROGRESS	-&gt; Q_REQUEUE_PI_WAIT

On the requeue side:
  Q_REQUEUE_PI_NONE		-&gt; Q_REQUEUE_PI_IN_PROGRESS
  Q_REQUEUE_PI_IN_PROGRESS	-&gt; Q_REQUEUE_PI_DONE/LOCKED
  Q_REQUEUE_PI_IN_PROGRESS	-&gt; Q_REQUEUE_PI_NONE (requeue failed)
  Q_REQUEUE_PI_WAIT		-&gt; Q_REQUEUE_PI_DONE/LOCKED
  Q_REQUEUE_PI_WAIT		-&gt; Q_REQUEUE_PI_IGNORE (requeue failed)

The requeue side ignores a waiter with state Q_REQUEUE_PI_IGNORE as this
signals that the waiter is already on the way out. It also means that
the waiter is still on the 'wait' futex, i.e. uaddr1.

The waiter side signals early wakeup to the requeue side either through
setting state to Q_REQUEUE_PI_IGNORE or to Q_REQUEUE_PI_WAIT depending
on the current state. In case of Q_REQUEUE_PI_IGNORE it can immediately
proceed to take the hash bucket lock of uaddr1. If it set state to WAIT,
which means the wakeup is interleaving with a requeue in progress, it has
to wait for the requeue side to change the state, either to DONE/LOCKED
or to IGNORE. DONE/LOCKED means the waiter q is now on the uaddr2 futex
and either blocked (DONE) or has acquired it (LOCKED). IGNORE is set by
the requeue side when the requeue attempt failed via deadlock detection
and therefore the waiter's futex_q is still on the uaddr1 futex.
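
Illustration only, not the kernel implementation: the transition table
above can be sketched in Python as follows. The enum, the VALID table and
both helper functions are hypothetical names; the behaviour of
waiter_early_wakeup() for states other than NONE and IN_PROGRESS is an
assumption of this sketch.

```python
from enum import Enum, auto


class RequeueState(Enum):
    NONE = auto()
    IGNORE = auto()
    IN_PROGRESS = auto()
    WAIT = auto()
    DONE = auto()
    LOCKED = auto()


S = RequeueState

# Transitions as listed in the commit message, keyed by (side, current state).
VALID = {
    ("waiter", S.NONE): {S.IGNORE},
    ("waiter", S.IN_PROGRESS): {S.WAIT},
    ("requeue", S.NONE): {S.IN_PROGRESS},
    ("requeue", S.IN_PROGRESS): {S.DONE, S.LOCKED, S.NONE},
    ("requeue", S.WAIT): {S.DONE, S.LOCKED, S.IGNORE},
}


def transition(side, current, new):
    """Return the new state, or raise ValueError if the table forbids it."""
    if new not in VALID.get((side, current), set()):
        raise ValueError(f"{side}: {current.name} to {new.name} is not allowed")
    return new


def waiter_early_wakeup(current):
    """Pick the waiter-side transition for an early wakeup (signal, timeout)."""
    if current is S.NONE:
        return transition("waiter", current, S.IGNORE)
    if current is S.IN_PROGRESS:
        return transition("waiter", current, S.WAIT)
    return current  # assumption: any other state is left untouched
```

For example, waiter_early_wakeup(S.IN_PROGRESS) yields S.WAIT, after which
only the requeue side may move the state on to DONE, LOCKED or IGNORE.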

While this is not strictly required on !RT, making this unconditional has
the benefit of common code and it also allows the waiter to avoid taking
the hash bucket lock on the way out in certain cases, which reduces
contention.

Add the helpers required for the state transitions, invoke them at
the right places and restructure the futex_wait_requeue_pi() code to handle
the return from wait (early or not) based on the state machine values.

On !RT enabled kernels the waiter spin waits for the state going from
Q_REQUEUE_PI_WAIT to some other state, on RT enabled kernels this is
handled by rcuwait_wait_event() and the corresponding wake up on the
requeue side.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20210815211305.693317658@linutronix.de
</content>
</entry>
<entry>
<title>futex: Simplify handle_early_requeue_pi_wakeup()</title>
<updated>2021-08-17T17:05:57Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-15T21:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6231acbd0802e76580c71ceb52c09646d42170fb'/>
<id>urn:sha1:6231acbd0802e76580c71ceb52c09646d42170fb</id>
<content type='text'>
Move the futex key match out of handle_early_requeue_pi_wakeup() which
allows simplifying that function. The upcoming state machine for
requeue_pi() will make that go away.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20210815211305.638938670@linutronix.de
</content>
</entry>
<entry>
<title>futex: Reorder sanity checks in futex_requeue()</title>
<updated>2021-08-17T17:05:54Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-15T21:29:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d69cba5c719b0c551f6380ec5da4ed8c20a3815a'/>
<id>urn:sha1:d69cba5c719b0c551f6380ec5da4ed8c20a3815a</id>
<content type='text'>
No point in allocating memory when the input parameters are bogus.
Validate all parameters before proceeding.

Suggested-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20210815211305.581789253@linutronix.de
</content>
</entry>
<entry>
<title>futex: Clarify comment in futex_requeue()</title>
<updated>2021-08-17T17:05:51Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-15T21:29:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c18eaa3aca43688a3aee199d85ce4227686a29b6'/>
<id>urn:sha1:c18eaa3aca43688a3aee199d85ce4227686a29b6</id>
<content type='text'>
The comment about the restriction of the number of waiters to wake for the
REQUEUE_PI case is confusing at best. Rewrite it.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20210815211305.524990421@linutronix.de
</content>
</entry>
<entry>
<title>futex: Restructure futex_requeue()</title>
<updated>2021-08-17T17:05:49Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-15T21:29:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=64b7b715f7f92ae3233446b4a4cdda3524fcd4b0'/>
<id>urn:sha1:64b7b715f7f92ae3233446b4a4cdda3524fcd4b0</id>
<content type='text'>
No point in taking two more 'requeue_pi' conditionals just to get to the
requeue. The same applies to the requeue_pi case, just the other way round.

No functional change.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20210815211305.468835790@linutronix.de
</content>
</entry>
</feed>
