<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/futex.c, branch v5.0.16</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.16</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.0.16'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-05-10T16:36:14Z</updated>
<entry>
<title>locking/futex: Allow low-level atomic operations to return -EAGAIN</title>
<updated>2019-05-10T16:36:14Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2019-02-28T11:58:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3b928b59fae0e10d856f4a58e67839384239e507'/>
<id>urn:sha1:3b928b59fae0e10d856f4a58e67839384239e507</id>
<content type='text'>
commit 6b4f4bc9cb22875f97023984a625386f0c7cc1c0 upstream.

Some futex() operations, including FUTEX_WAKE_OP, require the kernel to
perform an atomic read-modify-write of the futex word via the userspace
mapping. These operations are implemented by each architecture in
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
are called in atomic context with the relevant hash bucket locks held.

Although these routines may return -EFAULT in response to a page fault
generated when accessing userspace, they are expected to succeed (i.e.
return 0) in all other cases. This poses a problem for architectures
that do not provide bounded forward progress guarantees or fairness of
contended atomic operations and can lead to starvation in some cases.

In these problematic scenarios, we must return back to the core futex
code so that we can drop the hash bucket locks and reschedule if
necessary, much like we do in the case of a page fault.

Allow architectures to return -EAGAIN from their implementations of
arch_futex_atomic_op_inuser() and futex_atomic_cmpxchg_inatomic(), which
will cause the core futex code to reschedule if necessary and return
back to the architecture code later on.

Cc: &lt;stable@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>futex: Ensure that futex address is aligned in handle_futex_death()</title>
<updated>2019-03-27T05:17:54Z</updated>
<author>
<name>Chen Jie</name>
<email>chenjie6@huawei.com</email>
</author>
<published>2019-03-15T03:44:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8dfb1e702caa46124669964fac5ebdab040137f3'/>
<id>urn:sha1:8dfb1e702caa46124669964fac5ebdab040137f3</id>
<content type='text'>
commit 5a07168d8d89b00fe1760120714378175b3ef992 upstream.

The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.

As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.

[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
  	comment ]

Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Chen Jie &lt;chenjie6@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: &lt;dvhart@infradead.org&gt;
Cc: &lt;peterz@infradead.org&gt;
Cc: &lt;zengweilin@huawei.com&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>futex: Handle early deadlock return correctly</title>
<updated>2019-02-08T12:00:36Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-01-29T22:15:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434'/>
<id>urn:sha1:1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434</id>
<content type='text'>
commit 56222b212e8e ("futex: Drop hb-&gt;lock before enqueueing on the
rtmutex") changed the locking rules in the futex code so that the hash
bucket lock is not longer held while the waiter is enqueued into the
rtmutex wait list. This made the lock and the unlock path symmetric, but
unfortunately the possible early exit from __rt_mutex_proxy_start() due to
a detected deadlock was not updated accordingly. That allows a concurrent
unlocker to observe inconsitent state which triggers the warning in the
unlock path.

futex_lock_pi()                         futex_unlock_pi()
  lock(hb-&gt;lock)
  queue(hb_waiter)				lock(hb-&gt;lock)
  lock(rtmutex-&gt;wait_lock)
  unlock(hb-&gt;lock)
                                        // acquired hb-&gt;lock
                                        hb_waiter = futex_top_waiter()
                                        lock(rtmutex-&gt;wait_lock)
  __rt_mutex_proxy_start()
     ---&gt; fail
          remove(rtmutex_waiter);
     ---&gt; returns -EDEADLOCK
  unlock(rtmutex-&gt;wait_lock)
                                        // acquired wait_lock
                                        wake_futex_pi()
                                        rt_mutex_next_owner()
					  --&gt; returns NULL
                                          --&gt; WARN

  lock(hb-&gt;lock)
  unqueue(hb_waiter)

The problem is caused by the remove(rtmutex_waiter) in the failure case of
__rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
hash bucket but no waiter on the rtmutex, i.e. inconsistent state.

The original commit handles this correctly for the other early return cases
(timeout, signal) by delaying the removal of the rtmutex waiter until the
returning task reacquired the hash bucket lock.

Treat the failure case of __rt_mutex_proxy_start() in the same way and let
the existing cleanup code handle the eventual handover of the rtmutex
gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
removal for the failure case, so that the other callsites are still
operating correctly.

Add proper comments to the code so all these details are fully documented.

Thanks to Peter for helping with the analysis and writing the really
valuable code comments.

Fixes: 56222b212e8e ("futex: Drop hb-&gt;lock before enqueueing on the rtmutex")
Reported-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Co-developed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: linux-s390@vger.kernel.org
Cc: Stefan Liebler &lt;stli@linux.ibm.com&gt;
Cc: Sebastian Sewior &lt;bigeasy@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de

</content>
</entry>
<entry>
<title>futex: Fix barrier comment</title>
<updated>2019-02-08T12:00:35Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2019-02-06T18:56:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f568ebe2afefdc33a6fb06ef20a94f8b96455f1'/>
<id>urn:sha1:6f568ebe2afefdc33a6fb06ef20a94f8b96455f1</id>
<content type='text'>
The current comment for the barrier that guarantees that waiter increment
is always before taking the hb spinlock (barrier (A)) needs to be fixed as
it is misplaced.

This is obviously referring to hb_waiters_inc, which is a full barrier.

Reported-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lkml.kernel.org/r/20190206185602.949-1-dave@stgolabs.net

</content>
</entry>
<entry>
<title>futex: Fix (possible) missed wakeup</title>
<updated>2019-01-21T10:15:38Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2018-11-29T13:44:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b061c38bef43406df8e73c5be06cbfacad5ee6ad'/>
<id>urn:sha1:b061c38bef43406df8e73c5be06cbfacad5ee6ad</id>
<content type='text'>
We must not rely on wake_q_add() to delay the wakeup; in particular
commit:

  1d0dcb3ad9d3 ("futex: Implement lockless wakeups")

moved wake_q_add() before smp_store_release(&amp;q-&gt;lock_ptr, NULL), which
could result in futex_wait() waking before observing -&gt;lock_ptr ==
NULL and going back to sleep again.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 1d0dcb3ad9d3 ("futex: Implement lockless wakeups")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>Remove 'type' argument from access_ok() function</title>
<updated>2019-01-04T02:57:57Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-01-04T02:57:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96d4f267e40f9509e8a66e2b39e8b95655617693'/>
<id>urn:sha1:96d4f267e40f9509e8a66e2b39e8b95655617693</id>
<content type='text'>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground</title>
<updated>2018-12-28T20:45:04Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-12-28T20:45:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b12a9124eeb71d766a3e3eb594ebbb3fefc66902'/>
<id>urn:sha1:b12a9124eeb71d766a3e3eb594ebbb3fefc66902</id>
<content type='text'>
Pull y2038 updates from Arnd Bergmann:
 "More syscalls and cleanups

  This concludes the main part of the system call rework for 64-bit
  time_t, which has spread over most of year 2018, the last six system
  calls being

    - ppoll
    - pselect6
    - io_pgetevents
    - recvmmsg
    - futex
    - rt_sigtimedwait

  As before, nothing changes for 64-bit architectures, while 32-bit
  architectures gain another entry point that differs only in the layout
  of the timespec structure. Hopefully in the next release we can wire
  up all 22 of those system calls on all 32-bit architectures, which
  gives us a baseline version for glibc to start using them.

  This does not include the clock_adjtime, getrusage/waitid, and
  getitimer/setitimer system calls. I still plan to have new versions of
  those as well, but they are not required for correct operation of the
  C library since they can be emulated using the old 32-bit time_t based
  system calls.

  Aside from the system calls, there are also a few cleanups here,
  removing old kernel internal interfaces that have become unused after
  all references got removed. The arch/sh cleanups are part of this,
  there were posted several times over the past year without a reaction
  from the maintainers, while the corresponding changes made it into all
  other architectures"

* tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground:
  timekeeping: remove obsolete time accessors
  vfs: replace current_kernel_time64 with ktime equivalent
  timekeeping: remove timespec_add/timespec_del
  timekeeping: remove unused {read,update}_persistent_clock
  sh: remove board_time_init() callback
  sh: remove unused rtc_sh_get/set_time infrastructure
  sh: sh03: rtc: push down rtc class ops into driver
  sh: dreamcast: rtc: push down rtc class ops into driver
  y2038: signal: Add compat_sys_rt_sigtimedwait_time64
  y2038: signal: Add sys_rt_sigtimedwait_time32
  y2038: socket: Add compat_sys_recvmmsg_time64
  y2038: futex: Add support for __kernel_timespec
  y2038: futex: Move compat implementation into futex.c
  io_pgetevents: use __kernel_timespec
  pselect6: use __kernel_timespec
  ppoll: use __kernel_timespec
  signal: Add restore_user_sigmask()
  signal: Add set_user_sigmask()
</content>
</entry>
<entry>
<title>futex: Cure exit race</title>
<updated>2018-12-18T22:13:15Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-12-10T13:35:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=da791a667536bf8322042e38ca85d55a78d3c273'/>
<id>urn:sha1:da791a667536bf8322042e38ca85d55a78d3c273</id>
<content type='text'>
Stefan reported, that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():

 CPU0				CPU1

 sys_exit()			sys_futex()
  do_exit()			 futex_lock_pi()
   exit_signals(tsk)		  No waiters:
    tsk-&gt;flags |= PF_EXITING;	  *uaddr == 0x00000PID
  mm_release(tsk)		  Set waiter bit
   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
      Set owner died		  attach_to_pi_owner() {
    *uaddr = 0xC0000000;	   tsk = get_task(PID);
   }				   if (!tsk-&gt;flags &amp; PF_EXITING) {
  ...				     attach();
  tsk-&gt;flags |= PF_EXITPIDONE;	   } else {
				     if (!(tsk-&gt;flags &amp; PF_EXITPIDONE))
				       return -EAGAIN;
				     return -ESRCH; &lt;--- FAIL
				   }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
is set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.

Reported-by: Stefan Liebler &lt;stli@linux.ibm.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Heiko Carstens &lt;heiko.carstens@de.ibm.com&gt;
Cc: Darren Hart &lt;dvhart@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Sasha Levin &lt;sashal@kernel.org&gt;
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
</content>
</entry>
<entry>
<title>y2038: futex: Add support for __kernel_timespec</title>
<updated>2018-12-07T21:19:07Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-04-17T15:23:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bec2f7cbb73eadf5e1cc7d54ecb0980ede244257'/>
<id>urn:sha1:bec2f7cbb73eadf5e1cc7d54ecb0980ede244257</id>
<content type='text'>
This prepares sys_futex for y2038 safe calling: the native
syscall is changed to receive a __kernel_timespec argument, which
will be switched to 64-bit time_t in the future. All the internal
time handling gets changed to timespec64, and the compat_sys_futex
entry point is moved under the CONFIG_COMPAT_32BIT_TIME check
to provide compatibility for existing 32-bit architectures.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>y2038: futex: Move compat implementation into futex.c</title>
<updated>2018-12-07T21:19:07Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-04-17T14:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=04e7712f4460585e5eed5b853fd8b82a9943958f'/>
<id>urn:sha1:04e7712f4460585e5eed5b853fd8b82a9943958f</id>
<content type='text'>
We are going to share the compat_sys_futex() handler between 64-bit
architectures and 32-bit architectures that need to deal with both 32-bit
and 64-bit time_t, and this is easier if both entry points are in the
same file.

In fact, most other system call handlers do the same thing these days, so
let's follow the trend here and merge all of futex_compat.c into futex.c.

In the process, a few minor changes have to be done to make sure everything
still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become
local symbol, and the compat version of the fetch_robust_entry() function
gets renamed to compat_fetch_robust_entry() to avoid a symbol clash.

This is intended as a purely cosmetic patch, no behavior should
change.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
</feed>
