<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/futex, branch v6.17.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-11-13T20:36:52Z</updated>
<entry>
<title>futex: Don't leak robust_list pointer on exec race</title>
<updated>2025-11-13T20:36:52Z</updated>
<author>
<name>Pranav Tyagi</name>
<email>pranav.tyagi03@gmail.com</email>
</author>
<published>2025-09-15T18:21:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b524455a51feb6013df3a5dba3160487b2e8e22a'/>
<id>urn:sha1:b524455a51feb6013df3a5dba3160487b2e8e22a</id>
<content type='text'>
[ Upstream commit 6b54082c3ed4dc9821cdf0edb17302355cc5bb45 ]

sys_get_robust_list() and compat_get_robust_list() use ptrace_may_access()
to check if the calling task is allowed to access another task's
robust_list pointer. This check is racy against a concurrent exec() in the
target process.

During exec(), a task may transition from a non-privileged binary to a
privileged one (e.g., setuid binary) and its credentials/memory mappings
may change. If get_robust_list() performs ptrace_may_access() before
this transition, it may erroneously allow access to sensitive information
after the target becomes privileged.

A racy access allows an attacker to exploit a window during which
ptrace_may_access() passes before a target process transitions to a
privileged state via exec().

For example, consider a non-privileged task T that is about to execute a
setuid-root binary. An attacker task A calls get_robust_list(T) while T
is still unprivileged. Since ptrace_may_access() checks permissions
based on current credentials, it succeeds. However, if T begins exec
immediately afterwards, it becomes privileged and may change its memory
mappings. Because get_robust_list() proceeds to access T-&gt;robust_list
without synchronizing with exec() it may read user-space pointers from a
now-privileged process.

This violates the intended post-exec access restrictions and could
expose sensitive memory addresses or be used as a primitive in a larger
exploit chain. Consequently, the race can lead to unauthorized
disclosure of information across privilege boundaries and poses a
potential security risk.

Take a read lock on signal-&gt;exec_update_lock prior to invoking
ptrace_may_access() and accessing the robust_list/compat_robust_list.
This ensures that the target task's exec state remains stable during the
check, allowing for consistent and synchronized validation of
credentials.

Suggested-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Pranav Tyagi &lt;pranav.tyagi03@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/linux-fsdevel/1477863998-3298-5-git-send-email-jann@thejh.net/
Link: https://github.com/KSPP/linux/issues/119
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>futex: Prevent use-after-free during requeue-PI</title>
<updated>2025-09-20T15:40:42Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-09-10T10:42:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b549113738e8c751b613118032a724b772aa83f2'/>
<id>urn:sha1:b549113738e8c751b613118032a724b772aa83f2</id>
<content type='text'>
syzbot managed to trigger the following race:

   T1                               T2

 futex_wait_requeue_pi()
   futex_do_wait()
     schedule()
                               futex_requeue()
                                 futex_proxy_trylock_atomic()
                                   futex_requeue_pi_prepare()
                                   requeue_pi_wake_futex()
                                     futex_requeue_pi_complete()
                                      /* preempt */

         * timeout/ signal wakes T1 *

   futex_requeue_pi_wakeup_sync() // Q_REQUEUE_PI_LOCKED
   futex_hash_put()
  // back to userland, on stack futex_q is garbage

                                      /* back */
                                     wake_up_state(q-&gt;task, TASK_NORMAL);

In this scenario futex_wait_requeue_pi() is able to leave without using
futex_q::lock_ptr for synchronization.

This can be prevented by reading futex_q::task before updating the
futex_q::requeue_state. A reference on the task_struct is not needed
because requeue_pi_wake_futex() is invoked with a spinlock_t held which
implies a RCU read section.

Even if T1 terminates immediately after, the task_struct will remain valid
during T2's wake_up_state().  A READ_ONCE on futex_q::task before
futex_requeue_pi_complete() is enough because it ensures that the variable
is read before the state is updated.

Read futex_q::task before updating the requeue state, use it for the
following wakeup.

Fixes: 07d91ef510fb1 ("futex: Prevent requeue_pi() lock nesting issue on RT")
Reported-by: syzbot+034246a838a10d181e78@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Closes: https://lore.kernel.org/all/68b75989.050a0220.3db4df.01dd.GAE@google.com/
</content>
</entry>
<entry>
<title>futex: Move futex_hash_free() back to __mmput()</title>
<updated>2025-08-31T09:48:19Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-08-22T14:12:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9b05321e21e4b218de4ce8a590bf375f58b6346'/>
<id>urn:sha1:d9b05321e21e4b218de4ce8a590bf375f58b6346</id>
<content type='text'>
To avoid a memory leak via mm_alloc() + mmdrop() the futex cleanup code
has been moved to __mmdrop(). This resulted in a warnings if the futex
hash table has been allocated via vmalloc() the mmdrop() was invoked
from atomic context.
The free path must stay in __mmput() to ensure it is invoked from
preemptible context.

In order to avoid the memory leak, delay the allocation of
mm_struct::mm-&gt;futex_ref to futex_hash_allocate(). This works because
neither the per-CPU counter nor the private hash has been allocated and
therefore
- futex_private_hash() callers (such as exit_pi_state_list()) don't
  acquire reference if there is no private hash yet. There is also no
  reference put.

- Regular callers (futex_hash()) fallback to global hash. No reference
  counting here.

The futex_ref member can be allocated in futex_hash_allocate() before
the private hash itself is allocated. This happens either while the
first thread is created or on request. In both cases the process has
just a single thread so there can be either futex operation in progress
or the request to create a private hash.

Move futex_hash_free() back to __mmput();
Move the allocation of mm_struct::futex_ref to futex_hash_allocate().

  [ bp: Fold a follow-up fix to prevent a use-after-free:
    https://lore.kernel.org/r/20250830213806.sEKuuGSm@linutronix.de ]

Fixes:  e703b7e247503 ("futex: Move futex cleanup to __mmdrop()")
Closes: https://lore.kernel.org/all/20250821102721.6deae493@kernel.org/
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lkml.kernel.org/r/20250822141238.PfnkTjFb@linutronix.de
</content>
</entry>
<entry>
<title>futex: Use user_write_access_begin/_end() in futex_put_value()</title>
<updated>2025-08-11T15:53:21Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2025-08-11T14:11:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dfb36e4a8db0cd56f92d4cb445f54e85a9b40897'/>
<id>urn:sha1:dfb36e4a8db0cd56f92d4cb445f54e85a9b40897</id>
<content type='text'>
Commit cec199c5e39b ("futex: Implement FUTEX2_NUMA") introduced the
futex_put_value() helper to write a value to the given user
address.

However, it uses user_read_access_begin() before the write. For
architectures that differentiate between read and write accesses, like
PowerPC, futex_put_value() fails with -EFAULT.

Fix that by using the user_write_access_begin/user_write_access_end() pair
instead.

Fixes: cec199c5e39b ("futex: Implement FUTEX2_NUMA")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250811141147.322261-1-longman@redhat.com
</content>
</entry>
<entry>
<title>futex: Remove support for IMMUTABLE</title>
<updated>2025-07-11T14:02:01Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-07-10T11:00:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=760e6f7befbab9a84c54457a8ee45313b7b91ee5'/>
<id>urn:sha1:760e6f7befbab9a84c54457a8ee45313b7b91ee5</id>
<content type='text'>
The FH_FLAG_IMMUTABLE flag was meant to avoid the reference counting on
the private hash and so to avoid the performance regression on big
machines.
With the switch to per-CPU counter this is no longer needed. That flag
was never useable on any released kernel.

Remove any support for IMMUTABLE while preserve the flags argument and
enforce it to be zero.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250710110011.384614-5-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>futex: Make futex_private_hash_get() static</title>
<updated>2025-07-11T14:02:00Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-07-10T11:00:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fb3c553da7fa9991f9b1436d91dbb78c7477c86a'/>
<id>urn:sha1:fb3c553da7fa9991f9b1436d91dbb78c7477c86a</id>
<content type='text'>
futex_private_hash_get() is not used outside if its compilation unit.
Make it static.

Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250710110011.384614-4-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>futex: Use RCU-based per-CPU reference counting instead of rcuref_t</title>
<updated>2025-07-11T14:02:00Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-07-10T11:00:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=56180dd20c19e5b0fa34822997a9ac66b517e7b3'/>
<id>urn:sha1:56180dd20c19e5b0fa34822997a9ac66b517e7b3</id>
<content type='text'>
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.

Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.

The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.

When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
  current private hash. This can be skipped if a grace period elapsed
  since the private hash was assigned.

- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
  use mm_struct::futex_atomic for reference counting.

- After a RCU grace period, all users are guaranteed to be using the
  atomic counter. The per-CPU counters can now be summed up and added to
  the atomic_t counter. If the resulting count is zero, the hash can be
  safely replaced. Otherwise, active users still hold a valid reference.

- Once the atomic reference count drops to zero, the next futex
  operation will switch to the new private hash.

call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.

[bigeasy: commit description + mm get/ put_async]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>futex: Verify under the lock if hash can be replaced</title>
<updated>2025-06-11T15:24:09Z</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-06-02T11:00:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=69a14d146f3b87819f3fb73ed5d1de3e1fa680c1'/>
<id>urn:sha1:69a14d146f3b87819f3fb73ed5d1de3e1fa680c1</id>
<content type='text'>
Once the global hash is requested there is no way back to switch back to
the per-task private hash. This is checked at the begin of the function.

It is possible that two threads simultaneously request the global hash
and both pass the initial check and block later on the
mm::futex_hash_lock. In this case the first thread performs the switch
to the global hash. The second thread will also attempt to switch to the
global hash and while doing so, accessing the nonexisting slot 1 of the
struct futex_private_hash.
The same applies if the hash is made immutable: There is no reference
counting and the hash must not be replaced.

Verify under mm_struct::futex_phash that neither the global hash nor an
immutable hash in use.

Tested-by: "Lai, Yi" &lt;yi1.lai@linux.intel.com&gt;
Reported-by: "Lai, Yi" &lt;yi1.lai@linux.intel.com&gt;
Closes: https://lore.kernel.org/all/aDwDw9Aygqo6oAx+@ly-workstation/
Fixes: bd54df5ea7cad ("futex: Allow to resize the private local hash")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/all/20250610104400.1077266-5-bigeasy@linutronix.de/
</content>
</entry>
<entry>
<title>futex: Handle invalid node numbers supplied by user</title>
<updated>2025-06-05T12:37:58Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-05-28T08:55:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8337204c58899fe422db765b481711eb2d95eb0b'/>
<id>urn:sha1:8337204c58899fe422db765b481711eb2d95eb0b</id>
<content type='text'>
syzbot used a negative node number which was not rejected early and led
to invalid memory access in node_possible().

Reject negative node numbers except for FUTEX_NO_NODE.

[bigeasy: Keep the FUTEX_NO_NODE check]

Closes: https://lore.kernel.org/all/6835bfe3.a70a0220.253bc2.00b5.GAE@google.com/
Fixes: cec199c5e39bd ("futex: Implement FUTEX2_NUMA")
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reported-by: syzbot+9afaf6749e3a7aa1bdf3@syzkaller.appspotmail.com
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20250528085521.1938355-4-bigeasy@linutronix.de
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-06-02T23:00:26Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-02T23:00:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fd1f8473503e5bf897bd3e8efe3545c0352954e6'/>
<id>urn:sha1:fd1f8473503e5bf897bd3e8efe3545c0352954e6</id>
<content type='text'>
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging nmi-safe, which is required by BFP, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
</content>
</entry>
</feed>
