<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sys.c, branch v6.1.148</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.148</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.148'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-03-28T20:58:48Z</updated>
<entry>
<title>hrtimer: Use and report correct timerslack values for realtime tasks</title>
<updated>2025-03-28T20:58:48Z</updated>
<author>
<name>Felix Moessbauer</name>
<email>felix.moessbauer@siemens.com</email>
</author>
<published>2024-08-14T12:10:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4c3712c15f78138f06ffe3a6fb6680aabb7e6624'/>
<id>urn:sha1:4c3712c15f78138f06ffe3a6fb6680aabb7e6624</id>
<content type='text'>
commit ed4fb6d7ef68111bb539283561953e5c6e9a6e38 upstream.

The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.

This logic was inconsistently applied to the hrtimers, leading to delays
of realtime tasks that used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
procfs reported incorrect (non-zero) timerslack_ns values.

This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with an rt policy. With that, downstream users no longer need
to handle rt tasks specially (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with an rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":

  Timer slack is not applied to threads that are scheduled under a
  real-time scheduling policy (see sched_setscheduler(2)).

The special handling of timerslack on rt tasks in downstream users
is removed as well.
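
For illustration, a minimal userspace sketch (not part of the patch; it
assumes the caller may switch itself to SCHED_FIFO, i.e. has
CAP_SYS_NICE) showing the now-consistent reporting:

  #include &lt;stdio.h&gt;
  #include &lt;sched.h&gt;
  #include &lt;sys/prctl.h&gt;

  int main(void)
  {
          struct sched_param sp = { .sched_priority = 1 };

          /* default policy: the usual non-zero default slack (50000 ns) */
          printf("slack before: %ld ns\n",
                 (long)prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0));

          /* switch to a realtime policy (needs CAP_SYS_NICE) */
          if (sched_setscheduler(0, SCHED_FIFO, &amp;sp) == 0)
                  /* with this change the kernel reports 0 here */
                  printf("slack after:  %ld ns\n",
                         (long)prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0));

          return 0;
  }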

Signed-off-by: Felix Moessbauer &lt;felix.moessbauer@siemens.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>getrusage: use sig-&gt;stats_lock rather than lock_task_sighand()</title>
<updated>2024-03-15T14:48:22Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2024-01-22T15:50:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9793a3bb531c5b747bb726d7611b81abe32f6401'/>
<id>urn:sha1:9793a3bb531c5b747bb726d7611b81abe32f6401</id>
<content type='text'>
[ Upstream commit f7ec1cd5cc7ef3ad964b677ba82b8b77f1c93009 ]

lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
getrusage() at the same time and the target process has NR_THREADS
threads, spin_lock_irq() will spin with irqs disabled for
O(NR_CPUS * NR_THREADS) time.

Change getrusage() to use sig-&gt;stats_lock; it was specifically designed
for this type of use. This way getrusage() runs lockless in the likely case.

TODO:
	- Change do_task_stat() to use sig-&gt;stats_lock too, then we can
	  remove spin_lock_irq(siglock) in wait_task_zombie().

	- Turn sig-&gt;stats_lock into seqcount_rwlock_t, so that the
	  readers in the slow mode won't exclude each other. See
	  https://lore.kernel.org/all/20230913154907.GA26210@redhat.com/

	- stats_lock has to disable irqs because -&gt;siglock can be taken
	  in irq context; it would be very nice to change __exit_signal()
	  to avoid the siglock-&gt;stats_lock dependency.
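
For illustration only, a simplified sketch of the resulting lockless
read pattern (irq/flags handling and the actual accounting omitted):

  int seq, nextseq = 0;

  rcu_read_lock();
  do {
          seq = nextseq;
          read_seqbegin_or_lock(&amp;sig-&gt;stats_lock, &amp;seq);
          /* ... accumulate per-thread and signal_struct counters ... */
          nextseq = 1;    /* if we must retry, take the lock (slow path) */
  } while (need_seqretry(&amp;sig-&gt;stats_lock, seq));
  done_seqretry(&amp;sig-&gt;stats_lock, seq);
  rcu_read_unlock();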

Link: https://lkml.kernel.org/r/20240122155053.GA26214@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Dylan Hatch &lt;dylanbhatch@google.com&gt;
Tested-by: Dylan Hatch &lt;dylanbhatch@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>getrusage: use __for_each_thread()</title>
<updated>2024-03-15T14:48:22Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2023-09-09T17:26:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2a304d8c922f2d13d6b95457022950350c23103b'/>
<id>urn:sha1:2a304d8c922f2d13d6b95457022950350c23103b</id>
<content type='text'>
[ Upstream commit 13b7bc60b5353371460a203df6c38ccd38ad7a3a ]

do/while_each_thread should be avoided when possible.

Plus this change makes it possible to avoid lock_task_sighand(); we can
use RCU and/or sig-&gt;stats_lock instead.
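
As an illustration (simplified from the resulting getrusage() loop,
names as in kernel/sys.c), __for_each_thread() walks the thread list
under RCU:

  struct task_struct *t;

  rcu_read_lock();
  __for_each_thread(sig, t)
          accumulate_thread_rusage(t, r);
  rcu_read_unlock();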

Link: https://lkml.kernel.org/r/20230909172629.GA20454@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Stable-dep-of: f7ec1cd5cc7e ("getrusage: use sig-&gt;stats_lock rather than lock_task_sighand()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()</title>
<updated>2024-03-15T14:48:22Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2024-01-22T15:50:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d9fe6ef245766d4f4d0494aafc7d5b2189b9b94c'/>
<id>urn:sha1:d9fe6ef245766d4f4d0494aafc7d5b2189b9b94c</id>
<content type='text'>
[ Upstream commit daa694e4137571b4ebec330f9a9b4d54aa8b8089 ]

Patch series "getrusage: use sig-&gt;stats_lock", v2.

This patch (of 2):

thread_group_cputime() does its own locking, so we can safely shift
thread_group_cputime_adjusted(), which does another for_each_thread loop,
outside of the -&gt;siglock protected section.

This is also preparation for the next patch, which changes getrusage() to
use stats_lock instead of siglock; thread_group_cputime() takes the same
lock.  With the current implementation recursive read_seqbegin_or_lock()
is fine, since thread_group_cputime() can't enter the slow mode if the
caller holds stats_lock, but this looks safer and better performance-wise.
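
Roughly, the shape of the change in getrusage() is (simplified sketch,
other counters and error paths omitted):

  /* before: cputimes were accumulated with -&gt;siglock held */
  if (lock_task_sighand(p, &amp;flags)) {
          thread_group_cputime_adjusted(p, &amp;tgutime, &amp;tgstime);
          /* ... other counters ... */
          unlock_task_sighand(p, &amp;flags);
  }

  /* after: thread_group_cputime_adjusted() does its own locking,
   * so it runs outside the -&gt;siglock protected section */
  if (lock_task_sighand(p, &amp;flags)) {
          /* ... other counters ... */
          unlock_task_sighand(p, &amp;flags);
  }
  thread_group_cputime_adjusted(p, &amp;tgutime, &amp;tgstime);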

Link: https://lkml.kernel.org/r/20240122155023.GA26169@redhat.com
Link: https://lkml.kernel.org/r/20240122155050.GA26205@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Dylan Hatch &lt;dylanbhatch@google.com&gt;
Tested-by: Dylan Hatch &lt;dylanbhatch@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>getrusage: add the "signal_struct *sig" local variable</title>
<updated>2024-03-15T14:48:21Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2023-09-09T17:25:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eba76e4808c9ad2c41ec5652a9335a8b5c03a709'/>
<id>urn:sha1:eba76e4808c9ad2c41ec5652a9335a8b5c03a709</id>
<content type='text'>
[ Upstream commit c7ac8231ace9b07306d0299969e42073b189c70a ]

No functional changes, cleanup/preparation.

Link: https://lkml.kernel.org/r/20230909172554.GA20441@redhat.com
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Stable-dep-of: daa694e41375 ("getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()</title>
<updated>2023-04-26T12:28:39Z</updated>
<author>
<name>Ondrej Mosnacek</name>
<email>omosnace@redhat.com</email>
</author>
<published>2023-02-17T16:21:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ec90129b91b6568aab011d954d85edbd43ea7196'/>
<id>urn:sha1:ec90129b91b6568aab011d954d85edbd43ea7196</id>
<content type='text'>
commit 659c0ce1cb9efc7f58d380ca4bb2a51ae9e30553 upstream.

Linux Security Modules (LSMs) that implement the "capable" hook will
usually emit an access denial message to the audit log whenever they
"block" the current task from using the given capability based on their
security policy.

The occurrence of a denial is used as an indication that the given task
has attempted an operation that requires the given access permission, so
the callers of functions that perform LSM permission checks must take care
to avoid calling them too early (before it is decided if the permission is
actually needed to perform the requested operation).

The __sys_setres[ug]id() functions violate this convention by first
calling ns_capable_setid() and only then checking if the operation
requires the capability or not.  It means that any caller that has the
capability granted by DAC (task's capability set) but not by MAC (LSMs)
will generate a "denied" audit record, even if it is doing an operation for
which the capability is not required.

Fix this by reordering the checks such that ns_capable_setid() is checked
last and -EPERM is returned immediately if it returns false.

While there, also do two small optimizations:
* move the capability check before prepare_creds() and
* bail out early in case of a no-op.
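
A simplified sketch of the reordered flow in __sys_setresuid() (the
*_new flags stand for the "does this id actually need privilege to
change" computations in the real code; the fsuid check and the gid
variant are omitted):

  old = current_cred();

  /* bail out early if nothing would change (no-op) */
  if ((ruid == (uid_t) -1 || uid_eq(kruid, old-&gt;uid)) &amp;&amp;
      (euid == (uid_t) -1 || uid_eq(keuid, old-&gt;euid)) &amp;&amp;
      (suid == (uid_t) -1 || uid_eq(ksuid, old-&gt;suid)))
          return 0;

  /* consult the LSM "capable" hook only if the change needs privilege */
  if ((ruid_new || euid_new || suid_new) &amp;&amp;
      !ns_capable_setid(old-&gt;user_ns, CAP_SETUID))
          return -EPERM;

  /* allocate the new creds only after the checks above */
  new = prepare_creds();
  if (!new)
          return -ENOMEM;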

Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ondrej Mosnacek &lt;omosnace@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>prlimit: do_prlimit needs to have a speculation check</title>
<updated>2023-01-24T06:24:34Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2023-01-20T10:03:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=91185568c99d60534bacf38439846103962d1e2c'/>
<id>urn:sha1:91185568c99d60534bacf38439846103962d1e2c</id>
<content type='text'>
commit 739790605705ddcf18f21782b9c99ad7d53a8c11 upstream.

do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced.  In order to help prevent this
codepath from being used as a spectre "gadget", a barrier needs to be
added after checking the range.
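
The barrier amounts to (simplified excerpt from do_prlimit()):

  if (resource &gt;= RLIM_NLIMITS)
          return -EINVAL;
  /* keep speculation from indexing past the bounds check above */
  resource = array_index_nospec(resource, RLIM_NLIMITS);
  rlim = tsk-&gt;signal-&gt;rlim + resource;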

Reported-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Tested-by: Jordy Zomer &lt;jordyzomer@google.com&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random</title>
<updated>2022-10-10T17:41:21Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-10T17:41:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8adc0486f3c85e3c1e40c1ce6884317a17c380d3'/>
<id>urn:sha1:8adc0486f3c85e3c1e40c1ce6884317a17c380d3</id>
<content type='text'>
Pull random number generator updates from Jason Donenfeld:

 - Huawei reported that when they updated their kernel from 4.4 to
   something much newer, some userspace code they had broke, the culprit
   being the accidental removal of O_NONBLOCK from /dev/random way back
   in 5.6. It's been gone for over 2 years now and this is the first
   we've heard of it, but userspace breakage is userspace breakage, so
   O_NONBLOCK is now back.

 - Use randomness from hardware RNGs much more often during early boot,
   at the same interval that crng reseeds are done, from Dominik.

 - A semantic change in hardware RNG throttling, so that the hwrng
   framework can properly feed random.c with randomness from hardware
   RNGs that aren't specifically marked as creditable.

   A related patch coming to you via Herbert's hwrng tree depends on
   this one, not to compile, but just to function properly, so you may
   want to merge this PULL before that one.

 - A fix to clamp credited bits from the interrupts pool to the size of
   the pool sample. This is mainly just a theoretical fix, as it'd be
   pretty hard to exceed it in practice.

 - Oracle reported that InfiniBand TCP latency regressed by around
   10-15% after a change a few cycles ago made at the request of the RT
   folks, in which we hoisted a somewhat rare operation (1 in 1024
   times) out of the hard IRQ handler and into a workqueue, a pretty
   common and boring pattern.

   It turns out, though, that scheduling a worker from there has
   overhead of its own, whereas scheduling a timer on that same CPU for
   the next jiffy amortizes better and doesn't incur the same overhead.

   I also eliminated a cache miss by moving the work_struct (and
   subsequently, the timer_list) to below a critical cache line, so that
   the more critical members that are accessed on every hard IRQ aren't
   split between two cache lines.

 - The boot-time initialization of the RNG has been split into two
   approximate phases: what we can accomplish before timekeeping is
   possible and what we can accomplish after.

   This winds up being useful so that we can use RDRAND to seed the RNG
   before CONFIG_SLAB_FREELIST_RANDOM=y systems initialize slabs, in
   addition to other early uses of randomness. The effect is that
   systems with RDRAND (or a bootloader seed) will never see any
   warnings at all when setting CONFIG_WARN_ALL_UNSEEDED_RANDOM=y. And
   kfence benefits from getting a better seed of its own.

 - Small systems without much entropy sometimes wind up putting some
   truncated serial number read from flash into hostname, so contribute
   utsname changes to the RNG, without crediting.

 - Add smaller batches to serve requests for smaller integers, and make
   use of them when people ask for random numbers bounded by a given
   compile-time constant. This has positive effects all over the tree,
   most notably in networking and kfence.

 - The original jitter algorithm intended (I believe) to schedule the
   timer for the next jiffy, not the next-next jiffy, yet it used
   mod_timer(jiffies + 1), which fires on the next-next jiffy, rather
   than the intended mod_timer(jiffies), which fires on the next jiffy.
   So fix that (see the snippet after this list).

 - Fix a comment typo, from William.
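
For illustration of the jitter-timer point above (a simplified sketch;
"timer" stands in for the timer_list used in drivers/char/random.c):

  /* fires on the next-next jiffy */
  mod_timer(&amp;timer, jiffies + 1);

  /* fires on the next jiffy, as intended */
  mod_timer(&amp;timer, jiffies);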

* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: clear new batches when bringing new CPUs online
  random: fix typos in get_random_bytes() comment
  random: schedule jitter credit for next jiffy, not in two jiffies
  prandom: make use of smaller types in prandom_u32_max
  random: add 8-bit and 16-bit batches
  utsname: contribute changes to RNG
  random: use init_utsname() instead of utsname()
  kfence: use better stack hash seed
  random: split initialization into early step and later step
  random: use expired timer rather than wq for mixing fast pool
  random: avoid reading two cache lines on irq randomness
  random: clamp credited irq bits to maximum mixed
  random: throttle hwrng writes if no entropy is credited
  random: use hwgenerator randomness more frequently at early boot
  random: restore O_NONBLOCK support
</content>
</entry>
<entry>
<title>Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2022-10-09T23:24:05Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-09T23:24:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=493ffd6605b2d3d4dc7008ab927dba319f36671f'/>
<id>urn:sha1:493ffd6605b2d3d4dc7008ab927dba319f36671f</id>
<content type='text'>
Pull ucounts update from Eric Biederman:
 "Split rlimit and ucount values and max values

  After the ucount rlimit code was merged, a bunch of small but
  significant bugs were found and fixed. At the time it was realized
  that part of the problem was that while the ucount rlimits were very
  similar to the ordinary ucounts (in being nested counts with limits),
  the semantics were slightly different and the code would be less error
  prone if there was less sharing.

  This is the long awaited cleanup that should hopefully keep things
  more comprehensible and less error prone for whoever needs to touch
  that code next"

* tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucounts: Split rlimit and ucount values and max values
</content>
</entry>
<entry>
<title>utsname: contribute changes to RNG</title>
<updated>2022-09-29T19:37:27Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-09-27T09:35:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=37608ba315a2b1b548aa5b1064e5559e029cb016'/>
<id>urn:sha1:37608ba315a2b1b548aa5b1064e5559e029cb016</id>
<content type='text'>
On some small machines with little entropy, a quasi-unique hostname is
sometimes a relevant factor. I've seen, for example, 8 character
alpha-numeric serial numbers. In addition, the time at which the hostname
is set is usually a decent measurement of how long early boot took. So,
call add_device_randomness() on new hostnames, which feeds its arguments
to the RNG in addition to a fresh cycle counter.

Low-cost hooks like this never hurt and can only ever help, and since
this costs basically nothing for an operation that is never a fast path,
this is an overall easy win.
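
Roughly, the hook in sethostname() looks like this (simplified; the
same is done for setdomainname()):

  down_write(&amp;uts_sem);
  u = utsname();
  memcpy(u-&gt;nodename, tmp, len);
  memset(u-&gt;nodename + len, 0, sizeof(u-&gt;nodename) - len);
  /* feed the new name (plus a fresh cycle counter) into the pool */
  add_device_randomness(u-&gt;nodename, sizeof(u-&gt;nodename));
  up_write(&amp;uts_sem);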

Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
</feed>
