<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/time/hrtimer.c, branch v6.3.12</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.3.12</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.3.12'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-01-31T10:23:07Z</updated>
<entry>
<title>hrtimer: Ignore slack time for RT tasks in schedule_hrtimeout_range()</title>
<updated>2023-01-31T10:23:07Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2023-01-23T17:32:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0c52310f260014d95c1310364379772cb74cf82d'/>
<id>urn:sha1:0c52310f260014d95c1310364379772cb74cf82d</id>
<content type='text'>
While in theory the timer can be triggered before expires + delta, for the
cases of RT tasks they really have no business giving any lenience for
extra slack time, so override any passed value by the user and always use
zero for schedule_hrtimeout_range() calls. Furthermore, this is similar to
what the nanosleep(2) family already does with current-&gt;timer_slack_ns.

Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20230123173206.6764-3-dave@stgolabs.net

</content>
</entry>
<entry>
<title>hrtimer: Rely on rt_task() for DL tasks too</title>
<updated>2023-01-31T10:23:07Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2023-01-23T17:32:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c14fd3dcacaa480394d3ac0b4a91a7d17a4b5516'/>
<id>urn:sha1:c14fd3dcacaa480394d3ac0b4a91a7d17a4b5516</id>
<content type='text'>
Checking dl_task() is redundant as rt_task() returns true for deadline
tasks too.

Signed-off-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20230123173206.6764-2-dave@stgolabs.net

</content>
</entry>
<entry>
<title>timers: Prevent union confusion from unexpected restart_syscall()</title>
<updated>2023-01-11T18:31:47Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2023-01-05T13:44:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9f76d59173d9d146e96c66886b671c1915a5c5e5'/>
<id>urn:sha1:9f76d59173d9d146e96c66886b671c1915a5c5e5</id>
<content type='text'>
The nanosleep syscalls use the restart_block mechanism, with a quirk:
The `type` and `rmtp`/`compat_rmtp` fields are set up unconditionally on
syscall entry, while the rest of the restart_block is only set up in the
unlikely case that the syscall is actually interrupted by a signal (or
pseudo-signal) that doesn't have a signal handler.

If the restart_block was set up by a previous syscall (futex(...,
FUTEX_WAIT, ...) or poll()) and hasn't been invalidated somehow since then,
this will clobber some of the union fields used by futex_wait_restart() and
do_restart_poll().

If userspace afterwards wrongly calls the restart_syscall syscall,
futex_wait_restart()/do_restart_poll() will read struct fields that have
been clobbered.

This doesn't actually lead to anything particularly interesting because
none of the union fields contain trusted kernel data, and
futex(..., FUTEX_WAIT, ...) and poll() aren't syscalls where it makes much
sense to apply seccomp filters to their arguments.

So the current consequences are just of the "if userspace does bad stuff,
it can damage itself, and that's not a problem" flavor.

But still, it seems like a hazard for future developers, so invalidate the
restart_block when partly setting it up in the nanosleep syscalls.

Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20230105134403.754986-1-jannh@google.com
</content>
</entry>
<entry>
<title>freezer,sched: Rewrite core freezer logic</title>
<updated>2022-09-07T19:53:50Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2022-08-22T11:18:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f5d39b020809146cc28e6e73369bf8065e0310aa'/>
<id>urn:sha1:f5d39b020809146cc28e6e73369bf8065e0310aa</id>
<content type='text'>
Rewrite the core freezer to behave better wrt thawing and be simpler
in general.

By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is
ensured frozen tasks stay frozen until thawed and don't randomly wake
up early, as is currently possible.

As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up
two PF_flags (yay!).

Specifically; the current scheme works a little like:

	freezer_do_not_count();
	schedule();
	freezer_count();

And either the task is blocked, or it lands in try_to_freezer()
through freezer_count(). Now, when it is blocked, the freezer
considers it frozen and continues.

However, on thawing, once pm_freezing is cleared, freezer_count()
stops working, and any random/spurious wakeup will let a task run
before its time.

That is, thawing tries to thaw things in explicit order; kernel
threads and workqueues before doing bringing SMP back before userspace
etc.. However due to the above mentioned races it is entirely possible
for userspace tasks to thaw (by accident) before SMP is back.

This can be a fatal problem in asymmetric ISA architectures (eg ARMv9)
where the userspace task requires a special CPU to run.

As said; replace this with a special task state TASK_FROZEN and add
the following state transitions:

	TASK_FREEZABLE	-&gt; TASK_FROZEN
	__TASK_STOPPED	-&gt; TASK_FROZEN
	__TASK_TRACED	-&gt; TASK_FROZEN

The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL
(IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state
is already required to deal with spurious wakeups and the freezer
causes one such when thawing the task (since the original state is
lost).

The special __TASK_{STOPPED,TRACED} states *can* be restored since
their canonical state is in -&gt;jobctl.

With this, frozen tasks need an explicit TASK_FROZEN wakeup and are
free of undue (early / spurious) wakeups.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
</content>
</entry>
<entry>
<title>wireguard: ratelimiter: use hrtimer in selftest</title>
<updated>2022-08-02T20:47:50Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-08-02T12:56:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=151c8e499f4705010780189377f85b57400ccbf5'/>
<id>urn:sha1:151c8e499f4705010780189377f85b57400ccbf5</id>
<content type='text'>
Using msleep() is problematic because it's compared against
ratelimiter.c's ktime_get_coarse_boottime_ns(), which means on systems
with slow jiffies (such as UML's forced HZ=100), the result is
inaccurate. So switch to using schedule_hrtimeout().

However, hrtimer gives us access only to the traditional posix timers,
and none of the _COARSE variants. So now, rather than being too
imprecise like jiffies, it's too precise.

One solution would be to give it a large "range" value, but this will
still fire early on a loaded system. A better solution is to align the
timeout to the actual coarse timer, and then round up to the nearest
tick, plus change.

So add the timeout to the current coarse time, and then
schedule_hrtimer() until the absolute computed time.

This should hopefully reduce flakes in CI as well. Note that we keep the
retry loop in case the entire function is running behind, because the
test could still be scheduled out, by either the kernel or by the
hypervisor's kernel, in which case restarting the test and hoping to not
be scheduled out still helps.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>hrtimer: Unbreak hrtimer_force_reprogram()</title>
<updated>2021-08-12T20:34:40Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-12T20:32:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f80e21489590c00f46226d5802d900e6f66e5633'/>
<id>urn:sha1:f80e21489590c00f46226d5802d900e6f66e5633</id>
<content type='text'>
Since the recent consoliation of reprogramming functions,
hrtimer_force_reprogram() is affected by a check whether the new expiry
time is past the current expiry time.

This breaks the NOHZ logic as that relies on the fact that the tick hrtimer
is moved into the future. That means cpu_base-&gt;expires_next becomes stale
and subsequent reprogramming attempts fail as well until the situation is
cleaned up by an hrtimer interrupts.

For some yet unknown reason this leads to a complete stall, so for now
partially revert the offending commit to a known working state. The root
cause for the stall is still investigated and will be fixed in a subsequent
commit.

Fixes: b14bca97c9f5 ("hrtimer: Consolidate reprogramming code")
Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Reported-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Mike Galbraith &lt;efault@gmx.de&gt;
Link: https://lore.kernel.org/r/8735recskh.ffs@tglx

</content>
</entry>
<entry>
<title>hrtimer: Use raw_cpu_ptr() in clock_was_set()</title>
<updated>2021-08-12T20:34:40Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-08-12T20:31:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9482fd71dbb8f0d1a61821a83e467dc0a9d7b429'/>
<id>urn:sha1:9482fd71dbb8f0d1a61821a83e467dc0a9d7b429</id>
<content type='text'>
clock_was_set() can be invoked from preemptible context. Use raw_cpu_ptr()
to check whether high resolution mode is active or not. It does not matter
whether the task migrates after acquiring the pointer.

Fixes: e71a4153b7c2 ("hrtimer: Force clock_was_set() handling for the HIGHRES=n, NOHZ=y case")
Reported-by: Mike Galbraith &lt;efault@gmx.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/875ywacsmb.ffs@tglx

</content>
</entry>
<entry>
<title>hrtimer: Avoid more SMP function calls in clock_was_set()</title>
<updated>2021-08-10T15:57:23Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-07-13T13:39:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1e7f7fbcd40c69d23e3fe641ead9f3dc128fa8aa'/>
<id>urn:sha1:1e7f7fbcd40c69d23e3fe641ead9f3dc128fa8aa</id>
<content type='text'>
By unconditionally updating the offsets there are more indicators
whether the SMP function calls on clock_was_set() can be avoided:

  - When the offset update already happened on the remote CPU then the
    remote update attempt will yield the same seqeuence number and no
    IPI is required.

  - When the remote CPU is currently handling hrtimer_interrupt(). In
    that case the remote CPU will reevaluate the timer bases before
    reprogramming anyway, so nothing to do.

  - After updating it can be checked whether the first expiring timer in
    the affected clock bases moves before the first expiring (softirq)
    timer of the CPU. If that's not the case then sending the IPI is not
    required.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210713135158.887322464@linutronix.de

</content>
</entry>
<entry>
<title>hrtimer: Avoid unnecessary SMP function calls in clock_was_set()</title>
<updated>2021-08-10T15:57:23Z</updated>
<author>
<name>Marcelo Tosatti</name>
<email>mtosatti@redhat.com</email>
</author>
<published>2021-07-13T13:39:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=81d741d3460ca422843ce0ec8351083f259c6166'/>
<id>urn:sha1:81d741d3460ca422843ce0ec8351083f259c6166</id>
<content type='text'>
Setting of clocks triggers an unconditional SMP function call on all online
CPUs to reprogram the clock event device.

However, only some clocks have their offsets updated and therefore
potentially require a reprogram. That's CLOCK_REALTIME and CLOCK_TAI and in
the case of resume (delayed sleep time injection) also CLOCK_BOOTTIME.

Instead of sending an IPI unconditionally, check each per CPU hrtimer base
whether it has active timers in the affected clock bases which are
indicated by the caller in the @bases argument of clock_was_set().

If that's not the case, skip the IPI and update the offsets remotely which
ensures that any subsequently armed timers on the affected clocks are
evaluated with the correct offsets.

[ tglx: Adopted to the new bases argument, removed the softirq_active
  	check, added comment, fixed up stale comment ]

Signed-off-by: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210713135158.787536542@linutronix.de

</content>
</entry>
<entry>
<title>hrtimer: Add bases argument to clock_was_set()</title>
<updated>2021-08-10T15:57:23Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2021-07-13T13:39:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17a1b8826b451c80e7999a7c68e06b70579b2b8f'/>
<id>urn:sha1:17a1b8826b451c80e7999a7c68e06b70579b2b8f</id>
<content type='text'>
clock_was_set() unconditionaly invokes retrigger_next_event() on all online
CPUs. This was necessary because that mechanism was also used for resume
from suspend to idle which is not longer the case.

The bases arguments allows the callers of clock_was_set() to hand in a mask
which tells clock_was_set() which of the hrtimer clock bases are affected
by the clock setting. This mask will be used in the next step to check
whether a CPU base has timers queued on a clock base affected by the event
and avoid the SMP function call if there are none.

Add a @bases argument, provide defines for the active bases masking and
fixup all callsites.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210713135158.691083465@linutronix.de

</content>
</entry>
</feed>
