<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/entry, branch next/master</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=next%2Fmaster</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=next%2Fmaster'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2026-02-27T15:40:13Z</updated>
<entry>
<title>entry: Prepare for deferred hrtimer rearming</title>
<updated>2026-02-27T15:40:13Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2026-02-24T16:38:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0e98eb14814ef669e07ca6effaa03df2e57ef956'/>
<id>urn:sha1:0e98eb14814ef669e07ca6effaa03df2e57ef956</id>
<content type='text'>
The hrtimer interrupt expires timers and at the end of the interrupt it
rearms the clockevent device for the next expiring timer.

That's obviously correct, but in the case that a expired timer sets
NEED_RESCHED the return from interrupt ends up in schedule(). If HRTICK is
enabled then schedule() will modify the hrtick timer, which causes another
reprogramming of the hardware.

That can be avoided by deferring the rearming to the return from interrupt
path and if the return results in a immediate schedule() invocation then it
can be deferred until the end of schedule(), which avoids multiple rearms
and re-evaluation of the timer wheel.

As this is only relevant for interrupt to user return split the work masks
up and hand them in as arguments from the relevant exit to user functions,
which allows the compiler to optimize the deferred handling out for the
syscall exit to user case.

Add the rearm checks to the approritate places in the exit to user loop and
the interrupt return to kernel path, so that the rearming is always
guaranteed.

In the return to user space path this is handled in the same way as
TIF_RSEQ to avoid extra instructions in the fast path, which are truly
hurtful for device interrupt heavy work loads as the extra instructions and
conditionals while benign at first sight accumulate quickly into measurable
regressions. The return from syscall path is completely unaffected due to
the above mentioned split so syscall heavy workloads wont have any extra
burden.

For now this is just placing empty stubs at the right places which are all
optimized out by the compiler until the actual functionality is in place.

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://patch.msgid.link/20260224163431.066469985@kernel.org
</content>
</entry>
<entry>
<title>Merge branch 'core/entry' into sched/core</title>
<updated>2026-01-30T14:40:05Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-01-30T14:40:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5c4378b7b0e1f45cf38e77db9305ee2c7bf88002'/>
<id>urn:sha1:5c4378b7b0e1f45cf38e77db9305ee2c7bf88002</id>
<content type='text'>
Pull the entry update to avoid merge conflicts with the time slice
extension changes.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
</content>
</entry>
<entry>
<title>entry: Inline syscall_exit_work() and syscall_trace_enter()</title>
<updated>2026-01-30T14:38:10Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2026-01-28T03:19:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=31c9387d0d84bc1d643a0c30155b6d92d05c92fc'/>
<id>urn:sha1:31c9387d0d84bc1d643a0c30155b6d92d05c92fc</id>
<content type='text'>
After switching ARM64 to the generic entry code, a syscall_exit_work()
appeared as a profiling hotspot because it is not inlined.

Inlining both syscall_trace_enter() and syscall_exit_work() provides a
performance gain when any of the work items is enabled. With audit enabled
this results in a ~4% performance gain for perf bench basic syscall on
a kunpeng920 system:

    | Metric     | Baseline    | Inlined     | Change  |
    | ---------- | ----------- | ----------- | ------  |
    | Total time | 2.353 [sec] | 2.264 [sec] |  ↓3.8%  |
    | usecs/op   | 0.235374    | 0.226472    |  ↓3.8%  |
    | ops/sec    | 4,248,588   | 4,415,554   |  ↑3.9%  |

Small gains can be observed on x86 as well, though the generated code
optimizes for the work case, which is counterproductive for high
performance scenarios where such entry/exit work is usually avoided.

Avoid this by marking the work check in syscall_enter_from_user_mode_work()
unlikely, which is what the corresponding check in the exit path does
already.

[ tglx: Massage changelog and add the unlikely() ]

Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Link: https://patch.msgid.link/20260128031934.3906955-14-ruanjinjie@huawei.com
</content>
</entry>
<entry>
<title>entry: Add arch_ptrace_report_syscall_entry/exit()</title>
<updated>2026-01-30T14:38:09Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2026-01-28T03:19:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=578b21fd3ab2d9901ce40ed802e428a41a40610d'/>
<id>urn:sha1:578b21fd3ab2d9901ce40ed802e428a41a40610d</id>
<content type='text'>
ARM64 requires a architecture specific ptrace wrapper as it needs to save
and restore scratch registers.

Provide arch_ptrace_report_syscall_entry/exit() wrappers which fall back to
ptrace_report_syscall_entry/exit() if the architecture does not provide
them.

No functional change intended.

[ tglx: Massaged changelog and comments ]

Suggested-by: Mark Rutland &lt;mark.rutland@arm.com&gt;
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Reviewed-by: Kevin Brodsky &lt;kevin.brodsky@arm.com&gt;
Link: https://patch.msgid.link/20260128031934.3906955-11-ruanjinjie@huawei.com
</content>
</entry>
<entry>
<title>entry: Remove unused syscall argument from syscall_trace_enter()</title>
<updated>2026-01-30T14:38:09Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2026-01-28T03:19:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=03150a9f84b328f5c724b8ed9ff8600c2d7e2d7b'/>
<id>urn:sha1:03150a9f84b328f5c724b8ed9ff8600c2d7e2d7b</id>
<content type='text'>
The 'syscall' argument of syscall_trace_enter() is immediately overwritten
before any real use and serves only as a local variable, so drop the
parameter.

No functional change intended.

Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Link: https://patch.msgid.link/20260128031934.3906955-2-ruanjinjie@huawei.com
</content>
</entry>
<entry>
<title>entry: Hook up rseq time slice extension</title>
<updated>2026-01-22T10:11:19Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-12-15T16:52:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c78aaec19b0621bf952756670c8b066a55202fe'/>
<id>urn:sha1:3c78aaec19b0621bf952756670c8b066a55202fe</id>
<content type='text'>
Wire the grant decision function up in exit_to_user_mode_loop()

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://patch.msgid.link/20251215155709.258157362@linutronix.de
</content>
</entry>
<entry>
<title>rseq: Implement syscall entry work for time slice extensions</title>
<updated>2026-01-22T10:11:18Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-12-15T16:52:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd0a04606937af5810e9117d343ee3792635bd3d'/>
<id>urn:sha1:dd0a04606937af5810e9117d343ee3792635bd3d</id>
<content type='text'>
The kernel sets SYSCALL_WORK_RSEQ_SLICE when it grants a time slice
extension. This allows to handle the rseq_slice_yield() syscall, which is
used by user space to relinquish the CPU after finishing the critical
section for which it requested an extension.

In case the kernel state is still GRANTED, the kernel resets both kernel
and user space state with a set of sanity checks. If the kernel state is
already cleared, then this raced against the timer or some other interrupt
and just clears the work bit.

Doing it in syscall entry work allows to catch misbehaving user space,
which issues an arbitrary syscall, i.e. not rseq_slice_yield(), from the
critical section. Contrary to the initial strict requirement to use
rseq_slice_yield() arbitrary syscalls are not considered a violation of the
ABI contract anymore to allow onion architecture applications, which cannot
control the code inside a critical section, to utilize this as well.

If the code detects inconsistent user space that result in a SIGSEGV for
the application.

If the grant was still active and the task was not preempted yet, the work
code reschedules immediately before continuing through the syscall.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://patch.msgid.link/20251215155709.005777059@linutronix.de
</content>
</entry>
<entry>
<title>rseq: Switch to TIF_RSEQ if supported</title>
<updated>2025-11-04T07:35:37Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-10-27T08:45:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=32034df66b5f49626aa450ceaf1849a08d87906e'/>
<id>urn:sha1:32034df66b5f49626aa450ceaf1849a08d87906e</id>
<content type='text'>
TIF_NOTIFY_RESUME is a multiplexing TIF bit, which is suboptimal especially
with the RSEQ fast path depending on it, but not really handling it.

Define a separate TIF_RSEQ in the generic TIF space and enable the full
separation of fast and slow path for architectures which utilize that.

That avoids the hassle with invocations of resume_user_mode_work() from
hypervisors, which clear TIF_NOTIFY_RESUME. It makes the therefore required
re-evaluation at the end of vcpu_run() a NOOP on architectures which
utilize the generic TIF space and have a separate TIF_RSEQ.

The hypervisor TIF handling does not include the separate TIF_RSEQ as there
is no point in doing so. The guest does neither know nor care about the VMM
host applications RSEQ state. That state is only relevant when the ioctl()
returns to user space.

The fastpath implementation still utilizes TIF_NOTIFY_RESUME for failure
handling, but this only happens within exit_to_user_mode_loop(), so
arguably the hypervisor ioctl() code is long done when this happens.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://patch.msgid.link/20251027084307.903622031@linutronix.de
</content>
</entry>
<entry>
<title>rseq: Switch to fast path processing on exit to user</title>
<updated>2025-11-04T07:34:39Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-10-27T08:45:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3db6b38dfe640207da706b286d4181237391f5bd'/>
<id>urn:sha1:3db6b38dfe640207da706b286d4181237391f5bd</id>
<content type='text'>
Now that all bits and pieces are in place, hook the RSEQ handling fast path
function into exit_to_user_mode_prepare() after the TIF work bits have been
handled. If case of fast path failure, TIF_NOTIFY_RESUME has been raised
and the caller needs to take another turn through the TIF handling slow
path.

This only works for architectures which use the generic entry code.
Architectures who still have their own incomplete hacks are not supported
and won't be.

This results in the following improvements:

  Kernel build	       Before		  After		      Reduction

  exit to user         80692981		  80514451
  signal checks:          32581		       121	       99%
  slowpath runs:        1201408   1.49%	       198 0.00%      100%
  fastpath runs:			    675941 0.84%       N/A
  id updates:           1233989   1.53%	     50541 0.06%       96%
  cs checks:            1125366   1.39%	         0 0.00%      100%
    cs cleared:         1125366      100%	 0            100%
    cs fixup:                 0        0%	 0

  RSEQ selftests      Before		  After		      Reduction

  exit to user:       386281778		  387373750
  signal checks:       35661203		          0           100%
  slowpath runs:      140542396 36.38%	        100  0.00%    100%
  fastpath runs:			    9509789  2.51%     N/A
  id updates:         176203599 45.62%	    9087994  2.35%     95%
  cs checks:          175587856 45.46%	    4728394  1.22%     98%
    cs cleared:       172359544   98.16%    1319307   27.90%   99%
    cs fixup:           3228312    1.84%    3409087   72.10%

The 'cs cleared' and 'cs fixup' percentages are not relative to the exit to
user invocations, they are relative to the actual 'cs check' invocations.

While some of this could have been avoided in the original code, like the
obvious clearing of CS when it's already clear, the main problem of going
through TIF_NOTIFY_RESUME cannot be solved. In some workloads the RSEQ
notify handler is invoked more than once before going out to user
space. Doing this once when everything has stabilized is the only solution
to avoid this.

The initial attempt to completely decouple it from the TIF work turned out
to be suboptimal for workloads, which do a lot of quick and short system
calls. Even if the fast path decision is only 4 instructions (including a
conditional branch), this adds up quickly and becomes measurable when the
rate for actually having to handle rseq is in the low single digit
percentage range of user/kernel transitions.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://patch.msgid.link/20251027084307.701201365@linutronix.de
</content>
</entry>
<entry>
<title>entry: Inline irqentry_enter/exit_from/to_user_mode()</title>
<updated>2025-11-04T07:31:47Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-10-27T08:44:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7702a9c2856794b6bf961b408eba3bacb753bd5b'/>
<id>urn:sha1:7702a9c2856794b6bf961b408eba3bacb753bd5b</id>
<content type='text'>
There is no point to have this as a function which just inlines
enter_from_user_mode(). The function call overhead is larger than the
function itself.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://patch.msgid.link/20251027084306.715309918@linutronix.de
</content>
</entry>
</feed>
