<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v6.6.29</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.29</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.29'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-04-27T15:11:41Z</updated>
<entry>
<title>sched: Add missing memory barrier in switch_mm_cid</title>
<updated>2024-04-27T15:11:41Z</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@efficios.com</email>
</author>
<published>2024-04-15T15:21:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7fce9f0f4810da56d44aa08b8e99d64b0acfd97c'/>
<id>urn:sha1:7fce9f0f4810da56d44aa08b8e99d64b0acfd97c</id>
<content type='text'>
commit fe90f3967bdb3e13f133e5f44025e15f943a99c5 upstream.

Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:

    commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")

If switch_mm() doesn't issue an smp_mb(), sched_mm_cid_remote_clear() can
clear an actively used cid when it fails to observe the active task after
it sets lazy_put.

There *is* a memory barrier between storing to rq-&gt;curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq-&gt;curr
and switch_mm_cid(), which happens earlier than:

  - spin_unlock(),
  - switch_to().

So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock().

This is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have a memory barrier in switch_mm(), but instead provide the
full barrier in finish_lock_switch() or switch_to(), provide it too late
for the needs of switch_mm_cid().

Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.

Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3.
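
A minimal sketch of the resulting shape, assuming the generic fallback
lives in include/asm-generic/barrier.h and the x86 no-op override in
arch/x86/include/asm/barrier.h (the comments here are illustrative):

  /* Generic fallback: a full barrier unless the arch overrides it. */
  #ifndef smp_mb__after_switch_mm
  #define smp_mb__after_switch_mm()	smp_mb()
  #endif

  /* x86: the CR3 write in switch_mm() already implies a full
   * barrier, so the override can be a no-op. */
  #define smp_mb__after_switch_mm()	do { } while (0)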

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Reported-by: levi.yun &lt;yeoreum.yun@arm.com&gt;
Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt; # for arm64
Acked-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt; # for x86
Cc: &lt;stable@vger.kernel.org&gt; # 6.4.x
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20240415152114.59122-2-mathieu.desnoyers@efficios.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n</title>
<updated>2024-04-17T09:19:36Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2024-04-09T17:51:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2978ee7c973ce81b6e51100ba1e5ae001af624b9'/>
<id>urn:sha1:2978ee7c973ce81b6e51100ba1e5ae001af624b9</id>
<content type='text'>
commit f337a6a21e2fd67eadea471e93d05dd37baaa9be upstream.

Initialize cpu_mitigations to CPU_MITIGATIONS_OFF if the kernel is built
with CONFIG_SPECULATION_MITIGATIONS=n, as the help text quite clearly
states that disabling SPECULATION_MITIGATIONS is supposed to turn off all
mitigations by default.

  │ If you say N, all mitigations will be disabled. You really
  │ should know what you are doing to say so.

As is, the kernel still defaults to CPU_MITIGATIONS_AUTO, which results in
some mitigations being enabled in spite of SPECULATION_MITIGATIONS=n.
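
A sketch of the intended initialization in kernel/cpu.c (the variable
name and attributes follow the upstream code as understood here):

  /* Default to "auto" only when mitigation support is compiled in. */
  static enum cpu_mitigations cpu_mitigations __ro_after_init =
  	IS_ENABLED(CONFIG_SPECULATION_MITIGATIONS) ? CPU_MITIGATIONS_AUTO :
  						     CPU_MITIGATIONS_OFF;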

Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Daniel Sneddon &lt;daniel.sneddon@linux.intel.com&gt;
Cc: stable@vger.kernel.org
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20240409175108.1512861-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kprobes: Fix possible use-after-free issue on kprobe registration</title>
<updated>2024-04-17T09:19:34Z</updated>
<author>
<name>Zheng Yejian</name>
<email>zhengyejian1@huawei.com</email>
</author>
<published>2024-04-10T01:58:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d15023fb407337028a654237d8968fefdcf87c2f'/>
<id>urn:sha1:d15023fb407337028a654237d8968fefdcf87c2f</id>
<content type='text'>
commit 325f3fb551f8cd672dbbfc4cf58b14f9ee3fc9e8 upstream.

When unloading a module, its state changes from MODULE_STATE_LIVE to
MODULE_STATE_GOING to MODULE_STATE_UNFORMED, and each transition takes
some time. `is_module_text_address()` and `__module_text_address()`
work with MODULE_STATE_LIVE and MODULE_STATE_GOING.
If we use `is_module_text_address()` and `__module_text_address()`
separately, there is a chance that the first one succeeds but the
next one fails because module-&gt;state becomes MODULE_STATE_UNFORMED
between those operations.

In `check_kprobe_address_safe()`, if the second `__module_text_address()`
fails, the failure is ignored because the address is assumed to be a
kernel text address. But it may have failed simply because module-&gt;state
has been changed to MODULE_STATE_UNFORMED. In that case, arm_kprobe()
will try to modify a nonexistent module text address (use-after-free).

To fix this problem, do not use `is_module_text_address()` and
`__module_text_address()` separately; instead, call
`__module_text_address()` only once and pin the module with
`try_module_get(module)`, which only succeeds while the module is
MODULE_STATE_LIVE.
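
A hedged sketch of the resulting pattern in check_kprobe_address_safe()
(variable names, error code, and surrounding locking are assumptions):

  /* Resolve the module once instead of re-checking with a
   * separate helper that can race with module unload. */
  *probed_mod = __module_text_address((unsigned long)p-&gt;addr);
  if (*probed_mod) {
  	/* try_module_get() only succeeds for MODULE_STATE_LIVE,
  	 * so a module that is going away is rejected here. */
  	if (!try_module_get(*probed_mod)) {
  		*probed_mod = NULL;
  		ret = -ENOENT;
  	}
  }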

Link: https://lore.kernel.org/all/20240410015802.265220-1-zhengyejian1@huawei.com/

Fixes: 28f6c37a2910 ("kprobes: Forbid probing on trampoline and BPF code areas")
Cc: stable@vger.kernel.org
Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tracing: hide unused ftrace_event_id_fops</title>
<updated>2024-04-17T09:19:33Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2024-04-03T08:06:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e3e1e80b692c14684bc2d4da890e38a9bab0ec0e'/>
<id>urn:sha1:e3e1e80b692c14684bc2d4da890e38a9bab0ec0e</id>
<content type='text'>
[ Upstream commit 5281ec83454d70d98b71f1836fb16512566c01cd ]

When CONFIG_PERF_EVENTS is disabled, a 'make W=1' build produces a warning
about the unused ftrace_event_id_fops variable:

kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=]
 2155 | static const struct file_operations ftrace_event_id_fops = {

Hide this in the same #ifdef as the reference to it.
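
A sketch of the shape of the fix (the fops members shown are
assumptions):

  #ifdef CONFIG_PERF_EVENTS
  /* Only defined when the "id" file that uses it is created. */
  static const struct file_operations ftrace_event_id_fops = {
  	.read = event_id_read,
  	.llseek = default_llseek,
  };
  #endif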

Link: https://lore.kernel.org/linux-trace-kernel/20240403080702.3509288-7-arnd@kernel.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Ajay Kaher &lt;akaher@vmware.com&gt;
Cc: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Cc: Clément Léger &lt;cleger@rivosinc.com&gt;
Cc: Dan Carpenter &lt;dan.carpenter@linaro.org&gt;
Cc: "Tzvetomir Stoyanov (VMware)" &lt;tz.stoyanov@gmail.com&gt;
Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()")
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>PM: s2idle: Make sure CPUs will wakeup directly on resume</title>
<updated>2024-04-17T09:19:26Z</updated>
<author>
<name>Anna-Maria Behnsen</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2024-04-08T07:02:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9fdfeef4c5419e1d6d952cd788d6714d2734f7c1'/>
<id>urn:sha1:9fdfeef4c5419e1d6d952cd788d6714d2734f7c1</id>
<content type='text'>
commit 3c89a068bfd0698a5478f4cf39493595ef757d5e upstream.

s2idle works like a regular suspend with freezing processes and freezing
devices. All CPUs except the control CPU go into idle. Once this is
completed the control CPU kicks all other CPUs out of idle, so that they
reenter the idle loop and then enter s2idle state. The control CPU then
issues an swait() on the suspend state and therefore enters the idle loop
as well.

Due to being kicked out of idle, the other CPUs leave their NOHZ states,
which means the tick is active and the corresponding hrtimer is programmed
to the next jiffie.

On entering s2idle the CPUs shut down their local clockevent device to
prevent wakeups. The last CPU which enters s2idle shuts down its local
clockevent and freezes timekeeping.

On resume, one of the CPUs receives the wakeup interrupt, unfreezes
timekeeping and its local clockevent and starts the resume process. At that
point all other CPUs are still in s2idle with their clockevents switched
off. They only resume when they are kicked by another CPU or after resuming
devices and then receiving a device interrupt.

That means there is no guarantee that all CPUs will wake up directly on
resume. As a consequence there is no guarantee that timers which are queued
on those CPUs and should expire directly after resume are handled. Also,
timer list timers which are remotely queued to one of those CPUs after
resume will not result in a reprogramming IPI as the tick is active.
Queueing a hrtimer will not result in a reprogramming IPI either, because
the first hrtimer event is already in the past.

The recent introduction of the timer pull model (7ee988770326 ("timers:
Implement the hierarchical pull model")) amplifies this problem if the
current migrator is one of the CPUs that were not woken up. When a
non-pinned timer list timer is queued and the queuing CPU goes idle, it
relies on the still suspended migrator CPU to expire the timer, which then
happens only by chance.

The problem has existed since commit 8d89835b0467 ("PM: suspend: Do not
pause cpuidle in the suspend-to-idle path"). There, the cpuidle_pause()
call, which in turn invoked a wakeup for all idle CPUs, was moved to a
later point in the resume process. That point might not be reached, or be
reached very late, because the resume process waits on a timer of a still
suspended CPU.

Address this by kicking all CPUs out of idle after the control CPU returns
from swait() so that they resume their timers and restore consistent system
state.
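
A hedged sketch of the fix in the s2idle wait path (exact placement in
kernel/power/suspend.c is an assumption):

  /* Control CPU waits here while the system stays suspended. */
  swait_event_exclusive(s2idle_wait_head,
  		      s2idle_state == S2IDLE_STATE_WAKE);

  /* On wakeup, kick every CPU out of idle so they restart their
   * ticks and handle timers that became due during suspend. */
  wake_up_all_idle_cpus();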

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Signed-off-by: Anna-Maria Behnsen &lt;anna-maria@linutronix.de&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Cc: 5.16+ &lt;stable@kernel.org&gt; # 5.16+
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ulf Hansson &lt;ulf.hansson@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: Only update pages_touched when a new page is touched</title>
<updated>2024-04-17T09:19:25Z</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-04-09T19:13:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a9cd92bc051f4a8eba0faee62011d4c7a5b98f0b'/>
<id>urn:sha1:a9cd92bc051f4a8eba0faee62011d4c7a5b98f0b</id>
<content type='text'>
commit ffe3986fece696cf65e0ef99e74c75f848be8e30 upstream.

The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.

It was observed that the amount read from the splice kept decreasing the
longer the trace ran. That is, if one asked for 60%, it would read over
60% when tracing first started, but over time it would be woken up at
under 60%, and the amount of data read after each wakeup slowly dropped
until it was far below the requested buffer percent.

This was due to incorrect accounting of the pages_touched increment. The
value is incremented whenever a writer transfers to a new sub-buffer, but
the place where it was incremented was wrong. If a writer overflows the
current sub-buffer it moves to the next one. If it gets preempted by an
interrupt at that time, and the interrupt performs a trace, the interrupt
will also end up going to the next sub-buffer. But only one of them should
increment the counter. Unfortunately, that was not the case.

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
increments the counter exactly once when the writer moves to a new
sub-buffer, rather than twice when a writer and its preempting writer race
to move to the same new sub-buffer.
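
A hedged sketch of the change in the tail-page switch (field names
follow the commit text; the surrounding code is an assumption):

  /* Before: unconditional increment, so a preempting writer racing
   * on the same switch bumped the counter a second time. */
  local_inc(&amp;cpu_buffer-&gt;pages_touched);
  (void)cmpxchg(&amp;cpu_buffer-&gt;tail_page, tail_page, next_page);

  /* After: only the writer that wins the tail-page switch counts it. */
  if (try_cmpxchg(&amp;cpu_buffer-&gt;tail_page, &amp;tail_page, next_page))
  	local_inc(&amp;cpu_buffer-&gt;pages_touched);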

Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>ring-buffer: use READ_ONCE() to read cpu_buffer-&gt;commit_page in concurrent environment</title>
<updated>2024-04-13T11:07:38Z</updated>
<author>
<name>linke li</name>
<email>lilinke99@qq.com</email>
</author>
<published>2024-03-02T04:42:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fb8579acacd0c39b7da0a647a8130c21f782fb5b'/>
<id>urn:sha1:fb8579acacd0c39b7da0a647a8130c21f782fb5b</id>
<content type='text'>
[ Upstream commit f1e30cb6369251c03f63c564006f96a54197dcc4 ]

In function ring_buffer_iter_empty(), cpu_buffer-&gt;commit_page is read
while other threads may change it. This may cause the time_stamp read on
the next line to come from a different page. Use READ_ONCE() to avoid
having to reason about compiler optimizations now and in the future.
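
A sketch of the pattern in ring_buffer_iter_empty() (the surrounding
code is an assumption):

  /* Snapshot the pointer once so the time_stamp load below is
   * guaranteed to come from the same page that was compared. */
  commit_page = READ_ONCE(cpu_buffer-&gt;commit_page);
  commit_ts = commit_page-&gt;page-&gt;time_stamp;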

Link: https://lore.kernel.org/linux-trace-kernel/tencent_DFF7D3561A0686B5E8FC079150A02505180A@qq.com

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: linke li &lt;lilinke99@qq.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Fix WARN_ON_ONCE() in the rcu_nocb_bypass_lock()</title>
<updated>2024-04-13T11:07:34Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang1211@gmail.com</email>
</author>
<published>2024-01-10T08:11:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4d58c9fb45c70e62c19e8be3f3605889c47601bc'/>
<id>urn:sha1:4d58c9fb45c70e62c19e8be3f3605889c47601bc</id>
<content type='text'>
[ Upstream commit dda98810b552fc6bf650f4270edeebdc2f28bd3f ]

For the kernels built with CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y and
CONFIG_RCU_LAZY=y, the following scenarios will trigger WARN_ON_ONCE()
in the rcu_nocb_bypass_lock() and rcu_nocb_wait_contended() functions:

        CPU2                                               CPU11
kthread
rcu_nocb_cb_kthread                                       ksys_write
rcu_do_batch                                              vfs_write
rcu_torture_timer_cb                                      proc_sys_write
__kmem_cache_free                                         proc_sys_call_handler
kmemleak_free                                             drop_caches_sysctl_handler
delete_object_full                                        drop_slab
__delete_object                                           shrink_slab
put_object                                                lazy_rcu_shrink_scan
call_rcu                                                  rcu_nocb_flush_bypass
__call_rcu_common                                           rcu_nocb_bypass_lock
                                                            raw_spin_trylock(&amp;rdp-&gt;nocb_bypass_lock) fail
                                                            atomic_inc(&amp;rdp-&gt;nocb_lock_contended);
rcu_nocb_wait_contended                                     WARN_ON_ONCE(smp_processor_id() != rdp-&gt;cpu);
 WARN_ON_ONCE(atomic_read(&amp;rdp-&gt;nocb_lock_contended))                                          |
                            |_ _ _ _ _ _ _ _ _ _same rdp and rdp-&gt;cpu != 11_ _ _ _ _ _ _ _ _ __|

Reproduce this bug with "echo 3 &gt; /proc/sys/vm/drop_caches".

This commit therefore uses rcu_nocb_try_flush_bypass() instead of
rcu_nocb_flush_bypass() in lazy_rcu_shrink_scan(). If the nocb_bypass
queue is already being flushed, rcu_nocb_try_flush_bypass() returns
directly instead of contending for the lock, as sketched below.
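
A hedged sketch of the call-site change in lazy_rcu_shrink_scan() (the
argument list and control flow are assumptions):

  /* Non-blocking variant: if another CPU already holds the bypass
   * lock, skip this rdp instead of spinning on a remote lock and
   * tripping the WARN_ON_ONCE() sanity checks. */
  if (!rcu_nocb_try_flush_bypass(rdp, jiffies))
  	continue;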

Signed-off-by: Zqiang &lt;qiang.zhang1211@gmail.com&gt;
Reviewed-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>dma-direct: Leak pages on dma_set_decrypted() failure</title>
<updated>2024-04-13T11:07:32Z</updated>
<author>
<name>Rick Edgecombe</name>
<email>rick.p.edgecombe@intel.com</email>
</author>
<published>2024-02-22T00:17:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4031b72ca747a1e6e9ae4fa729e765b43363d66a'/>
<id>urn:sha1:4031b72ca747a1e6e9ae4fa729e765b43363d66a</id>
<content type='text'>
[ Upstream commit b9fa16949d18e06bdf728a560f5c8af56d2bdcaf ]

On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.

DMA could free decrypted/shared pages if dma_set_decrypted() fails. This
should be a rare case. Just leak the pages in this case instead of
freeing them.
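
A hedged sketch of the error path in dma_direct_alloc() (the label
name is illustrative):

  /* A failed conversion may leave the pages shared with the
   * untrusted host; they must not reach the page allocator. */
  if (dma_set_decrypted(dev, ret, size))
  	goto out_leak_pages;

  out_leak_pages:
  	return NULL;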

Signed-off-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>panic: Flush kernel log buffer at the end</title>
<updated>2024-04-13T11:07:29Z</updated>
<author>
<name>John Ogness</name>
<email>john.ogness@linutronix.de</email>
</author>
<published>2024-02-07T13:41:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=750d44684a95c1ca0d6c8a7e2d24abb500790ee6'/>
<id>urn:sha1:750d44684a95c1ca0d6c8a7e2d24abb500790ee6</id>
<content type='text'>
[ Upstream commit d988d9a9b9d180bfd5c1d353b3b176cb90d6861b ]

If the kernel crashes in a context where printk() calls always
defer printing (such as in NMI or inside a printk_safe section)
then the final panic messages will be deferred to irq_work. But
if irq_work is not available, the messages will not get printed
unless explicitly flushed. The result is that the final
"end Kernel panic" banner does not get printed.

Add one final flush after the last printk() call to make sure
the final panic messages make it out as well.
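
A sketch of the added flush at the end of panic() (comment wording is
illustrative):

  /* The final messages may still sit in the log buffer if printing
   * was deferred (NMI, printk_safe) and irq_work never ran. Flush
   * them explicitly as the very last step. */
  console_flush_on_panic(CONSOLE_FLUSH_PENDING);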

Signed-off-by: John Ogness &lt;john.ogness@linutronix.de&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Link: https://lore.kernel.org/r/20240207134103.1357162-14-john.ogness@linutronix.de
Signed-off-by: Petr Mladek &lt;pmladek@suse.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
