<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v6.11.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.11.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.11.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-10-22T13:51:23Z</updated>
<entry>
<title>fgraph: Use CPU hotplug mechanism to initialize idle shadow stacks</title>
<updated>2024-10-22T13:51:23Z</updated>
<author>
<name>Steven Rostedt</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2024-10-19T01:43:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=79c78057dc5d1f0439cbf5dd7be6915b442edefd'/>
<id>urn:sha1:79c78057dc5d1f0439cbf5dd7be6915b442edefd</id>
<content type='text'>
commit 2c02f7375e658ae93d57a31a66f91b62754ef8f1 upstream.

The function graph infrastructure allocates a shadow stack for every task
when enabled. This includes the idle tasks. The first time the function
graph is invoked, the shadow stacks are created and never freed until the
task exits. This includes the idle tasks.

Only the idle tasks that were for online CPUs had their shadow stacks
created when function graph tracing started. If function graph tracing is
enabled and a CPU comes online, the idle task representing that CPU will
not have its shadow stack created, and all function graph tracing for that
idle task will be silently dropped.

Instead, use the CPU hotplug mechanism to allocate the idle shadow stacks.
This will include idle tasks for CPUs that come online during tracing.

This issue can be reproduced by:

 # cd /sys/kernel/tracing
 # echo 0 &gt; /sys/devices/system/cpu/cpu1/online
 # echo 0 &gt; set_ftrace_pid
 # echo function_graph &gt; current_tracer
 # echo 1 &gt; options/funcgraph-proc
 # echo 1 &gt; /sys/devices/system/cpu/cpu1
 # grep '&lt;idle&gt;' per_cpu/cpu1/trace | head

Before, nothing would show up.

After:
 1)    &lt;idle&gt;-0    |   0.811 us    |                        __enqueue_entity();
 1)    &lt;idle&gt;-0    |   5.626 us    |                      } /* enqueue_entity */
 1)    &lt;idle&gt;-0    |               |                      dl_server_update_idle_time() {
 1)    &lt;idle&gt;-0    |               |                        dl_scaled_delta_exec() {
 1)    &lt;idle&gt;-0    |   0.450 us    |                          arch_scale_cpu_capacity();
 1)    &lt;idle&gt;-0    |   1.242 us    |                        }
 1)    &lt;idle&gt;-0    |   1.908 us    |                      }
 1)    &lt;idle&gt;-0    |               |                      dl_server_start() {
 1)    &lt;idle&gt;-0    |               |                        enqueue_dl_entity() {
 1)    &lt;idle&gt;-0    |               |                          task_contending() {

Note, if tracing stops and restarts, the old way would then initialize
the onlined CPUs.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/20241018214300.6df82178@rorschach
Fixes: 868baf07b1a25 ("ftrace: Fix memory leak with function graph and cpu hotplug")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>posix-clock: Fix missing timespec64 check in pc_clock_settime()</title>
<updated>2024-10-22T13:51:18Z</updated>
<author>
<name>Jinjie Ruan</name>
<email>ruanjinjie@huawei.com</email>
</author>
<published>2024-10-09T07:23:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1ff7247101af723731ea42ed565d54fb8f341264'/>
<id>urn:sha1:1ff7247101af723731ea42ed565d54fb8f341264</id>
<content type='text'>
commit d8794ac20a299b647ba9958f6d657051fc51a540 upstream.

As Andrew pointed out, it will make sense that the PTP core
checked timespec64 struct's tv_sec and tv_nsec range before calling
ptp-&gt;info-&gt;settime64().

As the man manual of clock_settime() said, if tp.tv_sec is negative or
tp.tv_nsec is outside the range [0..999,999,999], it should return EINVAL,
which include dynamic clocks which handles PTP clock, and the condition is
consistent with timespec64_valid(). As Thomas suggested, timespec64_valid()
only check the timespec is valid, but not ensure that the time is
in a valid range, so check it ahead using timespec64_valid_strict()
in pc_clock_settime() and return -EINVAL if not valid.

There are some drivers that use tp-&gt;tv_sec and tp-&gt;tv_nsec directly to
write registers without validity checks and assume that the higher layer
has checked it, which is dangerous and will benefit from this, such as
hclge_ptp_settime(), igb_ptp_settime_i210(), _rcar_gen4_ptp_settime(),
and some drivers can remove the checks of itself.

Cc: stable@vger.kernel.org
Fixes: 0606f422b453 ("posix clocks: Introduce dynamic clocks")
Acked-by: Richard Cochran &lt;richardcochran@gmail.com&gt;
Suggested-by: Andrew Lunn &lt;andrew@lunn.ch&gt;
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Jinjie Ruan &lt;ruanjinjie@huawei.com&gt;
Link: https://patch.msgid.link/20241009072302.1754567-2-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>kthread: unpark only parked kthread</title>
<updated>2024-10-17T13:27:02Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2024-09-13T21:46:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cda5423c1a1c906062ef235c940f249b97d9d135'/>
<id>urn:sha1:cda5423c1a1c906062ef235c940f249b97d9d135</id>
<content type='text'>
commit 214e01ad4ed7158cab66498810094fac5d09b218 upstream.

Calling into kthread unparking unconditionally is mostly harmless when
the kthread is already unparked. The wake up is then simply ignored
because the target is not in TASK_PARKED state.

However if the kthread is per CPU, the wake up is preceded by a call
to kthread_bind() which expects the task to be inactive and in
TASK_PARKED state, which obviously isn't the case if it is unparked.

As a result, calling kthread_stop() on an unparked per-cpu kthread
triggers such a warning:

	WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525
	 &lt;TASK&gt;
	 kthread_stop+0x17a/0x630 kernel/kthread.c:707
	 destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810
	 wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257
	 netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693
	 default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769
	 ops_exit_list net/core/net_namespace.c:178 [inline]
	 cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640
	 process_one_work kernel/workqueue.c:3231 [inline]
	 process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
	 worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
	 kthread+0x2f0/0x390 kernel/kthread.c:389
	 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
	 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
	 &lt;/TASK&gt;

Fix this with skipping unecessary unparking while stopping a kthread.

Link: https://lkml.kernel.org/r/20240913214634.12557-1-frederic@kernel.org
Fixes: 5c25b5ff89f0 ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU")
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reported-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Tested-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Hillf Danton &lt;hdanton@sina.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Fix rcuog wake-up from offline softirq</title>
<updated>2024-10-17T13:26:52Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>frederic@kernel.org</email>
</author>
<published>2024-10-10T16:36:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e66b1e01f2eb3209d08122572f41f7838b79540d'/>
<id>urn:sha1:e66b1e01f2eb3209d08122572f41f7838b79540d</id>
<content type='text'>
[ Upstream commit f7345ccc62a4b880cf76458db5f320725f28e400 ]

After a CPU has set itself offline and before it eventually calls
rcutree_report_cpu_dead(), there are still opportunities for callbacks
to be enqueued, for example from a softirq. When that happens on NOCB,
the rcuog wake-up is deferred through an IPI to an online CPU in order
not to call into the scheduler and risk arming the RT-bandwidth after
hrtimers have been migrated out and disabled.

But performing a synchronized IPI from a softirq is buggy as reported in
the following scenario:

        WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single
        Modules linked in: rcutorture torture
        CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1
        Stopper: multi_cpu_stop+0x0/0x320 &lt;- __stop_cpus+0xd0/0x120
        RIP: 0010:smp_call_function_single
        &lt;IRQ&gt;
        swake_up_one_online
        __call_rcu_nocb_wake
        __call_rcu_common
        ? rcu_torture_one_read
        call_timer_fn
        __run_timers
        run_timer_softirq
        handle_softirqs
        irq_exit_rcu
        ? tick_handle_periodic
        sysvec_apic_timer_interrupt
        &lt;/IRQ&gt;

Fix this with forcing deferred rcuog wake up through the NOCB timer when
the CPU is offline. The actual wake up will happen from
rcutree_report_cpu_dead().

Reported-by: kernel test robot &lt;oliver.sang@intel.com&gt;
Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com
Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU")
Reviewed-by: "Joel Fernandes (Google)" &lt;joel@joelfernandes.org&gt;
Signed-off-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Signed-off-by: Neeraj Upadhyay &lt;neeraj.upadhyay@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Prevent tail call between progs attached to different hooks</title>
<updated>2024-10-17T13:26:34Z</updated>
<author>
<name>Xu Kuohai</name>
<email>xukuohai@huawei.com</email>
</author>
<published>2024-07-19T11:00:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=88c2a10e6c176c2860cd0659f4c0e9d20b3f64d1'/>
<id>urn:sha1:88c2a10e6c176c2860cd0659f4c0e9d20b3f64d1</id>
<content type='text'>
[ Upstream commit 28ead3eaabc16ecc907cfb71876da028080f6356 ]

bpf progs can be attached to kernel functions, and the attached functions
can take different parameters or return different return values. If
prog attached to one kernel function tail calls prog attached to another
kernel function, the ctx access or return value verification could be
bypassed.

For example, if prog1 is attached to func1 which takes only 1 parameter
and prog2 is attached to func2 which takes two parameters. Since verifier
assumes the bpf ctx passed to prog2 is constructed based on func2's
prototype, verifier allows prog2 to access the second parameter from
the bpf ctx passed to it. The problem is that verifier does not prevent
prog1 from passing its bpf ctx to prog2 via tail call. In this case,
the bpf ctx passed to prog2 is constructed from func1 instead of func2,
that is, the assumption for ctx access verification is bypassed.

Another example, if BPF LSM prog1 is attached to hook file_alloc_security,
and BPF LSM prog2 is attached to hook bpf_lsm_audit_rule_known. Verifier
knows the return value rules for these two hooks, e.g. it is legal for
bpf_lsm_audit_rule_known to return positive number 1, and it is illegal
for file_alloc_security to return positive number. So verifier allows
prog2 to return positive number 1, but does not allow prog1 to return
positive number. The problem is that verifier does not prevent prog1
from calling prog2 via tail call. In this case, prog2's return value 1
will be used as the return value for prog1's hook file_alloc_security.
That is, the return value rule is bypassed.

This patch adds restriction for tail call to prevent such bypasses.

Signed-off-by: Xu Kuohai &lt;xukuohai@huawei.com&gt;
Link: https://lore.kernel.org/r/20240719110059.797546-4-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Check percpu map value size first</title>
<updated>2024-10-17T13:26:33Z</updated>
<author>
<name>Tao Chen</name>
<email>chen.dylane@gmail.com</email>
</author>
<published>2024-09-10T14:41:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a5f4ee787cdda12db41d50fbb6201e788deb8ad4'/>
<id>urn:sha1:a5f4ee787cdda12db41d50fbb6201e788deb8ad4</id>
<content type='text'>
[ Upstream commit 1d244784be6b01162b732a5a7d637dfc024c3203 ]

Percpu map is often used, but the map value size limit often ignored,
like issue: https://github.com/iovisor/bcc/issues/2519. Actually,
percpu map value size is bound by PCPU_MIN_UNIT_SIZE, so we
can check the value size whether it exceeds PCPU_MIN_UNIT_SIZE first,
like percpu map of local_storage. Maybe the error message seems clearer
compared with "cannot allocate memory".

Signed-off-by: Jinke Han &lt;jinkehan@didiglobal.com&gt;
Signed-off-by: Tao Chen &lt;chen.dylane@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20240910144111.1464912-2-chen.dylane@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: Call the missed btf_record_free() when map creation fails</title>
<updated>2024-10-17T13:26:33Z</updated>
<author>
<name>Hou Tao</name>
<email>houtao1@huawei.com</email>
</author>
<published>2024-09-12T01:28:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=be62de5079901656260881f15dbfe43ce24e6ec3'/>
<id>urn:sha1:be62de5079901656260881f15dbfe43ce24e6ec3</id>
<content type='text'>
[ Upstream commit 87e9675a0dfd0bf4a36550e4a0e673038ec67aee ]

When security_bpf_map_create() in map_create() fails, map_create() will
call btf_put() and -&gt;map_free() callback to free the map. It doesn't
free the btf_record of map value, so add the missed btf_record_free()
when map creation fails.

However btf_record_free() needs to be called after -&gt;map_free() just
like bpf_map_free_deferred() did, because -&gt;map_free() may use the
btf_record to free the special fields in preallocated map value. So
factor out bpf_map_free() helper to free the map, btf_record, and btf
orderly and use the helper in both map_create() and
bpf_map_free_deferred().

Signed-off-by: Hou Tao &lt;houtao1@huawei.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: https://lore.kernel.org/r/20240912012845.3458483-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>uprobes: fix kernel info leak via "[uprobes]" vma</title>
<updated>2024-10-10T10:04:14Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2024-10-07T17:46:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9634e8dc964a4adafa7e1535147abd7ec29441a6'/>
<id>urn:sha1:9634e8dc964a4adafa7e1535147abd7ec29441a6</id>
<content type='text'>
commit 34820304cc2cd1804ee1f8f3504ec77813d29c8e upstream.

xol_add_vma() maps the uninitialized page allocated by __create_xol_area()
into userspace. On some architectures (x86) this memory is readable even
without VM_READ, VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ,
although this doesn't really matter, debugger can read this memory anyway.

Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/

Reported-by: Will Deacon &lt;will@kernel.org&gt;
Fixes: d4b3b6384f98 ("uprobes/core: Allocate XOL slots for uprobes use")
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>close_range(): fix the logics in descriptor table trimming</title>
<updated>2024-10-10T10:04:11Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-08-16T19:17:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=19497297ecb4945a8aa61adb8e438975f49c807b'/>
<id>urn:sha1:19497297ecb4945a8aa61adb8e438975f49c807b</id>
<content type='text'>
commit 678379e1d4f7443b170939525d3312cfc37bf86b upstream.

Cloning a descriptor table picks the size that would cover all currently
opened files.  That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.

Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().

The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.

That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.

It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size().  There we are under -&gt;files_lock,
so the race is trivially avoided.

So we do the following:
	* switch close_files() from calling unshare_fd() to calling
dup_fd().
	* undo the calling convention change done to unshare_fd() in
60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
	* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting.  NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
	* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
	* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.

Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tracing/timerlat: Fix duplicated kthread creation due to CPU online/offline</title>
<updated>2024-10-10T10:04:10Z</updated>
<author>
<name>Wei Li</name>
<email>liwei391@huawei.com</email>
</author>
<published>2024-09-24T09:45:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=37acf0ea6155d411860cf57da04607f6aede603c'/>
<id>urn:sha1:37acf0ea6155d411860cf57da04607f6aede603c</id>
<content type='text'>
commit 0bb0a5c12ecf36ad561542bbb95f96355e036a02 upstream.

osnoise_hotplug_workfn() is the asynchronous online callback for
"trace/osnoise:online". It may be congested when a CPU goes online and
offline repeatedly and is invoked for multiple times after a certain
online.

This will lead to kthread leak and timer corruption. Add a check
in start_kthread() to prevent this situation.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Link: https://lore.kernel.org/20240924094515.3561410-2-liwei391@huawei.com
Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations")
Signed-off-by: Wei Li &lt;liwei391@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
