<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v6.1.121</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.121</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.121'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-12-19T17:08:58Z</updated>
<entry>
<title>x86/static-call: provide a way to do very early static-call updates</title>
<updated>2024-12-19T17:08:58Z</updated>
<author>
<name>Juergen Gross</name>
<email>jgross@suse.com</email>
</author>
<published>2024-11-29T15:15:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f8b1f5472b19e29439d58dc52cfb67cfeac18770'/>
<id>urn:sha1:f8b1f5472b19e29439d58dc52cfb67cfeac18770</id>
<content type='text'>
commit 0ef8047b737d7480a5d4c46d956e97c190f13050 upstream.

Add static_call_update_early() for updating static-call targets in
very early boot.

This will be needed for support of Xen guest type specific hypercall
functions.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper &lt;andrew.cooper3@citrix.com&gt;
Signed-off-by: Juergen Gross &lt;jgross@suse.com&gt;
Co-developed-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Co-developed-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tracing/kprobes: Skip symbol counting logic for module symbols in create_local_trace_kprobe()</title>
<updated>2024-12-19T17:08:57Z</updated>
<author>
<name>Nikolay Kuratov</name>
<email>kniv@yandex-team.ru</email>
</author>
<published>2024-12-16T11:19:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b070291c83a6e7bd7ac4e1bb197655aea29b620f'/>
<id>urn:sha1:b070291c83a6e7bd7ac4e1bb197655aea29b620f</id>
<content type='text'>
commit b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
avoids checking number_of_same_symbols() for module symbol in
__trace_kprobe_create(), but create_local_trace_kprobe() should avoid this
check too. Doing this check leads to ENOENT for module_name:symbol_name
constructions passed over perf_event_open.

No bug in newer kernels as it was fixed more generally by
commit 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads")

Link: https://lore.kernel.org/linux-trace-kernel/20240705161030.b3ddb33a8167013b9b1da202@kernel.org
Fixes: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
Signed-off-by: Nikolay Kuratov &lt;kniv@yandex-team.ru&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: sync_linked_regs() must preserve subreg_def</title>
<updated>2024-12-19T17:08:57Z</updated>
<author>
<name>Eduard Zingerman</name>
<email>eddyz87@gmail.com</email>
</author>
<published>2024-09-24T21:08:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=60fd3538d2a8fd44c41d25088c0ece3e1fd30659'/>
<id>urn:sha1:60fd3538d2a8fd44c41d25088c0ece3e1fd30659</id>
<content type='text'>
commit e9bd9c498cb0f5843996dbe5cbce7a1836a83c70 upstream.

Range propagation must not affect subreg_def marks, otherwise the
following example is rewritten by verifier incorrectly when
BPF_F_TEST_RND_HI32 flag is set:

  0: call bpf_ktime_get_ns                   call bpf_ktime_get_ns
  1: r0 &amp;= 0x7fffffff       after verifier   r0 &amp;= 0x7fffffff
  2: w1 = w0                rewrites         w1 = w0
  3: if w0 &lt; 10 goto +0     --------------&gt;  r11 = 0x2f5674a6     (r)
  4: r1 &gt;&gt;= 32                               r11 &lt;&lt;= 32           (r)
  5: r0 = r1                                 r1 |= r11            (r)
  6: exit;                                   if w0 &lt; 0xa goto pc+0
                                             r1 &gt;&gt;= 32
                                             r0 = r1
                                             exit

(or zero extension of w1 at (2) is missing for architectures that
 require zero extension for upper register half).

The following happens w/o this patch:
- r0 is marked as not a subreg at (0);
- w1 is marked as subreg at (2);
- w1 subreg_def is overridden at (3) by copy_register_state();
- w1 is read at (5) but mark_insn_zext() does not mark (2)
  for zero extension, because w1 subreg_def is not set;
- because of BPF_F_TEST_RND_HI32 flag verifier inserts random
  value for hi32 bits of (2) (marked (r));
- this random value is read at (5).

Fixes: 75748837b7e5 ("bpf: Propagate scalar ranges through register assignments.")
Reported-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Closes: https://lore.kernel.org/bpf/7e2aa30a62d740db182c170fdd8f81c596df280d.camel@gmail.com
Link: https://lore.kernel.org/bpf/20240924210844.1758441-1-eddyz87@gmail.com
[ shung-hsi.yu: sync_linked_regs() was called find_equal_scalars() before commit
  4bf79f9be434 ("bpf: Track equal scalars history on per-instruction level"), and
  modification is done because there is only a single call to
  copy_register_state() before commit 98d7ca374ba4 ("bpf: Track delta between
  "linked" registers."). ]
Signed-off-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog</title>
<updated>2024-12-19T17:08:51Z</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2024-12-08T14:25:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=842e5af282453983586e2eae3c8eaf252de5f22f'/>
<id>urn:sha1:842e5af282453983586e2eae3c8eaf252de5f22f</id>
<content type='text'>
commit 978c4486cca5c7b9253d3ab98a88c8e769cb9bbd upstream.

Syzbot reported [1] crash that happens for following tracing scenario:

  - create tracepoint perf event with attr.inherit=1, attach it to the
    process and set bpf program to it
  - attached process forks -&gt; chid creates inherited event

    the new child event shares the parent's bpf program and tp_event
    (hence prog_array) which is global for tracepoint

  - exit both process and its child -&gt; release both events
  - first perf_event_detach_bpf_prog call will release tp_event-&gt;prog_array
    and second perf_event_detach_bpf_prog will crash, because
    tp_event-&gt;prog_array is NULL

The fix makes sure the perf_event_detach_bpf_prog checks prog_array
is valid before it tries to remove the bpf program from it.

[1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95e896a7ef9ff93b0b8ad

Fixes: 0ee288e69d03 ("bpf,perf: Fix perf_event_detach_bpf_prog error handling")
Reported-by: syzbot+2e0d2840414ce817aaac@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20241208142507.1207698-1-jolsa@kernel.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors</title>
<updated>2024-12-19T17:08:48Z</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2024-12-10T16:32:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9245459a992d22fe0e92e988f49db1fec82c184a'/>
<id>urn:sha1:9245459a992d22fe0e92e988f49db1fec82c184a</id>
<content type='text'>
commit ef1b808e3b7c98612feceedf985c2fbbeb28f956 upstream.

Uprobes always use bpf_prog_run_array_uprobe() under tasks-trace-RCU
protection. But it is possible to attach a non-sleepable BPF program to a
uprobe, and non-sleepable BPF programs are freed via normal RCU (see
__bpf_prog_put_noref()). This leads to UAF of the bpf_prog because a normal
RCU grace period does not imply a tasks-trace-RCU grace period.

Fix it by explicitly waiting for a tasks-trace-RCU grace period after
removing the attachment of a bpf_prog to a perf_event.

Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
Suggested-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Suggested-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/bpf/20241210-bpf-fix-actual-uprobe-uaf-v1-1-19439849dd44@google.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: Fix helper writes to read-only maps</title>
<updated>2024-12-14T18:54:53Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2024-09-13T19:17:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=988e55abcf7fdb8fc9a76a7cf3f4e939a4d4fb3a'/>
<id>urn:sha1:988e55abcf7fdb8fc9a76a7cf3f4e939a4d4fb3a</id>
<content type='text'>
commit 32556ce93bc45c730829083cb60f95a2728ea48b upstream.

Lonial found an issue that despite user- and BPF-side frozen BPF map
(like in case of .rodata), it was still possible to write into it from
a BPF program side through specific helpers having ARG_PTR_TO_{LONG,INT}
as arguments.

In check_func_arg() when the argument is as mentioned, the meta-&gt;raw_mode
is never set. Later, check_helper_mem_access(), under the case of
PTR_TO_MAP_VALUE as register base type, it assumes BPF_READ for the
subsequent call to check_map_access_type() and given the BPF map is
read-only it succeeds.

The helpers really need to be annotated as ARG_PTR_TO_{LONG,INT} | MEM_UNINIT
when results are written into them as opposed to read out of them. The
latter indicates that it's okay to pass a pointer to uninitialized memory
as the memory is written to anyway.

However, ARG_PTR_TO_{LONG,INT} is a special case of ARG_PTR_TO_FIXED_SIZE_MEM
just with additional alignment requirement. So it is better to just get
rid of the ARG_PTR_TO_{LONG,INT} special cases altogether and reuse the
fixed size memory types. For this, add MEM_ALIGNED to additionally ensure
alignment given these helpers write directly into the args via *&lt;ptr&gt; = val.
The .arg*_size has been initialized reflecting the actual sizeof(*&lt;ptr&gt;).

MEM_ALIGNED can only be used in combination with MEM_FIXED_SIZE annotated
argument types, since in !MEM_FIXED_SIZE cases the verifier does not know
the buffer size a priori and therefore cannot blindly write *&lt;ptr&gt; = val.

Fixes: 57c3bb725a3d ("bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types")
Reported-by: Lonial Con &lt;kongln9170@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Acked-by: Shung-Hsi Yu &lt;shung-hsi.yu@suse.com&gt;
Link: https://lore.kernel.org/r/20240913191754.13290-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
[ Resolve merge conflict in include/linux/bpf.h and merge conflict in
  kernel/bpf/verifier.c.]
Signed-off-by: Bin Lan &lt;bin.lan.cn@windriver.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tracing/eprobe: Fix to release eprobe when failed to add dyn_event</title>
<updated>2024-12-14T18:54:50Z</updated>
<author>
<name>Masami Hiramatsu (Google)</name>
<email>mhiramat@kernel.org</email>
</author>
<published>2024-11-29T16:47:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=db774504be998a5f1070539f311a4144ffaa862f'/>
<id>urn:sha1:db774504be998a5f1070539f311a4144ffaa862f</id>
<content type='text'>
[ Upstream commit 494b332064c0ce2f7392fa92632bc50191c1b517 ]

Fix eprobe event to unregister event call and release eprobe when it fails
to add dynamic event correctly.

Link: https://lore.kernel.org/all/173289886698.73724.1959899350183686006.stgit@devnote2/

Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Prevent wakeup of ksoftirqd during idle load balance</title>
<updated>2024-12-14T18:54:50Z</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2024-11-19T05:44:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=641e9638aadfa88f1b3e5aff51e3fe144655b841'/>
<id>urn:sha1:641e9638aadfa88f1b3e5aff51e3fe144655b841</id>
<content type='text'>
[ Upstream commit e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ]

Scheduler raises a SCHED_SOFTIRQ to trigger a load balancing event on
from the IPI handler on the idle CPU. If the SMP function is invoked
from an idle CPU via flush_smp_call_function_queue() then the HARD-IRQ
flag is not set and raise_softirq_irqoff() needlessly wakes ksoftirqd
because soft interrupts are handled before ksoftirqd get on the CPU.

Adding a trace_printk() in nohz_csd_func() at the spot of raising
SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
and softirq_entry (for SCHED_SOFTIRQ vector alone) helps observing the
current behavior:

       &lt;idle&gt;-0   [000] dN.1.:  nohz_csd_func: Raising SCHED_SOFTIRQ from nohz_csd_func
       &lt;idle&gt;-0   [000] dN.4.:  sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
       &lt;idle&gt;-0   [000] .Ns1.:  softirq_entry: vec=7 [action=SCHED]
       &lt;idle&gt;-0   [000] .Ns1.:  softirq_exit: vec=7  [action=SCHED]
       &lt;idle&gt;-0   [000] d..2.:  sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==&gt; next_comm=ksoftirqd/0 next_pid=16 next_prio=120
  ksoftirqd/0-16  [000] d..2.:  sched_switch: prev_comm=ksoftirqd/0 prev_pid=16 prev_prio=120 prev_state=S ==&gt; next_comm=swapper/0 next_pid=0 next_prio=120
       ...

Use __raise_softirq_irqoff() to raise the softirq. The SMP function call
is always invoked on the requested CPU in an interrupt handler. It is
guaranteed that soft interrupts are handled at the end.

Following are the observations with the changes when enabling the same
set of events:

       &lt;idle&gt;-0       [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ for nohz_idle_balance
       &lt;idle&gt;-0       [000] dN.1.: softirq_raise: vec=7 [action=SCHED]
       &lt;idle&gt;-0       [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]

No unnecessary ksoftirqd wakeups are seen from idle task's context to
service the softirq.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
Reported-by: Julia Lawall &lt;julia.lawall@inria.fr&gt;
Suggested-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://lore.kernel.org/r/20241119054432.6405-5-kprateek.nayak@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy</title>
<updated>2024-12-14T18:54:50Z</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2024-11-19T05:44:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4a4250cf53dcfdc5c9cf3bfea26abb0682a5197'/>
<id>urn:sha1:f4a4250cf53dcfdc5c9cf3bfea26abb0682a5197</id>
<content type='text'>
[ Upstream commit ff47a0acfcce309cf9e175149c75614491953c8f ]

Commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
optimizes IPIs to idle CPUs in TIF_POLLING_NRFLAG mode by setting the
TIF_NEED_RESCHED flag in idle task's thread info and relying on
flush_smp_call_function_queue() in idle exit path to run the
call-function. A softirq raised by the call-function is handled shortly
after in do_softirq_post_smp_call_flush() but the TIF_NEED_RESCHED flag
remains set and is only cleared later when schedule_idle() calls
__schedule().

need_resched() check in _nohz_idle_balance() exists to bail out of load
balancing if another task has woken up on the CPU currently in-charge of
idle load balancing which is being processed in SCHED_SOFTIRQ context.
Since the optimization mentioned above overloads the interpretation of
TIF_NEED_RESCHED, check for idle_cpu() before going with the existing
need_resched() check which can catch a genuine task wakeup on an idle
CPU processing SCHED_SOFTIRQ from do_softirq_post_smp_call_flush(), as
well as the case where ksoftirqd needs to be preempted as a result of
new task wakeup or slice expiry.

In case of PREEMPT_RT or threadirqs, although the idle load balancing
may be inhibited in some cases on the ilb CPU, the fact that ksoftirqd
is the only fair task going back to sleep will trigger a newidle balance
on the CPU which will alleviate some imbalance if it exists if idle
balance fails to do so.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241119054432.6405-4-kprateek.nayak@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()</title>
<updated>2024-12-14T18:54:50Z</updated>
<author>
<name>K Prateek Nayak</name>
<email>kprateek.nayak@amd.com</email>
</author>
<published>2024-11-19T05:44:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=77912496064a73da31948b58985b1c54255197d6'/>
<id>urn:sha1:77912496064a73da31948b58985b1c54255197d6</id>
<content type='text'>
[ Upstream commit ea9cffc0a154124821531991d5afdd7e8b20d7aa ]

The need_resched() check currently in nohz_csd_func() can be tracked
to have been added in scheduler_ipi() back in 2011 via commit
ca38062e57e9 ("sched: Use resched IPI to kick off the nohz idle balance")

Since then, it has travelled quite a bit but it seems like an idle_cpu()
check currently is sufficient to detect the need to bail out from an
idle load balancing. To justify this removal, consider all the following
case where an idle load balancing could race with a task wakeup:

o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU
  on wakelist if wakee cpu is idle") a target perceived to be idle
  (target_rq-&gt;nr_running == 0) will return true for
  ttwu_queue_cond(target) which will offload the task wakeup to the idle
  target via an IPI.

  In all such cases target_rq-&gt;ttwu_pending will be set to 1 before
  queuing the wake function.

  If an idle load balance races here, following scenarios are possible:

  - The CPU is not in TIF_POLLING_NRFLAG mode in which case an actual
    IPI is sent to the CPU to wake it out of idle. If the
    nohz_csd_func() queues before sched_ttwu_pending(), the idle load
    balance will bail out since idle_cpu(target) returns 0 since
    target_rq-&gt;ttwu_pending is 1. If the nohz_csd_func() is queued after
    sched_ttwu_pending() it should see rq-&gt;nr_running to be non-zero and
    bail out of idle load balancing.

  - The CPU is in TIF_POLLING_NRFLAG mode and instead of an actual IPI,
    the sender will simply set TIF_NEED_RESCHED for the target to put it
    out of idle and flush_smp_call_function_queue() in do_idle() will
    execute the call function. Depending on the ordering of the queuing
    of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in
    nohz_csd_func() should either see target_rq-&gt;ttwu_pending = 1 or
    target_rq-&gt;nr_running to be non-zero if there is a genuine task
    wakeup racing with the idle load balance kick.

o The waker CPU perceives the target CPU to be busy
  (targer_rq-&gt;nr_running != 0) but the CPU is in fact going idle and due
  to a series of unfortunate events, the system reaches a case where the
  waker CPU decides to perform the wakeup by itself in ttwu_queue() on
  the target CPU but target is concurrently selected for idle load
  balance (XXX: Can this happen? I'm not sure, but we'll consider the
  mother of all coincidences to estimate the worst case scenario).

  ttwu_do_activate() calls enqueue_task() which would increment
  "rq-&gt;nr_running" post which it calls wakeup_preempt() which is
  responsible for setting TIF_NEED_RESCHED (via a resched IPI or by
  setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU) The key
  thing to note in this case is that rq-&gt;nr_running is already non-zero
  in case of a wakeup before TIF_NEED_RESCHED is set which would
  lead to idle_cpu() check returning false.

In all cases, it seems that need_resched() check is unnecessary when
checking for idle_cpu() first since an impending wakeup racing with idle
load balancer will either set the "rq-&gt;ttwu_pending" or indicate a newly
woken task via "rq-&gt;nr_running".

Chasing the reason why this check might have existed in the first place,
I came across  Peter's suggestion on the fist iteration of Suresh's
patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was:

	sched_ttwu_do_pending(list);

	if (unlikely((rq-&gt;idle == current) &amp;&amp;
	    rq-&gt;nohz_balance_kick &amp;&amp;
	    !need_resched()))
		raise_softirq_irqoff(SCHED_SOFTIRQ);

Since the condition to raise the SCHED_SOFIRQ was preceded by
sched_ttwu_do_pending() (which is equivalent of sched_ttwu_pending()) in
the current upstream kernel, the need_resched() check was necessary to
catch a newly queued task. Peter suggested modifying it to:

	if (idle_cpu() &amp;&amp; rq-&gt;nohz_balance_kick &amp;&amp; !need_resched())
		raise_softirq_irqoff(SCHED_SOFTIRQ);

where idle_cpu() seems to have replaced "rq-&gt;idle == current" check.

Even back then, the idle_cpu() check would have been sufficient to catch
a new task being enqueued. Since commit b2a02fc43a1f ("smp: Optimize
send_call_function_single_ipi()") overloads the interpretation of
TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove the
need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based
on Peter's suggestion.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241119054432.6405-3-kprateek.nayak@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
