<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched/ext.c, branch v6.12.71</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.71</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.12.71'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2026-01-30T09:28:48Z</updated>
<entry>
<title>sched_ext: Fix possible deadlock in the deferred_irq_workfn()</title>
<updated>2026-01-30T09:28:48Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2026-01-23T02:52:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=541959b2fadb832a7d0ceb95041dc52bdcf6bff7'/>
<id>urn:sha1:541959b2fadb832a7d0ceb95041dc52bdcf6bff7</id>
<content type='text'>
[ Upstream commit a257e974210320ede524f340ffe16bf4bf0dda1e ]

For PREEMPT_RT=y kernels, the deferred_irq_workfn() is executed in
the per-cpu irq_work/* task context and not disable-irq, if the rq
returned by container_of() is current CPU's rq, the following scenarios
may occur:

lock(&amp;rq-&gt;__lock);
&lt;Interrupt&gt;
  lock(&amp;rq-&gt;__lock);

This commit use IRQ_WORK_INIT_HARD() to replace init_irq_work() to
initialize rq-&gt;scx.deferred_irq_work, make the deferred_irq_workfn()
is always invoked in hard-irq context.

Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Chen Yu &lt;xnguchen@sina.cn&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()</title>
<updated>2026-01-08T09:14:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-01-02T02:01:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d4dd6694d1021c401bfece43adb7b3b91592558d'/>
<id>urn:sha1:d4dd6694d1021c401bfece43adb7b3b91592558d</id>
<content type='text'>
[ Upstream commit f5e1e5ec204da11fa87fdf006d451d80ce06e118 ]

move_local_task_to_local_dsq() is used when moving a task from a non-local
DSQ to a local DSQ on the same CPU. It directly manipulates the local DSQ
without going through dispatch_enqueue() and was missing the post-enqueue
handling that triggers preemption when SCX_ENQ_PREEMPT is set or the idle
task is running.

The function is used by move_task_between_dsqs() which backs
scx_bpf_dsq_move() and may be called while the CPU is busy.

Add local_dsq_post_enq() call to move_local_task_to_local_dsq(). As the
dispatch path doesn't need post-enqueue handling, add SCX_RQ_IN_BALANCE
early exit to keep consume_dispatch_q() behavior unchanged and avoid
triggering unnecessary resched when scx_bpf_dsq_move() is used from the
dispatch path.

Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Reviewed-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()</title>
<updated>2026-01-08T09:14:58Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-01-02T02:01:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=44273abc2fea0134de4c7fd9807a7ded59511ae7'/>
<id>urn:sha1:44273abc2fea0134de4c7fd9807a7ded59511ae7</id>
<content type='text'>
[ Upstream commit 530b6637c79e728d58f1d9b66bd4acf4b735b86d ]

Factor out local_dsq_post_enq() which performs post-enqueue handling for
local DSQs - triggering resched_curr() if SCX_ENQ_PREEMPT is specified or if
the current CPU is idle. No functional change.

This will be used by the next patch to fix move_local_task_to_local_dsq().

Cc: stable@vger.kernel.org # v6.12+
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Reviewed-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks</title>
<updated>2026-01-08T09:14:56Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2025-12-29T19:39:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e61f636cc31042b717a46f6937a2dfaf21f45c91'/>
<id>urn:sha1:e61f636cc31042b717a46f6937a2dfaf21f45c91</id>
<content type='text'>
[ Upstream commit 1dd6c84f1c544e552848a8968599220bd464e338 ]

When loading the ebpf scheduler, the tasks in the scx_tasks list will
be traversed and invoke __setscheduler_class() to get new sched_class.
however, this would also incorrectly set the per-cpu migration
task's-&gt;sched_class to rt_sched_class, even after unload, the per-cpu
migration task's-&gt;sched_class remains sched_rt_class.

The log for this issue is as follows:

./scx_rustland --stats 1
[  199.245639][  T630] sched_ext: "rustland" does not implement cgroup cpu.weight
[  199.269213][  T630] sched_ext: BPF scheduler "rustland" enabled
04:25:09 [INFO] RustLand scheduler attached

bpftrace -e 'iter:task /strcontains(ctx-&gt;task-&gt;comm, "migration")/
{ printf("%s:%d-&gt;%pS\n", ctx-&gt;task-&gt;comm, ctx-&gt;task-&gt;pid, ctx-&gt;task-&gt;sched_class); }'
Attaching 1 probe...
migration/0:24-&gt;rt_sched_class+0x0/0xe0
migration/1:27-&gt;rt_sched_class+0x0/0xe0
migration/2:33-&gt;rt_sched_class+0x0/0xe0
migration/3:39-&gt;rt_sched_class+0x0/0xe0
migration/4:45-&gt;rt_sched_class+0x0/0xe0
migration/5:52-&gt;rt_sched_class+0x0/0xe0
migration/6:58-&gt;rt_sched_class+0x0/0xe0
migration/7:64-&gt;rt_sched_class+0x0/0xe0

sched_ext: BPF scheduler "rustland" disabled (unregistered from user space)
EXIT: unregistered from user space
04:25:21 [INFO] Unregister RustLand scheduler

bpftrace -e 'iter:task /strcontains(ctx-&gt;task-&gt;comm, "migration")/
{ printf("%s:%d-&gt;%pS\n", ctx-&gt;task-&gt;comm, ctx-&gt;task-&gt;pid, ctx-&gt;task-&gt;sched_class); }'
Attaching 1 probe...
migration/0:24-&gt;rt_sched_class+0x0/0xe0
migration/1:27-&gt;rt_sched_class+0x0/0xe0
migration/2:33-&gt;rt_sched_class+0x0/0xe0
migration/3:39-&gt;rt_sched_class+0x0/0xe0
migration/4:45-&gt;rt_sched_class+0x0/0xe0
migration/5:52-&gt;rt_sched_class+0x0/0xe0
migration/6:58-&gt;rt_sched_class+0x0/0xe0
migration/7:64-&gt;rt_sched_class+0x0/0xe0

This commit therefore generate a new scx_setscheduler_class() and
add check for stop_sched_class to replace __setscheduler_class().

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
[ Adjust context ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Fix unsafe locking in the scx_dump_state()</title>
<updated>2025-11-24T09:35:57Z</updated>
<author>
<name>Zqiang</name>
<email>qiang.zhang@linux.dev</email>
</author>
<published>2025-11-12T07:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=13d1c96d3a9f208bc1aa8642f6362dca25a157d2'/>
<id>urn:sha1:13d1c96d3a9f208bc1aa8642f6362dca25a157d2</id>
<content type='text'>
[ Upstream commit 5f02151c411dda46efcc5dc57b0845efcdcfc26d ]

For built with CONFIG_PREEMPT_RT=y kernels, the dump_lock will be converted
sleepable spinlock and not disable-irq, so the following scenarios occur:

inconsistent {IN-HARDIRQ-W} -&gt; {HARDIRQ-ON-W} usage.
irq_work/0/27 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&amp;rq-&gt;__lock){?...}-{2:2}, at: raw_spin_rq_lock_nested+0x2b/0x40
{IN-HARDIRQ-W} state was registered at:
   lock_acquire+0x1e1/0x510
   _raw_spin_lock_nested+0x42/0x80
   raw_spin_rq_lock_nested+0x2b/0x40
   sched_tick+0xae/0x7b0
   update_process_times+0x14c/0x1b0
   tick_periodic+0x62/0x1f0
   tick_handle_periodic+0x48/0xf0
   timer_interrupt+0x55/0x80
   __handle_irq_event_percpu+0x20a/0x5c0
   handle_irq_event_percpu+0x18/0xc0
   handle_irq_event+0xb5/0x150
   handle_level_irq+0x220/0x460
   __common_interrupt+0xa2/0x1e0
   common_interrupt+0xb0/0xd0
   asm_common_interrupt+0x2b/0x40
   _raw_spin_unlock_irqrestore+0x45/0x80
   __setup_irq+0xc34/0x1a30
   request_threaded_irq+0x214/0x2f0
   hpet_time_init+0x3e/0x60
   x86_late_time_init+0x5b/0xb0
   start_kernel+0x308/0x410
   x86_64_start_reservations+0x1c/0x30
   x86_64_start_kernel+0x96/0xa0
   common_startup_64+0x13e/0x148

 other info that might help us debug this:
 Possible unsafe locking scenario:

        CPU0
        ----
   lock(&amp;rq-&gt;__lock);
   &lt;Interrupt&gt;
     lock(&amp;rq-&gt;__lock);

  *** DEADLOCK ***

 stack backtrace:
 CPU: 0 UID: 0 PID: 27 Comm: irq_work/0
 Call Trace:
  &lt;TASK&gt;
  dump_stack_lvl+0x8c/0xd0
  dump_stack+0x14/0x20
  print_usage_bug+0x42e/0x690
  mark_lock.part.44+0x867/0xa70
  ? __pfx_mark_lock.part.44+0x10/0x10
  ? string_nocheck+0x19c/0x310
  ? number+0x739/0x9f0
  ? __pfx_string_nocheck+0x10/0x10
  ? __pfx_check_pointer+0x10/0x10
  ? kvm_sched_clock_read+0x15/0x30
  ? sched_clock_noinstr+0xd/0x20
  ? local_clock_noinstr+0x1c/0xe0
  __lock_acquire+0xc4b/0x62b0
  ? __pfx_format_decode+0x10/0x10
  ? __pfx_string+0x10/0x10
  ? __pfx___lock_acquire+0x10/0x10
  ? __pfx_vsnprintf+0x10/0x10
  lock_acquire+0x1e1/0x510
  ? raw_spin_rq_lock_nested+0x2b/0x40
  ? __pfx_lock_acquire+0x10/0x10
  ? dump_line+0x12e/0x270
  ? raw_spin_rq_lock_nested+0x20/0x40
  _raw_spin_lock_nested+0x42/0x80
  ? raw_spin_rq_lock_nested+0x2b/0x40
  raw_spin_rq_lock_nested+0x2b/0x40
  scx_dump_state+0x3b3/0x1270
  ? finish_task_switch+0x27e/0x840
  scx_ops_error_irq_workfn+0x67/0x80
  irq_work_single+0x113/0x260
  irq_work_run_list.part.3+0x44/0x70
  run_irq_workd+0x6b/0x90
  ? __pfx_run_irq_workd+0x10/0x10
  smpboot_thread_fn+0x529/0x870
  ? __pfx_smpboot_thread_fn+0x10/0x10
  kthread+0x305/0x3f0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x40/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  &lt;/TASK&gt;

This commit therefore use rq_lock_irqsave/irqrestore() to replace
rq_lock/unlock() in the scx_dump_state().

Fixes: 07814a9439a3 ("sched_ext: Print debug dump after an error exit")
Signed-off-by: Zqiang &lt;qiang.zhang@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Mark scx_bpf_dsq_move_set_[slice|vtime]() with KF_RCU</title>
<updated>2025-11-13T20:34:01Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-11-02T14:44:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=46fcee9f99ef401a61a97368263b37f60bc6a0c7'/>
<id>urn:sha1:46fcee9f99ef401a61a97368263b37f60bc6a0c7</id>
<content type='text'>
[ Upstream commit 54e96258a6930909b690fd7e8889749231ba8085 ]

scx_bpf_dsq_move_set_slice() and scx_bpf_dsq_move_set_vtime() take a DSQ
iterator argument which has to be valid. Mark them with KF_RCU.

Fixes: 4c30f5ce4f7a ("sched_ext: Implement scx_bpf_dispatch[_vtime]_from_dsq()")
Cc: stable@vger.kernel.org # v6.12+
Acked-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
[ scx_bpf_dsq_move_set_* =&gt; scx_bpf_dispatch_from_dsq_set_* ]
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: initialize built-in idle state before ops.init()</title>
<updated>2025-08-28T14:31:05Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-03-25T09:32:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3c06e9ad0bea6f8983f069f56d1e6e3c68ce341b'/>
<id>urn:sha1:3c06e9ad0bea6f8983f069f56d1e6e3c68ce341b</id>
<content type='text'>
commit f0c6eab5e45c529f449fbc595873719e00de6d79 upstream.

A BPF scheduler may want to use the built-in idle cpumasks in ops.init()
before the scheduler is fully initialized, either directly or through a
BPF timer for example.

However, this would result in an error, since the idle state has not
been properly initialized yet.

This can be easily verified by modifying scx_simple to call
scx_bpf_get_idle_cpumask() in ops.init():

$ sudo scx_simple

DEBUG DUMP
===========================================================================

scx_simple[121] triggered exit kind 1024:
  runtime error (built-in idle tracking is disabled)
...

Fix this by properly initializing the idle state before ops.init() is
called. With this change applied:

$ sudo scx_simple
local=2 global=0
local=19 global=11
local=23 global=11
...

Fixes: d73249f88743d ("sched_ext: idle: Make idle static keys private")
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Reviewed-by: Joel Fernandes &lt;joelagnelf@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
[ Backport to 6.12:
  - Original commit doesn't apply cleanly to 6.12 since d73249f88743d is
    not present.
  - This backport applies the same logical fix to prevent BPF scheduler
    failures while accessing idle cpumasks from ops.init(). ]
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/ext: Fix invalid task state transitions on class switch</title>
<updated>2025-08-28T14:31:03Z</updated>
<author>
<name>Andrea Righi</name>
<email>arighi@nvidia.com</email>
</author>
<published>2025-08-05T08:59:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=786f6314604b34c3e7de5f733f4e08e35c448a50'/>
<id>urn:sha1:786f6314604b34c3e7de5f733f4e08e35c448a50</id>
<content type='text'>
commit ddf7233fcab6c247379d0928d46cc316ee122229 upstream.

When enabling a sched_ext scheduler, we may trigger invalid task state
transitions, resulting in warnings like the following (which can be
easily reproduced by running the hotplug selftest in a loop):

 sched_ext: Invalid task state transition 0 -&gt; 3 for fish[770]
 WARNING: CPU: 18 PID: 787 at kernel/sched/ext.c:3862 scx_set_task_state+0x7c/0xc0
 ...
 RIP: 0010:scx_set_task_state+0x7c/0xc0
 ...
 Call Trace:
  &lt;TASK&gt;
  scx_enable_task+0x11f/0x2e0
  switching_to_scx+0x24/0x110
  scx_enable.isra.0+0xd14/0x13d0
  bpf_struct_ops_link_create+0x136/0x1a0
  __sys_bpf+0x1edd/0x2c30
  __x64_sys_bpf+0x21/0x30
  do_syscall_64+0xbb/0x370
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

This happens because we skip initialization for tasks that are already
dead (with their usage counter set to zero), but we don't exclude them
during the scheduling class transition phase.

Fix this by also skipping dead tasks during class swiching, preventing
invalid task state transitions.

Fixes: a8532fac7b5d2 ("sched_ext: TASK_DEAD tasks must be switched into SCX on ops_enable")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched_ext: Make scx_group_set_weight() always update tg-&gt;scx.weight</title>
<updated>2025-07-10T14:05:05Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-07-06T06:42:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=790ce73721abebf98f72555c9be9f0855e54a45c'/>
<id>urn:sha1:790ce73721abebf98f72555c9be9f0855e54a45c</id>
<content type='text'>
[ Upstream commit c50784e99f0e7199cdb12dbddf02229b102744ef ]

Otherwise, tg-&gt;scx.weight can go out of sync while scx_cgroup is not enabled
and ops.cgroup_init() may be called with a stale weight value.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_ext, sched/core: Don't call scx_group_set_weight() prematurely from sched_create_group()</title>
<updated>2025-06-27T10:11:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2025-06-16T20:13:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57ec0818698c7d587ebc204971387bdf9d3c9fa1'/>
<id>urn:sha1:57ec0818698c7d587ebc204971387bdf9d3c9fa1</id>
<content type='text'>
commit 33796b91871ad4010c8188372dd1faf97cf0f1c0 upstream.

During task_group creation, sched_create_group() calls
scx_group_set_weight() with CGROUP_WEIGHT_DFL to initialize the sched_ext
portion. This is premature and ends up calling ops.cgroup_set_weight() with
an incorrect @cgrp before ops.cgroup_init() is called.

sched_create_group() should just initialize SCX related fields in the new
task_group. Fix it by factoring out scx_tg_init() from sched_init() and
making sched_create_group() call that function instead of
scx_group_set_weight().

v2: Retain CONFIG_EXT_GROUP_SCHED ifdef in sched_init() as removing it leads
    to build failures on !CONFIG_GROUP_SCHED configs.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
