<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched/rt.c, branch v6.7.9</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.7.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.7.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-10-13T07:56:29Z</updated>
<entry>
<title>sched: Fix stop_one_cpu_nowait() vs hotplug</title>
<updated>2023-10-13T07:56:29Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-10T18:57:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f0498d2a54e7966ce23cd7c7ff42c64fa0059b07'/>
<id>urn:sha1:f0498d2a54e7966ce23cd7c7ff42c64fa0059b07</id>
<content type='text'>
Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
hotplug stress-test -- notably affine_move_task() remains stuck in
wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn =
migration_cpu_stop) returns false -- this stopper is responsible for
the matching complete().

The race scenario is:

	CPU0					CPU1

					// doing _cpu_down()

  __set_cpus_allowed_ptr()
    task_rq_lock();
					takedown_cpu()
					  stop_machine_cpuslocked(take_cpu_down..)

					&lt;PREEMPT: cpu_stopper_thread()
					  MULTI_STOP_PREPARE
					  ...
    __set_cpus_allowed_ptr_locked()
      affine_move_task()
        task_rq_unlock();

  &lt;PREEMPT: cpu_stopper_thread()\&gt;
    ack_state()
					  MULTI_STOP_RUN
					    take_cpu_down()
					      __cpu_disable();
					      stop_machine_park();
						stopper-&gt;enabled = false;
					 /&gt;
   /&gt;
	stop_one_cpu_nowait(.fn = migration_cpu_stop);
          if (stopper-&gt;enabled) // false!!!

That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
stopper thread gets a chance to preempt and allows the cpu-down for
the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
issue a wakeup, it must not be run under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over
the unlock + queue_stopper combination:

	preempt_disable();
	task_rq_unlock(...);
	if (!stop_pending)
	  stop_one_cpu_nowait(...)
	preempt_enable();

This respects the lock ordering constraints while still avoiding the
above race. That is, if we find the CPU is online under rq-lock, the
targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" &lt;Kuyo.Chang@mediatek.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: "Kuyo Chang (張建文)" &lt;Kuyo.Chang@mediatek.com&gt;
Link: https://lkml.kernel.org/r/20231010200442.GA16515@noisy.programming.kicks-ass.net
</content>
</entry>
<entry>
<title>sched/topology: Consolidate and clean up access to a CPU's max compute capacity</title>
<updated>2023-10-09T10:59:48Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2023-10-09T10:36:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7bc263840bc3377186cb06b003ac287bb2f18ce2'/>
<id>urn:sha1:7bc263840bc3377186cb06b003ac287bb2f18ce2</id>
<content type='text'>
Remove the rq::cpu_capacity_orig field and use arch_scale_cpu_capacity()
instead.

The scheduler uses 3 methods to get access to a CPU's max compute capacity:

 - arch_scale_cpu_capacity(cpu) which is the default way to get a CPU's capacity.

 - cpu_capacity_orig field which is periodically updated with
   arch_scale_cpu_capacity().

 - capacity_orig_of(cpu) which encapsulates rq-&gt;cpu_capacity_orig.

There is no real need to save the value returned by arch_scale_cpu_capacity()
in struct rq. arch_scale_cpu_capacity() returns:

 - either a per_cpu variable.

 - or a const value for systems which have only one capacity.

Remove rq::cpu_capacity_orig and use arch_scale_cpu_capacity() everywhere.

No functional changes.

Some performance tests on Arm64:

  - small SMP device (hikey): no noticeable changes
  - HMP device (RB5):         hackbench shows minor improvement (1-2%)
  - large SMP (thx2):         hackbench and tbench show minor improvement (1%)

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lore.kernel.org/r/20231009103621.374412-2-vincent.guittot@linaro.org
</content>
</entry>
<entry>
<title>sched/rt: Change the type of 'sysctl_sched_rt_period' from 'unsigned int' to 'int'</title>
<updated>2023-10-09T10:44:56Z</updated>
<author>
<name>Yajun Deng</name>
<email>yajun.deng@linux.dev</email>
</author>
<published>2023-10-08T02:15:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=089768dfeb3ab294f9ab6a1f2462001f0f879fbb'/>
<id>urn:sha1:089768dfeb3ab294f9ab6a1f2462001f0f879fbb</id>
<content type='text'>
Doing this matches the natural type of the 'int' based calculations
in sched_rt_handler(), and also makes it possible to add a
correct upper-bound check on the sysctl interface.

[ mingo: Rewrote the changelog. ]

Signed-off-by: Yajun Deng &lt;yajun.deng@linux.dev&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20231008021538.3063250-1-yajun.deng@linux.dev
</content>
</entry>
<entry>
<title>sched/rt: Disallow writing invalid values to sched_rt_period_us</title>
<updated>2023-10-02T13:15:56Z</updated>
<author>
<name>Cyril Hrubis</name>
<email>chrubis@suse.cz</email>
</author>
<published>2023-10-02T11:55:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=079be8fc630943d9fc70a97807feb73d169ee3fc'/>
<id>urn:sha1:079be8fc630943d9fc70a97807feb73d169ee3fc</id>
<content type='text'>
The validation of the value written to sched_rt_period_us was broken
because:

  - the sysctl_sched_rt_period is declared as unsigned int
  - parsed by proc_do_intvec()
  - the range is asserted after the value parsed by proc_do_intvec()

Because of this, negative values written to the file were stored in an
unsigned integer and later interpreted as large positive integers,
which passed the check:

  if (sysctl_sched_rt_period &lt;= 0)
	return EINVAL;

This commit fixes the parsing by setting an explicit range for both
period_us and runtime_us in the sched_rt_sysctls table and processing
the values with proc_dointvec_minmax() instead.
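
As an illustration only, the following Python sketch models the failure
and the range-checked alternative. The helper names are hypothetical and
are not kernel functions; ctypes.c_uint32 stands in for storing a parsed
int into an 'unsigned int' variable, and the bounds 1 to INT_MAX are taken
from the documented range.

```python
import ctypes

INT_MAX = 2**31 - 1


def store_as_unsigned(written):
    """Model storing a parsed C int into an 'unsigned int' variable."""
    return ctypes.c_uint32(written).value


def old_check_rejects(stored):
    """Model the old post-parse check, which rejects only non-positive values."""
    return not stored > 0


def minmax_accepts(written, lo=1, hi=INT_MAX):
    """Model a range check applied while parsing, before anything is stored."""
    return written >= lo and hi >= written


stored = store_as_unsigned(-1)
print(stored)                     # the negative write wrapped to a large positive value
print(old_check_rejects(stored))  # False: the old check lets it through
print(minmax_accepts(-1))         # False: the range check refuses the write
```

Checking the range during parsing means an out-of-range write never
reaches the variable, so the later check cannot be fooled by the wrap.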

Alternatively, if we wanted to use the full range of unsigned int for the
period value, we would have to split the proc_handler and use
proc_douintvec() for it. However, even
Documentation/scheduler/sched-rt-group.rst describes the range as 1 to
INT_MAX.

As far as I can tell the only problem this causes is that the sysctl
file allows writing negative values which when read back may confuse
userspace.

There is also an LTP test being submitted for these sysctl files at:

  http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis@suse.cz/

Signed-off-by: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz
</content>
</entry>
<entry>
<title>sched/rt: Make rt_rq-&gt;pushable_tasks updates drive rto_mask</title>
<updated>2023-09-25T08:25:29Z</updated>
<author>
<name>Valentin Schneider</name>
<email>vschneid@redhat.com</email>
</author>
<published>2023-08-11T11:20:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=612f769edd06a6e42f7cd72425488e68ddaeef0a'/>
<id>urn:sha1:612f769edd06a6e42f7cd72425488e68ddaeef0a</id>
<content type='text'>
Sebastian noted that the rto_push_work IRQ work can be queued for a CPU
that has an empty pushable_tasks list, which means nothing useful will be
done in the IPI other than queue the work for the next CPU on the rto_mask.

rto_push_irq_work_func() only operates on tasks in the pushable_tasks list,
but the conditions for that irq_work to be queued (and for a CPU to be
added to the rto_mask) rely on rt_rq-&gt;rt_nr_migratory instead.

nr_migratory is increased whenever an RT task entity is enqueued and it has
nr_cpus_allowed &gt; 1. Unlike the pushable_tasks list, nr_migratory includes a
rt_rq's current task. This means a rt_rq can have a migratable current, N
non-migratable queued tasks, and be flagged as overloaded / have its CPU
set in the rto_mask, despite having an empty pushable_tasks list.

Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task().
Since rt_rq-&gt;{rt_nr_migratory,rt_nr_total} become unused, remove them.

Note that the case where the current task is pushed away to make way for a
migration-disabled task remains unchanged: the migration-disabled task has
to be in the pushable_tasks list in the first place, which means it has
nr_cpus_allowed &gt; 1.
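
To make the difference concrete, here is a small Python model, not kernel
code. The Task and RtRq classes are hypothetical, and the exact form of
the old counter-based rule (more than one task in total and at least one
migratable task, with the current task counted) is an assumption made for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    nr_cpus_allowed: int


@dataclass
class RtRq:
    current: Task
    queued: list = field(default_factory=list)

    def pushable_tasks(self):
        # Only queued (not running) tasks that may run on more than one CPU.
        return [t for t in self.queued if t.nr_cpus_allowed > 1]

    def overloaded_old(self):
        # Assumed old rule: the counters include the current task.
        tasks = [self.current] + self.queued
        nr_total = len(tasks)
        nr_migratory = sum(1 for t in tasks if t.nr_cpus_allowed > 1)
        return nr_total > 1 and nr_migratory > 0

    def overloaded_new(self):
        # New rule: overloaded only if there is something to push.
        return len(self.pushable_tasks()) > 0


rq = RtRq(current=Task("curr", 4),
          queued=[Task("pinned-a", 1), Task("pinned-b", 1)])
print(rq.overloaded_old())   # True: flagged although nothing can be pushed
print(rq.overloaded_new())   # False: pushable_tasks is empty
```

In this model the two rules agree again as soon as a queued task with
nr_cpus_allowed above 1 is added.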

Reported-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Link: https://lore.kernel.org/r/20230811112044.3302588-1-vschneid@redhat.com
</content>
</entry>
<entry>
<title>sched/fair: Rename check_preempt_curr() to wakeup_preempt()</title>
<updated>2023-09-19T08:40:10Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2023-09-19T08:38:21Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e23edc86b09df655bf8963bbcb16647adc787395'/>
<id>urn:sha1:e23edc86b09df655bf8963bbcb16647adc787395</id>
<content type='text'>
The name is a bit opaque - make it clear that this is about wakeup
preemption.

Also rename the -&gt;check_preempt_curr() methods similarly.

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
</content>
</entry>
<entry>
<title>sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset</title>
<updated>2023-08-14T15:01:23Z</updated>
<author>
<name>Cyril Hrubis</name>
<email>chrubis@suse.cz</email>
</author>
<published>2023-08-02T15:19:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1fc6484e1fb7cc2481d169bfef129a1b0676abe'/>
<id>urn:sha1:c1fc6484e1fb7cc2481d169bfef129a1b0676abe</id>
<content type='text'>
The sched_rr_timeslice can be reset to its default by writing a value that is
&lt;= 0. However, reading this file back always returned the last value
written, which is not useful at all.

$ echo -1 &gt; /proc/sys/kernel/sched_rr_timeslice_ms
$ cat /proc/sys/kernel/sched_rr_timeslice_ms
-1

Fix this by setting the variable that holds the sysctl file value to
jiffies_to_msecs(RR_TIMESLICE) when a value &lt;= 0 is written.
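
A rough Python model of the behaviour before and after the fix follows.
The class and its 'fixed' switch are hypothetical, and the default of
100 ms is an assumption standing in for jiffies_to_msecs(RR_TIMESLICE).

```python
DEFAULT_TIMESLICE_MS = 100  # assumed stand-in for jiffies_to_msecs(RR_TIMESLICE)


class RrTimesliceSysctl:
    """Toy model of the sysctl file and the timeslice actually in use."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.file_value_ms = DEFAULT_TIMESLICE_MS
        self.effective_ms = DEFAULT_TIMESLICE_MS

    def write(self, value_ms):
        self.file_value_ms = value_ms
        if value_ms > 0:
            self.effective_ms = value_ms
        else:
            # A non-positive write resets the timeslice to the default.
            self.effective_ms = DEFAULT_TIMESLICE_MS
            if self.fixed:
                # The fix: make the file show the default as well.
                self.file_value_ms = DEFAULT_TIMESLICE_MS

    def read(self):
        return self.file_value_ms


before = RrTimesliceSysctl(fixed=False)
before.write(-1)
print(before.read())   # -1, although the default timeslice is in effect

after = RrTimesliceSysctl(fixed=True)
after.write(-1)
print(after.read())    # 100, matching the timeslice in effect
```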

Signed-off-by: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Petr Vorel &lt;pvorel@suse.cz&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Tested-by: Petr Vorel &lt;pvorel@suse.cz&gt;
Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz
</content>
</entry>
<entry>
<title>sched/rt: Fix sysctl_sched_rr_timeslice initial value</title>
<updated>2023-08-14T15:01:23Z</updated>
<author>
<name>Cyril Hrubis</name>
<email>chrubis@suse.cz</email>
</author>
<published>2023-08-02T15:19:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c7fcb99877f9f542c918509b2801065adcaf46fa'/>
<id>urn:sha1:c7fcb99877f9f542c918509b2801065adcaf46fa</id>
<content type='text'>
There is a 10% rounding error in the initial value of
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.

This was found with LTP test sched_rr_get_interval01:

sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90

This test compares the return value of sched_rr_get_interval() with
the sched_rr_timeslice_ms sysctl file and fails if they do not match.

The problem it found is the initial sysctl file value, which was computed as:

static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;

which works fine as long as MSEC_PER_SEC is a multiple of HZ; however, it
introduces a 10% rounding error for CONFIG_HZ_300:

(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)

(1000 / 300) * (100 * 300 / 1000)

3 * 30 = 90

This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:

(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ

(1000 * (100 * 300 / 1000)) / 300

(1000 * 30) / 300 = 100
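
The same arithmetic can be checked for several HZ values with a short
Python sketch. The function names are hypothetical; Python's floor
division is used to mirror C integer division, which is equivalent here
because all operands are positive.

```python
MSEC_PER_SEC = 1000


def rr_timeslice_jiffies(hz):
    """RR_TIMESLICE as expanded above: 100 * HZ / 1000, in integer math."""
    return 100 * hz // 1000


def old_initial_ms(hz):
    """Divide first, then multiply: truncates when HZ does not divide 1000."""
    return (MSEC_PER_SEC // hz) * rr_timeslice_jiffies(hz)


def new_initial_ms(hz):
    """Multiply first, then divide: avoids the early truncation."""
    return (MSEC_PER_SEC * rr_timeslice_jiffies(hz)) // hz


for hz in (100, 250, 300, 1000):
    print(hz, old_initial_ms(hz), new_initial_ms(hz))
```

Of the four HZ values in the loop, only HZ=300 produces a different
result between the two orderings.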

Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis &lt;chrubis@suse.cz&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Petr Vorel &lt;pvorel@suse.cz&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Tested-by: Petr Vorel &lt;pvorel@suse.cz&gt;
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
</content>
</entry>
<entry>
<title>sched/rt: Fix bad task migration for rt tasks</title>
<updated>2023-04-21T11:24:21Z</updated>
<author>
<name>Schspa Shi</name>
<email>schspa@gmail.com</email>
</author>
<published>2022-08-28T17:03:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=feffe5bb274dd3442080ef0e4053746091878799'/>
<id>urn:sha1:feffe5bb274dd3442080ef0e4053746091878799</id>
<content type='text'>
Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
allows find_lock_lowest_rq() to pick a task with migration disabled.
The purpose of the commit is to push the current running task on the
CPU that has the migrate_disable() task away.

However, there is a race which allows a migrate_disable() task to be
migrated. Consider:

  CPU0                                    CPU1
  push_rt_task
    check is_migration_disabled(next_task)

                                          task not running and
                                          migration_disabled == 0

    find_lock_lowest_rq(next_task, rq);
      _double_lock_balance(this_rq, busiest);
        raw_spin_rq_unlock(this_rq);
        double_rq_lock(this_rq, busiest);
          &lt;&lt;wait for busiest rq&gt;&gt;
                                              &lt;wakeup&gt;
                                          task becomes running
                                          migrate_disable();
                                            &lt;context out&gt;
    deactivate_task(rq, next_task, 0);
    set_task_cpu(next_task, lowest_rq-&gt;cpu);
      WARN_ON_ONCE(is_migration_disabled(p));

Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi &lt;schspa@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Tested-by: Dwaine Gonyier &lt;dgonyier@redhat.com&gt;
</content>
</entry>
<entry>
<title>sched/core: Avoid selecting a throttled task to run when core-sched is enabled</title>
<updated>2023-03-22T09:10:58Z</updated>
<author>
<name>Hao Jia</name>
<email>jiahao.os@bytedance.com</email>
</author>
<published>2023-03-16T08:18:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=530bfad1d53d103f98cec66a3e491a36d397884d'/>
<id>urn:sha1:530bfad1d53d103f98cec66a3e491a36d397884d</id>
<content type='text'>
When an {rt, cfs}_rq or dl task is throttled, cookied tasks
are not dequeued from the core tree, so sched_core_find() and
sched_core_next() may return a throttled task, which may
then run on the CPU.

So we add checks in sched_core_find() and sched_core_next()
to make sure that the returned task is runnable and
not throttled.

Co-developed-by: Cruz Zhao &lt;CruzZhao@linux.alibaba.com&gt;
Signed-off-by: Cruz Zhao &lt;CruzZhao@linux.alibaba.com&gt;
Signed-off-by: Hao Jia &lt;jiahao.os@bytedance.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230316081806.69544-1-jiahao.os@bytedance.com
</content>
</entry>
</feed>
