<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v6.6.111</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.111</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.111'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-09-09T16:56:26Z</updated>
<entry>
<title>sched: Fix sched_numa_find_nth_cpu() if mask offline</title>
<updated>2025-09-09T16:56:26Z</updated>
<author>
<name>Christian Loehle</name>
<email>christian.loehle@arm.com</email>
</author>
<published>2025-09-03T15:48:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f9b8d4dba8e78c1887fecd81ba0d8204d6ff05fc'/>
<id>urn:sha1:f9b8d4dba8e78c1887fecd81ba0d8204d6ff05fc</id>
<content type='text'>
commit 5ebf512f335053a42482ebff91e46c6dc156bf8c upstream.

sched_numa_find_nth_cpu() uses a bsearch to look for the 'closest'
CPU in sched_domains_numa_masks and the given cpus mask. However, the
two might not intersect if all CPUs in the cpus mask are offline.
bsearch() returns NULL in that case; bail out instead of dereferencing
a bogus pointer.
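
A minimal sketch of the fix (kernel/sched/topology.c; identifiers such
as hop_masks, hop_cmp and the unlock label follow the existing lookup
code and may not match the upstream hunk exactly):

        hop_masks = bsearch(&amp;k, k.masks, sched_domains_numa_levels,
                            sizeof(k.masks[0]), hop_cmp);
        if (!hop_masks)
                goto unlock;    /* cpus and NUMA masks don't intersect */
        hop = hop_masks - k.masks;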

The previous behaviour led to this bug when using maxcpus=4 on an
rk3399 (LLLLbb), i.e. booting with all big CPUs offline:

[    1.422922] Unable to handle kernel paging request at virtual address ffffff8000000000
[    1.423635] Mem abort info:
[    1.423889]   ESR = 0x0000000096000006
[    1.424227]   EC = 0x25: DABT (current EL), IL = 32 bits
[    1.424715]   SET = 0, FnV = 0
[    1.424995]   EA = 0, S1PTW = 0
[    1.425279]   FSC = 0x06: level 2 translation fault
[    1.425735] Data abort info:
[    1.425998]   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
[    1.426499]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    1.426952]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    1.427428] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000004a9f000
[    1.428038] [ffffff8000000000] pgd=18000000f7fff403, p4d=18000000f7fff403, pud=18000000f7fff403, pmd=0000000000000000
[    1.429014] Internal error: Oops: 0000000096000006 [#1]  SMP
[    1.429525] Modules linked in:
[    1.429813] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc4-dirty #343 PREEMPT
[    1.430559] Hardware name: Pine64 RockPro64 v2.1 (DT)
[    1.431012] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    1.431634] pc : sched_numa_find_nth_cpu+0x2a0/0x488
[    1.432094] lr : sched_numa_find_nth_cpu+0x284/0x488
[    1.432543] sp : ffffffc084e1b960
[    1.432843] x29: ffffffc084e1b960 x28: ffffff80078a8800 x27: ffffffc0846eb1d0
[    1.433495] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[    1.434144] x23: 0000000000000000 x22: fffffffffff7f093 x21: ffffffc081de6378
[    1.434792] x20: 0000000000000000 x19: 0000000ffff7f093 x18: 00000000ffffffff
[    1.435441] x17: 3030303866666666 x16: 66663d736b73616d x15: ffffffc104e1b5b7
[    1.436091] x14: 0000000000000000 x13: ffffffc084712860 x12: 0000000000000372
[    1.436739] x11: 0000000000000126 x10: ffffffc08476a860 x9 : ffffffc084712860
[    1.437389] x8 : 00000000ffffefff x7 : ffffffc08476a860 x6 : 0000000000000000
[    1.438036] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[    1.438683] x2 : 0000000000000000 x1 : ffffffc0846eb000 x0 : ffffff8000407b68
[    1.439332] Call trace:
[    1.439559]  sched_numa_find_nth_cpu+0x2a0/0x488 (P)
[    1.440016]  smp_call_function_any+0xc8/0xd0
[    1.440416]  armv8_pmu_init+0x58/0x27c
[    1.440770]  armv8_cortex_a72_pmu_init+0x20/0x2c
[    1.441199]  arm_pmu_device_probe+0x1e4/0x5e8
[    1.441603]  armv8_pmu_device_probe+0x1c/0x28
[    1.442007]  platform_probe+0x5c/0xac
[    1.442347]  really_probe+0xbc/0x298
[    1.442683]  __driver_probe_device+0x78/0x12c
[    1.443087]  driver_probe_device+0xdc/0x160
[    1.443475]  __driver_attach+0x94/0x19c
[    1.443833]  bus_for_each_dev+0x74/0xd4
[    1.444190]  driver_attach+0x24/0x30
[    1.444525]  bus_add_driver+0xe4/0x208
[    1.444874]  driver_register+0x60/0x128
[    1.445233]  __platform_driver_register+0x24/0x30
[    1.445662]  armv8_pmu_driver_init+0x28/0x4c
[    1.446059]  do_one_initcall+0x44/0x25c
[    1.446416]  kernel_init_freeable+0x1dc/0x3bc
[    1.446820]  kernel_init+0x20/0x1d8
[    1.447151]  ret_from_fork+0x10/0x20
[    1.447493] Code: 90022e21 f000e5f5 910de2b5 2a1703e2 (f8767803)
[    1.448040] ---[ end trace 0000000000000000 ]---
[    1.448483] note: swapper/0[1] exited with preempt_count 1
[    1.449047] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.449741] SMP: stopping secondary CPUs
[    1.450105] Kernel Offset: disabled
[    1.450419] CPU features: 0x000000,00080000,20002001,0400421b
[    1.450935] Memory Limit: none
[    1.451217] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Yury: with the fix, the function returns cpu == nr_cpu_ids, and later in

	smp_call_function_any -&gt;
	  smp_call_function_single -&gt;
	     generic_exec_single

we test the cpu for '&gt;= nr_cpu_ids' and return -ENXIO. So everything is
handled correctly.
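
The range check in question looks roughly like this (a sketch of the
kernel/smp.c path, not the literal code):

        if ((unsigned int)cpu &gt;= nr_cpu_ids || !cpu_online(cpu))
                return -ENXIO;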

Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Signed-off-by: Yury Norov (NVIDIA) &lt;yury.norov@gmail.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix frequency selection for non-invariant case</title>
<updated>2025-08-28T14:28:43Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2024-01-14T18:36:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1446050a29a7f609895780ef1500f2d29cfc0c7'/>
<id>urn:sha1:d1446050a29a7f609895780ef1500f2d29cfc0c7</id>
<content type='text'>
commit e37617c8e53a1f7fcba6d5e1041f4fd8a2425c27 upstream.

Linus reported a ~50% performance regression on single-threaded
workloads on his AMD Ryzen system, and bisected it to:

  9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")

When frequency invariance is not enabled, get_capacity_ref_freq(policy)
is supposed to return the current frequency, with the performance margin
applied by map_util_perf() allowing the utilization to go above the
maximum compute capacity and thus a higher frequency than the current
one to be selected.

After the changes in 9c0b4bb7f630, the performance margin is applied
earlier in the path to take utilization clamping into account, so the
utilization can no longer exceed the maximum compute capacity and the
CPU remains 'stuck' at lower frequencies.

To fix this, we must use a frequency above the current frequency to
get a chance to select a higher OPP when the current one becomes fully used.
Apply the same margin and return a frequency 25% higher than the current
one in order to switch to the next OPP before we fully use the CPU
at the current one.
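
The non-invariant branch of get_capacity_ref_freq() then boils down to
something like this (a sketch, not the exact upstream hunk):

        /*
         * Apply a 25% margin so the next OPP can be requested before
         * the CPU is fully used at the current one.
         */
        return policy-&gt;cur + (policy-&gt;cur &gt;&gt; 2);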

[ mingo: Clarified the changelog. ]

Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Bisected-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reported-by: Wyes Karny &lt;wkarny@gmail.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Wyes Karny &lt;wkarny@gmail.com&gt;
Link: https://lore.kernel.org/r/20240114183600.135316-1-vincent.guittot@linaro.org
Signed-off-by: Wentao Guan &lt;guanwentao@uniontech.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>cpufreq/schedutil: Use a fixed reference frequency</title>
<updated>2025-08-28T14:28:42Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2023-12-11T10:48:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6688eb92693239a4cbd5385ead8a0e165951df02'/>
<id>urn:sha1:6688eb92693239a4cbd5385ead8a0e165951df02</id>
<content type='text'>
commit b3edde44e5d4504c23a176819865cd603fd16d6c upstream.

cpuinfo.max_freq can change at runtime, because of boost for example.
This implies that the value could be different from the one that was
used when computing the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent reference
frequency that can be used when computing a frequency based on utilization.

Use this arch_scale_freq_ref() when available and fall back to the
cpufreq policy otherwise.
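
A sketch of the resulting helper (get_capacity_ref_freq() as used by
schedutil; details may differ from the upstream hunk):

        static __always_inline
        unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
        {
                unsigned int freq = arch_scale_freq_ref(policy-&gt;cpu);

                /* Prefer the fixed, coherent reference frequency. */
                if (freq)
                        return freq;

                if (arch_scale_freq_invariant())
                        return policy-&gt;cpuinfo.max_freq;

                return policy-&gt;cur;
        }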

Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Reviewed-by: Lukasz Luba &lt;lukasz.luba@arm.com&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Acked-by: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Link: https://lore.kernel.org/r/20231211104855.558096-4-vincent.guittot@linaro.org
Stable-dep-of: e37617c8e53a ("sched/fair: Fix frequency selection for non-invariant case")
Signed-off-by: Wentao Guan &lt;guanwentao@uniontech.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Bump sd-&gt;max_newidle_lb_cost when newidle balance fails</title>
<updated>2025-08-28T14:28:21Z</updated>
<author>
<name>Chris Mason</name>
<email>clm@fb.com</email>
</author>
<published>2025-06-26T14:39:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c73e54dba1074f136b5061618825e916e5cb47a5'/>
<id>urn:sha1:c73e54dba1074f136b5061618825e916e5cb47a5</id>
<content type='text'>
[ Upstream commit 155213a2aed42c85361bf4f5c817f5cb68951c3b ]

schbench (https://github.com/masoncl/schbench.git) is showing a
regression from previous production kernels that bisected down to:

sched/fair: Remove sysctl_sched_migration_cost condition (c5b0a7eefc)

The schbench command line was:

schbench -L -m 4 -M auto -t 256 -n 0 -r 0 -s 0

This creates 4 message threads pinned to CPUs 0-3, and 256x4 worker
threads spread across the rest of the CPUs.  Neither the worker threads
nor the message threads do any work; they just wake each other up and go
back to sleep as soon as possible.

The end result is the first 4 CPUs are pegged waking up those 1024
workers, and the rest of the CPUs are constantly banging in and out of
idle.  If I take a v6.9 Linus kernel and revert that one commit,
performance goes from 3.4M RPS to 5.4M RPS.

schedstat shows there are ~100x more new idle balance operations, and
profiling shows the worker threads are spending ~20% of their CPU time
on new idle balance.  schedstat also shows that almost all of these new
idle balance attempts are failing to find busy groups.

The fix used here is to crank up the cost of the newidle balance whenever it
fails.  Since we don't want sd-&gt;max_newidle_lb_cost to grow out of
control, this also changes update_newidle_cost() to use
sysctl_sched_migration_cost as the upper limit on max_newidle_lb_cost.
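
In sketch form (based on the description above; pulled_task and
curr_cost refer to the existing newidle balance bookkeeping, and the
exact hunk may differ):

        /* after a failed newidle balance attempt: */
        if (!pulled_task)
                curr_cost += sysctl_sched_migration_cost;

        /* update_newidle_cost(): don't let the tracked cost grow
         * without bound */
        if (cost &gt; sd-&gt;max_newidle_lb_cost) {
                sd-&gt;max_newidle_lb_cost =
                        min(cost, sysctl_sched_migration_cost);
                sd-&gt;last_decay_max_lb_cost = jiffies;
        }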

Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20250626144017.1510594-2-clm@fb.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>freezer,sched: Use saved_state to reduce some spurious wakeups</title>
<updated>2025-08-15T10:09:07Z</updated>
<author>
<name>Elliot Berman</name>
<email>quic_eberman@quicinc.com</email>
</author>
<published>2023-09-08T22:49:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e241ca2f0ec364ed3e202ce84e8ae4d3b6e8dac2'/>
<id>urn:sha1:e241ca2f0ec364ed3e202ce84e8ae4d3b6e8dac2</id>
<content type='text'>
commit 8f0eed4a78a81668bc78923ea09f51a7a663c2b0 upstream.

After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN are
always woken up on the thaw path. Prior to that commit, tasks could ask
freezer to consider them "frozen enough" via freezer_do_not_count(). The
commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
allows freezer to immediately mark the task as TASK_FROZEN without
waking up the task.  This is efficient for the suspend path, but on the
thaw path the task is always woken up, even if it didn't need to wake up
and simply goes back to its TASK_(UN)INTERRUPTIBLE state. Although
these tasks are capable of handling the wakeup, we can observe a
power/perf impact from the extra wakeup.

On Android we observed that many tasks wait in the TASK_FREEZABLE state
(particularly because many of them are binder clients). We observed
nearly 4x the number of tasks and a corresponding linear increase in
latency and power consumption when thawing the system. The latency
increased from ~15ms to ~50ms.

Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks.
If the task was running before entering TASK_FROZEN state
(__refrigerator()) or if the task received a wake up for the saved
state, then the task is woken on thaw. saved_state from PREEMPT_RT locks
can be re-used because freezer would not stomp on the rtlock wait flow:
TASK_RTLOCK_WAIT isn't considered freezable.
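
Conceptually (an illustration of the idea only, not the literal patch;
__state, saved_state and wake_up_state() are the existing task fields
and helper):

        /* freezing: remember what the task was doing */
        p-&gt;saved_state = READ_ONCE(p-&gt;__state); /* e.g. TASK_INTERRUPTIBLE */
        WRITE_ONCE(p-&gt;__state, TASK_FROZEN);

        /* thawing: a blocked task goes straight back to its saved state;
         * only a task that was runnable before freezing, or that got a
         * wakeup for its saved state meanwhile, is actually woken */
        if (p-&gt;saved_state != TASK_RUNNING) {
                WRITE_ONCE(p-&gt;__state, p-&gt;saved_state);
                p-&gt;saved_state = TASK_RUNNING;
        } else {
                wake_up_state(p, TASK_FROZEN);
        }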

Reported-by: Prakash Viswalingam &lt;quic_prakashv@quicinc.com&gt;
Signed-off-by: Elliot Berman &lt;quic_eberman@quicinc.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Remove ifdeffery for saved_state</title>
<updated>2025-08-15T10:09:07Z</updated>
<author>
<name>Elliot Berman</name>
<email>quic_eberman@quicinc.com</email>
</author>
<published>2023-09-08T22:49:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8afa818c77330fcb269c77540b268f199d84ee8e'/>
<id>urn:sha1:8afa818c77330fcb269c77540b268f199d84ee8e</id>
<content type='text'>
commit fbaa6a181a4b1886cbf4214abdf9a2df68471510 upstream.

In preparation for freezer to also use saved_state, remove the
CONFIG_PREEMPT_RT compilation guard around saved_state.
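
The field itself then reads roughly as follows (include/linux/sched.h,
sketch):

        struct task_struct {
                ...
                unsigned int            __state;
                /* saved state for "spurious" wakeup handling,
                 * no longer guarded by CONFIG_PREEMPT_RT */
                unsigned int            saved_state;
                ...
        };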

On the arm64 platform I tested, which did not have CONFIG_PREEMPT_RT,
applying this patch caused no statistically significant deviation.

Test methodology:

perf bench sched message -g 40 -l 40

Signed-off-by: Elliot Berman &lt;quic_eberman@quicinc.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Fix psi_seq initialization</title>
<updated>2025-08-15T10:09:00Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-07-15T19:11:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9184a2bb522c7dee7a823cb61d8efd89e443968f'/>
<id>urn:sha1:9184a2bb522c7dee7a823cb61d8efd89e443968f</id>
<content type='text'>
[ Upstream commit 99b773d720aeea1ef2170dce5fcfa80649e26b78 ]

With the seqcount moved out of the group into a global psi_seq,
re-initializing the seqcount on group creation is causing seqcount
corruption.
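
One way to express the fix (a sketch of the idea; the exact hunk may
differ) is to drop the re-initialization from group creation and give
the global seqcount a static initializer instead:

        static DEFINE_PER_CPU(seqcount_t, psi_seq) = SEQCNT_ZERO(psi_seq);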

Fixes: 570c8efd5eb7 ("sched/psi: Optimize psi_group_change() cpu_clock() usage")
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Suggested-by: Beata Michalska &lt;beata.michalska@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/psi: Optimize psi_group_change() cpu_clock() usage</title>
<updated>2025-08-15T10:08:46Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-05-23T15:28:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b1c6b13f48a8f5bf147e91e42da1db740b0a7b1f'/>
<id>urn:sha1:b1c6b13f48a8f5bf147e91e42da1db740b0a7b1f</id>
<content type='text'>
[ Upstream commit 570c8efd5eb79c3725ba439ce105ed1bedc5acd9 ]

Dietmar reported that commit 3840cbe24cf0 ("sched: psi: fix bogus
pressure spikes from aggregation race") caused a regression for him on
a high context switch rate benchmark (schbench) due to the now
repeating cpu_clock() calls.

In particular the problem is that get_recent_times() will extrapolate
the current state to 'now'. But if an update uses a timestamp from
before the start of the update, it is possible to get two reads
with inconsistent results. It is effectively back-dating an update.

(note that this all hard-relies on the clock being synchronized across
CPUs -- if this is not the case, all bets are off).

Combine this problem with the fact that there are per-group per-CPU
seqcounts, and the commit in question ended up pushing the clock read
into the group iteration, causing one cpu_clock() call per level of
tree depth. On architectures where cpu_clock() has appreciable
overhead, this hurts.

Instead move to a per-cpu seqcount, which allows us to have a single
clock read for all group updates, increasing internal consistency and
lowering update overhead. This comes at the cost of a longer update
side (proportional to the tree depth) which can cause the read side to
retry more often.
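
The shape of the update side after the change (an illustration of the
idea, not the literal patch; the psi_group_change() parameter list here
is only indicative):

        seqcount_t *seq = per_cpu_ptr(&amp;psi_seq, cpu);
        u64 now = cpu_clock(cpu);       /* one clock read for the walk */

        write_seqcount_begin(seq);
        for (; group; group = group-&gt;parent)
                psi_group_change(group, cpu, clear, set, now, wake_clock);
        write_seqcount_end(seq);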

Fixes: 3840cbe24cf0 ("sched: psi: fix bogus pressure spikes from aggregation race")
Reported-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Link: https://lkml.kernel.org/20250522084844.GC31726@noisy.programming.kicks-ass.net
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched: Change nr_uninterruptible type to unsigned long</title>
<updated>2025-07-24T06:53:20Z</updated>
<author>
<name>Aruna Ramakrishna</name>
<email>aruna.ramakrishna@oracle.com</email>
</author>
<published>2025-07-09T17:33:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=496efa228f0dd58980d301e379e5561a9b612eaa'/>
<id>urn:sha1:496efa228f0dd58980d301e379e5561a9b612eaa</id>
<content type='text'>
commit 36569780b0d64de283f9d6c2195fd1a43e221ee8 upstream.

The commit e6fe3f422be1 ("sched: Make multiple runqueue task counters
32-bit") changed nr_uninterruptible to an unsigned int. But the
nr_uninterruptible values for each of the CPU runqueues can grow to
large numbers, sometimes exceeding INT_MAX. This is valid if, over
time, a large number of tasks are migrated off of one CPU after going
into an uninterruptible state. Only the sum of all nr_uninterruptible
values across all CPUs yields the correct result, as explained in a
comment in kernel/sched/loadavg.c.

Change the type of nr_uninterruptible back to unsigned long to prevent
overflows, and thus the miscalculation of load average.
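
The change itself is just the type of the per-runqueue counter (sketch
of kernel/sched/sched.h):

        struct rq {
                ...
                /* was: unsigned int nr_uninterruptible; */
                unsigned long           nr_uninterruptible;
                ...
        };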

Fixes: e6fe3f422be1 ("sched: Make multiple runqueue task counters 32-bit")

Signed-off-by: Aruna Ramakrishna &lt;aruna.ramakrishna@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20250709173328.606794-1-aruna.ramakrishna@oracle.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Reduce the default slice to avoid tasks getting an extra tick</title>
<updated>2025-06-04T12:42:09Z</updated>
<author>
<name>zihan zhou</name>
<email>15645113830zzh@gmail.com</email>
</author>
<published>2025-02-08T07:53:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e9bed533ec803d4efe4c8af7d5bbb75266d4ff0a'/>
<id>urn:sha1:e9bed533ec803d4efe4c8af7d5bbb75266d4ff0a</id>
<content type='text'>
[ Upstream commit 2ae891b826958b60919ea21c727f77bcd6ffcc2c ]

The old default value for slice is 0.75 msec * (1 + ilog(ncpus)) which
means that we have a default slice of:

  0.75 for 1 cpu
  1.50 up to 3 cpus
  2.25 up to 7 cpus
  3.00 for 8 cpus and above.

For HZ=250 and HZ=100, because of the tick accuracy, the runtime of
tasks is far higher than their slice.

For HZ=1000 with 8 cpus or more, the accuracy of tick is already
satisfactory, but there is still an issue that tasks will get an extra
tick because the tick often arrives a little faster than expected. In
this case, the task can only wait until the next tick to consider that it
has reached its deadline, and will run 1ms longer.

vruntime + sysctl_sched_base_slice =     deadline
        |-----------|-----------|-----------|-----------|
             1ms          1ms         1ms         1ms
                   ^           ^           ^           ^
                 tick1       tick2       tick3       tick4(nearly 4ms)

There are two reasons for tick error: clockevent precision and
CONFIG_IRQ_TIME_ACCOUNTING/CONFIG_PARAVIRT_TIME_ACCOUNTING. With
CONFIG_IRQ_TIME_ACCOUNTING every tick will be less than 1ms, but even
without it, because of clockevent precision, the tick is still often
less than 1ms.

In order to make scheduling more precise, we changed 0.75 to 0.70.
Using 0.70 instead of 0.75 should not change much for other configs
and fixes this issue:

  0.70 for 1 cpu
  1.40 up to 3 cpus
  2.10 up to 7 cpus
  2.80 for 8 cpus and above.

This does not guarantee that tasks can run the slice time accurately
every time, but occasionally running an extra tick has little impact.
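
In code terms, the change is only the base slice default (a sketch of
kernel/sched/fair.c):

        /* was 750000ULL, i.e. 0.75 msec */
        unsigned int sysctl_sched_base_slice = 700000ULL;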

Signed-off-by: zihan zhou &lt;15645113830zzh@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20250208075322.13139-1-15645113830zzh@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
</feed>
