<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v6.6.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.6.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-12-03T06:33:02Z</updated>
<entry>
<title>sched/fair: Fix the decision for load balance</title>
<updated>2023-12-03T06:33:02Z</updated>
<author>
<name>Keisuke Nishimura</name>
<email>keisuke.nishimura@inria.fr</email>
</author>
<published>2023-10-31T13:38:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=47a3075109978d9684f7fea59de7c639fcba7145'/>
<id>urn:sha1:47a3075109978d9684f7fea59de7c639fcba7145</id>
<content type='text'>
[ Upstream commit 6d7e4782bcf549221b4ccfffec2cf4d1a473f1a3 ]

should_we_balance is called for the decision to do load-balancing.
When sched ticks invoke this function, only one CPU should return
true. However, in the current code, two CPUs can return true. The
following situation, where b means busy and i means idle, is an
example, because CPU 0 and CPU 2 return true.

        [0, 1] [2, 3]
         b  b   i  b

This fix checks if there exists an idle CPU with busy sibling(s)
after looking for a CPU on an idle core. If some idle CPUs with busy
siblings are found, just the first one should do load-balancing.

Fixes: b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")
Signed-off-by: Keisuke Nishimura &lt;keisuke.nishimura@inria.fr&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Reviewed-by: Shrikanth Hegde &lt;sshegde@linux.vnet.ibm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20231031133821.1570861-1-keisuke.nishimura@inria.fr
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/eevdf: Fix vruntime adjustment on reweight</title>
<updated>2023-12-03T06:33:02Z</updated>
<author>
<name>Abel Wu</name>
<email>wuyun.abel@bytedance.com</email>
</author>
<published>2023-11-07T09:05:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=14204acc09f652169baed1141c671429047b1313'/>
<id>urn:sha1:14204acc09f652169baed1141c671429047b1313</id>
<content type='text'>
[ Upstream commit eab03c23c2a162085b13200d7942fc5a00b5ccc8 ]

vruntime of the (on_rq &amp;&amp; !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based
on the fact that re-weight won't change the w-average of all the
entities. Please check the proofs in comments.

But adjusting vruntime can also cause position change in RB-tree
hence require re-queue to fix up which might be costly. This might
be avoided by deferring adjustment to the time the entity actually
leaves tree (dequeue/pick), but that will negatively affect task
selection and probably not good enough either.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu &lt;wuyun.abel@bytedance.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20231107090510.71322-2-wuyun.abel@bytedance.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix RQCF_ACT_SKIP leak</title>
<updated>2023-11-28T17:19:59Z</updated>
<author>
<name>Hao Jia</name>
<email>jiahao.os@bytedance.com</email>
</author>
<published>2023-10-12T09:00:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bf5dd542721a4f5145b241cf51e92f3809855d27'/>
<id>urn:sha1:bf5dd542721a4f5145b241cf51e92f3809855d27</id>
<content type='text'>
commit 5ebde09d91707a4a9bec1e3d213e3c12ffde348f upstream.

Igor Raits and Bagas Sanjaya report a RQCF_ACT_SKIP leak warning.

This warning may be triggered in the following situations:

    CPU0                                      CPU1

__schedule()
  *rq-&gt;clock_update_flags &lt;&lt;= 1;*   unregister_fair_sched_group()
  pick_next_task_fair+0x4a/0x410      destroy_cfs_bandwidth()
    newidle_balance+0x115/0x3e0       for_each_possible_cpu(i) *i=0*
      rq_unpin_lock(this_rq, rf)      __cfsb_csd_unthrottle()
      raw_spin_rq_unlock(this_rq)
                                      rq_lock(*CPU0_rq*, &amp;rf)
                                      rq_clock_start_loop_update()
                                      rq-&gt;clock_update_flags &amp; RQCF_ACT_SKIP &lt;--
      raw_spin_rq_lock(this_rq)

The purpose of RQCF_ACT_SKIP is to skip the update rq clock,
but the update is very early in __schedule(), but we clear
RQCF_*_SKIP very late, causing it to span that gap above
and triggering this warning.

In __schedule() we can clear the RQCF_*_SKIP flag immediately
after update_rq_clock() to avoid this RQCF_ACT_SKIP leak warning.
And set rq-&gt;clock_update_flags to RQCF_UPDATED to avoid
rq-&gt;clock_update_flags &lt; RQCF_ACT_SKIP warning that may be triggered later.

Fixes: ebb83d84e49b ("sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()")
Closes: https://lore.kernel.org/all/20230913082424.73252-1-jiahao.os@bytedance.com
Reported-by: Igor Raits &lt;igor.raits@gmail.com&gt;
Reported-by: Bagas Sanjaya &lt;bagasdotme@gmail.com&gt;
Suggested-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Hao Jia &lt;jiahao.os@bytedance.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/a5dd536d-041a-2ce9-f4b7-64d8d85c86dc@gmail.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Fix stop_one_cpu_nowait() vs hotplug</title>
<updated>2023-11-20T10:58:52Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-10T18:57:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d03b4817436286eeaa1ecb3606064fca31095fd2'/>
<id>urn:sha1:d03b4817436286eeaa1ecb3606064fca31095fd2</id>
<content type='text'>
[ Upstream commit f0498d2a54e7966ce23cd7c7ff42c64fa0059b07 ]

Kuyo reported sporadic failures on a sched_setaffinity() vs CPU
hotplug stress-test -- notably affine_move_task() remains stuck in
wait_for_completion(), leading to a hung-task detector warning.

Specifically, it was reported that stop_one_cpu_nowait(.fn =
migration_cpu_stop) returns false -- this stopper is responsible for
the matching complete().

The race scenario is:

	CPU0					CPU1

					// doing _cpu_down()

  __set_cpus_allowed_ptr()
    task_rq_lock();
					takedown_cpu()
					  stop_machine_cpuslocked(take_cpu_down..)

					&lt;PREEMPT: cpu_stopper_thread()
					  MULTI_STOP_PREPARE
					  ...
    __set_cpus_allowed_ptr_locked()
      affine_move_task()
        task_rq_unlock();

  &lt;PREEMPT: cpu_stopper_thread()\&gt;
    ack_state()
					  MULTI_STOP_RUN
					    take_cpu_down()
					      __cpu_disable();
					      stop_machine_park();
						stopper-&gt;enabled = false;
					 /&gt;
   /&gt;
	stop_one_cpu_nowait(.fn = migration_cpu_stop);
          if (stopper-&gt;enabled) // false!!!

That is, by doing stop_one_cpu_nowait() after dropping rq-lock, the
stopper thread gets a chance to preempt and allows the cpu-down for
the target CPU to complete.

OTOH, since stop_one_cpu_nowait() / cpu_stop_queue_work() needs to
issue a wakeup, it must not be ran under the scheduler locks.

Solve this apparent contradiction by keeping preemption disabled over
the unlock + queue_stopper combination:

	preempt_disable();
	task_rq_unlock(...);
	if (!stop_pending)
	  stop_one_cpu_nowait(...)
	preempt_enable();

This respects the lock ordering contraints while still avoiding the
above race. That is, if we find the CPU is online under rq-lock, the
targeted stop_one_cpu_nowait() must succeed.

Apply this pattern to all similar stop_one_cpu_nowait() invocations.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: "Kuyo Chang (張建文)" &lt;Kuyo.Chang@mediatek.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: "Kuyo Chang (張建文)" &lt;Kuyo.Chang@mediatek.com&gt;
Link: https://lkml.kernel.org/r/20231010200442.GA16515@noisy.programming.kicks-ass.net
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/uclamp: Ignore (util == 0) optimization in feec() when p_util_max = 0</title>
<updated>2023-11-20T10:58:52Z</updated>
<author>
<name>Qais Yousef</name>
<email>qyousef@layalina.io</email>
</author>
<published>2023-09-16T23:29:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=294e3c797d7cb27415f3a842f27e263e1a851ddd'/>
<id>urn:sha1:294e3c797d7cb27415f3a842f27e263e1a851ddd</id>
<content type='text'>
[ Upstream commit 23c9519def98ee0fa97ea5871535e9b136f522fc ]

find_energy_efficient_cpu() bails out early if effective util of the
task is 0 as the delta at this point will be zero and there's nothing
for EAS to do. When uclamp is being used, this could lead to wrong
decisions when uclamp_max is set to 0. In this case the task is capped
to performance point 0, but it is actually running and consuming energy
and we can benefit from EAS energy calculations.

Rework the condition so that it bails out when both util and uclamp_min
are 0.

We can do that without needing to use uclamp_task_util(); remove it.

Fixes: d81304bc6193 ("sched/uclamp: Cater for uclamp in find_energy_efficient_cpu()'s early exit condition")
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230916232955.2099394-3-qyousef@layalina.io
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/uclamp: Set max_spare_cap_cpu even if max_spare_cap is 0</title>
<updated>2023-11-20T10:58:52Z</updated>
<author>
<name>Qais Yousef</name>
<email>qyousef@layalina.io</email>
</author>
<published>2023-09-16T23:29:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=60bbc99f7d28042f5a8656167938535861c3bd49'/>
<id>urn:sha1:60bbc99f7d28042f5a8656167938535861c3bd49</id>
<content type='text'>
[ Upstream commit 6b00a40147653c8ea748e8f4396510f252763364 ]

When uclamp_max is being used, the util of the task could be higher than
the spare capacity of the CPU, but due to uclamp_max value we force-fit
it there.

The way the condition for checking for max_spare_cap in
find_energy_efficient_cpu() was constructed; it ignored any CPU that has
its spare_cap less than or _equal_ to max_spare_cap. Since we initialize
max_spare_cap to 0; this lead to never setting max_spare_cap_cpu and
hence ending up never performing compute_energy() for this cluster and
missing an opportunity for a better energy efficient placement to honour
uclamp_max setting.

	max_spare_cap = 0;
	cpu_cap = capacity_of(cpu) - cpu_util(p);  // 0 if cpu_util(p) is high

	...

	util_fits_cpu(...);		// will return true if uclamp_max forces it to fit

	...

	// this logic will fail to update max_spare_cap_cpu if cpu_cap is 0
	if (cpu_cap &gt; max_spare_cap) {
		max_spare_cap = cpu_cap;
		max_spare_cap_cpu = cpu;
	}

prev_spare_cap suffers from a similar problem.

Fix the logic by converting the variables into long and treating -1
value as 'not populated' instead of 0 which is a viable and correct
spare capacity value. We need to be careful signed comparison is used
when comparing with cpu_cap in one of the conditions.

Fixes: 1d42509e475c ("sched/fair: Make EAS wakeup placement consider uclamp restrictions")
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20230916232955.2099394-2-qyousef@layalina.io
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix cfs_rq_is_decayed() on !SMP</title>
<updated>2023-11-20T10:58:51Z</updated>
<author>
<name>Chengming Zhou</name>
<email>zhouchengming@bytedance.com</email>
</author>
<published>2023-09-13T13:20:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f6df646b7ee3c7932ae32fcc75bbe394f54923c5'/>
<id>urn:sha1:f6df646b7ee3c7932ae32fcc75bbe394f54923c5</id>
<content type='text'>
[ Upstream commit c0490bc9bb62d9376f3dd4ec28e03ca0fef97152 ]

We don't need to maintain per-queue leaf_cfs_rq_list on !SMP, since
it's used for cfs_rq load tracking &amp; balancing on SMP.

But sched debug interface uses it to print per-cfs_rq stats.

This patch fixes the !SMP version of cfs_rq_is_decayed(), so the
per-queue leaf_cfs_rq_list is also maintained correctly on !SMP,
to fix the warning in assert_list_leaf_cfs_rq().

Fixes: 0a00a354644e ("sched/fair: Delete useless condition in tg_unthrottle_up()")
Reported-by: Leo Yu-Chi Liang &lt;ycliang@andestech.com&gt;
Signed-off-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Leo Yu-Chi Liang &lt;ycliang@andestech.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Closes: https://lore.kernel.org/all/ZN87UsqkWcFLDxea@swlinux02/
Link: https://lore.kernel.org/r/20230913132031.2242151-1-chengming.zhou@linux.dev
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/topology: Fix sched_numa_find_nth_cpu() in CPU-less case</title>
<updated>2023-11-20T10:58:51Z</updated>
<author>
<name>Yury Norov</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2023-08-19T14:12:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b633c05136544aff3e06c1a93d65e6c15f77ac1a'/>
<id>urn:sha1:b633c05136544aff3e06c1a93d65e6c15f77ac1a</id>
<content type='text'>
[ Upstream commit 617f2c38cb5ce60226042081c09e2ee3a90d03f8 ]

When the node provided by user is CPU-less, corresponding record in
sched_domains_numa_masks is not set. Trying to dereference it in the
following code leads to kernel crash.

To avoid it, start searching from the nearest node with CPUs.

Fixes: cd7f55359c90 ("sched: add sched_numa_find_nth_cpu()")
Reported-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Link: https://lore.kernel.org/r/20230819141239.287290-4-yury.norov@gmail.com

Closes: https://lore.kernel.org/lkml/CAAH8bW8C5humYnfpW3y5ypwx0E-09A3QxFE1JFzR66v+mO4XfA@mail.gmail.com/T/
Closes: https://lore.kernel.org/lkml/ZMHSNQfv39HN068m@yury-ThinkPad/T/#mf6431cb0b7f6f05193c41adeee444bc95bf2b1c4
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/eevdf: Fix heap corruption more</title>
<updated>2023-10-18T08:22:13Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2023-10-17T14:59:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d2929762cc3f85528b0ca12f6f63c2a714f24778'/>
<id>urn:sha1:d2929762cc3f85528b0ca12f6f63c2a714f24778</id>
<content type='text'>
Because someone is a flaming idiot... and forgot we have current as
se-&gt;on_rq but not actually in the tree itself, and walking rb_parent()
on an entry not in the tree is 'funky' and KASAN complains.

Fixes: 8dafa9d0eb1a ("sched/eevdf: Fix min_deadline heap integrity")
Reported-by: 0599jiangyc@gmail.com
Reported-by: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Dmitry Safonov &lt;0x7f454c46@gmail.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218020
Link: https://lkml.kernel.org/r/CAJwJo6ZGXO07%3DQvW4fgQfbsDzQPs9xj5sAQ1zp%3DmAyPMNbHYww%40mail.gmail.com
</content>
</entry>
<entry>
<title>sched/eevdf: Fix pick_eevdf()</title>
<updated>2023-10-09T07:48:33Z</updated>
<author>
<name>Benjamin Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2023-09-30T00:09:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b01db23d5923a35023540edc4f0c5f019e11ac7d'/>
<id>urn:sha1:b01db23d5923a35023540edc4f0c5f019e11ac7d</id>
<content type='text'>
The old pick_eevdf() could fail to find the actual earliest eligible
deadline when it descended to the right looking for min_deadline, but
it turned out that that min_deadline wasn't actually eligible. In that
case we need to go back and search through any left branches we
skipped looking for the actual best _eligible_ min_deadline.

This is more expensive, but still O(log n), and at worst should only
involve descending two branches of the rbtree.

I've run this through a userspace stress test (thank you
tools/lib/rbtree.c), so hopefully this implementation doesn't miss any
corner cases.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/xm261qego72d.fsf_-_@google.com
</content>
</entry>
</feed>
