From 98c88dc8a1ace642d9021b103b28cba7b51e3abc Mon Sep 17 00:00:00 2001
From: Vincent Guittot
Date: Wed, 21 Jan 2026 17:33:17 +0100
Subject: sched/fair: Fix pelt clock sync when entering idle

Samuel and Alex reported regressions of the RT rq's util_avg with commit
17e3e88ed0b6 ("sched/fair: Fix pelt lost idle time detection").

It happens that fair updates and syncs the pelt clock with the task
clock when pick_next_task_fair() fails to pick a task, but before the
prev scheduling class has had a chance to update its pelt signals.

Move update_idle_rq_clock_pelt() into set_next_task_idle(), which is
called after the prev class has been handled.

Fixes: 17e3e88ed0b6 ("sched/fair: Fix pelt lost idle time detection")
Closes: https://lore.kernel.org/all/CAG2KctpO6VKS6GN4QWDji0t92_gNBJ7HjjXrE+6H+RwRXt=iLg@mail.gmail.com/
Closes: https://lore.kernel.org/all/8cf19bf0e0054dcfed70e9935029201694f1bb5a.camel@mediatek.com/
Reported-by: Samuel Wu
Reported-by: Alex Hoh
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Samuel Wu
Tested-by: Alex Hoh
Link: https://patch.msgid.link/20260121163317.505635-1-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 6 ------
 kernel/sched/idle.c | 6 ++++++
 2 files changed, 6 insertions(+), 6 deletions(-)

(limited to 'kernel/sched')

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e71302282671..a148c61a8085 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8995,12 +8995,6 @@ idle:
 		goto again;
 	}
 
-	/*
-	 * rq is about to be idle, check if we need to update the
-	 * lost_idle_time of clock_pelt
-	 */
-	update_idle_rq_clock_pelt(rq);
-
 	return NULL;
 }
 
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c174afe1dd17..abf8f15d60c9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -468,6 +468,12 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
 	scx_update_idle(rq, true, true);
 	schedstat_inc(rq->sched_goidle);
 	next->se.exec_start = rq_clock_task(rq);
+
+	/*
+	 * rq is about to be idle, check if we need to update the
+	 * lost_idle_time of clock_pelt
+	 */
+	update_idle_rq_clock_pelt(rq);
 }
 
 struct task_struct *pick_task_idle(struct rq *rq, struct rq_flags *rf)
--
cgit v1.2.3
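As an aside, the ordering problem fixed above can be illustrated with a minimal user-space sketch. Nothing in it is kernel code: toy_rq, rt_update_util() and idle_sync_pelt() are invented stand-ins for the rq clocks, the prev-class PELT update and update_idle_rq_clock_pelt(), and the arithmetic is deliberately simplified; the only point is that the prev class accounts a different amount of running time depending on whether the idle PELT sync happens before or after its own update.

#include <stdio.h>

struct toy_rq {
	unsigned long clock;       /* task clock, keeps advancing        */
	unsigned long clock_pelt;  /* PELT clock, synced when going idle */
	unsigned long rt_util;     /* stand-in for the RT rq's util_avg  */
};

/* the prev (RT) class folds its running time into its util signal */
static void rt_update_util(struct toy_rq *rq)
{
	rq->rt_util += rq->clock - rq->clock_pelt;
}

/* stand-in for update_idle_rq_clock_pelt(): sync the PELT clock */
static void idle_sync_pelt(struct toy_rq *rq)
{
	rq->clock_pelt = rq->clock;
}

int main(void)
{
	struct toy_rq before = { .clock = 100 };  /* pre-fix ordering  */
	struct toy_rq after  = { .clock = 100 };  /* post-fix ordering */

	/* pre-fix: fair syncs the PELT clock first, the RT update sees 0 */
	idle_sync_pelt(&before);
	rt_update_util(&before);

	/* post-fix: RT updates first, set_next_task_idle() syncs afterwards */
	rt_update_util(&after);
	idle_sync_pelt(&after);

	printf("rt_util: pre-fix order=%lu, post-fix order=%lu\n",
	       before.rt_util, after.rt_util);
	return 0;
}

The two orderings print different rt_util values, which is the essence of why the sync was moved into set_next_task_idle(), after the prev class has updated its signals.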
From 4f70f106bca1a56bd66d00830ac91680bd754974 Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Tue, 20 Jan 2026 11:33:35 +0000
Subject: sched/fair: Disable scheduler feature NEXT_BUDDY

NEXT_BUDDY was disabled with the introduction of EEVDF and enabled
again after it was rewritten for EEVDF by commit e837456fdca8
("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals"). It
was not expected that this would be a universal win without a crystal
ball instruction, but the reported regressions are a concern [1][2]
even if gains were also reported. Specifically:

o mysql with client/server running on different servers regresses
o specjbb reports lower peak metrics
o daytrader regresses

The mysql regression is realistic and a concern. It remains to be
confirmed whether specjbb is simply shifting the point where peak
performance is measured, but it is still a concern. daytrader is
considered to be representative of a real workload. Access to test
machines is currently problematic for verifying any fix to this
problem. Disable NEXT_BUDDY by default for now until the root causes
are addressed.

Signed-off-by: Mel Gorman
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Madadi Vineeth Reddy
Link: https://lore.kernel.org/lkml/4b96909a-f1ac-49eb-b814-97b8adda6229@arm.com [1]
Link: https://lore.kernel.org/lkml/ec3ea66f-3a0d-4b5a-ab36-ce778f159b5b@linux.ibm.com [2]
Link: https://patch.msgid.link/fyqsk63pkoxpeaclyqsm5nwtz3dyejplr7rg6p74xwemfzdzuu@7m7xhs5aqpqw
---
 kernel/sched/features.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'kernel/sched')

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 980d92bab8ab..136a6584be79 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -29,7 +29,7 @@ SCHED_FEAT(PREEMPT_SHORT, true)
  * wakeup-preemption), since its likely going to consume data we
  * touched, increases cache locality.
  */
-SCHED_FEAT(NEXT_BUDDY, true)
+SCHED_FEAT(NEXT_BUDDY, false)
 
 /*
  * Allow completely ignoring cfs_rq->next; which can be set from various
--
cgit v1.2.3

From 15257cc2f905dbf5813c0bfdd3c15885f28093c4 Mon Sep 17 00:00:00 2001
From: Vincent Guittot
Date: Fri, 23 Jan 2026 11:28:58 +0100
Subject: sched/fair: Revert force wakeup preemption

Forced wakeup preemption aggressively bypasses run_to_parity and slice
protection on the assumption that this is what the waker wants, but
there is no guarantee that the wakee will be the next task to run.
Using yield_to_task or WF_SYNC is a better choice in such a case.

This increases the number of reschedules and preemptions, because a
task quickly becomes "ineligible" once it runs; we update the task's
vruntime periodically, before the task has exhausted its slice or at
least a quantum.

Example: 2 tasks A and B wake up simultaneously with lag = 0. Both are
eligible. Task A runs first and wakes up task C. The scheduler updates
task A's vruntime, which becomes greater than the average vruntime, as
all the others have a lag == 0 and have not run yet. Task A is now
ineligible because it received more runtime than the other tasks,
although it has exhausted neither its slice nor a minimum quantum. We
force preemption and disable protection, but task B will run first, not
task C.

Sidenote: DELAY_ZERO increases this effect by clearing positive lag at
wakeup.

Fixes: e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals")
Signed-off-by: Vincent Guittot
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260123102858.52428-1-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 10 ----------
 1 file changed, 10 deletions(-)

(limited to 'kernel/sched')

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a148c61a8085..3eaeceda71b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8828,16 +8828,6 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if ((wake_flags & WF_FORK) || pse->sched_delayed)
 		return;
 
-	/*
-	 * If @p potentially is completing work required by current then
-	 * consider preemption.
-	 *
-	 * Reschedule if waker is no longer eligible. */
-	if (in_task() && !entity_eligible(cfs_rq, se)) {
-		preempt_action = PREEMPT_WAKEUP_RESCHED;
-		goto preempt;
-	}
-
 	/* Prefer picking wakee soon if appropriate. */
 	if (sched_feat(NEXT_BUDDY) &&
 	    set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
--
cgit v1.2.3
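To close, the eligibility argument in the last commit message can be modelled in a few lines of user-space C. This is only a sketch: struct entity, eligible() and the equal-weight plain average below are invented for the example, while the kernel's entity_eligible() compares against a load-weighted avg_vruntime relative to min_vruntime, which is omitted here.

#include <stdbool.h>
#include <stdio.h>

/* Toy entity: equal weights, so eligibility reduces to
 * "vruntime not ahead of the plain average vruntime". */
struct entity {
	const char *name;
	long vruntime;
};

static bool eligible(const struct entity *se, const struct entity *q, int n)
{
	long sum = 0;

	for (int i = 0; i < n; i++)
		sum += q[i].vruntime;

	/* se->vruntime <= sum / n, kept in integer arithmetic */
	return se->vruntime * n <= sum;
}

int main(void)
{
	/* tasks A and B wake up together with lag == 0 */
	struct entity q[] = { { "A", 0 }, { "B", 0 } };

	printf("%s eligible before running : %d\n",
	       q[0].name, eligible(&q[0], q, 2));

	/* A runs briefly and a periodic update advances its vruntime,
	 * while B has not run yet, so the average barely moves */
	q[0].vruntime = 3000;
	printf("%s eligible after short run: %d\n",
	       q[0].name, eligible(&q[0], q, 2));

	return 0;
}

Task A stops being eligible long before its slice or a minimum quantum is used up, so a forced reschedule keyed on the waker no longer being eligible fires almost every time, and the task that then runs is whichever entity is eligible (B in the example above), not necessarily the freshly woken C.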