<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v6.1.34</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.34</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.1.34'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-05-11T14:03:32Z</updated>
<entry>
<title>sched/rt: Fix bad task migration for rt tasks</title>
<updated>2023-05-11T14:03:32Z</updated>
<author>
<name>Schspa Shi</name>
<email>schspa@gmail.com</email>
</author>
<published>2022-08-28T17:03:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6472a6d0c79826256bfa3a6b45710dd499a984bf'/>
<id>urn:sha1:6472a6d0c79826256bfa3a6b45710dd499a984bf</id>
<content type='text'>
[ Upstream commit feffe5bb274dd3442080ef0e4053746091878799 ]

Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
allows find_lock_lowest_rq() to pick a task with migration disabled.
The purpose of that commit is to allow pushing away the task currently
running on a CPU that also hosts a migrate_disable() task.

However, there is a race which allows a migrate_disable() task to be
migrated. Consider:

  CPU0                                    CPU1
  push_rt_task
    check is_migration_disabled(next_task)

                                          task not running and
                                          migration_disabled == 0

    find_lock_lowest_rq(next_task, rq);
      _double_lock_balance(this_rq, busiest);
        raw_spin_rq_unlock(this_rq);
        double_rq_lock(this_rq, busiest);
          &lt;&lt;wait for busiest rq&gt;&gt;
                                              &lt;wakeup&gt;
                                          task becomes running
                                          migrate_disable();
                                            &lt;context out&gt;
    deactivate_task(rq, next_task, 0);
    set_task_cpu(next_task, lowest_rq-&gt;cpu);
      WARN_ON_ONCE(is_migration_disabled(p));
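
Concretely, the fix is to revalidate next_task's state after the rq lock
has been dropped and retaken. A minimal sketch of the re-check in
find_lock_lowest_rq() (hedged; the exact set of conditions in the
upstream diff may differ), run after double_lock_balance() may have
released rq-&gt;lock:

  /* Sketch: the rq lock may have been dropped, so re-check the task,
   * including its migration-disabled state, before migrating it.
   */
  if (unlikely(task_rq(task) != rq ||
               is_migration_disabled(task) ||
               !task_on_rq_queued(task))) {
          double_unlock_balance(rq, lowest_rq);
          lowest_rq = NULL;
  }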

Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi &lt;schspa@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Reviewed-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Tested-by: Dwaine Gonyier &lt;dgonyier@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix inaccurate tally of ttwu_move_affine</title>
<updated>2023-05-11T14:03:31Z</updated>
<author>
<name>Libo Chen</name>
<email>libo.chen@oracle.com</email>
</author>
<published>2022-08-10T22:33:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=944465c772fb00509150ba374298d9e4d5d47788'/>
<id>urn:sha1:944465c772fb00509150ba374298d9e4d5d47788</id>
<content type='text'>
[ Upstream commit 39afe5d6fc59237ff7738bf3ede5a8856822d59d ]

There are scenarios where non-affine wakeups are incorrectly counted as
affine wakeups by schedstats.

When wake_affine_idle() returns prev_cpu, which is not equal to
nr_cpumask_bits, it slips through the target == nr_cpumask_bits check
in wake_affine() and is counted as if target == this_cpu in schedstats.

Replace target == nr_cpumask_bits with target != this_cpu to make sure
affine wakeups are accurately tallied.
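
A minimal sketch of the corrected tail of wake_affine() (assuming the
v6.1 schedstats field names):

  /* Sketch: only tally an affine wakeup when target really is this_cpu. */
  schedstat_inc(p-&gt;stats.nr_wakeups_affine_attempts);
  if (target != this_cpu)
          return prev_cpu;

  schedstat_inc(sd-&gt;ttwu_move_affine);
  schedstat_inc(p-&gt;stats.nr_wakeups_affine);
  return target;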

Fixes: 806486c377e33 ("sched/fair: Do not migrate if the prev_cpu is idle")
Suggested-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Signed-off-by: Libo Chen &lt;libo.chen@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Gautham R. Shenoy &lt;gautham.shenoy@amd.com&gt;
Link: https://lore.kernel.org/r/20220810223313.386614-1-libo.chen@oracle.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fixes for capacity inversion detection</title>
<updated>2023-04-26T12:28:42Z</updated>
<author>
<name>Qais Yousef</name>
<email>qyousef@layalina.io</email>
</author>
<published>2023-04-18T14:04:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d362a03d920edc0bb9a85c5150ef4ff5fd365c72'/>
<id>urn:sha1:d362a03d920edc0bb9a85c5150ef4ff5fd365c72</id>
<content type='text'>
commit: da07d2f9c153e457e845d4dcfdd13568d71d18a4 upstream.

Traversing the Perf Domains requires rcu_read_lock() to be held and is
conditional on sched_energy_enabled(). Ensure the right protections are
applied.

Also skip capacity inversion detection for our own pd; including it was
an error.
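
The intended shape of the traversal is roughly the following (a sketch,
assuming the v6.1 helpers; not the verbatim upstream diff):

  if (sched_energy_enabled()) {
          struct perf_domain *pd;

          /* Sketch: the pd list traversal must be RCU protected. */
          rcu_read_lock();
          pd = rcu_dereference(rq-&gt;rd-&gt;pd);
          for (; pd; pd = pd-&gt;next) {
                  /* Our own pd cannot invert us; skip it. */
                  if (cpumask_test_cpu(cpu_of(rq), perf_domain_span(pd)))
                          continue;
                  /* ... capacity inversion check against pd ... */
          }
          rcu_read_unlock();
  }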

Fixes: 44c7b80bffc3 ("sched/fair: Detect capacity inversion")
Reported-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
(cherry picked from commit da07d2f9c153e457e845d4dcfdd13568d71d18a4)
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Consider capacity inversion in util_fits_cpu()</title>
<updated>2023-04-26T12:28:42Z</updated>
<author>
<name>Qais Yousef</name>
<email>qais.yousef@arm.com</email>
</author>
<published>2023-04-18T14:04:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=799c7301ded6cc44c5b7b716f3fe707a41722ed1'/>
<id>urn:sha1:799c7301ded6cc44c5b7b716f3fe707a41722ed1</id>
<content type='text'>
commit: aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 upstream.

We consider thermal pressure in util_fits_cpu() for uclamp_min only,
with the exception of the biggest cores, which by definition are the max
performance point of the system and on which all tasks by definition
should fit.

Even under thermal pressure, the capacity of the biggest CPU is the
highest in the system and should still fit every task. Except when it
reaches the capacity inversion point; then this is no longer true.

We can handle this by using the inverted capacity as capacity_orig in
util_fits_cpu(). This not only addresses the problem above, but also
ensures that uclamp_max now considers the inverted capacity. Force
fitting a task when a CPU is in this adverse state would only make the
thermal throttling last longer.
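
The capacity selection in util_fits_cpu() then becomes, roughly (a
sketch, assuming the cpu_capacity_inverted field introduced by the
companion "sched/fair: Detect capacity inversion" patch):

  /* Sketch: prefer the inverted capacity when the CPU is in inversion. */
  if (cpu_in_capacity_inversion(cpu)) {
          capacity_orig = cpu_rq(cpu)-&gt;cpu_capacity_inverted;
          capacity_orig_thermal = capacity_orig;
  } else {
          capacity_orig = capacity_orig_of(cpu);
          capacity_orig_thermal = capacity_orig -
                                  arch_scale_thermal_pressure(cpu);
  }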

Signed-off-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220804143609.515789-10-qais.yousef@arm.com
(cherry picked from commit aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8)
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Detect capacity inversion</title>
<updated>2023-04-26T12:28:41Z</updated>
<author>
<name>Qais Yousef</name>
<email>qais.yousef@arm.com</email>
</author>
<published>2023-04-18T14:04:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fe1c982958c51507cbb3371a8365905f931860fa'/>
<id>urn:sha1:fe1c982958c51507cbb3371a8365905f931860fa</id>
<content type='text'>
commit: 44c7b80bffc3a657a36857098d5d9c49d94e652b upstream.

Check each performance domain to see if thermal pressure is causing its
capacity to be lower than another performance domain.

We assume that each performance domain has CPUs with the same
capacities, which is similar to an assumption made in energy_model.c.

We also assume that thermal pressure impacts all CPUs in a performance
domain equally.

If there are multiple performance domains with the same capacity_orig,
we will trigger a capacity inversion if the domain is under thermal
pressure.

The new cpu_in_capacity_inversion() should help users know when
information about capacity_orig is not reliable, so they can opt in to
using the inverted capacity as the 'actual' capacity_orig.
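
A minimal sketch of the new helper (the rq field name is inferred from
the description above; treat it as an assumption):

  /* Sketch: non-zero cpu_capacity_inverted flags an unreliable
   * capacity_orig and holds the capacity to use instead.
   */
  static inline bool cpu_in_capacity_inversion(int cpu)
  {
          return cpu_rq(cpu)-&gt;cpu_capacity_inverted;
  }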

Signed-off-by: Qais Yousef &lt;qais.yousef@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220804143609.515789-9-qais.yousef@arm.com
(cherry picked from commit 44c7b80bffc3a657a36857098d5d9c49d94e652b)
Signed-off-by: Qais Yousef (Google) &lt;qyousef@layalina.io&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix imbalance overflow</title>
<updated>2023-04-20T10:35:14Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2023-04-11T09:06:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=31c5ad43bdd14fd90221f4a93d764a923b0fae5e'/>
<id>urn:sha1:31c5ad43bdd14fd90221f4a93d764a923b0fae5e</id>
<content type='text'>
[ Upstream commit 91dcf1e8068e9a8823e419a7a34ff4341275fb70 ]

When the local group is fully busy but its average load is above the
system load, computing the imbalance will overflow, and the local group
is not the best target for pulling this load.
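
A sketch of the guard this adds in calculate_imbalance() (hedged;
variable names as in the v6.1 load balancer):

  /* Sketch: a local group already above the system average must not
   * pull load; the unsigned subtraction below would overflow.
   */
  if (local-&gt;avg_load &gt;= sds-&gt;avg_load) {
          env-&gt;imbalance = 0;
          return;
  }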

Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()")
Reported-by: Tingjia Cao &lt;tjcao980311@gmail.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Tingjia Cao &lt;tjcao980311@gmail.com&gt;
Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ89jQ@mail.gmail.com/T/
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched_getaffinity: don't assume 'cpumask_size()' is fully initialized</title>
<updated>2023-04-06T10:10:40Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-03-15T02:32:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c8943cf3ab9b83c4cdc448279dfdbd03b7beeb93'/>
<id>urn:sha1:c8943cf3ab9b83c4cdc448279dfdbd03b7beeb93</id>
<content type='text'>
[ Upstream commit 6015b1aca1a233379625385feb01dd014aca60b5 ]

The getaffinity() system call uses 'cpumask_size()' to decide how big
the CPU mask is - so far so good.  It is indeed the allocation size of a
cpumask.

But the code also assumes that the whole allocation is initialized
without actually doing so itself.  That's wrong, because we might have
fixed-size allocations (making copying and clearing more efficient), but
not all of it is then necessarily used if 'nr_cpu_ids' is smaller.

Having checked other users of 'cpumask_size()', they all seem to be ok,
either using it purely for the allocation size, or explicitly zeroing
the cpumask before using the size in bytes to copy it.

See for example the ublk_ctrl_get_queue_affinity() function that uses
the proper 'zalloc_cpumask_var()' to make sure that the whole mask is
cleared, whether the storage is on the stack or if it was an external
allocation.

Fix this by just zeroing the allocation before using it.  Do the same
for the compat version of sched_getaffinity(), which had the same logic.

Also, for consistency, make sched_getaffinity() use 'cpumask_bits()' to
access the bits.  For a cpumask_var_t, it ends up being a pointer to the
same data either way, but it's just a good idea to treat it like you
would a 'cpumask_t'.  The compat case already did that.
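
A sketch of the fixed syscall body (hedged; the two relevant changes
are zalloc_cpumask_var() and cpumask_bits()):

  /* Sketch: zalloc zeroes the whole allocation, so the bytes past
   * nr_cpu_ids that copy_to_user() may cover are initialized.
   */
  if (!zalloc_cpumask_var(&amp;mask, GFP_KERNEL))
          return -ENOMEM;

  ret = sched_getaffinity(pid, mask);
  if (ret == 0) {
          unsigned int retlen = min(len, cpumask_size());

          if (copy_to_user(user_mask_ptr, cpumask_bits(mask), retlen))
                  ret = -EFAULT;
          else
                  ret = retlen;
  }
  free_cpumask_var(mask);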

Reported-by: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Link: https://lore.kernel.org/lkml/7d026744-6bd6-6827-0471-b5e8eae0be3f@arm.com/
Cc: Yury Norov &lt;yury.norov@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Sanitize vruntime of entity being migrated</title>
<updated>2023-03-30T10:49:30Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2023-03-17T16:08:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=388c4c1d1212eb2015835c4587873b3d3ea115e9'/>
<id>urn:sha1:388c4c1d1212eb2015835c4587873b3d3ea115e9</id>
<content type='text'>
commit a53ce18cacb477dd0513c607f187d16f0fa96f71 upstream.

Commit 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
fixes an overflow bug, but ignores the case where se-&gt;exec_start is
reset after a migration.

To fix this case, we delay the reset of se-&gt;exec_start until after
placing the entity, which uses se-&gt;exec_start to detect a long
sleeping task.

In order to take into account a possible divergence between the
clock_task of two rqs, we increase the threshold to around 104 days.
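
The long-sleep test can be sketched as below; with NICE_0_LOAD scaling
down to 1024, the cutoff (1ULL &lt;&lt; 63) / 1024 ns is the quoted ~104
days (hedged; names follow the v6.1 fair.c):

  /* Sketch of the detector run before resetting se-&gt;exec_start. */
  static inline bool entity_is_long_sleeper(struct sched_entity *se)
  {
          struct cfs_rq *cfs_rq;
          u64 sleep_time;

          if (se-&gt;exec_start == 0)
                  return false;

          cfs_rq = cfs_rq_of(se);
          sleep_time = rq_clock_task(rq_of(cfs_rq));

          /* Happens while migrating because of clock task divergence. */
          if (sleep_time &lt;= se-&gt;exec_start)
                  return false;

          sleep_time -= se-&gt;exec_start;
          return sleep_time &gt; ((1ULL &lt;&lt; 63) / scale_load_down(NICE_0_LOAD));
  }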

Fixes: 829c1651e9c4 ("sched/fair: sanitize vruntime of entity being placed")
Originally-by: Zhang Qiao &lt;zhangqiao22@huawei.com&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Zhang Qiao &lt;zhangqiao22@huawei.com&gt;
Link: https://lore.kernel.org/r/20230317160810.107988-1-vincent.guittot@linaro.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: sanitize vruntime of entity being placed</title>
<updated>2023-03-30T10:49:30Z</updated>
<author>
<name>Zhang Qiao</name>
<email>zhangqiao22@huawei.com</email>
</author>
<published>2023-01-30T12:22:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e427be68892e0eeeaafcc1029c77deb8ba4973db'/>
<id>urn:sha1:e427be68892e0eeeaafcc1029c77deb8ba4973db</id>
<content type='text'>
commit 829c1651e9c4a6f78398d3e67651cef9bb6b42cc upstream.

When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq-&gt;min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.

However, if the entity being placed wasn't executed for a long time, its
vruntime may get too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can invert the vruntime comparison due to s64
overflow. This results in the entity being placed with its original
vruntime way forwards, so that it will effectively never get onto the
CPU.

To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for much longer than the characteristic scheduler time
scale.
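
In place_entity() that amounts to roughly the following (a sketch; the
60 s cutoff stands in for "much longer than the characteristic time
scale"):

  /* Sketch: skip the vruntime comparison after a very long sleep. */
  sleep_time = rq_clock_task(rq_of(cfs_rq)) - se-&gt;exec_start;
  if ((s64)sleep_time &gt; 60LL * NSEC_PER_SEC)
          se-&gt;vruntime = vruntime;
  else
          se-&gt;vruntime = max_vruntime(se-&gt;vruntime, vruntime);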

[rkagan: formatted, adjusted commit log, comments, cutoff value]
Signed-off-by: Zhang Qiao &lt;zhangqiao22@huawei.com&gt;
Co-developed-by: Roman Kagan &lt;rkagan@amazon.de&gt;
Signed-off-by: Roman Kagan &lt;rkagan@amazon.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230130122216.3555094-1-rkagan@amazon.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>wait: Return number of exclusive waiters awoken</title>
<updated>2023-03-10T08:34:34Z</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2022-11-15T22:45:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d710b1e91bc0b054bebc525d01c644fde1900e50'/>
<id>urn:sha1:d710b1e91bc0b054bebc525d01c644fde1900e50</id>
<content type='text'>
commit ee7dc86b6d3e3b86c2c487f713eda657850de238 upstream.

Sbitmap code will need to know how many waiters were actually woken for
its batched wakeups implementation.  Return the number of woken
exclusive waiters from __wake_up() to facilitate that.
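
A sketch of the resulting change: __wake_up() propagates the count that
__wake_up_common_lock() computes instead of discarding it.

  /* Sketch: now returns the number of exclusive waiters woken. */
  int __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
                int nr_exclusive, void *key)
  {
          return __wake_up_common_lock(wq_head, mode, nr_exclusive, 0, key);
  }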

Suggested-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Link: https://lore.kernel.org/r/20221115224553.23594-3-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
