<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched/debug.c, branch v5.15.161</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.161</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.15.161'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2023-05-11T14:00:40Z</updated>
<entry>
<title>sched: Fix DEBUG &amp;&amp; !SCHEDSTATS warn</title>
<updated>2023-05-11T14:00:40Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2021-10-06T08:12:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0a008c5098d8f0d7a392cb9130fa5b6fd42f8a74'/>
<id>urn:sha1:0a008c5098d8f0d7a392cb9130fa5b6fd42f8a74</id>
<content type='text'>
commit 769fdf83df57b373660343ef4270b3ada91ef434 upstream.

When !SCHEDSTATS, schedstat_enabled() is an unconditional 0 and the
whole block doesn't exist; however, GCC figures the scoped variable
'stats' is unused and complains about it.

Upgrade the warning from -Wunused-variable to -Wunused-but-set-variable
by writing it in two statements. This fixes the build because the new
warning is only enabled under W=1.

Given that whole if(0) {} thing, I don't feel motivated to change
things overly much and quite strongly feel this is the compiler being
daft.
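
A minimal sketch of the two-statement rewrite (shape of the change, not
the exact hunk):

    /* Before: declaration with initializer trips -Wunused-variable. */
    struct sched_statistics *stats = __schedstats_from_se(se);

    /* After: split declaration and use; the unused case now only
     * triggers -Wunused-but-set-variable, which is W=1 only. */
    struct sched_statistics *stats;

    stats = __schedstats_from_se(se);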

Fixes: cb3e971c435d ("sched: Make struct sched_statistics independent of fair sched class")
Reported-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched: Make struct sched_statistics independent of fair sched class</title>
<updated>2023-05-11T14:00:34Z</updated>
<author>
<name>Yafang Shao</name>
<email>laoar.shao@gmail.com</email>
</author>
<published>2021-09-05T14:35:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c3b9f95598b8029b7e8d9a6b3ad14280e712c70c'/>
<id>urn:sha1:c3b9f95598b8029b7e8d9a6b3ad14280e712c70c</id>
<content type='text'>
[ Upstream commit ceeadb83aea28372e54857bf88ab7e17af48ab7b ]

If we want to use the schedstats facility to trace other sched classes, we
should make it independent of the fair sched class. The struct sched_statistics
holds the scheduler statistics of a task_struct or a task_group, so we can
move it into struct task_struct and struct task_group to achieve that goal.

After the patch, schedstats are organized as follows:

    struct task_struct {
       ...
       struct sched_entity se;
       struct sched_rt_entity rt;
       struct sched_dl_entity dl;
       ...
       struct sched_statistics stats;
       ...
    };

Regarding the task group, schedstats is only supported for fair group
sched, and a new struct sched_entity_stats is introduced, as suggested
by Peter:

    struct sched_entity_stats {
        struct sched_entity     se;
        struct sched_statistics stats;
    } __no_randomize_layout;

Then with the se in a task_group, we can easily get the stats.
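
For instance, a helper along these lines (a sketch following the
upstream patch) maps an se back to its stats:

    static inline struct sched_statistics *
    __schedstats_from_se(struct sched_entity *se)
    {
    #ifdef CONFIG_FAIR_GROUP_SCHED
            /* A group se is embedded in sched_entity_stats. */
            if (!entity_is_task(se))
                    return &amp;container_of(se, struct sched_entity_stats, se)-&gt;stats;
    #endif
            return &amp;task_of(se)-&gt;stats;
    }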

The sched_statistics members may be modified frequently when schedstats is
enabled. To avoid perturbing unrelated data that may share a cacheline with
them, struct sched_statistics is defined as cacheline aligned.

As this patch changes core scheduler structures, I verified its performance
impact on the scheduler with 'perf bench sched pipe', as suggested by Mel.
Below are the results, in which all the values are in usecs/op.
                                  Before               After
      kernel.sched_schedstats=0  5.2~5.4               5.2~5.4
      kernel.sched_schedstats=1  5.3~5.5               5.3~5.5
[These numbers differ slightly from the earlier version because my old
 test machine was destroyed and I had to use a different one.]

Almost no impact on scheduler performance.

No functional change.

[lkp@intel.com: reported build failure in earlier version]

Signed-off-by: Yafang Shao &lt;laoar.shao@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Link: https://lore.kernel.org/r/20210905143547.4668-3-laoar.shao@gmail.com
Stable-dep-of: 39afe5d6fc59 ("sched/fair: Fix inaccurate tally of ttwu_move_affine")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: fix dentry leak in update_sched_domain_debugfs</title>
<updated>2022-09-15T09:30:02Z</updated>
<author>
<name>Greg Kroah-Hartman</name>
<email>gregkh@linuxfoundation.org</email>
</author>
<published>2022-09-02T12:31:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=26e9a1ded8923510e5529fbb28390b22228700c2'/>
<id>urn:sha1:26e9a1ded8923510e5529fbb28390b22228700c2</id>
<content type='text'>
commit c2e406596571659451f4b95e37ddfd5a8ef1d0dc upstream.

Kuyo reports that the pattern of using debugfs_remove(debugfs_lookup())
leaks a dentry and, with a hotplug stress test, the machine eventually
runs out of memory.

Fix this up by using the newly created debugfs_lookup_and_remove() call
instead which properly handles the dentry reference counting logic.
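
Sketched, with the arguments as used by update_sched_domain_debugfs():

    /* Before: the reference taken by debugfs_lookup() is never dropped. */
    debugfs_remove(debugfs_lookup(buf, sd_dentry));

    /* After: the lookup reference is dropped along with the removal. */
    debugfs_lookup_and_remove(buf, sd_dentry);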

Cc: Major Chen &lt;major.chen@samsung.com&gt;
Cc: stable &lt;stable@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Daniel Bristot de Oliveira &lt;bristot@redhat.com&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Cc: Matthias Brugger &lt;matthias.bgg@gmail.com&gt;
Reported-by: Kuyo Chang &lt;kuyo.chang@mediatek.com&gt;
Tested-by: Kuyo Chang &lt;kuyo.chang@mediatek.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220902123107.109274-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/debug: Remove mpol_get/put and task_lock/unlock from sched_show_numa</title>
<updated>2022-04-08T12:23:10Z</updated>
<author>
<name>Bharata B Rao</name>
<email>bharata@amd.com</email>
</author>
<published>2022-01-18T05:05:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8bc68c44d97f28978aaa0012c395dc6f08dc02c3'/>
<id>urn:sha1:8bc68c44d97f28978aaa0012c395dc6f08dc02c3</id>
<content type='text'>
[ Upstream commit 28c988c3ec29db74a1dda631b18785958d57df4f ]

The older format of /proc/pid/sched printed home node info, which
required the mempolicy and the task lock around mpol_get(). However,
the format has changed since then, so sched_show_numa() no longer
needs the mempolicy argument or the associated mpol_get/put and
task_lock/unlock. Remove them.
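
The removed pattern, sketched (sched_show_numa() no longer touches the
policy at all):

    struct mempolicy *pol;

    task_lock(p);
    pol = p-&gt;mempolicy;
    mpol_get(pol);
    task_unlock(p);

    /* ... print the NUMA stats ... */

    mpol_put(pol);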

Fixes: 397f2378f1361 ("sched/numa: Fix numa balancing stats in /proc/pid/sched")
Signed-off-by: Bharata B Rao &lt;bharata@amd.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Srikar Dronamraju &lt;srikar@linux.vnet.ibm.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Link: https://lore.kernel.org/r/20220118050515.2973-1-bharata@amd.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Null terminate buffer when updating tunable_scaling</title>
<updated>2021-10-01T11:57:57Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@techsingularity.net</email>
</author>
<published>2021-09-27T11:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=703066188f63d66cc6b9d678e5b5ef1213c5938e'/>
<id>urn:sha1:703066188f63d66cc6b9d678e5b5ef1213c5938e</id>
<content type='text'>
This patch null-terminates the temporary buffer in sched_scaling_write()
so kstrtouint() does not spuriously fail, and checks that the written
value is valid.

Before:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1

After:
  $ cat /sys/kernel/debug/sched/tunable_scaling
  1
  $ echo 0 &gt; /sys/kernel/debug/sched/tunable_scaling
  $ cat /sys/kernel/debug/sched/tunable_scaling
  0
  $ echo 3 &gt; /sys/kernel/debug/sched/tunable_scaling
  -bash: echo: write error: Invalid argument
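
Abridged, the fixed handler does (a sketch of the change):

    char buf[16];
    unsigned int scaling;

    if (cnt &gt; 15)
            cnt = 15;
    if (copy_from_user(&amp;buf, ubuf, cnt))
            return -EFAULT;
    buf[cnt] = '\0';        /* added: terminate before parsing */

    if (kstrtouint(buf, 10, &amp;scaling))
            return -EINVAL;
    if (scaling &gt;= SCHED_TUNABLESCALING_END)
            return -EINVAL; /* added: reject out-of-range values */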

Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20210927114635.GH3959@techsingularity.net
</content>
</entry>
<entry>
<title>sched: Cgroup SCHED_IDLE support</title>
<updated>2021-08-20T10:32:58Z</updated>
<author>
<name>Josh Don</name>
<email>joshdon@google.com</email>
</author>
<published>2021-07-30T02:00:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=304000390f88d049c85e9a0958ac5567f38816ee'/>
<id>urn:sha1:304000390f88d049c85e9a0958ac5567f38816ee</id>
<content type='text'>
This extends SCHED_IDLE to cgroups.

Interface: cgroup/cpu.idle.
 0: default behavior
 1: SCHED_IDLE
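
For example (illustrative cgroup v2 path):

  $ echo 1 &gt; /sys/fs/cgroup/background/cpu.idle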

Extending SCHED_IDLE to cgroups means that we incorporate the existing
aspects of SCHED_IDLE; a SCHED_IDLE cgroup will count all of its
descendant threads towards the idle_h_nr_running count of all of its
ancestor cgroups. Thus, sched_idle_rq() will work properly.
Additionally, SCHED_IDLE cgroups are configured with minimum weight.

There are two key differences between the per-task and per-cgroup
SCHED_IDLE interface:

  - The cgroup interface allows tasks within a SCHED_IDLE hierarchy to
    maintain their relative weights. The entity that is "idle" is the
    cgroup, not the tasks themselves.

  - Since the idle entity is the cgroup, our SCHED_IDLE wakeup preemption
    decision is not made by comparing the current task with the woken
    task, but rather by comparing their matching sched_entity.

A typical use-case for this is a user that creates an idle and a
non-idle subtree. The non-idle subtree will dominate competition vs
the idle subtree, but the idle subtree will still be high priority vs
other users on the system. The latter is accomplished by comparing the
matching sched_entity in the wakeup preemption path (this could also be
improved by making the sched_idle_rq() decision dependent on the
perspective of a specific task).

For now, we maintain the existing SCHED_IDLE semantics. Future patches
may make improvements that extend how we treat SCHED_IDLE entities.

The per-task_group idle field is an integer that currently only holds
either a 0 or a 1. This is explicitly typed as an integer to allow for
further extensions to this API. For example, a negative value may
indicate a highly latency-sensitive cgroup that should be preferred
for preemption/placement/etc.

Signed-off-by: Josh Don &lt;joshdon@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20210730020019.1487127-2-joshdon@google.com
</content>
</entry>
<entry>
<title>sched/debug: Don't update sched_domain debug directories before sched_debug_init()</title>
<updated>2021-06-28T13:42:24Z</updated>
<author>
<name>Valentin Schneider</name>
<email>valentin.schneider@arm.com</email>
</author>
<published>2021-05-18T13:07:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=459b09b5a3254008b63382bf41a9b36d0b590f57'/>
<id>urn:sha1:459b09b5a3254008b63382bf41a9b36d0b590f57</id>
<content type='text'>
Since CPU capacity asymmetry can stem purely from maximum frequency
differences (e.g. Pixel 1), a rebuild of the scheduler topology can be
issued upon loading cpufreq, see:

  arch_topology.c::init_cpu_capacity_callback()

Turns out that if this rebuild happens *before* sched_debug_init() is
run (which is a late initcall), we end up messing up the sched_domain debug
directory: passing a NULL parent to debugfs_create_dir() ends up creating
the directory at the debugfs root, which in this case creates
/sys/kernel/debug/domains (instead of /sys/kernel/debug/sched/domains).

This currently doesn't happen on asymmetric systems which use cpufreq-scpi
or cpufreq-dt drivers, as those are loaded via
deferred_probe_initcall() (it is also a late initcall, but appears to be
ordered *after* sched_debug_init()).

Ionela has been working on detecting maximum frequency asymmetry via ACPI,
and that actually happens via a *device* initcall, thus before
sched_debug_init(), and causes the aforementioned debugfs mayhem.

One option would be to punt sched_debug_init() down to
fs_initcall_sync(). Preventing update_sched_domain_debugfs() from running
before sched_debug_init() appears to be the safer option.
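
Sketched, the guard amounts to (debugfs_sched being the root dentry
created by sched_debug_init()):

    static void update_sched_domain_debugfs(void)
    {
            /* Bail out until sched_debug_init() has created the parent. */
            if (!debugfs_sched)
                    return;

            /* ... rebuild the per-CPU domain directories ... */
    }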

Fixes: 3b87f136f8fc ("sched,debug: Convert sysctl sched_domains to debugfs")
Signed-off-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/r/20210514095339.12979-1-ionela.voinescu@arm.com
</content>
</entry>
<entry>
<title>Merge branch 'sched/urgent' into sched/core, to resolve conflicts</title>
<updated>2021-06-18T09:31:25Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2021-06-18T09:31:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b2c0931a07b7376c6291e0cfb347ad27f7b66263'/>
<id>urn:sha1:b2c0931a07b7376c6291e0cfb347ad27f7b66263</id>
<content type='text'>
This commit in sched/urgent moved the cfs_rq_is_decayed() function:

  a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle")

and this fresh commit in sched/core modified it in the old location:

  9e077b52d86a: ("sched/pelt: Check that *_avg are null when *_sum are")

Merge the two variants.

Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling</title>
<updated>2021-06-03T13:47:23Z</updated>
<author>
<name>Dietmar Eggemann</name>
<email>dietmar.eggemann@arm.com</email>
</author>
<published>2021-06-02T14:58:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=68d7a190682aa4eb02db477328088ebad15acc83'/>
<id>urn:sha1:68d7a190682aa4eb02db477328088ebad15acc83</id>
<content type='text'>
The util_est internal UTIL_AVG_UNCHANGED flag which is used to prevent
unnecessary util_est updates uses the LSB of util_est.enqueued. It is
exposed via _task_util_est() (and task_util_est()).

Commit 92a801e5d5b7 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
mentions that the LSB is lost for util_est resolution but
find_energy_efficient_cpu() checks if task_util_est() returns 0 to
return prev_cpu early.

_task_util_est() returns the max value of util_est.ewma and
util_est.enqueued, or'ed with UTIL_AVG_UNCHANGED.
So task_util_est() returning the max of task_util() and
_task_util_est() will never return 0 under the default
SCHED_FEAT(UTIL_EST, true).

To fix this use the MSB of util_est.enqueued instead and keep the flag
util_est internal, i.e. don't export it via _task_util_est().

The maximal possible util_avg value for a task is 1024 so the MSB of
'unsigned int util_est.enqueued' isn't used to store a util value.

As a caveat the code behind the util_est_se trace point has to filter
UTIL_AVG_UNCHANGED to see the real util_est.enqueued value which should
be easy to do.
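
Sketched, the new flag placement and masking (following the upstream
patch):

    /* Moved from bit 0 to the MSB, which no real util value can reach. */
    #define UTIL_AVG_UNCHANGED 0x80000000

    static inline unsigned long _task_util_est(struct task_struct *p)
    {
            struct util_est ue = READ_ONCE(p-&gt;se.avg.util_est);

            /* Mask the flag out instead of or'ing it in, so 0 stays 0. */
            return max(ue.ewma, (ue.enqueued &amp; ~UTIL_AVG_UNCHANGED));
    }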

This also fixes an issue reported by Xuewen Yan that util_est_update()
only used UTIL_AVG_UNCHANGED for the subtrahend of the equation:

  last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)

Fixes: b89997aa88f0b ("sched/pelt: Fix task util_est update filtering")
Signed-off-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Xuewen Yan &lt;xuewen.yan@unisoc.com&gt;
Reviewed-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.com
</content>
</entry>
<entry>
<title>sched: Wrap rq::lock access</title>
<updated>2021-05-12T09:43:26Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2020-11-17T23:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5cb9eaa3d274f75539077a28cf01e3563195fa53'/>
<id>urn:sha1:5cb9eaa3d274f75539077a28cf01e3563195fa53</id>
<content type='text'>
In preparation for playing games with rq-&gt;lock, abstract the thing
using an accessor.
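
A sketch of the accessor shape (abridged; the field is renamed so that
any remaining direct rq-&gt;lock user fails to compile):

    /* rq-&gt;lock becomes rq-&gt;__lock; all users go through accessors. */
    static inline raw_spinlock_t *rq_lockp(struct rq *rq)
    {
            return &amp;rq-&gt;__lock;
    }

    static inline void raw_spin_rq_lock(struct rq *rq)
    {
            raw_spin_lock(rq_lockp(rq));
    }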

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Don Hiatt &lt;dhiatt@digitalocean.com&gt;
Tested-by: Hongyu Ning &lt;hongyu.ning@linux.intel.com&gt;
Tested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20210422123308.136465446@infradead.org
</content>
</entry>
</feed>
