<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v4.19.234</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.234</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.19.234'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2022-01-27T08:04:30Z</updated>
<entry>
<title>cputime, cpuacct: Include guest time in user time in cpuacct.stat</title>
<updated>2022-01-27T08:04:30Z</updated>
<author>
<name>Andrey Ryabinin</name>
<email>arbn@yandex-team.com</email>
</author>
<published>2021-11-15T16:46:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=952514c8565cf72a966993b473fae1708c3684f3'/>
<id>urn:sha1:952514c8565cf72a966993b473fae1708c3684f3</id>
<content type='text'>
commit 9731698ecb9c851f353ce2496292ff9fcea39dff upstream.

cpuacct.stat in no-root cgroups shows user time without guest time
included int it. This doesn't match with user time shown in root
cpuacct.stat and /proc/&lt;pid&gt;/stat. This also affects cgroup2's cpu.stat
in the same way.

Make account_guest_time() to add user time to cgroup's cpustat to
fix this.

Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics")
Signed-off-by: Andrey Ryabinin &lt;arbn@yandex-team.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Daniel Jordan &lt;daniel.m.jordan@oracle.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Link: https://lore.kernel.org/r/20211115164607.23784-1-arbn@yandex-team.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/rt: Try to restart rt period timer when rt runtime exceeded</title>
<updated>2022-01-27T08:04:19Z</updated>
<author>
<name>Li Hua</name>
<email>hucool.lihua@huawei.com</email>
</author>
<published>2021-12-03T03:36:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6bc7279b16517fab9ed3cdbd58ad7b08060c246'/>
<id>urn:sha1:e6bc7279b16517fab9ed3cdbd58ad7b08060c246</id>
<content type='text'>
[ Upstream commit 9b58e976b3b391c0cf02e038d53dd0478ed3013c ]

When rt_runtime is modified from -1 to a valid control value, it may
cause the task to be throttled all the time. Operations like the following
will trigger the bug. E.g:

  1. echo -1 &gt; /proc/sys/kernel/sched_rt_runtime_us
  2. Run a FIFO task named A that executes while(1)
  3. echo 950000 &gt; /proc/sys/kernel/sched_rt_runtime_us

When rt_runtime is -1, The rt period timer will not be activated when task
A enqueued. And then the task will be throttled after setting rt_runtime to
950,000. The task will always be throttled because the rt period timer is
not activated.

Fixes: d0b27fa77854 ("sched: rt-group: synchonised bandwidth period")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Li Hua &lt;hucool.lihua@huawei.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20211203033618.11895-1-hucool.lihua@huawei.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>wait: add wake_up_pollfree()</title>
<updated>2021-12-14T09:18:06Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2021-12-10T23:53:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8dd7c46a59756bdc29cb9783338b899cd3fb4b83'/>
<id>urn:sha1:8dd7c46a59756bdc29cb9783338b899cd3fb4b83</id>
<content type='text'>
commit 42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 upstream.

Several -&gt;poll() implementations are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case.  This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution.  This solution is for the queue to be cleared
before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.

However, that has a bug: wake_up_poll() calls __wake_up() with
nr_exclusive=1.  Therefore, if there are multiple "exclusive" waiters,
and the wakeup function for the first one returns a positive value, only
that one will be called.  That's *not* what's needed for POLLFREE;
POLLFREE is special in that it really needs to wake up everyone.

Considering the three non-blocking poll systems:

- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.

- aio poll is unaffected, since it doesn't support exclusive waits.
  However, that's fragile, as someone could add this feature later.

- epoll doesn't appear to be broken by this, since its wakeup function
  returns 0 when it sees POLLFREE.  But this is fragile.

Although there is a workaround (see epoll), it's better to define a
function which always sends POLLFREE to all waiters.  Add such a
function.  Also make it verify that the queue really becomes empty after
all waiters have been woken up.

Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211209010455.42744-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain()</title>
<updated>2021-11-26T10:36:21Z</updated>
<author>
<name>Vincent Donnefort</name>
<email>vincent.donnefort@arm.com</email>
</author>
<published>2021-11-04T17:51:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=71731e0d68f4dd352a958942d6d0e39cb0f0fa58'/>
<id>urn:sha1:71731e0d68f4dd352a958942d6d0e39cb0f0fa58</id>
<content type='text'>
[ Upstream commit 42dc938a590c96eeb429e1830123fef2366d9c80 ]

Nothing protects the access to the per_cpu variable sd_llc_id. When testing
the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
update_top_cache_domain(). One scenario being:

              CPU1                            CPU2
  ==================================================================

  per_cpu(sd_llc_id, CPUX) =&gt; 0
                                    partition_sched_domains_locked()
      				      detach_destroy_domains()
  cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
    per_cpu(sd_llc_id, CPUX) =&gt; 0
                                          per_cpu(sd_llc_id, CPUX) = CPUX
    per_cpu(sd_llc_id, CPUX) =&gt; CPUX
    return false

ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
is a warning triggered from ttwu_queue_wakelist().

Avoid a such race in cpus_share_cache() by always returning true when
this_cpu == that_cpu.

Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
Reported-by: Jing-Ting Wu &lt;jing-ting.wu@mediatek.com&gt;
Signed-off-by: Vincent Donnefort &lt;vincent.donnefort@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Valentin Schneider &lt;valentin.schneider@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lore.kernel.org/r/20211104175120.857087-1-vincent.donnefort@arm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>cpufreq: schedutil: Use kobject release() method to free sugov_tunables</title>
<updated>2021-10-06T13:31:20Z</updated>
<author>
<name>Kevin Hao</name>
<email>haokexin@gmail.com</email>
</author>
<published>2021-08-05T07:29:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=30d57cf2c4116ca6d34ecd1cac94ad84f8bc446c'/>
<id>urn:sha1:30d57cf2c4116ca6d34ecd1cac94ad84f8bc446c</id>
<content type='text'>
[ Upstream commit e5c6b312ce3cc97e90ea159446e6bfa06645364d ]

The struct sugov_tunables is protected by the kobject, so we can't free
it directly. Otherwise we would get a call trace like this:
  ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x30
  WARNING: CPU: 3 PID: 720 at lib/debugobjects.c:505 debug_print_object+0xb8/0x100
  Modules linked in:
  CPU: 3 PID: 720 Comm: a.sh Tainted: G        W         5.14.0-rc1-next-20210715-yocto-standard+ #507
  Hardware name: Marvell OcteonTX CN96XX board (DT)
  pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
  pc : debug_print_object+0xb8/0x100
  lr : debug_print_object+0xb8/0x100
  sp : ffff80001ecaf910
  x29: ffff80001ecaf910 x28: ffff00011b10b8d0 x27: ffff800011043d80
  x26: ffff00011a8f0000 x25: ffff800013cb3ff0 x24: 0000000000000000
  x23: ffff80001142aa68 x22: ffff800011043d80 x21: ffff00010de46f20
  x20: ffff800013c0c520 x19: ffff800011d8f5b0 x18: 0000000000000010
  x17: 6e6968207473696c x16: 5f72656d6974203a x15: 6570797420746365
  x14: 6a626f2029302065 x13: 303378302f307830 x12: 2b6e665f72656d69
  x11: ffff8000124b1560 x10: ffff800012331520 x9 : ffff8000100ca6b0
  x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 0000000000000001
  x5 : ffff800011d8c000 x4 : ffff800011d8c740 x3 : 0000000000000000
  x2 : ffff0001108301c0 x1 : ab3c90eedf9c0f00 x0 : 0000000000000000
  Call trace:
   debug_print_object+0xb8/0x100
   __debug_check_no_obj_freed+0x1c0/0x230
   debug_check_no_obj_freed+0x20/0x88
   slab_free_freelist_hook+0x154/0x1c8
   kfree+0x114/0x5d0
   sugov_exit+0xbc/0xc0
   cpufreq_exit_governor+0x44/0x90
   cpufreq_set_policy+0x268/0x4a8
   store_scaling_governor+0xe0/0x128
   store+0xc0/0xf0
   sysfs_kf_write+0x54/0x80
   kernfs_fop_write_iter+0x128/0x1c0
   new_sync_write+0xf0/0x190
   vfs_write+0x2d4/0x478
   ksys_write+0x74/0x100
   __arm64_sys_write+0x24/0x30
   invoke_syscall.constprop.0+0x54/0xe0
   do_el0_svc+0x64/0x158
   el0_svc+0x2c/0xb0
   el0t_64_sync_handler+0xb0/0xb8
   el0t_64_sync+0x198/0x19c
  irq event stamp: 5518
  hardirqs last  enabled at (5517): [&lt;ffff8000100cbd7c&gt;] console_unlock+0x554/0x6c8
  hardirqs last disabled at (5518): [&lt;ffff800010fc0638&gt;] el1_dbg+0x28/0xa0
  softirqs last  enabled at (5504): [&lt;ffff8000100106e0&gt;] __do_softirq+0x4d0/0x6c0
  softirqs last disabled at (5483): [&lt;ffff800010049548&gt;] irq_exit+0x1b0/0x1b8

So split the original sugov_tunables_free() into two functions,
sugov_clear_global_tunables() is just used to clear the global_tunables
and the new sugov_tunables_free() is used as kobj_type::release to
release the sugov_tunables safely.

Fixes: 9bdcb44e391d ("cpufreq: schedutil: New governor based on scheduler utilization data")
Cc: 4.7+ &lt;stable@vger.kernel.org&gt; # 4.7+
Signed-off-by: Kevin Hao &lt;haokexin@gmail.com&gt;
Acked-by: Viresh Kumar &lt;viresh.kumar@linaro.org&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix missing clock update in migrate_task_rq_dl()</title>
<updated>2021-09-22T09:47:49Z</updated>
<author>
<name>Dietmar Eggemann</name>
<email>dietmar.eggemann@arm.com</email>
</author>
<published>2021-08-04T13:59:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=293fe77dbfe68c5794be4e76c9212003e9304d24'/>
<id>urn:sha1:293fe77dbfe68c5794be4e76c9212003e9304d24</id>
<content type='text'>
[ Upstream commit b4da13aa28d4fd0071247b7b41c579ee8a86c81a ]

A missing clock update is causing the following warning:

rq-&gt;clock_update_flags &lt; RQCF_ACT_SKIP
WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453
sub_running_bw.isra.0+0x190/0x1a0
...
CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 #1
Hardware name: WIWYNN Mt.Jade Server System
B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
1.06.20210526) 2021/05/26
...
Call trace:
  sub_running_bw.isra.0+0x190/0x1a0
  migrate_task_rq_dl+0xf8/0x1e0
  set_task_cpu+0xa8/0x1f0
  try_to_wake_up+0x150/0x3d4
  wake_up_q+0x64/0xc0
  __up_write+0xd0/0x1c0
  up_write+0x4c/0x2b0
  cppc_set_perf+0x120/0x2d0
  cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
  __cpufreq_driver_target+0x74/0x140
  sugov_work+0x64/0x80
  kthread_worker_fn+0xe0/0x230
  kthread+0x138/0x140
  ret_from_fork+0x10/0x18

The task causing this is the `cppc_fie` DL task introduced by
commit 1eb5dde674f5 ("cpufreq: CPPC: Add support for frequency
invariance").

With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on
slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm
Server):

DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter
is in `non_contending` state, migrate_task_rq_dl() calls

  sub_running_bw()-&gt;__sub_running_bw()-&gt;cpufreq_update_util()-&gt;
  rq_clock()-&gt;assert_clock_updated()

on p.

Fix this by updating the clock for a non_contending task in
migrate_task_rq_dl() before calling sub_running_bw().

Reported-by: Bruno Goncalves &lt;bgoncalv@redhat.com&gt;
Signed-off-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Daniel Bristot de Oliveira &lt;bristot@kernel.org&gt;
Acked-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/deadline: Fix reset_on_fork reporting of DL tasks</title>
<updated>2021-09-22T09:47:49Z</updated>
<author>
<name>Quentin Perret</name>
<email>qperret@google.com</email>
</author>
<published>2021-07-27T10:11:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bc0d543f53e0d73059cf10733be0109984dadd44'/>
<id>urn:sha1:bc0d543f53e0d73059cf10733be0109984dadd44</id>
<content type='text'>
[ Upstream commit f95091536f78971b269ec321b057b8d630b0ad8a ]

It is possible for sched_getattr() to incorrectly report the state of
the reset_on_fork flag when called on a deadline task.

Indeed, if the flag was set on a deadline task using sched_setattr()
with flags (SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_KEEP_PARAMS), then
p-&gt;sched_reset_on_fork will be set, but __setscheduler() will bail out
early, which means that the dl_se-&gt;flags will not get updated by
__setscheduler_params()-&gt;__setparam_dl(). Consequently, if
sched_getattr() is then called on the task, __getparam_dl() will
override kattr.sched_flags with the now out-of-date copy in dl_se-&gt;flags
and report the stale value to userspace.

To fix this, make sure to only copy the flags that are relevant to
sched_deadline to and from the dl_se-&gt;flags field.

Signed-off-by: Quentin Perret &lt;qperret@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20210727101103.2729607-2-qperret@google.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix CFS bandwidth hrtimer expiry type</title>
<updated>2021-07-28T09:13:44Z</updated>
<author>
<name>Odin Ugedal</name>
<email>odin@uged.al</email>
</author>
<published>2021-06-29T12:14:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa36bd8fc187f513af2d250b82a3c913053ceaf1'/>
<id>urn:sha1:aa36bd8fc187f513af2d250b82a3c913053ceaf1</id>
<content type='text'>
[ Upstream commit 72d0ad7cb5bad265adb2014dbe46c4ccb11afaba ]

The time remaining until expiry of the refresh_timer can be negative.
Casting the type to an unsigned 64-bit value will cause integer
underflow, making the runtime_refresh_within return false instead of
true. These situations are rare, but they do happen.

This does not cause user-facing issues or errors; other than
possibly unthrottling cfs_rq's using runtime from the previous period(s),
making the CFS bandwidth enforcement less strict in those (special)
situations.

Signed-off-by: Odin Ugedal &lt;odin@uged.al&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Ben Segall &lt;bsegall@google.com&gt;
Link: https://lore.kernel.org/r/20210629121452.18429-1-odin@uged.al
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Fix ascii art by relpacing tabs</title>
<updated>2021-07-20T14:15:44Z</updated>
<author>
<name>Odin Ugedal</name>
<email>odin@uged.al</email>
</author>
<published>2021-05-18T12:52:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9b808bdcc936be3503e5141879beb1cbd693d984'/>
<id>urn:sha1:9b808bdcc936be3503e5141879beb1cbd693d984</id>
<content type='text'>
[ Upstream commit 08f7c2f4d0e9f4283f5796b8168044c034a1bfcb ]

When using something other than 8 spaces per tab, this ascii art
makes not sense, and the reader might end up wondering what this
advanced equation "is".

Signed-off-by: Odin Ugedal &lt;odin@uged.al&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20210518125202.78658-4-odin@uged.al
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/fair: Make sure to update tg contrib for blocked load</title>
<updated>2021-06-16T09:55:01Z</updated>
<author>
<name>Vincent Guittot</name>
<email>vincent.guittot@linaro.org</email>
</author>
<published>2021-05-27T12:29:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4af84445075da26439598984d1a01c1cdadb0e2d'/>
<id>urn:sha1:4af84445075da26439598984d1a01c1cdadb0e2d</id>
<content type='text'>
commit 02da26ad5ed6ea8680e5d01f20661439611ed776 upstream.

During the update of fair blocked load (__update_blocked_fair()), we
update the contribution of the cfs in tg-&gt;load_avg if cfs_rq's pelt
has decayed.  Nevertheless, the pelt values of a cfs_rq could have
been recently updated while propagating the change of a child. In this
case, cfs_rq's pelt will not decayed because it has already been
updated and we don't update tg-&gt;load_avg.

__update_blocked_fair
  ...
  for_each_leaf_cfs_rq_safe: child cfs_rq
    update cfs_rq_load_avg() for child cfs_rq
    ...
    update_load_avg(cfs_rq_of(se), se, 0)
      ...
      update cfs_rq_load_avg() for parent cfs_rq
		-propagation of child's load makes parent cfs_rq-&gt;load_sum
		 becoming null
        -UPDATE_TG is not set so it doesn't update parent
		 cfs_rq-&gt;tg_load_avg_contrib
  ..
  for_each_leaf_cfs_rq_safe: parent cfs_rq
    update cfs_rq_load_avg() for parent cfs_rq
      - nothing to do because parent cfs_rq has already been updated
		recently so cfs_rq-&gt;tg_load_avg_contrib is not updated
    ...
    parent cfs_rq is decayed
      list_del_leaf_cfs_rq parent cfs_rq
	  - but it still contibutes to tg-&gt;load_avg

we must set UPDATE_TG flags when propagting pending load to the parent

Fixes: 039ae8bcf7a5 ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Reported-by: Odin Ugedal &lt;odin@uged.al&gt;
Signed-off-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Odin Ugedal &lt;odin@uged.al&gt;
Link: https://lkml.kernel.org/r/20210527122916.27683-3-vincent.guittot@linaro.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
