<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched, branch v4.12.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.12.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-07-27T22:10:25Z</updated>
<entry>
<title>sched/cputime: Don't use smp_processor_id() in preemptible context</title>
<updated>2017-07-27T22:10:25Z</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2017-07-09T07:40:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=baa11d76d15783e132e34957ab9f205373e9b9c5'/>
<id>urn:sha1:baa11d76d15783e132e34957ab9f205373e9b9c5</id>
<content type='text'>
commit 0e4097c3354e2f5a5ad8affd9dc7f7f7d00bb6b9 upstream.

Recent kernels trigger this warning:

 BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181
 caller is debug_smp_processor_id+0x17/0x19
 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1
 Call Trace:
  dump_stack+0x82/0xb8
  check_preemption_disabled()
  debug_smp_processor_id()
  vtime_delta()
  task_cputime()
  thread_group_cputime()
  thread_group_cputime_adjusted()
  wait_consider_task()
  do_wait()
  SYSC_wait4()
  do_syscall_64()
  entry_SYSCALL64_slow_path()

As Frederic pointed out:

| Although those sched_clock_cpu() things seem to only matter when the
| sched_clock() is unstable. And that stability is a condition for nohz_full
| to work anyway. So probably sched_clock() alone would be enough.

This patch fixes it by replacing sched_clock_cpu() with sched_clock() to
avoid calling smp_processor_id() in a preemptible context.

Reported-by: Xiaolong Ye &lt;xiaolong.ye@intel.com&gt;
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/fair: Fix load_balance() affinity redo path</title>
<updated>2017-07-27T22:10:23Z</updated>
<author>
<name>Jeffrey Hugo</name>
<email>jhugo@codeaurora.org</email>
</author>
<published>2017-06-07T19:18:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fa7333486ea06d0f7c3078449aa9f9e069ef2d50'/>
<id>urn:sha1:fa7333486ea06d0f7c3078449aa9f9e069ef2d50</id>
<content type='text'>
commit 65a4433aebe36c8c6abeb69b99ef00274b971c6c upstream.

If load_balance() fails to migrate any tasks because all tasks were
affined, load_balance() removes the source CPU from consideration and
attempts to redo and balance among the new subset of CPUs.

There is a bug in this code path where the algorithm considers all active
CPUs in the system (minus the source that was just masked out).  This is
not valid for two reasons: some active CPUs may not be in the current
scheduling domain and one of the active CPUs is dst_cpu. These CPUs should
not be considered, as we cannot pull load from them.

Instead of failing out of load_balance(), we may end up redoing the search
with no valid CPUs and incorrectly concluding the domain is balanced.
Additionally, if the group_imbalance flag was just set, it may also be
incorrectly unset, thus the flag will not be seen by other CPUs in future
load_balance() runs as that algorithm intends.

Fix the check by removing CPUs not in the current domain and the dst_cpu
from considertation, thus limiting the evaluation to valid remaining CPUs
from which load might be migrated.

Co-authored-by: Austin Christ &lt;austinwc@codeaurora.org&gt;
Co-authored-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Tested-by: Tyler Baicar &lt;tbaicar@codeaurora.org&gt;
Signed-off-by: Jeffrey Hugo &lt;jhugo@codeaurora.org&gt;
Acked-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Austin Christ &lt;austinwc@codeaurora.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Timur Tabi &lt;timur@codeaurora.org&gt;
Link: http://lkml.kernel.org/r/1496863138-11322-2-git-send-email-jhugo@codeaurora.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/cputime: Accumulate vtime on top of nsec clocksource</title>
<updated>2017-07-27T22:10:23Z</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2017-06-29T17:15:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8e44a3517763a89e1135024eab12ef3b24819dc0'/>
<id>urn:sha1:8e44a3517763a89e1135024eab12ef3b24819dc0</id>
<content type='text'>
commit 2a42eb9594a1480b4ead9e036e06ee1290e5fa6d upstream.

Currently the cputime source used by vtime is jiffies. When we cross
a context boundary and jiffies have changed since the last snapshot, the
pending cputime is accounted to the switching out context.

This system works ok if the ticks are not aligned across CPUs. If they
instead are aligned (ie: all fire at the same time) and the CPUs run in
userspace, the jiffies change is only observed on tick exit and therefore
the user cputime is accounted as system cputime. This is because the
CPU that maintains timekeeping fires its tick at the same time as the
others. It updates jiffies in the middle of the tick and the other CPUs
see that update on IRQ exit:

    CPU 0 (timekeeper)                  CPU 1
    -------------------              -------------
                      jiffies = N
    ...                              run in userspace for a jiffy
    tick entry                       tick entry (sees jiffies = N)
    set jiffies = N + 1
    tick exit                        tick exit (sees jiffies = N + 1)
                                                account 1 jiffy as stime

Fix this with using a nanosec clock source instead of jiffies. The
cputime is then accumulated and flushed everytime the pending delta
reaches a jiffy in order to mitigate the accounting overhead.

[ fweisbec: changelog, rebase on struct vtime, field renames, add delta
  on cputime readers, keep idle vtime as-is (low overhead accounting),
  harmonize clock sources. ]

Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reported-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-6-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/cputime: Move the vtime task fields to their own struct</title>
<updated>2017-07-27T22:10:23Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a1d04e8a110288917a4f4ac14370cb9f06c36500'/>
<id>urn:sha1:a1d04e8a110288917a4f4ac14370cb9f06c36500</id>
<content type='text'>
commit bac5b6b6b11560f323e71d0ebac4061cfe5f56c0 upstream.

We are about to add vtime accumulation fields to the task struct. Let's
avoid more bloatification and gather vtime information to their own
struct.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/cputime: Rename vtime fields</title>
<updated>2017-07-27T22:10:22Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7d546b63c820089b08941ae9e5b63147fad211ee'/>
<id>urn:sha1:7d546b63c820089b08941ae9e5b63147fad211ee</id>
<content type='text'>
commit 60a9ce57e7c5ac1df3a39fb941022bbfa40c0862 upstream.

The current "snapshot" based naming on vtime fields suggests we record
some past event but that's a low level picture of their actual purpose
which comes out blurry. The real point of these fields is to run a basic
state machine that tracks down cputime entry while switching between
contexts.

So lets reflect that with more meaningful names.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/cputime: Always set tsk-&gt;vtime_snap_whence after accounting vtime</title>
<updated>2017-07-27T22:10:22Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=09b43d8df3119e309e8842782f9acf39a9ab4872'/>
<id>urn:sha1:09b43d8df3119e309e8842782f9acf39a9ab4872</id>
<content type='text'>
commit 9fa57cf5a5c4aed1e45879b335fe433048709327 upstream.

Even though it doesn't have functional consequences, setting
the task's new context state after we actually accounted the pending
vtime from the old context state makes more sense from a review
perspective.

vtime_user_exit() is the only function that doesn't follow that rule
and that can bug the reviewer for a little while until he realizes there
is no reason for this special case.

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>vtime, sched/cputime: Remove vtime_account_user()</title>
<updated>2017-07-27T22:10:22Z</updated>
<author>
<name>Frederic Weisbecker</name>
<email>fweisbec@gmail.com</email>
</author>
<published>2017-06-29T17:15:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6bb15a0824ffdae94f60556bb433719d167e2fac'/>
<id>urn:sha1:6bb15a0824ffdae94f60556bb433719d167e2fac</id>
<content type='text'>
commit 1c3eda01a79b8e9237d91c52c5a75b20983f47c6 upstream.

It's an unnecessary function between vtime_user_exit() and
account_user_time().

Tested-by: Luiz Capitulino &lt;lcapitulino@redhat.com&gt;
Signed-off-by: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wanpeng Li &lt;kernellwp@gmail.com&gt;
Link: http://lkml.kernel.org/r/1498756511-11714-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@techsingularity.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/topology: Fix overlapping sched_group_mask</title>
<updated>2017-07-21T04:59:12Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2017-04-25T12:00:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d5ba2df79bc6ae02927d5465e93eb635809e6db5'/>
<id>urn:sha1:d5ba2df79bc6ae02927d5465e93eb635809e6db5</id>
<content type='text'>
commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.

The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.

The current code gets it wrong, as can be readily demonstrated with a
topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.

The fixed topology looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 (mask: 1), 2, 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)

(note: this relies on OVERLAP domains to always have children, this is
 true because the regular topology domains are still here -- this is
 before degenerate trimming)

Debugged-by: Lauro Ramos Venancio &lt;lvenanci@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/topology: Optimize build_group_mask()</title>
<updated>2017-07-21T04:59:12Z</updated>
<author>
<name>Lauro Ramos Venancio</name>
<email>lvenanci@redhat.com</email>
</author>
<published>2017-04-20T19:51:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3397c1f0371a49f52635c9e60e552ef3c91fa10f'/>
<id>urn:sha1:3397c1f0371a49f52635c9e60e552ef3c91fa10f</id>
<content type='text'>
commit f32d782e31bf079f600dcec126ed117b0577e85c upstream.

The group mask is always used in intersection with the group CPUs. So,
when building the group mask, we don't have to care about CPUs that are
not part of the group.

Signed-off-by: Lauro Ramos Venancio &lt;lvenanci@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: lwang@redhat.com
Cc: riel@redhat.com
Link: http://lkml.kernel.org/r/1492717903-5195-2-git-send-email-lvenanci@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched/topology: Fix building of overlapping sched-groups</title>
<updated>2017-07-21T04:59:12Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2017-04-14T15:24:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e64918b4b7c3bac472198d7b59b6092a78092169'/>
<id>urn:sha1:e64918b4b7c3bac472198d7b59b6092a78092169</id>
<content type='text'>
commit 0372dd2736e02672ac6e189c31f7d8c02ad543cd upstream.

When building the overlapping groups, we very obviously should start
with the previous domain of _this_ @cpu, not CPU-0.

This can be readily demonstrated with a topology like:

  node   0   1   2   3
    0:  10  20  30  20
    1:  20  10  20  30
    2:  30  20  10  20
    3:  20  30  20  10

Where (for example) CPU1 ends up generating the following nonsensical groups:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 2 0
  []   domain 1: span 0-3 level NUMA
  []    groups: 1-3 (cpu_capacity = 3072) 0-1,3 (cpu_capacity = 3072)

Where the fact that domain 1 doesn't include a group with span 0-2 is
the obvious fail.

With patch this looks like:

  [] CPU1 attaching sched-domain:
  []  domain 0: span 0-2 level NUMA
  []   groups: 1 0 2
  []   domain 1: span 0-3 level NUMA
  []    groups: 0-2 (cpu_capacity = 3072) 0,2-3 (cpu_capacity = 3072)

Debugged-by: Lauro Ramos Venancio &lt;lvenanci@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
