<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched/core.c, branch v3.10.107</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.107</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.10.107'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-06-20T12:04:19Z</updated>
<entry>
<title>sched/debug: Don't dump sched debug info in SysRq-W</title>
<updated>2017-06-20T12:04:19Z</updated>
<author>
<name>Rabin Vincent</name>
<email>rabinv@axis.com</email>
</author>
<published>2016-04-04T13:42:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4718919e008905508b5cfa5c3531f6b3cc05cdf8'/>
<id>urn:sha1:4718919e008905508b5cfa5c3531f6b3cc05cdf8</id>
<content type='text'>
commit fb90a6e93c0684ab2629a42462400603aa829b9c upstream.

sysrq_sched_debug_show() can dump a lot of information.  Don't print out
all that if we're just trying to get a list of blocked tasks (SysRq-W).
The information is still accessible with SysRq-T.

Signed-off-by: Rabin Vincent &lt;rabinv@axis.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Nikolay Borisov &lt;n.borisov.lkml@gmail.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
</entry>
<entry>
<title>locking/static_keys: Add static_key_{en,dis}able() helpers</title>
<updated>2017-06-07T22:47:03Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-07-24T13:03:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=12f1a0f987bec98b105f24225691ab1ea7c77477'/>
<id>urn:sha1:12f1a0f987bec98b105f24225691ab1ea7c77477</id>
<content type='text'>
commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream.

Add two helpers to make it easier to treat the refcount as boolean.

[js] do not involve WARN_ON_ONCE as it causes build failures

Suggested-by: Jason Baron &lt;jasonbaron0@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
[wt: only backported for use in next fix ;
     s/static_key_count(key)/atomic_read(&amp;key-&gt;enabled)/]

Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule()</title>
<updated>2017-02-06T08:04:07Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2015-10-07T12:14:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c73d76df37ce1fa2d96e6c97e44ba5b4edc8eb7f'/>
<id>urn:sha1:c73d76df37ce1fa2d96e6c97e44ba5b4edc8eb7f</id>
<content type='text'>
commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream.

Oleg noticed that its possible to falsely observe p-&gt;on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.

Even though the overlap is very limited; the task is in the middle of
being scheduled out; it could still result in corruption of the
scheduler data structures.

        CPU0                            CPU1

        set_current_state(...)

        &lt;preempt_schedule&gt;
          context_switch(X, Y)
            prepare_lock_switch(Y)
              Y-&gt;on_cpu = 1;
            finish_lock_switch(X)
              store_release(X-&gt;on_cpu, 0);

                                        try_to_wake_up(X)
                                          LOCK(p-&gt;pi_lock);

                                          t = X-&gt;on_cpu; // 0

          context_switch(Y, X)
            prepare_lock_switch(X)
              X-&gt;on_cpu = 1;
            finish_lock_switch(Y)
              store_release(Y-&gt;on_cpu, 0);
        &lt;/preempt_schedule&gt;

        schedule();
          deactivate_task(X);
          X-&gt;on_rq = 0;

                                          if (X-&gt;on_rq) // false

                                          if (t) while (X-&gt;on_cpu)
                                            cpu_relax();

          context_switch(X, ..)
            finish_lock_switch(X)
              store_release(X-&gt;on_cpu, 0);

Avoid the load of X-&gt;on_cpu being hoisted over the X-&gt;on_rq load.

Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
</entry>
<entry>
<title>sched/core: Fix a race between try_to_wake_up() and a woken up task</title>
<updated>2017-02-06T08:04:07Z</updated>
<author>
<name>Balbir Singh</name>
<email>bsingharora@gmail.com</email>
</author>
<published>2016-09-05T03:16:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6a94ced1bf392ce674e04ccfe34b860e970dde4a'/>
<id>urn:sha1:6a94ced1bf392ce674e04ccfe34b860e970dde4a</id>
<content type='text'>
commit 135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf upstream.

The origin of the issue I've seen is related to
a missing memory barrier between check for task-&gt;state and
the check for task-&gt;on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

	do {
		schedule()
		set_current_state(TASK_(UN)INTERRUPTIBLE);
	} while (!cond);

The waker, actually gets stuck doing the following in
try_to_wake_up():

	while (p-&gt;on_cpu)
		cpu_relax();

Analysis:

The instance I've seen involves the following race:

 CPU1					CPU2

 while () {
   if (cond)
     break;
   do {
     schedule();
     set_current_state(TASK_UN..)
   } while (!cond);
					wakeup_routine()
					  spin_lock_irqsave(wait_lock)
   raw_spin_lock_irqsave(wait_lock)	  wake_up_process()
 }					  try_to_wake_up()
 set_current_state(TASK_RUNNING);	  ..
 list_del(&amp;waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:

 CPU3
 wakeup_routine()
 raw_spin_lock_irqsave(wait_lock)
 if (!list_empty)
   wake_up_process()
   try_to_wake_up()
   raw_spin_lock_irqsave(p-&gt;pi_lock)
   ..
   if (p-&gt;on_rq &amp;&amp; ttwu_wakeup())
   ..
   while (p-&gt;on_cpu)
     cpu_relax()
   ..

CPU3 tries to wake up the task on CPU1 again since it finds
it on the wait_queue, CPU1 is spinning on wait_lock, but immediately
after CPU2, CPU3 got it.

CPU3 checks the state of p on CPU1, it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly since p-&gt;on_rq
is checked under pi_lock, I've noticed that try_to_wake_up() finds
p-&gt;on_rq to be 0. This was the most confusing bit of the analysis,
but p-&gt;on_rq is changed under runqueue lock, rq_lock, the p-&gt;on_rq
check is not reliable without this fix IMHO. The race is visible
(based on the analysis) only when ttwu_queue() does a remote wakeup
via ttwu_queue_remote. In which case the p-&gt;on_rq change is not
done uder the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_irqlock_save(wait_lock) and the holder spins infintely

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads and having to tweak available memory to very low and running
memory stress-ng mmapfork test. It usually takes a long time to
reproduce. I am trying to work on a test case that can reproduce
the issue faster, but thats work in progress. I am still testing the
changes on my still in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Signed-off-by: Balbir Singh &lt;bsingharora@gmail.com&gt;
[ Updated comment to clarify matching barriers. Many
  architectures do not have a full barrier in switch_to()
  so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Alexey Kardashevskiy &lt;aik@ozlabs.ru&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Nicholas Piggin &lt;nicholas.piggin@gmail.com&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
</entry>
<entry>
<title>sched/cputime: Fix steal time accounting vs. CPU hotplug</title>
<updated>2016-06-07T08:42:48Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-03-04T14:59:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0579a12791e483fcaaad76748fb163ec855102a4'/>
<id>urn:sha1:0579a12791e483fcaaad76748fb163ec855102a4</id>
<content type='text'>
commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.

On CPU hotplug the steal time accounting can keep a stale rq-&gt;prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()-&gt;prev_steal_time;

So if steal is smaller than rq-&gt;prev_steal_time we end up with an insane large
value which then gets added to rq-&gt;prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq-&gt;prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Glauber Costa &lt;glommer@parallels.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Willy Tarreau &lt;w@1wt.eu&gt;
</content>
</entry>
<entry>
<title>sched/core: Remove false-positive warning from wake_up_process()</title>
<updated>2016-03-03T23:06:22Z</updated>
<author>
<name>Sasha Levin</name>
<email>sasha.levin@oracle.com</email>
</author>
<published>2015-12-01T01:34:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6bf97b05008739ad7d644758301a427460479537'/>
<id>urn:sha1:6bf97b05008739ad7d644758301a427460479537</id>
<content type='text'>
commit 119d6f6a3be8b424b200dcee56e74484d5445f7e upstream.

Because wakeups can (fundamentally) be late, a task might not be in
the expected state. Therefore testing against a task's state is racy,
and can yield false positives.

Signed-off-by: Sasha Levin &lt;sasha.levin@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: oleg@redhat.com
Fixes: 9067ac85d533 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>printk: rename printk_sched to printk_deferred</title>
<updated>2014-08-07T21:30:26Z</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2014-06-04T23:11:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3984bb13c8f5f2f192aed228a88696d4a697a435'/>
<id>urn:sha1:3984bb13c8f5f2f192aed228a88696d4a697a435</id>
<content type='text'>
commit aac74dc495456412c4130a1167ce4beb6c1f0b38 upstream.

After learning we'll need some sort of deferred printk functionality in
the timekeeping core, Peter suggested we rename the printk_sched function
so it can be reused by needed subsystems.

This only changes the function name. No logic changes.

Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jiri Bohac &lt;jbohac@suse.cz&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix hotplug vs. set_cpus_allowed_ptr()</title>
<updated>2014-06-11T19:03:24Z</updated>
<author>
<name>Lai Jiangshan</name>
<email>laijs@cn.fujitsu.com</email>
</author>
<published>2014-05-16T03:50:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24d52daafcd53f109893ec2faf668c8eeef4a382'/>
<id>urn:sha1:24d52daafcd53f109893ec2faf668c8eeef4a382</id>
<content type='text'>
commit 6acbfb96976fc3350e30d964acb1dbbdf876d55e upstream.

Lai found that:

  WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b()
  ...
  migration_cpu_stop+0x1d/0x22

was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is
always a sub-set of cpu_online_mask.

This isn't true since 5fbd036b552f ("sched: Cleanup cpu_active madness").

So set active and online at the same time to avoid this particular
problem.

Fixes: 5fbd036b552f ("sched: Cleanup cpu_active madness")
Signed-off-by: Lai Jiangshan &lt;laijs@cn.fujitsu.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Gautham R. Shenoy &lt;ego@linux.vnet.ibm.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Michael wang &lt;wangyun@linux.vnet.ibm.com&gt;
Cc: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
Cc: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Cc: Toshi Kani &lt;toshi.kani@hp.com&gt;
Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: Fix race on toggling cfs_bandwidth_used</title>
<updated>2014-01-15T23:28:54Z</updated>
<author>
<name>Ben Segall</name>
<email>bsegall@google.com</email>
</author>
<published>2013-10-16T18:16:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9d80092f8d9e0fc4aa2b6a7c8d2e4a7437899ca5'/>
<id>urn:sha1:9d80092f8d9e0fc4aa2b6a7c8d2e4a7437899ca5</id>
<content type='text'>
commit 1ee14e6c8cddeeb8a490d7b54cd9016e4bb900b4 upstream.

When we transition cfs_bandwidth_used to false, any currently
throttled groups will incorrectly return false from cfs_rq_throttled.
While tg_set_cfs_bandwidth will unthrottle them eventually, currently
running code (including at least dequeue_task_fair and
distribute_cfs_runtime) will cause errors.

Fix this by turning off cfs_bandwidth_used only after unthrottling all
cfs_rqs.

Tested: toggle bandwidth back and forth on a loaded cgroup. Caused
crashes in minutes without the patch, hasn't crashed with it.

Signed-off-by: Ben Segall &lt;bsegall@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181611.22647.80365.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chris J Arges &lt;chris.j.arges@canonical.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>sched: fix the theoretical signal_wake_up() vs schedule() race</title>
<updated>2014-01-09T20:24:23Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-08-12T16:14:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57f74b6ecebf59991677dd2da0f0433e8be6c945'/>
<id>urn:sha1:57f74b6ecebf59991677dd2da0f0433e8be6c945</id>
<content type='text'>
commit e0acd0a68ec7dbf6b7a81a87a867ebd7ac9b76c4 upstream.

This is only theoretical, but after try_to_wake_up(p) was changed
to check p-&gt;state under p-&gt;pi_lock the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq-&gt;lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.

The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq-&gt;lock). This is what try_to_wake_up() already
does by the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.

While at it, kill smp_mb__after_lock(), it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
