<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/sched.c, branch v3.0.17</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.17</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.17'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-10-16T21:14:51Z</updated>
<entry>
<title>posix-cpu-timers: Cure SMP wobbles</title>
<updated>2011-10-16T21:14:51Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-09-01T10:42:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=249cf808ba1a0d403fe7c476a74b66e2bc0a8e53'/>
<id>urn:sha1:249cf808ba1a0d403fe7c476a74b66e2bc0a8e53</id>
<content type='text'>
commit d670ec13178d0fd8680e6742a2bc6e04f28f87d8 upstream.

David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$

  The diff of 'process' should always be &gt;= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include &lt;unistd.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;time.h&gt;
  #include &lt;fcntl.h&gt;
  #include &lt;string.h&gt;
  #include &lt;errno.h&gt;
  #include &lt;pthread.h&gt;

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&amp;barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &amp;process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &amp;my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&amp;barrier, NULL, 2);
	  err = pthread_create(&amp;th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &amp;th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&amp;barrier);

	  err = clock_gettime(process_clock, &amp;process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &amp;me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &amp;th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&amp;sleeptime, NULL);

	  err = clock_gettime(th_clock, &amp;th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &amp;me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &amp;process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p-&gt;se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t-&gt;se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sched: Fix up wchan borkage</title>
<updated>2011-10-16T21:14:51Z</updated>
<author>
<name>Simon Kirby</name>
<email>sim@hostway.ca</email>
</author>
<published>2011-09-23T00:03:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4e41ce698822a9a2cfd324a11ea44f60fc95c871'/>
<id>urn:sha1:4e41ce698822a9a2cfd324a11ea44f60fc95c871</id>
<content type='text'>
commit 6ebbe7a07b3bc40b168d2afc569a6543c020d2e3 upstream.

Commit c259e01a1ec ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output. It forgot to
put the new schedule() function in the __sched section and thereby
doesn't get properly ignored for things like wchan.

Tested-by: Simon Kirby &lt;sim@hostway.ca&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sched: Fix a memory leak in __sdt_free()</title>
<updated>2011-10-03T18:40:09Z</updated>
<author>
<name>WANG Cong</name>
<email>amwang@redhat.com</email>
</author>
<published>2011-08-18T12:36:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=70a4888b98f8fe19323a7a33d8d55be5d22513e8'/>
<id>urn:sha1:70a4888b98f8fe19323a7a33d8d55be5d22513e8</id>
<content type='text'>
commit feff8fa0075bdfd43c841e9d689ed81adda988d6 upstream.

This patch fixes the following memory leak:

unreferenced object 0xffff880107266800 (size 512):
  comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff81133940&gt;] create_object+0x187/0x28b
    [&lt;ffffffff814ac103&gt;] kmemleak_alloc+0x73/0x98
    [&lt;ffffffff811232ba&gt;] __kmalloc_node+0x104/0x159
    [&lt;ffffffff81044b98&gt;] kzalloc_node.clone.97+0x15/0x17
    [&lt;ffffffff8104cb90&gt;] build_sched_domains+0xb7/0x7f3
    [&lt;ffffffff8104d4df&gt;] partition_sched_domains+0x1db/0x24a
    [&lt;ffffffff8109ee4a&gt;] do_rebuild_sched_domains+0x3b/0x47
    [&lt;ffffffff810a00c7&gt;] rebuild_sched_domains+0x10/0x12
    [&lt;ffffffff8104d5ba&gt;] sched_power_savings_store+0x6c/0x7b
    [&lt;ffffffff8104d5df&gt;] sched_mc_power_savings_store+0x16/0x18
    [&lt;ffffffff8131322c&gt;] sysdev_class_store+0x20/0x22
    [&lt;ffffffff81193876&gt;] sysfs_write_file+0x108/0x144
    [&lt;ffffffff81135b10&gt;] vfs_write+0xaf/0x102
    [&lt;ffffffff81135d23&gt;] sys_write+0x4d/0x74
    [&lt;ffffffff814c8a42&gt;] system_call_fastpath+0x16/0x1b
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

Signed-off-by: WANG Cong &lt;amwang@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sched: Move blk_schedule_flush_plug() out of __schedule()</title>
<updated>2011-10-03T18:40:09Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2011-06-22T17:47:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4e97b682ac92d2aed3db68f396b022113d9ad30'/>
<id>urn:sha1:f4e97b682ac92d2aed3db68f396b022113d9ad30</id>
<content type='text'>
commit 9c40cef2b799f9b5e7fa5de4d2ad3a0168ba118c upstream.

There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.

Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.

This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>sched: Separate the scheduler entry for preemption</title>
<updated>2011-10-03T18:40:08Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2011-06-22T17:47:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=edbb7ce79e62d1028781b58337100108dc41471e'/>
<id>urn:sha1:edbb7ce79e62d1028781b58337100108dc41471e</id>
<content type='text'>
commit c259e01a1ec90063042f758e409cd26b2a0963c8 upstream.

Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.

To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip</title>
<updated>2011-07-20T22:56:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-07-20T22:56:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf6ace16a3cd8b728fb0afa68368fd40bbeae19f'/>
<id>urn:sha1:cf6ace16a3cd8b728fb0afa68368fd40bbeae19f</id>
<content type='text'>
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  signal: align __lock_task_sighand() irq disabling and RCU
  softirq,rcu: Inform RCU of irq_exit() activity
  sched: Add irq_{enter,exit}() to scheduler_ipi()
  rcu: protect __rcu_read_unlock() against scheduler-using irq handlers
  rcu: Streamline code produced by __rcu_read_unlock()
  rcu: Fix RCU_BOOST race handling current-&gt;rcu_read_unlock_special
  rcu: decrease rcu_report_exp_rnp coupling with scheduler
</content>
</entry>
<entry>
<title>Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent</title>
<updated>2011-07-20T18:59:26Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@elte.hu</email>
</author>
<published>2011-07-20T18:59:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1e9ae47a0285d3f1699e8219ce50f656243b93f'/>
<id>urn:sha1:d1e9ae47a0285d3f1699e8219ce50f656243b93f</id>
<content type='text'>
</content>
</entry>
<entry>
<title>sched: Add irq_{enter,exit}() to scheduler_ipi()</title>
<updated>2011-07-20T17:50:11Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-07-19T22:07:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c5d753a55ac92e09816d410cd17093813f1a904b'/>
<id>urn:sha1:c5d753a55ac92e09816d410cd17093813f1a904b</id>
<content type='text'>
Ensure scheduler_ipi() calls irq_{enter,exit} when it does some actual
work. Traditionally we never did any actual work from the resched IPI
and all magic happened in the return from interrupt path.

Now that we do do some work, we need to ensure irq_{enter,exit} are
called so that we don't confuse things.

This affects things like timekeeping, NO_HZ and RCU, basically
everything with a hook in irq_enter/exit.

Explicit examples of things going wrong are:

  sched_clock_cpu() -- has a callback when leaving NO_HZ state to take
                    a new reading from GTOD and TSC. Without this
                    callback, time is stuck in the past.

  RCU -- needs in_irq() to work in order to avoid some nasty deadlocks

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>sched: Avoid creating superfluous NUMA domains on non-NUMA systems</title>
<updated>2011-07-20T16:54:33Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-07-20T16:42:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d110235d2c331c4f79e0879f51104be79e17a469'/>
<id>urn:sha1:d110235d2c331c4f79e0879f51104be79e17a469</id>
<content type='text'>
When creating sched_domains, stop when we've covered the entire
target span instead of continuing to create domains, only to
later find they're redundant and throw them away again.

This avoids single node systems from touching funny NUMA
sched_domain creation code and reduces the risks of the new
SD_OVERLAP code.

Requested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Anton Blanchard &lt;anton@samba.org&gt;
Cc: mahesh@linux.vnet.ibm.com
Cc: benh@kernel.crashing.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1311180177.29152.57.camel@twins
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: Allow for overlapping sched_domain spans</title>
<updated>2011-07-20T16:32:41Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-07-15T08:35:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e3589f6c81e4764d32a25d2a2a0afe54fa344f5c'/>
<id>urn:sha1:e3589f6c81e4764d32a25d2a2a0afe54fa344f5c</id>
<content type='text'>
Allow for sched_domain spans that overlap by giving such domains their
own sched_group list instead of sharing the sched_groups amongst
each-other.

This is needed for machines with more than 16 nodes, because
sched_domain_node_span() will generate a node mask from the
16 nearest nodes without regard if these masks have any overlap.

Currently sched_domains have a sched_group that maps to their child
sched_domain span, and since there is no overlap we share the
sched_group between the sched_domains of the various CPUs. If however
there is overlap, we would need to link the sched_group list in
different ways for each cpu, and hence sharing isn't possible.

In order to solve this, allocate private sched_groups for each CPU's
sched_domain but have the sched_groups share a sched_group_power
structure such that we can uniquely track the power.

Reported-and-tested-by: Anton Blanchard &lt;anton@samba.org&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Link: http://lkml.kernel.org/n/tip-08bxqw9wis3qti9u5inifh3y@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
</feed>
