<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/pid.c, branch v3.2</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-10-31T13:20:12Z</updated>
<entry>
<title>kernel: Map most files to use export.h instead of module.h</title>
<updated>2011-10-31T13:20:12Z</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2011-05-23T18:51:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9984de1a5a8a96275fcab818f7419af5a3c86e71'/>
<id>urn:sha1:9984de1a5a8a96275fcab818f7419af5a3c86e71</id>
<content type='text'>
The changed files were only including linux/module.h for the
EXPORT_SYMBOL infrastructure, and nothing else.  Revector them
onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

  -#include &lt;linux/module.h&gt;
  +#include &lt;linux/export.h&gt;

This commit is only changing the kernel dir; next targets
will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
<entry>
<title>rcu: Restore checks for blocking in RCU read-side critical sections</title>
<updated>2011-09-29T04:36:37Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2011-05-24T15:31:09Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b3fbab0571eb09746cc0283648165ec00efc8eb2'/>
<id>urn:sha1:b3fbab0571eb09746cc0283648165ec00efc8eb2</id>
<content type='text'>
Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section.  However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic.  This commit therefore adds a replacement diagnostic
based on PROVE_RCU.

Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious.  This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.

Also, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel configuration option is enabled.

Finally, fix some checkpatch whitespace complaints in lockdep.c.

Again, you must enable PROVE_RCU to see these new diagnostics.  But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check</title>
<updated>2011-07-08T20:21:58Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2011-07-08T12:39:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d8bf4ca9ca9576548628344c9725edd3786e90b1'/>
<id>urn:sha1:d8bf4ca9ca9576548628344c9725edd3786e90b1</id>
<content type='text'>
Since ca5ecddf (rcu: define __rcu address space modifier for sparse),
rcu_dereference_check() uses rcu_read_lock_held() as part of its condition
automatically, so callers do not have to do that as well.

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Acked-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
</content>
</entry>
<entry>
<title>next_pidmap: fix overflow condition</title>
<updated>2011-04-18T17:35:30Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-04-18T17:35:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c78193e9c7bcbf25b8237ad0dec82f805c4ea69b'/>
<id>urn:sha1:c78193e9c7bcbf25b8237ad0dec82f805c4ea69b</id>
<content type='text'>
next_pidmap() just quietly accepted whatever 'last' pid was passed in,
which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]
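
As an illustration only, the range check can be modelled in userspace C.
The names model_pidmap, model_next_pid and the MODEL_* constants are
hypothetical, and the sizes are shrunk for readability; the kernel helper
works on page-sized bitmaps in the pid namespace.  The sketch rejects an
out-of-range 'last' before any index arithmetic, which is one way to
realise the clamp; treat it as an assumption rather than the exact patch.

<![CDATA[
```c
#include <assert.h>

/* Hypothetical, shrunken sizes: 4 maps of 64 bits each, so 256 pids. */
#define MODEL_BITS_PER_MAP 64u
#define MODEL_MAP_ENTRIES  4u
#define MODEL_PID_LIMIT    (MODEL_BITS_PER_MAP * MODEL_MAP_ENTRIES)

struct model_pidmap {
    unsigned long long bits;   /* one bit per pid, set means in use */
};

/* Return the next in-use pid after 'last', or -1 if there is none. */
int model_next_pid(const struct model_pidmap *maps, unsigned int last)
{
    unsigned int idx, off;

    assert(maps);

    /* Check the range first, so the index below cannot run wild. */
    if (last >= MODEL_PID_LIMIT)
        return -1;

    idx = (last + 1) / MODEL_BITS_PER_MAP;
    off = (last + 1) % MODEL_BITS_PER_MAP;

    /* The loop bound guards the "last + 1" case at the very end. */
    for (; idx < MODEL_MAP_ENTRIES; idx++, off = 0) {
        for (; off < MODEL_BITS_PER_MAP; off++) {
            if (maps[idx].bits & (1ULL << off))
                return (int)(idx * MODEL_BITS_PER_MAP + off);
        }
    }
    return -1;
}
```
]]>

In this model, last = 255 passes the check, (last + 1) / 64 yields index 4,
and the loop bound alone ends the scan, which mirrors the "last+1" remark
above.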

Reported-by: Tavis Ormandy &lt;taviso@cmpxchg8b.com&gt;
Analyzed-by: Robert Święcki &lt;robert@swiecki.net&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>export pid symbols needed for kvm_vcpu_on_spin</title>
<updated>2011-03-17T16:08:28Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2011-02-01T14:51:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=77c100c83e84316ced2507c5799f79c2c80bc6b9'/>
<id>urn:sha1:77c100c83e84316ced2507c5799f79c2c80bc6b9</id>
<content type='text'>
Export the symbols required for a race-free kvm_vcpu_on_spin.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Avi Kivity &lt;avi@redhat.com&gt;
</content>
</entry>
<entry>
<title>Add RCU check for find_task_by_vpid().</title>
<updated>2010-08-20T00:18:02Z</updated>
<author>
<name>Tetsuo Handa</name>
<email>penguin-kernel@I-love.SAKURA.ne.jp</email>
</author>
<published>2010-06-25T16:08:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4221a9918e38b7494cee341dda7b7b4bb8c04bde'/>
<id>urn:sha1:4221a9918e38b7494cee341dda7b7b4bb8c04bde</id>
<content type='text'>
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().

  ===================================================
  [ INFO: suspicious rcu_dereference_check() usage. ]
  ---------------------------------------------------
  kernel/pid.c:386 invoked rcu_dereference_check() without protection!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 1
  1 lock held by rc.sysinit/1102:
   #0:  (tasklist_lock){.+.+..}, at: [&lt;c1048340&gt;] sys_setpgid+0x40/0x160

  stack backtrace:
  Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
  Call Trace:
   [&lt;c105e714&gt;] lockdep_rcu_dereference+0x94/0xb0
   [&lt;c104b4cd&gt;] find_task_by_pid_ns+0x6d/0x70
   [&lt;c104b4e8&gt;] find_task_by_vpid+0x18/0x20
   [&lt;c1048347&gt;] sys_setpgid+0x47/0x160
   [&lt;c1002b50&gt;] sysenter_do_call+0x12/0x36

Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().

Signed-off-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
</content>
</entry>
<entry>
<title>rculist: avoid __rcu annotations</title>
<updated>2010-08-20T00:18:00Z</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2010-02-25T15:55:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=67bdbffd696f29a0b68aa8daa285783a06651583'/>
<id>urn:sha1:67bdbffd696f29a0b68aa8daa285783a06651583</id>
<content type='text'>
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.

We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
</content>
</entry>
<entry>
<title>pids: alloc_pidmap: remove the unnecessary boundary checks</title>
<updated>2010-08-11T15:59:20Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-08-11T01:03:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c52b0b91ba1f4b7ea90e20385c0a6df0ba54aed4'/>
<id>urn:sha1:c52b0b91ba1f4b7ea90e20385c0a6df0ba54aed4</id>
<content type='text'>
alloc_pidmap() calculates max_scan so that if the initial offset != 0 we
inspect the first map-&gt;page twice.  This is correct, we want to find the
unused bits &lt; offset in this bitmap block.  Add the comment.

But it doesn't make any sense to stop the find_next_offset() loop when we
are looking into this map-&gt;page for the second time.  We have already
checked the bits &gt;= offset during the first attempt, it is fine to
do this again, no matter if we succeed this time or not.

Remove this hard-to-understand code.  It optimizes the very unlikely case
when we are going to fail, but slows down the more likely case.
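
A rough userspace model of the scan order, assuming a toy bitmap of two
8-bit blocks.  model_find_free and the MODEL_* constants are hypothetical
and are not the kernel code.  When the start offset is non-zero the
starting block is visited twice, and the second visit simply rescans it
from bit 0 without any extra boundary check:

<![CDATA[
```c
#include <assert.h>

/* Hypothetical toy sizes: 2 blocks of 8 bits each, so 16 slots. */
#define MODEL_BLOCK_BITS 8
#define MODEL_BLOCKS     2

/* Return the first clear bit at or after 'start', wrapping; -1 if full. */
int model_find_free(const unsigned char *blocks, int start)
{
    int block, offset, visits, i, bit;

    assert(blocks);
    assert(start >= 0 && start < MODEL_BLOCKS * MODEL_BLOCK_BITS);

    block = start / MODEL_BLOCK_BITS;
    offset = start % MODEL_BLOCK_BITS;

    /* A non-zero offset means the first block needs a second visit
     * to cover the bits below the offset. */
    visits = MODEL_BLOCKS + (offset != 0);

    for (i = 0; i < visits; i++) {
        for (bit = offset; bit < MODEL_BLOCK_BITS; bit++) {
            if (!(blocks[block] & (1u << bit)))
                return block * MODEL_BLOCK_BITS + bit;
        }
        block = (block + 1) % MODEL_BLOCKS;
        offset = 0;   /* later visits scan the whole block */
    }
    return -1;
}
```
]]>

With block 0 set to 0xFB and block 1 full, a scan starting at slot 5 only
finds slot 2 on the second visit to block 0.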

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Salman Qazi &lt;sqazi@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pids: fix a race in pid generation that causes pids to be reused immediately</title>
<updated>2010-08-11T15:59:20Z</updated>
<author>
<name>Salman</name>
<email>sqazi@google.com</email>
</author>
<published>2010-08-11T01:03:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5fdee8c4a5e1800489ce61963208f8cc55e42ea1'/>
<id>urn:sha1:5fdee8c4a5e1800489ce61963208f8cc55e42ea1</id>
<content type='text'>
A program that repeatedly forks and waits is susceptible to having the
same pid repeated, especially when it competes with another instance of
the same program.  This is really bad for the bash implementation.
Furthermore, many shell scripts assume that pid numbers will not be used
for some length of time.

Race Description:

A                                    B

// pid == offset == n                // pid == offset == n + 1
test_and_set_bit(offset, map-&gt;page)
                                     test_and_set_bit(offset, map-&gt;page);
                                     pid_ns-&gt;last_pid = pid;
pid_ns-&gt;last_pid = pid;
                                     // pid == n + 1 is freed (wait())

                                     // Next fork()...
                                     last = pid_ns-&gt;last_pid; // == n
                                     pid = last + 1;
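
The interleaving above can be replayed deterministically in a few lines of
userspace C.  This is only a sketch: model_ns, store_last_pid and
replay_race are hypothetical names, and the forward_only variant is one
possible mitigation that ignores pid wraparound; it is not claimed to be
what this patch implements.

<![CDATA[
```c
#include <assert.h>

/* Hypothetical stand-in for the namespace field involved in the race. */
struct model_ns {
    int last_pid;
};

/* Store 'pid' as the last allocated pid.  With forward_only set, a store
 * that would move last_pid backwards is dropped (wraparound is ignored). */
static void store_last_pid(struct model_ns *ns, int pid, int forward_only)
{
    if (forward_only && pid <= ns->last_pid)
        return;
    ns->last_pid = pid;
}

/* Replay the diagram: A claimed pid n, B claimed pid n + 1, and B's store
 * lands before A's.  Return the pid the next fork() would try first. */
int replay_race(int n, int forward_only)
{
    struct model_ns ns = { n - 1 };

    assert(n > 0);

    store_last_pid(&ns, n + 1, forward_only);   /* B stores first */
    store_last_pid(&ns, n, forward_only);       /* A stores second */

    /* pid n + 1 exits and is reaped here, so its bit is free again. */
    return ns.last_pid + 1;
}
```
]]>

With plain stores, replay_race(100, 0) returns 101, the pid that was just
freed; with the forward-only variant, replay_race(100, 1) returns 102.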

Code to reproduce it (Running multiple instances is more effective):

#include &lt;errno.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;
#include &lt;unistd.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

// The distance mod 32768 between two pids, where the first pid is expected
// to be smaller than the second.
int PidDistance(pid_t first, pid_t second) {
  return (second + 32768 - first) % 32768;
}

int main(int argc, char* argv[]) {
  int failed = 0;
  pid_t last_pid = 0;
  int i;
  printf("%zu\n", sizeof(pid_t));
  for (i = 0; i &lt; 10000000; ++i) {
    if (i % 32768 == 0)
      printf("Iter: %d\n", i/32768);
    int child_exit_code = i % 256;
    pid_t pid = fork();
    if (pid == -1) {
      fprintf(stderr, "fork failed, iteration %d, errno=%d\n", i, errno);
      exit(1);
    }
    if (pid == 0) {
      // Child
      exit(child_exit_code);
    } else {
      // Parent
      if (i &gt; 0) {
        int distance = PidDistance(last_pid, pid);
        if (distance == 0 || distance &gt; 30000) {
          fprintf(stderr,
                  "Unexpected pid sequence: previous fork: pid=%d, "
                  "current fork: pid=%d for iteration=%d.\n",
                  last_pid, pid, i);
          failed = 1;
        }
      }
      last_pid = pid;
      int status;
      int reaped = wait(&amp;status);
      if (reaped != pid) {
        fprintf(stderr,
                "Wait return value: expected pid=%d, "
                "got %d, iteration %d\n",
                pid, reaped, i);
        failed = 1;
      } else if (WEXITSTATUS(status) != child_exit_code) {
        fprintf(stderr,
                "Unexpected exit status %x, iteration %d\n",
                WEXITSTATUS(status), i);
        failed = 1;
      }
    }
  }
  exit(failed);
}

Thanks to Ted Tso for the key ideas of this implementation.

Signed-off-by: Salman Qazi &lt;sqazi@google.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Theodore Ts'o &lt;tytso@mit.edu&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pids: increase pid_max based on num_possible_cpus</title>
<updated>2010-05-27T16:12:51Z</updated>
<author>
<name>Hedi Berriche</name>
<email>hedi@sgi.com</email>
</author>
<published>2010-05-26T21:44:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=72680a191b934377430032f93af15ef50aafb3a8'/>
<id>urn:sha1:72680a191b934377430032f93af15ef50aafb3a8</id>
<content type='text'>
On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough.  On a system with 1664 CPUs, 25163
processes are started before the login prompt.  It's estimated that with
2048 CPUs we will pass the 32k limit.  With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.
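
A hedged sketch of what such scaling could look like.  The helper
scaled_pid_max and the per-CPU figure of 1024 used in the note below are
assumptions for illustration; this message does not state the actual
constants.

<![CDATA[
```c
#include <assert.h>

/* Hypothetical helper: pick a boot-time pid_max that is at least 'base',
 * at least per_cpu * ncpus, and never above 'limit'. */
int scaled_pid_max(int base, int limit, int ncpus, int per_cpu)
{
    long wanted;

    assert(base > 0 && limit >= base && ncpus > 0 && per_cpu > 0);

    wanted = (long)ncpus * per_cpu;
    if (wanted < base)
        wanted = base;      /* small machines keep the old default */
    if (wanted > limit)
        wanted = limit;     /* never exceed the hard ceiling */
    return (int)wanted;
}
```
]]>

Under these assumed numbers, scaled_pid_max(32768, 4194304, 1664, 1024)
gives 1703936, comfortably above the 25163 processes mentioned above,
while an 8-CPU machine keeps 32768.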

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche &lt;hedi@sgi.com&gt;
Signed-off-by: Mike Travis &lt;travis@sgi.com&gt;
Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: Pavel Machek &lt;pavel@ucw.cz&gt;
Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Cc: Greg KH &lt;gregkh@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: John Stoffel &lt;john@stoffel.org&gt;
Cc: Jack Steiner &lt;steiner@sgi.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
