<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/pid_namespace.c, branch v4.15.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.15.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.15.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-11-18T00:10:04Z</updated>
<entry>
<title>pid: remove pidhash</title>
<updated>2017-11-18T00:10:04Z</updated>
<author>
<name>Gargi Sharma</name>
<email>gs051095@gmail.com</email>
</author>
<published>2017-11-17T23:30:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8cfbc245e24887e3c30235f71e9e9405e0cfc39'/>
<id>urn:sha1:e8cfbc245e24887e3c30235f71e9e9405e0cfc39</id>
<content type='text'>
pidhash is no longer required as all the information can be looked up
from the idr tree.  nr_hashed represented the number of pids that had been
hashed.  Since nr_hashed and PIDNS_HASH_ADDING are no longer relevant,
they have been renamed to pid_allocated and PIDNS_ADDING respectively.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma &lt;gs051095@gmail.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;	[ia64]
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid: replace pid bitmap implementation with IDR API</title>
<updated>2017-11-18T00:10:03Z</updated>
<author>
<name>Gargi Sharma</name>
<email>gs051095@gmail.com</email>
</author>
<published>2017-11-17T23:30:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95846ecf9dac5089aed4b144d912225f8ef86ae4'/>
<id>urn:sha1:95846ecf9dac5089aed4b144d912225f8ef86ae4</id>
<content type='text'>
Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces the kernel's bitmap implementation of PID allocation
with the IDR API.  These patches are written to simplify the kernel by
replacing custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement.  Both object files shrink noticeably
with the IDR implementation.

Before
   text       data        bss        dec        hex    filename
   8447       3894         64      12405       3075    kernel/pid.o
After
   text       data        bss        dec        hex    filename
   3397        304          0       3701        e75    kernel/pid.o

Before
   text       data        bss        dec        hex    filename
   5692       1842        192       7726       1e2e    kernel/pid_namespace.o
After
   text       data        bss        dec        hex    filename
   2854        216         16       3086        c0e    kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, such as free_pidmap() and
alloc_pidmap(), are removed.  The rest of the functions are modified to
use the IDR API.  The change was made to make PID allocation less
complex by replacing custom code with calls to the generic API.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
  Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma &lt;gs051095@gmail.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>userns,pidns: Verify the userns for new pid namespaces</title>
<updated>2017-07-20T12:43:58Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-04-29T19:12:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2b426267c56773201f968fdb5eda6ab9ae94e34'/>
<id>urn:sha1:a2b426267c56773201f968fdb5eda6ab9ae94e34</id>
<content type='text'>
It is pointless and confusing to allow a pid namespace hierarchy and
the user namespace hierarchy to get out of sync.  The owner of a child
pid namespace should be the owner of the parent pid namespace or
a descendant of the owner of the parent pid namespace.

Otherwise it is possible to construct scenarios where a process has a
capability over a parent pid namespace but does not have the
capability over a child pid namespace, which confusingly makes
permission checks non-transitive.

It requires use of setns into a pid namespace (but not into a user
namespace) to create such a scenario.

Add the function in_userns to help in making this determination.

v2: Optimized in_userns by using level as suggested
    by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
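
The level-based check described above can be sketched outside the kernel
as follows.  This is only an illustration in Python, not the kernel code:
the UserNS class and its fields are hypothetical stand-ins for a user
namespace with a parent pointer and a depth level.

```python
class UserNS:
    """Toy stand-in for a user namespace: a parent link plus a depth level."""

    def __init__(self, parent=None):
        self.parent = parent
        # The root sits at level 0; each child is one level deeper.
        self.level = 0 if parent is None else parent.level + 1


def in_userns(ancestor, child):
    """Return True if `child` is `ancestor` or one of its descendants."""
    ns = child
    # Climb exactly as many steps as `child` is deeper than `ancestor`.
    for _ in range(max(0, child.level - ancestor.level)):
        ns = ns.parent
    # At equal depth, only the very same namespace can be the ancestor.
    return ns is ancestor
```

Using the level bounds the walk by the depth difference, so the loop
never has to run all the way to the root when the answer is "no".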

Ref: 49f4d8b93ccf ("pidns: Capture the user namespace and filter ns_last_pid")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes</title>
<updated>2017-05-13T22:26:01Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-05-11T23:21:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9a985db98961ae1ba0be169f19df1c567e4ffe0'/>
<id>urn:sha1:b9a985db98961ae1ba0be169f19df1c567e4ffe0</id>
<content type='text'>
The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes, triggering the hung task timeout and increasing
the system load average.  This is undesirable.  Sleep with a task state
of TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
undesirable side effects.

Apparently under heavy load this has been allowing Chrome to trigger
the hung task timeout error and cause ChromeOS to reboot.

Reported-by: Vovo Yang &lt;vovoy@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: 6347e9009104 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pidns: expose task pid_ns_for_children to userspace</title>
<updated>2017-05-09T00:15:12Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2017-05-08T22:56:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eaa0d190bfe1ed891b814a52712dcd852554cb08'/>
<id>urn:sha1:eaa0d190bfe1ed891b814a52712dcd852554cb08</id>
<content type='text'>
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.

It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks that do setns(CLONE_NEWPID) in the process
of their work.

This patch solves the problem by exposing pid_ns_for_children in the ns
directory in the standard way, under the name "pid_for_children":

  ~# ls /proc/5531/ns -l | grep pid
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -&gt; pid:[4026531836]
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -&gt; pid:[4026532286]
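
A tool could read the same links programmatically.  The following is a
minimal Python sketch, assuming a kernel that carries this patch; the
helper names parse_ns_link and pid_namespaces are made up for
illustration and are not part of any existing tool.

```python
import os


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026531836]' into (type, inode)."""
    kind, _, rest = target.partition(":")
    return kind, int(rest.strip("[]"))


def pid_namespaces(pid="self"):
    """Read both pid namespace links of a task from its /proc ns directory."""
    base = "/proc/{}/ns".format(pid)
    return {
        name: parse_ns_link(os.readlink(os.path.join(base, name)))
        for name in ("pid", "pid_for_children")
    }
```

If the two inode numbers differ, the task has changed the pid namespace
that its future children will be created in.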

Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Andrei Vagin &lt;avagin@virtuozzo.com&gt;
Cc: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for the reduction of &lt;linux/sched.h&gt;'s signal API dependency</title>
<updated>2017-03-02T07:42:37Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-03T22:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f361bf4a66c9bfabace46f6ff5d97005c9b524fe'/>
<id>urn:sha1:f361bf4a66c9bfabace46f6ff5d97005c9b524fe</id>
<content type='text'>
Instead of including the full &lt;linux/signal.h&gt;, we are going to include the
types-only &lt;linux/signal_types.h&gt; header in &lt;linux/sched.h&gt;, to further
decouple the scheduler header from the signal headers.

This means that various files which relied on the full &lt;linux/signal.h&gt; need
to be updated to gain an explicit dependency on it.

Update the code that relies on sched.h's inclusion of the &lt;linux/signal.h&gt; header.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for new header dependencies before moving code to &lt;linux/sched/task.h&gt;</title>
<updated>2017-03-02T07:42:35Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-08T17:51:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=299300258d1bc4e997b7db340a2e06636757fe2e'/>
<id>urn:sha1:299300258d1bc4e997b7db340a2e06636757fe2e</id>
<content type='text'>
We are going to split &lt;linux/sched/task.h&gt; out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder &lt;linux/sched/task.h&gt; file that just
maps to &lt;linux/sched.h&gt; to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare to remove &lt;linux/cred.h&gt; inclusion from &lt;linux/sched.h&gt;</title>
<updated>2017-03-02T07:42:31Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-02T16:54:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b825c3af1d8a0af4deb4a5eb349d0d0050c62e5'/>
<id>urn:sha1:5b825c3af1d8a0af4deb4a5eb349d0d0050c62e5</id>
<content type='text'>
Add #include &lt;linux/cred.h&gt; dependencies to all .c files that rely on
sched.h doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because &lt;linux/sched.h&gt; is included in over
2,200 files ...

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>pid: fix lockdep deadlock warning due to ucount_lock</title>
<updated>2017-01-10T00:34:56Z</updated>
<author>
<name>Andrei Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2017-01-05T03:28:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=add7c65ca426b7a37184dd3d2172394e23d585d6'/>
<id>urn:sha1:add7c65ca426b7a37184dd3d2172394e23d585d6</id>
<content type='text'>
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G        W
---------------------------------------------------------
swapper/1/0 just changed the state of lock:
 (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){-.....}, at: [&lt;ffffffffbd0a1bc6&gt;] __lock_task_sighand+0xb6/0x2c0
but this lock took another, HARDIRQ-unsafe lock in the past:
 (ucounts_lock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:                 &amp;(&amp;sighand-&gt;siglock)-&gt;rlock --&gt; &amp;(&amp;tty-&gt;ctrl_lock)-&gt;rlock --&gt; ucounts_lock
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(ucounts_lock);
                               local_irq_disable();
                               lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);
                               lock(&amp;(&amp;tty-&gt;ctrl_lock)-&gt;rlock);
  &lt;Interrupt&gt;
    lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);

 *** DEADLOCK ***

This patch removes a dependency between rlock and ucount_lock.

Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Cc: stable@vger.kernel.org
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Acked-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'nsfs-ioctls' into HEAD</title>
<updated>2016-09-23T01:00:36Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-09-23T01:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=78725596644be0181c46f55c52aadfb8c70bcdb7'/>
<id>urn:sha1:78725596644be0181c46f55c52aadfb8c70bcdb7</id>
<content type='text'>
From: Andrey Vagin &lt;avagin@openvz.org&gt;

Each namespace has an owning user namespace and currently there is no way
to discover these relationships.

Pid and user namespaces are hierarchical. There is no way to discover
parent-child relationships either.

Why might we want to know the relationships between namespaces?

One use would be visualization, in order to understand the running
system.  Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?

One more use case (which is usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.

There [1] was a discussion about which interface to choose to determine
relationships between namespaces.

Eric suggested adding two ioctls [2]:
&gt; Grumble, Grumble.  I think this may actually a case for creating ioctls
&gt; for these two cases.  Now that random nsfs file descriptors are bind
&gt; mountable the original reason for using proc files is not as pressing.
&gt;
&gt; One ioctl for the user namespace that owns a file descriptor.
&gt; One ioctl for the parent namespace of a namespace file descriptor.

Here is an implementation of these ioctls.

$ man man7/namespaces.7
...
Since  Linux  4.X,  the  following  ioctl(2)  calls are supported for
namespace file descriptors.  The correct syntax is:

      fd = ioctl(ns_fd, ioctl_type);

where ioctl_type is one of the following:

NS_GET_USERNS
      Returns a file descriptor that refers to an owning user
      namespace.

NS_GET_PARENT
      Returns  a  file descriptor that refers to a parent namespace.
      This ioctl(2) can be used for pid  and  user  namespaces.  For
      user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
      meaning.

In addition to generic ioctl(2) errors, the following  specific  ones
can occur:

EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

EPERM  The  requested  namespace  is outside of the current namespace
      scope.

[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101

Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
  outside of the init namespace, so we can return EPERM in this case too.
  &gt; The fewer special cases the easier the code is to get
  &gt; correct, and the easier it is to read. // Eric

Changes for v3:
* rename ns-&gt;get_owner() to ns-&gt;owner(). get_* usually means that it
  grabs a reference.

Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: "Michael Kerrisk (man-pages)" &lt;mtk.manpages@gmail.com&gt;
Cc: "W. Trevor King" &lt;wking@tremily.us&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
</content>
</entry>
</feed>
