<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/pid_namespace.c, branch v4.14.92</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.92</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.14.92'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-07-20T12:43:58Z</updated>
<entry>
<title>userns,pidns: Verify the userns for new pid namespaces</title>
<updated>2017-07-20T12:43:58Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-04-29T19:12:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2b426267c56773201f968fdb5eda6ab9ae94e34'/>
<id>urn:sha1:a2b426267c56773201f968fdb5eda6ab9ae94e34</id>
<content type='text'>
It is pointless and confusing to allow a pid namespace hierarchy and
the user namespace hierarchy to get out of sync.  The owner of a child
pid namespace should be the owner of the parent pid namespace or
a descendant of the owner of the parent pid namespace.

Otherwise it is possible to construct scenarios where a process has a
capability over a parent pid namespace but does not have the
capability over a child pid namespace.  Which confusingly makes
permission checks non-transitive.

It requires use of setns into a pid namespace (but not into a user
namespace) to create such a scenario.

Add the function in_userns to help in making this determination.

v2: Optimized in_userns by using level as suggested
    by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;

Ref: 49f4d8b93ccf ("pidns: Capture the user namespace and filter ns_last_pid")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes</title>
<updated>2017-05-13T22:26:01Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-05-11T23:21:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9a985db98961ae1ba0be169f19df1c567e4ffe0'/>
<id>urn:sha1:b9a985db98961ae1ba0be169f19df1c567e4ffe0</id>
<content type='text'>
The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes triggering the hung task timeout, and increasing
the system average.  This is undesirable.  Sleep with a task state of
TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
undesirable side effects.

Apparently under heavy load this has been allowing Chrome to trigger
the hung time task timeout error and cause ChromeOS to reboot.

Reported-by: Vovo Yang &lt;vovoy@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: 6347e9009104 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pidns: expose task pid_ns_for_children to userspace</title>
<updated>2017-05-09T00:15:12Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2017-05-08T22:56:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eaa0d190bfe1ed891b814a52712dcd852554cb08'/>
<id>urn:sha1:eaa0d190bfe1ed891b814a52712dcd852554cb08</id>
<content type='text'>
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.

It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
their work.

This patch solves the problem, and it exposes pid_ns_for_children to ns
directory in standard way with the name "pid_for_children":

  ~# ls /proc/5531/ns -l | grep pid
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -&gt; pid:[4026531836]
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -&gt; pid:[4026532286]

Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Andrei Vagin &lt;avagin@virtuozzo.com&gt;
Cc: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for the reduction of &lt;linux/sched.h&gt;'s signal API dependency</title>
<updated>2017-03-02T07:42:37Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-03T22:47:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f361bf4a66c9bfabace46f6ff5d97005c9b524fe'/>
<id>urn:sha1:f361bf4a66c9bfabace46f6ff5d97005c9b524fe</id>
<content type='text'>
Instead of including the full &lt;linux/signal.h&gt;, we are going to include the
types-only &lt;linux/signal_types.h&gt; header in &lt;linux/sched.h&gt;, to further
decouple the scheduler header from the signal headers.

This means that various files which relied on the full &lt;linux/signal.h&gt; need
to be updated to gain an explicit dependency on it.

Update the code that relies on sched.h's inclusion of the &lt;linux/signal.h&gt; header.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare for new header dependencies before moving code to &lt;linux/sched/task.h&gt;</title>
<updated>2017-03-02T07:42:35Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-08T17:51:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=299300258d1bc4e997b7db340a2e06636757fe2e'/>
<id>urn:sha1:299300258d1bc4e997b7db340a2e06636757fe2e</id>
<content type='text'>
We are going to split &lt;linux/sched/task.h&gt; out of &lt;linux/sched.h&gt;, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder &lt;linux/sched/task.h&gt; file that just
maps to &lt;linux/sched.h&gt; to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/headers: Prepare to remove &lt;linux/cred.h&gt; inclusion from &lt;linux/sched.h&gt;</title>
<updated>2017-03-02T07:42:31Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2017-02-02T16:54:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b825c3af1d8a0af4deb4a5eb349d0d0050c62e5'/>
<id>urn:sha1:5b825c3af1d8a0af4deb4a5eb349d0d0050c62e5</id>
<content type='text'>
Add #include &lt;linux/cred.h&gt; dependencies to all .c files rely on sched.h
doing that for them.

Note that even if the count where we need to add extra headers seems high,
it's still a net win, because &lt;linux/sched.h&gt; is included in over
2,200 files ...

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>pid: fix lockdep deadlock warning due to ucount_lock</title>
<updated>2017-01-10T00:34:56Z</updated>
<author>
<name>Andrei Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2017-01-05T03:28:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=add7c65ca426b7a37184dd3d2172394e23d585d6'/>
<id>urn:sha1:add7c65ca426b7a37184dd3d2172394e23d585d6</id>
<content type='text'>
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G        W
---------------------------------------------------------
swapper/1/0 just changed the state of lock:
 (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){-.....}, at: [&lt;ffffffffbd0a1bc6&gt;] __lock_task_sighand+0xb6/0x2c0
but this lock took another, HARDIRQ-unsafe lock in the past:
 (ucounts_lock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:                 &amp;(&amp;sighand-&gt;siglock)-&gt;rlock --&gt; &amp;(&amp;tty-&gt;ctrl_lock)-&gt;rlock --&gt; ucounts_lock
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(ucounts_lock);
                               local_irq_disable();
                               lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);
                               lock(&amp;(&amp;tty-&gt;ctrl_lock)-&gt;rlock);
  &lt;Interrupt&gt;
    lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);

 *** DEADLOCK ***

This patch removes a dependency between rlock and ucount_lock.

Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Cc: stable@vger.kernel.org
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Acked-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'nsfs-ioctls' into HEAD</title>
<updated>2016-09-23T01:00:36Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-09-23T01:00:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=78725596644be0181c46f55c52aadfb8c70bcdb7'/>
<id>urn:sha1:78725596644be0181c46f55c52aadfb8c70bcdb7</id>
<content type='text'>
From: Andrey Vagin &lt;avagin@openvz.org&gt;

Each namespace has an owning user namespace and now there is not way
to discover these relationships.

Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships too.

Why we may want to know relationships between namespaces?

One use would be visualization, in order to understand the running
system.  Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?

One more use-case (which usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.

There [1] was a discussion about which interface to choose to determing
relationships between namespaces.

Eric suggested to add two ioctl-s [2]:
&gt; Grumble, Grumble.  I think this may actually a case for creating ioctls
&gt; for these two cases.  Now that random nsfs file descriptors are bind
&gt; mountable the original reason for using proc files is not as pressing.
&gt;
&gt; One ioctl for the user namespace that owns a file descriptor.
&gt; One ioctl for the parent namespace of a namespace file descriptor.

Here is an implementaions of these ioctl-s.

$ man man7/namespaces.7
...
Since  Linux  4.X,  the  following  ioctl(2)  calls are supported for
namespace file descriptors.  The correct syntax is:

      fd = ioctl(ns_fd, ioctl_type);

where ioctl_type is one of the following:

NS_GET_USERNS
      Returns a file descriptor that refers to an owning user names‐
      pace.

NS_GET_PARENT
      Returns  a  file descriptor that refers to a parent namespace.
      This ioctl(2) can be used for pid  and  user  namespaces.  For
      user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
      meaning.

In addition to generic ioctl(2) errors, the following  specific  ones
can occur:

EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.

EPERM  The  requested  namespace  is outside of the current namespace
      scope.

[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101

Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
  outside of the init namespace, so we can return EPERM in this case too.
  &gt; The fewer special cases the easier the code is to get
  &gt; correct, and the easier it is to read. // Eric

Changes for v3:
* rename ns-&gt;get_owner() to ns-&gt;owner(). get_* usually means that it
  grabs a reference.

Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: James Bottomley &lt;James.Bottomley@HansenPartnership.com&gt;
Cc: "Michael Kerrisk (man-pages)" &lt;mtk.manpages@gmail.com&gt;
Cc: "W. Trevor King" &lt;wking@tremily.us&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
</content>
</entry>
<entry>
<title>nsfs: add ioctl to get a parent namespace</title>
<updated>2016-09-23T00:59:41Z</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2016-09-06T07:47:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a7306ed8d94af729ecef8b6e37506a1c6fc14788'/>
<id>urn:sha1:a7306ed8d94af729ecef8b6e37506a1c6fc14788</id>
<content type='text'>
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.

In a future we will use this interface to dump and restore nested
namespaces.

Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>kernel: add a helper to get an owning user namespace for a namespace</title>
<updated>2016-09-23T00:59:39Z</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2016-09-06T07:47:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bcac25a58bfc6bd79191ac5d7afb49bea96da8c9'/>
<id>urn:sha1:bcac25a58bfc6bd79191ac5d7afb49bea96da8c9</id>
<content type='text'>
Return -EPERM if an owning user namespace is outside of a process
current user namespace.

v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
    This special cases was removed from this version. There is nothing
    outside of init_user_ns, so we can return EPERM.
v3: rename ns-&gt;get_owner() to ns-&gt;owner(). get_* usually means that it
grabs a reference.

Acked-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
</feed>
