<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/pid_namespace.c, branch v4.20.10</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20.10</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.20.10'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-09-16T14:08:25Z</updated>
<entry>
<title>signal: Use group_send_sig_info to kill all processes in a pid namespace</title>
<updated>2018-09-16T14:08:25Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2018-07-20T21:35:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82058d6684658430cd9b4123d4c3e863fd48f813'/>
<id>urn:sha1:82058d6684658430cd9b4123d4c3e863fd48f813</id>
<content type='text'>
Replace send_sig_info in zap_pid_ns_processes with
group_send_sig_info.  This makes more sense as the entire process
group is being killed.  More importantly this allows the kill of those
processes with PIDTYPE_MAX to indicate all of the process in the pid
namespace are being signaled.  This is needed for fork to detect when
signals are sent to a group of processes.

Admittedly fork has another case to catch SIGKILL but the principle remains
that it is desirable to know when a group of processes is being signaled.

Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>signal: Use SEND_SIG_PRIV not SEND_SIG_FORCED with SIGKILL and SIGSTOP</title>
<updated>2018-09-11T19:19:35Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2018-09-03T08:32:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=079b22dc9be985c591589fcb94769b8e13518aa0'/>
<id>urn:sha1:079b22dc9be985c591589fcb94769b8e13518aa0</id>
<content type='text'>
Now that siginfo is never allocated for SIGKILL and SIGSTOP there is
no difference between SEND_SIG_PRIV and SEND_SIG_FORCED for SIGKILL
and SIGSTOP.  This makes SEND_SIG_FORCED unnecessary and redundant in
the presence of SIGKILL and SIGSTOP.  Therefore change users of
SEND_SIG_FORCED that are sending SIGKILL or SIGSTOP to use
SEND_SIG_PRIV instead.

This removes the last users of SEND_SIG_FORCED.

Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2018-04-04T02:15:32Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-04T02:15:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17dec0a949153d9ac00760ba2f5b78cb583e995f'/>
<id>urn:sha1:17dec0a949153d9ac00760ba2f5b78cb583e995f</id>
<content type='text'>
Pull namespace updates from Eric Biederman:
 "There was a lot of work this cycle fixing bugs that were discovered
  after the merge window and getting everything ready where we can
  reasonably support fully unprivileged fuse. The bug fixes you already
  have and much of the unprivileged fuse work is coming in via other
  trees.

  Still left for fully unprivileged fuse is figuring out how to cleanly
  handle .set_acl and .get_acl in the legacy case, and properly handling
  of evm xattrs on unprivileged mounts.

  Included in the tree is a cleanup from Alexely that replaced a linked
  list with a statically allocated fix sized array for the pid caches,
  which simplifies and speeds things up.

  Then there is are some cleanups and fixes for the ipc namespace. The
  motivation was that in reviewing other code it was discovered that
  access ipc objects from different pid namespaces recorded pids in such
  a way that when asked the wrong pids were returned. In the worst case
  there has been a measured 30% performance impact for sysvipc
  semaphores. Other test cases showed no measurable performance impact.
  Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
  performance both gave the nod that this is good enough.

  Casey Schaufler and James Morris have given their approval to the LSM
  side of the changes.

  I simplified the types and the code dealing with sysvipc to pass just
  kern_ipc_perm for all three types of ipc. Which reduced the header
  dependencies throughout the kernel and simplified the lsm code.

  Which let me work on the pid fixes without having to worry about
  trivial changes causing complete kernel recompiles"

* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ipc/shm: Fix pid freeing.
  ipc/shm: fix up for struct file no longer being available in shm.h
  ipc/smack: Tidy up from the change in type of the ipc security hooks
  ipc: Directly call the security hook in ipc_ops.associate
  ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
  ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
  ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
  ipc/util: Helpers for making the sysvipc operations pid namespace aware
  ipc: Move IPCMNI from include/ipc.h into ipc/util.h
  msg: Move struct msg_queue into ipc/msg.c
  shm: Move struct shmid_kernel into ipc/shm.c
  sem: Move struct sem and struct sem_array into ipc/sem.c
  msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
  shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
  sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
  pidns: simpler allocation of pid_* caches
</content>
</entry>
<entry>
<title>kernel: use kernel_wait4() instead of sys_wait4()</title>
<updated>2018-04-02T18:14:51Z</updated>
<author>
<name>Dominik Brodowski</name>
<email>linux@dominikbrodowski.net</email>
</author>
<published>2018-03-11T10:34:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d300b610812f3c10d146db4c18f98eba38834c70'/>
<id>urn:sha1:d300b610812f3c10d146db4c18f98eba38834c70</id>
<content type='text'>
All call sites of sys_wait4() set *rusage to NULL. Therefore, there is
no need for the copy_to_user() handling of *rusage, and we can use
kernel_wait4() directly.

This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net

Acked-by: Luis R. Rodriguez &lt;mcgrof@kernel.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
</content>
</entry>
<entry>
<title>pidns: simpler allocation of pid_* caches</title>
<updated>2018-03-21T14:40:17Z</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2018-03-20T18:51:06Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd206bec9a446884805370b1c16c1d7a97036777'/>
<id>urn:sha1:dd206bec9a446884805370b1c16c1d7a97036777</id>
<content type='text'>
Those pid_* caches are created on demand when a process advances to the new
level of pid namespace. Which means pointers are stable, write only and
thus can be packed into an array instead of spreading them over and using
lists(!) to find them.

Both first and subsequent clone/unshare(CLONE_NEWPID) become faster.

Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pid: remove pidhash</title>
<updated>2017-11-18T00:10:04Z</updated>
<author>
<name>Gargi Sharma</name>
<email>gs051095@gmail.com</email>
</author>
<published>2017-11-17T23:30:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8cfbc245e24887e3c30235f71e9e9405e0cfc39'/>
<id>urn:sha1:e8cfbc245e24887e3c30235f71e9e9405e0cfc39</id>
<content type='text'>
pidhash is no longer required as all the information can be looked up
from idr tree.  nr_hashed represented the number of pids that had been
hashed.  Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant,
it has been renamed to pid_allocated and PIDNS_ADDING respectively.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma &lt;gs051095@gmail.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;	[ia64]
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid: replace pid bitmap implementation with IDR API</title>
<updated>2017-11-18T00:10:03Z</updated>
<author>
<name>Gargi Sharma</name>
<email>gs051095@gmail.com</email>
</author>
<published>2017-11-17T23:30:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95846ecf9dac5089aed4b144d912225f8ef86ae4'/>
<id>urn:sha1:95846ecf9dac5089aed4b144d912225f8ef86ae4</id>
<content type='text'>
Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces kernel bitmap implementation of PID allocation with
IDR API.  These patches are written to simplify the kernel by replacing
custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement.  There is a noteworthy change between
the IDR and bitmap implementation.

Before
   text       data        bss        dec        hex    filename
   8447       3894         64      12405       3075    kernel/pid.o
After
   text       data        bss        dec        hex    filename
   3397        304          0       3701        e75    kernel/pid.o

Before
   text       data        bss        dec        hex    filename
   5692       1842        192       7726       1e2e    kernel/pid_namespace.o
After
   text       data        bss        dec        hex    filename
   2854        216         16       3086        c0e    kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc.  are removed.  The rest of the functions are
modified to use the IDR API.  The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
  Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma &lt;gs051095@gmail.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Julia Lawall &lt;julia.lawall@lip6.fr&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Pavel Tatashin &lt;pasha.tatashin@oracle.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Christoph Hellwig &lt;hch@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>userns,pidns: Verify the userns for new pid namespaces</title>
<updated>2017-07-20T12:43:58Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-04-29T19:12:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a2b426267c56773201f968fdb5eda6ab9ae94e34'/>
<id>urn:sha1:a2b426267c56773201f968fdb5eda6ab9ae94e34</id>
<content type='text'>
It is pointless and confusing to allow a pid namespace hierarchy and
the user namespace hierarchy to get out of sync.  The owner of a child
pid namespace should be the owner of the parent pid namespace or
a descendant of the owner of the parent pid namespace.

Otherwise it is possible to construct scenarios where a process has a
capability over a parent pid namespace but does not have the
capability over a child pid namespace.  Which confusingly makes
permission checks non-transitive.

It requires use of setns into a pid namespace (but not into a user
namespace) to create such a scenario.

Add the function in_userns to help in making this determination.

v2: Optimized in_userns by using level as suggested
    by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;

Ref: 49f4d8b93ccf ("pidns: Capture the user namespace and filter ns_last_pid")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes</title>
<updated>2017-05-13T22:26:01Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2017-05-11T23:21:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9a985db98961ae1ba0be169f19df1c567e4ffe0'/>
<id>urn:sha1:b9a985db98961ae1ba0be169f19df1c567e4ffe0</id>
<content type='text'>
The code can potentially sleep for an indefinite amount of time in
zap_pid_ns_processes triggering the hung task timeout, and increasing
the system average.  This is undesirable.  Sleep with a task state of
TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these
undesirable side effects.

Apparently under heavy load this has been allowing Chrome to trigger
the hung time task timeout error and cause ChromeOS to reboot.

Reported-by: Vovo Yang &lt;vovoy@google.com&gt;
Reported-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Tested-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Fixes: 6347e9009104 ("pidns: guarantee that the pidns init will be the last pidns process reaped")
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>pidns: expose task pid_ns_for_children to userspace</title>
<updated>2017-05-09T00:15:12Z</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2017-05-08T22:56:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=eaa0d190bfe1ed891b814a52712dcd852554cb08'/>
<id>urn:sha1:eaa0d190bfe1ed891b814a52712dcd852554cb08</id>
<content type='text'>
pid_ns_for_children set by a task is known only to the task itself, and
it's impossible to identify it from outside.

It's a big problem for checkpoint/restore software like CRIU, because it
can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of
their work.

This patch solves the problem, and it exposes pid_ns_for_children to ns
directory in standard way with the name "pid_for_children":

  ~# ls /proc/5531/ns -l | grep pid
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -&gt; pid:[4026531836]
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -&gt; pid:[4026532286]

Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Cc: Andrei Vagin &lt;avagin@virtuozzo.com&gt;
Cc: Andreas Gruenbacher &lt;agruenba@redhat.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Paul Moore &lt;paul@paul-moore.com&gt;
Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Serge Hallyn &lt;serge@hallyn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
