<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/pid.c, branch v4.1.6</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.6</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.6'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-04-17T13:04:06Z</updated>
<entry>
<title>fork: report pid reservation failure properly</title>
<updated>2015-04-17T13:04:06Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.cz</email>
</author>
<published>2015-04-16T19:47:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=35f71bc0a09a45924bed268d8ccd0d3407bc476f'/>
<id>urn:sha1:35f71bc0a09a45924bed268d8ccd0d3407bc476f</id>
<content type='text'>
copy_process will report any failure in alloc_pid as ENOMEM currently
which is misleading because the pid allocation might fail not only when
the memory is short but also when the pid space is consumed already.

The current man page even mentions this case:

: EAGAIN
:
:       A system-imposed limit on the number of threads was encountered.
:       There are a number of limits that may trigger this error: the
:       RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which
:       limits the number of processes and threads for a real user ID, was
:       reached; the kernel's system-wide limit on the number of processes
:       and threads, /proc/sys/kernel/threads-max, was reached (see
:       proc(5)); or the maximum number of PIDs, /proc/sys/kernel/pid_max,
:       was reached (see proc(5)).

so the current behavior is also incorrect wrt.  documentation.  POSIX man
page also suggest returing EAGAIN when the process count limit is reached.

This patch simply propagates error code from alloc_pid and makes sure we
return -EAGAIN due to reservation failure.  This will make behavior of
fork closer to both our documentation and POSIX.

alloc_pid might alsoo fail when the reaper in the pid namespace is dead
(the namespace basically disallows all new processes) and there is no
good error code which would match documented ones. We have traditionally
returned ENOMEM for this case which is misleading as well but as per
Eric W. Biederman this behavior is documented in man pid_namespaces(7)

: If the "init" process of a PID namespace terminates, the kernel
: terminates all of the processes in the namespace via a SIGKILL signal.
: This behavior reflects the fact that the "init" process is essential for
: the correct operation of a PID namespace.  In this case, a subsequent
: fork(2) into this PID namespace will fail with the error ENOMEM; it is
: not possible to create a new processes in a PID namespace whose "init"
: process has terminated.

and introducing a new error code would be too risky so let's stick to
ENOMEM for this case.

Signed-off-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2014-12-16T23:53:03Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-12-16T23:53:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=603ba7e41bf5d405aba22294af5d075d8898176d'/>
<id>urn:sha1:603ba7e41bf5d405aba22294af5d075d8898176d</id>
<content type='text'>
Pull vfs pile #2 from Al Viro:
 "Next pile (and there'll be one or two more).

  The large piece in this one is getting rid of /proc/*/ns/* weirdness;
  among other things, it allows to (finally) make nameidata completely
  opaque outside of fs/namei.c, making for easier further cleanups in
  there"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  coda_venus_readdir(): use file_inode()
  fs/namei.c: fold link_path_walk() call into path_init()
  path_init(): don't bother with LOOKUP_PARENT in argument
  fs/namei.c: new helper (path_cleanup())
  path_init(): store the "base" pointer to file in nameidata itself
  make default -&gt;i_fop have -&gt;open() fail with ENXIO
  make nameidata completely opaque outside of fs/namei.c
  kill proc_ns completely
  take the targets of /proc/*/ns/* symlinks to separate fs
  bury struct proc_ns in fs/proc
  copy address of proc_ns_ops into ns_common
  new helpers: ns_alloc_inum/ns_free_inum
  make proc_ns_operations work with struct ns_common * instead of void *
  switch the rest of proc_ns_operations to working with &amp;...-&gt;ns
  netns: switch -&gt;get()/-&gt;put()/-&gt;install()/-&gt;inum() to working with &amp;net-&gt;ns
  make mntns -&gt;get()/-&gt;put()/-&gt;install()/-&gt;inum() work with &amp;mnt_ns-&gt;ns
  common object embedded into various struct ....ns
</content>
</entry>
<entry>
<title>exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting</title>
<updated>2014-12-11T01:41:18Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2014-12-10T23:55:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24c037ebf5723d4d9ab0996433cee4f96c292a4d'/>
<id>urn:sha1:24c037ebf5723d4d9ab0996433cee4f96c292a4d</id>
<content type='text'>
alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Aaron Tomlin &lt;atomlin@redhat.com&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: Sterling Alexander &lt;stalexan@redhat.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>copy address of proc_ns_ops into ns_common</title>
<updated>2014-12-04T19:34:47Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-11-01T06:32:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=33c429405a2c8d9e42afb9fee88a63cfb2de1e98'/>
<id>urn:sha1:33c429405a2c8d9e42afb9fee88a63cfb2de1e98</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>common object embedded into various struct ....ns</title>
<updated>2014-12-04T19:31:00Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2014-11-01T02:56:04Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=435d5f4bb2ccba3b791d9ef61d2590e30b8e806e'/>
<id>urn:sha1:435d5f4bb2ccba3b791d9ef61d2590e30b8e806e</id>
<content type='text'>
for now - just move corresponding -&gt;proc_inum instances over there

Acked-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>pidns: fix free_pid() to handle the first fork failure</title>
<updated>2013-09-30T21:31:03Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-30T20:45:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=314a8ad0f18ac37887896b288939acd8cb17e208'/>
<id>urn:sha1:314a8ad0f18ac37887896b288939acd8cb17e208</id>
<content type='text'>
"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: "Serge E. Hallyn" &lt;serge@hallyn.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup</title>
<updated>2013-08-31T00:30:37Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-08-29T20:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a606488513543312805fab2b93070cefe6a3016c'/>
<id>urn:sha1:a606488513543312805fab2b93070cefe6a3016c</id>
<content type='text'>
Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt; writes:

&gt; Since commit af4b8a83add95ef40716401395b44a1b579965f4 it's been
&gt; possible to get into a situation where a pidns reaper is
&gt; &lt;defunct&gt;, reparented to host pid 1, but never reaped.  How to
&gt; reproduce this is documented at
&gt;
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
&gt; (and see
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
&gt; In short, run repeated starts of a container whose init is
&gt;
&gt; Process.exit(0);
&gt;
&gt; sysrq-t when such a task is playing zombie shows:
&gt;
&gt; [  131.132978] init            x ffff88011fc14580     0  2084   2039 0x00000000
&gt; [  131.132978]  ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
&gt; [  131.132978]  ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
&gt; [  131.132978]  ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
&gt; [  131.132978] Call Trace:
&gt; [  131.132978]  [&lt;ffffffff816f6159&gt;] schedule+0x29/0x70
&gt; [  131.132978]  [&lt;ffffffff81064591&gt;] do_exit+0x6e1/0xa40
&gt; [  131.132978]  [&lt;ffffffff81071eae&gt;] ? signal_wake_up_state+0x1e/0x30
&gt; [  131.132978]  [&lt;ffffffff8106496f&gt;] do_group_exit+0x3f/0xa0
&gt; [  131.132978]  [&lt;ffffffff810649e4&gt;] SyS_exit_group+0x14/0x20
&gt; [  131.132978]  [&lt;ffffffff8170102f&gt;] tracesys+0xe1/0xe6
&gt;
&gt; Further debugging showed that every time this happened, zap_pid_ns_processes()
&gt; started with nr_hashed being 3, while we were expecting it to drop to 2.
&gt; Any time it didn't happen, nr_hashed was 1 or 2.  So the reaper was
&gt; waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
&gt; if nr_hashed hits 1.

The issue is that when the task group leader of an init process exits
before other tasks of the init process when the init process finally
exits it will be a secondary task sleeping in zap_pid_ns_processes and
waiting to wake up when the number of hashed pids drops to two.  This
case waits forever as free_pid only sends a wake up when the number of
hashed pids drops to 1.

To correct this the simple strategy of sending a possibly unncessary
wake up when the number of hashed pids drops to 2 is adopted.

Sending one extraneous wake up is relatively harmless, at worst we
waste a little cpu time in the rare case when a pid namespace
appropaches exiting.

We can detect the case when the pid namespace drops to just two pids
hashed race free in free_pid.

Dereferencing pid_ns-&gt;child_reaper with the pidmap_lock held is safe
without out the tasklist_lock because it is guaranteed that the
detach_pid will be called on the child_reaper before it is freed and
detach_pid calls __change_pid which calls free_pid which takes the
pidmap_lock.  __change_pid only calls free_pid if this is the
last use of the pid.  For a thread that is not the thread group leader
the threads pid will only ever have one user because a threads pid
is not allowed to be the pid of a process, of a process group or
a session.  For a thread that is a thread group leader all of
the other threads of that process will be reaped before it is allowed
for the thread group leader to be reaped ensuring there will only
be one user of the threads pid as a process pid.  Furthermore
because the thread is the init process of a pid namespace all of the
other processes in the pid namespace will have also been already freed
leading to the fact that the pid will not be used as a session pid or
a process group pid for any other running process.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Reported-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
</entry>
<entry>
<title>kernel/pid.c: move statement</title>
<updated>2013-07-03T23:08:05Z</updated>
<author>
<name>Raphael S. Carvalho</name>
<email>raphael.scarv@gmail.com</email>
</author>
<published>2013-07-03T22:09:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8f75af44eed0c81f818b5b345023cebfe5209400'/>
<id>urn:sha1:8f75af44eed0c81f818b5b345023cebfe5209400</id>
<content type='text'>
Move statement to static initilization of init_pid_ns.

Signed-off-by: Raphael S. Carvalho &lt;raphael.scarv@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists</title>
<updated>2013-07-03T23:08:03Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-07-03T22:08:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8190773985141f063e1d6dc10200527c655abfb5'/>
<id>urn:sha1:8190773985141f063e1d6dc10200527c655abfb5</id>
<content type='text'>
copy_process() adds the new child to thread_group/init_task.tasks list and
then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
next_thread() or next_task() can see this thread with the wrong pid.  Say,
"ls /proc/pid/task" can list the same inode twice.

We could move attach_pid(child, PIDTYPE_PID) up, but in this case
find_task_by_vpid() can find the new thread before it was fully
initialized.

And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch
copy_process() initializes child-&gt;pids[*].pid first, then calls
attach_pid() to insert the task into the pid-&gt;tasks list.

attach_pid() no longer need the "struct pid*" argument, it is always
called after pid_link-&gt;pid was already set.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Pavel Emelyanov &lt;xemul@parallels.com&gt;
Cc: Sergey Dyasly &lt;dserrg@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2013-05-02T00:51:54Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-05-02T00:51:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=20b4fb485227404329e41ad15588afad3df23050'/>
<id>urn:sha1:20b4fb485227404329e41ad15588afad3df23050</id>
<content type='text'>
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor-&gt;index to label things, not PDE-&gt;name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
</content>
</entry>
</feed>
