<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/pid_namespace.h, branch v3.4.15</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.15</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.15'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-03-29T00:14:36Z</updated>
<entry>
<title>pidns: add reboot_pid_ns() to handle the reboot syscall</title>
<updated>2012-03-29T00:14:36Z</updated>
<author>
<name>Daniel Lezcano</name>
<email>daniel.lezcano@free.fr</email>
</author>
<published>2012-03-28T21:42:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b'/>
<id>urn:sha1:cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b</id>
<content type='text'>
In the case of a child pid namespace, rebooting the system does not really
makes sense.  When the pid namespace is used in conjunction with the other
namespaces in order to create a linux container, the reboot syscall leads
to some problems.

A container can reboot the host.  That can be fixed by dropping the
sys_reboot capability but we are unable to correctly to poweroff/
halt/reboot a container and the container stays stuck at the shutdown time
with the container's init process waiting indefinitively.

After several attempts, no solution from userspace was found to reliabily
handle the shutdown from a container.

This patch propose to make the init process of the child pid namespace to
exit with a signal status set to : SIGINT if the child pid namespace
called "halt/poweroff" and SIGHUP if the child pid namespace called
"reboot".  When the reboot syscall is called and we are not in the initial
pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
"RESTART", and "RESTART2".  Otherwise we return EINVAL.

Returning EINVAL is also an easy way to check if this feature is supported
by the kernel when invoking another 'reboot' option like CAD.

By this way the parent process of the child pid namespace knows if it
rebooted or not and can take the right decision.

Test case:
==========

#include &lt;alloca.h&gt;
#include &lt;stdio.h&gt;
#include &lt;sched.h&gt;
#include &lt;unistd.h&gt;
#include &lt;signal.h&gt;
#include &lt;sys/reboot.h&gt;
#include &lt;sys/types.h&gt;
#include &lt;sys/wait.h&gt;

#include &lt;linux/reboot.h&gt;

static int do_reboot(void *arg)
{
        int *cmd = arg;

        if (reboot(*cmd))
                printf("failed to reboot(%d): %m\n", *cmd);
}

int test_reboot(int cmd, int sig)
{
        long stack_size = 4096;
        void *stack = alloca(stack_size) + stack_size;
        int status;
        pid_t ret;

        ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &amp;cmd);
        if (ret &lt; 0) {
                printf("failed to clone: %m\n");
                return -1;
        }

        if (wait(&amp;status) &lt; 0) {
                printf("unexpected wait error: %m\n");
                return -1;
        }

        if (!WIFSIGNALED(status)) {
                printf("child process exited but was not signaled\n");
                return -1;
        }

        if (WTERMSIG(status) != sig) {
                printf("signal termination is not the one expected\n");
                return -1;
        }

        return 0;
}

int main(int argc, char *argv[])
{
        int status;

        status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
        if (status &lt; 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
        if (status &lt; 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
        if (status &lt; 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
        if (status &lt; 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
        if (status &gt;= 0) {
                printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
                return 1;
        }
        printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");

        return 0;
}

[akpm@linux-foundation.org: tweak and add comments]
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@free.fr&gt;
Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@gmail.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>BUG: headers with BUG/BUG_ON etc. need linux/bug.h</title>
<updated>2012-03-04T22:54:34Z</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2011-11-24T01:12:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=187f1882b5b0748b3c4c22274663fdb372ac0452'/>
<id>urn:sha1:187f1882b5b0748b3c4c22274663fdb372ac0452</id>
<content type='text'>
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define) then
that header really should be including &lt;linux/bug.h&gt; and not just
expecting it to be implicitly present.

We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
<entry>
<title>procfs: add hidepid= and gid= mount options</title>
<updated>2012-01-11T00:30:54Z</updated>
<author>
<name>Vasiliy Kulikov</name>
<email>segooon@gmail.com</email>
</author>
<published>2012-01-10T23:11:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0499680a42141d86417a8fbaa8c8db806bea1201'/>
<id>urn:sha1:0499680a42141d86417a8fbaa8c8db806bea1201</id>
<content type='text'>
Add support for mount options to restrict access to /proc/PID/
directories.  The default backward-compatible "relaxed" behaviour is left
untouched.

The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:

hidepid=0 (default) means the old behavior - anybody may read all
world-readable /proc/PID/* files.

hidepid=1 means users may not access any /proc/&lt;pid&gt;/ directories, but
their own.  Sensitive files like cmdline, sched*, status are now protected
against other users.  As permission checking done in proc_pid_permission()
and files' permissions are left untouched, programs expecting specific
files' modes are not confused.

hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
users.  It doesn't mean that it hides whether a process exists (it can be
learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
and egid.  It compicates intruder's task of gathering info about running
processes, whether some daemon runs with elevated privileges, whether
another user runs some sensitive program, whether other users run any
program at all, etc.

gid=XXX defines a group that will be able to gather all processes' info
(as in hidepid=0 mode).  This group should be used instead of putting
nonroot user in sudoers file or something.  However, untrusted users (like
daemons, etc.) which are not supposed to monitor the tasks in the whole
system should not be added to the group.

hidepid=1 or higher is designed to restrict access to procfs files, which
might reveal some sensitive private information like precise keystrokes
timings:

http://www.openwall.com/lists/oss-security/2011/11/05/3

hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
conky gracefully handle EPERM/ENOENT and behave as if the current user is
the only user running processes.  pstree shows the process subtree which
contains "pstree" process.

Note: the patch doesn't deal with setuid/setgid issues of keeping
preopened descriptors of procfs files (like
https://lkml.org/lkml/2011/2/7/368).  We rely on that the leaked
information like the scheduling counters of setuid apps doesn't threaten
anybody's privacy - only the user started the setuid program may read the
counters.

Signed-off-by: Vasiliy Kulikov &lt;segoon@openwall.com&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Randy Dunlap &lt;rdunlap@xenotime.net&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Greg KH &lt;greg@kroah.com&gt;
Cc: Theodore Tso &lt;tytso@MIT.EDU&gt;
Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid: generalize task_active_pid_ns</title>
<updated>2009-01-08T16:31:12Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2009-01-08T02:08:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=61bce0f1371cfff497fe85594fd39d1a0b15ebe1'/>
<id>urn:sha1:61bce0f1371cfff497fe85594fd39d1a0b15ebe1</id>
<content type='text'>
Currently task_active_pid_ns is not safe to call after a task becomes a
zombie and exit_task_namespaces is called, as nsproxy becomes NULL.  By
reading the pid namespace from the pid of the task we can trivially solve
this problem at the cost of one extra memory read in what should be the
same cacheline as we read the namespace from.

When moving things around I have made task_active_pid_ns out of line
because keeping it in pid_namespace.h would require adding includes of
pid.h and sched.h that I don't think we want.

This change does make task_active_pid_ns unsafe to call during
copy_process until we attach a pid on the task_struct which seems to be a
reasonable trade off.

Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sukadev Bhattiprolu &lt;sukadev@linux.vnet.ibm.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Cc: Bastian Blank &lt;bastian@waldi.eu.org&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Nadia Derbey &lt;Nadia.Derbey@bull.net&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid_ns: kill the now unused task_child_reaper()</title>
<updated>2008-10-16T18:21:48Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@tv-sign.ru</email>
</author>
<published>2008-10-16T05:04:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=25cbe53ef1cb828ae012f3955a5aa18117114439'/>
<id>urn:sha1:25cbe53ef1cb828ae012f3955a5aa18117114439</id>
<content type='text'>
task_child_reaper() has no callers anymore, kill it.

Signed-off-by: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Acked-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Acked-by: Sukadev Bhattiprolu &lt;sukadev@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: add the struct bsd_acct_struct pointer on pid_namespace struct</title>
<updated>2008-07-25T17:53:46Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2008-07-25T08:48:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=20fad13ac66ac001c19220d3d08b4de5b6cca6e1'/>
<id>urn:sha1:20fad13ac66ac001c19220d3d08b4de5b6cca6e1</id>
<content type='text'>
All the bsdacct-related info will be stored in the area, pointer by this
one.

It will be NULL automatically for all new namespaces.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>proper pid{hash,map}_init() prototypes</title>
<updated>2008-07-25T17:53:45Z</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@kernel.org</email>
</author>
<published>2008-07-25T08:48:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ae4eed34be0177a8e003411a84e4ee212adbced'/>
<id>urn:sha1:3ae4eed34be0177a8e003411a84e4ee212adbced</id>
<content type='text'>
This patch adds proper prototypes for pid{hash,map}_init() in
include/linux/pid_namespace.h

Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: make pid-&gt;level and pid_ns-&gt;level unsigned</title>
<updated>2008-04-30T15:29:49Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2008-04-30T07:54:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=caafa4324335aeb11bc233d5f87aca8cce30beba'/>
<id>urn:sha1:caafa4324335aeb11bc233d5f87aca8cce30beba</id>
<content type='text'>
These values represent the nesting level of a namespace and pids living in it,
and it's always non-negative.

Turning this from int to unsigned int saves some space in pid.c (11 bytes on
x86 and 64 on ia64) by letting the compiler optimize the pid_nr_ns a bit.
E.g.  on ia64 this removes the sign extension calls, which compiler adds to
optimize access to pid-&gt;nubers[ns-&gt;level].

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>namespaces: cleanup the code managed with PID_NS option</title>
<updated>2008-02-08T17:22:23Z</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2008-02-08T12:18:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=74bd59bb39eb08b4379e2590c5f160748d83f812'/>
<id>urn:sha1:74bd59bb39eb08b4379e2590c5f160748d83f812</id>
<content type='text'>
Just like with the user namespaces, move the namespace management code into
the separate .c file and mark the (already existing) PID_NS option as "depend
on NAMESPACES"

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Acked-by: Serge Hallyn &lt;serue@us.ibm.com&gt;
Cc: Cedric Le Goater &lt;clg@fr.ibm.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pidns: Place under CONFIG_EXPERIMENTAL</title>
<updated>2007-11-15T02:45:43Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2007-11-15T01:00:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=57d5f66b86079efac5c9a7843cce2a9bcbe58fb8'/>
<id>urn:sha1:57d5f66b86079efac5c9a7843cce2a9bcbe58fb8</id>
<content type='text'>
This is my trivial patch to swat innumerable little bugs with a single
blow.

After some intensive review (my apologies for not having gotten to this
sooner) what we have looks like a good base to build on with the current
pid namespace code but it is not complete, and it is still much to simple
to find issues where the kernel does the wrong thing outside of the initial
pid namespace.

Until the dust settles and we are certain we have the ABI and the
implementation is as correct as humanly possible let's keep process ID
namespaces behind CONFIG_EXPERIMENTAL.

Allowing us the option of fixing any ABI or other bugs we find as long as
they are minor.

Allowing users of the kernel to avoid those bugs simply by ensuring their
kernel does not have support for multiple pid namespaces.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Cc: Cedric Le Goater &lt;clg@fr.ibm.com&gt;
Cc: Adrian Bunk &lt;bunk@kernel.org&gt;
Cc: Jeremy Fitzhardinge &lt;jeremy@goop.org&gt;
Cc: Kir Kolyshkin &lt;kir@swsoft.com&gt;
Cc: Kirill Korotaev &lt;dev@sw.ru&gt;
Cc: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
