user/sven/linux.git/fs/exec.c, branch v3.16.52

exec: Ensure mm->user_ns contains the execed files

2018-01-01T20:52:11Z

commit f84df2a6f268de584a201e8911384a2d244876e3 upstream. When the user namespace support was merged the need to prevent ptrace from revealing the contents of an unreadable executable was overlooked. Correct this oversight by ensuring that the executed file or files are in mm->user_ns, by adjusting mm->user_ns. Use the new function privileged_wrt_inode_uidgid to see if the executable is a member of the user namespace, and as such if having CAP_SYS_PTRACE in the user namespace should allow tracing the executable. If not update mm->user_ns to the parent user namespace until an appropriate parent is found. Reported-by: Jann Horn Fixes: 9e4a36ece652 ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.") Signed-off-by: "Eric W. Biederman" [bwh: Backported to 3.16: - Add #include - Adjust context] Signed-off-by: Ben Hutchings

ptrace: Capture the ptracer's creds not PT_PTRACE_CAP

2018-01-01T20:52:10Z

commit 64b875f7ac8a5d60a4e191479299e931ee949b67 upstream. When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was overlooked. This can result in incorrect behavior when an application like strace traces an exec of a setuid executable. Further PT_PTRACE_CAP does not have enough information for making good security decisions as it does not report which user namespace the capability is in. This has already allowed one mistake through insufficient granulariy. I found this issue when I was testing another corner case of exec and discovered that I could not get strace to set PT_PTRACE_CAP even when running strace as root with a full set of caps. This change fixes the above issue with strace allowing stracing as root a setuid executable without disabling setuid. More fundamentaly this change allows what is allowable at all times, by using the correct information in it's decision. Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12") Signed-off-by: "Eric W. Biederman" [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings

sched: move no_new_privs into new atomic flags

2017-10-12T14:28:22Z

commit 1d4457f99928a968767f6405b4a1f50845aa15fd upstream. Since seccomp transitions between threads requires updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors. Signed-off-by: Kees Cook Reviewed-by: Oleg Nesterov Reviewed-by: Andy Lutomirski [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings

fs/exec.c: account for argv/envp pointers

2017-07-18T17:40:40Z

commit 98da7d08850fb8bdeb395d6368ed15753304aa0c upstream. When limiting the argv/envp strings during exec to 1/4 of the stack limit, the storage of the pointers to the strings was not included. This means that an exec with huge numbers of tiny strings could eat 1/4 of the stack limit in strings and then additional space would be later used by the pointers to the strings. For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721 single-byte strings would consume less than 2MB of stack, the max (8MB / 4) amount allowed, but the pointers to the strings would consume the remaining additional stack space (1677721 * 4 == 6710884). The result (1677721 + 6710884 == 8388605) would exhaust stack space entirely. Controlling this stack exhaustion could result in pathological behavior in setuid binaries (CVE-2017-1000365). [akpm@linux-foundation.org: additional commenting from Kees] Fixes: b6a2fea39318 ("mm: variable length argument support") Link: http://lkml.kernel.org/r/20170622001720.GA32173@beast Signed-off-by: Kees Cook Acked-by: Rik van Riel Acked-by: Michal Hocko Cc: Alexander Viro Cc: Qualys Security Advisory Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [bwh: Backported to 3.16: use ACCESS_ONCE() instead of READ_ONCE()] Signed-off-by: Ben Hutchings

vfs: Commit to never having exectuables on proc and sysfs.

2017-04-04T21:21:56Z

commit 22f6b4d34fcf039c63a94e7670e0da24f8575a5a upstream. Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: "Eric W. Biederman" [bwh: Backported to 3.16: we don't have super_block::s_iflags; use file_system_type::fs_flags instead] Signed-off-by: Ben Hutchings

fs: exec: apply CLOEXEC before changing dumpable task flags

2017-03-16T02:26:38Z

commit 613cc2b6f272c1a8ad33aefa21cad77af23139f7 upstream. If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc//fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Reported-by: Michael Crosby Signed-off-by: Aleksa Sarai Signed-off-by: Al Viro Signed-off-by: Ben Hutchings

parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures

2015-05-28T09:00:03Z

commit d045c77c1a69703143a36169c224429c48b9eecd upstream. On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y, currently parisc and metag only) stack randomization sometimes leads to crashes when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB by default if not defined in arch-specific headers). The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the additional space needed for the stack randomization (as defined by the value of STACK_RND_MASK) was not taken into account yet and as such, when the stack randomization code added a random offset to the stack start, the stack effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK) which then sometimes leads to out-of-stack situations and crashes. This patch fixes it by adding the maximum possible amount of memory (based on STACK_RND_MASK) which theoretically could be added by the stack randomization code to the initial stack size. That way, the user-defined stack size is always guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK). This bug is currently not visible on the metag architecture, because on metag STACK_RND_MASK is defined to 0 which effectively disables stack randomization. The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP" section, so it does not affect other platformws beside those where the stack grows upwards (parisc and metag). Signed-off-by: Helge Deller Cc: linux-parisc@vger.kernel.org Cc: James Hogan Cc: linux-metag@vger.kernel.org Signed-off-by: Luis Henriques

fs: take i_mutex during prepare_binprm for set[ug]id executables

2015-04-28T11:35:37Z

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds [ luis: backported to 3.16: - relaced task_no_new_privs() by current->no_new_privs - replaced READ_ONCE() by ACCESS_ONCE() ] Signed-off-by: Luis Henriques

perf: Differentiate exec() and non-exec() comm events

2014-06-06T05:56:22Z

perf tools like 'perf report' can aggregate samples by comm strings, which generally works. However, there are other potential use-cases. For example, to pair up 'calls' with 'returns' accurately (from branch events like Intel BTS) it is necessary to identify whether the process has exec'd. Although a comm event is generated when an 'exec' happens it is also generated whenever the comm string is changed on a whim (e.g. by prctl PR_SET_NAME). This patch adds a flag to the comm event to differentiate one case from the other. In order to determine whether the kernel supports the new flag, a selection bit named 'exec' is added to struct perf_event_attr. The bit does nothing but will cause perf_event_open() to fail if the bit is set on kernels that do not have it defined. Signed-off-by: Adrian Hunter Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com Cc: Paul Mackerras Cc: Dave Jones Cc: Arnaldo Carvalho de Melo Cc: David Ahern Cc: Jiri Olsa Cc: Alexander Viro Cc: Linus Torvalds Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar

perf: Fix perf_event_comm() vs. exec() assumption

2014-06-06T05:54:02Z

perf_event_comm() assumes that set_task_comm() is only called on exec(), and in particular that its only called on current. Neither are true, as Dave reported a WARN triggered by set_task_comm() being called on !current. Separate the exec() hook from the comm hook. Reported-by: Dave Jones Signed-off-by: Peter Zijlstra Cc: Alexander Viro Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140521153219.GH5226@laptop.programming.kicks-ass.net [ Build fix. ] Signed-off-by: Ingo Molnar