user/sven/linux.git/kernel/seccomp.c, branch v6.16.2

seccomp: avoid the lock trip seccomp_filter_release in common case

2025-02-24T19:17:10Z

Vast majority of threads don't have any seccomp filters, all while the lock taken here is shared between all threads in given process and frequently used. Safety of the check relies on the following: - seccomp_filter_release is only legally called for PF_EXITING threads - SIGNAL_GROUP_EXIT is only ever set with the sighand lock held - PF_EXITING is only ever set with the sighand lock held *or* after SIGNAL_GROUP_EXIT is set *or* the process is single-threaded - seccomp_sync_threads holds the sighand lock and skips all threads if SIGNAL_GROUP_EXIT is set, PF_EXITING threads if not Resulting reduction of contention gives me a 5% boost in a microbenchmark spawning and killing threads within the same process. Signed-off-by: Mateusz Guzik Link: https://lore.kernel.org/r/20250213170911.1140187-1-mjguzik@gmail.com Signed-off-by: Kees Cook

seccomp: remove the 'sd' argument from __seccomp_filter()

2025-02-10T17:26:22Z

After the previous change 'sd' is always NULL. Signed-off-by: Oleg Nesterov Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/20250128150321.GA15343@redhat.com Signed-off-by: Kees Cook

seccomp: remove the 'sd' argument from __secure_computing()

2025-02-10T17:26:22Z

After the previous changes 'sd' is always NULL. Signed-off-by: Oleg Nesterov Reviewed-by: Kees Cook Link: https://lore.kernel.org/r/20250128150313.GA15336@redhat.com Signed-off-by: Kees Cook

seccomp: fix the __secure_computing() stub for !HAVE_ARCH_SECCOMP_FILTER

2025-02-10T17:26:22Z

Depending on CONFIG_HAVE_ARCH_SECCOMP_FILTER, __secure_computing(NULL) will crash or not. This is not consistent/safe, especially considering that after the previous change __secure_computing(sd) is always called with sd == NULL. Fortunately, if CONFIG_HAVE_ARCH_SECCOMP_FILTER=n, __secure_computing() has no callers, these architectures use secure_computing_strict(). Yet it make sense make __secure_computing(NULL) safe in this case. Note also that with this change we can unexport secure_computing_strict() and change the current callers to use __secure_computing(NULL). Fixes: 8cf8dfceebda ("seccomp: Stub for !HAVE_ARCH_SECCOMP_FILTER") Signed-off-by: Oleg Nesterov Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20250128150307.GA15325@redhat.com Signed-off-by: Kees Cook

seccomp: passthrough uretprobe systemcall without filtering

2025-02-06T20:48:21Z

When attaching uretprobes to processes running inside docker, the attached process is segfaulted when encountering the retprobe. The reason is that now that uretprobe is a system call the default seccomp filters in docker block it as they only allow a specific set of known syscalls. This is true for other userspace applications which use seccomp to control their syscall surface. Since uretprobe is a "kernel implementation detail" system call which is not used by userspace application code directly, it is impractical and there's very little point in forcing all userspace applications to explicitly allow it in order to avoid crashing tracked processes. Pass this systemcall through seccomp without depending on configuration. Note: uretprobe is currently only x86_64 and isn't expected to ever be supported in i386. Fixes: ff474a78cef5 ("uprobe: Add uretprobe syscall to speed up return probe") Reported-by: Rafael Buchbinder Closes: https://lore.kernel.org/lkml/CAHsH6Gs3Eh8DFU0wq58c_LF8A4_+o6z456J7BidmcVY2AqOnHQ@mail.gmail.com/ Link: https://lore.kernel.org/lkml/20250121182939.33d05470@gandalf.local.home/T/#me2676c378eff2d6a33f3054fed4a5f3afa64e65b Link: https://lore.kernel.org/lkml/20250128145806.1849977-1-eyal.birger@gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Eyal Birger Link: https://lore.kernel.org/r/20250202162921.335813-2-eyal.birger@gmail.com [kees: minimized changes for easier backporting, tweaked commit log] Signed-off-by: Kees Cook

treewide: const qualify ctl_tables where applicable

2025-01-28T12:48:37Z

Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function. Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers. Created this by running an spatch followed by a sed command: Spatch: virtual patch @ depends on !(file in "net") disable optional_qualifier @ identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@ + const struct ctl_table table_name [] = { ... }; sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c Reviewed-by: Song Liu Acked-by: Steven Rostedt (Google) # for kernel/trace/ Reviewed-by: Martin K. Petersen # SCSI Reviewed-by: Darrick J. Wong # xfs Acked-by: Jani Nikula Acked-by: Corey Minyard Acked-by: Wei Liu Acked-by: Thomas Gleixner Reviewed-by: Bill O'Donnell Acked-by: Baoquan He Acked-by: Ashutosh Dixit Acked-by: Anna Schumaker Signed-off-by: Joel Granados

sysctl: treewide: constify the ctl_table argument of proc_handlers

2024-07-24T18:59:29Z

const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh Signed-off-by: Thomas Weißschuh Co-developed-by: Joel Granados Signed-off-by: Joel Granados

seccomp: release task filters when the task exits

2024-06-28T16:37:11Z

Previously, seccomp filters were released in release_task(), which required the process to exit and its zombie to be collected. However, exited threads/processes can't trigger any seccomp events, making it more logical to release filters upon task exits. This adjustment simplifies scenarios where a parent is tracing its child process. The parent process can now handle all events from a seccomp listening descriptor and then call wait to collect a child zombie. seccomp_filter_release takes the siglock to avoid races with seccomp_sync_threads. There was an idea to bypass taking the lock by checking PF_EXITING, but it can be set without holding siglock if threads have SIGNAL_GROUP_EXIT. This means it can happen concurently with seccomp_filter_release. This change also fixes another minor problem. Suppose that a group leader installs the new filter without SECCOMP_FILTER_FLAG_TSYNC, exits, and becomes a zombie. Without this change, SECCOMP_FILTER_FLAG_TSYNC from any other thread can never succeed, seccomp_can_sync_threads() will check a zombie leader and is_ancestor() will fail. Reviewed-by: Oleg Nesterov Signed-off-by: Andrei Vagin Link: https://lore.kernel.org/r/20240628021014.231976-3-avagin@google.com Reviewed-by: Tycho Andersen Signed-off-by: Kees Cook

seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited

2024-06-28T16:37:11Z

SECCOMP_IOCTL_NOTIF_RECV promptly returns when a seccomp filter becomes unused, as a filter without users can't trigger any events. Previously, event listeners had to rely on epoll to detect when all processes had exited. The change is based on the 'commit 99cdb8b9a573 ("seccomp: notify about unused filter")' which implemented (E)POLLHUP notifications. Reviewed-by: Christian Brauner Signed-off-by: Andrei Vagin Reviewed-by: Oleg Nesterov Link: https://lore.kernel.org/r/20240628021014.231976-2-avagin@google.com Reviewed-by: Tycho Andersen Signed-off-by: Kees Cook

Merge tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

2024-05-18T00:31:24Z

Pull sysctl updates from Joel Granados: - Remove sentinel elements from ctl_table structs in kernel/* Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. Removals for net/, io_uring/, mm/, ipc/ and security/ are set to go into mainline through their respective subsystems making the next release the most likely place where the final series that removes the check for proc_name == NULL will land. This adds to removals already in arch/, drivers/ and fs/. - Adjust ctl_table definitions and references to allow constification - Remove unused ctl_table function arguments - Move non-const elements from ctl_table to ctl_table_header - Make ctl_table pointers const in ctl_table_root structure Making the static ctl_table structs const will increase safety by keeping the pointers to proc_handler functions in .rodata. Though no ctl_tables where made const in this PR, the ground work for making that possible has started with these changes sent by Thomas Weißschuh. * tag 'sysctl-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: drop now unnecessary out-of-bounds check sysctl: move sysctl type to ctl_table_header sysctl: drop sysctl_is_perm_empty_ctl_table sysctl: treewide: constify argument ctl_table_root::permissions(table) sysctl: treewide: drop unused argument ctl_table_root::set_ownership(table) bpf: Remove the now superfluous sentinel elements from ctl_table array delayacct: Remove the now superfluous sentinel elements from ctl_table array kprobes: Remove the now superfluous sentinel elements from ctl_table array printk: Remove the now superfluous sentinel elements from ctl_table array scheduler: Remove the now superfluous sentinel elements from ctl_table array seccomp: Remove the now superfluous sentinel elements from ctl_table array timekeeping: Remove the now superfluous sentinel elements from ctl_table array ftrace: Remove the now superfluous sentinel elements from ctl_table array umh: Remove the now superfluous sentinel elements from ctl_table array kernel misc: Remove the now superfluous sentinel elements from ctl_table array