user/sven/linux.git/kernel, branch v5.14.18

bpf: Use kvmalloc for map values in syscall

2021-11-02T18:51:00Z

[ Upstream commit f0dce1d9b7c81fc3dc9d0cc0bc7ef9b3eae22584 ] Use kvmalloc/kvfree for temporary value when manipulating a map via syscall. kmalloc might not be sufficient for percpu maps where the value is big (and further multiplied by hundreds of CPUs). Can be reproduced with netcnt test on qemu with "-smp 255". Signed-off-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com Signed-off-by: Sasha Levin

bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()

2021-11-02T18:50:56Z

commit fda7a38714f40b635f5502ec4855602c6b33dad2 upstream. 1. The ufd in generic_map_update_batch() should be read from batch.map_fd; 2. A call to fdget() should be followed by a symmetric call to fdput(). Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops") Signed-off-by: Xu Kuohai Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20211019032934.1210517-1-xukuohai@huawei.com Signed-off-by: Greg Kroah-Hartman

bpf: Fix potential race in tail call compatibility check

2021-11-02T18:50:56Z

commit 54713c85f536048e685258f880bf298a74c3620d upstream. Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) v3: - Put lock and the members it protects into an embedded 'owner' struct (Daniel) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi Signed-off-by: Toke Høiland-Jørgensen Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com Signed-off-by: Greg Kroah-Hartman

cgroup: Fix memory leak caused by missing cgroup_bpf_offline

2021-11-02T18:50:56Z

commit 04f8ef5643bcd8bcde25dfdebef998aea480b2ba upstream. When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running the command as below: $mount -t cgroup -o none,name=foo cgroup cgroup/ $umount cgroup/ unreferenced object 0xc3585c40 (size 64): comm "mount", pid 425, jiffies 4294959825 (age 31.990s) hex dump (first 32 bytes): 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... backtrace: [] cgroup_bpf_inherit+0x44/0x24c [<1f03679c>] cgroup_setup_root+0x174/0x37c [] cgroup1_get_tree+0x2c0/0x4a0 [] vfs_get_tree+0x24/0x108 [] path_mount+0x384/0x988 [] do_mount+0x64/0x9c [<208c9cfe>] sys_mount+0xfc/0x1f4 [<06dd06e0>] ret_fast_syscall+0x0/0x48 [] 0xbeb4daa8 This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data is allocated by the function percpu_ref_init in cgroup_bpf_inherit which is called by cgroup_setup_root when mounting, but not freed along with root_cgrp when umounting. Adding cgroup_bpf_offline which calls percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a cleanup that frees the resources which are allocated by cgroup_bpf_inherit in cgroup_setup_root. And inside cgroup_bpf_offline, cgroup_get() is at the beginning and cgroup_put is at the end of cgroup_bpf_release which is called by cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of cgroup's refcount. Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Signed-off-by: Quanyang Wang Signed-off-by: Alexei Starovoitov Acked-by: Roman Gushchin Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20211018075623.26884-1-quanyang.wang@windriver.com Signed-off-by: Greg Kroah-Hartman

sched/scs: Reset the shadow stack when idle_task_exit

2021-10-27T07:59:53Z

[ Upstream commit 63acd42c0d4942f74710b11c38602fb14dea7320 ] Commit f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") removed the init_idle() call from idle_thread_get(). This was the sole call-path on hotplug that resets the Shadow Call Stack (scs) Stack Pointer (sp). Not resetting the scs-sp leads to scs overflow after enough hotplug cycles. Therefore add an explicit scs_task_reset() to the hotplug code to make sure the scs-sp does get reset on hotplug. Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") Signed-off-by: Woody Lin [peterz: Changelog] Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Valentin Schneider Link: https://lore.kernel.org/r/20211012083521.973587-1-woodylin@google.com Signed-off-by: Sasha Levin

ucounts: Fix signal ucount refcounting

2021-10-27T07:59:45Z

commit 15bc01effefe97757ef02ca09e9d1b927ab22725 upstream. In commit fda31c50292a ("signal: avoid double atomic counter increments for user accounting") Linus made a clever optimization to how rlimits and the struct user_struct. Unfortunately that optimization does not work in the obvious way when moved to nested rlimits. The problem is that the last decrement of the per user namespace per user sigpending counter might also be the last decrement of the sigpending counter in the parent user namespace as well. Which means that simply freeing the leaf ucount in __free_sigqueue is not enough. Maintain the optimization and handle the tricky cases by introducing inc_rlimit_get_ucounts and dec_rlimit_put_ucounts. By moving the entire optimization into functions that perform all of the work it becomes possible to ensure that every level is handled properly. The new function inc_rlimit_get_ucounts returns 0 on failure to increment the ucount. This is different than inc_rlimit_ucounts which increments the ucounts and returns LONG_MAX if the ucount counter has exceeded it's maximum or it wrapped (to indicate the counter needs to decremented). I wish we had a single user to account all pending signals to across all of the threads of a process so this complexity was not necessary Cc: stable@vger.kernel.org Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133 Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133 Reviewed-by: Alexey Gladkov Tested-by: Rune Kleveland Tested-by: Yu Zhao Tested-by: Jordan Glover Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman

ucounts: Proper error handling in set_cred_ucounts

2021-10-27T07:59:45Z

commit 34dc2fd6e6908499b669c7b45320cddf38b332e1 upstream. Instead of leaking the ucounts in new if alloc_ucounts fails, store the result of alloc_ucounts into a temporary variable, which is later assigned to new->ucounts. Cc: stable@vger.kernel.org Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Link: https://lkml.kernel.org/r/87pms2s0v8.fsf_-_@disp2133 Tested-by: Yu Zhao Reviewed-by: Alexey Gladkov Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman

ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds

2021-10-27T07:59:44Z

commit 629715adc62b0ad27ab04d0aa73a71927f886910 upstream. The purpose of inc_rlimit_ucounts and dec_rlimit_ucounts in commit_creds is to change which rlimit counter is used to track a process when the credentials changes. Use the same test for both to guarantee the tracking is correct. Cc: stable@vger.kernel.org Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Link: https://lkml.kernel.org/r/87v91us0w4.fsf_-_@disp2133 Tested-by: Yu Zhao Reviewed-by: Alexey Gladkov Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman

ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring

2021-10-27T07:59:44Z

commit 5ebcbe342b1c12fae44b4f83cbeae1520e09857e upstream. Setting cred->ucounts in cred_alloc_blank does not make sense. The uid and user_ns are deliberately not set in cred_alloc_blank but instead the setting is delayed until key_change_session_keyring. So move dealing with ucounts into key_change_session_keyring as well. Unfortunately that movement of get_ucounts adds a new failure mode to key_change_session_keyring. I do not see anything stopping the parent process from calling setuid and changing the relevant part of it's cred while keyctl_session_to_parent is running making it fundamentally necessary to call get_ucounts in key_change_session_keyring. Which means that the new failure mode cannot be avoided. A failure of key_change_session_keyring results in a single threaded parent keeping it's existing credentials. Which results in the parent process not being able to access the session keyring and whichever keys are in the new keyring. Further get_ucounts is only expected to fail if the number of bits in the refernece count for the structure is too few. Since the code has no other way to report the failure of get_ucounts and because such failures are not expected to be common add a WARN_ONCE to report this problem to userspace. Between the WARN_ONCE and the parent process not having access to the keys in the new session keyring I expect any failure of get_ucounts will be noticed and reported and we can find another way to handle this condition. (Possibly by just making ucounts->count an atomic_long_t). Cc: stable@vger.kernel.org Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133 Tested-by: Yu Zhao Reviewed-by: Alexey Gladkov Signed-off-by: "Eric W. Biederman" Signed-off-by: Greg Kroah-Hartman

audit: fix possible null-pointer dereference in audit_filter_rules

2021-10-27T07:59:44Z

commit 6e3ee990c90494561921c756481d0e2125d8b895 upstream. Fix possible null-pointer dereference in audit_filter_rules. audit_filter_rules() error: we previously assumed 'ctx' could be null Cc: stable@vger.kernel.org Fixes: bf361231c295 ("audit: add saddr_fam filter field") Reported-by: kernel test robot Reported-by: Dan Carpenter Signed-off-by: Gaosheng Cui Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman