user/sven/linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
44 hours	Merge tag 'mm-stable-2026-04-13-21-45' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel\|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
4 days	Merge tag 'selinux-pr-20260410' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux update from Paul Moore: - Annotate a known race condition to soothe KCSAN * tag 'selinux-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: annotate intentional data race in inode_doinit_with_dentry()
4 days	Merge tag 'lsm-pr-20260410' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: "We only have five patches in the LSM tree, but three of the five are for an important bugfix relating to overlayfs and the mmap() and mprotect() access controls for LSMs. Highlights below: - Fix problems with the mmap() and mprotect() LSM hooks on overlayfs As we are dealing with problems both in mmap() and mprotect() there are essentially two components to this fix, spread across three patches with all marked for stable. The simplest portion of the fix is the creation of a new LSM hook, security_mmap_backing_file(), that is used to enforce LSM mmap() access controls on backing files in the stacked/overlayfs case. The existing security_mmap_file() does not have visibility past the user file. You can see from the associated SELinux hook callback the code is fairly straightforward. The mprotect() fix is a bit more complicated as there is no way in the mprotect() code path to inspect both the user and backing files, and bolting on a second file reference to vm_area_struct wasn't really an option. The solution taken here adds a LSM security blob and associated hooks to the backing_file struct that LSMs can use to capture and store relevant information from the user file. While the necessary SELinux information is relatively small, a single u32, I expect other LSMs to require more than that, and a dedicated backing_file LSM blob provides a storage mechanism without negatively impacting other filesystems. I want to note that other LSMs beyond SELinux have been involved in the discussion of the fixes presented here and they are working on their own related changes using these new hooks, but due to other issues those patches will be coming at a later date. - Use kstrdup_const()/kfree_const() for securityfs symlink targets - Resolve a handful of kernel-doc warnings in cred.h" * tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: selinux: fix overlayfs mmap() and mprotect() access checks lsm: add backing_file LSM hooks fs: prepare for adding LSM blob to backing_file securityfs: use kstrdup_const() to manage symlink targets cred: fix kernel-doc warnings in cred.h
4 days	Merge tag 'vfs-7.1-rc1.kino' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
12 days	mm: convert do_brk_flags() to use vma_flags_t	Lorenzo Stoakes (Oracle)
	In order to be able to do this, we need to change VM_DATA_DEFAULT_FLAGS and friends and update the architecture-specific definitions also. We then have to update some KSM logic to handle VMA flags, and introduce VMA_STACK_FLAGS to define the vma_flags_t equivalent of VM_STACK_FLAGS. We also introduce two helper functions for use during the time we are converting legacy flags to vma_flags_t values - vma_flags_to_legacy() and legacy_to_vma_flags(). This enables us to iteratively make changes to break these changes up into separate parts. We use these explicitly here to keep VM_STACK_FLAGS around for certain users which need to maintain the legacy vm_flags_t values for the time being. We are no longer able to rely on the simple VM_xxx being set to zero if the feature is not enabled, so in the case of VM_DROPPABLE we introduce VMA_DROPPABLE as the vma_flags_t equivalent, which is set to EMPTY_VMA_FLAGS if the droppable flag is not available. While we're here, we make the description of do_brk_flags() into a kdoc comment, as it almost was already. We use vma_flags_to_legacy() to not need to update the vm_get_page_prot() logic as this time. Note that in create_init_stack_vma() we have to replace the BUILD_BUG_ON() with a VM_WARN_ON_ONCE() as the tested values are no longer build time available. We also update mprotect_fixup() to use VMA flags where possible, though we have to live with a little duplication between vm_flags_t and vma_flags_t values for the time being until further conversions are made. While we're here, update VM_SPECIAL to be defined in terms of VMA_SPECIAL_FLAGS now we have vma_flags_to_legacy(). Finally, we update the VMA tests to reflect these changes. Link: https://lkml.kernel.org/r/d02e3e45d9a33d7904b149f5604904089fd640ae.1774034900.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> [SELinux] Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
14 days	selinux: fix overlayfs mmap() and mprotect() access checks	Paul Moore
	The existing SELinux security model for overlayfs is to allow access if the current task is able to access the top level file (the "user" file) and the mounter's credentials are sufficient to access the lower level file (the "backing" file). Unfortunately, the current code does not properly enforce these access controls for both mmap() and mprotect() operations on overlayfs filesystems. This patch makes use of the newly created security_mmap_backing_file() LSM hook to provide the missing backing file enforcement for mmap() operations, and leverages the backing file API and new LSM blob to provide the necessary information to properly enforce the mprotect() access controls. Cc: stable@vger.kernel.org Acked-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-03-06	treewide: change inode->i_ino from unsigned long to u64	Jeff Layton
	On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-06	selinux: Use simple_start_creating() / simple_done_creating()	NeilBrown
	Instead of explicitly locking the parent and performing a lookup in selinux, use simple_start_creating(), and then use simple_done_creating() to unlock. This extends the region that the directory is locked for, and also performs a lookup. The lock extension is of no real consequence. The lookup uses simple_lookup() and so always succeeds. Thus when d_make_persistent() is called the dentry will already be hashed. d_make_persistent() handles this case. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20260224222542.3458677-7-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-23	selinux: annotate intentional data race in inode_doinit_with_dentry()	Christian Göttsche
	Like other `isec->initialized == LABEL_INITIALIZED` checks annotate the check in inode_doinit_with_dentry() with data_race to please KCSAN. The value is rechecked next with lock held. Reported-by: syzbot+9ab96b38b76bec93939a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9ab96b38b76bec93939a Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-02-22	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses	Kees Cook
	Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments	Linus Torvalds
	This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument	Linus Torvalds
	This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21	treewide: Replace kmalloc with kmalloc_obj for non-scalar types	Kees Cook
	This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12	Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-01-20	kernel.h: drop hex.h and update all hex.h users	Randy Dunlap
	Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14	selinux: drop the BUG() in cred_has_capability()	Paul Moore
	With the compile time check located immediately above the cred_has_capability() function ensuring that we will notice if the capability set grows beyond 63 capabilities, we can safely remove the BUG() call from the cred_has_capability(). Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-14	selinux: fix a capabilities parsing typo in selinux_bpf_token_capable()	Paul Moore
	There was a typo, likely a cut-n-paste bug, where we were checking for SECCLASS_CAPABILITY instead of SECCLASS_CAPABILITY2. Fixes: 5473a722f782 ("selinux: add support for BPF token access control") Reported-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-13	selinux: add support for BPF token access control	Eric Suen
	BPF token support was introduced to allow a privileged process to delegate limited BPF functionality—such as map creation and program loading—to an unprivileged process: https://lore.kernel.org/linux-security-module/20231130185229.2688956-1-andrii@kernel.org/ This patch adds SELinux support for controlling BPF token access. With this change, SELinux policies can now enforce constraints on BPF token usage based on both the delegating (privileged) process and the recipient (unprivileged) process. Supported operations currently include: - map_create - prog_load High-level workflow: 1. An unprivileged process creates a VFS context via `fsopen()` and obtains a file descriptor. 2. This descriptor is passed to a privileged process, which configures BPF token delegation options and mounts a BPF filesystem. 3. SELinux records the `creator_sid` of the privileged process during mount setup. 4. The unprivileged process then uses this BPF fs mount to create a token and attach it to subsequent BPF syscalls. 5. During verification of `map_create` and `prog_load`, SELinux uses `creator_sid` and the current SID to check policy permissions via: avc_has_perm(creator_sid, current_sid, SECCLASS_BPF, BPF__MAP_CREATE, NULL); The implementation introduces two new permissions: - map_create_as - prog_load_as At token creation time, SELinux verifies that the current process has the appropriate `*_as` permission (depending on the `allowed_cmds` value in the bpf_token) to act on behalf of the `creator_sid`. Example SELinux policy: allow test_bpf_t self:bpf { map_create map_read map_write prog_load prog_run map_create_as prog_load_as }; Additionally, a new policy capability bpf_token_perms is added to ensure backward compatibility. If disabled, previous behavior ((checks based on current process SID)) is preserved. Signed-off-by: Eric Suen <ericsu@linux.microsoft.com> Tested-by: Daniel Durning <danieldurning.work@gmail.com> Reviewed-by: Daniel Durning <danieldurning.work@gmail.com> [PM: merge fuzz, subject tweaks, whitespace tweaks, line length tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-01-13	selinux: move the selinux_blob_sizes struct	Paul Moore
	Move the selinux_blob_sizes struct so it adjacent to the rest of the SELinux initialization code and not in the middle of the LSM hook callbacks. Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-12-05	Merge tag 'pull-persistency' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull persistent dentry infrastructure and conversion from Al Viro: "Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). A reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: - get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. - instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. This work has really started early in 2024; quite a few preliminary pieces have already gone into mainline. This chunk is finally getting to the meat of that stuff - infrastructure and most of the conversions to it. Some pieces are still sitting in the local branches, but the bulk of that stuff is here" * tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits) d_make_discardable(): warn if given a non-persistent dentry kill securityfs_recursive_remove() convert securityfs get rid of kill_litter_super() convert rust_binderfs convert nfsctl convert rpc_pipefs convert hypfs hypfs: swich hypfs_create_u64() to returning int hypfs: switch hypfs_create_str() to returning int hypfs: don't pin dentries twice convert gadgetfs gadgetfs: switch to simple_remove_by_name() convert functionfs functionfs: switch to simple_remove_by_name() functionfs: fix the open/removal races functionfs: need to cancel ->reset_work in ->kill_sb() functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}() functionfs: don't abuse ffs_data_closed() on fs shutdown convert selinuxfs ...
2025-12-03	Merge tag 'integrity-v6.19' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "Bug fixes: - defer credentials checking from the bprm_check_security hook to the bprm_creds_from_file security hook - properly ignore IMA policy rules based on undefined SELinux labels IMA policy rule extensions: - extend IMA to limit including file hashes in the audit logs (dont_audit action) - define a new filesystem subtype policy option (fs_subtype) Misc: - extend IMA to support in-kernel module decompression by deferring the IMA signature verification in kernel_read_file() to after the kernel module is decompressed" * tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: Handle error code returned by ima_filter_rule_match() ima: Access decompressed kernel module to verify appended signature ima: add fs_subtype condition for distinguishing FUSE instances ima: add dont_audit action to suppress audit actions ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
2025-12-03	Merge tag 'selinux-pr-20251201' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Improve the granularity of SELinux labeling for memfd files Currently when creating a memfd file, SELinux treats it the same as any other tmpfs, or hugetlbfs, file. While simple, the drawback is that it is not possible to differentiate between memfd and tmpfs files. This adds a call to the security_inode_init_security_anon() LSM hook and wires up SELinux to provide a set of memfd specific access controls, including the ability to control the execution of memfds. As usual, the commit message has more information. - Improve the SELinux AVC lookup performance Adopt MurmurHash3 for the SELinux AVC hash function instead of the custom hash function currently used. MurmurHash3 is already used for the SELinux access vector table so the impact to the code is minimal, and performance tests have shown improvements in both hash distribution and latency. See the commit message for the performance measurments. - Introduce a Kconfig option for the SELinux AVC bucket/slot size While we have the ability to grow the number of AVC hash buckets today, the size of the buckets (slot size) is fixed at 512. This pull request makes that slot size configurable at build time through a new Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS. * tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: improve bucket distribution uniformity of avc_hash() selinux: Move avtab_hash() to a shared location for future reuse selinux: Introduce a new config to make avc cache slot size adjustable memfd,selinux: call security_inode_init_security_anon()
2025-12-03	Merge tag 'lsm-pr-20251201' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: - Rework the LSM initialization code What started as a "quick" patch to enable a notification event once all of the individual LSMs were initialized, snowballed a bit into a 30+ patch patchset when everything was done. Most of the patches, and diffstat, is due to splitting out the initialization code into security/lsm_init.c and cleaning up some of the mess that was there. While not strictly necessary, it does cleanup the code signficantly, and hopefully makes the upkeep a bit easier in the future. Aside from the new LSM_STARTED_ALL notification, these changes also ensure that individual LSM initcalls are only called when the LSM is enabled at boot time. There should be a minor reduction in boot times for those who build multiple LSMs into their kernels, but only enable a subset at boot. It is worth mentioning that nothing at present makes use of the LSM_STARTED_ALL notification, but there is work in progress which is dependent upon LSM_STARTED_ALL. - Make better use of the seq_put() helpers in device_cgroup tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits) lsm: use unrcu_pointer() for current->cred in security_init() device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers lsm: add a LSM_STARTED_ALL notification event lsm: consolidate all of the LSM framework initcalls selinux: move initcalls to the LSM framework ima,evm: move initcalls to the LSM framework lockdown: move initcalls to the LSM framework apparmor: move initcalls to the LSM framework safesetid: move initcalls to the LSM framework tomoyo: move initcalls to the LSM framework smack: move initcalls to the LSM framework ipe: move initcalls to the LSM framework loadpin: move initcalls to the LSM framework lsm: introduce an initcall mechanism into the LSM framework lsm: group lsm_order_parse() with the other lsm_order_*() functions lsm: output available LSMs when debugging lsm: cleanup the debug and console output in lsm_init.c lsm: add/tweak function header comment blocks in lsm_init.c lsm: fold lsm_init_ordered() into security_init() lsm: cleanup initialize_lsm() and rename to lsm_init_single() ...
2025-12-01	Merge tag 'vfs-6.19-rc1.directory.locking' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory locking updates from Christian Brauner: "This contains the work to add centralized APIs for directory locking operations. This series is part of a larger effort to change directory operation locking to allow multiple concurrent operations in a directory. The ultimate goal is to lock the target dentry(s) rather than the whole parent directory. To help with changing the locking protocol, this series centralizes locking and lookup in new helper functions. The helpers establish a pattern where it is the dentry that is being locked and unlocked (currently the lock is held on dentry->d_parent->d_inode, but that can change in the future). This also changes vfs_mkdir() to unlock the parent on failure, as well as dput()ing the dentry. This allows end_creating() to only require the target dentry (which may be IS_ERR() after vfs_mkdir()), not the parent" * tag 'vfs-6.19-rc1.directory.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfsd: fix end_creating() conversion VFS: introduce end_creating_keep() VFS: change vfs_mkdir() to unlock on failure. ecryptfs: use new start_creating/start_removing APIs Add start_renaming_two_dentries() VFS/ovl/smb: introduce start_renaming_dentry() VFS/nfsd/ovl: introduce start_renaming() and end_renaming() VFS: add start_creating_killable() and start_removing_killable() VFS: introduce start_removing_dentry() smb/server: use end_removing_noperm for for target of smb2_create_link() VFS: introduce start_creating_noperm() and start_removing_noperm() VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing() VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating() VFS: tidy up do_unlinkat() VFS: introduce start_dirop() and end_dirop() debugfs: rename end_creating() to debugfs_end_creating()
2025-11-20	selinux: rename the cred_security_struct variables to "crsec"	Paul Moore
	Along with the renaming from task_security_struct to cred_security_struct, rename the local variables to "crsec" from "tsec". This both fits with existing conventions and helps distinguish between task and cred related variables. No functional changes. Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20	selinux: move avdcache to per-task security struct	Stephen Smalley
	The avdcache is meant to be per-task; move it to a new task_security_struct that is duplicated per-task. Cc: stable@vger.kernel.org Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: line length fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20	selinux: rename task_security_struct to cred_security_struct	Stephen Smalley
	Before Linux had cred structures, the SELinux task_security_struct was per-task and although the structure was switched to being per-cred long ago, the name was never updated. This change renames it to cred_security_struct to avoid confusion and pave the way for the introduction of an actual per-task security structure for SELinux. No functional change. Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-19	ima: Access decompressed kernel module to verify appended signature	Coiby Xu
	Currently, when in-kernel module decompression (CONFIG_MODULE_DECOMPRESS) is enabled, IMA has no way to verify the appended module signature as it can't decompress the module. Define a new kernel_read_file_id enumerate READING_MODULE_COMPRESSED so IMA can calculate the compressed kernel module data hash on READING_MODULE_COMPRESSED and defer appraising/measuring it until on READING_MODULE when the module has been decompressed. Before enabling in-kernel module decompression, a kernel module in initramfs can still be loaded with ima_policy=secure_boot. So adjust the kernel module rule in secure_boot policy to allow either an IMA signature OR an appended signature i.e. to use "appraise func=MODULE_CHECK appraise_type=imasig\|modsig". Reported-by: Karel Srot <ksrot@redhat.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-11-16	convert selinuxfs	Al Viro
	Tree has invariant part + two subtrees that get replaced upon each policy load. Invariant parts stay for the lifetime of filesystem, these two subdirs - from policy load to policy load (serialized on lock_rename(root, ...)). All object creations are via d_alloc_name()+d_add() inside selinuxfs, all removals are via simple_recursive_removal(). Turn those d_add() into d_make_persistent()+dput() and that's mostly it. Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16	selinuxfs: new helper for attaching files to tree	Al Viro
	allocating dentry after the inode has been set up reduces the amount of boilerplate - "attach this inode under that name and this parent or drop inode in case of failure" simplifies quite a few places. Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16	selinuxfs: don't stash the dentry of /policy_capabilities	Al Viro
	Don't bother to store the dentry of /policy_capabilities - it belongs to invariant part of tree and we only use it to populate that directory, so there's no reason to keep it around afterwards. Same situation as with /avc, /ss, etc. There are two directories that get replaced on policy load - /class and /booleans. These we need to stash (and update the pointers on policy reload); /policy_capabilities is not in the same boat. Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-14	Add start_renaming_two_dentries()	NeilBrown
	A few callers want to lock for a rename and already have both dentries. Also debugfs does want to perform a lookup but doesn't want permission checking, so start_renaming_dentry() cannot be used. This patch introduces start_renaming_two_dentries() which is given both dentries. debugfs performs one lookup itself. As it will only continue with a negative dentry and as those cannot be renamed or unlinked, it is safe to do the lookup before getting the rename locks. overlayfs uses start_renaming_two_dentries() in three places and selinux uses it twice in sel_make_policy_nodes(). In sel_make_policy_nodes() we now lock for rename twice instead of just once so the combined operation is no longer atomic w.r.t the parent directory locks. As selinux_state.policy_mutex is held across the whole operation this does not open up any interesting races. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-13-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-23	selinux: improve bucket distribution uniformity of avc_hash()	Hongru Zhang
	Reuse the already implemented MurmurHash3 algorithm. Under heavy stress testing (on an 8-core system sustaining over 50,000 authentication events per second), sample once per second and take the mean of 1800 samples: 1. Bucket utilization rate and length of longest chain +--------------------------+-----------------------------------------+ \| \| bucket utilization rate / longest chain \| \| +--------------------+--------------------+ \| \| no-patch \| with-patch \| +--------------------------+--------------------+--------------------+ \| 512 nodes, 512 buckets \| 52.5%/7.5 \| 60.2%/5.7 \| +--------------------------+--------------------+--------------------+ \| 1024 nodes, 512 buckets \| 68.9%/12.1 \| 80.2%/9.7 \| +--------------------------+--------------------+--------------------+ \| 2048 nodes, 512 buckets \| 83.7%/19.4 \| 93.4%/16.3 \| +--------------------------+--------------------+--------------------+ \| 8192 nodes, 8192 buckets \| 49.5%/11.4 \| 60.3%/7.4 \| +--------------------------+--------------------+--------------------+ 2. avc_search_node latency (total latency of hash operation and table lookup) +--------------------------+-----------------------------------------+ \| \| latency of function avc_search_node \| \| +--------------------+--------------------+ \| \| no-patch \| with-patch \| +--------------------------+--------------------+--------------------+ \| 512 nodes, 512 buckets \| 87ns \| 84ns \| +--------------------------+--------------------+--------------------+ \| 1024 nodes, 512 buckets \| 97ns \| 96ns \| +--------------------------+--------------------+--------------------+ \| 2048 nodes, 512 buckets \| 118ns \| 113ns \| +--------------------------+--------------------+--------------------+ \| 8192 nodes, 8192 buckets \| 106ns \| 99ns \| +--------------------------+--------------------+--------------------+ Although MurmurHash3 has higher overhead than the bitwise operations in the original algorithm, the data shows that the MurmurHash3 achieves better distribution, reducing average lookup time. Consequently, the total latency of hashing and table lookup is lower than before. Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com> [PM: whitespace fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23	selinux: Move avtab_hash() to a shared location for future reuse	Hongru Zhang
	This is a preparation patch, no functional change. Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23	selinux: Introduce a new config to make avc cache slot size adjustable	Hongru Zhang
	On mobile device high-load situations, permission check can happen more than 90,000/s (8 core system). With default 512 cache nodes configuration, avc cache miss happens more often and occasionally leads to long time (>2ms) irqs off on both big and little cores, which decreases system real-time capability. An actual call stack is as follows: => avc_compute_av => avc_perm_nonode => avc_has_perm_noaudit => selinux_capable => security_capable => capable => __sched_setscheduler => do_sched_setscheduler => __arm64_sys_sched_setscheduler => invoke_syscall => el0_svc_common => do_el0_svc => el0_svc => el0t_64_sync_handler => el0t_64_sync Although we can expand avc nodes through /sys/fs/selinux/cache_threshold to mitigate long time irqs off, hash conflicts make the bucket average length longer because of the fixed size of cache slots, leading to avc_search_node() latency increase. So introduce a new config to make avc cache slot size also configurable, and with fine tuning, we can mitigate long time irqs off with slightly avc_search_node() performance regression. Theoretically, the main overhead is memory consumption. Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22	memfd,selinux: call security_inode_init_security_anon()	Thiébaud Weksteen
	Prior to this change, no security hooks were called at the creation of a memfd file. It means that, for SELinux as an example, it will receive the default type of the filesystem that backs the in-memory inode. In most cases, that would be tmpfs, but if MFD_HUGETLB is passed, it will be hugetlbfs. Both can be considered implementation details of memfd. It also means that it is not possible to differentiate between a file coming from memfd_create and a file coming from a standard tmpfs mount point. Additionally, no permission is validated at creation, which differs from the similar memfd_secret syscall. Call security_inode_init_security_anon during creation. This ensures that the file is setup similarly to other anonymous inodes. On SELinux, it means that the file will receive the security context of its task. The ability to limit fexecve on memfd has been of interest to avoid potential pitfalls where /proc/self/exe or similar would be executed [1][2]. Reuse the "execute_no_trans" and "entrypoint" access vectors, similarly to the file class. These access vectors may not make sense for the existing "anon_inode" class. Therefore, define and assign a new class "memfd_file" to support such access vectors. Guard these changes behind a new policy capability named "memfd_class". [1] https://crbug.com/1305267 [2] https://lore.kernel.org/lkml/20221215001205.51969-1-jeffxu@google.com/ Signed-off-by: Thiébaud Weksteen <tweek@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> [PM: subj tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22	selinux: move initcalls to the LSM framework	Paul Moore
	SELinux currently has a number of initcalls so we've created a new function, selinux_initcall(), which wraps all of these initcalls so that we have a single initcall function that can be registered with the LSM framework. Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22	lsm: replace the name field with a pointer to the lsm_id struct	Paul Moore
	Reduce the duplication between the lsm_id struct and the DEFINE_LSM() definition by linking the lsm_id struct directly into the individual LSM's DEFINE_LSM() instance. Linking the lsm_id into the LSM definition also allows us to simplify the security_add_hooks() function by removing the code which populates the lsm_idlist[] array and moving it into the normal LSM startup code where the LSM list is parsed and the individual LSMs are enabled, making for a cleaner implementation with less overhead at boot. Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-03	Merge tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull d_name audit update from Al Viro: "Simplifying ->d_name audits, easy part. Turn dentry->d_name into an anon union of const struct qsrt (d_name itself) and a writable alias (__d_name). With constification of some struct qstr * arguments of functions that get &dentry->d_name passed to them, that ends up with all modifications provably done only in fs/dcache.c (and a fairly small part of it). Any new places doing modifications will be easy to find - grep for __d_name will suffice" * tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make it easier to catch those who try to modify ->d_name generic_ci_validate_strict_name(): constify name argument afs_dir_search: constify qstr argument afs_edit_dir_{add,remove}(): constify qstr argument exfat_find(): constify qstr argument security_dentry_init_security(): constify qstr argument
2025-09-30	Merge tag 'lsm-pr-20250926' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Move the management of the LSM BPF security blobs into the framework In order to enable multiple LSMs we need to allocate and free the various security blobs in the LSM framework and not the individual LSMs as they would end up stepping all over each other. - Leverage the lsm_bdev_alloc() helper in lsm_bdev_alloc() Make better use of our existing helper functions to reduce some code duplication. - Update the Rust cred code to use 'sync::aref' Part of a larger effort to move the Rust code over to the 'sync' module. - Make CONFIG_LSM dependent on CONFIG_SECURITY As the CONFIG_LSM Kconfig setting is an ordered list of the LSMs to enable a boot, it obviously doesn't make much sense to enable this when CONFIG_SECURITY is disabled. - Update the LSM and CREDENTIALS sections in MAINTAINERS with Rusty bits Add the Rust helper files to the associated LSM and CREDENTIALS entries int the MAINTAINERS file. We're trying to improve the communication between the two groups and making sure we're all aware of what is going on via cross-posting to the relevant lists is a good way to start. * tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: CONFIG_LSM can depend on CONFIG_SECURITY MAINTAINERS: add the associated Rust helper to the CREDENTIALS section MAINTAINERS: add the associated Rust helper to the LSM section rust,cred: update AlwaysRefCounted import to sync::aref security: use umax() to improve code lsm,selinux: Add LSM blob support for BPF objects lsm: use lsm_blob_alloc() in lsm_bdev_alloc()
2025-09-30	Merge tag 'selinux-pr-20250926' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Support per-file labeling for functionfs Both genfscon and user defined labeling methods are supported. This should help users who want to provide separation between the control endpoint file, "ep0", and other endpoints. - Remove our use of get_zeroed_page() in sel_read_bool() Update sel_read_bool() to use a four byte stack buffer instead of a memory page fetched via get_zeroed_page(), and fix a memory in the process. Needless to say we should have done this a long time ago, but it was in a very old chunk of code that "just worked" and I don't think anyone had taken a real look at it in many years. - Better use of the netdev skb/sock helper functions Convert a sk_to_full_sk(skb->sk) into a skb_to_full_sk(skb) call. - Remove some old, dead, and/or redundant code * tag 'selinux-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: enable per-file labeling for functionfs selinux: fix sel_read_bool() allocation and error handling selinux: Remove redundant __GFP_NOWARN selinux: use a consistent method to get full socket from skb selinux: Remove unused function selinux_policycap_netif_wildcard()
2025-09-30	Merge tag 'audit-pr-20250926' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: - Proper audit support for multiple LSMs As the audit subsystem predated the work to enable multiple LSMs, some additional work was needed to support logging the different LSM labels for the subjects/tasks and objects on the system. Casey's patches add new auxillary records for subjects and objects that convey the additional labels. - Ensure fanotify audit events are always generated Generally speaking security relevant subsystems always generate audit events, unless explicitly ignored. However, up to this point fanotify events had been ignored by default, but starting with this pull request fanotify follows convention and generates audit events by default. - Replace an instance of strcpy() with strscpy() - Minor indentation, style, and comment fixes * tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix skb leak when audit rate limit is exceeded audit: init ab->skb_list earlier in audit_buffer_alloc() audit: add record for multiple object contexts audit: add record for multiple task security contexts lsm: security_lsmblob_to_secctx module selection audit: create audit_stamp structure audit: add a missing tab audit: record fanotify event regardless of presence of rules audit: fix typo in auditfilter.c comment audit: Replace deprecated strcpy() with strscpy() audit: fix indentation in audit_log_exit()
2025-09-15	security_dentry_init_security(): constify qstr argument	Al Viro
	Nothing outside of fs/dcache.c has any business modifying dentry names; passing &dentry->d_name as an argument should have that argument declared as a const pointer. Acked-by: Casey Schaufler <casey@schaufler-ca.com> # smack part Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-07	selinux: enable per-file labeling for functionfs	Neill Kapron
	This patch adds support for genfscon per-file labeling of functionfs files as well as support for userspace to apply labels after new functionfs endpoints are created. This allows for separate labels and therefore access control on a per-endpoint basis. An example use case would be for the default endpoint EP0 used as a restricted control endpoint, and additional usb endpoints to be used by other more permissive domains. It should be noted that if there are multiple functionfs mounts on a system, genfs file labels will apply to all mounts, and therefore will not likely be as useful as the userspace relabeling portion of this patch - the addition to selinux_is_genfs_special_handling(). This patch introduces the functionfs_seclabel policycap to maintain existing functionfs genfscon behavior unless explicitly enabled. Signed-off-by: Neill Kapron <nkapron@google.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: trim changelog, apply boolean logic fixup] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-03	selinux: fix sel_read_bool() allocation and error handling	Stephen Smalley
	Switch sel_read_bool() from using get_zeroed_page() and free_page() to a stack-allocated buffer. This also fixes a memory leak in the error path when security_get_bool_value() returns an error. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-01	copy_process: pass clone_flags as u64 across calltree	Simon Schuster
	With the introduction of clone3 in commit 7f192e3cd316 ("fork: add clone3") the effective bit width of clone_flags on all architectures was increased from 32-bit to 64-bit, with a new type of u64 for the flags. However, for most consumers of clone_flags the interface was not changed from the previous type of unsigned long. While this works fine as long as none of the new 64-bit flag bits (CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still undesirable in terms of the principle of least surprise. Thus, this commit fixes all relevant interfaces of callees to sys_clone3/copy_process (excluding the architecture-specific copy_thread) to consistently pass clone_flags as u64, so that no truncation to 32-bit integers occurs on 32-bit architectures. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-08-30	audit: add record for multiple object contexts	Casey Schaufler
	Create a new audit record AUDIT_MAC_OBJ_CONTEXTS. An example of the MAC_OBJ_CONTEXTS record is: type=MAC_OBJ_CONTEXTS msg=audit(1601152467.009:1050): obj_selinux=unconfined_u:object_r:user_home_t:s0 When an audit event includes a AUDIT_MAC_OBJ_CONTEXTS record the "obj=" field in other records in the event will be "obj=?". An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on an object security context. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30	audit: add record for multiple task security contexts	Casey Schaufler
	Replace the single skb pointer in an audit_buffer with a list of skb pointers. Add the audit_stamp information to the audit_buffer as there's no guarantee that there will be an audit_context containing the stamp associated with the event. At audit_log_end() time create auxiliary records as have been added to the list. Functions are created to manage the skb list in the audit_buffer. Create a new audit record AUDIT_MAC_TASK_CONTEXTS. An example of the MAC_TASK_CONTEXTS record is: type=MAC_TASK_CONTEXTS msg=audit(1600880931.832:113) subj_apparmor=unconfined subj_smack=_ When an audit event includes a AUDIT_MAC_TASK_CONTEXTS record the "subj=" field in other records in the event will be "subj=?". An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on a subject security context. Refactor audit_log_task_context(), creating a new audit_log_subj_ctx(). This is used in netlabel auditing to provide multiple subject security contexts as necessary. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> [PM: subj tweak, audit example readability indents] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-12	selinux: Remove redundant __GFP_NOWARN	Qianfeng Rong
	Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT \| __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: fixed horizontal spacing / alignment, line wraps] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11	lsm,selinux: Add LSM blob support for BPF objects	Blaise Boscaccy
	This patch introduces LSM blob support for BPF maps, programs, and tokens to enable LSM stacking and multiplexing of LSM modules that govern BPF objects. Additionally, the existing BPF hooks used by SELinux have been updated to utilize the new blob infrastructure, removing the assumption of exclusive ownership of the security pointer. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> [PM: dropped local variable init, style fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>