summaryrefslogtreecommitdiff
path: root/arch/arm64/kernel
AgeCommit message (Collapse)Author
2025-12-15arm64/gcs: Flush the GCS locking state on execMark Brown
When we exec a new task we forget to flush the set of locked GCS mode bits. Since we do flush the rest of the state this means that if GCS is locked the new task will be unable to enable GCS, it will be locked as being disabled. Add the expected flush. Fixes: fc84bc5378a8 ("arm64/gcs: Context switch GCS state for EL0") Cc: <stable@vger.kernel.org> # 6.13.x Reported-by: Yury Khrustalev <Yury.Khrustalev@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Tested-by: Yury Khrustalev <yury.khrustalev@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-12-15arm64/efi: Remove unneeded SVE/SME fallback preserve/store handlingArd Biesheuvel
Since commit 7137a203b251 ("arm64/fpsimd: Permit kernel mode NEON with IRQs off"), the only condition under which the fallback path is taken for FP/SIMD preserve/restore across a EFI runtime call is when it is called from hardirq or NMI context. In practice, this only happens when the EFI pstore driver is called to dump the kernel log buffer into a EFI variable under a panic, oops or emergency_restart() condition, and none of these can be expected to result in a return to user space for the task in question. This means that the existing EFI-specific logic for preserving and restoring SVE/SME state is pointless, and can be removed. Instead, kill the task, so that an exceedingly unlikely inadvertent return to user space does not proceed with a corrupted FP/SIMD state. Also, retain the preserve and restore of the base FP/SIMD state, as that might belong to kernel mode use of FP/SIMD. (Note that EFI runtime calls are never invoked reentrantly, even in this case, and so any interrupted kernel mode FP/SIMD usage will be unrelated to EFI) Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-12-05Merge tag 'driver-core-6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "Arch Topology: - Move parse_acpi_topology() from arm64 to common code for reuse in RISC-V CPU: - Expose housekeeping CPUs through /sys/devices/system/cpu/housekeeping - Print a newline (or 0x0A) instead of '(null)' reading /sys/devices/system/cpu/nohz_full when nohz_full= is not set debugfs - Remove (broken) 'no-mount' mode - Remove redundant access mode checks in debugfs_get_tree() and debugfs_create_*() functions Devres: - Remove unused devm_free_percpu() helper - Move devm_alloc_percpu() from device.h to devres.h Firmware Loader: - Replace simple_strtol() with kstrtoint() - Do not call cancel_store() when no upload is in progress kernfs: - Increase struct super_block::maxbytes to MAX_LFS_FILESIZE - Fix a missing unwind path in __kernfs_new_node() Misc: - Increase the name size in struct auxiliary_device_id to 40 characters - Replace system_unbound_wq with system_dfl_wq and add WQ_PERCPU to alloc_workqueue() Platform: - Replace ERR_PTR() with IOMEM_ERR_PTR() in platform ioremap functions Rust: - Auxiliary: - Unregister auxiliary device on parent device unbind - Move parent() to impl Device; implement device context aware parent() for Device<Bound> - Illustrate how to safely obtain a driver's device private data when calling from an auxiliary driver into the parant device driver - DebugFs: - Implement support for binary large objects - Device: - Let probe() return the driver's device private data as pinned initializer, i.e. impl PinInit<Self, Error> - Implement safe accessor for a driver's device private data for Device<Bound> (returned reference can't out-live driver binding and guarantees the correct private data type) - Implement AsBusDevice trait, to be used by class device abstractions to derive the bus device type of the parent device - DMA: - Store raw pointer of allocation as NonNull - Use start_ptr() and start_ptr_mut() to inherit correct mutability of self - FS: - Add file::Offset type alias - I2C: - Add abstractions for I2C device / driver infrastructure - Implement abstractions for manual I2C device registrations - I/O: - Use "kernel vertical" style for imports - Define ResourceSize as resource_size_t - Move ResourceSize to top-level I/O module - Add type alias for phys_addr_t - Implement Rust version of read_poll_timeout_atomic() - PCI: - Use "kernel vertical" style for imports - Move I/O and IRQ infrastructure to separate files - Add support for PCI interrupt vectors - Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to convert an IrqVector bound to specific pci::Device into an IrqRequest bound to the same pci::Device's parent Device - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - PinInit: - Add {pin_}init_scope() to execute code before creating an initializer - Platform: - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - Timekeeping: - Implement abstraction of udelay() - Uaccess: - Implement read_slice_partial() and read_slice_file() for UserSliceReader - Implement write_slice_partial() and write_slice_file() for UserSliceWriter sysfs: - Prepare the constification of struct attribute" * tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (75 commits) rust: pci: fix build failure when CONFIG_PCI_MSI is disabled debugfs: Fix default access mode config check debugfs: Remove broken no-mount mode debugfs: Remove redundant access mode checks driver core: Check drivers_autoprobe for all added devices driver core: WQ_PERCPU added to alloc_workqueue users driver core: replace use of system_unbound_wq with system_dfl_wq tick/nohz: Expose housekeeping CPUs in sysfs tick/nohz: avoid showing '(null)' if nohz_full= not set sysfs/cpu: Use DEVICE_ATTR_RO for nohz_full attribute kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE mod_devicetable: Bump auxiliary_device_id name size sysfs: simplify attribute definition macros samples/kobject: constify 'struct foo_attribute' samples/kobject: add is_visible() callback to attribute group sysfs: attribute_group: enable const variants of is_visible() sysfs: introduce __SYSFS_FUNCTION_ALTERNATIVE() sysfs: transparently handle const pointers in ATTRIBUTE_GROUPS() sysfs: attribute_group: allow registration of const attribute ...
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-02Merge tag 'fpsimd-on-stack-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers: "This is a core arm64 change. However, I was asked to take this because most uses of kernel-mode FPSIMD are in crypto or CRC code. In v6.8, the size of task_struct on arm64 increased by 528 bytes due to the new 'kernel_fpsimd_state' field. This field was added to allow kernel-mode FPSIMD code to be preempted. Unfortunately, 528 bytes is kind of a lot for task_struct. This regression in the task_struct size was noticed and reported. Recover that space by making this state be allocated on the stack at the beginning of each kernel-mode FPSIMD section. To make it easier for all the users of kernel-mode FPSIMD to do that correctly, introduce and use a 'scoped_ksimd' abstraction" * tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits) lib/crypto: arm64: Move remaining algorithms to scoped ksimd API lib/crypto: arm/blake2b: Move to scoped ksimd API arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack arm64/fpu: Enforce task-context only for generic kernel mode FPU net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 arm64/xorblocks: Switch to 'ksimd' scoped guard API crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API crypto/arm64: polyval - Switch to 'ksimd' scoped guard API crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API raid6: Move to more abstract 'ksimd' guard API crypto: aegis128-neon - Move to more abstract 'ksimd' guard API crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API ...
2025-12-02Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/rectrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory managemennt: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-02Merge tag 'kvmarm-6.19' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests
2025-12-02Merge tag 'irq-core-2025-11-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core updates from Thomas Gleixner: "Updates for the interrupt core and treewide cleanups: - Rework of the Per Processor Interrupt (PPI) management on ARM[64] PPI support was built under the assumption that the systems are homogenous so that the same CPU local device types are connected to them. That's unfortunately wishful thinking and created horrible workarounds. This rework provides affinity management for PPIs so that they can be individually configured in the firmware tables and mops up the related drivers all over the place. - Prevent CPUSET/isolation changes to arbitrarily affine interrupt threads to random CPUs, which ignores user or driver settings. - Plug a harmless race in the interrupt affinity proc interface, which allows to see a half updated mask - Adjust the priority of secondary interrupt threads on RT, so that the combination of primary and secondary thread emulates the hardware interrupt plus thread scenario. Having them at the same priority can cause starvation issues in some drivers" * tag 'irq-core-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) genirq: Remove cpumask availability check on kthread affinity setting genirq: Fix interrupt threads affinity vs. cpuset isolated partitions genirq: Prevent early spurious wake-ups of interrupt threads genirq: Use raw_spinlock_irq() in irq_set_affinity_notifier() genirq/manage: Reduce priority of forced secondary interrupt handler genirq/proc: Fix race in show_irq_affinity() genirq: Fix percpu_devid irq affinity documentation perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer irqdomain: Kill of_node_to_fwnode() helper genirq: Kill irq_{g,s}et_percpu_devid_partition() irqchip: Kill irq-partition-percpu irqchip/apple-aic: Drop support for custom PMU irq partitions irqchip/gic-v3: Drop support for custom PPI partitions coresight: trbe: Request specific affinities for per CPU interrupts perf: arm_spe_pmu: Request specific affinities for per CPU interrupts perf: arm_pmu: Request specific affinities for per CPU NMIs/interrupts genirq: Add request_percpu_irq_affinity() helper genirq: Allow per-cpu interrupt sharing for non-overlapping affinities genirq: Update request_percpu_nmi() to take an affinity genirq: Add affinity to percpu_devid interrupt requests ...
2025-12-02Merge tag 'core-rseq-2025-11-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull rseq updates from Thomas Gleixner: "A large overhaul of the restartable sequences and CID management: The recent enablement of RSEQ in glibc resulted in regressions which are caused by the related overhead. It turned out that the decision to invoke the exit to user work was not really a decision. More or less each context switch caused that. There is a long list of small issues which sums up nicely and results in a 3-4% regression in I/O benchmarks. The other detail which caused issues due to extra work in context switch and task migration is the CID (memory context ID) management. It also requires to use a task work to consolidate the CID space, which is executed in the context of an arbitrary task and results in sporadic uncontrolled exit latencies. The rewrite addresses this by: - Removing deprecated and long unsupported functionality - Moving the related data into dedicated data structures which are optimized for fast path processing. - Caching values so actual decisions can be made - Replacing the current implementation with a optimized inlined variant. - Separating fast and slow path for architectures which use the generic entry code, so that only fault and error handling goes into the TIF_NOTIFY_RESUME handler. - Rewriting the CID management so that it becomes mostly invisible in the context switch path. That moves the work of switching modes into the fork/exit path, which is a reasonable tradeoff. That work is only required when a process creates more threads than the cpuset it is allowed to run on or when enough threads exit after that. An artificial thread pool benchmarks which triggers this did not degrade, it actually improved significantly. The main effect in migration heavy scenarios is that runqueue lock held time and therefore contention goes down significantly" * tag 'core-rseq-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) sched/mmcid: Switch over to the new mechanism sched/mmcid: Implement deferred mode change irqwork: Move data struct to a types header sched/mmcid: Provide CID ownership mode fixup functions sched/mmcid: Provide new scheduler CID mechanism sched/mmcid: Introduce per task/CPU ownership infrastructure sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex sched/mmcid: Provide precomputed maximal value sched/mmcid: Move initialization out of line signal: Move MMCID exit out of sighand lock sched/mmcid: Convert mm CID mask to a bitmap cpumask: Cache num_possible_cpus() sched/mmcid: Use cpumask_weighted_or() cpumask: Introduce cpumask_weighted_or() sched/mmcid: Prevent pointless work in mm_update_cpus_allowed() sched/mmcid: Move scheduler code out of global header sched: Fixup whitespace damage sched/mmcid: Cacheline align MM CID storage sched/mmcid: Use proper data structures sched/mmcid: Revert the complex CID management ...
2025-12-01Merge tag 'vfs-6.19-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Cheaper MAY_EXEC handling for path lookup. This elides MAY_WRITE permission checks during path lookup and adds the IOP_FASTPERM_MAY_EXEC flag so filesystems like btrfs can avoid expensive permission work. - Hide dentry_cache behind runtime const machinery. - Add German Maglione as virtiofs co-maintainer. Cleanups: - Tidy up and inline step_into() and walk_component() for improved code generation. - Re-enable IOCB_NOWAIT writes to files. This refactors file timestamp update logic, fixing a layering bypass in btrfs when updating timestamps on device files and improving FMODE_NOCMTIME handling in VFS now that nfsd started using it. - Path lookup optimizations extracting slowpaths into dedicated routines and adding branch prediction hints for mntput_no_expire(), fd_install(), lookup_slow(), and various other hot paths. - Enable clang's -fms-extensions flag, requiring a JFS rename to avoid conflicts. - Remove spurious exports in fs/file_attr.c. - Stop duplicating union pipe_index declaration. This depends on the shared kbuild branch that brings in -fms-extensions support which is merged into this branch. - Use MD5 library instead of crypto_shash in ecryptfs. - Use largest_zero_folio() in iomap_dio_zero(). - Replace simple_strtol/strtoul with kstrtoint/kstrtouint in init and initrd code. - Various typo fixes. Fixes: - Fix emergency sync for btrfs. Btrfs requires an explicit sync_fs() call with wait == 1 to commit super blocks. The emergency sync path never passed this, leaving btrfs data uncommitted during emergency sync. - Use local kmap in watch_queue's post_one_notification(). - Add hint prints in sb_set_blocksize() for LBS dependency on THP" * tag 'vfs-6.19-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) MAINTAINERS: add German Maglione as virtiofs co-maintainer fs: inline step_into() and walk_component() fs: tidy up step_into() & friends before inlining orangefs: use inode_update_timestamps directly btrfs: fix the comment on btrfs_update_time btrfs: use vfs_utimes to update file timestamps fs: export vfs_utimes fs: lift the FMODE_NOCMTIME check into file_update_time_flags fs: refactor file timestamp update logic include/linux/fs.h: trivial fix: regualr -> regular fs/splice.c: trivial fix: pipes -> pipe's fs: mark lookup_slow() as noinline fs: add predicts based on nd->depth fs: move mntput_no_expire() slowpath into a dedicated routine fs: remove spurious exports in fs/file_attr.c watch_queue: Use local kmap in post_one_notification() fs: touch up predicts in path lookup fs: move fd_install() slowpath into a dedicated routine and provide commentary fs: hide dentry_cache behind runtime const machinery fs: touch predicts in do_dentry_open() ...
2025-12-01Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/nextOliver Upton
* kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-28Merge branch 'for-next/sysreg' into for-next/coreCatalin Marinas
* for-next/sysreg: : arm64 sysreg updates/cleanups arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64/sysreg: Add ICH_VMCR_EL2 arm64/sysreg: Move generation of RES0/RES1/UNKN to function arm64/sysreg: Support feature-specific fields with 'Prefix' descriptor arm64/sysreg: Fix checks for incomplete sysreg definitions arm64/sysreg: Replace TCR_EL1 field macros
2025-11-28Merge branches 'for-next/misc', 'for-next/kselftest', ↵Catalin Marinas
'for-next/efi-preempt', 'for-next/assembler-macro', 'for-next/typos', 'for-next/sme-ptrace-disable', 'for-next/local-tlbi-page-reused', 'for-next/mpam', 'for-next/acpi' and 'for-next/documentation', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arch_topology: Provide a stub topology_core_has_smt() for !CONFIG_GENERIC_ARCH_TOPOLOGY perf/arm-ni: Fix and optimise register offset calculation perf: arm_pmuv3: Add new Cortex and C1 CPU PMUs perf: arm_cspmu: fix error handling in arm_cspmu_impl_unregister() perf/arm-ni: Add NoC S3 support perf/arm_cspmu: nvidia: Add pmevfiltr2 support perf/arm_cspmu: nvidia: Add revision id matching perf/arm_cspmu: Add pmpidr support perf/arm_cspmu: Add callback to reset filter config perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores * for-next/misc: : Miscellaneous patches arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index arm64: mm: make linear mapping permission update more robust for patial range arm64/mm: Elide TLB flush in certain pte protection transitions arm64/mm: Rename try_pgd_pgtable_alloc_init_mm arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors arm64: add unlikely hint to MTE async fault check in el0_svc_common arm64: acpi: add newline to deferred APEI warning arm64: entry: Clean out some indirection arm64/mm: Ensure PGD_SIZE is aligned to 64 bytes when PA_BITS = 52 arm64/mm: Drop cpu_set_[default|idmap]_tcr_t0sz() arm64: remove unused ARCH_PFN_OFFSET arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stack arm64: Remove assertion on CONFIG_VMAP_STACK * for-next/kselftest: : arm64 kselftest patches kselftest/arm64: Align zt-test register dumps * for-next/efi-preempt: : arm64: Make EFI calls preemptible arm64/efi: Call EFI runtime services without disabling preemption arm64/efi: Move uaccess en/disable out of efi_set_pgd() arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapper arm64/fpsimd: Permit kernel mode NEON with IRQs off arm64/fpsimd: Don't warn when EFI execution context is preemptible efi/runtime-wrappers: Keep track of the efi_runtime_lock owner efi: Add missing static initializer for efi_mm::cpus_allowed_lock * for-next/assembler-macro: : arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers arm64: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi headers * for-next/typos: : Random typo/spelling fixes arm64: Fix double word in comments arm64: Fix typos and spelling errors in comments * for-next/sme-ptrace-disable: : Support disabling streaming mode via ptrace on SME only systems kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems arm64/sme: Support disabling streaming mode via ptrace on SME only systems * for-next/local-tlbi-page-reused: : arm64, mm: avoid TLBI broadcast if page reused in write fault arm64, tlbflush: don't TLBI broadcast if page reused in write fault mm: add spurious fault fixing support for huge pmd * for-next/mpam: (34 commits) : Basic Arm MPAM driver (more to follow) MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state arm_mpam: Use long MBWU counters if supported arm_mpam: Probe for long/lwd mbwu counters arm_mpam: Consider overflow in bandwidth counter state arm_mpam: Track bandwidth counter state for power management arm_mpam: Add mpam_msmon_read() to read monitor value arm_mpam: Add helpers to allocate monitors arm_mpam: Probe and reset the rest of the features arm_mpam: Allow configuration to be applied and restored during cpu online arm_mpam: Use a static key to indicate when mpam is enabled arm_mpam: Register and enable IRQs arm_mpam: Extend reset logic to allow devices to be reset any time arm_mpam: Add a helper to touch an MSC from any CPU arm_mpam: Reset MSC controls from cpuhp callbacks arm_mpam: Merge supported features during mpam_enable() into mpam_class arm_mpam: Probe the hardware features resctrl supports arm_mpam: Add helpers for managing the locking around the mon_sel registers ... * for-next/acpi: : arm64 acpi updates ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() * for-next/documentation: : arm64 Documentation updates Documentation/arm64: Fix the typo of register names
2025-11-25KVM: arm64: Convert ICH_HCR_EL2_TDIR cap to EARLY_LOCAL_CPU_FEATUREMarc Zyngier
Suzuki notices that making the ICH_HCR_EL2_TDIR capability a system one isn't a very good idea, should we end-up with CPUs that have asymmetric TDIR support (somehow unlikely, but you never know what level of stupidity vendors are up to). For this hypothetical setup, making this an "EARLY_LOCAL_CPU_FEATURE" is a much better option. This is actually consistent with what we already do with GICv5 legacy interface, so flip the capability over. Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Fixes: 2a28810cbb8b2 ("KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trapping") Link: https://lore.kernel.org/r/5df713d4-8b79-4456-8fd1-707ca89a61b6@arm.com Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://msgid.link/20251125160144.1086511-1-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-25Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "We've got a revert due to one of the recent CCA commits breaking ACPI firmware-based error reporting, a fix for a hard-lockup introduced by a prior fix affecting non-default (CONFIG_EXPERT) configurations and another ACPI fix for systems using MMIO-based timers. Other than that, we're looking pretty good. - Avoid hardlockup when CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY=n - Fix regression in APEI/GHES error handling - Fix MMIO timers when probed via ACPI" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Fix hard lockup when !MITIGATE_SPECTRE_BRANCH_HISTORY ACPI: GTDT: Correctly number platform devices for MMIO timers Revert "arm64: acpi: Enable ACPI CCEL support"
2025-11-24KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trappingMarc Zyngier
A long time ago, an unsuspecting architect forgot to add a trap bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but what's a bit of spec between friends? Thankfully, this was fixed in a later revision, and ARM "deprecates" the lack of trapping ability. Unfortuantely, a few (billion) CPUs went out with that defect, anything ARMv8.0 from ARM, give or take. And on these CPUs, you can't trap DIR on its own, full stop. As the next best thing, we can trap everything in the common group, which is a tad expensive, but hey ho, that's what you get. You can otherwise recycle the HW in the neaby bin. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: Turn vgic-v3 errata traps into a patched-in constantMarc Zyngier
The trap bits are currently only set to manage CPU errata. However, we are about to make use of them for purposes beyond beating broken CPUs into submission. For this purpose, turn these errata-driven bits into a patched-in constant that is merged with the KVM-driven value at the point of programming the ICH_HCR_EL2 register, rather than being directly stored with with the shadow value.. This allows the KVM code to distinguish between a trap being handled for the purpose of an erratum workaround, or for KVM's own need. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-5-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24arm64: Detect FEAT_XNXOliver Upton
Detect the feature in anticipation of using it in KVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24arm64: proton-pack: Fix hard lockup when !MITIGATE_SPECTRE_BRANCH_HISTORYJonathan Marek
The "drop print" commit removed the whole branch and not just the print. For some ARM64 cpus, this leads to hard lockup when CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY is not enabled. Fixes: 62e72463ca71 ("arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY") Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-24Revert "arm64: acpi: Enable ACPI CCEL support"Will Deacon
This reverts commit d02c2e45b1e7767b177f6854026e4ad0d70b4a4d. Mauro reports that this breaks APEI notifications on his QEMU setup because the "reserved for firmware" region still needs to be writable by Linux in order to signal _back_ to the firmware after processing the reported error: | {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 | ... | [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error | Unable to handle kernel write to read-only memory at virtual address ffff800080035018 | Mem abort info: | ESR = 0x000000009600004f | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x0f: level 3 permission fault | Data abort info: | ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | swapper pgtable: 4k pages, 52-bit VAs, pgdp=00000000505d7000 | pgd=10000000510bc003, p4d=1000000100229403, pud=100000010022a403, pmd=100000010022b403, pte=0060000139b90483 | Internal error: Oops: 000000009600004f [#1] SMP For now, revert the offending commit. We can presumably switch back to PAGE_KERNEL when bringing this back in the future. Link: https://lore.kernel.org/r/20251121224611.07efa95a@foz.lan Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-17arm64/sme: Support disabling streaming mode via ptrace on SME only systemsMark Brown
Currently it is not possible to disable streaming mode via ptrace on SME only systems, the interface for doing this is to write via NT_ARM_SVE but such writes will be rejected on a system without SVE support. Enable this functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such writes currently error since we require that a vector length is specified which should minimise the risk that existing software is relying on current behaviour. Reads are not supported since I am not aware of any use case for this and there is some risk that an existing userspace application may be confused if it reads NT_ARM_SVE on a system without SVE. Existing kernels will return FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not stored, for example if the task has not used SVE. Returning a vector length of 0 would create a risk that software would try to do things like allocate space for register state with zero sizes, while returning a vector length of 128 bits would look like SVE is supported. It seems safer to just not make the changes to add read support. It remains possible for userspace to detect a SME only system via the ptrace interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will succeed while reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in non-streaming mode is available via REGSET_FPR. sve_set_common() already avoids allocating SVE storage when doing a FPSIMD formatted write and allocating SME storage when doing a NT_ARM_SVE write so we change the function to validate the new case and skip setting a vector length for it. The aim is to make a minimally invasive change, no operation that would previously have succeeded will be affected, and we use a previously defined interface in new circumstances rather than define completely new ABI. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: David Spickett <david.spickett@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-13arm64/sysreg: Replace TCR_EL1 field macrosAnshuman Khandual
This just replaces all used TCR_EL1 field macros with tools sysreg variant based fields and subsequently drops them from the header (pgtable-hwdef.h), although while retaining the ones used for KVM (represented via the sysreg tools format). Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12Merge tag 'arm64-fpsimd-on-stack-for-v6.19' into libcrypto-fpsimd-on-stackEric Biggers
Pull fpsimd-on-stack changes from Ard Biesheuvel: "Shared tag/branch for arm64 FP/SIMD changes going through libcrypto" Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12arm64: Fix double word in commentsBo Liu
Remove the repeated word "the" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12arm64: Fix typos and spelling errors in commentsmrigendrachaubey
This patch corrects several minor typographical and spelling errors in comments across multiple arm64 source files. No functional changes. Signed-off-by: mrigendrachaubey <mrigendra.chaubey@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-12arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stackArd Biesheuvel
Commit aefbab8e77eb16b5 ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") added a 'kernel_fpsimd_state' field to struct thread_struct, which is the arch-specific portion of struct task_struct, and is allocated for each task in the system. The size of this field is 528 bytes, resulting in non-negligible bloat of task_struct, and the resulting memory overhead may impact performance on systems with many processes. This allocation is only used if the task is scheduled out or interrupted by a softirq while using the FP/SIMD unit in kernel mode, and so it is possible to transparently allocate this buffer on the caller's stack instead. So tweak the 'ksimd' scoped guard implementation so that a stack buffer is allocated and passed to both kernel_neon_begin() and kernel_neon_end(), and either record it in the task struct, or use it directly to preserve the task mode kernel FP/SIMD when running in softirq context. Passing the address to both functions, and checking the addresses for consistency ensures that callers of the updated bare begin/end API use it in a manner that is consistent with the new context switch semantics. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-11arm64: add unlikely hint to MTE async fault check in el0_svc_commonLi Qiang
Add unlikely() hint to the _TIF_MTE_ASYNC_FAULT flag check in el0_svc_common() since asynchronous MTE faults are expected to be rare occurrences during normal system call execution. This optimization helps the compiler to improve instruction caching and branch prediction for the common case where no asynchronous MTE faults are pending, while maintaining correct behavior for the exceptional case where such faults need to be handled prior to system call execution. Signed-off-by: Li Qiang <liqiang01@kylinos.cn> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64: acpi: add newline to deferred APEI warningOsama Abdelkader
missing newline in pr_warn_ratelimited in apei_claim_sea Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64: entry: Clean out some indirectionLinus Walleij
The conversion to generic IRQ entry left some functions in the EL1 (kernel) IRQ entry path very shallow, so drop the __inner_functions() where appropriate, saving some time and stack. This is not a fix but an optimization. Drop stale comments about irqentry_enter/exit() while we are at it. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/efi: Call EFI runtime services without disabling preemptionArd Biesheuvel
The only remaining reason why EFI runtime services are invoked with preemption disabled is the fact that the mm is swapped out behind the back of the context switching code. The kernel no longer disables preemption in kernel_neon_begin(). Furthermore, the EFI spec is being clarified to explicitly state that only baseline FP/SIMD is permitted in EFI runtime service implementations, and so the existing kernel mode NEON context switching code is sufficient to preserve and restore the execution context of an in-progress EFI runtime service call. Most EFI calls are made from the efi_rts_wq, which is serviced by a kthread. As kthreads never return to user space, they usually don't have an mm, and so we can use the existing infrastructure to swap in the efi_mm while the EFI call is in progress. This is visible to the scheduler, which will therefore reactivate the selected mm when switching out the kthread and back in again. Given that the EFI spec explicitly permits runtime services to be called with interrupts enabled, firmware code is already required to tolerate interruptions. So rather than disable preemption, disable only migration so that EFI runtime services are less likely to cause scheduling delays. To avoid potential issues where runtime services are interrupted while polling the secure firmware for async completions, keep migration disabled so that a runtime service invocation does not resume on a different CPU from the one it was started on. Note, though, that the firmware executes at the same privilege level as the kernel, and is therefore able to disable interrupts altogether. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/efi: Move uaccess en/disable out of efi_set_pgd()Ard Biesheuvel
efi_set_pgd() will no longer be called when invoking EFI runtime services via the efi_rts_wq work queue, but the uaccess en/disable are still needed when using PAN emulation using TTBR0 switching. So move these into the callers. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/efi: Drop efi_rt_lock spinlock from EFI arch wrapperArd Biesheuvel
Since commit 5894cf571e14 ("acpi/prmt: Use EFI runtime sandbox to invoke PRM handlers") all EFI runtime calls on arm64 are routed via the EFI runtime wrappers, which are serialized using the efi_runtime_lock semaphore. This means the efi_rt_lock spinlock in the arm64 arch wrapper code has become redundant, and can be dropped. For robustness, replace it with an assert that the EFI runtime lock is in fact held by 'current'. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/fpsimd: Permit kernel mode NEON with IRQs offArd Biesheuvel
Currently, may_use_simd() will return false when called from a context where IRQs are disabled. One notable case where this happens is when calling the ResetSystem() EFI runtime service from the reboot/poweroff code path. For this case alone, there is a substantial amount of FP/SIMD support code to handle the corner case where a EFI runtime service is invoked with IRQs disabled. The only reason kernel mode SIMD is not allowed when IRQs are disabled is that re-enabling softirqs in this case produces a noisy diagnostic when lockdep is enabled. The warning is valid, in the sense that delivering pending softirqs over the back of the call to local_bh_enable() is problematic when IRQs are disabled. While the API lacks a facility to simply mask and unmask softirqs without triggering their delivery, disabling softirqs is not needed to begin with when IRQs are disabled, given that softirqs are only every taken asynchronously over the back of a hard IRQ. So dis/enable softirq processing conditionally, based on whether IRQs are enabled, and relax the check in may_use_simd(). Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11arm64/fpsimd: Don't warn when EFI execution context is preemptibleArd Biesheuvel
Kernel mode FP/SIMD no longer requires preemption to be disabled, so only warn on uses of FP/SIMD from preemptible context if the fallback path is taken for cases where kernel mode NEON would not be allowed otherwise. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-11Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "There's more here than I would ideally like at this stage, but there's been a steady trickle of fixes and some of them took a few rounds of review. The bulk of the changes are fixing some fallout from the recent BBM level two support which allows the linear map to be split from block to page mappings at runtime, but inadvertently led to sleeping in atomic context on some paths where the linear map was already mapped with page granularity. The fix is simply to avoid splitting in those cases but the implementation of that is a little involved. The other interesting fix is addressing a catastophic performance issue with our per-cpu atomics discovered by Paul in the SRCU locking code but which took some interactions with the hardware folks to resolve. Summary: - Avoid sleeping in atomic context when changing linear map permissions for DEBUG_PAGEALLOC or KFENCE - Rework printing of Spectre mitigation status to avoid hardlockup when enabling per-task mitigations on the context-switch path - Reject kernel modules when instruction patching fails either due to the DWARF-based SCS patching or because of an alternatives callback residing outside of the core kernel text - Propagate error when updating kernel memory permissions in kprobes - Drop pointless, incorrect message when enabling the ACPI SPCR console - Use value-returning LSE instructions for per-cpu atomics to reduce latency in SRCU locking routines" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Reject modules with internal alternative callbacks arm64: Fail module loading if dynamic SCS patching fails arm64: proton-pack: Fix hard lockup due to print in scheduler context arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY arm64: mm: Tidy up force_pte_mapping() arm64: mm: Optimize range_split_to_ptes() arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context arm64: kprobes: check the return value of set_memory_rox() arm64: acpi: Drop message logging SPCR default console Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent" arm64: Use load LSE atomics for the non-return per-CPU atomic operations
2025-11-10Merge patch "kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS"Christian Brauner
Nathan Chancellor <nathan@kernel.org> says: Shared branch between Kbuild and other trees for enabling '-fms-extensions' for 6.19. * tag 'kbuild-ms-extensions-6.19' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Add '-fms-extensions' to areas with dedicated CFLAGS Kbuild: enable -fms-extensions jfs: Rename _inline to avoid conflict with clang's '-fms-extensions' Link: https://patch.msgid.link/20251101-kbuild-ms-extensions-dedicated-cflags-v1-1-38004aba524b@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-09mm/huge_memory: initialise the tags of the huge zero folioCatalin Marinas
On arm64 with MTE enabled, a page mapped as Normal Tagged (PROT_MTE) in user space will need to have its allocation tags initialised. This is normally done in the arm64 set_pte_at() after checking the memory attributes. Such page is also marked with the PG_mte_tagged flag to avoid subsequent clearing. Since this relies on having a struct page, pte_special() mappings are ignored. Commit d82d09e48219 ("mm/huge_memory: mark PMD mappings of the huge zero folio special") maps the huge zero folio special and the arm64 set_pmd_at() will no longer zero the tags. There is no guarantee that the tags are zero, especially if parts of this huge page have been previously tagged. It's fairly easy to detect this by regularly dropping the caches to force the reallocation of the huge zero folio. Allocate the huge zero folio with the __GFP_ZEROTAGS flag. In addition, do not warn in the arm64 __access_remote_tags() when reading tags from the huge zero page. I bundled the arm64 change in here as well since they are both related to the commit mapping the huge zero folio as special. [catalin.marinas@arm.com: handle arch mte_zero_clear_page_tags() code issuing MTE instructions] Link: https://lkml.kernel.org/r/aQi8dA_QpXM8XqrE@arm.com Link: https://lkml.kernel.org/r/20251031170133.280742-1-catalin.marinas@arm.com Fixes: d82d09e48219 ("mm/huge_memory: mark PMD mappings of the huge zero folio special") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Beleswar Padhi <b-padhi@ti.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-07arm64: use SOFTIRQ_ON_OWN_STACK for enabling softirq stackRyo Takakura
For those architectures with HAVE_SOFTIRQ_ON_OWN_STACK use their dedicated softirq stack when !PREEMPT_RT. This condition is ensured by SOFTIRQ_ON_OWN_STACK. Let arm64 use SOFTIRQ_ON_OWN_STACK as well to select its usage of the stack. Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07arm64: Remove assertion on CONFIG_VMAP_STACKDawei Li
CONFIG_VMAP_STACK is selected by arm64 arch unconditionly since commit ef6861b8e6dd ("arm64: Mandate VMAP_STACK"). Remove the redundant assertion and headers. Signed-off-by: Dawei Li <dawei.li@linux.dev> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-07arm64: Reject modules with internal alternative callbacksAdrian Barnaś
During module loading, check if a callback function used by the alternatives specified in the '.altinstruction' ELF section (if present) is located in core kernel .text. If not fail module loading before callback is called. Reported-by: Fanqin Cui <cuifq1@chinatelecom.cn> Closes: https://lore.kernel.org/all/20250807072700.348514-1-fanqincui@163.com/ Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> [will: Folded in 'noinstr' tweak from Mark] Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: Fail module loading if dynamic SCS patching failsAdrian Barnaś
Disallow a module to load if SCS dynamic patching fails for its code. For module loading, instead of running a dry-run to check for patching errors, try to run patching in the first run and propagate any errors so module loading will fail. Signed-off-by: Adrian Barnaś <abarnas@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: proton-pack: Fix hard lockup due to print in scheduler contextshechenglong
Relocate the printk() calls from spectre_v4_mitigations_off() and spectre_v2_mitigations_off() into setup_system_capabilities() function, preventing hard lockups caused by printk calls in scheduler context: | _raw_spin_lock_nested+168 | ttwu_queue+180 (rq_lock(rq, &rf); 2nd acquiring the rq->__lock) | try_to_wake_up+548 | wake_up_process+32 | __up+88 | up+100 | __up_console_sem+96 | console_unlock+696 | vprintk_emit+428 | vprintk_default+64 | vprintk_func+220 | printk+104 | spectre_v4_enable_task_mitigation+344 | __switch_to+100 | __schedule+1028 (rq_lock(rq, &rf); 1st acquiring the rq->__lock) | schedule_idle+48 | do_idle+388 | cpu_startup_entry+44 | secondary_start_kernel+352 Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: shechenglong <shechenglong@xfusion.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORYshechenglong
Following the pattern established with other Spectre mitigations, do not print a message when the CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY Kconfig option is disabled. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: shechenglong <shechenglong@xfusion.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: kprobes: check the return value of set_memory_rox()Yang Shi
Since commit a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full"), __change_memory_common has more chance to fail due to memory allocation failure when splitting page table. So check the return value of set_memory_rox(), then bail out if it fails otherwise we may have RW memory mapping for kprobes insn page. Fixes: 195a1b7d8388 ("arm64: kprobes: call set_memory_rox() for kprobe page") Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07arm64: acpi: Drop message logging SPCR default consolePunit Agrawal
Commit f5a4af3c7527 ("ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64") introduced a command line parameter to prevent using SPCR provided console as default. It also introduced a message to log this choice. Drop the message as it is not particularly useful and can be incorrect in situations where no SPCR is provided by the firmware. Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/ Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-07Revert "ACPI: Suppress misleading SPCR console message when SPCR table is ↵Punit Agrawal
absent" This reverts commit bad3fa2fb9206f4dcec6ddef094ec2fbf6e8dcb2. Commit bad3fa2fb920 ("ACPI: Suppress misleading SPCR console message when SPCR table is absent") mistakenly assumes acpi_parse_spcr() returning 0 to indicate a failure to parse SPCR. While addressing the resultant incorrect logging it was deemed that dropping the message is a better approach as it is not particularly useful. Roll back the commit introducing the bug as a step towards dropping the log message. Link: https://lore.kernel.org/all/aQN0YWUYaPYWpgJM@willie-the-truck/ Signed-off-by: Punit Agrawal <punit.agrawal@oss.qualcomm.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-11-04entry: Split up exit_to_user_mode_prepare()Thomas Gleixner
exit_to_user_mode_prepare() is used for both interrupts and syscalls, but there is extra rseq work, which is only required for in the interrupt exit case. Split up the function and provide wrappers for syscalls and interrupts, which allows to separate the rseq exit work in the next step. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251027084307.782234789@linutronix.de
2025-10-30kbuild: Add '-fms-extensions' to areas with dedicated CFLAGSNathan Chancellor
This is a follow up to commit c4781dc3d1cf ("Kbuild: enable -fms-extensions") but in a separate change due to being substantially different from the initial submission. There are many places within the kernel that use their own CFLAGS instead of the main KBUILD_CFLAGS, meaning code written with the main kernel's use of '-fms-extensions' in mind that may be tangentially included in these areas will result in "error: declaration does not declare anything" messages from the compiler. Add '-fms-extensions' to all these areas to ensure consistency, along with -Wno-microsoft-anon-tag to silence clang's warning about use of the extension that the kernel cares about using. parisc does not build with clang so it does not need this warning flag. LoongArch does not need it either because -W flags from KBUILD_FLAGS are pulled into cflags-vdso. Reported-by: Christian Brauner <brauner@kernel.org> Closes: https://lore.kernel.org/20251030-meerjungfrau-getrocknet-7b46eacc215d@brauner/ Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-10-27genirq: Update request_percpu_nmi() to take an affinityMarc Zyngier
Continue spreading the notion of affinity to the per CPU interrupt request code by updating the call sites that use request_percpu_nmi() (all two of them) to take an affinity pointer. This pointer is firmly NULL for now. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Will Deacon <will@kernel.org> Link: https://patch.msgid.link/20251020122944.3074811-16-maz@kernel.org
2025-10-27Merge 6.18-rc3 into driver-core-nextGreg Kroah-Hartman
We need the driver core fixes in here as well to build on top of. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>