| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha
Pull alpha updates from Magnus Lindholm:
"Two small uapi fixes. One patch hardcodes TC* ioctl values that
previously depended on the deprecated termio struct, avoiding build
issues with newer glibc versions. The other patch switches uapi
headers to use the compiler-defined __ASSEMBLER__ macro for better
consistency between kernel and userspace.
- don't reference obsolete termio struct for TC* constants
- Replace __ASSEMBLY__ with __ASSEMBLER__ in the alpha headers"
* tag 'alpha-for-v6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/lindholm/alpha:
alpha: don't reference obsolete termio struct for TC* constants
alpha: Replace __ASSEMBLY__ with __ASSEMBLER__ in the alpha headers
|
|
Pull ARM updates from Russell King:
- disable jump label and high PTE for PREEMPT RT kernels
- fix input operand modification in load_unaligned_zeropad()
- fix hash_name() / fault path induced warnings
- fix branch predictor hardening
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
ARM: fix branch predictor hardening
ARM: fix hash_name() fault
ARM: allow __do_kernel_fault() to report execution of memory faults
ARM: group is_permission_fault() with is_translation_fault()
ARM: 9464/1: fix input-only operand modification in load_unaligned_zeropad()
ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
ARM: 9459/1: Disable jump-label on PREEMPT_RT
|
|
Commit 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block
handling") made ghash_finup() pass the wrong buffer to
ghash_do_simd_update(). As a result, ghash-neon now produces incorrect
outputs when the message length isn't divisible by 16 bytes. Fix this.
(I didn't notice this earlier because this code is reached only on CPUs
that support NEON but not PMULL. I haven't yet found a way to get
qemu-system-aarch64 to emulate that configuration.)
Fixes: 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block handling")
Cc: stable@vger.kernel.org
Reported-by: Diederik de Haas <diederik@cknow-tech.com>
Closes: https://lore.kernel.org/linux-crypto/DETXT7QI62KE.F3CGH2VWX1SC@cknow-tech.com/
Tested-by: Diederik de Haas <diederik@cknow-tech.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20251209223417.112294-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
|
|
__do_user_fault() may be called with indeterminent interrupt enable
state, which means we may be preemptive at this point. This causes
problems when calling harden_branch_predictor(). For example, when
called from a data abort, do_alignment_fault()->do_bad_area().
Move harden_branch_predictor() out of __do_user_fault() and into the
calling contexts.
Moving it into do_kernel_address_page_fault(), we can be sure that
interrupts will be disabled here.
Converting do_translation_fault() to use do_kernel_address_page_fault()
rather than do_bad_area() means that we keep branch predictor handling
for translation faults. Interrupts will also be disabled at this call
site.
do_sect_fault() needs special handling, so detect user mode accesses
to kernel-addresses, and add an explicit call to branch predictor
hardening.
Finally, add branch predictor hardening to do_alignment() for the
faulting case (user mode accessing kernel addresses) before interrupts
are enabled.
This should cover all cases where harden_branch_predictor() is called,
ensuring that it is always has interrupts disabled, also ensuring that
it is called early in each call path.
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Zizhi Wo reports:
"During the execution of hash_name()->load_unaligned_zeropad(), a
potential memory access beyond the PAGE boundary may occur. For
example, when the filename length is near the PAGE_SIZE boundary.
This triggers a page fault, which leads to a call to
do_page_fault()->mmap_read_trylock(). If we can't acquire the lock,
we have to fall back to the mmap_read_lock() path, which calls
might_sleep(). This breaks RCU semantics because path lookup occurs
under an RCU read-side critical section."
This is seen with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_KFENCE=y.
Kernel addresses (with the exception of the vectors/kuser helper
page) do not have VMAs associated with them. If the vectors/kuser
helper page faults, then there are two possibilities:
1. if the fault happened while in kernel mode, then we're basically
dead, because the CPU won't be able to vector through this page
to handle the fault.
2. if the fault happened while in user mode, that means the page was
protected from user access, and we want to fault anyway.
Thus, we can handle kernel addresses from any context entirely
separately without going anywhere near the mmap lock. This gives us
an entirely non-sleeping path for all kernel mode kernel address
faults.
As we handle the kernel address faults before interrupts are enabled,
this change has the side effect of improving the branch predictor
hardening, but does not completely solve the issue.
Reported-by: Zizhi Wo <wozizhi@huaweicloud.com>
Reported-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Link: https://lore.kernel.org/r/20251126090505.3057219-1-wozizhi@huaweicloud.com
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Allow __do_kernel_fault() to detect the execution of memory, so we can
provide the same fault message as do_page_fault() would do. This is
required when we split the kernel address fault handling from the
main do_page_fault() code path.
Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Zero can be a valid value of num_records. For example, on Intel Atom x6425RE,
only x87 and SSE are supported (features 0, 1), and fpu_user_cfg.max_features
is 3. The for_each_extended_xfeature() loop only iterates feature 2, which is
not enabled, so num_records = 0. This is valid and should not cause core dump
failure.
The issue is that dump_xsave_layout_desc() returns 0 for both genuine errors
(dump_emit() failure) and valid cases (no extended features). Use negative
return values for errors and only abort on genuine failures.
Fixes: ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86 core files")
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/20251210000219.4094353-2-yongxin.liu@windriver.com
|
|
BPF JIT programs and trampolines use a frame pointer, so the current ORC
unwinder strategy of falling back to frame pointers (when an ORC entry
is missing) usually works in practice when unwinding through BPF JIT
stack frames.
However, that frame pointer fallback is just a guess, so the unwind gets
marked unreliable for live patching, which can cause livepatch
transition stalls.
Make the common case reliable by calling the bpf_has_frame_pointer()
helper to detect the valid frame pointer region of BPF JIT programs and
trampolines.
Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder")
Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Closes: https://lore.kernel.org/0e555733-c670-4e84-b2e6-abb8b84ade38@crowdstrike.com
Acked-by: Song Liu <song@kernel.org>
Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/a18505975662328c8ffb1090dded890c6f8c1004.1764818927.git.jpoimboe@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
|
|
Introduce a bpf_has_frame_pointer() helper that unwinders can call to
determine whether a given instruction pointer is within the valid frame
pointer region of a BPF JIT program or trampoline (i.e., after the
prologue, before the epilogue).
This will enable livepatch (with the ORC unwinder) to reliably unwind
through BPF JIT frames.
Acked-by: Song Liu <song@kernel.org>
Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/fd2bc5b4e261a680774b28f6100509fd5ebad2f0.1764818927.git.jpoimboe@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
|
|
Analogically to the x86 commit 881a9c9cb785 ("bpf: Do not audit
capability check in do_jit()"), change the capable() call to
ns_capable_noaudit() in order to avoid spurious SELinux denials in audit
log.
The commit log from that commit applies here as well:
"""
The failure of this check only results in a security mitigation being
applied, slightly affecting performance of the compiled BPF program. It
doesn't result in a failed syscall, an thus auditing a failed LSM
permission check for it is unwanted. For example with SELinux, it causes
a denial to be reported for confined processes running as root, which
tends to be flagged as a problem to be fixed in the policy. Yet
dontauditing or allowing CAP_SYS_ADMIN to the domain may not be
desirable, as it would allow/silence also other checks - either going
against the principle of least privilege or making debugging potentially
harder.
Fix it by changing it from capable() to ns_capable_noaudit(), which
instructs the LSMs to not audit the resulting denials.
"""
Fixes: f300769ead03 ("arm64: bpf: Only mitigate cBPF programs loaded by unprivileged users")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Link: https://lore.kernel.org/r/20251204125916.441021-1-omosnace@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull csky updates from Guo Ren:
- Remove compile warning for CONFIG_SMP
- Fix __ASSEMBLER__ typo in headers
- Fix csky_cmpxchg_fixup
* tag 'csky-for-linus-6.19' of https://github.com/c-sky/csky-linux:
csky: Remove compile warning for CONFIG_SMP
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in uapi header
csky: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-uapi headers
csky: fix csky_cmpxchg_fixup not working
|
|
Merge the two ksimd scopes in the implementation of SM4-XTS to prevent
stack bloat in cases where the compiler fails to combine the stack slots
for the kernel mode FP/SIMD buffers.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-6-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
The ciphertext stealing logic in the AES-XTS implementation creates a
separate ksimd scope to call into the FP/SIMD core routines, and in some
cases (CONFIG_KASAN_STACK is one, but there might be others), the 528
byte kernel mode FP/SIMD buffer that is allocated inside this scope is
not shared with the preceding ksimd scope, resulting in unnecessary
stack bloat.
Considering that
a) the XTS ciphertext stealing logic is never called for block
encryption use cases, and XTS is rarely used for anything else,
b) in the vast majority of cases, the entire input block is processed
during the first iteration of the loop,
we can combine both ksimd scopes into a single one with no practical
impact on how often/how long FP/SIMD is en/disabled, allowing us to
reuse the same stack slot for both FP/SIMD routine calls.
Fixes: ba3c1b3b5ac9 ("crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20251203163803.157541-5-ardb@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with
RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as
well as vector unaligned accesses being efficient.
This is necessary because this code assumes that vector unaligned
accesses are supported and are efficient. (It does so to avoid having
to use lots of extra vsetvli instructions to switch the element width
back and forth between 8 and either 32 or 64.)
This was omitted from the code originally just because the RISC-V kernel
support for detecting this feature didn't exist yet. Support has now
been added, but it's fragmented into per-CPU runtime detection, a
command-line parameter, and a kconfig option. The kconfig option is the
only reasonable way to do it, though, so let's just rely on that.
Fixes: eb24af5d7a05 ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}")
Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20")
Fixes: 600a3853dfa0 ("crypto: riscv - add vector crypto accelerated GHASH")
Fixes: 8c8e40470ffe ("crypto: riscv - add vector crypto accelerated SHA-{256,224}")
Fixes: b3415925a08b ("crypto: riscv - add vector crypto accelerated SHA-{512,384}")
Fixes: 563a5255afa2 ("crypto: riscv - add vector crypto accelerated SM3")
Fixes: b8d06352bbf3 ("crypto: riscv - add vector crypto accelerated SM4")
Cc: stable@vger.kernel.org
Reported-by: Vivian Wang <wangruikang@iscas.ac.cn>
Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/
Reviewed-by: Jerry Shih <jerry.shih@sifive.com>
Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
"Just cleanups and fixes"
* tag 'mips_6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix whitespace damage in r4k_wait from VS timer fix
mips: kvm: simplify kvm_mips_deliver_interrupts()
MIPS: alchemy: mtx1: switch to static device properties
mips: Remove __GFP_HIGHMEM masking
MIPS: ftrace: Fix memory corruption when kernel is located beyond 32 bits
MIPS: dts: Always descend vendor subdirectories
mips: configs: loongson1: Update defconfig
MIPS: Fix HOTPLUG_PARALLEL dependency
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto
Pull __auto_type to auto conversion from Peter Anvin:
"Convert '__auto_type' to 'auto', defining a macro for 'auto' unless
C23+ is in use"
* tag 'auto-type-conversion-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto:
tools/virtio: replace "__auto_type" with "auto"
selftests/bpf: replace "__auto_type" with "auto"
arch/x86: replace "__auto_type" with "auto"
arch/nios2: replace "__auto_type" and adjacent equivalent with "auto"
fs/proc: replace "__auto_type" with "const auto"
include/linux: change "__auto_type" to "auto"
compiler_types.h: add "auto" as a macro for "__auto_type"
|
|
Let's properly adjust BALLOON_MIGRATE like the other drivers.
Note that the INFLATE/DEFLATE events are triggered from the core when
enqueueing/dequeueing pages.
This was found by code inspection.
Link: https://lkml.kernel.org/r/20251021100606.148294-3-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
CONFIG_BALLOON_COMPACTION
Patch series "powerpc/pseries/cmm: two smaller fixes".
Two smaller fixes identified while doing a bigger rework.
This patch (of 2):
We always have to initialize the balloon_dev_info, even when compaction is
not configured in: otherwise the containing list and the lock are left
uninitialized.
Likely not many such configs exist in practice, but let's CC stable to
be sure.
This was found by code inspection.
Link: https://lkml.kernel.org/r/20251021100606.148294-1-david@redhat.com
Link: https://lkml.kernel.org/r/20251021100606.148294-2-david@redhat.com
Fixes: fe030c9b85e6 ("powerpc/pseries/cmm: Implement balloon compaction")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If amd_uncore_event_init() fails, return an error irrespective of the
pmu_version. Setting hwc->config should be safe even if there is an
error so use this opportunity to simplify the code.
Closes: https://lore.kernel.org/all/aTaI0ci3vZ44lmBn@stanley.mountain/
Fixes: d6389d3ccc13 ("perf/x86/amd/uncore: Refactor uncore management")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/076935e23a70335d33bd6e23308b75ae0ad35ba2.1765268667.git.sandipan.das@amd.com
|
|
Group is_permission_fault() with is_translation_fault(), which is
needed to use is_permission_fault() in __do_kernel_fault(). As
this is static inline, there is no need for this to be under
CONFIG_MMU.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
In the inline assembly inside load_unaligned_zeropad(), the "addr" is
constrained as input-only operand. The compiler assumes that on exit
from the asm statement these operands contain the same values as they
had before executing the statement, but when kernel page fault happened, the assembly fixup code "bic %2 %2, #0x3" modify the value of "addr", which may lead to an unexpected behavior.
Use a temporary variable "tmp" to handle it, instead of modifying the
input-only operand, just like what arm64's load_unaligned_zeropad()
does.
Fixes: b9a50f74905a ("ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs")
Co-developed-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com>
Signed-off-by: Liyuan Pang <pangliyuan1@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
Replace instances of "__auto_type" with "auto" in:
arch/x86/include/asm/bug.h
arch/x86/include/asm/string_64.h
arch/x86/include/asm/uaccess_64.h
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
|
|
Replace uses of "__auto_type" in arch/nios2/include/asm/uaccess.h with
"auto", and equivalently convert an adjacent cast to the analogous
form.
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
|
|
Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42
drops the legacy termio struct, but the ioctls.h header still defines some
TC* constants in terms of termio (via sizeof). Hardcode the values instead.
This fixes building Python for example, which falls over like:
./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.1764922497.git.sam@gentoo.org
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
|
|
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with uapi headers that
rather should use __ASSEMBLER__ instead. So let's standardize now
on the __ASSEMBLER__ macro that is provided by the compilers.
This is a completely mechanical patch (done with a simple "sed -i"
statement).
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-alpha@vger.kernel.org
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20251121100044.282684-2-thuth@redhat.com
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:
- Enhancements to Linux as the root partition for Microsoft Hypervisor:
- Support a new mode called L1VH, which allows Linux to drive the
hypervisor running the Azure Host directly
- Support for MSHV crash dump collection
- Allow Linux's memory management subsystem to better manage guest
memory regions
- Fix issues that prevented a clean shutdown of the whole system on
bare metal and nested configurations
- ARM64 support for the MSHV driver
- Various other bug fixes and cleanups
- Add support for Confidential VMBus for Linux guest on Hyper-V
- Secure AVIC support for Linux guests on Hyper-V
- Add the mshv_vtl driver to allow Linux to run as the secure kernel in
a higher virtual trust level for Hyper-V
* tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits)
mshv: Cleanly shutdown root partition with MSHV
mshv: Use reboot notifier to configure sleep state
mshv: Add definitions for MSHV sleep state configuration
mshv: Add support for movable memory regions
mshv: Add refcount and locking to mem regions
mshv: Fix huge page handling in memory region traversal
mshv: Move region management to mshv_regions.c
mshv: Centralize guest memory region destruction
mshv: Refactor and rename memory region handling functions
mshv: adjust interrupt control structure for ARM64
Drivers: hv: use kmalloc_array() instead of kmalloc()
mshv: Add ioctl for self targeted passthrough hvcalls
Drivers: hv: Introduce mshv_vtl driver
Drivers: hv: Export some symbols for mshv_vtl
static_call: allow using STATIC_CALL_TRAMP_STR() from assembly
mshv: Extend create partition ioctl to support cpu features
mshv: Allow mappings that overlap in uaddr
mshv: Fix create memory region overlap check
mshv: add WQ_PERCPU to alloc_workqueue users
Drivers: hv: Use kmalloc_array() instead of kmalloc()
...
|
|
If an APICv status updated was pended while L2 was active, immediately
refresh vmcs01's controls instead of pending KVM_REQ_APICV_UPDATE as
kvm_vcpu_update_apicv() only calls into vendor code if a change is
necessary.
E.g. if APICv is inhibited, and then activated while L2 is running:
kvm_vcpu_update_apicv()
|
-> __kvm_vcpu_update_apicv()
|
-> apic->apicv_active = true
|
-> vmx_refresh_apicv_exec_ctrl()
|
-> vmx->nested.update_vmcs01_apicv_status = true
|
-> return
Then L2 exits to L1:
__nested_vmx_vmexit()
|
-> kvm_make_request(KVM_REQ_APICV_UPDATE)
vcpu_enter_guest(): KVM_REQ_APICV_UPDATE
-> kvm_vcpu_update_apicv()
|
-> __kvm_vcpu_update_apicv()
|
-> return // because if (apic->apicv_active == activate)
Reported-by: Chao Gao <chao.gao@intel.com>
Closes: https://lore.kernel.org/all/aQ2jmnN8wUYVEawF@intel.com
Fixes: 7c69661e225c ("KVM: nVMX: Defer APICv updates while L2 is active until L1 is active")
Cc: stable@vger.kernel.org
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
[sean: write changelog]
Link: https://patch.msgid.link/20251205231913.441872-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
The APICv (apic->apicv_active) can be activated or deactivated at runtime,
for instance, because of APICv inhibit reasons. Intel VMX employs different
mechanisms to virtualize LAPIC based on whether APICv is active.
When APICv is activated at runtime, GUEST_INTR_STATUS is used to configure
and report the current pending IRR and ISR states. Unless a specific vector
is explicitly included in EOI_EXIT_BITMAP, its EOI will not be trapped to
KVM. Intel VMX automatically clears the corresponding ISR bit based on the
GUEST_INTR_STATUS.SVI field.
When APICv is deactivated at runtime, the VM_ENTRY_INTR_INFO_FIELD is used
to specify the next interrupt vector to invoke upon VM-entry. The
VMX IDT_VECTORING_INFO_FIELD is used to report un-invoked vectors on
VM-exit. EOIs are always trapped to KVM, so the software can manually clear
pending ISR bits.
There are scenarios where, with APICv activated at runtime, a guest-issued
EOI may not be able to clear the pending ISR bit.
Taking vector 236 as an example, here is one scenario.
1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
3. After VM-entry, vector 236 is invoked through the guest IDT. At this
point, the data in VM_ENTRY_INTR_INFO_FIELD is no longer valid. The guest
interrupt handler for vector 236 is invoked.
4. Suppose a VM exit occurs very early in the guest interrupt handler,
before the EOI is issued.
5. Nothing is reported through the IDT_VECTORING_INFO_FIELD because
vector 236 has already been invoked in the guest.
6. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
kvm_vcpu_update_apicv() to activate APICv.
7. Unfortunately, GUEST_INTR_STATUS.SVI is not configured, although
vector 236 is still pending in the ISR.
8. After VM-entry, the guest finally issues the EOI for vector 236.
However, because SVI is not configured, vector 236 is not cleared.
9. ISR is stalled forever on vector 236.
Here is another scenario.
1. Suppose APICv is inactive. Vector 236 is pending in the IRR.
2. To handle KVM_REQ_EVENT, KVM moves vector 236 from the IRR to the ISR,
and configures the VM_ENTRY_INTR_INFO_FIELD via vmx_inject_irq().
3. VM-exit occurs immediately after the next VM-entry. The vector 236 is
not invoked through the guest IDT. Instead, it is saved to the
IDT_VECTORING_INFO_FIELD during the VM-exit.
4. KVM calls kvm_queue_interrupt() to re-queue the un-invoked vector 236
into vcpu->arch.interrupt. A KVM_REQ_EVENT is requested.
5. Now, suppose APICv is activated. Before the next VM-entry, KVM calls
kvm_vcpu_update_apicv() to activate APICv.
6. Although APICv is now active, KVM still uses the legacy
VM_ENTRY_INTR_INFO_FIELD to re-inject vector 236. GUEST_INTR_STATUS.SVI is
not configured.
7. After the next VM-entry, vector 236 is invoked through the guest IDT.
Finally, an EOI occurs. However, due to the lack of GUEST_INTR_STATUS.SVI
configuration, vector 236 is not cleared from the ISR.
8. ISR is stalled forever on vector 236.
Using QEMU as an example, vector 236 is stuck in ISR forever.
(qemu) info lapic 1
dumping local APIC state for CPU 1
LVT0 0x00010700 active-hi edge masked ExtINT (vec 0)
LVT1 0x00010400 active-hi edge masked NMI
LVTPC 0x00000400 active-hi edge NMI
LVTERR 0x000000fe active-hi edge Fixed (vec 254)
LVTTHMR 0x00010000 active-hi edge masked Fixed (vec 0)
LVTT 0x000400ec active-hi edge tsc-deadline Fixed (vec 236)
Timer DCR=0x0 (divide by 2) initial_count = 0 current_count = 0
SPIV 0x000001ff APIC enabled, focus=off, spurious vec 255
ICR 0x000000fd physical edge de-assert no-shorthand
ICR2 0x00000000 cpu 0 (X2APIC ID)
ESR 0x00000000
ISR 236
IRR 37(level) 236
The issue isn't applicable to AMD SVM as KVM simply writes vmcb01 directly
irrespective of whether L1 (vmcs01) or L2 (vmcb02) is active (unlike VMX,
there is no need/cost to switch between VMCBs). In addition,
APICV_INHIBIT_REASON_IRQWIN ensures AMD SVM AVIC is not activated until
the last interrupt is EOI'd.
Fix the bug by configuring Intel VMX GUEST_INTR_STATUS.SVI if APICv is
activated at runtime.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Link: https://patch.msgid.link/20251110063212.34902-1-dongli.zhang@oracle.com
[sean: call out that SVM writes vmcb01 directly, tweak comment]
Link: https://patch.msgid.link/20251205231913.441872-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
All objects are supposed to have a minimal alignment of two, since a
couple of instructions only work with even addresses. Add the missing
align statement for the file string.
Fixes: 6584ff203aec ("bugs/s390: Use 'cond_str' in __EMIT_BUG()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Fallback to generic BUG implementation in case CONFIG_BUG is disabled.
This restores the old behaviour before 'cond_str' support was added.
It probably doesn't matter, since nobody should disable CONFIG_BUG, but at
least this is consistent to before.
Fixes: 6584ff203aec ("bugs/s390: Use 'cond_str' in __EMIT_BUG()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
A few checks were missing in gmap_helper_zap_one_page(), which can lead
to memory corruption in the guest under specific circumstances.
Add the missing checks.
Fixes: 5deafa27d9ae ("KVM: s390: Fix to clear PTE when discarding a swapped page")
Cc: stable@vger.kernel.org
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Add loongson32_defconfig (for 32BIT) and rename loongson3_defconfig to
loongson64_defconfig (for 64BIT).
Also adjust graphics drivers, such as FB_EFI is replaced with EFIDRM.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust VDSO/VSYSCALL because read_cpu_id() for 32BIT/64BIT are
different, and LoongArch32 doesn't support GENERIC_GETTIMEOFDAY now
(will be supported in future).
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust misc routines for both 32BIT and 64BIT, including: bitops, bswap,
checksum, string, jump label, unaligned access emulator, suspend/wakeup
routines, etc.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust user accessors for both 32BIT and 64BIT, including: get_user(),
put_user(), copy_user(), clear_user(), etc.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust system call for both 32BIT and 64BIT, including: add the uapi
unistd_{32,64}.h and syscall_table_{32,64}.h inclusion, add sys_mmap2()
definition, change the system call entry routines, etc.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust module loader for both 32BIT and 64BIT, including: change the s64
type to long, change the u64 type to unsigned long, change the plt entry
definition and handling, etc.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust time routines for both 32BIT and 64BIT, including: rdtime_h() /
rdtime_l() definitions for 32BIT and rdtime_d() definition for 64BIT,
get_cycles() and get_cycles64() definitions for 32BIT/64BIT, show time
frequency info ("CPU MHz" and "BogoMIPS") in /proc/cpuinfo, etc.
Use do_div() for division which works on both 32BIT and 64BIT platforms.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust process management for both 32BIT and 64BIT, including: CPU
context switching, FPU loading/restoring, process dumping and process
tracing routines.
Q: Why modify switch.S?
A: LoongArch32 has no ldptr.d/stptr.d instructions, and asm offsets of
thead_struct members are too large to be filled in the 12b immediate
field of ld.w/st.w.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust memory management for both 32BIT and 64BIT, including: address
space definition, DMW CSR definition, page table bits definition, boot
time detection of VA/PA bits, page table init, tlb exception handling,
copy_page/clear_page/dump_tlb libraries, etc.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Yawei Li <liyawei@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Adjust boot & setup for both 32BIT and 64BIT, including: efi header
definition, MAX_IO_PICS definition, kernel entry and environment setup
routines, etc.
Add a fallback path in fdt_cpu_clk_init() to avoid 0MHz in /proc/cpuinfo
if there is no valid clock freq from firmware.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
s390 is one of the last architectures using the legacy API for setup and
teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown
to the MSI parent domain API. For details, see:
https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
In detail, create an MSI parent domain for each PCI domain. When a PCI
device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ
domain for this device, which is used by the device for allocating and
freeing IRQs.
The per-device domain delegates this allocation and freeing to the
parent-domain. In the end, the corresponding callbacks of the parent
domain are responsible for allocating and freeing the IRQs.
The allocation is split into two parts:
- zpci_msi_prepare() is called once for each device and allocates the
required resources. On s390, each PCI function has its own airq
vector and a summary bit, which must be configured once per function.
This is done in prepare().
- zpci_msi_alloc() can be called multiple times for allocating one or
more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ
number in the kernel and the hardware IRQ number.
Freeing is split into two counterparts:
- zpci_msi_free() reverts the effects of zpci_msi_alloc() and
- zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is
called once when all IRQs are freed before a device is removed.
Since the parent domain in the end allocates the IRQs, the hwirq
encoding must be unambiguous for all IRQs of all devices. This is
achieved by encoding the hwirq using the devfn and the MSI index.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
After support for VIRT_XFER_TO_GUEST_WORK is available for s390 it is
possible to also select HAVE_POSIX_CPU_TIMERS_TASK_WORK. See [1] for the
reasons why it makes sense, also for architectures which do not support
PREEMPT_RT.
[1] https://lore.kernel.org/all/20200716201923.228696399@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Teach the memory hotplug path to tear down KASAN shadow that
was mapped during early boot when a memory block is offlined.
Track for each sclp_mem whether its range was covered by the early
KASAN shadow via an early_shadow_mapped flag. When such a block is
deconfigured and removed via sclp_config_mem_store(), compute the
corresponding shadow range and call vmemmap_free() to unmap the
boot mapped shadow, then clear the flag.
Using vmemmap_free() for the early shadow is safe despite the use
of large mappings in the boot-time KASAN setup. The initial shadow
is mapped with 1M and 2G pages, where possible. The minimum hotplug
memory block size is 128M and always aligned (the identity mapping
is at least 2G aligned), which corresponds to a 16M chunk of at
least 1M aligned shadow. PMD-mapped 1M shadow pages therefore
never need splitting, and PUD-mapped 2G shadow pages can now be
split following the preceding changes.
Relax the modify_pagetable() sanity check in vmem so that, with
KASAN enabled, it may also operate on the KASAN shadow region in
addition to the 1:1 mapping and vmemmap area. This allows the KASAN
shadow unmapping to reuse the common vmem helpers.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Export split_pud_page() so it can be used from the vmem code and teach
modify_pud_table() to split PUD-sized mappings when only a subrange
needs to be removed.
If the range to be removed covers a full PUD-sized mapping, keep the
existing behavior: clear the PUD entry and free the backing large page
(for non-direct mappings). Otherwise, split the PUD-mapped page into
PMD mappings and let the walker handle the smaller ranges.
This is needed for KASAN early shadow removal support: memory hotplug
freeing the KASAN early shadow is the only expected caller that will
try to free 2G PUD-mapped regions of non-direct mappings.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Make boot_pte_alloc() always allocate a full PAGE_SIZE page for
PTE tables, instead of carving two 2K PTE tables out of a single
4K page, similar to commit daa8af80d283 ("s390/mm: Allocate page
table with PAGE_SIZE granularity").
This mirrors the change in the vmem code and ensures that boot page
tables backing the early KASAN shadow can later be fully freed by
the vmem page-table teardown helpers (e.g. when unmapping early
KASAN shadow on memory hotplug).
The leftover-based allocation was originally added to reduce physmem
allocator fragmentation when EDAT was disabled. On current hardware
EDAT1 is available on all production systems, so the complexity is no
longer justified and gets in the way of freeing the shadow mappings.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
ALSA 0.9.0-rc3 is from 2002, 23 years old.
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Acked-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/20251203-old-alsa-v1-1-ac80704f52c3@ixit.cz
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial updates from Greg KH:
"Here is the big set of tty/serial driver changes for 6.19-rc1. Nothing
major at all, just small constant churn to make the tty layer
"cleaner" as well as serial driver updates and even a new test added!
Included in here are:
- More tty/serial cleanups from Jiri
- tty tiocsti test added to hopefully ensure we don't regress in this
area again
- sc16is7xx driver updates
- imx serial driver updates
- 8250 driver updates
- new hardware device ids added
- other minor serial/tty driver cleanups and tweaks
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (60 commits)
serial: sh-sci: Fix deadlock during RSCI FIFO overrun error
dt-bindings: serial: rsci: Drop "uart-has-rtscts: false"
LoongArch: dts: Add uart new compatible string
serial: 8250: Add Loongson uart driver support
dt-bindings: serial: 8250: Add Loongson uart compatible
serial: 8250: add driver for KEBA UART
serial: Keep rs485 settings for devices without firmware node
serial: qcom-geni: Enable Serial on SA8255p Qualcomm platforms
serial: qcom-geni: Enable PM runtime for serial driver
serial: sprd: Return -EPROBE_DEFER when uart clock is not ready
tty: serial: samsung: Declare earlycon for Exynos850
serial: icom: Convert PCIBIOS_* return codes to errnos
serial: 8250-of: Fix style issues in 8250_of.c
serial: add support of CPCI cards
serial: mux: Fix kernel doc for mux_poll()
tty: replace use of system_unbound_wq with system_dfl_wq
serial: 8250_platform: simplify IRQF_SHARED handling
serial: 8250: make share_irqs local to 8250_platform
serial: 8250: move skip_txen_test to core
serial: drop SERIAL_8250_DEPRECATED_OPTIONS
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver updates from Greg KH:
"Here is the big set of char/misc/iio driver updates for 6.19-rc1. Lots
of stuff in here including:
- lots of IIO driver updates, cleanups, and additions
- large interconnect driver changes as they get converted over to a
dynamic system of ids
- coresight driver updates
- mwave driver updates
- binder driver updates and changes
- comedi driver fixes now that the fuzzers are being set loose on
them
- nvmem driver updates
- new uio driver addition
- lots of other small char/misc driver updates, full details in the
shortlog
All of these have been in linux-next for a while now"
* tag 'char-misc-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (304 commits)
char: applicom: fix NULL pointer dereference in ac_ioctl
hangcheck-timer: fix coding style spacing
hangcheck-timer: Replace %Ld with %lld
hangcheck-timer: replace printk(KERN_CRIT) with pr_crit
uio: Add SVA support for PCI devices via uio_pci_generic_sva.c
dt-bindings: slimbus: fix warning from example
intel_th: Fix error handling in intel_th_output_open
misc: rp1: Fix an error handling path in rp1_probe()
char: xillybus: add WQ_UNBOUND to alloc_workqueue users
misc: bh1770glc: use pm_runtime_resume_and_get() in power_state_store
misc: cb710: Fix a NULL vs IS_ERR() check in probe()
mux: mmio: Add suspend and resume support
virt: acrn: split acrn_mmio_dev_res out of acrn_mmiodev
greybus: gb-beagleplay: Fix timeout handling in bootloader functions
greybus: add WQ_PERCPU to alloc_workqueue users
char/mwave: drop typedefs
char/mwave: drop printk wrapper
char/mwave: remove printk tracing
char/mwave: remove unneeded fops
char/mwave: remove MWAVE_FUTZ_WITH_OTHER_DEVICES ifdeffery
...
|