| Age | Commit message (Collapse) | Author |
|
Conversion performed via this Coccinelle script:
// SPDX-License-Identifier: GPL-2.0-only
// Options: --include-headers-for-types --all-includes --include-headers --keep-comments
virtual patch
@gfp depends on patch && !(file in "tools") && !(file in "samples")@
identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
kzalloc_obj,kzalloc_objs,kzalloc_flex,
kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
@@
ALLOC(...
- , GFP_KERNEL
)
$ make coccicheck MODE=patch COCCI=gfp.cocci
Build and boot tested x86_64 with Fedora 42's GCC and Clang:
Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some
selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
EFI volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: lib: optimize strlen loop efficiency
selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
selftests: riscv: verify ptrace accepts valid vector csr values
selftests: riscv: verify ptrace rejects invalid vector csr inputs
selftests: riscv: verify syscalls discard vector context
selftests: riscv: verify initial vector state with ptrace
selftests: riscv: test ptrace vector interface
riscv: ptrace: validate input vector csr registers
riscv: csr: define vtype register elements
riscv: vector: init vector context with proper vlenb
riscv: ptrace: return ENODATA for inactive vector extension
kselftest/riscv: add kselftest for user mode CFI
riscv: add documentation for shadow stack
riscv: add documentation for landing pad / indirect branch tracking
riscv: create a Kconfig fragment for shadow stack and landing pad support
arch/riscv: add dual vdso creation logic and select vdso based on hw
arch/riscv: compile vdso with landing pad and shadow stack note
riscv: enable kernel access to shadow stack memory via the FWFT SBI call
riscv: add kernel command line option to opt out of user CFI
riscv/hwprobe: add zicfilp / zicfiss enumeration in hwprobe
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- A nice cleanup to the paravirt code containing a unification of the
paravirt clock interface, taming the include hell by splitting the
pv_ops structure and removing of a bunch of obsolete code (Juergen
Gross)
* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
x86/paravirt: Remove trailing semicolons from alternative asm templates
x86/pvlocks: Move paravirt spinlock functions into own header
x86/paravirt: Specify pv_ops array in paravirt macros
x86/paravirt: Allow pv-calls outside paravirt.h
objtool: Allow multiple pv_ops arrays
x86/xen: Drop xen_mmu_ops
x86/xen: Drop xen_cpu_ops
x86/xen: Drop xen_irq_ops
x86/paravirt: Move pv_native_*() prototypes to paravirt.c
x86/paravirt: Introduce new paravirt-base.h header
x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
x86/paravirt: Use common code for paravirt_steal_clock()
riscv/paravirt: Use common code for paravirt_steal_clock()
loongarch/paravirt: Use common code for paravirt_steal_clock()
arm64/paravirt: Use common code for paravirt_steal_clock()
arm/paravirt: Use common code for paravirt_steal_clock()
sched: Move clock related paravirt code to kernel/sched
paravirt: Remove asm/paravirt_api_clock.h
x86/paravirt: Move thunk macros to paravirt_types.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
"x86 PMU driver updates:
- Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
(Dapeng Mi)
Compared to previous iterations of the Intel PMU code, there's been
a lot of changes, which center around three main areas:
- Introduce the OFF-MODULE RESPONSE (OMR) facility to replace the
Off-Core Response (OCR) facility
- New PEBS data source encoding layout
- Support the new "RDPMC user disable" feature
- Likewise, a large series adds uncore PMU support for Intel Diamond
Rapids (DMR) CPUs (Zide Chen)
This centers around these four main areas:
- DMR may have two Integrated I/O and Memory Hub (IMH) dies,
separate from the compute tile (CBB) dies. Each CBB and each IMH
die has its own discovery domain.
- Unlike prior CPUs that retrieve the global discovery table
portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
discovery and MSR for CBB PMON discovery.
- DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA, UBR,
PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.
- IIO free-running counters in DMR are MMIO-based, unlike SPR.
- Also add support for Add missing PMON units for Intel Panther Lake,
and support Nova Lake (NVL), which largely maps to Panther Lake.
(Zide Chen)
- KVM integration: Add support for mediated vPMUs (by Kan Liang and
Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
Sandipan Das and Mingwei Zhang)
- Add Intel cstate driver to support for Wildcat Lake (WCL) CPUs,
which are a low-power variant of Panther Lake (Zide Chen)
- Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
(aka MaxLinear Lightning Mountain), which maps to the existing
Airmont code (Martin Schiller)
Performance enhancements:
- Speed up kexec shutdown by avoiding unnecessary cross CPU calls
(Jan H. Schönherr)
- Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)
User-space stack unwinding support:
- Various cleanups and refactorings in preparation to generalize the
unwinding code for other architectures (Jens Remus)
Uprobes updates:
- Transition from kmap_atomic to kmap_local_page (Keke Ming)
- Fix incorrect lockdep condition in filter_chain() (Breno Leitao)
- Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)
Misc fixes and cleanups:
- s390: Remove kvm_types.h from Kbuild (Randy Dunlap)
- x86/intel/uncore: Convert comma to semicolon (Chen Ni)
- x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)
- x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"
* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
s390: remove kvm_types.h from Kbuild
uprobes: Fix incorrect lockdep condition in filter_chain()
x86/ibs: Fix typo in dc_l2tlb_miss comment
x86/uprobes: Fix XOL allocation failure for 32-bit tasks
perf/x86/intel/uncore: Convert comma to semicolon
perf/x86/intel: Add support for rdpmc user disable feature
perf/x86: Use macros to replace magic numbers in attr_rdpmc
perf/x86/intel: Add core PMU support for Novalake
perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
perf/x86/intel: Add core PMU support for DMR
perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
perf/core: Fix slow perf_event_task_exit() with LBR callstacks
perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
uprobes: use kmap_local_page() for temporary page mappings
arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
perf/x86/intel/uncore: Add Nova Lake support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
- Quirk the broken EFI framebuffer geometry on the Valve Steam Deck
- Capture the EDID information of the primary display also on non-x86
EFI systems when booting via the EFI stub.
* tag 'efi-next-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: Support EDID information
sysfb: Move edid_info into sysfb_primary_display
sysfb: Pass sysfb_primary_display to devices
sysfb: Replace screen_info with sysfb_primary_display
sysfb: Add struct sysfb_display_info
efi: sysfb_efi: Reduce number of references to global screen_info
efi: earlycon: Reduce number of references to global screen_info
efi: sysfb_efi: Fix efidrmfb and simpledrmfb on Valve Steam Deck
efi: sysfb_efi: Convert swap width and height quirk to a callback
efi: sysfb_efi: Fix lfb_linelength calculation when applying quirks
efi: sysfb_efi: Replace open coded swap with the macro
|
|
Add strict validation for vector csr registers when setting them via
ptrace:
- reject attempts to set reserved bits or invalid field combinations
- enforce strict VL checks against calculated VLMAX values
Vector specs 0.7.1 and 1.0 allow normal applications to set candidate
VL values and read back the hardware-adjusted results, see section 6
for details. Disallow such flexibility in vector ptrace operations
and strictly enforce valid VL input.
The traced process may not update its saved vector context if no vector
instructions execute between breakpoints. So the purpose of the strict
ptrace approach is to make sure that debuggers maintain an accurate view
of the tracee's vector context across multiple halt/resume debug cycles.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-5-geomatsi@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The vstate in thread_struct is zeroed when the vector context is
initialized. That includes read-only register vlenb, which holds
the vector register length in bytes. Zeroed state persists until
mstatus.VS becomes 'dirty' and a context switch saves the actual
hardware values.
This can expose the zero vlenb value to the user-space in early
debug scenarios, e.g. when ptrace attaches to a traced process
early, before any vector instruction except the first one was
executed.
Fix this by specifying proper vlenb on vector context init.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-3-geomatsi@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Currently, ptrace returns EINVAL when the vector extension is supported
but not yet activated for the traced process. This error code is not
always appropriate since the ptrace arguments may be valid.
Debug tools like gdbserver expect ENODATA when the requested register
set is not active, e.g. see [1]. This expectation seems to be more
appropriate, so modify the vector ptrace implementation to return:
- EINVAL when V extension is not supported
- ENODATA when V extension is supported but not active
[1] https://github.com/bminor/binutils-gdb/blob/637f25e88675fa47e47f9cc5e2cf37384836b8a2/gdbserver/linux-low.cc#L5020
Signed-off-by: Ilya Mamay <mmamayka01@gmail.com>
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Reviewed-by: Andy Chiu <andybnac@gmail.com>
Tested-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251214163537.1054292-2-geomatsi@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Shadow stack instructions are taken from the Zimop ISA extension,
which is mandated on RVA23. Any userspace with shadow stack
instructions in it will fault on hardware that doesn't have support
for Zimop. Thus, a shadow stack-enabled userspace can't be run on
hardware that doesn't support Zimop.
It's not known how Linux userspace providers will respond to this kind
of binary fragmentation. In order to keep kernel portable across
different hardware, 'arch/riscv/kernel/vdso_cfi' is created which has
Makefile logic to compile 'arch/riscv/kernel/vdso' sources with CFI
flags, and 'arch/riscv/kernel/vdso.c' is modified to select the
appropriate vdso depending on whether the underlying CPU implements
the Zimop extension. Since the offset of vdso symbols will change due
to having two different vdso binaries, there is added logic to include
a new generated vdso offset header and dynamically select the offset
(like for rt_sigreturn).
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Acked-by: Charles Mirabile <cmirabil@redhat.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-24-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
User mode tasks compiled with Zicfilp may call indirectly into the
vdso (like hwprobe indirect calls). Add support for compiling landing
pads into the vdso. Landing pad instructions in the vdso will be
no-ops for tasks which have not enabled landing pads. Furthermore, add
support for the C sources of the vdso to be compiled with shadow stack
and landing pads enabled as well.
Landing pad and shadow stack instructions are emitted only when the
VDSO_CFI cflags option is defined during compile.
Signed-off-by: Jim Shu <jim.shu@sifive.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-23-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description, issues reported by checkpatch]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The kernel has to perform shadow stack operations on the user shadow stack.
During signal delivery and sigreturn, the shadow stack token must be
created and validated respectively. Thus shadow stack access for the kernel
must be enabled.
In the future, when kernel shadow stacks are enabled, they must be
enabled as early as possible for better coverage and to prevent any
imbalance between the regular stack and the shadow stack. After
'relocate_enable_mmu' has completed, this is the earliest that it can
be enabled.
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-22-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned up commit message]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Add a kernel command line option to disable part or all
of user CFI. User backward CFI and forward CFI can be controlled
independently. The kernel command line parameter "riscv_nousercfi" can
take the following values:
- "all" : Disable forward and backward cfi both
- "bcfi" : Disable backward cfi
- "fcfi" : Disable forward cfi
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-21-b55691eacf4f@rivosinc.com
[pjw@kernel.org: fixed warnings from checkpatch; cleaned up patch description, doc, printk text]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Add enumeration of the zicfilp and zicfiss extensions in the hwprobe syscall.
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-20-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; extend into RISCV_HWPROBE_KEY_IMA_EXT_1; clean patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
We've run out of bits to describe RISC-V ISA extensions in our initial
hwprobe key, RISCV_HWPROBE_KEY_IMA_EXT_0. So, let's add
RISCV_HWPROBE_KEY_IMA_EXT_1, along with the framework to set the
appropriate hwprobe tuple, and add testing for it.
Based on a suggestion from Andrew Jones <andrew.jones@oss.qualcomm.com>,
also fix the documentation for RISCV_HWPROBE_KEY_IMA_EXT_0.
Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Expose a new register type NT_RISCV_USER_CFI for risc-v CFI status and
state. Intentionally, both landing pad and shadow stack status and
state are rolled into the CFI state. Creating two different
NT_RISCV_USER_XXX would not be useful and would waste a note
type. Enabling, disabling and locking the CFI feature is not allowed
via ptrace set interface. However, setting 'elp' state or setting
shadow stack pointer are allowed via the ptrace set interface. It is
expected that 'gdb' might need to fixup 'elp' state or 'shadow stack'
pointer.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-19-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned patch description and comments; addressed checkpatch issues]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Update __show_regs() to print the captured shadow stack pointer. On
tasks where shadow stack is disabled, simply print 0.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-18-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Save the shadow stack pointer in the sigcontext structure when
delivering a signal. Restore the shadow stack pointer from sigcontext
on sigreturn.
As part of the save operation, the kernel uses the 'ssamoswap'
instruction to save a snapshot of the current shadow stack on the
shadow stack itself (this can be called a "save token"). During
restore on sigreturn, the kernel retrieves the save token from the top
of the shadow stack and validates it. This ensures that user mode
can't arbitrarily pivot to any shadow stack address without having a
token and thus provides a strong security assurance during the window
between signal delivery and sigreturn.
Use an ABI-compatible way of saving/restoring the shadow stack pointer
into the signal stack. This follows the vector extension, where extra
registers are placed in a form of extension header + extension body in
the stack. The extension header indicates the size of the extra
architectural states plus the size of header itself, and a magic
identifier for the extension. Then, the extension body contains the
new architectural states in the form defined by uapi.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de>
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-17-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned patch description, code comments; resolved checkpatch warning]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The Zicfiss and Zicfilp extensions introduce a new exception, the
'software check exception', in the privileged ISA, with cause code =
18. This patch implements support for software check exceptions.
Additionally, the patch implements a CFI violation handler which
checks the code in the xtval register. If xtval=2, the software check
exception happened because of an indirect branch that didn't land on a
4 byte aligned PC or on a 'lpad' instruction, or the label value
embedded in 'lpad' didn't match the label value set in the x7
register. If xtval=3, the software check exception happened due to a
mismatch between the link register (x1 or x5) and the top of shadow
stack (on execution of `sspopchk`).
In case of a CFI violation, SIGSEGV is raised with code=SEGV_CPERR.
SEGV_CPERR was introduced by the x86 shadow stack patches.
To keep uprobes working, handle the uprobe event first before
reporting the CFI violation in the software check exception
handler. This is because, when the landing pad is activated, if the
uprobe point is set at the lpad instruction at the beginning of a
function, the system triggers a software check exception instead of an
ebreak exception due to the exception priority. This would prevent
uprobe from working.
Reviewed-by: Zong Li <zong.li@sifive.com>
Co-developed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-15-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up the patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
This patch adds a RISC-V implementation of the following prctls:
PR_SET_INDIR_BR_LP_STATUS, PR_GET_INDIR_BR_LP_STATUS and
PR_LOCK_INDIR_BR_LP_STATUS.
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de>
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-14-b55691eacf4f@rivosinc.com
[pjw@kernel.org: clean up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Implement an architecture-agnostic prctl() interface for setting and
getting shadow stack status. The prctls implemented are
PR_GET_SHADOW_STACK_STATUS, PR_SET_SHADOW_STACK_STATUS and
PR_LOCK_SHADOW_STACK_STATUS.
As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only
PR_SHADOW_STACK_ENABLE is implemented because RISCV allows each mode to
write to their own shadow stack using 'sspush' or 'ssamoswap'.
PR_LOCK_SHADOW_STACK_STATUS locks the current shadow stack enablement
configuration.
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-12-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Userspace specifies CLONE_VM to share address space and spawn new
thread. 'clone' allows userspace to specify a new stack for a new
thread. However there is no way to specify a new shadow stack base
address without changing the API. This patch allocates a new shadow
stack whenever CLONE_VM is given.
In case of CLONE_VFORK, the parent is suspended until the child
finishes; thus the child can use the parent's shadow stack. In case of
!CLONE_VM, COW kicks in because entire address space is copied from
parent to child.
'clone3' is extensible and can provide mechanisms for specifying the
shadow stack as an input parameter. This is not settled yet and is
being extensively discussed on the mailing list. Once that's settled,
this code should be adapted.
Reviewed-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-11-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall") the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for shadow stack memory since they lead to windows
where memory is allocated but not yet protected or stacks which are not
properly and safely initialised. Instead a new syscall map_shadow_stack()
has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require
tokens to be setup by kernel because user mode can do that by
itself. However to provide compatibility and portability with other
architectues, user mode can specify token set flag.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-10-b55691eacf4f@rivosinc.com
Link: https://lore.kernel.org/linux-riscv/aXfRPJvoSsOW8AwM@debug.ba.rivosinc.com/
[pjw@kernel.org: added allocate_shadow_stack() fix per Deepak; fixed bug found by sparse]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
'arch_calc_vm_prot_bits' is implemented on risc-v to return VM_READ |
VM_WRITE if PROT_WRITE is specified. Similarly 'riscv_sys_mmap' is
updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ).
This is to make sure that any existing apps using PROT_WRITE still work.
Earlier 'protection_map[VM_WRITE]' used to pick read-write PTE encodings.
Now 'protection_map[VM_WRITE]' will always pick PAGE_SHADOWSTACK PTE
encodings for shadow stack. The above changes ensure that existing apps
continue to work because underneath, the kernel will be picking
'protection_map[VM_WRITE|VM_READ]' PTE encodings.
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-6-b55691eacf4f@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Carve out space in the RISC-V architecture-specific thread struct for
cfi status and shadow stack in usermode.
This patch:
- defines a new structure cfi_status with status bit for cfi feature
- defines shadow stack pointer, base and size in cfi_status structure
- defines offsets to new member fields in thread in asm-offsets.c
- saves and restores shadow stack pointer on trap entry (U --> S) and exit
(S --> U)
Shadow stack save/restore is gated on feature availability and is
implemented using alternatives. CSR_SSP can be context-switched in
'switch_to' as well, but as soon as kernel shadow stack support gets
rolled in, the shadow stack pointer will need to be switched at trap
entry/exit point (much like 'sp'). It can be argued that a kernel
using a shadow stack deployment scenario may not be as prevalent as
user mode using this feature. But even if there is some minimal
deployment of kernel shadow stack, that means that it needs to be
supported. Thus save/restore of shadow stack pointer is implemented
in entry.S instead of in 'switch_to.h'.
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-5-b55691eacf4f@rivosinc.com
[pjw@kernel.org: cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
This patch adds support for detecting the RISC-V ISA extensions
Zicfiss and Zicfilp. Zicfiss and Zicfilp stand for the unprivileged
integer spec extensions for shadow stack and indirect branch tracking,
respectively.
This patch looks for Zicfiss and Zicfilp in the device tree and
accordingly lights up the corresponding bits in the cpu feature
bitmap. Furthermore this patch adds detection utility functions to
return whether shadow stack or landing pads are supported by the cpu.
Reviewed-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Andreas Korb <andreas.korb@aisec.fraunhofer.de> # QEMU, custom CVA6
Tested-by: Valentin Haudiquet <valentin.haudiquet@canonical.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-3-b55691eacf4f@rivosinc.com
[pjw@kernel.org: updated to apply; cleaned up patch description]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Clean up a few warnings reported by sparse in
arch/riscv/kernel/signal.c. These come from code that was added
recently; they were missed when I initially reviewed the patch.
Fixes: 818d78ba1b3f ("riscv: signal: abstract header saving for setup_sigcontext")
Cc: Andy Chiu <andybnac@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601171848.ydLTJYrz-lkp@intel.com/
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
On RV32, updating the 64-bit stimecmp (or vstimecmp) CSR requires two
separate 32-bit writes. A race condition exists if the timer triggers
during these two writes.
The RISC-V Privileged Specification (e.g., Section 3.2.1 for mtimecmp)
recommends a specific 3-step sequence to avoid spurious interrupts
when updating 64-bit comparison registers on 32-bit systems:
1. Set the low-order bits (stimecmp) to all ones (ULONG_MAX).
2. Set the high-order bits (stimecmph) to the desired value.
3. Set the low-order bits (stimecmp) to the desired value.
Current implementation writes the LSB first without ensuring a future
value, which may lead to a transient state where the 64-bit comparison
is incorrectly evaluated as "expired" by the hardware. This results in
spurious timer interrupts.
This patch adopts the spec-recommended 3-step sequence to ensure the
intermediate 64-bit state is never smaller than the current time.
Fixes: ffef54ad4110 ("riscv: Add stimecmp save and restore")
Signed-off-by: Naohiko Shimizu <naohiko.shimizu@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://patch.msgid.link/20260104135938.524-4-naohiko.shimizu@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Remove the arch specific variant of paravirt_steal_clock() and use
the common one instead.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-11-jgross@suse.com
|
|
Paravirt clock related functions are available in multiple archs.
In order to share the common parts, move the common static keys
to kernel/sched/ and remove them from the arch specific files.
Make a common paravirt_steal_clock() implementation available in
kernel/sched/cputime.c, guarding it with a new config option
CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch
in case it wants to use that common variant.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
|
|
If sbi_ecall.c's functions are traceable,
echo "__sbi_ecall:snapshot" > /sys/kernel/tracing/set_ftrace_filter
may get the kernel into a deadlock.
(Functions in sbi_ecall.c are excluded from tracing if
CONFIG_RISCV_ALTERNATIVE_EARLY is set.)
__sbi_ecall triggers a snapshot of the ringbuffer. The snapshot code
raises an IPI interrupt, which results in another call to __sbi_ecall
and another snapshot...
All it takes to get into this endless loop is one initial __sbi_ecall.
On RISC-V systems without SSTC extension, the clock events in
timer-riscv.c issue periodic sbi ecalls, making the problem easy to
trigger.
Always exclude the sbi_ecall.c functions from tracing to fix the
potential deadlock.
sbi ecalls can easiliy be logged via trace events, excluding ecall
functions from function tracing is not a big limitation.
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Link: https://patch.msgid.link/20251223135043.1336524-1-martin@kaiser.cx
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The print in sbi_cpu_stop() assumes smp_processor_id() returns an
unsigned int, when it is actually an int. Fix the format string to
avoid mismatch type warnings in rht pr_crit().
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://patch.msgid.link/20260102145839.657864-1-ben.dooks@codethink.co.uk
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Replace deprecated kmap_atomic() with kmap_local_page().
Signed-off-by: Keke Ming <ming.jvle@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://patch.msgid.link/20260103084243.195125-2-ming.jvle@gmail.com
|
|
Fix the reference to 'boot-image-header.rst', which was moved to
'Documentation/arch/riscv/' in commit 'ed843ae947f8'
("docs: move riscv under arch").
Signed-off-by: Soham Metha <sohammetha01@gmail.com>
Link: https://patch.msgid.link/20251203194355.63265-1-sohammetha01@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The Zk extension is a bundle consisting of Zkn, Zkr, and Zkt. The Zkn
extension itself is a bundle consisting of Zbkb, Zbkc, Zbkx, Zknd, Zkne,
and Zknh.
The current implementation of riscv_zk_bundled_exts manually listed
the dependencies but missed RISCV_ISA_EXT_ZKNH.
Fix this by introducing a RISCV_ISA_EXT_ZKN macro that lists the Zkn
components and using it in both riscv_zk_bundled_exts and
riscv_zkn_bundled_exts.
This adds the missing Zknh extension to Zk and reduces code duplication.
Fixes: 0d8295ed975b ("riscv: add ISA extension parsing for scalar crypto")
Link: https://patch.msgid.link/20231114141256.126749-4-cleger@rivosinc.com/
Signed-off-by: Guodong Xu <guodong@riscstar.com>
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://patch.msgid.link/20251223-zk-missing-zknh-v1-1-b627c990ee1a@riscstar.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Clang misinterprets the placement of test_kprobes_addresses and
test_kprobes_functions arrays when they are not explicitly assigned
to a data section. This can lead to kmalloc_array() allocation
errors and KUnit failures.
When testing the Clang-compiled code in QEMU, this warning was emitted:
WARNING: CPU: 1 PID: 3000 at mm/page_alloc.c:5159 __alloc_frozen_pages_noprof+0xe6/0x2fc mm/page_alloc.c:5159
Further investigation revealed that the test_kprobes_addresses array
appeared to have over 100,000 elements, including invalid addresses;
whereas, according to test-kprobes-asm.S, test_kprobes_addresses
should only have 25 elements.
When compiling the kernel with GCC, the kernel boots correctly.
This patch fixes the issue by adding .section .rodata to explicitly
place arrays in the read-only data segment.
For detailed debug and analysis, see:
https://github.com/j1akai/temp/blob/main/20251113/readme.md
v1 -> v2:
- Drop changes to .align, and .globl.
Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn>
Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com>
Link: https://patch.msgid.link/738dd4e2.ff73.19a7cd7b4d5.Coremail.xujiakai2025@iscas.ac.cn
Link: https://github.com/llvm/llvm-project/issues/168308
Link: https://patch.msgid.link/20251226032317.1523764-1-jiakaiPeanut@gmail.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The syscall number is a user-controlled value used to index into the
syscall table. Use array_index_nospec() to clamp this value after the
bounds check to prevent speculative out-of-bounds access and subsequent
data leakage via cache side channels.
Signed-off-by: Lukas Gerlach <lukas.gerlach@cispa.de>
Link: https://patch.msgid.link/20251218191332.35849-3-lukas.gerlach@cispa.de
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Export Zilsd and Zclsd ISA extensions through hwprobe.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://patch.msgid.link/20250826162939.1494021-4-pincheng.plct@isrc.iscas.ac.cn
[pjw@kernel.org: fixed whitespace; updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Add parsing for Zilsd and Zclsd ISA extensions which were ratified in
commit f88abf1 ("Integrating load/store pair for RV32 with the
main manual") of the riscv-isa-manual.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Nutty Liu <nutty.liu@hotmail.com>
Link: https://patch.msgid.link/20250826162939.1494021-3-pincheng.plct@isrc.iscas.ac.cn
[pjw@kernel.org: cleaned up checkpatch issues, whitespace; updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
The function save_v_state() served two purposes. First, it saved
extension context into the signal stack. Then, it constructed the
extension header if there was no fault. The second part is independent
of the extension itself. As a result, we can pull that part out, so
future extensions may reuse it. This patch adds arch_ext_list and makes
setup_sigcontext() go through all possible extensions' save() callback.
The callback returns a positive value indicating the size of the
successfully saved extension. Then the kernel proceeds to construct the
header for that extension. The kernel skips an extension if it does
not exist, or if the saving fails for some reasons. The error code is
propagated out on the later case.
This patch does not introduce any functional changes.
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://patch.msgid.link/20251112-v5_user_cfi_series-v23-16-b55691eacf4f@rivosinc.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Replace the global screen_info with sysfb_primary_display of type
struct sysfb_display_info. Adapt all users of screen_info.
Instances of screen_info are defined for x86, loongarch and EFI,
with only one instance compiled into a specific build. Replace all
of them with sysfb_primary_display.
All existing users of screen_info are updated by pointing them to
sysfb_primary_display.screen instead. This introduces some churn to
the code, but has no impact on functionality.
Boot parameters and EFI config tables are unchanged. They transfer
screen_info as before. The logic in EFI's alloc_screen_info() changes
slightly, as it now returns the screen field of sysfb_primary_display.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/
Reviewed-by: Richard Lyu <richard.lyu@suse.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Enable parallel hotplug for RISC-V
- Optimize vector regset allocation for ptrace()
- Add a kernel selftest for the vector ptrace interface
- Enable the userspace RAID6 test to build and run using RISC-V vectors
- Add initial support for the Zalasr RISC-V ratified ISA extension
- For the Zicbop RISC-V ratified ISA extension to userspace, expose
hardware and kernel support to userspace and add a kselftest for
Zicbop
- Convert open-coded instances of 'asm goto's that are controlled by
runtime ALTERNATIVEs to use riscv_has_extension_{un,}likely(),
following arm64's alternative_has_cap_{un,}likely()
- Remove an unnecessary mask in the GFP flags used in some calls to
pagetable_alloc()
* tag 'riscv-for-linus-6.19-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
selftests/riscv: Add Zicbop prefetch test
riscv: hwprobe: Expose Zicbop extension and its block size
riscv: Introduce Zalasr instructions
riscv: hwprobe: Export Zalasr extension
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: Add ISA extension parsing for Zalasr
selftests: riscv: Add test for the Vector ptrace interface
riscv: ptrace: Optimize the allocation of vector regset
raid6: test: Add support for RISC-V
raid6: riscv: Allow code to be compiled in userspace
raid6: riscv: Prevent compiler from breaking inline vector assembly code
riscv: cmpxchg: Use riscv_has_extension_likely
riscv: bitops: Use riscv_has_extension_likely
riscv: hweight: Use riscv_has_extension_likely
riscv: checksum: Use riscv_has_extension_likely
riscv: pgtable: Use riscv_has_extension_unlikely
riscv: Remove __GFP_HIGHMEM masking
RISC-V: Enable HOTPLUG_PARALLEL for secondary CPUs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
|
|
The Svrsw60t59b extension allows to free the PTE reserved bits 60 and 59
for software to use.
Link: https://lkml.kernel.org/r/20251113072806.795029-4-zhangchunyan@iscas.ac.cn
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor.dooley@microchip.com>
Cc: Conor Dooley <conor@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
- Add `RISCV_HWPROBE_EXT_ZICBOP` to report the presence of the
Zicbop extension.
- Add `RISCV_HWPROBE_KEY_ZICBOP_BLOCK_SIZE` to expose the block
size (in bytes) when Zicbop is supported.
- Update hwprobe.rst to document the new extension bit and block
size key, following the existing Zicbom/Zicboz style.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Link: https://patch.msgid.link/20251118162436.15485-2-zihong.plct@isrc.iscas.ac.cn
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Export the Zalasr extension to userspace using hwprobe.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://patch.msgid.link/20251020042056.30283-4-luxu.kernel@bytedance.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Add parsing for Zalasr ISA extension.
Signed-off-by: Xu Lu <luxu.kernel@bytedance.com>
Link: https://patch.msgid.link/20251020042056.30283-2-luxu.kernel@bytedance.com
[pjw@kernel.org: updated to apply]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|