From a14d11a0f5f4105e0df96811dfa81dc5f79fecba Mon Sep 17 00:00:00 2001 From: Andrea Parri Date: Wed, 31 Jan 2024 15:49:34 +0100 Subject: membarrier: Create Documentation/scheduler/membarrier.rst To gather the architecture requirements of the "private/global expedited" membarrier commands. The file will be expanded to integrate further information about the membarrier syscall (as needed/desired in the future). While at it, amend some related inline comments in the membarrier codebase. Suggested-by: Mathieu Desnoyers Signed-off-by: Andrea Parri Reviewed-by: Mathieu Desnoyers Link: https://lore.kernel.org/r/20240131144936.29190-3-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt --- Documentation/scheduler/index.rst | 1 + Documentation/scheduler/membarrier.rst | 39 ++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) create mode 100644 Documentation/scheduler/membarrier.rst (limited to 'Documentation') diff --git a/Documentation/scheduler/index.rst b/Documentation/scheduler/index.rst index 3170747226f6..43bd8a145b7a 100644 --- a/Documentation/scheduler/index.rst +++ b/Documentation/scheduler/index.rst @@ -7,6 +7,7 @@ Scheduler completion + membarrier sched-arch sched-bwc sched-deadline diff --git a/Documentation/scheduler/membarrier.rst b/Documentation/scheduler/membarrier.rst new file mode 100644 index 000000000000..2387804b1c63 --- /dev/null +++ b/Documentation/scheduler/membarrier.rst @@ -0,0 +1,39 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +membarrier() System Call +======================== + +MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements +===================================================================== + +Memory barriers before updating rq->curr +---------------------------------------- + +The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED +require each architecture to have a full memory barrier after coming from +user-space, before updating rq->curr. This barrier is implied by the sequence +rq_lock(); smp_mb__after_spinlock() in __schedule(). The barrier matches a full +barrier in the proximity of the membarrier system call exit, cf. +membarrier_{private,global}_expedited(). + +Memory barriers after updating rq->curr +--------------------------------------- + +The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED +require each architecture to have a full memory barrier after updating rq->curr, +before returning to user-space. The schemes providing this barrier on the various +architectures are as follows. + + - alpha, arc, arm, hexagon, mips rely on the full barrier implied by + spin_unlock() in finish_lock_switch(). + + - arm64 relies on the full barrier implied by switch_to(). + + - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by + switch_mm(), if mm is not NULL; they rely on the full barrier implied + by mmdrop(), otherwise. On powerpc and riscv, switch_mm() relies on + membarrier_arch_switch_mm(). + +The barrier matches a full barrier in the proximity of the membarrier system call +entry, cf. membarrier_{private,global}_expedited(). -- cgit v1.2.3 From cd9b29014dc69609489261efe351d0c7709ae8bf Mon Sep 17 00:00:00 2001 From: Andrea Parri Date: Wed, 31 Jan 2024 15:49:36 +0100 Subject: membarrier: riscv: Provide core serializing command RISC-V uses xRET instructions on return from interrupt and to go back to user-space; the xRET instruction is not core serializing. Use FENCE.I for providing core serialization as follows: - by calling sync_core_before_usermode() on return from interrupt (cf. ipi_sync_core()), - via switch_mm() and sync_core_before_usermode() (respectively, for uthread->uthread and kthread->uthread transitions) before returning to user-space. On RISC-V, the serialization in switch_mm() is activated by resetting the icache_stale_mask of the mm at prepare_sync_core_cmd(). Suggested-by: Palmer Dabbelt Signed-off-by: Andrea Parri Reviewed-by: Mathieu Desnoyers Link: https://lore.kernel.org/r/20240131144936.29190-5-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt --- .../sched/membarrier-sync-core/arch-support.txt | 18 +++++++++++++- MAINTAINERS | 1 + arch/riscv/Kconfig | 3 +++ arch/riscv/include/asm/membarrier.h | 19 ++++++++++++++ arch/riscv/include/asm/sync_core.h | 29 ++++++++++++++++++++++ kernel/sched/core.c | 4 +++ kernel/sched/membarrier.c | 4 +++ 7 files changed, 77 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/include/asm/sync_core.h (limited to 'Documentation') diff --git a/Documentation/features/sched/membarrier-sync-core/arch-support.txt b/Documentation/features/sched/membarrier-sync-core/arch-support.txt index d96b778b87ed..7425d2b994a3 100644 --- a/Documentation/features/sched/membarrier-sync-core/arch-support.txt +++ b/Documentation/features/sched/membarrier-sync-core/arch-support.txt @@ -10,6 +10,22 @@ # Rely on implicit context synchronization as a result of exception return # when returning from IPI handler, and when returning to user-space. # +# * riscv +# +# riscv uses xRET as return from interrupt and to return to user-space. +# +# Given that xRET is not core serializing, we rely on FENCE.I for providing +# core serialization: +# +# - by calling sync_core_before_usermode() on return from interrupt (cf. +# ipi_sync_core()), +# +# - via switch_mm() and sync_core_before_usermode() (respectively, for +# uthread->uthread and kthread->uthread transitions) before returning +# to user-space. +# +# The serialization in switch_mm() is activated by prepare_sync_core_cmd(). +# # * x86 # # x86-32 uses IRET as return from interrupt, which takes care of the IPI. @@ -43,7 +59,7 @@ | openrisc: | TODO | | parisc: | TODO | | powerpc: | ok | - | riscv: | TODO | + | riscv: | ok | | s390: | ok | | sh: | TODO | | sparc: | TODO | diff --git a/MAINTAINERS b/MAINTAINERS index 78c835cc69c9..cc80968ec355 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14041,6 +14041,7 @@ L: linux-kernel@vger.kernel.org S: Supported F: Documentation/scheduler/membarrier.rst F: arch/*/include/asm/membarrier.h +F: arch/*/include/asm/sync_core.h F: include/uapi/linux/membarrier.h F: kernel/sched/membarrier.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 087abf9e51c6..70836381b948 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -28,14 +28,17 @@ config RISCV select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV select ARCH_HAS_MEMBARRIER_CALLBACKS + select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MMIOWB select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PMEM_API + select ARCH_HAS_PREPARE_SYNC_CORE_CMD select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SET_DIRECT_MAP if MMU select ARCH_HAS_SET_MEMORY if MMU select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL + select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE select ARCH_HAS_SYSCALL_WRAPPER select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UBSAN_SANITIZE_ALL diff --git a/arch/riscv/include/asm/membarrier.h b/arch/riscv/include/asm/membarrier.h index 6c016ebb5020..47b240d0d596 100644 --- a/arch/riscv/include/asm/membarrier.h +++ b/arch/riscv/include/asm/membarrier.h @@ -22,6 +22,25 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev, /* * The membarrier system call requires a full memory barrier * after storing to rq->curr, before going back to user-space. + * + * This barrier is also needed for the SYNC_CORE command when + * switching between processes; in particular, on a transition + * from a thread belonging to another mm to a thread belonging + * to the mm for which a membarrier SYNC_CORE is done on CPU0: + * + * - [CPU0] sets all bits in the mm icache_stale_mask (in + * prepare_sync_core_cmd()); + * + * - [CPU1] stores to rq->curr (by the scheduler); + * + * - [CPU0] loads rq->curr within membarrier and observes + * cpu_rq(1)->curr->mm != mm, so the IPI is skipped on + * CPU1; this means membarrier relies on switch_mm() to + * issue the sync-core; + * + * - [CPU1] switch_mm() loads icache_stale_mask; if the bit + * is zero, switch_mm() may incorrectly skip the sync-core. + * * Matches a full barrier in the proximity of the membarrier * system call entry. */ diff --git a/arch/riscv/include/asm/sync_core.h b/arch/riscv/include/asm/sync_core.h new file mode 100644 index 000000000000..9153016da8f1 --- /dev/null +++ b/arch/riscv/include/asm/sync_core.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_RISCV_SYNC_CORE_H +#define _ASM_RISCV_SYNC_CORE_H + +/* + * RISC-V implements return to user-space through an xRET instruction, + * which is not core serializing. + */ +static inline void sync_core_before_usermode(void) +{ + asm volatile ("fence.i" ::: "memory"); +} + +#ifdef CONFIG_SMP +/* + * Ensure the next switch_mm() on every CPU issues a core serializing + * instruction for the given @mm. + */ +static inline void prepare_sync_core_cmd(struct mm_struct *mm) +{ + cpumask_setall(&mm->context.icache_stale_mask); +} +#else +static inline void prepare_sync_core_cmd(struct mm_struct *mm) +{ +} +#endif /* CONFIG_SMP */ + +#endif /* _ASM_RISCV_SYNC_CORE_H */ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a972628e7756..e4a87bcf28d4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6721,6 +6721,10 @@ static void __sched notrace __schedule(unsigned int sched_mode) * * The barrier matches a full barrier in the proximity of * the membarrier system call entry. + * + * On RISC-V, this barrier pairing is also needed for the + * SYNC_CORE command when switching between processes, cf. + * the inline comments in membarrier_arch_switch_mm(). */ ++*switch_count; diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c index 6d1f31b3a967..703e8d80a576 100644 --- a/kernel/sched/membarrier.c +++ b/kernel/sched/membarrier.c @@ -342,6 +342,10 @@ static int membarrier_private_expedited(int flags, int cpu_id) /* * Matches memory barriers after rq->curr modification in * scheduler. + * + * On RISC-V, this barrier pairing is also needed for the + * SYNC_CORE command when switching between processes, cf. + * the inline comments in membarrier_arch_switch_mm(). */ smp_mb(); /* system call entry is not a mb. */ -- cgit v1.2.3 From b88727d554f0fb826e0608192f59542497ba19c5 Mon Sep 17 00:00:00 2001 From: Yu Chien Peter Lin Date: Thu, 22 Feb 2024 16:39:40 +0800 Subject: dt-bindings: riscv: Add Andes interrupt controller compatible string Add "andestech,cpu-intc" compatible string to indicate that Andes specific local interrupt is supported on the core, e.g. AX45MP cores have 3 types of non-standard local interrupt which can be handled in supervisor mode: - Slave port ECC error interrupt - Bus write transaction error interrupt - Performance monitor overflow interrupt These interrupts are enabled/disabled via a custom register SLIE instead of the standard interrupt enable register SIE. Signed-off-by: Yu Chien Peter Lin Acked-by: Conor Dooley Reviewed-by: Lad Prabhakar Link: https://lore.kernel.org/r/20240222083946.3977135-5-peterlin@andestech.com Signed-off-by: Palmer Dabbelt --- Documentation/devicetree/bindings/riscv/cpus.yaml | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml index 9d8670c00e3b..6ccd75cbbc59 100644 --- a/Documentation/devicetree/bindings/riscv/cpus.yaml +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml @@ -106,7 +106,11 @@ properties: const: 1 compatible: - const: riscv,cpu-intc + oneOf: + - items: + - const: andestech,cpu-intc + - const: riscv,cpu-intc + - const: riscv,cpu-intc interrupt-controller: true -- cgit v1.2.3 From 61609bf2b29dcb07de3aaad7d6212cc3c341192b Mon Sep 17 00:00:00 2001 From: Yu Chien Peter Lin Date: Thu, 22 Feb 2024 16:39:44 +0800 Subject: dt-bindings: riscv: Add Andes PMU extension description Document the ISA string for Andes Technology performance monitor extension which provides counter overflow interrupt and mode filtering mechanisms. Signed-off-by: Yu Chien Peter Lin Acked-by: Conor Dooley Reviewed-by: Lad Prabhakar Link: https://lore.kernel.org/r/20240222083946.3977135-9-peterlin@andestech.com Signed-off-by: Palmer Dabbelt --- Documentation/devicetree/bindings/riscv/extensions.yaml | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation') diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml index 63d81dc895e5..468c646247aa 100644 --- a/Documentation/devicetree/bindings/riscv/extensions.yaml +++ b/Documentation/devicetree/bindings/riscv/extensions.yaml @@ -477,5 +477,12 @@ properties: latency, as ratified in commit 56ed795 ("Update riscv-crypto-spec-vector.adoc") of riscv-crypto. + - const: xandespmu + description: + The Andes Technology performance monitor extension for counter overflow + and privilege mode filtering. For more details, see Counter Related + Registers in the AX45MP datasheet. + https://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf + additionalProperties: true ... -- cgit v1.2.3 From 371a3c2055dbb8a88754902fe19aa4fcc4318c29 Mon Sep 17 00:00:00 2001 From: Charlie Jenkins Date: Tue, 30 Jan 2024 17:07:02 -0800 Subject: docs: riscv: Define behavior of mmap Define mmap on riscv to not provide an address that uses more bits than the hint address, if provided. Signed-off-by: Charlie Jenkins Link: https://lore.kernel.org/r/20240130-use_mmap_hint_address-v3-3-8a655cfa8bcb@rivosinc.com Signed-off-by: Palmer Dabbelt --- Documentation/arch/riscv/vm-layout.rst | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/arch/riscv/vm-layout.rst b/Documentation/arch/riscv/vm-layout.rst index 69ff6da1dbf8..e476b4386bd9 100644 --- a/Documentation/arch/riscv/vm-layout.rst +++ b/Documentation/arch/riscv/vm-layout.rst @@ -144,14 +144,8 @@ passing 0 into the hint address parameter of mmap. On CPUs with an address space smaller than sv48, the CPU maximum supported address space will be the default. Software can "opt-in" to receiving VAs from another VA space by providing -a hint address to mmap. A hint address passed to mmap will cause the largest -address space that fits entirely into the hint to be used, unless there is no -space left in the address space. If there is no space available in the requested -address space, an address in the next smallest available address space will be -returned. - -For example, in order to obtain 48-bit VA space, a hint address greater than -:code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace -ending at :code:`1 << 47` and the addresses beyond this are reserved for the -kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater -than or equal to :code:`1 << 56` must be provided. +a hint address to mmap. When a hint address is passed to mmap, the returned +address will never use more bits than the hint address. For example, if a hint +address of `1 << 40` is passed to mmap, a valid returned address will never use +bits 41 through 63. If no mappable addresses are available in that range, mmap +will return `MAP_FAILED`. -- cgit v1.2.3