From 38b0ece6d76374b989928021b5d310be11b99b5c Mon Sep 17 00:00:00 2001
From: Ryan Roberts
Date: Mon, 9 Jun 2025 10:27:27 +0100
Subject: mm/filemap: allow arch to request folio size for exec memory

Change the readahead config so that if it is being requested for an
executable mapping, do a synchronous read into a set of folios with an
arch-specified order and in a naturally aligned manner. We no longer
center the read on the faulting page but simply align it down to the
previous natural boundary. Additionally, we don't bother with an
asynchronous part.

On arm64 if memory is physically contiguous and naturally aligned to the
"contpte" size, we can use contpte mappings, which improves utilization
of the TLB. When paired with the "multi-size THP" feature, this works
well to reduce dTLB pressure. However iTLB pressure is still high due to
executable mappings having a low likelihood of being in the required
folio size and mapping alignment, even when the filesystem supports
readahead into large folios (e.g. XFS).

The reason for the low likelihood is that the current readahead
algorithm starts with an order-0 folio and increases the folio order by
2 every time the readahead mark is hit. But most executable memory tends
to be accessed randomly and so the readahead mark is rarely hit and most
executable folios remain order-0.

So let's special-case the read(ahead) logic for executable mappings. The
trade-off is performance improvement (due to more efficient storage of
the translations in iTLB) vs potential for making reclaim more difficult
(due to the folios being larger so if a part of the folio is hot the
whole thing is considered hot). But executable memory is a small portion
of the overall system memory so I doubt this will even register from a
reclaim perspective.

I've chosen 64K folio size for arm64 which benefits both the 4K and 16K
base page size configs.
Crucially the same amount of data is still read (usually 128K) so I'm
not expecting any read amplification issues. I don't anticipate any
write amplification because text is always RO.

Note that the text region of an ELF file could be populated into the
page cache for other reasons than taking a fault in a mmapped area. The
most common case is due to the loader read()ing the header which can be
shared with the beginning of text. So some text will still remain in
small folios, but this simple, best effort change provides good
performance improvements as is.

Confine this special-case approach to the bounds of the VMA. This
prevents wasting memory for any padding that might exist in the file
between sections. Previously the padding would have been contained in
order-0 folios and would be easy to reclaim. But now it would be part of
a larger folio so more difficult to reclaim. Solve this by simply not
reading it into memory in the first place.

Benchmarking
============

The below shows pgbench and redis benchmarks on a Graviton3 arm64
system.
First, confirmation that this patch causes more text to be contained in
64K folios:

+----------------------+---------------+---------------+---------------+
| File-backed folios by|  system boot  |    pgbench    |     redis     |
| size as percentage of+-------+-------+-------+-------+-------+-------+
| all mapped text mem  |before | after |before | after |before | after |
+======================+=======+=======+=======+=======+=======+=======+
| base-page-4kB        |   78% |   30% |   78% |   11% |   73% |   14% |
| thp-aligned-8kB      |    1% |    0% |    0% |    0% |    1% |    0% |
| thp-aligned-16kB     |   17% |    4% |   17% |    3% |   20% |    4% |
| thp-aligned-32kB     |    1% |    1% |    1% |    2% |    1% |    1% |
| thp-aligned-64kB     |    3% |   63% |    3% |   81% |    4% |   77% |
| thp-aligned-128kB    |    0% |    1% |    1% |    1% |    1% |    2% |
| thp-unaligned-64kB   |    0% |    0% |    0% |    1% |    0% |    1% |
| thp-unaligned-128kB  |    0% |    1% |    0% |    0% |    0% |    0% |
| thp-partial          |    0% |    0% |    0% |    1% |    0% |    1% |
+----------------------+-------+-------+-------+-------+-------+-------+
| cont-aligned-64kB    |    4% |   65% |    4% |   83% |    6% |   79% |
+----------------------+-------+-------+-------+-------+-------+-------+

The above shows that for both workloads (each isolated with cgroups) as
well as the general system state after boot, the amount of text backed
by 4K and 16K folios reduces and the amount backed by 64K folios
increases significantly. And the amount of text that is contpte-mapped
significantly increases (see last row).

And this is reflected in performance improvement. "(I)" indicates a
statistically significant improvement.
Note TPS and Reqs/sec are rates so bigger is better, ms is time so
smaller is better:

+-------------+-------------------------------------------+------------+
| Benchmark   | Result Class                              | Improvement|
+=============+===========================================+============+
| pts/pgbench | Scale: 1 Clients: 1 RO (TPS)              |  (I) 3.47% |
|             | Scale: 1 Clients: 1 RO - Latency (ms)     |     -2.88% |
|             | Scale: 1 Clients: 250 RO (TPS)            |  (I) 5.02% |
|             | Scale: 1 Clients: 250 RO - Latency (ms)   | (I) -4.79% |
|             | Scale: 1 Clients: 1000 RO (TPS)           |  (I) 6.16% |
|             | Scale: 1 Clients: 1000 RO - Latency (ms)  | (I) -5.82% |
|             | Scale: 100 Clients: 1 RO (TPS)            |      2.51% |
|             | Scale: 100 Clients: 1 RO - Latency (ms)   |     -3.51% |
|             | Scale: 100 Clients: 250 RO (TPS)          |  (I) 4.75% |
|             | Scale: 100 Clients: 250 RO - Latency (ms) | (I) -4.44% |
|             | Scale: 100 Clients: 1000 RO (TPS)         |  (I) 6.34% |
|             | Scale: 100 Clients: 1000 RO - Latency (ms)| (I) -5.95% |
+-------------+-------------------------------------------+------------+
| pts/redis   | Test: GET Connections: 50 (Reqs/sec)      |  (I) 3.20% |
|             | Test: GET Connections: 1000 (Reqs/sec)    |  (I) 2.55% |
|             | Test: LPOP Connections: 50 (Reqs/sec)     |  (I) 4.59% |
|             | Test: LPOP Connections: 1000 (Reqs/sec)   |  (I) 4.81% |
|             | Test: LPUSH Connections: 50 (Reqs/sec)    |  (I) 5.31% |
|             | Test: LPUSH Connections: 1000 (Reqs/sec)  |  (I) 4.36% |
|             | Test: SADD Connections: 50 (Reqs/sec)     |  (I) 2.64% |
|             | Test: SADD Connections: 1000 (Reqs/sec)   |  (I) 4.15% |
|             | Test: SET Connections: 50 (Reqs/sec)      |  (I) 3.11% |
|             | Test: SET Connections: 1000 (Reqs/sec)    |  (I) 3.36% |
+-------------+-------------------------------------------+------------+

[ryan.roberts@arm.com: fix use-after-free]
  Link: https://lkml.kernel.org/r/ea7f9da7-9a9f-4b85-9d0a-35b320f5ed25@arm.com
[ryan.roberts@arm.com: use the vma_pages() helper instead of open-coding]
  Link: https://lkml.kernel.org/r/0e0f674b-3b7e-494f-ae7a-fc9dbb98dad4@arm.com
Link: https://lkml.kernel.org/r/20250609092729.274960-6-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts
Reviewed-by: Jan Kara
Acked-by: Will Deacon
Cc: Chaitanya S Prakash
Cc: David Hildenbrand
Signed-off-by: Andrew Morton
---
 include/linux/pgtable.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'include/linux/pgtable.h')

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 0b6e1f781d86..e4a3895c043b 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -456,6 +456,17 @@ static inline bool arch_has_hw_pte_young(void)
 }
 #endif
 
+#ifndef exec_folio_order
+/*
+ * Returns preferred minimum folio order for executable file-backed memory. Must
+ * be in range [0, PMD_ORDER). Default to order-0.
+ */
+static inline unsigned int exec_folio_order(void)
+{
+	return 0;
+}
+#endif
+
 #ifndef arch_check_zapped_pte
 static inline void arch_check_zapped_pte(struct vm_area_struct *vma,
 					 pte_t pte)
-- 
cgit v1.2.3

From 78ddaa358ec4cdd60bd0e243ced1c83a52c30241 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes
Date: Wed, 18 Jun 2025 20:42:52 +0100
Subject: mm: change vm_get_page_prot() to accept vm_flags_t argument

Patch series "use vm_flags_t consistently".

The VMA flags field vma->vm_flags is of type vm_flags_t. Right now this
is exactly equivalent to unsigned long, but it should not be assumed to
be.

Much code that references vma->vm_flags already correctly uses
vm_flags_t, but a fairly large chunk of code simply uses unsigned long
and assumes that the two are equivalent.

This series corrects that and has us use vm_flags_t consistently.

This series is motivated by the desire to, in a future series, adjust
vm_flags_t to be a u64 regardless of whether the kernel is 32-bit or
64-bit in order to deal with the VMA flag exhaustion issue and avoid all
the various problems that arise from it (being unable to use certain
features in 32-bit, being unable to add new flags except for 64-bit,
etc.)

This is therefore a critical first step towards that goal.
At any rate, using the correct type is of value regardless.

We additionally take the opportunity to refer to VMA flags as vm_flags
where possible to make clear what we're referring to.

Overall, this series does not introduce any functional change.

This patch (of 3):

We abstract the type of the VMA flags to vm_flags_t; however, in many
places it is simply assumed that this is unsigned long, which is
incorrect.

At the moment this is simply an incongruity; however, in future we plan
to change this type and therefore this change is a critical requirement
for doing so.

Overall, this patch does not introduce any functional change.

[lorenzo.stoakes@oracle.com: add missing vm_get_page_prot() instance, remove include]
  Link: https://lkml.kernel.org/r/552f88e1-2df8-4e95-92b8-812f7c8db829@lucifer.local
Link: https://lkml.kernel.org/r/cover.1750274467.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/a12769720a2743f235643b158c4f4f0a9911daf0.1750274467.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes
Acked-by: Mike Rapoport (Microsoft)
Acked-by: Christian Brauner
Reviewed-by: Vlastimil Babka
Reviewed-by: Oscar Salvador
Reviewed-by: Pedro Falcato
Acked-by: Catalin Marinas [arm64]
Acked-by: Zi Yan
Acked-by: David Hildenbrand
Reviewed-by: Anshuman Khandual
Cc: Liam R. Howlett
Cc: Lorenzo Stoakes
Cc: Jann Horn
Cc: Kees Cook
Cc: Jan Kara
Cc: Jarkko Sakkinen
Signed-off-by: Andrew Morton
---
 arch/arm64/mm/mmap.c                       | 2 +-
 arch/powerpc/include/asm/book3s/64/pkeys.h | 2 +-
 arch/powerpc/mm/book3s64/pgtable.c         | 2 +-
 arch/sparc/mm/init_64.c                    | 2 +-
 arch/x86/mm/pgprot.c                       | 2 +-
 include/linux/mm.h                         | 4 ++--
 include/linux/pgtable.h                    | 2 +-
 tools/testing/vma/vma_internal.h           | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

(limited to 'include/linux/pgtable.h')

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index c86c348857c4..08ee177432c2 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -81,7 +81,7 @@ static int __init adjust_protection_map(void)
 }
 arch_initcall(adjust_protection_map);
 
-pgprot_t vm_get_page_prot(unsigned long vm_flags)
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	ptdesc_t prot;
 
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 5b178139f3c0..ff911b4251d9 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -5,7 +5,7 @@
 
 #include
 
-static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
+static inline u64 vmflag_to_pte_pkey_bits(vm_flags_t vm_flags)
 {
 	if (!mmu_has_feature(MMU_FTR_PKEY))
 		return 0x0UL;
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 0db01e10a3f8..a89ef89101fc 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -644,7 +644,7 @@ unsigned long memremap_compat_align(void)
 EXPORT_SYMBOL_GPL(memremap_compat_align);
 #endif
 
-pgprot_t vm_get_page_prot(unsigned long vm_flags)
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	unsigned long prot;
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 25ae4c897aae..7ed58bf3aaca 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -3201,7 +3201,7 @@ void copy_highpage(struct page *to, struct page *from)
 }
 EXPORT_SYMBOL(copy_highpage);
 
-pgprot_t vm_get_page_prot(unsigned long vm_flags)
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	unsigned long prot = pgprot_val(protection_map[vm_flags &
 					(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
diff --git a/arch/x86/mm/pgprot.c b/arch/x86/mm/pgprot.c
index c84bd9540b16..dc1afd5c839d 100644
--- a/arch/x86/mm/pgprot.c
+++ b/arch/x86/mm/pgprot.c
@@ -32,7 +32,7 @@ void add_encrypt_protection_map(void)
 		protection_map[i] = pgprot_encrypted(protection_map[i]);
 }
 
-pgprot_t vm_get_page_prot(unsigned long vm_flags)
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	unsigned long val = pgprot_val(protection_map[vm_flags &
 				      (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b7e2abd8ce0d..78bb177ba55f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3489,10 +3489,10 @@ static inline bool range_in_vma(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_MMU
-pgprot_t vm_get_page_prot(unsigned long vm_flags);
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags);
 void vma_set_page_prot(struct vm_area_struct *vma);
 #else
-static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
+static inline pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	return __pgprot(0);
 }
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e4a3895c043b..d05e35a0facf 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -2016,7 +2016,7 @@ typedef unsigned int pgtbl_mod_mask;
  * x: (yes) yes
  */
 #define DECLARE_VM_GET_PAGE_PROT					\
-pgprot_t vm_get_page_prot(unsigned long vm_flags)			\
+pgprot_t vm_get_page_prot(vm_flags_t vm_flags)				\
 {									\
 	return protection_map[vm_flags &				\
 			(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];	\
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index 0f013784da89..3b1b45256d56 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -576,7 +576,7 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
 	return __pgprot(pgprot_val(oldprot) | pgprot_val(newprot));
 }
 
-static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
+static inline pgprot_t vm_get_page_prot(vm_flags_t vm_flags)
 {
 	return __pgprot(vm_flags);
 }
-- 
cgit v1.2.3

From 8a6a984c2e0ea406459b445a3910a454bece3aa1 Mon Sep 17 00:00:00 2001
From: Alistair Popple
Date: Thu, 19 Jun 2025 18:57:59 +1000
Subject: mm: remove redundant pXd_devmap calls
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

DAX was the only thing that created pmd_devmap and pud_devmap entries;
however, it no longer does, as DAX pages are now refcounted normally and
pXd_trans_huge() returns true for those. Therefore checking both
pXd_devmap and pXd_trans_huge() is redundant and the former can be
removed without changing behaviour as it will always be false.

Link: https://lkml.kernel.org/r/d58f089dc16b7feb7c6728164f37dea65d64a0d3.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple
Cc: Balbir Singh
Cc: Björn Töpel
Cc: Björn Töpel
Cc: Christoph Hellwig
Cc: Chunyan Zhang
Cc: Dan Williams
Cc: David Hildenbrand
Cc: Deepak Gupta
Cc: Gerald Schaefer
Cc: Inki Dae
Cc: Jason Gunthorpe
Cc: John Groves
Cc: John Hubbard
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox (Oracle)
Cc: Will Deacon
Signed-off-by: Andrew Morton
---
 fs/dax.c                   |  5 ++---
 include/linux/huge_mm.h    | 10 ++++------
 include/linux/pgtable.h    |  2 +-
 mm/hmm.c                   |  4 ++--
 mm/huge_memory.c           | 23 +++++++++--------------
 mm/mapping_dirty_helpers.c |  4 ++--
 mm/memory.c                | 15 ++++++---------
 mm/migrate_device.c        |  2 +-
 mm/mprotect.c              |  2 +-
 mm/mremap.c                |  5 ++---
 mm/page_vma_mapped.c       |  5 ++---
 mm/pagewalk.c              |  8 +++-----
 mm/pgtable-generic.c       |  7 +++----
 mm/userfaultfd.c           |  4 ++--
 mm/vmscan.c                |  3 ---
 15 files changed, 40 insertions(+), 59 deletions(-)

(limited to 'include/linux/pgtable.h')

diff --git a/fs/dax.c b/fs/dax.c
index ea0c35794bf9..7d4ecb9d23af 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1937,7 +1937,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * the PTE we need to set up. If so just return and the fault will be
 	 * retried.
 	 */
-	if (pmd_trans_huge(*vmf->pmd) || pmd_devmap(*vmf->pmd)) {
+	if (pmd_trans_huge(*vmf->pmd)) {
 		ret = VM_FAULT_NOPAGE;
 		goto unlock_entry;
 	}
@@ -2060,8 +2060,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
 	 * the PMD we need to set up. If so just return and the fault will be
 	 * retried.
 	 */
-	if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd) &&
-			!pmd_devmap(*vmf->pmd)) {
+	if (!pmd_none(*vmf->pmd) && !pmd_trans_huge(*vmf->pmd)) {
 		ret = 0;
 		goto unlock_entry;
 	}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a2df2308cb2c..26607f2c65fb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -400,8 +400,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 #define split_huge_pmd(__vma, __pmd, __address)				\
 	do {								\
 		pmd_t *____pmd = (__pmd);				\
-		if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd)	\
-					|| pmd_devmap(*____pmd))	\
+		if (is_swap_pmd(*____pmd) || pmd_trans_huge(*____pmd))	\
 			__split_huge_pmd(__vma, __pmd, __address,	\
 						false);			\
 	} while (0)
@@ -426,8 +425,7 @@ change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 #define split_huge_pud(__vma, __pud, __address)				\
 	do {								\
 		pud_t *____pud = (__pud);				\
-		if (pud_trans_huge(*____pud)				\
-					|| pud_devmap(*____pud))	\
+		if (pud_trans_huge(*____pud))				\
 			__split_huge_pud(__vma, __pud, __address);	\
 	} while (0)
 
@@ -450,7 +448,7 @@ static inline int is_swap_pmd(pmd_t pmd)
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
-	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
+	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd))
 		return __pmd_trans_huge_lock(pmd, vma);
 	else
 		return NULL;
@@ -458,7 +456,7 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 		struct vm_area_struct *vma)
 {
-	if (pud_trans_huge(*pud) || pud_devmap(*pud))
+	if (pud_trans_huge(*pud))
 		return __pud_trans_huge_lock(pud, vma);
 	else
 		return NULL;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d05e35a0facf..ffcd966cf2d4 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1672,7 +1672,7 @@ static inline int pud_trans_unstable(pud_t *pud)
 	defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
 	pud_t pudval = READ_ONCE(*pud);
 
-	if (pud_none(pudval) || pud_trans_huge(pudval) || pud_devmap(pudval))
+	if (pud_none(pudval) || pud_trans_huge(pudval))
 		return 1;
 	if (unlikely(pud_bad(pudval))) {
 		pud_clear_bad(pud);
diff --git a/mm/hmm.c b/mm/hmm.c
index 14914da98416..62d3082dc55c 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -360,7 +360,7 @@ again:
 		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
 	}
 
-	if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
+	if (pmd_trans_huge(pmd)) {
 		/*
 		 * No need to take pmd_lock here, even if some other thread
 		 * is splitting the huge pmd we will get that event through
@@ -371,7 +371,7 @@ again:
 		 * values.
 		 */
 		pmd = pmdp_get_lockless(pmdp);
-		if (!pmd_devmap(pmd) && !pmd_trans_huge(pmd))
+		if (!pmd_trans_huge(pmd))
 			goto again;
 
 		return hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 54b5c37d9515..cf808b2eea29 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1459,8 +1459,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
 	 * but we need to be consistent with PTEs and architectures that
 	 * can't support a 'special' bit.
 	 */
-	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
-			!pfn_t_devmap(pfn));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
 	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
@@ -1596,8 +1595,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
 	 * but we need to be consistent with PTEs and architectures that
 	 * can't support a 'special' bit.
 	 */
-	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
-			!pfn_t_devmap(pfn));
+	BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)));
 	BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) ==
 						(VM_PFNMAP|VM_MIXEDMAP));
 	BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
@@ -1815,7 +1813,7 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	ret = -EAGAIN;
 	pud = *src_pud;
-	if (unlikely(!pud_trans_huge(pud) && !pud_devmap(pud)))
+	if (unlikely(!pud_trans_huge(pud)))
 		goto out_unlock;
 
 	/*
@@ -2677,8 +2675,7 @@ spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma)
 {
 	spinlock_t *ptl;
 	ptl = pmd_lock(vma->vm_mm, pmd);
-	if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) ||
-			pmd_devmap(*pmd)))
+	if (likely(is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
@@ -2695,7 +2692,7 @@ spinlock_t *__pud_trans_huge_lock(pud_t *pud, struct vm_area_struct *vma)
 	spinlock_t *ptl;
 
 	ptl = pud_lock(vma->vm_mm, pud);
-	if (likely(pud_trans_huge(*pud) || pud_devmap(*pud)))
+	if (likely(pud_trans_huge(*pud)))
 		return ptl;
 	spin_unlock(ptl);
 	return NULL;
@@ -2747,7 +2744,7 @@ static void __split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
 	VM_BUG_ON(haddr & ~HPAGE_PUD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PUD_SIZE, vma);
-	VM_BUG_ON(!pud_trans_huge(*pud) && !pud_devmap(*pud));
+	VM_BUG_ON(!pud_trans_huge(*pud));
 
 	count_vm_event(THP_SPLIT_PUD);
 
@@ -2780,7 +2777,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 		(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pud_lock(vma->vm_mm, pud);
-	if (unlikely(!pud_trans_huge(*pud) && !pud_devmap(*pud)))
+	if (unlikely(!pud_trans_huge(*pud)))
 		goto out;
 	__split_huge_pud_locked(vma, pud, range.start);
 
@@ -2853,8 +2850,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(haddr & ~HPAGE_PMD_MASK);
 	VM_BUG_ON_VMA(vma->vm_start > haddr, vma);
 	VM_BUG_ON_VMA(vma->vm_end < haddr + HPAGE_PMD_SIZE, vma);
-	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd)
-				&& !pmd_devmap(*pmd));
+	VM_BUG_ON(!is_pmd_migration_entry(*pmd) && !pmd_trans_huge(*pmd));
 
 	count_vm_event(THP_SPLIT_PMD);
 
@@ -3062,8 +3058,7 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze)
 {
 	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
-	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
-	    is_pmd_migration_entry(*pmd))
+	if (pmd_trans_huge(*pmd) || is_pmd_migration_entry(*pmd))
 		__split_huge_pmd_locked(vma, pmd, address, freeze);
 }
 
diff --git a/mm/mapping_dirty_helpers.c b/mm/mapping_dirty_helpers.c
index dc1692ff9e58..c193de6cb23a 100644
--- a/mm/mapping_dirty_helpers.c
+++ b/mm/mapping_dirty_helpers.c
@@ -129,7 +129,7 @@ static int wp_clean_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pmd_t pmdval = pmdp_get_lockless(pmd);
 
 	/* Do not split a huge pmd, present or migrated */
-	if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) {
+	if (pmd_trans_huge(pmdval)) {
 		WARN_ON(pmd_write(pmdval) || pmd_dirty(pmdval));
 		walk->action = ACTION_CONTINUE;
 	}
@@ -152,7 +152,7 @@ static int wp_clean_pud_entry(pud_t *pud, unsigned long addr, unsigned long end,
 	pud_t pudval = READ_ONCE(*pud);
 
 	/* Do not split a huge pud */
-	if (pud_trans_huge(pudval) || pud_devmap(pudval)) {
+	if (pud_trans_huge(pudval)) {
 		WARN_ON(pud_write(pudval) || pud_dirty(pudval));
 		walk->action = ACTION_CONTINUE;
 	}
diff --git a/mm/memory.c b/mm/memory.c
index 01d51bd95197..150bb62855b1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -675,8 +675,6 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 		}
 	}
 
-	if (pmd_devmap(pmd))
-		return NULL;
 	if (is_huge_zero_pmd(pmd))
 		return NULL;
 	if (unlikely(pfn > highest_memmap_pfn))
@@ -1240,8 +1238,7 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pmd = pmd_offset(src_pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)
-			|| pmd_devmap(*src_pmd)) {
+		if (is_swap_pmd(*src_pmd) || pmd_trans_huge(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
 			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
@@ -1277,7 +1274,7 @@ copy_pud_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	src_pud = pud_offset(src_p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*src_pud) || pud_devmap(*src_pud)) {
+		if (pud_trans_huge(*src_pud)) {
 			int err;
 
 			VM_BUG_ON_VMA(next-addr != HPAGE_PUD_SIZE, src_vma);
@@ -1791,7 +1788,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 	pmd = pmd_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
-		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
+		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				__split_huge_pmd(vma, pmd, addr, false);
 			else if (zap_huge_pmd(tlb, vma, pmd, addr)) {
@@ -1833,7 +1830,7 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
-		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
+		if (pud_trans_huge(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
 				mmap_assert_locked(tlb->mm);
 				split_huge_pud(vma, pud, addr);
@@ -6136,7 +6133,7 @@ retry_pud:
 		pud_t orig_pud = *vmf.pud;
 
 		barrier();
-		if (pud_trans_huge(orig_pud) || pud_devmap(orig_pud)) {
+		if (pud_trans_huge(orig_pud)) {
 
 			/*
 			 * TODO once we support anonymous PUDs: NUMA case and
@@ -6177,7 +6174,7 @@ retry_pud:
 		pmd_migration_entry_wait(mm, vmf.pmd);
 		return 0;
 	}
-	if (pmd_trans_huge(vmf.orig_pmd) || pmd_devmap(vmf.orig_pmd)) {
+	if (pmd_trans_huge(vmf.orig_pmd)) {
 		if (pmd_protnone(vmf.orig_pmd) && vma_is_accessible(vma))
 			return do_huge_pmd_numa_page(&vmf);
 
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 3158afe7eb23..e05e14d6eacd 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -615,7 +615,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 	pmdp = pmd_alloc(mm, pudp, addr);
 	if (!pmdp)
 		goto abort;
-	if (pmd_trans_huge(*pmdp) || pmd_devmap(*pmdp))
+	if (pmd_trans_huge(*pmdp))
 		goto abort;
 	if (pte_alloc(mm, pmdp))
 		goto abort;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b873b98ab705..88709c01177b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -376,7 +376,7 @@ again:
 			goto next;
 
 		_pmd = pmdp_get_lockless(pmd);
-		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd) || pmd_devmap(_pmd)) {
+		if (is_swap_pmd(_pmd) || pmd_trans_huge(_pmd)) {
 			if ((next - addr != HPAGE_PMD_SIZE) ||
 			    pgtable_split_needed(vma, cp_flags)) {
 				__split_huge_pmd(vma, pmd, addr, false);
diff --git a/mm/mremap.c b/mm/mremap.c
index 7e93d3344828..36585041c760 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -820,7 +820,7 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
 		new_pud = alloc_new_pud(mm, pmc->new_addr);
 		if (!new_pud)
 			break;
-		if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
+		if (pud_trans_huge(*old_pud)) {
 			if (extent == HPAGE_PUD_SIZE) {
 				move_pgt_entry(pmc, HPAGE_PUD, old_pud, new_pud);
 				/* We ignore and continue on error? */
@@ -839,8 +839,7 @@ unsigned long move_page_tables(struct pagetable_move_control *pmc)
 		if (!new_pmd)
 			break;
 again:
-		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd) ||
-		    pmd_devmap(*old_pmd)) {
+		if (is_swap_pmd(*old_pmd) || pmd_trans_huge(*old_pmd)) {
 			if (extent == HPAGE_PMD_SIZE &&
 			    move_pgt_entry(pmc, HPAGE_PMD, old_pmd, new_pmd))
 				continue;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index e463c3be934a..e981a1a292d2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -246,8 +246,7 @@ restart:
 		 */
 		pmde = pmdp_get_lockless(pvmw->pmd);
 
-		if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
-		    (pmd_present(pmde) && pmd_devmap(pmde))) {
+		if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde)) {
 			pvmw->ptl = pmd_lock(mm, pvmw->pmd);
 			pmde = *pvmw->pmd;
 			if (!pmd_present(pmde)) {
@@ -262,7 +261,7 @@ restart:
 					return not_found(pvmw);
 				return true;
 			}
-			if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) {
+			if (likely(pmd_trans_huge(pmde))) {
 				if (pvmw->flags & PVMW_MIGRATION)
 					return not_found(pvmw);
 				if (!check_pmd(pmd_pfn(pmde), pvmw))
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index a214a2b40ab9..648038247a8d 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -143,8 +143,7 @@ again:
 			 * We are ONLY installing, so avoid unnecessarily
 			 * splitting a present huge page.
 			 */
-			if (pmd_present(*pmd) &&
-			    (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
+			if (pmd_present(*pmd) && pmd_trans_huge(*pmd))
 				continue;
 		}
 
@@ -210,8 +209,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 			 * We are ONLY installing, so avoid unnecessarily
 			 * splitting a present huge page.
 			 */
-			if (pud_present(*pud) &&
-			    (pud_trans_huge(*pud) || pud_devmap(*pud)))
+			if (pud_present(*pud) && pud_trans_huge(*pud))
 				continue;
 		}
 
@@ -908,7 +906,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
 		 * TODO: FW_MIGRATION support for PUD migration entries
 		 * once there are relevant users.
 		 */
-		if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
+		if (!pud_present(pud) || pud_special(pud)) {
 			spin_unlock(ptl);
 			goto not_found;
 		} else if (!pud_leaf(pud)) {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 5a882f2b10f9..567e2d084071 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -139,8 +139,7 @@ pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 {
 	pmd_t pmd;
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp) &&
-			   !pmd_devmap(*pmdp));
+	VM_BUG_ON(pmd_present(*pmdp) && !pmd_trans_huge(*pmdp));
 	pmd = pmdp_huge_get_and_clear(vma->vm_mm, address, pmdp);
 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 	return pmd;
@@ -153,7 +152,7 @@ pud_t pudp_huge_clear_flush(struct vm_area_struct *vma, unsigned long address,
 	pud_t pud;
 
 	VM_BUG_ON(address & ~HPAGE_PUD_MASK);
-	VM_BUG_ON(!pud_trans_huge(*pudp) && !pud_devmap(*pudp));
+	VM_BUG_ON(!pud_trans_huge(*pudp));
 	pud = pudp_huge_get_and_clear(vma->vm_mm, address, pudp);
 	flush_pud_tlb_range(vma, address, address + HPAGE_PUD_SIZE);
 	return pud;
@@ -293,7 +292,7 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
 		*pmdvalp = pmdval;
 	if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval)))
 		goto nomap;
-	if (unlikely(pmd_trans_huge(pmdval) || pmd_devmap(pmdval)))
+	if (unlikely(pmd_trans_huge(pmdval)))
 		goto nomap;
 	if (unlikely(pmd_bad(pmdval))) {
 		pmd_clear_bad(pmd);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index dd2a25fafb82..cbed91b09640 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -795,8 +795,8 @@ retry:
 		 * (This includes the case where the PMD used to be THP and
 		 * changed back to none after __pte_alloc().)
 		 */
-		if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
-					   pmd_devmap(dst_pmdval))) {
+		if (unlikely(!pmd_present(dst_pmdval) ||
+				pmd_trans_huge(dst_pmdval))) {
 			err = -EEXIST;
 			break;
 		}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6698fadf5d04..c86a2495138a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3450,9 +3450,6 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 	if (!pmd_present(pmd) || is_huge_zero_pmd(pmd))
 		return -1;
 
-	if (WARN_ON_ONCE(pmd_devmap(pmd)))
-		return -1;
-
 	if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm))
 		return -1;
 
-- 
cgit v1.2.3

From d438d273417055241ebaaf1ba3be23459fc27cba Mon Sep 17 00:00:00 2001
From: Alistair Popple
Date: Thu, 19 Jun 2025 18:58:03 +1000
Subject: mm: remove devmap related functions and page table bits
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that DAX and all other reference counts to ZONE_DEVICE pages are
managed normally, there is no need for the special devmap PTE/PMD/PUD
page table bits. So drop all references to these, freeing up a
software-defined page table bit on architectures supporting it.
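
To make the idea of a "software-defined page table bit" concrete, here is a minimal user-space sketch. It is not kernel code. It is modelled on the arm64 definitions that this patch deletes: PTE_DEVMAP was bit 57, sitting next to PTE_SPECIAL at bit 56, and pte_mkdevmap() set both bits. The sw_* names, the plain uint64_t PTE type and the helper bodies are hypothetical illustrations, not the kernel's pte_t API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of an arm64-style 64-bit PTE value.
 * Bit positions follow arm64's pgtable-prot.h (PTE_SPECIAL at bit 56,
 * PTE_DEVMAP at bit 57 before this patch); the sw_* names are
 * illustrative only and are not kernel APIs.
 */
typedef uint64_t sw_pteval_t;

#define SW_PTE_SPECIAL ((sw_pteval_t)1 << 56)
#define SW_PTE_DEVMAP  ((sw_pteval_t)1 << 57) /* the bit this patch frees */

/* Test for the devmap bit, like the removed arm64 pte_devmap() macro. */
static inline bool sw_pte_devmap(sw_pteval_t pte)
{
	return (pte & SW_PTE_DEVMAP) != 0;
}

/* Test for the special bit, like arm64's pte_special(). */
static inline bool sw_pte_special(sw_pteval_t pte)
{
	return (pte & SW_PTE_SPECIAL) != 0;
}

/*
 * Like the removed arm64 pte_mkdevmap(): it set DEVMAP and SPECIAL
 * together, so every devmap PTE was also a special PTE.
 */
static inline sw_pteval_t sw_pte_mkdevmap(sw_pteval_t pte)
{
	return pte | SW_PTE_DEVMAP | SW_PTE_SPECIAL;
}
```

For example, sw_pte_mkdevmap(0) returns a value with bits 56 and 57 set, so both predicates report true. Once no helper sets or tests bit 57, the bit is free for other uses on that architecture.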

Link: https://lkml.kernel.org/r/6389398c32cc9daa3dfcaa9f79c7972525d310ce.1750323463.git-series.apopple@nvidia.com
Signed-off-by: Alistair Popple
Acked-by: Will Deacon # arm64
Acked-by: David Hildenbrand
Suggested-by: Chunyan Zhang
Reviewed-by: Björn Töpel
Reviewed-by: Jason Gunthorpe
Cc: Balbir Singh
Cc: Björn Töpel
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Deepak Gupta
Cc: Gerald Schaefer
Cc: Inki Dae
Cc: John Groves
Cc: John Hubbard
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox (Oracle)
Signed-off-by: Andrew Morton
---
 Documentation/mm/arch_pgtable_helpers.rst     |  6 ---
 arch/arm64/Kconfig                            |  1 -
 arch/arm64/include/asm/pgtable-prot.h         |  1 -
 arch/arm64/include/asm/pgtable.h              | 24 -----------
 arch/loongarch/Kconfig                        |  1 -
 arch/loongarch/include/asm/pgtable-bits.h     |  6 +--
 arch/loongarch/include/asm/pgtable.h          | 19 ---------
 arch/powerpc/Kconfig                          |  1 -
 arch/powerpc/include/asm/book3s/64/hash-4k.h  |  6 ---
 arch/powerpc/include/asm/book3s/64/hash-64k.h |  7 +---
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 53 +-----------------------
 arch/powerpc/include/asm/book3s/64/radix.h    | 14 +------
 arch/riscv/Kconfig                            |  1 -
 arch/riscv/include/asm/pgtable-64.h           | 16 --------
 arch/riscv/include/asm/pgtable-bits.h         |  1 -
 arch/riscv/include/asm/pgtable.h              | 22 ----------
 arch/x86/Kconfig                              |  1 -
 arch/x86/include/asm/pgtable.h                | 51 +----------------------
 arch/x86/include/asm/pgtable_types.h          |  5 +--
 include/linux/mm.h                            |  7 ----
 include/linux/pgtable.h                       | 19 +--------
 mm/Kconfig                                    |  4 --
 mm/debug_vm_pgtable.c                         | 59 ---------------------------
 mm/hmm.c                                      |  3 +-
 mm/madvise.c                                  |  8 ++--
 25 files changed, 17 insertions(+), 319 deletions(-)

(limited to 'include/linux/pgtable.h')

diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
index af245161d8e7..c88c7fa665d6 100644
--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -30,8 +30,6 @@ PTE Page Table Helpers
+---------------------------+--------------------------------------------------+ | pte_protnone | Tests a PROT_NONE PTE | +---------------------------+--------------------------------------------------+ -| pte_devmap | Tests a ZONE_DEVICE mapped PTE | -+---------------------------+--------------------------------------------------+ | pte_soft_dirty | Tests a soft dirty PTE | +---------------------------+--------------------------------------------------+ | pte_swp_soft_dirty | Tests a soft dirty swapped PTE | @@ -104,8 +102,6 @@ PMD Page Table Helpers +---------------------------+--------------------------------------------------+ | pmd_protnone | Tests a PROT_NONE PMD | +---------------------------+--------------------------------------------------+ -| pmd_devmap | Tests a ZONE_DEVICE mapped PMD | -+---------------------------+--------------------------------------------------+ | pmd_soft_dirty | Tests a soft dirty PMD | +---------------------------+--------------------------------------------------+ | pmd_swp_soft_dirty | Tests a soft dirty swapped PMD | @@ -177,8 +173,6 @@ PUD Page Table Helpers +---------------------------+--------------------------------------------------+ | pud_write | Tests a writable PUD | +---------------------------+--------------------------------------------------+ -| pud_devmap | Tests a ZONE_DEVICE mapped PUD | -+---------------------------+--------------------------------------------------+ | pud_mkyoung | Creates a young PUD | +---------------------------+--------------------------------------------------+ | pud_mkold | Creates an old PUD | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 55fc331af337..94b48b1dae71 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -44,7 +44,6 @@ config ARM64 select ARCH_HAS_NONLEAF_PMD_YOUNG if ARM64_HAFT select ARCH_HAS_PREEMPT_LAZY select ARCH_HAS_PTDUMP - select ARCH_HAS_PTE_DEVMAP select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_HW_PTE_YOUNG select ARCH_HAS_SETUP_DMA_OPS diff 
--git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 7830d031742e..85dceb1c66f4 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -17,7 +17,6 @@ #define PTE_SWP_EXCLUSIVE (_AT(pteval_t, 1) << 2) /* only for swp ptes */ #define PTE_DIRTY (_AT(pteval_t, 1) << 55) #define PTE_SPECIAL (_AT(pteval_t, 1) << 56) -#define PTE_DEVMAP (_AT(pteval_t, 1) << 57) /* * PTE_PRESENT_INVALID=1 & PTE_VALID=0 indicates that the pte's fields should be diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index e511f909f63c..ba63c8736666 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -190,7 +190,6 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys) #define pte_user(pte) (!!(pte_val(pte) & PTE_USER)) #define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN)) #define pte_cont(pte) (!!(pte_val(pte) & PTE_CONT)) -#define pte_devmap(pte) (!!(pte_val(pte) & PTE_DEVMAP)) #define pte_tagged(pte) ((pte_val(pte) & PTE_ATTRINDX_MASK) == \ PTE_ATTRINDX(MT_NORMAL_TAGGED)) @@ -372,11 +371,6 @@ static inline pmd_t pmd_mkcont(pmd_t pmd) return __pmd(pmd_val(pmd) | PMD_SECT_CONT); } -static inline pte_t pte_mkdevmap(pte_t pte) -{ - return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL)); -} - #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP static inline int pte_uffd_wp(pte_t pte) { @@ -653,14 +647,6 @@ static inline pmd_t pmd_mkhuge(pmd_t pmd) return __pmd((pmd_val(pmd) & ~mask) | val); } -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define pmd_devmap(pmd) pte_devmap(pmd_pte(pmd)) -#endif -static inline pmd_t pmd_mkdevmap(pmd_t pmd) -{ - return pte_pmd(set_pte_bit(pmd_pte(pmd), __pgprot(PTE_DEVMAP))); -} - #ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP #define pmd_special(pte) (!!((pmd_val(pte) & PTE_SPECIAL))) static inline pmd_t pmd_mkspecial(pmd_t pmd) @@ -1302,16 +1288,6 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma, return 
__ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry), dirty); } - -static inline int pud_devmap(pud_t pud) -{ - return 0; -} - -static inline int pgd_devmap(pgd_t pgd) -{ - return 0; -} #endif #ifdef CONFIG_PAGE_TABLE_CHECK diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index 4b19f93379a1..edb3db230bac 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -25,7 +25,6 @@ config LOONGARCH select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PREEMPT_LAZY - select ARCH_HAS_PTE_DEVMAP select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SET_MEMORY select ARCH_HAS_SET_DIRECT_MAP diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h index 7bbfb04a54cc..2fc3789220ac 100644 --- a/arch/loongarch/include/asm/pgtable-bits.h +++ b/arch/loongarch/include/asm/pgtable-bits.h @@ -22,7 +22,6 @@ #define _PAGE_PFN_SHIFT 12 #define _PAGE_SWP_EXCLUSIVE_SHIFT 23 #define _PAGE_PFN_END_SHIFT 48 -#define _PAGE_DEVMAP_SHIFT 59 #define _PAGE_PRESENT_INVALID_SHIFT 60 #define _PAGE_NO_READ_SHIFT 61 #define _PAGE_NO_EXEC_SHIFT 62 @@ -36,7 +35,6 @@ #define _PAGE_MODIFIED (_ULCAST_(1) << _PAGE_MODIFIED_SHIFT) #define _PAGE_PROTNONE (_ULCAST_(1) << _PAGE_PROTNONE_SHIFT) #define _PAGE_SPECIAL (_ULCAST_(1) << _PAGE_SPECIAL_SHIFT) -#define _PAGE_DEVMAP (_ULCAST_(1) << _PAGE_DEVMAP_SHIFT) /* We borrow bit 23 to store the exclusive marker in swap PTEs. 
*/ #define _PAGE_SWP_EXCLUSIVE (_ULCAST_(1) << _PAGE_SWP_EXCLUSIVE_SHIFT) @@ -76,8 +74,8 @@ #define __READABLE (_PAGE_VALID) #define __WRITEABLE (_PAGE_DIRTY | _PAGE_WRITE) -#define _PAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PAGE_DEVMAP | _PFN_MASK | _CACHE_MASK | _PAGE_PLV) -#define _HPAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PAGE_DEVMAP | _PFN_MASK | _CACHE_MASK | _PAGE_PLV | _PAGE_HUGE) +#define _PAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PFN_MASK | _CACHE_MASK | _PAGE_PLV) +#define _HPAGE_CHG_MASK (_PAGE_MODIFIED | _PAGE_SPECIAL | _PFN_MASK | _CACHE_MASK | _PAGE_PLV | _PAGE_HUGE) #define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_NO_READ | \ _PAGE_USER | _CACHE_CC) diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h index f2aeff544cee..bd128696e96d 100644 --- a/arch/loongarch/include/asm/pgtable.h +++ b/arch/loongarch/include/asm/pgtable.h @@ -409,9 +409,6 @@ static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; static inline pte_t pte_mkspecial(pte_t pte) { pte_val(pte) |= _PAGE_SPECIAL; return pte; } #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */ -static inline int pte_devmap(pte_t pte) { return !!(pte_val(pte) & _PAGE_DEVMAP); } -static inline pte_t pte_mkdevmap(pte_t pte) { pte_val(pte) |= _PAGE_DEVMAP; return pte; } - #define pte_accessible pte_accessible static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a) { @@ -540,17 +537,6 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd) return pmd; } -static inline int pmd_devmap(pmd_t pmd) -{ - return !!(pmd_val(pmd) & _PAGE_DEVMAP); -} - -static inline pmd_t pmd_mkdevmap(pmd_t pmd) -{ - pmd_val(pmd) |= _PAGE_DEVMAP; - return pmd; -} - static inline struct page *pmd_page(pmd_t pmd) { if (pmd_trans_huge(pmd)) @@ -606,11 +592,6 @@ static inline long pmd_protnone(pmd_t pmd) #define pmd_leaf(pmd) ((pmd_val(pmd) & _PAGE_HUGE) != 0) #define pud_leaf(pud) ((pud_val(pud) & _PAGE_HUGE) != 0) -#ifdef CONFIG_TRANSPARENT_HUGEPAGE 
-#define pud_devmap(pud) (0) -#define pgd_devmap(pgd) (0) -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ - /* * We provide our own get_unmapped area to cope with the virtual aliasing * constraints placed on us by the cache architecture. diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c3e0cc83f120..7a555c1d3171 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -149,7 +149,6 @@ config PPC select ARCH_HAS_PMEM_API select ARCH_HAS_PREEMPT_LAZY select ARCH_HAS_PTDUMP - select ARCH_HAS_PTE_DEVMAP if PPC_BOOK3S_64 select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SCALED_CPUTIME if VIRT_CPU_ACCOUNTING_NATIVE && PPC_BOOK3S_64 select ARCH_HAS_SET_MEMORY diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h index aa90a048f319..7132392fa7cd 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h @@ -168,12 +168,6 @@ extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm, extern int hash__has_transparent_hugepage(void); #endif -static inline pmd_t hash__pmd_mkdevmap(pmd_t pmd) -{ - BUG(); - return pmd; -} - #endif /* !__ASSEMBLY__ */ #endif /* _ASM_POWERPC_BOOK3S_64_HASH_4K_H */ diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h b/arch/powerpc/include/asm/book3s/64/hash-64k.h index 0bf6fd0bf42a..0fb5b7da9478 100644 --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h @@ -259,7 +259,7 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array, */ static inline int hash__pmd_trans_huge(pmd_t pmd) { - return !!((pmd_val(pmd) & (_PAGE_PTE | H_PAGE_THP_HUGE | _PAGE_DEVMAP)) == + return !!((pmd_val(pmd) & (_PAGE_PTE | H_PAGE_THP_HUGE)) == (_PAGE_PTE | H_PAGE_THP_HUGE)); } @@ -281,11 +281,6 @@ extern pmd_t hash__pmdp_huge_get_and_clear(struct mm_struct *mm, extern int hash__has_transparent_hugepage(void); #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline pmd_t 
hash__pmd_mkdevmap(pmd_t pmd) -{ - return __pmd(pmd_val(pmd) | (_PAGE_PTE | H_PAGE_THP_HUGE | _PAGE_DEVMAP)); -} - #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_BOOK3S_64_HASH_64K_H */ diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h index a2ddcbb3fcb9..c19800365315 100644 --- a/arch/powerpc/include/asm/book3s/64/pgtable.h +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h @@ -88,7 +88,6 @@ #define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */ #define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */ -#define _PAGE_DEVMAP _RPAGE_SW1 /* software: ZONE_DEVICE page */ /* * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE @@ -109,7 +108,7 @@ */ #define _HPAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \ _PAGE_ACCESSED | H_PAGE_THP_HUGE | _PAGE_PTE | \ - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP) + _PAGE_SOFT_DIRTY) /* * user access blocked by key */ @@ -123,7 +122,7 @@ */ #define _PAGE_CHG_MASK (PTE_RPN_MASK | _PAGE_HPTEFLAGS | _PAGE_DIRTY | \ _PAGE_ACCESSED | _PAGE_SPECIAL | _PAGE_PTE | \ - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP) + _PAGE_SOFT_DIRTY) /* * We define 2 sets of base prot bits, one for basic pages (ie, @@ -609,24 +608,6 @@ static inline pte_t pte_mkhuge(pte_t pte) return pte; } -static inline pte_t pte_mkdevmap(pte_t pte) -{ - return __pte_raw(pte_raw(pte) | cpu_to_be64(_PAGE_SPECIAL | _PAGE_DEVMAP)); -} - -/* - * This is potentially called with a pmd as the argument, in which case it's not - * safe to check _PAGE_DEVMAP unless we also confirm that _PAGE_PTE is set. - * That's because the bit we use for _PAGE_DEVMAP is not reserved for software - * use in page directory entries (ie. non-ptes). - */ -static inline int pte_devmap(pte_t pte) -{ - __be64 mask = cpu_to_be64(_PAGE_DEVMAP | _PAGE_PTE); - - return (pte_raw(pte) & mask) == mask; -} - static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { /* FIXME!! 
check whether this need to be a conditional */ @@ -1379,36 +1360,6 @@ static inline bool arch_needs_pgtable_deposit(void) } extern void serialize_against_pte_lookup(struct mm_struct *mm); - -static inline pmd_t pmd_mkdevmap(pmd_t pmd) -{ - if (radix_enabled()) - return radix__pmd_mkdevmap(pmd); - return hash__pmd_mkdevmap(pmd); -} - -static inline pud_t pud_mkdevmap(pud_t pud) -{ - if (radix_enabled()) - return radix__pud_mkdevmap(pud); - BUG(); - return pud; -} - -static inline int pmd_devmap(pmd_t pmd) -{ - return pte_devmap(pmd_pte(pmd)); -} - -static inline int pud_devmap(pud_t pud) -{ - return pte_devmap(pud_pte(pud)); -} - -static inline int pgd_devmap(pgd_t pgd) -{ - return 0; -} #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #define __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h index 8f55ff74bb68..df23a8267e4d 100644 --- a/arch/powerpc/include/asm/book3s/64/radix.h +++ b/arch/powerpc/include/asm/book3s/64/radix.h @@ -264,7 +264,7 @@ static inline int radix__p4d_bad(p4d_t p4d) static inline int radix__pmd_trans_huge(pmd_t pmd) { - return (pmd_val(pmd) & (_PAGE_PTE | _PAGE_DEVMAP)) == _PAGE_PTE; + return (pmd_val(pmd) & _PAGE_PTE) == _PAGE_PTE; } static inline pmd_t radix__pmd_mkhuge(pmd_t pmd) @@ -274,7 +274,7 @@ static inline pmd_t radix__pmd_mkhuge(pmd_t pmd) static inline int radix__pud_trans_huge(pud_t pud) { - return (pud_val(pud) & (_PAGE_PTE | _PAGE_DEVMAP)) == _PAGE_PTE; + return (pud_val(pud) & _PAGE_PTE) == _PAGE_PTE; } static inline pud_t radix__pud_mkhuge(pud_t pud) @@ -315,16 +315,6 @@ static inline int radix__has_transparent_pud_hugepage(void) } #endif -static inline pmd_t radix__pmd_mkdevmap(pmd_t pmd) -{ - return __pmd(pmd_val(pmd) | (_PAGE_PTE | _PAGE_DEVMAP)); -} - -static inline pud_t radix__pud_mkdevmap(pud_t pud) -{ - return __pud(pud_val(pud) | (_PAGE_PTE | _PAGE_DEVMAP)); -} - struct vmem_altmap; struct dev_pagemap; extern int __meminit 
radix__vmemmap_create_mapping(unsigned long start, diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index d71ea0f4466f..23df26f39472 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -46,7 +46,6 @@ config RISCV select ARCH_HAS_PREEMPT_LAZY select ARCH_HAS_PREPARE_SYNC_CORE_CMD select ARCH_HAS_PTDUMP if MMU - select ARCH_HAS_PTE_DEVMAP if 64BIT && MMU select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SET_DIRECT_MAP if MMU select ARCH_HAS_SET_MEMORY if MMU diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h index 7de05db7d3bd..1018d2216901 100644 --- a/arch/riscv/include/asm/pgtable-64.h +++ b/arch/riscv/include/asm/pgtable-64.h @@ -397,24 +397,8 @@ static inline struct page *pgd_page(pgd_t pgd) p4d_t *p4d_offset(pgd_t *pgd, unsigned long address); #ifdef CONFIG_TRANSPARENT_HUGEPAGE -static inline int pte_devmap(pte_t pte); static inline pte_t pmd_pte(pmd_t pmd); static inline pte_t pud_pte(pud_t pud); - -static inline int pmd_devmap(pmd_t pmd) -{ - return pte_devmap(pmd_pte(pmd)); -} - -static inline int pud_devmap(pud_t pud) -{ - return pte_devmap(pud_pte(pud)); -} - -static inline int pgd_devmap(pgd_t pgd) -{ - return 0; -} #endif #endif /* _ASM_RISCV_PGTABLE_64_H */ diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h index a8f5205cea54..179bd4afece4 100644 --- a/arch/riscv/include/asm/pgtable-bits.h +++ b/arch/riscv/include/asm/pgtable-bits.h @@ -19,7 +19,6 @@ #define _PAGE_SOFT (3 << 8) /* Reserved for software */ #define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */ -#define _PAGE_DEVMAP (1 << 9) /* RSW, devmap */ #define _PAGE_TABLE _PAGE_PRESENT /* diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h index 5bd5aae60d53..91697fbf1f90 100644 --- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -409,13 +409,6 @@ static inline int pte_special(pte_t pte) return pte_val(pte) & _PAGE_SPECIAL; } -#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP -static 
inline int pte_devmap(pte_t pte) -{ - return pte_val(pte) & _PAGE_DEVMAP; -} -#endif - /* static inline pte_t pte_rdprotect(pte_t pte) */ static inline pte_t pte_wrprotect(pte_t pte) @@ -457,11 +450,6 @@ static inline pte_t pte_mkspecial(pte_t pte) return __pte(pte_val(pte) | _PAGE_SPECIAL); } -static inline pte_t pte_mkdevmap(pte_t pte) -{ - return __pte(pte_val(pte) | _PAGE_DEVMAP); -} - static inline pte_t pte_mkhuge(pte_t pte) { return pte; @@ -790,11 +778,6 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd) return pte_pmd(pte_mkdirty(pmd_pte(pmd))); } -static inline pmd_t pmd_mkdevmap(pmd_t pmd) -{ - return pte_pmd(pte_mkdevmap(pmd_pte(pmd))); -} - #ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP static inline bool pmd_special(pmd_t pmd) { @@ -946,11 +929,6 @@ static inline pud_t pud_mkhuge(pud_t pud) return pud; } -static inline pud_t pud_mkdevmap(pud_t pud) -{ - return pte_pud(pte_mkdevmap(pud_pte(pud))); -} - static inline int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pud_t *pudp, pud_t entry, int dirty) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 71019b3b54ea..bb9b63d76a19 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -101,7 +101,6 @@ config X86 select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PMEM_API if X86_64 select ARCH_HAS_PREEMPT_LAZY - select ARCH_HAS_PTE_DEVMAP if X86_64 select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_HW_PTE_YOUNG select ARCH_HAS_NONLEAF_PMD_YOUNG if PGTABLE_LEVELS > 2 diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 97954c936c54..e33df3da6980 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -301,16 +301,15 @@ static inline bool pmd_leaf(pmd_t pte) } #ifdef CONFIG_TRANSPARENT_HUGEPAGE -/* NOTE: when predicate huge page, consider also pmd_devmap, or use pmd_leaf */ static inline int pmd_trans_huge(pmd_t pmd) { - return (pmd_val(pmd) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE; + return (pmd_val(pmd) & _PAGE_PSE) == 
_PAGE_PSE; } #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD static inline int pud_trans_huge(pud_t pud) { - return (pud_val(pud) & (_PAGE_PSE|_PAGE_DEVMAP)) == _PAGE_PSE; + return (pud_val(pud) & _PAGE_PSE) == _PAGE_PSE; } #endif @@ -320,24 +319,6 @@ static inline int has_transparent_hugepage(void) return boot_cpu_has(X86_FEATURE_PSE); } -#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP -static inline int pmd_devmap(pmd_t pmd) -{ - return !!(pmd_val(pmd) & _PAGE_DEVMAP); -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -static inline int pud_devmap(pud_t pud) -{ - return !!(pud_val(pud) & _PAGE_DEVMAP); -} -#else -static inline int pud_devmap(pud_t pud) -{ - return 0; -} -#endif - #ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP static inline bool pmd_special(pmd_t pmd) { @@ -361,12 +342,6 @@ static inline pud_t pud_mkspecial(pud_t pud) return pud_set_flags(pud, _PAGE_SPECIAL); } #endif /* CONFIG_ARCH_SUPPORTS_PUD_PFNMAP */ - -static inline int pgd_devmap(pgd_t pgd) -{ - return 0; -} -#endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ static inline pte_t pte_set_flags(pte_t pte, pteval_t set) @@ -527,11 +502,6 @@ static inline pte_t pte_mkspecial(pte_t pte) return pte_set_flags(pte, _PAGE_SPECIAL); } -static inline pte_t pte_mkdevmap(pte_t pte) -{ - return pte_set_flags(pte, _PAGE_SPECIAL|_PAGE_DEVMAP); -} - /* See comments above mksaveddirty_shift() */ static inline pmd_t pmd_mksaveddirty(pmd_t pmd) { @@ -603,11 +573,6 @@ static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd) return pmd_set_flags(pmd, _PAGE_DIRTY); } -static inline pmd_t pmd_mkdevmap(pmd_t pmd) -{ - return pmd_set_flags(pmd, _PAGE_DEVMAP); -} - static inline pmd_t pmd_mkhuge(pmd_t pmd) { return pmd_set_flags(pmd, _PAGE_PSE); @@ -673,11 +638,6 @@ static inline pud_t pud_mkdirty(pud_t pud) return pud_mksaveddirty(pud); } -static inline pud_t pud_mkdevmap(pud_t pud) -{ - return pud_set_flags(pud, _PAGE_DEVMAP); -} - static inline pud_t pud_mkhuge(pud_t pud) { return pud_set_flags(pud, _PAGE_PSE); @@ -1008,13 +968,6 @@ static 
inline int pte_present(pte_t a) return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE); } -#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP -static inline int pte_devmap(pte_t a) -{ - return (pte_flags(a) & _PAGE_DEVMAP) == _PAGE_DEVMAP; -} -#endif - #define pte_accessible pte_accessible static inline bool pte_accessible(struct mm_struct *mm, pte_t a) { diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index b74ec5c3643b..f63ae8d0aac8 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -34,7 +34,6 @@ #define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted to large */ -#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 #ifdef CONFIG_X86_64 #define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */ @@ -121,11 +120,9 @@ #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE) #define _PAGE_NX (_AT(pteval_t, 1) << _PAGE_BIT_NX) -#define _PAGE_DEVMAP (_AT(u64, 1) << _PAGE_BIT_DEVMAP) #define _PAGE_SOFTW4 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW4) #else #define _PAGE_NX (_AT(pteval_t, 0)) -#define _PAGE_DEVMAP (_AT(pteval_t, 0)) #define _PAGE_SOFTW4 (_AT(pteval_t, 0)) #endif @@ -154,7 +151,7 @@ #define _COMMON_PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ _PAGE_SPECIAL | _PAGE_ACCESSED | \ _PAGE_DIRTY_BITS | _PAGE_SOFT_DIRTY | \ - _PAGE_DEVMAP | _PAGE_CC | _PAGE_UFFD_WP) + _PAGE_CC | _PAGE_UFFD_WP) #define _PAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PAT) #define _HPAGE_CHG_MASK (_COMMON_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_PAT_LARGE) diff --git a/include/linux/mm.h b/include/linux/mm.h index fc365420dfa8..4d833f159988 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2704,13 +2704,6 @@ static inline pud_t pud_mkspecial(pud_t pud) } #endif /* CONFIG_ARCH_SUPPORTS_PUD_PFNMAP */ -#ifndef CONFIG_ARCH_HAS_PTE_DEVMAP -static 
inline int pte_devmap(pte_t pte) -{ - return 0; -} -#endif - extern pte_t *__get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl); static inline pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr, diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index ffcd966cf2d4..cf1515c163e2 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1643,21 +1643,6 @@ static inline int pud_write(pud_t pud) } #endif /* pud_write */ -#if !defined(CONFIG_ARCH_HAS_PTE_DEVMAP) || !defined(CONFIG_TRANSPARENT_HUGEPAGE) -static inline int pmd_devmap(pmd_t pmd) -{ - return 0; -} -static inline int pud_devmap(pud_t pud) -{ - return 0; -} -static inline int pgd_devmap(pgd_t pgd) -{ - return 0; -} -#endif - #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \ !defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) static inline int pud_trans_huge(pud_t pud) @@ -1912,8 +1897,8 @@ typedef unsigned int pgtbl_mod_mask; * - It should contain a huge PFN, which points to a huge page larger than * PAGE_SIZE of the platform. The PFN format isn't important here. * - * - It should cover all kinds of huge mappings (e.g., pXd_trans_huge(), - * pXd_devmap(), or hugetlb mappings). + * - It should cover all kinds of huge mappings (i.e. pXd_trans_huge() + * or hugetlb mappings). */ #ifndef pgd_leaf #define pgd_leaf(x) false diff --git a/mm/Kconfig b/mm/Kconfig index 065b1f19dd99..d5d4eca947a6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1117,9 +1117,6 @@ config ARCH_HAS_CURRENT_STACK_POINTER register alias named "current_stack_pointer", this config can be selected. 
-config ARCH_HAS_PTE_DEVMAP - bool - config ARCH_HAS_ZONE_DMA_SET bool @@ -1137,7 +1134,6 @@ config ZONE_DEVICE depends on MEMORY_HOTPLUG depends on MEMORY_HOTREMOVE depends on SPARSEMEM_VMEMMAP - depends on ARCH_HAS_PTE_DEVMAP select XARRAY_MULTI help diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c index 7731b238b534..d84d0c49012f 100644 --- a/mm/debug_vm_pgtable.c +++ b/mm/debug_vm_pgtable.c @@ -348,12 +348,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args) vaddr &= HPAGE_PUD_MASK; pud = pfn_pud(args->pud_pfn, args->page_prot); - /* - * Some architectures have debug checks to make sure - * huge pud mapping are only found with devmap entries - * For now test with only devmap entries. - */ - pud = pud_mkdevmap(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); flush_dcache_page(page); pudp_set_wrprotect(args->mm, vaddr, args->pudp); @@ -366,7 +360,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args) WARN_ON(!pud_none(pud)); #endif /* __PAGETABLE_PMD_FOLDED */ pud = pfn_pud(args->pud_pfn, args->page_prot); - pud = pud_mkdevmap(pud); pud = pud_wrprotect(pud); pud = pud_mkclean(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); @@ -384,7 +377,6 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args) #endif /* __PAGETABLE_PMD_FOLDED */ pud = pfn_pud(args->pud_pfn, args->page_prot); - pud = pud_mkdevmap(pud); pud = pud_mkyoung(pud); set_pud_at(args->mm, vaddr, args->pudp, pud); flush_dcache_page(page); @@ -693,53 +685,6 @@ static void __init pmd_protnone_tests(struct pgtable_debug_args *args) static void __init pmd_protnone_tests(struct pgtable_debug_args *args) { } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#ifdef CONFIG_ARCH_HAS_PTE_DEVMAP -static void __init pte_devmap_tests(struct pgtable_debug_args *args) -{ - pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); - - pr_debug("Validating PTE devmap\n"); - WARN_ON(!pte_devmap(pte_mkdevmap(pte))); -} - -#ifdef 
CONFIG_TRANSPARENT_HUGEPAGE -static void __init pmd_devmap_tests(struct pgtable_debug_args *args) -{ - pmd_t pmd; - - if (!has_transparent_hugepage()) - return; - - pr_debug("Validating PMD devmap\n"); - pmd = pfn_pmd(args->fixed_pmd_pfn, args->page_prot); - WARN_ON(!pmd_devmap(pmd_mkdevmap(pmd))); -} - -#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD -static void __init pud_devmap_tests(struct pgtable_debug_args *args) -{ - pud_t pud; - - if (!has_transparent_pud_hugepage()) - return; - - pr_debug("Validating PUD devmap\n"); - pud = pfn_pud(args->fixed_pud_pfn, args->page_prot); - WARN_ON(!pud_devmap(pud_mkdevmap(pud))); -} -#else /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ -static void __init pud_devmap_tests(struct pgtable_debug_args *args) { } -#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ -#else /* CONFIG_TRANSPARENT_HUGEPAGE */ -static void __init pmd_devmap_tests(struct pgtable_debug_args *args) { } -static void __init pud_devmap_tests(struct pgtable_debug_args *args) { } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ -#else -static void __init pte_devmap_tests(struct pgtable_debug_args *args) { } -static void __init pmd_devmap_tests(struct pgtable_debug_args *args) { } -static void __init pud_devmap_tests(struct pgtable_debug_args *args) { } -#endif /* CONFIG_ARCH_HAS_PTE_DEVMAP */ - static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args) { pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot); @@ -1333,10 +1278,6 @@ static int __init debug_vm_pgtable(void) pte_protnone_tests(&args); pmd_protnone_tests(&args); - pte_devmap_tests(&args); - pmd_devmap_tests(&args); - pud_devmap_tests(&args); - pte_soft_dirty_tests(&args); pmd_soft_dirty_tests(&args); pte_swap_soft_dirty_tests(&args); diff --git a/mm/hmm.c b/mm/hmm.c index 62d3082dc55c..f2415b4b2cdd 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -405,8 +405,7 @@ again: return 0; } -#if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && \ - defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) +#if 
defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) static inline unsigned long pud_to_hmm_pfn_flags(struct hmm_range *range, pud_t pud) { diff --git a/mm/madvise.c b/mm/madvise.c index 92f427b1b330..070132f9842b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1069,7 +1069,7 @@ static int guard_install_pud_entry(pud_t *pud, unsigned long addr, pud_t pudval = pudp_get(pud); /* If huge return >0 so we abort the operation + zap. */ - return pud_trans_huge(pudval) || pud_devmap(pudval); + return pud_trans_huge(pudval); } static int guard_install_pmd_entry(pmd_t *pmd, unsigned long addr, @@ -1078,7 +1078,7 @@ static int guard_install_pmd_entry(pmd_t *pmd, unsigned long addr, pmd_t pmdval = pmdp_get(pmd); /* If huge return >0 so we abort the operation + zap. */ - return pmd_trans_huge(pmdval) || pmd_devmap(pmdval); + return pmd_trans_huge(pmdval); } static int guard_install_pte_entry(pte_t *pte, unsigned long addr, @@ -1189,7 +1189,7 @@ static int guard_remove_pud_entry(pud_t *pud, unsigned long addr, pud_t pudval = pudp_get(pud); /* If huge, cannot have guard pages present, so no-op - skip. */ - if (pud_trans_huge(pudval) || pud_devmap(pudval)) + if (pud_trans_huge(pudval)) walk->action = ACTION_CONTINUE; return 0; @@ -1201,7 +1201,7 @@ static int guard_remove_pmd_entry(pmd_t *pmd, unsigned long addr, pmd_t pmdval = pmdp_get(pmd); /* If huge, cannot have guard pages present, so no-op - skip. */ - if (pmd_trans_huge(pmdval) || pmd_devmap(pmdval)) + if (pmd_trans_huge(pmdval)) walk->action = ACTION_CONTINUE; return 0; -- cgit v1.2.3 From 0aa3657df3ec713fca1f00a57a063b28f2a78147 Mon Sep 17 00:00:00 2001 From: Dev Jain Date: Fri, 18 Jul 2025 14:32:40 +0530 Subject: mm: add batched versions of ptep_modify_prot_start/commit Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect, implementing them as a simple loop over the corresponding single pte helpers. Architectures may override these helpers.
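The generic fallbacks introduced by this patch are plain loops, so their logic can be checked in isolation. The following user-space sketch mimics them on a toy PTE array: "start" reads and clears each entry, folding every entry's young/dirty bits into the returned PTE, and "commit" writes the new PTE to each slot while advancing only the PFN. The PTE layout, the `sk_*` names, and the read-and-clear stand-in for `ptep_modify_prot_start()` are assumptions for illustration. The real helpers also take a `vm_area_struct`, an address and, for commit, the old PTE; all of these are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE layout for this sketch only. */
#define SK_PTE_YOUNG (UINT64_C(1) << 0)
#define SK_PTE_DIRTY (UINT64_C(1) << 1)
#define SK_PTE_WRITE (UINT64_C(1) << 2)
#define SK_PFN_SHIFT 12

typedef uint64_t sk_pte_t;

/* Stand-in for ptep_modify_prot_start(): read the entry and clear it, so a
 * hardware a/d update cannot land between the read and the later commit. */
static sk_pte_t sk_prot_start_one(sk_pte_t *ptep)
{
	sk_pte_t old = *ptep;

	*ptep = 0;
	return old;
}

/* Mirrors the generic modify_prot_start_ptes() loop: returns the first
 * entry with the young/dirty bits of every entry in the batch OR-ed in. */
static sk_pte_t sk_prot_start_ptes(sk_pte_t *ptep, unsigned int nr)
{
	assert(nr > 0);
	sk_pte_t pte = sk_prot_start_one(ptep);

	while (--nr) {
		ptep++;
		sk_pte_t tmp = sk_prot_start_one(ptep);

		pte |= tmp & (SK_PTE_YOUNG | SK_PTE_DIRTY);
	}
	return pte;
}

/* Stand-in for pte_next_pfn(): same flag bits, next page frame. */
static sk_pte_t sk_next_pfn(sk_pte_t pte)
{
	return pte + (UINT64_C(1) << SK_PFN_SHIFT);
}

/* Mirrors the generic modify_prot_commit_ptes() loop: write the new entry,
 * advancing only the PFN for each subsequent slot. */
static void sk_prot_commit_ptes(sk_pte_t *ptep, sk_pte_t pte, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; ++i, ++ptep) {
		*ptep = pte;
		pte = sk_next_pfn(pte);
	}
}
```

Because the folded a/d bits are written back to every slot, a batch in which only one entry was dirty comes back with all entries dirty. This matches the updated comment on `ptep_modify_prot_commit()` noting that the committed ptes "may additionally have young and/or dirty bits set where previously they were not".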
Link: https://lkml.kernel.org/r/20250718090244.21092-4-dev.jain@arm.com Signed-off-by: Dev Jain Reviewed-by: Lorenzo Stoakes Reviewed-by: Barry Song Reviewed-by: Ryan Roberts Reviewed-by: Zi Yan Cc: Anshuman Khandual Cc: Catalin Marinas Cc: Christophe Leroy Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jann Horn Cc: Joey Gouly Cc: Kevin Brodsky Cc: Lance Yang Cc: Liam Howlett Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Cc: Vlastimil Babka Cc: Will Deacon Cc: Yang Shi Cc: Yicong Yang Cc: Zhenhua Huang Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 84 ++++++++++++++++++++++++++++++++++++++++++++++++- mm/mprotect.c | 4 +-- 2 files changed, 85 insertions(+), 3 deletions(-) (limited to 'include/linux/pgtable.h') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index cf1515c163e2..e3b99920be05 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1331,7 +1331,9 @@ static inline pte_t ptep_modify_prot_start(struct vm_area_struct *vma, /* * Commit an update to a pte, leaving any hardware-controlled bits in - * the PTE unmodified. + * the PTE unmodified. The pte returned from ptep_modify_prot_start() may + * additionally have young and/or dirty bits set where previously they were not, + * so the updated pte may have these additional changes. */ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, @@ -1340,6 +1342,86 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, __ptep_modify_prot_commit(vma, addr, ptep, pte); } #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */ + +/** + * modify_prot_start_ptes - Start a pte protection read-modify-write transaction + * over a batch of ptes, which protects against asynchronous hardware + * modifications to the ptes. The intention is not to prevent the hardware from + * making pte updates, but to prevent any updates it may make from being lost. + * Please see the comment above ptep_modify_prot_start() for full description. 
+ *
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_modify_prot_start(), collecting the a/d bits from each pte
+ * in the batch.
+ *
+ * Note that PTE bits in the PTE batch besides the PFN can differ.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. All other PTE bits must be identical for
+ * all PTEs in the batch except for young and dirty bits. The PTEs are all in
+ * the same PMD.
+ */
+#ifndef modify_prot_start_ptes
+static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	pte_t pte, tmp_pte;
+
+	pte = ptep_modify_prot_start(vma, addr, ptep);
+	while (--nr) {
+		ptep++;
+		addr += PAGE_SIZE;
+		tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
+		if (pte_dirty(tmp_pte))
+			pte = pte_mkdirty(pte);
+		if (pte_young(tmp_pte))
+			pte = pte_mkyoung(pte);
+	}
+	return pte;
+}
+#endif
+
+/**
+ * modify_prot_commit_ptes - Commit an update to a batch of ptes, leaving any
+ * hardware-controlled bits in the PTE unmodified.
+ *
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @old_pte: Old page table entry (for the first entry) which is now cleared.
+ * @pte: New page table entry to be set.
+ * @nr: Number of entries.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_modify_prot_commit().
+ *
+ * Context: The caller holds the page table lock. The PTEs are all in the same
+ * PMD. On exit, the set ptes in the batch map the same folio. The ptes set by
+ * ptep_modify_prot_start() may additionally have young and/or dirty bits set
+ * where previously they were not, so the updated ptes may have these
+ * additional changes.
+ */
+#ifndef modify_prot_commit_ptes
+static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
+		pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
+{
+	int i;
+
+	for (i = 0; i < nr; ++i, ++ptep, addr += PAGE_SIZE) {
+		ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
+
+		/* Advance PFN only, set same prot */
+		old_pte = pte_next_pfn(old_pte);
+		pte = pte_next_pfn(pte);
+	}
+}
+#endif
+
 #endif /* CONFIG_MMU */
 
 /*
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 97adc62c50ab..4977f198168e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -204,7 +204,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 				}
 			}
 
-			oldpte = ptep_modify_prot_start(vma, addr, pte);
+			oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
 			ptent = pte_modify(oldpte, newprot);
 
 			if (uffd_wp)
@@ -230,7 +230,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 			    can_change_pte_writable(vma, addr, ptent))
 				ptent = pte_mkwrite(ptent, vma);
 
-			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
+			modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
 			if (pte_needs_flush(oldpte, ptent))
 				tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
 			pages++;
-- 
cgit v1.2.3


From 3dfde97800e06882960cc926d2c428f2128b7c70 Mon Sep 17 00:00:00 2001
From: David Hildenbrand
Date: Thu, 24 Jul 2025 10:52:59 +0530
Subject: mm: add get_and_clear_ptes() and clear_ptes()

Patch series "Optimizations for khugepaged", v4.

If the underlying folio mapped by the ptes is large, we can process those
ptes in a batch using folio_pte_batch().

For arm64 specifically, this results in a 16x reduction in the number of
ptep_get() calls, since on a contig block, ptep_get() on arm64 will
iterate through all 16 entries to collect a/d bits. Next, ptep_clear()
will cause a TLBI for every contig block in the range via
contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at
the first and last contig block of the range.

For split folios, there will be no pte batching; the batch size returned
by folio_pte_batch() will be 1. For pagetable split folios, the ptes will
still point to the same large folio; for arm64, this results in the
optimization described above, and for other arches, a minor improvement
is expected due to a reduction in the number of function calls and
batching atomic operations.

This patch (of 3):

Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if we
are about to tear down a full MM.

Use get_and_clear_ptes() in existing code, clear_ptes() users will be
added next.

Link: https://lkml.kernel.org/r/20250724052301.23844-2-dev.jain@arm.com
Signed-off-by: David Hildenbrand
Signed-off-by: Dev Jain
Reviewed-by: Baolin Wang
Reviewed-by: Barry Song
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Zi Yan
Cc: Liam Howlett
Cc: Mariano Pache
Cc: Ryan Roberts
Signed-off-by: Andrew Morton
---
 arch/arm64/mm/mmu.c     |  2 +-
 include/linux/pgtable.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
 mm/mremap.c             |  2 +-
 mm/rmap.c               |  2 +-
 4 files changed, 48 insertions(+), 3 deletions(-)

(limited to 'include/linux/pgtable.h')

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index abd9725796e9..20a89ab97dc5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1528,7 +1528,7 @@ early_initcall(prevent_bootmem_remove_init);
 pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t *ptep, unsigned int nr)
 {
-	pte_t pte = get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, /* full = */ 0);
+	pte_t pte = get_and_clear_ptes(vma->vm_mm, addr, ptep, nr);
 
 	if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
 		/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e3b99920be05..4c035637eeb7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -736,6 +736,29 @@ static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
 }
 #endif
 
+/**
+ * get_and_clear_ptes - Clear present PTEs that map consecutive pages of
+ *			the same folio, collecting dirty/accessed bits.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of get_and_clear_full_ptes() if it is known that we don't
+ * need to clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline pte_t get_and_clear_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
+{
+	return get_and_clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
 #ifndef clear_full_ptes
 /**
  * clear_full_ptes - Clear present PTEs that map consecutive pages of the same
@@ -768,6 +791,28 @@ static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
 }
 #endif
 
+/**
+ * clear_ptes - Clear present PTEs that map consecutive pages of the same folio.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of clear_full_ptes() if it is known that we don't need to
+ * clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline void clear_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
+{
+	clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
 /*
  * If two threads concurrently fault at the same page, the thread that
  * won the race updates the PTE and its local TLB/Cache. The other thread
diff --git a/mm/mremap.c b/mm/mremap.c
index ac39845e9718..677a4d744df9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -280,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 							 old_pte, max_nr_ptes);
 			force_flush = true;
 		}
-		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
+		pte = get_and_clear_ptes(mm, old_addr, old_ptep, nr_ptes);
 		pte = move_pte(pte, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index f93ce27132ab..568198e9efc2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2036,7 +2036,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			flush_cache_range(vma, address, end_addr);
 
 			/* Nuke the page table entry. */
-			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
 			/*
 			 * We clear the PTE but do not flush so potentially
 			 * a remote CPU could still be writing to the folio.
-- 
cgit v1.2.3
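The generic fallbacks for modify_prot_start_ptes() and modify_prot_commit_ptes() in the first patch above can be modelled outside the kernel to see what the batched transaction does to a run of entries. The sketch below is a hypothetical user-space model, not kernel code: toy_pte_t, its bit layout (dirty, young and writable in the low bits, PFN from bit 12 up) and every toy_* helper are invented for illustration and match no real architecture or kernel API. It mirrors two properties stated in the kernel-doc comments: starting the transaction clears every entry in the batch while folding the dirty/young bits of all entries into the returned value, and committing writes the same protection bits to each entry while advancing only the PFN. The toy commit drops the old_pte argument because the generic ptep_modify_prot_commit() path shown in the diff context only passes the new pte on to __ptep_modify_prot_commit().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy PTE encoding, NOT any real architecture's layout:
 * bit 0 = dirty, bit 1 = young (accessed), bit 2 = writable,
 * bits 12 and up = page frame number (PFN). */
typedef uint64_t toy_pte_t;

#define TOY_PTE_DIRTY	(1ULL << 0)
#define TOY_PTE_YOUNG	(1ULL << 1)
#define TOY_PTE_WRITE	(1ULL << 2)
#define TOY_PFN_SHIFT	12

/* Stand-in for pte_next_pfn(): same flag bits, PFN + 1. */
static toy_pte_t toy_pte_next_pfn(toy_pte_t pte)
{
	return pte + (1ULL << TOY_PFN_SHIFT);
}

static uint64_t toy_pte_pfn(toy_pte_t pte)
{
	return pte >> TOY_PFN_SHIFT;
}

/* Stand-in for the generic ptep_modify_prot_start(): read the entry and
 * clear it, so a later hardware a/d update cannot be silently lost. */
static toy_pte_t toy_prot_start_one(toy_pte_t *ptep)
{
	toy_pte_t old = *ptep;

	*ptep = 0;
	return old;
}

/* Mirrors the generic modify_prot_start_ptes() loop: clear every entry in
 * the batch and OR the dirty/young bits of all entries into the first. */
static toy_pte_t toy_modify_prot_start_ptes(toy_pte_t *ptep, unsigned int nr)
{
	toy_pte_t pte, tmp;

	assert(nr > 0);	/* while (--nr) would wrap around for nr == 0 */
	pte = toy_prot_start_one(ptep);
	while (--nr) {
		ptep++;
		tmp = toy_prot_start_one(ptep);
		pte |= tmp & (TOY_PTE_DIRTY | TOY_PTE_YOUNG);
	}
	return pte;
}

/* Mirrors the generic modify_prot_commit_ptes() loop: write the same
 * protection bits to each entry, advancing only the PFN. */
static void toy_modify_prot_commit_ptes(toy_pte_t *ptep, toy_pte_t pte,
					unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++, ptep++) {
		*ptep = pte;
		pte = toy_pte_next_pfn(pte);
	}
}
```

For example, in a four-entry batch where only the third entry is dirty and only the fourth is young, the value returned by toy_modify_prot_start_ptes() carries both bits, so after clearing the writable bit and committing, all four entries come back dirty and young. This is the "may additionally have young and/or dirty bits set where previously they were not" behaviour that the patch documents for ptep_modify_prot_commit() and modify_prot_commit_ptes().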