summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-25sched: idle: Respect the CPU system wakeup QoS limit for s2idleUlf Hansson
A CPU system wakeup QoS limit may have been requested by user space. To avoid breaking this constraint when entering a low power state during s2idle, let's start to take into account the QoS limit. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25pmdomain: Respect the CPU system wakeup QoS limit for s2idleUlf Hansson
A CPU system wakeup QoS limit may have been requested by user space. To avoid breaking this constraint when entering a low power state during s2idle through genpd, let's extend the corresponding genpd governor for CPUs. More precisely, during s2idle let the genpd governor select a suitable domain idle state, by taking into account the QoS limit. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-3-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25PM: QoS: Introduce a CPU system wakeup QoS limitUlf Hansson
Some platforms supports multiple low power states for CPUs that can be used when entering system-wide suspend. Currently we are always selecting the deepest possible state for the CPUs, which can break the system wakeup latency constraint that may be required for a use case. Let's take the first step towards addressing this problem, by introducing an interface for user space, that allows us to specify the CPU system wakeup QoS limit. Subsequent changes will start taking into account the new QoS limit. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-2-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25vfio/pci: Add vfio_pci_dma_buf_iommufd_map()Jason Gunthorpe
This function is used to establish the "private interconnect" between the VFIO DMABUF exporter and the iommufd DMABUF importer. This is intended to be a temporary API until the core DMABUF interface is improved to natively support a private interconnect and revocable negotiation. This function should only be called by iommufd when trying to map a DMABUF. For now iommufd will only support VFIO DMABUFs. The following improvements are needed in the DMABUF API to generically support more exporters with iommufd/kvm type importers that cannot use the DMA API: 1) Revoke semantics. VFIO needs to be able to prevent access to the MMIO during FLR, and so it will use dma_buf_move_notify() to prevent access. iommmufd does not support fault handling so it cannot implement the full move_notify. Instead if revoke is negotiated the exporter promises not to use move_notify() unless the importer can experiance failures. iommufd will unmap the dmabuf from the iommu page tables while it is revoked. 2) Private interconnect negotiation. iommufd will only be able to map a "private interconnect" that provides a phys_addr_t and a struct p2pdma_provider * to describe the memory. It cannot use a DMA mapped scatterlist since it is directly calling iommu_map(). 3) NULL device during dma_buf_dynamic_attach(). Since iommufd doesn't use the DMA API it doesn't have a DMAable struct device to pass here. Link: https://patch.msgid.link/r/1-v2-b2c110338e3f+5c2-iommufd_dmabuf_jgg@nvidia.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Shuai Xue <xueshuai@linux.alibaba.com> Acked-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-11-25Merge branch 'slab/for-6.19/mempool_alloc_bulk' into slab/for-nextVlastimil Babka
Merges series "mempool_alloc_bulk and various mempool improvements v3" from Christoph Hellwig. From the cover letter [1]: This series adds a bulk version of mempool_alloc that makes allocating multiple objects deadlock safe. The initial users is the blk-crypto-fallback code: https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/ with which v1 was posted, but I also have a few other users in mind. Link: https://lore.kernel.org/all/20251113084022.1255121-1-hch@lst.de/ [1]
2025-11-25Merge branch 'slab/for-6.19/memdesc_prep' into slab/for-nextVlastimil Babka
Merge series "Prepare slab for memdescs" by Matthew Wilcox. From the cover letter [1]: When we separate struct folio, struct page and struct slab from each other, converting to folios then to slabs will be nonsense. It made sense under the 'folio is just a head page' interpretation, but with full separation, page_folio() will return NULL for a page which belongs to a slab. This patch series removes almost all mentions of folio from slab. There are a few folio_test_slab() invocations left around the tree that I haven't decided how to handle yet. We're not yet quite at the point of separately allocating struct slab, but that's what I'll be working on next. Link: https://lore.kernel.org/all/20251113000932.1589073-1-willy@infradead.org/ [1]
2025-11-25Merge branch 'slab/for-6.19/sheaves_cleanups' into slab/for-nextVlastimil Babka
Merge series "slab: preparatory cleanups before adding sheaves to all caches" [1] Cleanups that were written as part of the full sheaves conversion, which is not fully ready yet, but they are useful on their own. Link: https://lore.kernel.org/all/20251105-sheaves-cleanups-v1-0-b8218e1ac7ef@suse.cz/ [1]
2025-11-25Merge tag 'reset-for-v6.19' of https://git.pengutronix.de/git/pza/linux into ↵Arnd Bergmann
soc/drivers Reset controller updates for v6.19 * Add support for LAN969x, eic770 and RZ/G3S reset controllers, for the RZ/G3S USB-PHY reset controller, and for the remaining TH1520 reset controllers. * Drop legacy reset control lookup code. * Include linux/bits.h from linux/reset.h to make it self-contained. * tag 'reset-for-v6.19' of https://git.pengutronix.de/git/pza/linux: Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25Merge tag 'qcom-drivers-for-6.19' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.19 Support for hardware-keymanager v1 support for wrapped keys is introduce in the ICE driver. Support for the new Kaanapali mobile platform is added to last-level cache controller, pd-mapper, and UBWC drivers. UBWC driver gains support for the Monaco and Glymur platforms. The PMIC GLINK driver is extended to handle the differences found in targets where the related firmware runs on the SoCCP. Support for running on targets without initialized SMEM is provided, by reworking the SMEM driver to differentiate between "not yet probed" and "probed but there was no SMEM". An unwanted WARN_ON() that triggered if clients asked for a SMEM item beyond the currently running system's limit, was removed, to allow new use cases to gracefully fail on old targets. The Qualcomm socinfo driver is extended with support for version 20 through 23 and support for providing version information about more than 32 remote processors. Identifiers for QCS6490 and SM8850 are also added. Additionally, a number of smaller bug fixes and cleanups in PBS, OCMEM, GSBI, TZMEM, and MDT-loader are included. * tag 'qcom-drivers-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (31 commits) soc: qcom: mdt_loader: rename 'firmware' parameter of qcom_mdt_load() soc: qcom: mdt_loader: merge __qcom_mdt_load() and qcom_mdt_load_no_init() soc: qcom: socinfo: Add reserve field to support future extension soc: qcom: socinfo: Add support for new fields in revision 20 dt-bindings: firmware: qcom,scm: Document SCM on Kaanapali SOC soc: qcom: socinfo: add support to extract more than 32 image versions soc: qcom: smem: drop the WARN_ON() on SMEM item validation soc: qcom: ubwc: Add config for Kaanapali soc: qcom: socinfo: Add SoC ID for QCS6490 dt-bindings: arm: qcom,ids: Add SoC ID for QCS6490 soc: qcom: ice: Add HWKM v1 support for wrapped keys soc: qcom: smem: better track SMEM uninitialized state err.h: add INIT_ERR_PTR() macro soc: qcom: smem: fix hwspinlock resource leak in probe error paths dt-bindings: soc: qcom,aoss-qmp: Document the Glymur AOSS side channel dt-bindings: soc: qcom,aoss-qmp: Document the Kaanapali AOSS channel soc: qcom: ubwc: Add QCS8300 UBWC cfg dt-bindings: firmware: qcom,scm: Document Glymur scm soc: qcom: socinfo: Add SM8850 SoC ID dt-bindings: arm: qcom,ids: Add SoC ID for SM8850 ... Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25backlight: lp855x: Fix lp855x.h kernel-doc warningsRandy Dunlap
Add a missing struct short description and a missing leading " *" to lp855x.h to avoid kernel-doc warnings: Warning: include/linux/platform_data/lp855x.h:126 missing initial short description on line: * struct lp855x_platform_data Warning: include/linux/platform_data/lp855x.h:131 bad line: Only valid when mode is PWM_BASED. Fixes: 7be865ab8634 ("backlight: new backlight driver for LP855x devices") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Daniel Thompson (RISCstar) <danielt@kernel.org> Link: https://patch.msgid.link/20251111060916.1995920-1-rdunlap@infradead.org Signed-off-by: Lee Jones <lee@kernel.org>
2025-11-25ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()Marc Zyngier
Since 0f67b56d84b4c ("clocksource/drivers/arm_arch_timer_mmio: Switch over to standalone driver"), acpi_arch_timer_mem_init() is unused. Remove it. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-25string: use __attribute__((nonnull())) in strends()Bartosz Golaszewski
The arguments of strends() must not be NULL so annotate the function with the nonnull attribute. Suggested-by: Kees Cook <kees@kernel.org> Link: https://lore.kernel.org/r/20251118-strends-follow-up-v1-2-d3f8ef750f59@linaro.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-25fs: cosmetic fixes to lru handlingMateusz Guzik
1. inode_bit_waitqueue() was somehow placed between __inode_add_lru() and inode_add_lru(). move it up 2. assert ->i_lock is held in __inode_add_lru instead of just claiming it is needed 3. s/__inode_add_lru/__inode_lru_list_add/ for consistency with itself (inode_lru_list_del()) and similar routines for sb and io list management 4. push list presence check into inode_lru_list_del(), just like sb and io list Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251029131428.654761-2-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-25fs: rework I_NEW handling to operate without fencesMateusz Guzik
In the inode hash code grab the state while ->i_lock is held. If found to be set, synchronize the sleep once more with the lock held. In the real world the flag is not set most of the time. Apart from being simpler to reason about, it comes with a minor speed up as now clearing the flag does not require the smp_mb() fence. While here rename wait_on_inode() to wait_on_new_inode() to line it up with __wait_on_freeing_inode(). Christian Brauner <brauner@kernel.org> says: As per the discussion in [1] I folded in the diff sent in [2]. Link: https://lore.kernel.org/69238e4d.a70a0220.d98e3.006e.GAE@google.com [1] Link: https://lore.kernel.org/c2kpawomkbvtahjm7y5mposbhckb7wxthi3iqy5yr22ggpucrm@ufvxwy233qxo [2] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251010221737.1403539-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-25wifi: ieee80211: correct FILS status codesRia Thomas
The FILS status codes are set to 108/109, but the IEEE 802.11-2020 spec defines them as 112/113. Update the enum so it matches the specification and keeps the kernel consistent with standard values. Fixes: a3caf7440ded ("cfg80211: Add support for FILS shared key authentication offload") Signed-off-by: Ria Thomas <ria.thomas@morsemicro.com> Reviewed-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Link: https://patch.msgid.link/20251124125637.3936154-1-ria.thomas@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-25Merge tag 'tegra-for-6.19-syscore' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into soc/drivers syscore: Changes for v6.19-rc1 Add a parameter to syscore operations to allow passing contextual data, which in turn enables refactoring of drivers to make them independent of global data. This initially only contains the API changes along with the updates for existing drivers. Subsequent work will make use of this to improve drivers. * tag 'tegra-for-6.19-syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux: syscore: Pass context data to callbacks Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25fs, iomap: remove IOCB_DIO_CALLER_COMPChristoph Hellwig
This was added by commit 099ada2c8726 ("io_uring/rw: add write support for IOCB_DIO_CALLER_COMP") and disabled a little later by commit 838b35bb6a89 ("io_uring/rw: disable IOCB_DIO_CALLER_COMP") because it didn't work. Remove all the related code that sat unused for 2 years. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251113170633.1453259-2-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-25Merge tag 'samsung-drivers-6.19' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers Samsung SoC drivers for v6.19 1. ChipID driver: Add support for identifying Exynos8890 and Exynos9610. 2. PMU driver: Allow specifying list of valid registers for the custom regmap used on Google GS101 SoC. The PMU (Power Management Unit) on that SoC uses more complex access to registers than simple MMIO and invalid registers trigger aborts halting the system. 3. Few minor cleanups. 4. Several new bindings for compatible devices. * tag 'samsung-drivers-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: dt-bindings: soc: samsung: exynos-pmu: allow mipi-phy subnode for Exynos7870 PMU soc: samsung: exynos-chipid: use a local dev variable dt-bindings: soc: samsung: exynos-sysreg: add gs101 hsi0 and misc compatibles dt-bindings: soc: samsung: exynos-sysreg: add power-domains soc: samsung: gs101-pmu: implement access tables for read and write soc: samsung: exynos-pmu: move some gs101 related code into new file soc: samsung: exynos-pmu: allow specifying read & write access tables for secure regmap dt-bindings: samsung: exynos-sysreg: add exynos7870 sysregs soc: samsung: exynos-chipid: add exynos8890 SoC support dt-bindings: hwinfo: samsung,exynos-chipid: add exynos8890-chipid compatible dt-bindings: soc: samsung: exynos-pmu: add exynos8890 compatible soc: samsung: exynos-pmu: Annotate online/offline functions with __must_hold soc: samsung: exynos-chipid: Add exynos9610 SoC support dt-bindings: hwinfo: samsung,exynos-chipid: add exynos9610 compatible dt-bindings: soc: samsung: exynos-sysreg: Add Exynos990 PERIC0/1 compatibles Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-11-25include/linux/fs.h: trivial fix: regualr -> regularAskar Safin
Trivial fix. Signed-off-by: Askar Safin <safinaskar@gmail.com> Link: https://patch.msgid.link/20251120195140.571608-1-safinaskar@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-25fs: Add uoff_tMatthew Wilcox (Oracle)
In a recent commit, I inadvertently changed a comparison from being an unsigned comparison (on 64-bit systems) to being a signed comparison (which it had always been on 32-bit systems). This led to a sporadic fstests failure. To make sure this comparison is always unsigned, introduce a new type, uoff_t which is the unsigned version of loff_t. Generally file sizes are restricted to being a signed integer, but in these two places it is convenient to pass -1 to indicate "up to the end of the file". Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20251123220518.1447261-1-willy@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-25x86/cc: Fix enum spelling to fix kernel-doc warningsRandy Dunlap
Make the enum name in kernel-doc match the code to prevent kernel-doc warnings: Warning: include/linux/cc_platform.h:106 Enum value 'CC_ATTR_GUEST_SEV_SNP' not described in enum 'cc_attr' Warning: include/linux/cc_platform.h:106 Excess enum value '%CC_ATTR_SEV_SNP' description in 'cc_attr' Fixes: f742b90e61bb ("x86/mm: Extend cc_attr to include AMD SEV-SNP") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251125022730.3163679-1-rdunlap@infradead.org
2025-11-24PCI: Validate pci_rebar_size_supported() inputIlpo Järvinen
According to Dan Carpenter, smatch detects issue with size parameter given to pci_rebar_size_supported(): drivers/pci/rebar.c:142 pci_rebar_size_supported() error: undefined (user controlled) shift '(((1))) << size' The problem is this call tree, which uses the 'size' from the user to shift in BIT() without validating it: __resource_resize_store # takes 'buf' from user sysfs write kstrtoul(buf, 0, &size) # converts to unsigned long pci_resize_resource # truncates to int pci_rebar_size_supported # BIT(size) without validation There could be similar problems also with pci_resize_resource() parameter values coming from drivers. Add 'size' validation to pci_rebar_size_supported(). There seems to be no SZ_128T prior to this so add one to be able to specify the largest size supported by the kernel (PCIe r7.0 spec already defines sizes even beyond 128TB but kernel does not yet support them). The issue looks older than the introduction of pci_rebar_size_supported() by bb1fabd0d94e ("PCI: Add pci_rebar_size_supported() helper"). It would be also nice to convert 'size' unsigned too everywhere, maybe even u8 but that is left as further work. Fixes: 8bb705e3e79d ("PCI: Add pci_resize_resource() for resizing BARs") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/aSA1WiRG3RuhqZMY@stanley.mountain/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: commit log, add report URL] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251124153740.2995-1-ilpo.jarvinen@linux.intel.com
2025-11-24hugetlb: add __read_mostly to sysctl_hugetlb_shm_groupGregory Price
sysctl bits are mostly-read values. Link: https://lkml.kernel.org/r/20251121194859.265259-2-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: userfaultfd: add pgtable_supports_uffd_wp()Chunyan Zhang
Some platforms can customize the PTE/PMD entry uffd-wp bit making it unavailable even if the architecture provides the resource. This patch adds a macro API pgtable_supports_uffd_wp() that allows architectures to define their specific implementations to check if the uffd-wp bit is available on which device the kernel is running. Also this patch is removing "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef CONFIG_PTE_MARKER_UFFD_WP" in favor of pgtable_supports_uffd_wp() and uffd_supports_wp_marker() checks respectively that default to IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) and "IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP) && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP)" if not overridden by the architecture, no change in behavior is expected. Link: https://lkml.kernel.org/r/20251113072806.795029-3-zhangchunyan@iscas.ac.cn Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Acked-by: David Hildenbrand <david@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Conor Dooley <conor@kernel.org> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: softdirty: add pgtable_supports_soft_dirty()Chunyan Zhang
Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15. This patchset adds support for Svrsw60t59b [1] extension which is ratified now, also add soft dirty and userfaultfd write protect tracking for RISC-V. The patches 1 and 2 add macros to allow architectures to define their own checks if the soft-dirty / uffd_wp PTE bits are available, in other words for RISC-V, the Svrsw60t59b extension is supported on which device the kernel is running. Also patch1-2 are removing "ifdef CONFIG_MEM_SOFT_DIRTY" "ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and "ifdef CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which if not overridden by the architecture, no change in behavior is expected. This patchset has been tested with kselftest mm suite in which soft-dirty, madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass, and no regressions are observed in any of the other tests. This patch (of 6): Some platforms can customize the PTE PMD entry soft-dirty bit making it unavailable even if the architecture provides the resource. Add an API which architectures can define their specific implementations to detect if soft-dirty bit is available on which device the kernel is running. This patch is removing "ifdef CONFIG_MEM_SOFT_DIRTY" in favor of pgtable_supports_soft_dirty() checks that defaults to IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), if not overridden by the architecture, no change in behavior is expected. We make sure to never set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(), so we will never run into VM_SOFTDIRTY checks. [lorenzo.stoakes@oracle.com: fix VMA selftests] Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1] Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Acked-by: David Hildenbrand <david@redhat.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Conor Dooley <conor@kernel.org> Cc: Deepak Gupta <debug@rivosinc.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rob Herring <robh@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24memcg: remove __lruvec_stat_mod_folioShakeel Butt
__lruvec_stat_mod_folio() is already safe against irqs, so there is no need to have a separate interface (i.e. lruvec_stat_mod_folio) which wraps calls to it with irq disabling and reenabling. Let's rename __lruvec_stat_mod_folio() to lruvec_stat_mod_folio(). Link: https://lkml.kernel.org/r/20251110232008.1352063-5-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24memcg: remove __mod_lruvec_stateShakeel Butt
__mod_lruvec_state() is already safe against irqs, so there is no need to have a separate interface (i.e. mod_lruvec_state) which wraps calls to it with irq disabling and reenabling. Let's rename __mod_lruvec_state() to mod_lruvec_state(). Link: https://lkml.kernel.org/r/20251110232008.1352063-4-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24memcg: remove __mod_lruvec_kmem_stateShakeel Butt
__mod_lruvec_kmem_state() is already safe against irqs, so there is no need to have a separate interface (i.e. mod_lruvec_kmem_state) which wraps calls to it with irq disabling and reenabling. Let's rename __mod_lruvec_kmem_state() to mod_lruvec_kmem_state(). Link: https://lkml.kernel.org/r/20251110232008.1352063-3-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24memcg: use mod_node_page_state to update statsShakeel Butt
Patch series "memcg: cleanup the memcg stats interfaces". The memcg stats are safe against irq (and nmi) context and thus does not require disabling irqs. However for some stats which are also maintained at node level, it is using irq unsafe interface and thus requiring the users to still disables irqs or use interfaces which explicitly disables irqs. Let's move memcg code to use irq safe node level stats function which is already optimized for architectures with HAVE_CMPXCHG_LOCAL (all major ones), so there will not be any performance penalty for its usage. This patch (of 4): The memcg stats are safe against irq (and nmi) context and thus does not require disabling irqs. However some code paths for memcg stats also update the node level stats and use irq unsafe interface and thus require the users to disable irqs. However node level stats, on architectures with HAVE_CMPXCHG_LOCAL (all major ones), has interface which does not require irq disabling. Let's move memcg stats code to start using that interface for node level stats. Link: https://lkml.kernel.org/r/20251110232008.1352063-1-shakeel.butt@linux.dev Link: https://lkml.kernel.org/r/20251110232008.1352063-2-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory.c: introduce folio_split_unmappedBalbir Singh
Unmapped was added as a parameter to __folio_split() and related call sites to support splitting of folios already in the midst of a migration. This special case arose for device private folio migration since during migration there could be a disconnect between source and destination on the folio size. Introduce folio_split_unmapped() to handle this special case. Also refactor code and add __folio_freeze_and_split_unmapped() helper that is common to both __folio_split() and folio_split_unmapped(). This in turn removes the special casing introduced by the unmapped parameter in __folio_split(). [balbirs@nvidia.com: v2] Link: https://lkml.kernel.org/r/20251115084041.3914728-1-balbirs@nvidia.com [balbirs@nvidia.com: fix clang-20 build] Link: https://lkml.kernel.org/r/20251120134232.3588203-1-balbirs@nvidia.com [akpm@linux-foundation.org: add `inline' to shmem_uncharge() stub, per Balbir] Link: https://lkml.kernel.org/r/20251114012228.2634882-1-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Suggested-by: Zi Yan <ziy@nvidia.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: thp: reparent the split queue during memcg offlineQi Zheng
Similar to list_lru, the split queue is relatively independent and does not need to be reparented along with objcg and LRU folios (holding objcg lock and lru lock). So let's apply the similar mechanism as list_lru to reparent the split queue separately when memcg is offine. This is also a preparation for reparenting LRU folios. Link: https://lkml.kernel.org/r/8703f907c4d1f7e8a2ef2bfed3036a84fa53028b.1762762324.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nico Pache <npache@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: thp: introduce folio_split_queue_lock and its variantsMuchun Song
In future memcg removal, the binding between a folio and a memcg may change, making the split lock within the memcg unstable when held. A new approach is required to reparent the split queue to its parent. This patch starts introducing a unified way to acquire the split lock for future work. It's a code-only refactoring with no functional changes. Link: https://lkml.kernel.org/r/a31a90bcac04dc754f775e87ae3205be3170b571.1762762324.git.zhengqi.arch@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nico Pache <npache@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: replace remaining pte_to_swp_entry() with softleaf_from_pte()Lorenzo Stoakes
There are straggler invocations of pte_to_swp_entry() lying around, replace all of these with the software leaf entry equivalent - softleaf_from_pte(). With those removed, eliminate pte_to_swp_entry() altogether. No functional change intended. Link: https://lkml.kernel.org/r/d8ee5ccefe4c42d7c4fe1a2e46f285ac40421cd3.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: eliminate further swapops predicatesLorenzo Stoakes
Having converted so much of the code base to software leaf entries, we can mop up some remaining cases. We replace is_pfn_swap_entry(), pfn_swap_entry_to_page(), is_writable_device_private_entry(), is_device_exclusive_entry(), is_migration_entry(), is_writable_migration_entry(), is_readable_migration_entry(), swp_offset_pfn() and pfn_swap_entry_folio() with softleaf equivalents. No functional change intended. Link: https://lkml.kernel.org/r/956bc9c031604811c0070d2f4bf2f1373f230213.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: remove is_hugetlb_entry_[migration, hwpoisoned]()Lorenzo Stoakes
We do not need to have explicit helper functions for these, it adds a level of confusion and indirection when we can simply use software leaf entry logic here instead and spell out the special huge_pte_none() case we must consider. No functional change intended. Link: https://lkml.kernel.org/r/0e92d6924d3de88cd014ce1c53e20edc08fc152e.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: remove non_swap_entry() and use softleaf helpers insteadLorenzo Stoakes
There is simply no need for the hugely confusing concept of 'non-swap' swap entries now we have the concept of softleaf entries and relevant softleaf_xxx() helpers. Adjust all callers to use these instead and remove non_swap_entry() altogether. No functional change intended. Link: https://lkml.kernel.org/r/2562093f37f4a9cffea0447058014485eb50aaaf.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: remove remaining is_swap_pmd() users and is_swap_pmd()Lorenzo Stoakes
Update copy_huge_pmd() and change_huge_pmd() to use pmd_is_valid_softleaf() - as this checks for the only valid non-present huge PMD states. Also update mm/debug_vm_pgtable.c to explicitly test for a valid leaf PMD entry (which it was not before, which was incorrect), and have it test against pmd_is_huge() and pmd_is_valid_softleaf() rather than is_swap_pmd(). With these changes done there are no further users of is_swap_pmd(), so remove it. Link: https://lkml.kernel.org/r/1628b00b00c8498bbd2c20b82117ee87845fb738.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: introduce pmd_is_huge() and use where appropriateLorenzo Stoakes
The leaf entry PMD case is confusing as only migration entries and device private entries are valid at PMD level, not true swap entries. We repeatedly perform checks of the form is_swap_pmd() || pmd_trans_huge() which is itself confusing - it implies that leaf entries at PMD level exist and are different from huge entries. Address this confusion by introduced pmd_is_huge() which checks for either case. Sadly due to header dependency issues (huge_mm.h is included very early on in headers and cannot really rely on much else) we cannot use pmd_is_valid_softleaf() here. However since these are the only valid, handled cases the function is still achieving what it intends to do. We then replace all instances of is_swap_pmd() || pmd_trans_huge() with pmd_is_huge() invocations and adjust logic accordingly to accommodate this. No functional change intended. Link: https://lkml.kernel.org/r/00f79db3b15293cac8f7040a48d69c52d00117e4.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: replace pmd_to_swp_entry() with softleaf_from_pmd()Lorenzo Stoakes
Introduce softleaf_from_pmd() to do the equivalent operation for PMDs that softleaf_from_pte() fulfils, and cascade changes through code base accordingly, introducing helpers as necessary. We are then able to eliminate pmd_to_swp_entry(), is_pmd_migration_entry(), is_pmd_device_private_entry() and is_pmd_non_present_folio_entry(). This further establishes the use of leaf operations throughout the code base and further establishes the foundations for eliminating is_swap_pmd(). No functional change intended. [lorenzo.stoakes@oracle.com: check writable, not readable/writable, per Vlastimil] Link: https://lkml.kernel.org/r/cd97b6ec-00f9-45a4-9ae0-8f009c212a94@lucifer.local Link: https://lkml.kernel.org/r/3fb431699639ded8fdc63d2210aa77a38c8891f1.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: SeongJae Park <sj@kernel.org>\ Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: avoid unnecessary use of is_swap_pmd()Lorenzo Stoakes
PMD 'non-swap' swap entries are currently used for PMD-level migration entries and device private entries. To add to the confusion in this terminology we use is_swap_pmd() in an inconsistent way similar to how is_swap_pte() was being used - sometimes adopting the convention that !pmd_none(), !pmd_present() implies PMD 'swap' entry, sometimes not. This patch handles the low-hanging fruit of cases where we can simply substitute other predicates for is_swap_pmd(). No functional change intended. Link: https://lkml.kernel.org/r/8a1704b36a009c18032d5bea4cb68e71448fbbe5.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: use leaf entries in debug pgtable + remove is_swap_pte()Lorenzo Stoakes
Remove invocations of is_swap_pte() in mm/debug_vm_pgtable.c and use softleaf_from_pte() and softleaf_is_swap() as necessary to replace this usage. We update the test code to use a 'true' swap entry throughout so we are guaranteed this is not a non-swap entry, so all asserts continue to operate correctly. With this change in place, we no longer use is_swap_pte() anywhere, so remove it. Link: https://lkml.kernel.org/r/222f352e7a99191b4bdfa77e835f2fc0dd83fa72.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: avoid unnecessary uses of is_swap_pte()Lorenzo Stoakes
There's an established convention in the kernel that we treat PTEs as containing swap entries (and the unfortunately named non-swap swap entries) should they be neither empty (i.e. pte_none() evaluating true) nor present (i.e. pte_present() evaluating true). However, there is some inconsistency in how this is applied, as we also have the is_swap_pte() helper which explicitly performs this check: /* check whether a pte points to a swap entry */ static inline int is_swap_pte(pte_t pte) { return !pte_none(pte) && !pte_present(pte); } As this represents a predicate, and it's logical to assume that in order to establish that a PTE entry can correctly be manipulated as a swap/non-swap entry, this predicate seems as if it must first be checked. But we instead, we far more often utilise the established convention of checking pte_none() / pte_present() before operating on entries as if they were swap/non-swap. This patch works towards correcting this inconsistency by removing all uses of is_swap_pte() where we are already in a position where we perform pte_none()/pte_present() checks anyway or otherwise it is clearly logical to do so. We also take advantage of the fact that pte_swp_uffd_wp() is only set on swap entries. Additionally, update comments referencing to is_swap_pte() and non_swap_entry(). No functional change intended. Link: https://lkml.kernel.org/r/17fd6d7f46a846517fd455fadd640af47fcd7c55.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: introduce leaf entry type and use to simplify leaf entry logicLorenzo Stoakes
The kernel maintains leaf page table entries which contain either: The kernel maintains leaf page table entries which contain either: - Nothing ('none' entries) - Present entries* - Everything else that will cause a fault which the kernel handles * Present entries are either entries the hardware can navigate without page fault or special cases like NUMA hint protnone or PMD with cleared present bit which contain hardware-valid entries modulo the present bit. In the 'everything else' group we include swap entries, but we also include a number of other things such as migration entries, device private entries and marker entries. Unfortunately this 'everything else' group expresses everything through a swp_entry_t type, and these entries are referred to swap entries even though they may well not contain a... swap entry. This is compounded by the rather mind-boggling concept of a non-swap swap entry (checked via non_swap_entry()) and the means by which we twist and turn to satisfy this. This patch lays the foundation for reducing this confusion. We refer to 'everything else' as a 'software-define leaf entry' or 'softleaf'. for short And in fact we scoop up the 'none' entries into this concept also so we are left with: - Present entries. - Softleaf entries (which may be empty). This allows for radical simplification across the board - one can simply convert any leaf page table entry to a leaf entry via softleaf_from_pte(). If the entry is present, we return an empty leaf entry, so it is assumed the caller is aware that they must differentiate between the two categories of page table entries, checking for the former via pte_present(). As a result, we can eliminate a number of places where we would otherwise need to use predicates to see if we can proceed with leaf page table entry conversion and instead just go ahead and do it unconditionally. We do so where we can, adjusting surrounding logic as necessary to integrate the new softleaf_t logic as far as seems reasonable at this stage. We typedef swp_entry_t to softleaf_t for the time being until the conversion can be complete, meaning everything remains compatible regardless of which type is used. We will eventually remove swp_entry_t when the conversion is complete. We introduce a new header file to keep things clear - leafops.h - this imports swapops.h so can direct replace swapops imports without issue, and we do so in all the files that require it. Additionally, add new leafops.h file to core mm maintainers entry. Link: https://lkml.kernel.org/r/c879383aac77d96a03e4d38f7daba893cd35fc76.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm: correctly handle UFFD PTE markersLorenzo Stoakes
Patch series "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries", v3. There's an established convention in the kernel that we treat leaf page tables (so far at the PTE, PMD level) as containing 'swap entries' should they be neither empty (i.e. p**_none() evaluating true) nor present (i.e. p**_present() evaluating true). However, at the same time we also have helper predicates - is_swap_pte(), is_swap_pmd() - which are inconsistently used. This is problematic, as it is logical to assume that should somebody wish to operate upon a page table swap entry they should first check to see if it is in fact one. It also implies that perhaps, in future, we might introduce a non-present, none page table entry that is not a swap entry. This series resolves this issue by systematically eliminating all use of the is_swap_pte() and is swap_pmd() predicates so we retain only the convention that should a leaf page table entry be neither none nor present it is a swap entry. We also have the further issue that 'swap entry' is unfortunately a really rather overloaded term and in fact refers to both entries for swap and for other information such as migration entries, page table markers, and device private entries. We therefore have the rather 'unique' concept of a 'non-swap' swap entry. This series therefore introduces the concept of 'software leaf entries', of type softleaf_t, to eliminate this confusion. A software leaf entry in this sense is any page table entry which is non-present, and represented by the softleaf_t type. That is - page table leaf entries which are software-controlled by the kernel. This includes 'none' or empty entries, which are simply represented by an zero leaf entry value. In order to maintain compatibility as we transition the kernel to this new type, we simply typedef swp_entry_t to softleaf_t. We introduce a number of predicates and helpers to interact with software leaf entries in include/linux/leafops.h which, as it imports swapops.h, can be treated as a drop-in replacement for swapops.h wherever leaf entry helpers are used. Since softleaf_from_[pte, pmd]() treats present entries as they were empty/none leaf entries, this allows for a great deal of simplification of code throughout the code base, which this series utilises a great deal. We additionally change from swap entry to software leaf entry handling where it makes sense to and eliminate functions from swapops.h where software leaf entries obviate the need for the functions. This patch (of 16): PTE markers were previously only concerned with UFFD-specific logic - that is, PTE entries with the UFFD WP marker set or those marked via UFFDIO_POISON. However since the introduction of guard markers in commit 7c53dfbdb024 ("mm: add PTE_MARKER_GUARD PTE marker"), this has no longer been the case. Issues have been avoided as guard regions are not permitted in conjunction with UFFD, but it still leaves very confusing logic in place, most notably the misleading and poorly named pte_none_mostly() and huge_pte_none_mostly(). This predicate returns true for PTE entries that ought to be treated as none, but only in certain circumstances, and on the assumption we are dealing with H/W poison markers or UFFD WP markers. This patch removes these functions and makes each invocation of these functions instead explicitly check what it needs to check. As part of this effort it introduces is_uffd_pte_marker() to explicitly determine if a marker in fact is used as part of UFFD or not. In the HMM logic we note that the only time we would need to check for a fault is in the case of a UFFD WP marker, otherwise we simply encounter a fault error (VM_FAULT_HWPOISON for H/W poisoned marker, VM_FAULT_SIGSEGV for a guard marker), so only check for the UFFD WP case. While we're here we also refactor code to make it easier to understand. [akpm@linux-foundation.org: fix comment typo, per Mike] Link: https://lkml.kernel.org/r/cover.1762812360.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/c38625fd9a1c1f1cf64ae8a248858e45b3dcdf11.1762812360.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Byungchul Park <byungchul@sk.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kairui Song <kasong@tencent.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory: merge uniform_split_supported() and ↵Wei Yang
non_uniform_split_supported() uniform_split_supported() and non_uniform_split_supported() share significantly similar logic. The only functional difference is that uniform_split_supported() includes an additional check on the requested @new_order. The reason for this check comes from the following two aspects: * some file system or swap cache just supports order-0 folio * the behavioral difference between uniform/non-uniform split The behavioral difference between uniform split and non-uniform: * uniform split splits folio directly to @new_order * non-uniform split creates after-split folios with orders from folio_order(folio) - 1 to new_order. This means for non-uniform split or !new_order split we should check the file system and swap cache respectively. This commit unifies the logic and merge the two functions into a single combined helper, removing redundant code and simplifying the split support checking mechanism. Link: https://lkml.kernel.org/r/20251106034155.21398-3-richard.weiyang@gmail.com Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory: introduce enum split_type for clarityWei Yang
Patch series "mm/huge_memory: Define split_type and consolidate split support checks", v3. This two-patch series focuses on improving code clarity and removing redundancy in the huge memory handling logic related to folio splitting. The series is based on an original proposal to merge two significantly identical functions that check folio split support[1]. During this process, we found an opportunity to improve readability by explicitly defining the split types. Patch 1: define split_type and use it Patch 2: merge uniform_split_supported() and non_uniform_split_supported() This patch (of 2): We currently handle two distinct types of large folio splitting: * uniform split * non-uniform split Differentiating between these types using a simple boolean variable is not obvious and can harm code readability. This commit introduces enum split_type to explicitly define these two types. Replacing the existing boolean variable with this enumeration significantly improves code clarity and expressiveness when dealing with folio splitting logic. No functional change is expected. [akpm@linux-foundation.org: tweak layout, per David] Link: https://lkml.kernel.org/r/20251106034155.21398-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20251106034155.21398-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: "David Hildenbrand (Red Hat)" <david@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory: fix kernel-doc comments for folio_split() and relatedZi Yan
try_folio_split_to_order(), folio_split, __folio_split(), and __split_unmapped_folio() do not have correct kernel-doc comment format. Fix them. [ziy@nvidia.com: kernel-doc fixup] Link: https://lkml.kernel.org/r/BE7AC5F3-9E64-4923-861D-C2C4E0CB91EB@nvidia.com [ziy@nvidia.com: add newline to fix an error and a warning from docutils] Link: https://lkml.kernel.org/r/040B38C0-23C6-4AEA-B069-69AE6DAA828B@nvidia.com Link: https://lkml.kernel.org/r/20251031162001.670503-4-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Pankaj Raghav <kernel@pankajraghav.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/huge_memory: add split_huge_page_to_order()Zi Yan
Patch series "Optimize folio split in memory failure", v5. This patchset optimizes folio split operations in memory failure code by always splitting a folio to min_order_for_split() to minimize unusable pages, even if min_order_for_split() is non zero and memory failure code would take the failed path eventually for a successfully split folio. This means instead of making the entire original folio unusable memory failure code would only make its after-split folio, which has order of min_order_for_split() and contains HWPoison page, unusable. For soft offline case, since the original folio is still accessible, no split is performed if the folio cannot be split to order-0 to prevent potential performance loss. In addition, add split_huge_page_to_order() to improve code readability and fix kernel-doc comment format for folio_split() and other related functions. Background ========== This patchset is a follow-up of "[PATCH v3] mm/huge_memory: do not change split_huge_page*() target order silently."[1] and [PATCH v4] mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order[2], since both are separated out as hotfixes. It improves how memory failure code handles large block size(LBS) folios with min_order_for_split() > 0. By splitting a large folio containing HW poisoned pages to min_order_for_split(), the after-split folios without HW poisoned pages could be freed for reuse. To achieve this, folio split code needs to set has_hwpoisoned on after-split folios containing HW poisoned pages and it is done in the hotfix in [2]. This patchset includes: 1. A patch adds split_huge_page_to_order(), 2. Patch 2 and Patch 3 of "[PATCH v2 0/3] Do not change split folio target order"[3], This patch (of 3): When the caller does not supply a list to split_huge_page_to_list_to_order(), use split_huge_page_to_order() instead. Link: https://lkml.kernel.org/r/20251031162001.670503-1-ziy@nvidia.com Link: https://lkml.kernel.org/r/20251031162001.670503-2-ziy@nvidia.com Link: https://lore.kernel.org/all/20251017013630.139907-1-ziy@nvidia.com/ [1] Link: https://lore.kernel.org/all/20251023030521.473097-1-ziy@nvidia.com/ [2] Link: https://lore.kernel.org/all/20251016033452.125479-1-ziy@nvidia.com/ [3] Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Barry Song <baohua@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Nico Pache <npache@redhat.com> Cc: Pankaj Raghav <kernel@pankajraghav.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/khugepaged: unify pmd folio installation with map_anon_folio_pmd()Wei Yang
Currently we install pmd folio with map_anon_folio_pmd() in __do_huge_pmd_anonymous_page() and do_huge_zero_wp_pmd(). While in collapse_huge_page(), it is done with identical code except statistics adjustment. Unify the process with map_anon_folio_pmd() to install pmd folio. Split it to map_anon_folio_pmd_pf() and map_anon_folio_pmd_nopf() to be used in page fault or not respectively. No functional change is intended. [akpm@linux-foundation.org: remove unneeded map_anon_folio_pmd_nopf() stub, per Wei & David] Link: https://lkml.kernel.org/r/20251008095453.18772-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Lance Yang <lance.yang@linux.dev> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Dev Jain <dev.jain@arm.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-24mm/migrate_device: add THP splitting during migrationBalbir Singh
Implement migrate_vma_split_pages() to handle THP splitting during the migration process when destination cannot allocate compound pages. This addresses the common scenario where migrate_vma_setup() succeeds with MIGRATE_PFN_COMPOUND pages, but the destination device cannot allocate large pages during the migration phase. Key changes: - migrate_vma_split_pages(): Split already-isolated pages during migration - Enhanced folio_split() and __split_unmapped_folio() with isolated parameter to avoid redundant unmap/remap operations This provides a fallback mechansim to ensure migration succeeds even when large page allocation fails at the destination. [matthew.brost@intel.com: add THP splitting during migration] Link: https://lkml.kernel.org/r/20251120230825.181072-2-matthew.brost@intel.com Link: https://lkml.kernel.org/r/20251001065707.920170-12-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>