summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-kernel-mm-damon13
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst4
-rw-r--r--Documentation/admin-guide/mm/damon/lru_sort.rst22
-rw-r--r--Documentation/admin-guide/mm/damon/reclaim.rst22
-rw-r--r--Documentation/admin-guide/mm/damon/stat.rst35
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst29
-rw-r--r--Documentation/admin-guide/mm/index.rst1
-rw-r--r--Documentation/admin-guide/mm/pagemap.rst3
-rw-r--r--Documentation/admin-guide/mm/swap_numa.rst78
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst5
-rw-r--r--Documentation/admin-guide/mm/zswap.rst10
-rw-r--r--Documentation/devicetree/bindings/riscv/extensions.yaml6
-rw-r--r--Documentation/filesystems/porting.rst12
-rw-r--r--Documentation/filesystems/proc.rst5
-rw-r--r--Documentation/filesystems/vfs.rst4
-rw-r--r--Documentation/mm/damon/design.rst25
-rw-r--r--Documentation/mm/damon/maintainer-profile.rst10
-rw-r--r--Documentation/mm/memory-model.rst2
-rw-r--r--Documentation/mm/page_owner.rst32
-rw-r--r--Documentation/mm/process_addrs.rst9
-rw-r--r--Documentation/tools/rtla/common_options.txt2
-rw-r--r--Documentation/trace/ftrace.rst8
22 files changed, 216 insertions, 121 deletions
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-damon b/Documentation/ABI/testing/sysfs-kernel-mm-damon
index b6b71db36ca7..4fb8b7a6d625 100644
--- a/Documentation/ABI/testing/sysfs-kernel-mm-damon
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -164,6 +164,13 @@ Description: Writing to and reading from this file sets and gets the pid of
the target process if the context is for virtual address spaces
monitoring, respectively.
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/obsolete_target
+Date: Oct 2025
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the
+ obsoleteness of the matching parameters commit destination
+ target.
+
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/targets/<T>/regions/nr_regions
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
@@ -303,6 +310,12 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the nid
parameter of the goal.
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/path
+Date: Oct 2025
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the path
+ parameter of the goal.
+
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 4c072e85acdf..7f5b59d95fce 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1513,6 +1513,10 @@ The following nested keys are defined.
oom_group_kill
The number of times a group OOM has occurred.
+ sock_throttled
+ The number of times network sockets associated with
+ this cgroup are throttled.
+
memory.events.local
Similar to memory.events but the fields in the file are local
to the cgroup i.e. not hierarchical. The file modified event
diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst
index 7b0775d281b4..72a943202676 100644
--- a/Documentation/admin-guide/mm/damon/lru_sort.rst
+++ b/Documentation/admin-guide/mm/damon/lru_sort.rst
@@ -211,6 +211,28 @@ End of target memory region in physical address.
The end physical address of memory region that DAMON_LRU_SORT will do work
against. By default, biggest System RAM is used as the region.
+addr_unit
+---------
+
+A scale factor for memory addresses and bytes.
+
+This parameter is for setting and getting the :ref:`address unit
+<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_RECLAIM.
+
+``monitor_region_start`` and ``monitor_region_end`` should be provided in this
+unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
+``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
+Then DAMON_LRU_SORT will work for 10 KiB length of physical address range that
+starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
+
+Stat parameters having ``bytes_`` prefix are also in this unit. For example,
+let's suppose values of ``addr_unit``, ``bytes_lru_sort_tried_hot_regions`` and
+``bytes_lru_sorted_hot_regions`` are ``1024``, ``42``, and ``32``,
+respectively. Then it means DAMON_LRU_SORT tried to LRU-sort 42 KiB of hot
+memory and successfully LRU-sorted 32 KiB of the memory in total.
+
+If unsure, use only the default value (``1``) and forget about this.
+
kdamond_pid
-----------
diff --git a/Documentation/admin-guide/mm/damon/reclaim.rst b/Documentation/admin-guide/mm/damon/reclaim.rst
index af05ae617018..8eba3da8dcee 100644
--- a/Documentation/admin-guide/mm/damon/reclaim.rst
+++ b/Documentation/admin-guide/mm/damon/reclaim.rst
@@ -232,6 +232,28 @@ The end physical address of memory region that DAMON_RECLAIM will do work
against. That is, DAMON_RECLAIM will find cold memory regions in this region
and reclaims. By default, biggest System RAM is used as the region.
+addr_unit
+---------
+
+A scale factor for memory addresses and bytes.
+
+This parameter is for setting and getting the :ref:`address unit
+<damon_design_addr_unit>` parameter of the DAMON instance for DAMON_RECLAIM.
+
+``monitor_region_start`` and ``monitor_region_end`` should be provided in this
+unit. For example, let's suppose ``addr_unit``, ``monitor_region_start`` and
+``monitor_region_end`` are set as ``1024``, ``0`` and ``10``, respectively.
+Then DAMON_RECLAIM will work for 10 KiB length of physical address range that
+starts from address zero (``[0 * 1024, 10 * 1024)`` in bytes).
+
+``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are also in
+this unit. For example, let's suppose values of ``addr_unit``,
+``bytes_reclaim_tried_regions`` and ``bytes_reclaimed_regions`` are ``1024``,
+``42``, and ``32``, respectively. Then it means DAMON_RECLAIM tried to reclaim
+42 KiB memory and successfully reclaimed 32 KiB memory in total.
+
+If unsure, use only the default value (``1``) and forget about this.
+
skip_anon
---------
diff --git a/Documentation/admin-guide/mm/damon/stat.rst b/Documentation/admin-guide/mm/damon/stat.rst
index 4c517c2c219a..e5a5a2c4f803 100644
--- a/Documentation/admin-guide/mm/damon/stat.rst
+++ b/Documentation/admin-guide/mm/damon/stat.rst
@@ -10,6 +10,8 @@ on the system's entire physical memory using DAMON, and provides simplified
access monitoring results statistics, namely idle time percentiles and
estimated memory bandwidth.
+.. _damon_stat_monitoring_accuracy_overhead:
+
Monitoring Accuracy and Overhead
================================
@@ -17,9 +19,11 @@ DAMON_STAT uses monitoring intervals :ref:`auto-tuning
<damon_design_monitoring_intervals_autotuning>` to make its accuracy high and
overhead minimum. It auto-tunes the intervals aiming 4 % of observable access
events to be captured in each snapshot, while limiting the resulting sampling
-events to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
+interval to be 5 milliseconds in minimum and 10 seconds in maximum. On a few
production server systems, it resulted in consuming only 0.x % single CPU time,
-while capturing reasonable quality of access patterns.
+while capturing reasonable quality of access patterns. The tuning-resulting
+intervals can be retrieved via ``aggr_interval_us`` :ref:`parameter
+<damon_stat_aggr_interval_us>`.
Interface: Module Parameters
============================
@@ -41,6 +45,18 @@ You can enable DAMON_STAT by setting the value of this parameter as ``Y``.
Setting it as ``N`` disables DAMON_STAT. The default value is set by
``CONFIG_DAMON_STAT_ENABLED_DEFAULT`` build config option.
+.. _damon_stat_aggr_interval_us:
+
+aggr_interval_us
+----------------
+
+Auto-tuned aggregation time interval in microseconds.
+
+Users can read the aggregation interval of DAMON that is being used by the
+DAMON instance for DAMON_STAT. It is :ref:`auto-tuned
+<damon_stat_monitoring_accuracy_overhead>` and therefore the value is
+dynamically changed.
+
estimated_memory_bandwidth
--------------------------
@@ -58,12 +74,13 @@ memory_idle_ms_percentiles
Per-byte idle time (milliseconds) percentiles of the system.
DAMON_STAT calculates how long each byte of the memory was not accessed until
-now (idle time), based on the current DAMON results snapshot. If DAMON found a
-region of access frequency (nr_accesses) larger than zero, every byte of the
-region gets zero idle time. If a region has zero access frequency
-(nr_accesses), how long the region was keeping the zero access frequency (age)
-becomes the idle time of every byte of the region. Then, DAMON_STAT exposes
-the percentiles of the idle time values via this read-only parameter. Reading
-the parameter returns 101 idle time values in milliseconds, separated by comma.
+now (idle time), based on the current DAMON results snapshot. For regions
+having access frequency (nr_accesses) larger than zero, how long the current
+access frequency level was kept multiplied by ``-1`` becomes the idlee time of
+every byte of the region. If a region has zero access frequency (nr_accesses),
+how long the region was keeping the zero access frequency (age) becomes the
+idle time of every byte of the region. Then, DAMON_STAT exposes the
+percentiles of the idle time values via this read-only parameter. Reading the
+parameter returns 101 idle time values in milliseconds, separated by comma.
Each value represents 0-th, 1st, 2nd, 3rd, ..., 99th and 100th percentile idle
times.
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index eae534bc1bee..9991dad60fcf 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -67,7 +67,7 @@ comma (",").
│ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
- │ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
+ │ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target,obsolete_target
│ │ │ │ │ │ │ :ref:`regions <sysfs_regions>`/nr_regions
│ │ │ │ │ │ │ │ :ref:`0 <sysfs_region>`/start,end
│ │ │ │ │ │ │ │ ...
@@ -81,7 +81,7 @@ comma (",").
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
- │ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid
+ │ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid,path
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
@@ -134,7 +134,8 @@ Users can write below commands for the kdamond to the ``state`` file.
- ``on``: Start running.
- ``off``: Stop running.
- ``commit``: Read the user inputs in the sysfs files except ``state`` file
- again.
+ again. Monitoring :ref:`target region <sysfs_regions>` inputs are also be
+ ignored if no target region is specified.
- ``update_tuned_intervals``: Update the contents of ``sample_us`` and
``aggr_us`` files of the kdamond with the auto-tuning applied ``sampling
interval`` and ``aggregation interval`` for the files. Please refer to
@@ -264,13 +265,20 @@ to ``N-1``. Each directory represents each monitoring target.
targets/<N>/
------------
-In each target directory, one file (``pid_target``) and one directory
-(``regions``) exist.
+In each target directory, two files (``pid_target`` and ``obsolete_target``)
+and one directory (``regions``) exist.
If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
be a process. You can specify the process to DAMON by writing the pid of the
process to the ``pid_target`` file.
+Users can selectively remove targets in the middle of the targets array by
+writing non-zero value to ``obsolete_target`` file and committing it (writing
+``commit`` to ``state`` file). DAMON will remove the matching targets from its
+internal targets array. Users are responsible to construct target directories
+again, so that those correctly represent the changed internal targets array.
+
+
.. _sysfs_regions:
targets/<N>/regions
@@ -289,6 +297,11 @@ In the beginning, this directory has only one file, ``nr_regions``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each initial monitoring target region.
+If ``nr_regions`` is zero when committing new DAMON parameters online (writing
+``commit`` to ``state`` file of :ref:`kdamond <sysfs_kdamond>`), the commit
+logic ignores the target regions. In other words, the current monitoring
+results for the target are preserved.
+
.. _sysfs_region:
regions/<N>/
@@ -402,9 +415,9 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.
-Each goal directory contains four files, namely ``target_metric``,
-``target_value``, ``current_value`` and ``nid``. Users can set and get the
-four parameters for the quota auto-tuning goals that specified on the
+Each goal directory contains five files, namely ``target_metric``,
+``target_value``, ``current_value`` ``nid`` and ``path``. Users can set and
+get the five parameters for the quota auto-tuning goals that specified on the
:ref:`design doc <damon_design_damos_quotas_auto_tuning>` by writing to and
reading from each of the files. Note that users should further write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index ebc83ca20fdc..bbb563cba5d2 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -39,7 +39,6 @@ the Linux memory management.
shrinker_debugfs
slab
soft-dirty
- swap_numa
transhuge
userfaultfd
zswap
diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index e60e9211fd9b..c57e61b5d8aa 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -115,7 +115,8 @@ Short descriptions to the page flags
A free memory block managed by the buddy system allocator.
The buddy system organizes free memory in blocks of various orders.
An order N block has 2^N physically contiguous pages, with the BUDDY flag
- set for and _only_ for the first page.
+ set for all pages.
+ Before 4.6 only the first page of the block had the flag set.
15 - COMPOUND_HEAD
A compound page with order N consists of 2^N physically contiguous pages.
A compound page with order 2 takes the form of "HTTT", where H donates its
diff --git a/Documentation/admin-guide/mm/swap_numa.rst b/Documentation/admin-guide/mm/swap_numa.rst
deleted file mode 100644
index 2e630627bcee..000000000000
--- a/Documentation/admin-guide/mm/swap_numa.rst
+++ /dev/null
@@ -1,78 +0,0 @@
-===========================================
-Automatically bind swap device to numa node
-===========================================
-
-If the system has more than one swap device and swap device has the node
-information, we can make use of this information to decide which swap
-device to use in get_swap_pages() to get better performance.
-
-
-How to use this feature
-=======================
-
-Swap device has priority and that decides the order of it to be used. To make
-use of automatically binding, there is no need to manipulate priority settings
-for swap devices. e.g. on a 2 node machine, assume 2 swap devices swapA and
-swapB, with swapA attached to node 0 and swapB attached to node 1, are going
-to be swapped on. Simply swapping them on by doing::
-
- # swapon /dev/swapA
- # swapon /dev/swapB
-
-Then node 0 will use the two swap devices in the order of swapA then swapB and
-node 1 will use the two swap devices in the order of swapB then swapA. Note
-that the order of them being swapped on doesn't matter.
-
-A more complex example on a 4 node machine. Assume 6 swap devices are going to
-be swapped on: swapA and swapB are attached to node 0, swapC is attached to
-node 1, swapD and swapE are attached to node 2 and swapF is attached to node3.
-The way to swap them on is the same as above::
-
- # swapon /dev/swapA
- # swapon /dev/swapB
- # swapon /dev/swapC
- # swapon /dev/swapD
- # swapon /dev/swapE
- # swapon /dev/swapF
-
-Then node 0 will use them in the order of::
-
- swapA/swapB -> swapC -> swapD -> swapE -> swapF
-
-swapA and swapB will be used in a round robin mode before any other swap device.
-
-node 1 will use them in the order of::
-
- swapC -> swapA -> swapB -> swapD -> swapE -> swapF
-
-node 2 will use them in the order of::
-
- swapD/swapE -> swapA -> swapB -> swapC -> swapF
-
-Similaly, swapD and swapE will be used in a round robin mode before any
-other swap devices.
-
-node 3 will use them in the order of::
-
- swapF -> swapA -> swapB -> swapC -> swapD -> swapE
-
-
-Implementation details
-======================
-
-The current code uses a priority based list, swap_avail_list, to decide
-which swap device to use and if multiple swap devices share the same
-priority, they are used round robin. This change here replaces the single
-global swap_avail_list with a per-numa-node list, i.e. for each numa node,
-it sees its own priority based list of available swap devices. Swap
-device's priority can be promoted on its matching node's swap_avail_list.
-
-The current swap device's priority is set as: user can set a >=0 value,
-or the system will pick one starting from -1 then downwards. The priority
-value in the swap_avail_list is the negated value of the swap device's
-due to plist being sorted from low to high. The new policy doesn't change
-the semantics for priority >=0 cases, the previous starting from -1 then
-downwards now becomes starting from -2 then downwards and -1 is reserved
-as the promoted value. So if multiple swap devices are attached to the same
-node, they will all be promoted to priority -1 on that node's plist and will
-be used round robin before any other swap devices.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 1654211cc6cf..5fbc3d89bb07 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -381,6 +381,11 @@ hugepage allocation policy for the tmpfs mount by using the kernel parameter
four valid policies for tmpfs (``always``, ``within_size``, ``advise``,
``never``). The tmpfs mount default policy is ``never``.
+Additionally, Kconfig options are available to set the default hugepage
+policies for shmem (``CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_*``) and tmpfs
+(``CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_*``) at build time. Refer to the
+Kconfig help for more details.
+
In the same manner as ``thp_anon`` controls each supported anonymous THP
size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
has the same format as ``thp_anon``, but also supports the policy
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index 283d77217c6f..2464425c783d 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -59,11 +59,11 @@ returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
pages are freed. The pool is not preallocated.
-When a swap page is passed from swapout to zswap, zswap maintains a mapping
-of the swap entry, a combination of the swap type and swap offset, to the
-zsmalloc handle that references that compressed swap page. This mapping is
-achieved with a red-black tree per swap type. The swap offset is the search
-key for the tree nodes.
+When a swap page is passed from swapout to zswap, zswap maintains a mapping of
+the swap entry, a combination of the swap type and swap offset, to the zsmalloc
+handle that references that compressed swap page. This mapping is achieved
+with an xarray per swap type. The swap offset is the search key for the xarray
+nodes.
During a page fault on a PTE that is a swap entry, the swapin code calls the
zswap load function to decompress the page into the page allocated by the page
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 543ac94718e8..194ef4754452 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -217,6 +217,12 @@ properties:
memory types as ratified in the 20191213 version of the privileged
ISA specification.
+ - const: svrsw60t59b
+ description:
+ The Svrsw60t59b extension for providing two more bits[60:59] to
+ PTE/PMD entry as ratified at commit 28bde925e7a7 ("PTE Reserved
+ for SW bits 60:59") of riscv-non-isa/riscv-iommu.
+
- const: svvptc
description:
The standard Svvptc supervisor-level extension for
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index d33429294252..3397937ed838 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1286,6 +1286,11 @@ The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+In nearly all cases, this is all that is required for a filesystem. However, if
+a filesystem needs to perform an operation such a pre-population of page tables,
+then that action can be specified in the vm_area_desc->action field, which can
+be configured using the mmap_action_*() helpers.
+
---
**mandatory**
@@ -1322,3 +1327,10 @@ When vfs_mkdir() returns an error, and so both dputs() the original
dentry and doesn't provide a replacement, it also unlocks the parent.
Consequently the return value from vfs_mkdir() can be passed to
end_creating() and the parent will be unlocked precisely when necessary.
+
+---
+
+**mandatory**
+
+kill_litter_super() is gone; convert to DCACHE_PERSISTENT use (as all
+in-tree filesystems have done).
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 0b86a8022fa1..8256e857e2d7 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -553,7 +553,7 @@ otherwise.
kernel flags associated with the particular virtual memory area in two letter
encoded manner. The codes are the following:
- == =======================================
+ == =============================================================
rd readable
wr writeable
ex executable
@@ -591,7 +591,8 @@ encoded manner. The codes are the following:
sl sealed
lf lock on fault pages
dp always lazily freeable mapping
- == =======================================
+ gu maybe contains guard regions (if not set, definitely doesn't)
+ == =============================================================
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f13b01e42eb..670ba66b60e4 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1213,6 +1213,10 @@ otherwise noted.
file-backed memory mapping, most notably establishing relevant
private state and VMA callbacks.
+ If further action such as pre-population of page tables is required,
+ this can be specified by the vm_area_desc->action field and related
+ parameters.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index 80354f4f42ba..2d8d8ca1e0a3 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -381,8 +381,8 @@ That is, assumes 4% (20% of 20%) DAMON-observed access events ratio (source)
to capture 64% (80% multipled by 80%) real access events (outcomes).
To know how user-space can use this feature via :ref:`DAMON sysfs interface
-<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
-the documentation.
+<sysfs_interface>`, refer to :ref:`intervals_goal
+<damon_usage_sysfs_monitoring_intervals_goal>` part of the documentation.
.. _damon_design_damos:
@@ -564,9 +564,9 @@ aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over achieving the goal, it decreases the quota.
-The goal can be specified with four parameters, namely ``target_metric``,
-``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
-tries to make ``current_value`` of ``target_metric`` be same to
+The goal can be specified with five parameters, namely ``target_metric``,
+``target_value``, ``current_value``, ``nid`` and ``path``. The auto-tuning
+mechanism tries to make ``current_value`` of ``target_metric`` be same to
``target_value``.
- ``user_input``: User-provided value. Users could use any metric that they
@@ -581,9 +581,18 @@ tries to make ``current_value`` of ``target_metric`` be same to
set by users at the initial time. In other words, DAMOS does self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
-
-``nid`` is optionally required for only ``node_mem_used_bp`` and
-``node_mem_free_bp`` to point the specific NUMA node.
+- ``node_memcg_used_bp``: Specific cgroup's node used memory ratio for a
+ specific NUMA node, in bp (1/10,000).
+- ``node_memcg_free_bp``: Specific cgroup's node unused memory ratio for a
+ specific NUMA node, in bp (1/10,000).
+
+``nid`` is optionally required for only ``node_mem_used_bp``,
+``node_mem_free_bp``, ``node_memcg_used_bp`` and ``node_memcg_free_bp`` to
+point the specific NUMA node.
+
+``path`` is optionally required for only ``node_memcg_used_bp`` and
+``node_memcg_free_bp`` to point the path to the cgroup. The value should be
+the path of the memory cgroup from the cgroups mount point.
To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst
index 58a3fb3c5762..e761edada1e9 100644
--- a/Documentation/mm/damon/maintainer-profile.rst
+++ b/Documentation/mm/damon/maintainer-profile.rst
@@ -27,8 +27,8 @@ maintainer.
Note again the patches for `mm-new tree
<https://git.kernel.org/akpm/mm/h/mm-new>`_ are queued by the memory management
-subsystem maintainer. If the patches requires some patches in `damon/next tree
-<https://git.kernel.org/sj/h/damon/next>`_ which not yet merged in mm-new,
+subsystem maintainer. If the patches require some patches in `damon/next tree
+<https://git.kernel.org/sj/h/damon/next>`_ which have not yet merged in mm-new,
please make sure the requirement is clearly specified.
Submit checklist addendum
@@ -57,7 +57,7 @@ Key cycle dates
Patches can be sent anytime. Key cycle dates of the `mm-new
<https://git.kernel.org/akpm/mm/h/mm-new>`_, `mm-unstable
-<https://git.kernel.org/akpm/mm/h/mm-unstable>`_and `mm-stable
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and `mm-stable
<https://git.kernel.org/akpm/mm/h/mm-stable>`_ trees depend on the memory
management subsystem maintainer.
@@ -99,5 +99,5 @@ Schedules and reservation status are available at the Google `doc
<https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing>`_.
There is also a public Google `calendar
<https://calendar.google.com/calendar/u/0?cid=ZDIwOTA4YTMxNjc2MDQ3NTIyMmUzYTM5ZmQyM2U4NDA0ZGIwZjBiYmJlZGQxNDM0MmY4ZTRjOTE0NjdhZDRiY0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t>`_
-that has the events. Anyone can subscribe it. DAMON maintainer will also
-provide periodic reminder to the mailing list (damon@lists.linux.dev).
+that has the events. Anyone can subscribe to it. DAMON maintainer will also
+provide periodic reminders to the mailing list (damon@lists.linux.dev).
diff --git a/Documentation/mm/memory-model.rst b/Documentation/mm/memory-model.rst
index 5f3eafbbc520..7957122039e8 100644
--- a/Documentation/mm/memory-model.rst
+++ b/Documentation/mm/memory-model.rst
@@ -165,7 +165,7 @@ The users of `ZONE_DEVICE` are:
* pmem: Map platform persistent memory to be used as a direct-I/O target
via DAX mappings.
-* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
+* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->folio_free()`
event callbacks to allow a device-driver to coordinate memory management
events related to device-memory, typically GPU memory. See
Documentation/mm/hmm.rst.
diff --git a/Documentation/mm/page_owner.rst b/Documentation/mm/page_owner.rst
index 3a45a20fc05a..6b12f3b007ec 100644
--- a/Documentation/mm/page_owner.rst
+++ b/Documentation/mm/page_owner.rst
@@ -27,7 +27,10 @@ enabled. Other usages are more than welcome.
It can also be used to show all the stacks and their current number of
allocated base pages, which gives us a quick overview of where the memory
is going without the need to screen through all the pages and match the
-allocation and free operation.
+allocation and free operation. It's also possible to show only a numeric
+identifier of all the stacks (without stack traces) and their number of
+allocated base pages (faster to read and parse, eg, for monitoring) that
+can be matched with stacks later (show_handles and show_stacks_handles).
page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
@@ -116,6 +119,33 @@ Usage
nr_base_pages: 20824
...
+ cat /sys/kernel/debug/page_owner_stacks/show_handles > handles_7000.txt
+ cat handles_7000.txt
+ handle: 42
+ nr_base_pages: 20824
+ ...
+
+ cat /sys/kernel/debug/page_owner_stacks/show_stacks_handles > stacks_handles.txt
+ cat stacks_handles.txt
+ post_alloc_hook+0x177/0x1a0
+ get_page_from_freelist+0xd01/0xd80
+ __alloc_pages+0x39e/0x7e0
+ alloc_pages_mpol+0x22e/0x490
+ folio_alloc+0xd5/0x110
+ filemap_alloc_folio+0x78/0x230
+ page_cache_ra_order+0x287/0x6f0
+ filemap_get_pages+0x517/0x1160
+ filemap_read+0x304/0x9f0
+ xfs_file_buffered_read+0xe6/0x1d0 [xfs]
+ xfs_file_read_iter+0x1f0/0x380 [xfs]
+ __kernel_read+0x3b9/0x730
+ kernel_read_file+0x309/0x4d0
+ __do_sys_finit_module+0x381/0x730
+ do_syscall_64+0x8d/0x150
+ entry_SYSCALL_64_after_hwframe+0x62/0x6a
+ handle: 42
+ ...
+
cat /sys/kernel/debug/page_owner > page_owner_full.txt
./page_owner_sort page_owner_full.txt sorted_page_owner.txt
diff --git a/Documentation/mm/process_addrs.rst b/Documentation/mm/process_addrs.rst
index be49e2a269e4..7f2f3e87071d 100644
--- a/Documentation/mm/process_addrs.rst
+++ b/Documentation/mm/process_addrs.rst
@@ -48,7 +48,8 @@ Terminology
* **VMA locks** - The VMA lock is at VMA granularity (of course) which behaves
as a read/write semaphore in practice. A VMA read lock is obtained via
:c:func:`!lock_vma_under_rcu` (and unlocked via :c:func:`!vma_end_read`) and a
- write lock via :c:func:`!vma_start_write` (all VMA write locks are unlocked
+ write lock via vma_start_write() or vma_start_write_killable()
+ (all VMA write locks are unlocked
automatically when the mmap write lock is released). To take a VMA write lock
you **must** have already acquired an :c:func:`!mmap_write_lock`.
* **rmap locks** - When trying to access VMAs through the reverse mapping via a
@@ -907,3 +908,9 @@ Stack expansion
Stack expansion throws up additional complexities in that we cannot permit there
to be racing page faults, as a result we invoke :c:func:`!vma_start_write` to
prevent this in :c:func:`!expand_downwards` or :c:func:`!expand_upwards`.
+
+------------------------
+Functions and structures
+------------------------
+
+.. kernel-doc:: include/linux/mmap_lock.h
diff --git a/Documentation/tools/rtla/common_options.txt b/Documentation/tools/rtla/common_options.txt
index 1c4f3e663cf5..6caa51d02934 100644
--- a/Documentation/tools/rtla/common_options.txt
+++ b/Documentation/tools/rtla/common_options.txt
@@ -48,7 +48,7 @@
If not set, tracer threads keep their default priority. For rtla user threads, it is set to SCHED_FIFO with priority 95. For kernel threads, see *osnoise* and *timerlat* tracer documentation for the running kernel version.
-**-C**, **--cgroup**\[*=cgroup*]
+**-C**, **--cgroup** \[*cgroup*]
Set a *cgroup* to the tracer's threads. If the **-C** option is passed without arguments, the tracer's thread will inherit **rtla**'s *cgroup*. Otherwise, the threads will be placed on the *cgroup* passed to the option.
diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index aef674df3afd..d1f313a5f4ad 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -366,6 +366,14 @@ of ftrace. Here is a list of some of the key files:
for each function. The displayed address is the patch-site address
and can differ from /proc/kallsyms address.
+ syscall_user_buf_size:
+
+ Some system call trace events will record the data from a user
+ space address that one of the parameters point to. The amount of
+ data per event is limited. This file holds the max number of bytes
+ that will be recorded into the ring buffer to hold this data.
+ The max value is currently 165.
+
dyn_ftrace_total_info:
This file is for debugging purposes. The number of functions that