summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/LSM/SafeSetID.rst2
-rw-r--r--Documentation/admin-guide/RAS/main.rst2
-rw-r--r--Documentation/admin-guide/aoe/udev.txt6
-rw-r--r--Documentation/admin-guide/blockdev/paride.rst2
-rw-r--r--Documentation/admin-guide/bug-hunting.rst2
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst33
-rw-r--r--Documentation/admin-guide/device-mapper/delay.rst8
-rw-r--r--Documentation/admin-guide/device-mapper/dm-pcache.rst202
-rw-r--r--Documentation/admin-guide/device-mapper/index.rst1
-rw-r--r--Documentation/admin-guide/device-mapper/vdo-design.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/vdo.rst1
-rw-r--r--Documentation/admin-guide/ext4.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/attack_vector_controls.rst6
-rw-r--r--Documentation/admin-guide/hw-vuln/index.rst1
-rw-r--r--Documentation/admin-guide/hw-vuln/mds.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/spectre.rst6
-rw-r--r--Documentation/admin-guide/hw-vuln/vmscape.rst110
-rw-r--r--Documentation/admin-guide/kdump/kdump.rst2
-rw-r--r--Documentation/admin-guide/kernel-parameters.rst4
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt52
-rw-r--r--Documentation/admin-guide/laptops/laptop-mode.rst8
-rw-r--r--Documentation/admin-guide/laptops/lg-laptop.rst4
-rw-r--r--Documentation/admin-guide/laptops/sonypi.rst2
-rw-r--r--Documentation/admin-guide/md.rst2
-rw-r--r--Documentation/admin-guide/media/i2c-cardlist.rst1
-rw-r--r--Documentation/admin-guide/media/imx.rst2
-rw-r--r--Documentation/admin-guide/media/ivtv.rst2
-rw-r--r--Documentation/admin-guide/media/si4713.rst6
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst2
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst13
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst42
-rw-r--r--Documentation/admin-guide/mm/zswap.rst33
-rw-r--r--Documentation/admin-guide/nfs/nfsroot.rst2
-rw-r--r--Documentation/admin-guide/perf/dwc_pcie_pmu.rst4
-rw-r--r--Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst110
-rw-r--r--Documentation/admin-guide/perf/hisi-pmu.rst53
-rw-r--r--Documentation/admin-guide/perf/index.rst1
-rw-r--r--Documentation/admin-guide/quickly-build-trimmed-linux.rst4
-rw-r--r--Documentation/admin-guide/reporting-issues.rst4
-rw-r--r--Documentation/admin-guide/sysctl/fs.rst4
-rw-r--r--Documentation/admin-guide/sysctl/index.rst18
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst2
-rw-r--r--Documentation/admin-guide/sysctl/net.rst4
-rw-r--r--Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst2
-rw-r--r--Documentation/admin-guide/xfs.rst69
45 files changed, 682 insertions, 158 deletions
diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
index 0ec34863c674..6d439c987563 100644
--- a/Documentation/admin-guide/LSM/SafeSetID.rst
+++ b/Documentation/admin-guide/LSM/SafeSetID.rst
@@ -41,7 +41,7 @@ namespace). The higher level goal is to allow for uid-based sandboxing of system
services without having to give out CAP_SETUID all over the place just so that
non-root programs can drop to even-lesser-privileged uids. This is especially
relevant when one non-root daemon on the system should be allowed to spawn other
-processes as different uids, but its undesirable to give the daemon a
+processes as different uids, but it's undesirable to give the daemon a
basically-root-equivalent CAP_SETUID.
diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
index 7ac1d4ccc509..447bfde509fb 100644
--- a/Documentation/admin-guide/RAS/main.rst
+++ b/Documentation/admin-guide/RAS/main.rst
@@ -253,7 +253,7 @@ interface.
Some architectures have ECC detectors for L1, L2 and L3 caches,
along with DMA engines, fabric switches, main data path switches,
interconnections, and various other hardware data paths. If the hardware
-reports it, then a edac_device device probably can be constructed to
+reports it, then an edac_device device probably can be constructed to
harvest and present that to userspace.
diff --git a/Documentation/admin-guide/aoe/udev.txt b/Documentation/admin-guide/aoe/udev.txt
index 5fb756466bc7..d55ecb411c21 100644
--- a/Documentation/admin-guide/aoe/udev.txt
+++ b/Documentation/admin-guide/aoe/udev.txt
@@ -2,7 +2,7 @@
# They may be installed along the following lines. Check the section
# 8 udev manpage to see whether your udev supports SUBSYSTEM, and
# whether it uses one or two equal signs for SUBSYSTEM and KERNEL.
-#
+#
# ecashin@makki ~$ su
# Password:
# bash# find /etc -type f -name udev.conf
@@ -13,7 +13,7 @@
# 10-wacom.rules 50-udev.rules
# bash# cp /path/to/linux/Documentation/admin-guide/aoe/udev.txt \
# /etc/udev/rules.d/60-aoe.rules
-#
+#
# aoe char devices
SUBSYSTEM=="aoe", KERNEL=="discover", NAME="etherd/%k", GROUP="disk", MODE="0220"
@@ -22,5 +22,5 @@ SUBSYSTEM=="aoe", KERNEL=="interfaces", NAME="etherd/%k", GROUP="disk", MODE="02
SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k", GROUP="disk", MODE="0220"
SUBSYSTEM=="aoe", KERNEL=="flush", NAME="etherd/%k", GROUP="disk", MODE="0220"
-# aoe block devices
+# aoe block devices
KERNEL=="etherd*", GROUP="disk"
diff --git a/Documentation/admin-guide/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst
index e85ad37cc0e5..b2f627d4c2f8 100644
--- a/Documentation/admin-guide/blockdev/paride.rst
+++ b/Documentation/admin-guide/blockdev/paride.rst
@@ -118,7 +118,7 @@ and high-level drivers that you would use:
================ ============ ========
All parports and all protocol drivers are probed automatically unless probe=0
-parameter is used. So just "modprobe epat" is enough for a Imation SuperDisk
+parameter is used. So just "modprobe epat" is enough for an Imation SuperDisk
drive to work.
Manual device creation::
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 30858757c9f2..7da0504388ec 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -252,7 +252,7 @@ For example, if you find a bug at the gspca's sonixj.c file, you can get
its maintainers with::
$ ./scripts/get_maintainer.pl --bug -f drivers/media/usb/gspca/sonixj.c
- Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
+ Hans Verkuil <hverkuil@kernel.org> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 51c0bc4c2dc5..0e6c67ac585a 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -15,6 +15,9 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
.. CONTENTS
+ [Whenever any new section is added to this document, please also add
+ an entry here.]
+
1. Introduction
1-1. Terminology
1-2. What is cgroup?
@@ -25,9 +28,10 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
2-2-2. Threads
2-3. [Un]populated Notification
2-4. Controlling Controllers
- 2-4-1. Enabling and Disabling
- 2-4-2. Top-down Constraint
- 2-4-3. No Internal Process Constraint
+ 2-4-1. Availability
+ 2-4-2. Enabling and Disabling
+ 2-4-3. Top-down Constraint
+ 2-4-4. No Internal Process Constraint
2-5. Delegation
2-5-1. Model of Delegation
2-5-2. Delegation Containment
@@ -61,14 +65,15 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-4-1. PID Interface Files
5-5. Cpuset
5.5-1. Cpuset Interface Files
- 5-6. Device
+ 5-6. Device controller
5-7. RDMA
5-7-1. RDMA Interface Files
5-8. DMEM
+ 5-8-1. DMEM Interface Files
5-9. HugeTLB
5.9-1. HugeTLB Interface Files
5-10. Misc
- 5.10-1 Miscellaneous cgroup Interface Files
+ 5.10-1 Misc Interface Files
5.10-2 Migration and Ownership
5-11. Others
5-11-1. perf_event
@@ -1001,6 +1006,24 @@ All cgroup core files are prefixed with "cgroup."
Total number of dying cgroup subsystems (e.g. memory
cgroup) at and beneath the current cgroup.
+ cgroup.stat.local
+ A read-only flat-keyed file which exists in non-root cgroups.
+ The following entry is defined:
+
+ frozen_usec
+ Cumulative time that this cgroup has spent between freezing and
+ thawing, regardless of whether by self or ancestor groups.
+ NB: (not) reaching "frozen" state is not accounted here.
+
+ Using the following ASCII representation of a cgroup's freezer
+ state, ::
+
+ 1 _____
+ frozen 0 __/ \__
+ ab cd
+
+ the duration being measured is the span between a and c.
+
cgroup.freeze
A read-write single value file which exists on non-root cgroups.
Allowed values are "0" and "1". The default is "0".
diff --git a/Documentation/admin-guide/device-mapper/delay.rst b/Documentation/admin-guide/device-mapper/delay.rst
index 4d667228e744..a1e673c0e782 100644
--- a/Documentation/admin-guide/device-mapper/delay.rst
+++ b/Documentation/admin-guide/device-mapper/delay.rst
@@ -3,7 +3,7 @@ dm-delay
========
Device-Mapper's "delay" target delays reads and/or writes
-and/or flushs and optionally maps them to different devices.
+and/or flushes and optionally maps them to different devices.
Arguments::
@@ -18,7 +18,7 @@ Table line has to either have 3, 6 or 9 arguments:
to write and flush operations on optionally different write_device with
optionally different sector offset
-9: same as 6 arguments plus define flush_offset and flush_delay explicitely
+9: same as 6 arguments plus define flush_offset and flush_delay explicitly
on/with optionally different flush_device/flush_offset.
Offsets are specified in sectors.
@@ -40,7 +40,7 @@ Example scripts
#!/bin/sh
#
# Create mapped device delaying write and flush operations for 400ms and
- # splitting reads to device $1 but writes and flushs to different device $2
+ # splitting reads to device $1 but writes and flushes to different device $2
# to different offsets of 2048 and 4096 sectors respectively.
#
dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 2048 0 $2 4096 400"
@@ -48,7 +48,7 @@ Example scripts
::
#!/bin/sh
#
- # Create mapped device delaying reads for 50ms, writes for 100ms and flushs for 333ms
+ # Create mapped device delaying reads for 50ms, writes for 100ms and flushes for 333ms
# onto the same backing device at offset 0 sectors.
#
dmsetup create delayed --table "0 `blockdev --getsz $1` delay $1 0 50 $2 0 100 $1 0 333"
diff --git a/Documentation/admin-guide/device-mapper/dm-pcache.rst b/Documentation/admin-guide/device-mapper/dm-pcache.rst
new file mode 100644
index 000000000000..09d327ef4b14
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-pcache.rst
@@ -0,0 +1,202 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================
+dm-pcache — Persistent Cache
+=================================
+
+*Author: Dongsheng Yang <dongsheng.yang@linux.dev>*
+
+This document describes *dm-pcache*, a Device-Mapper target that lets a
+byte-addressable *DAX* (persistent-memory, “pmem”) region act as a
+high-performance, crash-persistent cache in front of a slower block
+device. The code lives in `drivers/md/dm-pcache/`.
+
+Quick feature summary
+=====================
+
+* *Write-back* caching (only mode currently supported).
+* *16 MiB segments* allocated on the pmem device.
+* *Data CRC32* verification (optional, per cache).
+* Crash-safe: every metadata structure is duplicated (`PCACHE_META_INDEX_MAX
+ == 2`) and protected with CRC+sequence numbers.
+* *Multi-tree indexing* (indexing trees sharded by logical address) for high PMem parallelism
+* Pure *DAX path* I/O – no extra BIO round-trips
+* *Log-structured write-back* that preserves backend crash-consistency
+
+
+Constructor
+===========
+
+::
+
+ pcache <cache_dev> <backing_dev> [<number_of_optional_arguments> <cache_mode writeback> <data_crc true|false>]
+
+========================= ====================================================
+``cache_dev`` Any DAX-capable block device (``/dev/pmem0``…).
+ All metadata *and* cached blocks are stored here.
+
+``backing_dev`` The slow block device to be cached.
+
+``cache_mode`` Optional, Only ``writeback`` is accepted at the
+ moment.
+
+``data_crc`` Optional, default to ``false``
+
+ * ``true`` – store CRC32 for every cached entry
+ and verify on reads
+ * ``false`` – skip CRC (faster)
+========================= ====================================================
+
+Example
+-------
+
+.. code-block:: shell
+
+ dmsetup create pcache_sdb --table \
+ "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"
+
+The first time a pmem device is used, dm-pcache formats it automatically
+(super-block, cache_info, etc.).
+
+
+Status line
+===========
+
+``dmsetup status <device>`` (``STATUSTYPE_INFO``) prints:
+
+::
+
+ <sb_flags> <seg_total> <cache_segs> <segs_used> \
+ <gc_percent> <cache_flags> \
+ <key_head_seg>:<key_head_off> \
+ <dirty_tail_seg>:<dirty_tail_off> \
+ <key_tail_seg>:<key_tail_off>
+
+Field meanings
+--------------
+
+=============================== =============================================
+``sb_flags`` Super-block flags (e.g. endian marker).
+
+``seg_total`` Number of physical *pmem* segments.
+
+``cache_segs`` Number of segments used for cache.
+
+``segs_used`` Segments currently allocated (bitmap weight).
+
+``gc_percent`` Current GC high-water mark (0-90).
+
+``cache_flags`` Bit 0 – DATA_CRC enabled
+ Bit 1 – INIT_DONE (cache initialised)
+ Bits 2-5 – cache mode (0 == WB).
+
+``key_head`` Where new key-sets are being written.
+
+``dirty_tail`` First dirty key-set that still needs
+ write-back to the backing device.
+
+``key_tail`` First key-set that may be reclaimed by GC.
+=============================== =============================================
+
+
+Messages
+========
+
+*Change GC trigger*
+
+::
+
+ dmsetup message <dev> 0 gc_percent <0-90>
+
+
+Theory of operation
+===================
+
+Sub-devices
+-----------
+
+==================== =========================================================
+backing_dev Any block device (SSD/HDD/loop/LVM, etc.).
+cache_dev DAX device; must expose direct-access memory.
+==================== =========================================================
+
+Segments and key-sets
+---------------------
+
+* The pmem space is divided into *16 MiB segments*.
+* Each write allocates space from a per-CPU *data_head* inside a segment.
+* A *cache-key* records a logical range on the origin and where it lives
+ inside pmem (segment + offset + generation).
+* 128 keys form a *key-set* (kset); ksets are written sequentially in pmem
+ and are themselves crash-safe (CRC).
+* The pair *(key_tail, dirty_tail)* delimit clean/dirty and live/dead ksets.
+
+Write-back
+----------
+
+Dirty keys are queued into a tree; a background worker copies data
+back to the backing_dev and advances *dirty_tail*. A FLUSH/FUA bio from the
+upper layers forces an immediate metadata commit.
+
+Garbage collection
+------------------
+
+GC starts when ``segs_used >= seg_total * gc_percent / 100``. It walks
+from *key_tail*, frees segments whose every key has been invalidated, and
+advances *key_tail*.
+
+CRC verification
+----------------
+
+If ``data_crc is enabled`` dm-pcache computes a CRC32 over every cached data
+range when it is inserted and stores it in the on-media key. Reads
+validate the CRC before copying to the caller.
+
+
+Failure handling
+================
+
+* *pmem media errors* – all metadata copies are read with
+ ``copy_mc_to_kernel``; an uncorrectable error logs and aborts initialisation.
+* *Cache full* – if no free segment can be found, writes return ``-EBUSY``;
+ dm-pcache retries internally (request deferral).
+* *System crash* – on attach, the driver replays ksets from *key_tail* to
+ rebuild the in-core trees; every segment’s generation guards against
+ use-after-free keys.
+
+
+Limitations & TODO
+==================
+
+* Only *write-back* mode; other modes planned.
+* Only FIFO cache invalidate; other (LRU, ARC...) planned.
+* Table reload is not supported currently.
+* Discard planned.
+
+
+Example workflow
+================
+
+.. code-block:: shell
+
+ # 1. Create devices
+ dmsetup create pcache_sdb --table \
+ "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"
+
+ # 2. Put a filesystem on top
+ mkfs.ext4 /dev/mapper/pcache_sdb
+ mount /dev/mapper/pcache_sdb /mnt
+
+ # 3. Tune GC threshold to 80 %
+ dmsetup message pcache_sdb 0 gc_percent 80
+
+ # 4. Observe status
+ watch -n1 'dmsetup status pcache_sdb'
+
+ # 5. Shutdown
+ umount /mnt
+ dmsetup remove pcache_sdb
+
+
+``dm-pcache`` is under active development; feedback, bug reports and patches
+are very welcome!
diff --git a/Documentation/admin-guide/device-mapper/index.rst b/Documentation/admin-guide/device-mapper/index.rst
index cc5aec861576..f1c1f4b824ba 100644
--- a/Documentation/admin-guide/device-mapper/index.rst
+++ b/Documentation/admin-guide/device-mapper/index.rst
@@ -18,6 +18,7 @@ Device Mapper
dm-integrity
dm-io
dm-log
+ dm-pcache
dm-queue-length
dm-raid
dm-service-time
diff --git a/Documentation/admin-guide/device-mapper/vdo-design.rst b/Documentation/admin-guide/device-mapper/vdo-design.rst
index 3cd59decbec0..faa0ecd4a5ae 100644
--- a/Documentation/admin-guide/device-mapper/vdo-design.rst
+++ b/Documentation/admin-guide/device-mapper/vdo-design.rst
@@ -600,7 +600,7 @@ lock and return itself to the pool.
All storage within vdo is managed as 4KB blocks, but it can accept writes
as small as 512 bytes. Processing a write that is smaller than 4K requires
a read-modify-write operation that reads the relevant 4K block, copies the
-new data over the approriate sectors of the block, and then launches a
+new data over the appropriate sectors of the block, and then launches a
write operation for the modified data block. The read and write stages of
this operation are nearly identical to the normal read and write
operations, and a single data_vio is used throughout this operation.
diff --git a/Documentation/admin-guide/device-mapper/vdo.rst b/Documentation/admin-guide/device-mapper/vdo.rst
index a14e6d3e787c..8a67b320a97b 100644
--- a/Documentation/admin-guide/device-mapper/vdo.rst
+++ b/Documentation/admin-guide/device-mapper/vdo.rst
@@ -1,5 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0-only
+======
dm-vdo
======
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index b857eb6ca1b6..ac0c709ea9e7 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -398,7 +398,7 @@ There are 3 different data modes:
* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides
- a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
+ a similar level of journaling as that of XFS and JFS in its default
mode - metadata journaling. A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash. This mode will
typically provide the best ext4 performance.
diff --git a/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst b/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
index 6dd0800146f6..d0bdbd81dcf9 100644
--- a/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
+++ b/Documentation/admin-guide/hw-vuln/attack_vector_controls.rst
@@ -215,9 +215,10 @@ Spectre_v2 X X
Spectre_v2_user X X * (Note 1)
SRBDS X X X X
SRSO X X X X
-SSB (Note 4)
+SSB X
TAA X X X X * (Note 2)
TSA X X X X
+VMSCAPE X
=============== ============== ============ ============= ============== ============ ========
Notes:
@@ -229,9 +230,6 @@ Notes:
3 -- Disables SMT if cross-thread mitigations are fully enabled, the CPU is
vulnerable, and STIBP is not supported
- 4 -- Speculative store bypass is always enabled by default (no kernel
- mitigation applied) unless overridden with spec_store_bypass_disable option
-
When an attack-vector is disabled, all mitigations for the vulnerabilities
listed in the above table are disabled, unless mitigation is required for a
different enabled attack-vector or a mitigation is explicitly selected via a
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 89ca636081b7..55d747511f83 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -26,3 +26,4 @@ are configurable at compile, boot or run time.
rsb
old_microcode
indirect-target-selection
+ vmscape
diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst
index 48c7b0b72aed..754679db0ce8 100644
--- a/Documentation/admin-guide/hw-vuln/mds.rst
+++ b/Documentation/admin-guide/hw-vuln/mds.rst
@@ -214,7 +214,7 @@ XEON PHI specific considerations
command line with the 'ring3mwait=disable' command line option.
XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
- before the CPU enters a idle state. As XEON PHI is not affected by L1TF
+ before the CPU enters an idle state. As XEON PHI is not affected by L1TF
either disabling SMT is not required for full protection.
.. _mds_smt_control:
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 132e0bc6007e..991f12adef8d 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -664,7 +664,7 @@ Intel white papers:
.. _spec_ref1:
-[1] `Intel analysis of speculative execution side channels <https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-Analysis-of-Speculative-Execution-Side-Channels.pdf>`_.
+[1] `Intel analysis of speculative execution side channels <https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/analysis-of-speculative-execution-side-channels-white-paper.pdf>`_.
.. _spec_ref2:
@@ -682,7 +682,7 @@ AMD white papers:
.. _spec_ref5:
-[5] `AMD64 technology indirect branch control extension <https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf>`_.
+[5] `AMD64 technology indirect branch control extension <https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/white-papers/111006-architecture-guidelines-update-amd64-technology-indirect-branch-control-extension.pdf>`_.
.. _spec_ref6:
@@ -708,7 +708,7 @@ MIPS white paper:
.. _spec_ref10:
-[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
+[10] `MIPS: response on speculative execution and side channel vulnerabilities <https://web.archive.org/web/20220512003005if_/https://www.mips.com/blog/mips-response-on-speculative-execution-and-side-channel-vulnerabilities/>`_.
Academic papers:
diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst
new file mode 100644
index 000000000000..d9b9a2b6c114
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/vmscape.rst
@@ -0,0 +1,110 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+VMSCAPE
+=======
+
+VMSCAPE is a vulnerability that may allow a guest to influence the branch
+prediction in host userspace. It particularly affects hypervisors like QEMU.
+
+Even if a hypervisor may not have any sensitive data like disk encryption keys,
+guest-userspace may be able to attack the guest-kernel using the hypervisor as
+a confused deputy.
+
+Affected processors
+-------------------
+
+The following CPU families are affected by VMSCAPE:
+
+**Intel processors:**
+ - Skylake generation (Parts without Enhanced-IBRS)
+ - Cascade Lake generation - (Parts affected by ITS guest/host separation)
+ - Alder Lake and newer (Parts affected by BHI)
+
+Note that, BHI affected parts that use BHB clearing software mitigation e.g.
+Icelake are not vulnerable to VMSCAPE.
+
+**AMD processors:**
+ - Zen series (families 0x17, 0x19, 0x1a)
+
+** Hygon processors:**
+ - Family 0x18
+
+Mitigation
+----------
+
+Conditional IBPB
+----------------
+
+Kernel tracks when a CPU has run a potentially malicious guest and issues an
+IBPB before the first exit to userspace after VM-exit. If userspace did not run
+between VM-exit and the next VM-entry, no IBPB is issued.
+
+Note that the existing userspace mitigation against Spectre-v2 is effective in
+protecting the userspace. They are insufficient to protect the userspace VMMs
+from a malicious guest. This is because Spectre-v2 mitigations are applied at
+context switch time, while the userspace VMM can run after a VM-exit without a
+context switch.
+
+Vulnerability enumeration and mitigation is not applied inside a guest. This is
+because nested hypervisors should already be deploying IBPB to isolate
+themselves from nested guests.
+
+SMT considerations
+------------------
+
+When Simultaneous Multi-Threading (SMT) is enabled, hypervisors can be
+vulnerable to cross-thread attacks. For complete protection against VMSCAPE
+attacks in SMT environments, STIBP should be enabled.
+
+The kernel will issue a warning if SMT is enabled without adequate STIBP
+protection. Warning is not issued when:
+
+- SMT is disabled
+- STIBP is enabled system-wide
+- Intel eIBRS is enabled (which implies STIBP protection)
+
+System information and options
+------------------------------
+
+The sysfs file showing VMSCAPE mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/vmscape
+
+The possible values in this file are:
+
+ * 'Not affected':
+
+ The processor is not vulnerable to VMSCAPE attacks.
+
+ * 'Vulnerable':
+
+ The processor is vulnerable and no mitigation has been applied.
+
+ * 'Mitigation: IBPB before exit to userspace':
+
+ Conditional IBPB mitigation is enabled. The kernel tracks when a CPU has
+ run a potentially malicious guest and issues an IBPB before the first
+ exit to userspace after VM-exit.
+
+ * 'Mitigation: IBPB on VMEXIT':
+
+ IBPB is issued on every VM-exit. This occurs when other mitigations like
+ RETBLEED or SRSO are already issuing IBPB on VM-exit.
+
+Mitigation control on the kernel command line
+----------------------------------------------
+
+The mitigation can be controlled via the ``vmscape=`` command line parameter:
+
+ * ``vmscape=off``:
+
+ Disable the VMSCAPE mitigation.
+
+ * ``vmscape=ibpb``:
+
+ Enable conditional IBPB mitigation (default when CONFIG_MITIGATION_VMSCAPE=y).
+
+ * ``vmscape=force``:
+
+ Force vulnerability detection and mitigation even on processors that are
+ not known to be affected.
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 9c6cd52f69cf..7b011eb116a7 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -471,7 +471,7 @@ Notes on loading the dump-capture kernel:
performance degradation. To enable multi-cpu support, you should bring up an
SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.
-* For s390x there are two kdump modes: If a ELF header is specified with
+* For s390x there are two kdump modes: If an ELF header is specified with
the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
is done on all other architectures. If no elfcorehdr= kernel parameter is
specified, the s390x kdump kernel dynamically creates the header. The
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 39d0e7ff0965..7bf8cc7df6b5 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
.. _kernelparameters:
The kernel's command-line parameters
@@ -213,7 +215,7 @@ need or coordination with <Documentation/arch/x86/boot.rst>.
There are also arch-specific kernel-parameters not documented here.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
-a trailing = on the name of any parameter states that that parameter will
+a trailing = on the name of any parameter states that the parameter will
be entered as an environment variable, whereas its absence indicates that
it will appear as a kernel argument readable via /proc/cmdline by programs
running once the system is up.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 747a55abf494..3edc5ce0e2a3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2606,6 +2606,11 @@
for it. Intended to get systems with badly broken
firmware running.
+ irqhandler.duration_warn_us= [KNL]
+ Warn if an IRQ handler exceeds the specified duration
+ threshold in microseconds. Useful for identifying
+ long-running IRQs in the system.
+
irqpoll [HW]
When an interrupt is not handled search all handlers
for it. Also check all handlers each timer
@@ -3700,7 +3705,7 @@
looking for corruption. Enabling this will
both detect corruption and prevent the kernel
from using the memory being corrupted.
- However, its intended as a diagnostic tool; if
+ However, it's intended as a diagnostic tool; if
repeatable BIOS-originated corruption always
affects the same memory, you can use memmap=
to prevent the kernel from using that memory.
@@ -3767,8 +3772,16 @@
mga= [HW,DRM]
- microcode.force_minrev= [X86]
- Format: <bool>
+ microcode= [X86] Control the behavior of the microcode loader.
+ Available options, comma separated:
+
+ base_rev=X - with <X> with format: <u32>
+ Set the base microcode revision of each thread when in
+ debug mode.
+
+ dis_ucode_ldr: disable the microcode loader
+
+ force_minrev:
Enable or disable the microcode minimal revision
enforcement for the runtime microcode loader.
@@ -3829,6 +3842,7 @@
srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
tsx_async_abort=off [X86]
+ vmscape=off [X86]
Exceptions:
This does not have any effect on
@@ -4589,7 +4603,7 @@
bit 2: print timer info
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
- bit 5: replay all messages on consoles at the end of panic
+ bit 5: replay all kernel messages on consoles at the end of panic
bit 6: print all CPUs backtrace (if available in the arch)
bit 7: print only tasks in uninterruptible (blocked) state
*Be aware* that this option may print a _lot_ of lines,
@@ -6154,7 +6168,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, abmc.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
@@ -6405,8 +6419,9 @@
rodata= [KNL,EARLY]
on Mark read-only kernel memory as read-only (default).
off Leave read-only kernel memory writable for debugging.
- full Mark read-only kernel memory and aliases as read-only
- [arm64]
+ noalias Mark read-only kernel memory as read-only but retain
+ writable aliases in the direct map for regions outside
+ of the kernel image. [arm64]
rockchip.usb_uart
[EARLY]
@@ -6428,6 +6443,9 @@
rootflags= [KNL] Set root filesystem mount option string
+ initramfs_options= [KNL]
+ Specify mount options for for the initramfs mount.
+
rootfstype= [KNL] Set root filesystem type
rootwait [KNL] Wait (indefinitely) for root device to show up.
@@ -7382,7 +7400,7 @@
(converted into nanoseconds). Fast, but
depending on the architecture, may not be
in sync between CPUs.
- global - Event time stamps are synchronize across
+ global - Event time stamps are synchronized across
CPUs. May be slower than the local clock,
but better for some race conditions.
counter - Simple counting of events (1, 2, ..)
@@ -7502,12 +7520,12 @@
section.
trace_trigger=[trigger-list]
- [FTRACE] Add a event trigger on specific events.
+ [FTRACE] Add an event trigger on specific events.
Set a trigger on top of a specific event, with an optional
filter.
- The format is is "trace_trigger=<event>.<trigger>[ if <filter>],..."
- Where more than one trigger may be specified that are comma deliminated.
+ The format is "trace_trigger=<event>.<trigger>[ if <filter>],..."
+ Where more than one trigger may be specified that are comma delimited.
For example:
@@ -7515,7 +7533,7 @@
The above will enable the "stacktrace" trigger on the "sched_switch"
event but only trigger it if the "prev_state" of the "sched_switch"
- event is "2" (TASK_UNINTERUPTIBLE).
+ event is "2" (TASK_UNINTERRUPTIBLE).
See also "Event triggers" in Documentation/trace/events.rst
@@ -8041,6 +8059,16 @@
vmpoff= [KNL,S390] Perform z/VM CP command after power off.
Format: <command>
+ vmscape= [X86] Controls mitigation for VMscape attacks.
+ VMscape attacks can leak information from a userspace
+ hypervisor to a guest via speculative side-channels.
+
+ off - disable the mitigation
+ ibpb - use Indirect Branch Prediction Barrier
+ (IBPB) mitigation (default)
+ force - force vulnerability detection even on
+ unaffected processors
+
vsyscall= [X86-64,EARLY]
Controls the behavior of vsyscalls (i.e. calls to
fixed addresses of 0xffffffffff600x00 from legacy
diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
index b61cc601d298..66eb9cd918b5 100644
--- a/Documentation/admin-guide/laptops/laptop-mode.rst
+++ b/Documentation/admin-guide/laptops/laptop-mode.rst
@@ -61,7 +61,7 @@ Caveats
Check your drive's rating, and don't wear down your drive's lifetime if you
don't need to.
-* If you mount some of your ext3/reiserfs filesystems with the -n option, then
+* If you mount some of your ext3 filesystems with the -n option, then
the control script will not be able to remount them correctly. You must set
DO_REMOUNTS=0 in the control script, otherwise it will remount them with the
wrong options -- or it will fail because it cannot write to /etc/mtab.
@@ -96,7 +96,7 @@ control script increases dirty_expire_centisecs and dirty_writeback_centisecs in
dirtied are not forced to be written to disk as often. The control script also
changes the dirty background ratio, so that background writeback of dirty pages
is not done anymore. Combined with a higher commit value (also 10 minutes) for
-ext3 or ReiserFS filesystems (also done automatically by the control script),
+ext3 filesystem (also done automatically by the control script),
this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.
@@ -587,7 +587,7 @@ Control script::
FST=$(deduce_fstype $MP)
fi
case "$FST" in
- "ext3"|"reiserfs")
+ "ext3")
PARSEDOPTS="$(parse_mount_opts commit "$OPTS")"
mount $DEV -t $FST $MP -o remount,$PARSEDOPTS,commit=$MAX_AGE$NOATIME_OPT
;;
@@ -647,7 +647,7 @@ Control script::
FST=$(deduce_fstype $MP)
fi
case "$FST" in
- "ext3"|"reiserfs")
+ "ext3")
PARSEDOPTS="$(parse_mount_opts_wfstab $DEV commit $OPTS)"
PARSEDOPTS="$(parse_yesno_opts_wfstab $DEV atime atime $PARSEDOPTS)"
mount $DEV -t $FST $MP -o remount,$PARSEDOPTS
diff --git a/Documentation/admin-guide/laptops/lg-laptop.rst b/Documentation/admin-guide/laptops/lg-laptop.rst
index 67fd6932cef4..c4dd534f91ed 100644
--- a/Documentation/admin-guide/laptops/lg-laptop.rst
+++ b/Documentation/admin-guide/laptops/lg-laptop.rst
@@ -48,8 +48,8 @@ This value is reset to 100 when the kernel boots.
Fan mode
--------
-Writing 1/0 to /sys/devices/platform/lg-laptop/fan_mode disables/enables
-the fan silent mode.
+Writing 0/1/2 to /sys/devices/platform/lg-laptop/fan_mode sets fan mode to
+Optimal/Silent/Performance respectively.
USB charge
diff --git a/Documentation/admin-guide/laptops/sonypi.rst b/Documentation/admin-guide/laptops/sonypi.rst
index 190da1234314..7541f56e0007 100644
--- a/Documentation/admin-guide/laptops/sonypi.rst
+++ b/Documentation/admin-guide/laptops/sonypi.rst
@@ -25,7 +25,7 @@ generate, like:
(when available)
Those events (see linux/sonypi.h) can be polled using the character device node
-/dev/sonypi (major 10, minor auto allocated or specified as a option).
+/dev/sonypi (major 10, minor auto allocated or specified as an option).
A simple daemon which translates the jogdial movements into mouse wheel events
can be downloaded at: <http://popies.net/sonypi/>
diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 1c2eacc94758..deed823eab01 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -794,7 +794,7 @@ These currently include:
journal_mode (currently raid5 only)
The cache mode for raid5. raid5 could include an extra disk for
- caching. The mode can be "write-throuth" and "write-back". The
+ caching. The mode can be "write-through" or "write-back". The
default is "write-through".
ppl_write_hint
diff --git a/Documentation/admin-guide/media/i2c-cardlist.rst b/Documentation/admin-guide/media/i2c-cardlist.rst
index 1825a0bb47bd..fff962558cd5 100644
--- a/Documentation/admin-guide/media/i2c-cardlist.rst
+++ b/Documentation/admin-guide/media/i2c-cardlist.rst
@@ -91,7 +91,6 @@ ov5647 OmniVision OV5647 sensor
ov5670 OmniVision OV5670 sensor
ov5675 OmniVision OV5675 sensor
ov5695 OmniVision OV5695 sensor
-ov6650 OmniVision OV6650 sensor
ov7251 OmniVision OV7251 sensor
ov7640 OmniVision OV7640 sensor
ov7670 OmniVision OV7670 sensor
diff --git a/Documentation/admin-guide/media/imx.rst b/Documentation/admin-guide/media/imx.rst
index b8fa70f854fd..bb68100d8acb 100644
--- a/Documentation/admin-guide/media/imx.rst
+++ b/Documentation/admin-guide/media/imx.rst
@@ -96,7 +96,7 @@ Some of the features of this driver include:
motion compensation modes: low, medium, and high motion. Pipelines are
defined that allow sending frames to the VDIC subdev directly from the
CSI. There is also support in the future for sending frames to the
- VDIC from memory buffers via a output/mem2mem devices.
+ VDIC from memory buffers via output/mem2mem devices.
- Includes a Frame Interval Monitor (FIM) that can correct vertical sync
problems with the ADV718x video decoders.
diff --git a/Documentation/admin-guide/media/ivtv.rst b/Documentation/admin-guide/media/ivtv.rst
index 101f16d0263e..8b65ac3f5321 100644
--- a/Documentation/admin-guide/media/ivtv.rst
+++ b/Documentation/admin-guide/media/ivtv.rst
@@ -3,7 +3,7 @@
The ivtv driver
===============
-Author: Hans Verkuil <hverkuil@xs4all.nl>
+Author: Hans Verkuil <hverkuil@kernel.org>
This is a v4l2 device driver for the Conexant cx23415/6 MPEG encoder/decoder.
The cx23415 can do both encoding and decoding, the cx23416 can only do MPEG
diff --git a/Documentation/admin-guide/media/si4713.rst b/Documentation/admin-guide/media/si4713.rst
index be8e6b49b7b4..85dcf1cd2df8 100644
--- a/Documentation/admin-guide/media/si4713.rst
+++ b/Documentation/admin-guide/media/si4713.rst
@@ -13,7 +13,7 @@ Contact: Eduardo Valentin <eduardo.valentin@nokia.com>
Information about the Device
----------------------------
-This chip is a Silicon Labs product. It is a I2C device, currently on 0x63 address.
+This chip is a Silicon Labs product. It is an I2C device, currently on 0x63 address.
Basically, it has transmission and signal noise level measurement features.
The Si4713 integrates transmit functions for FM broadcast stereo transmission.
@@ -28,7 +28,7 @@ Users must comply with local regulations on radio frequency (RF) transmission.
Device driver description
-------------------------
-There are two modules to handle this device. One is a I2C device driver
+There are two modules to handle this device. One is an I2C device driver
and the other is a platform driver.
The I2C device driver exports a v4l2-subdev interface to the kernel.
@@ -113,7 +113,7 @@ Here is a summary of them:
- acomp_attack_time - Sets the attack time for audio dynamic range control.
- acomp_release_time - Sets the release time for audio dynamic range control.
-* Limiter setups audio deviation limiter feature. Once a over deviation occurs,
+* Limiter sets up the audio deviation limiter feature. Once an over deviation occurs,
it is possible to adjust the front-end gain of the audio input and always
prevent over deviation.
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index ede14b679d02..ec8c34b2d32f 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -175,4 +175,4 @@ Below command makes every memory region of size >=4K that has not accessed for
$ sudo damo start --damos_access_rate 0 0 --damos_sz_region 4K max \
--damos_age 60s max --damos_action pageout \
- <pid of your workload>
+ --target_pid <pid of your workload>
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index ff3a2dda1f02..eae534bc1bee 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -61,7 +61,7 @@ comma (",").
│ :ref:`kdamonds <sysfs_kdamonds>`/nr_kdamonds
│ │ :ref:`0 <sysfs_kdamond>`/state,pid,refresh_ms
│ │ │ :ref:`contexts <sysfs_contexts>`/nr_contexts
- │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations
+ │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations,addr_unit
│ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ │ intervals_goal/access_bp,aggrs,min_sample_us,max_sample_us
@@ -188,9 +188,9 @@ details). At the moment, only one context per kdamond is supported, so only
contexts/<N>/
-------------
-In each context directory, two files (``avail_operations`` and ``operations``)
-and three directories (``monitoring_attrs``, ``targets``, and ``schemes``)
-exist.
+In each context directory, three files (``avail_operations``, ``operations``
+and ``addr_unit``) and three directories (``monitoring_attrs``, ``targets``,
+and ``schemes``) exist.
DAMON supports multiple types of :ref:`monitoring operations
<damon_design_configurable_operations_set>`, including those for virtual address
@@ -205,6 +205,9 @@ You can set and get what type of monitoring operations DAMON will use for the
context by writing one of the keywords listed in ``avail_operations`` file and
reading from the ``operations`` file.
+``addr_unit`` file is for setting and getting the :ref:`address unit
+<damon_design_addr_unit>` parameter of the operations set.
+
.. _sysfs_monitoring_attrs:
contexts/<N>/monitoring_attrs/
@@ -357,7 +360,7 @@ The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
DAMON-based operation scheme.
Under ``quotas`` directory, four files (``ms``, ``bytes``,
-``reset_interval_ms``, ``effective_bytes``) and two directores (``weights`` and
+``reset_interval_ms``, ``effective_bytes``) and two directories (``weights`` and
``goals``) exist.
You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 370fba113460..1654211cc6cf 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -225,6 +225,42 @@ to "always" or "madvise"), and it'll be automatically shutdown when
PMD-sized THP is disabled (when both the per-size anon control and the
top-level control are "never")
+process THP controls
+--------------------
+
+A process can control its own THP behaviour using the ``PR_SET_THP_DISABLE``
+and ``PR_GET_THP_DISABLE`` pair of prctl(2) calls. The THP behaviour set using
+``PR_SET_THP_DISABLE`` is inherited across fork(2) and execve(2). These calls
+support the following arguments::
+
+ prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0):
+ This will disable THPs completely for the process, irrespective
+ of global THP controls or madvise(..., MADV_COLLAPSE) being used.
+
+ prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0):
+ This will disable THPs for the process except when the usage of THPs is
+ advised. Consequently, THPs will only be used when:
+ - Global THP controls are set to "always" or "madvise" and
+ madvise(..., MADV_HUGEPAGE) or madvise(..., MADV_COLLAPSE) is used.
+ - Global THP controls are set to "never" and madvise(..., MADV_COLLAPSE)
+ is used. This is the same behavior as if THPs would not be disabled on
+ a process level.
+ Note that MADV_COLLAPSE is currently always rejected if
+ madvise(..., MADV_NOHUGEPAGE) is set on an area.
+
+ prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0):
+ This will re-enable THPs for the process, as if they were never disabled.
+ Whether THPs will actually be used depends on global THP controls and
+ madvise() calls.
+
+ prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0):
+ This returns a value whose bits indicate how THP-disable is configured:
+ Bits
+ 1 0 Value Description
+ |0|0| 0 No THP-disable behaviour specified.
+ |0|1| 1 THP is entirely disabled for this process.
+ |1|1| 3 THP-except-advised mode is set for this process.
+
Khugepaged controls
-------------------
@@ -383,6 +419,8 @@ option: ``huge=``. It can have following values:
always
Attempt to allocate huge pages every time we need a new page;
+ Always try PMD-sized huge pages first, and fall back to smaller-sized
+ huge pages if the PMD-sized huge page allocation fails;
never
Do not allocate huge pages. Note that ``madvise(..., MADV_COLLAPSE)``
@@ -390,7 +428,9 @@ never
is specified everywhere;
within_size
- Only allocate huge page if it will be fully within i_size.
+ Only allocate huge page if it will be fully within i_size;
+ Always try PMD-sized huge pages first, and fall back to smaller-sized
+ huge pages if the PMD-sized huge page allocation fails;
Also respect madvise() hints;
advise
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index fd3370aa43fe..283d77217c6f 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -53,26 +53,17 @@ Zswap receives pages for compression from the swap subsystem and is able to
evict pages from its own compressed pool on an LRU basis and write them back to
the backing swap device in the case that the compressed pool is full.
-Zswap makes use of zpool for the managing the compressed memory pool. Each
-allocation in zpool is not directly accessible by address. Rather, a handle is
+Zswap makes use of zsmalloc for the managing the compressed memory pool. Each
+allocation in zsmalloc is not directly accessible by address. Rather, a handle is
returned by the allocation routine and that handle must be mapped before being
accessed. The compressed memory pool grows on demand and shrinks as compressed
-pages are freed. The pool is not preallocated. By default, a zpool
-of type selected in ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
-but it can be overridden at boot time by setting the ``zpool`` attribute,
-e.g. ``zswap.zpool=zsmalloc``. It can also be changed at runtime using the sysfs
-``zpool`` attribute, e.g.::
-
- echo zsmalloc > /sys/module/zswap/parameters/zpool
-
-The zsmalloc type zpool has a complex compressed page storage method, and it
-can achieve great storage densities.
+pages are freed. The pool is not preallocated.
When a swap page is passed from swapout to zswap, zswap maintains a mapping
-of the swap entry, a combination of the swap type and swap offset, to the zpool
-handle that references that compressed swap page. This mapping is achieved
-with a red-black tree per swap type. The swap offset is the search key for the
-tree nodes.
+of the swap entry, a combination of the swap type and swap offset, to the
+zsmalloc handle that references that compressed swap page. This mapping is
+achieved with a red-black tree per swap type. The swap offset is the search
+key for the tree nodes.
During a page fault on a PTE that is a swap entry, the swapin code calls the
zswap load function to decompress the page into the page allocated by the page
@@ -96,11 +87,11 @@ attribute, e.g.::
echo lzo > /sys/module/zswap/parameters/compressor
-When the zpool and/or compressor parameter is changed at runtime, any existing
-compressed pages are not modified; they are left in their own zpool. When a
-request is made for a page in an old zpool, it is uncompressed using its
-original compressor. Once all pages are removed from an old zpool, the zpool
-and its compressor are freed.
+When the compressor parameter is changed at runtime, any existing compressed
+pages are not modified; they are left in their own pool. When a request is
+made for a page in an old pool, it is uncompressed using its original
+compressor. Once all pages are removed from an old pool, the pool and its
+compressor are freed.
Some of the pages in zswap are same-value filled pages (i.e. contents of the
page have same value or repetitive pattern). These pages include zero-filled
diff --git a/Documentation/admin-guide/nfs/nfsroot.rst b/Documentation/admin-guide/nfs/nfsroot.rst
index 135218f33394..06990309c6ff 100644
--- a/Documentation/admin-guide/nfs/nfsroot.rst
+++ b/Documentation/admin-guide/nfs/nfsroot.rst
@@ -342,7 +342,7 @@ They depend on various facilities being available:
When using pxelinux, the kernel image is specified using
"kernel <relative-path-below /tftpboot>". The nfsroot parameters
are passed to the kernel by adding them to the "append" line.
- It is common to use serial console in conjunction with pxeliunx,
+ It is common to use serial console in conjunction with pxelinux,
see Documentation/admin-guide/serial-console.rst for more information.
For more information on isolinux, including how to create bootdisks
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
index cb376f335f40..167f9281fbf5 100644
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -16,8 +16,8 @@ provides the following two features:
- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
time spent in each low-power LTSSM state) and
-- one 32-bit counter for Event Counting (error and non-error events for
- a specified lane)
+- one 32-bit counter per event for Event Counting (error and non-error
+ events for a specified lane)
Note: There is no interrupt for counter overflow.
diff --git a/Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst b/Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst
new file mode 100644
index 000000000000..46595b788d3a
--- /dev/null
+++ b/Documentation/admin-guide/perf/fujitsu_uncore_pmu.rst
@@ -0,0 +1,110 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+================================================
+Fujitsu Uncore Performance Monitoring Unit (PMU)
+================================================
+
+This driver supports the Uncore MAC PMUs and the Uncore PCI PMUs found
+in Fujitsu chips.
+Each MAC PMU on these chips is exposed as a uncore perf PMU with device name
+mac_iod<iod>_mac<mac>_ch<ch>.
+And each PCI PMU on these chips is exposed as a uncore perf PMU with device name
+pci_iod<iod>_pci<pci>.
+
+The driver provides a description of its available events and configuration
+options in sysfs, see /sys/bus/event_sources/devices/mac_iod<iod>_mac<mac>_ch<ch>/
+and /sys/bus/event_sources/devices/pci_iod<iod>_pci<pci>/.
+This driver exports:
+- formats, used by perf user space and other tools to configure events
+- events, used by perf user space and other tools to create events
+ symbolically, e.g.:
+ perf stat -a -e mac_iod0_mac0_ch0/event=0x21/ ls
+ perf stat -a -e pci_iod0_pci0/event=0x24/ ls
+- cpumask, used by perf user space and other tools to know on which CPUs
+ to open the events
+
+This driver supports the following events for MAC:
+- cycles
+ This event counts MAC cycles at MAC frequency.
+- read-count
+ This event counts the number of read requests to MAC.
+- read-count-request
+ This event counts the number of read requests including retry to MAC.
+- read-count-return
+ This event counts the number of responses to read requests to MAC.
+- read-count-request-pftgt
+ This event counts the number of read requests including retry with PFTGT
+ flag.
+- read-count-request-normal
+ This event counts the number of read requests including retry without PFTGT
+ flag.
+- read-count-return-pftgt-hit
+ This event counts the number of responses to read requests which hit the
+ PFTGT buffer.
+- read-count-return-pftgt-miss
+ This event counts the number of responses to read requests which miss the
+ PFTGT buffer.
+- read-wait
+ This event counts outstanding read requests issued by DDR memory controller
+ per cycle.
+- write-count
+ This event counts the number of write requests to MAC (including zero write,
+ full write, partial write, write cancel).
+- write-count-write
+ This event counts the number of full write requests to MAC (not including
+ zero write).
+- write-count-pwrite
+ This event counts the number of partial write requests to MAC.
+- memory-read-count
+ This event counts the number of read requests from MAC to memory.
+- memory-write-count
+ This event counts the number of full write requests from MAC to memory.
+- memory-pwrite-count
+ This event counts the number of partial write requests from MAC to memory.
+- ea-mac
+ This event counts energy consumption of MAC.
+- ea-memory
+ This event counts energy consumption of memory.
+- ea-memory-mac-write
+ This event counts the number of write requests from MAC to memory.
+- ea-ha
+ This event counts energy consumption of HA.
+
+ 'ea' is the abbreviation for 'Energy Analyzer'.
+
+Examples for use with perf::
+
+ perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
+
+And, this driver supports the following events for PCI:
+- pci-port0-cycles
+ This event counts PCI cycles at PCI frequency in port0.
+- pci-port0-read-count
+ This event counts read transactions for data transfer in port0.
+- pci-port0-read-count-bus
+ This event counts read transactions for bus usage in port0.
+- pci-port0-write-count
+ This event counts write transactions for data transfer in port0.
+- pci-port0-write-count-bus
+ This event counts write transactions for bus usage in port0.
+- pci-port1-cycles
+ This event counts PCI cycles at PCI frequency in port1.
+- pci-port1-read-count
+ This event counts read transactions for data transfer in port1.
+- pci-port1-read-count-bus
+ This event counts read transactions for bus usage in port1.
+- pci-port1-write-count
+ This event counts write transactions for data transfer in port1.
+- pci-port1-write-count-bus
+ This event counts write transactions for bus usage in port1.
+- ea-pci
+ This event counts energy consumption of PCI.
+
+ 'ea' is the abbreviation for 'Energy Analyzer'.
+
+Examples for use with perf::
+
+ perf stat -e pci_iod0_pci0/ea-pci/ ls
+
+Given that these are uncore PMUs the driver does not support sampling, therefore
+"perf record" will not work. Per-task perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 48992a0b8e94..f3d9871906a2 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -18,9 +18,10 @@ HiSilicon SoC uncore PMU driver
Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
HHA and DDRC etc. The available events and configuration options shall
-be described in the sysfs, see:
+be described in the sysfs, see::
+
+/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>
-/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command shall list the available events from sysfs.
Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
@@ -109,8 +110,52 @@ uring channel. It is 2 bits. Some important codes are as follows:
- 2'b11: count the events which sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which sent to the uring (non-MATA) channel;
-- 2'b00: default value, count the events which sent to the both uring and
- uring_ext channel;
+- 2'b00: default value, count the events which sent to both uring and
+ uring_ext channels;
+
+6. ch: NoC PMU supports filtering the event counts of certain transaction
+channel with this option. The current supported channels are as follows:
+
+- 3'b010: Request channel
+- 3'b100: Snoop channel
+- 3'b110: Response channel
+- 3'b111: Data channel
+
+7. tt_en: NoC PMU supports counting only transactions that have tracetag set
+if this option is set. See the 2nd list for more information about tracetag.
+
+For HiSilicon uncore PMU v3 whose identifier is 0x40, some uncore PMUs are
+further divided into parts for finer granularity of tracing, each part has its
+own dedicated PMU, and all such PMUs together cover the monitoring job of events
+on particular uncore device. Such PMUs are described in sysfs with name format
+slightly changed::
+
+/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}_{Z}/ddrc{Y}_{Z}/noc{Y}_{Z}>
+
+Z is the sub-id, indicating different PMUs for part of hardware device.
+
+Usage of most PMUs with different sub-ids are identical. Specially, L3C PMU
+provides ``ext`` option to allow exploration of even finer granual statistics
+of L3C PMU. L3C PMU driver uses that as hint of termination when delivering
+perf command to hardware:
+
+- ext=0: Default, could be used with event names.
+- ext=1 and ext=2: Must be used with event codes, event names are not supported.
+
+An example of perf command could be::
+
+ $# perf stat -a -e hisi_sccl0_l3c1_0/rd_spipe/ sleep 5
+
+or::
+
+ $# perf stat -a -e hisi_sccl0_l3c1_0/event=0x1,ext=1/ sleep 5
+
+As above, ``hisi_sccl0_l3c1_0`` locates PMU of Super CPU CLuster 0, L3 cache 1
+pipe0.
+
+First command locates the first part of L3C since ``ext=0`` is implied by
+default. Second command issues the counting on another part of L3C with the
+event ``0x1``.
Users could configure IDs to count data come from specific CCL/ICL, by setting
srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 072b510385c4..47d9a3df6329 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -29,3 +29,4 @@ Performance monitor support
cxl
ampere_cspmu
mrvl-pem-pmu
+ fujitsu_uncore_pmu
diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
index 4a5ffb0996a3..cb4b78468a93 100644
--- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst
+++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
@@ -273,7 +273,7 @@ again.
does nothing at all; in that case you have to manually install your kernel,
as outlined in the reference section.
- If you are running a immutable Linux distribution, check its documentation
+ If you are running an immutable Linux distribution, check its documentation
and the web to find out how to install your own kernel there.
[:ref:`details<install>`]
@@ -884,7 +884,7 @@ When a build error occurs, it might be caused by some aspect of your machine's
setup that often can be fixed quickly; other times though the problem lies in
the code and can only be fixed by a developer. A close examination of the
failure messages coupled with some research on the internet will often tell you
-which of the two it is. To perform such a investigation, restart the build
+which of the two it is. To perform such an investigation, restart the build
process like this::
make V=1
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 9a847506f6ec..a68e6d909274 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -611,7 +611,7 @@ better place.
How to read the MAINTAINERS file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-To illustrate how to use the :ref:`MAINTAINERS <maintainers>` file, lets assume
+To illustrate how to use the :ref:`MAINTAINERS <maintainers>` file, let's assume
the WiFi in your Laptop suddenly misbehaves after updating the kernel. In that
case it's likely an issue in the WiFi driver. Obviously it could also be some
code it builds upon, but unless you suspect something like that stick to the
@@ -1543,7 +1543,7 @@ as well, because that will speed things up.
And note, it helps developers a great deal if you can specify the exact version
that introduced the problem. Hence if possible within a reasonable time frame,
-try to find that version using vanilla kernels. Lets assume something broke when
+try to find that version using vanilla kernels. Let's assume something broke when
your distributor released a update from Linux kernel 5.10.5 to 5.10.8. Then as
instructed above go and check the latest kernel from that version line, say
5.10.9. If it shows the problem, try a vanilla 5.10.5 to ensure that no patches
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 6c54718c9d04..9b7f65c3efd8 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -164,8 +164,8 @@ pipe-user-pages-soft
--------------------
Maximum total number of pages a non-privileged user may allocate for pipes
-before the pipe size gets limited to a single page. Once this limit is reached,
-new pipes will be limited to a single page in size for this user in order to
+before the pipe size gets limited to two pages. Once this limit is reached,
+new pipes will be limited to two pages in size for this user in order to
limit total memory usage, and trying to increase them using ``fcntl()`` will be
denied until usage goes below the limit again. The default value allows to
allocate up to 1024 pipes at their default size. When set to 0, no limit is
diff --git a/Documentation/admin-guide/sysctl/index.rst b/Documentation/admin-guide/sysctl/index.rst
index 03346f98c7b9..4dd2c9b5d752 100644
--- a/Documentation/admin-guide/sysctl/index.rst
+++ b/Documentation/admin-guide/sysctl/index.rst
@@ -66,25 +66,31 @@ This documentation is about:
=============== ===============================================================
abi/ execution domains & personalities
-debug/ <empty>
-dev/ device specific information (eg dev/cdrom/info)
+<$ARCH> tuning controls for various CPU architecture (e.g. csky, s390)
+crypto/ <undocumented>
+debug/ <undocumented>
+dev/ device specific information (e.g. dev/cdrom/info)
fs/ specific filesystems
filehandle, inode, dentry and quota tuning
binfmt_misc <Documentation/admin-guide/binfmt-misc.rst>
kernel/ global kernel info / tuning
miscellaneous stuff
+ some architecture-specific controls
+ security (LSM) stuff
net/ networking stuff, for documentation look in:
<Documentation/networking/>
proc/ <empty>
sunrpc/ SUN Remote Procedure Call (NFS)
+user/ Per user namespace limits
vm/ memory management tuning
buffer and cache management
-user/ Per user per user namespace limits
+xen/ <undocumented>
=============== ===============================================================
-These are the subdirs I have on my system. There might be more
-or other subdirs in another setup. If you see another dir, I'd
-really like to hear about it :-)
+These are the subdirs I have on my system or have been discovered by
+searching through the source code. There might be more or other subdirs
+in another setup. If you see another dir, I'd really like to hear about
+it :-)
.. toctree::
:maxdepth: 1
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 8b49eab937d0..f3ee807b5d8b 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -890,7 +890,7 @@ bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
-bit 5 replay all messages on consoles at the end of panic
+bit 5 replay all kernel messages on consoles at the end of panic
bit 6 print all CPUs backtrace (if available in the arch)
bit 7 print only tasks in uninterruptible (blocked) state
===== ============================================
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 7b0c4291c686..2ef50828aff1 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -222,6 +222,8 @@ rmem_max
The maximum receive socket buffer size in bytes.
+Default: 4194304
+
rps_default_mask
----------------
@@ -247,6 +249,8 @@ wmem_max
The maximum send socket buffer size in bytes.
+Default: 4194304
+
message_burst and message_cost
------------------------------
diff --git a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
index d8946b084b1e..d83601f2a459 100644
--- a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
+++ b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
@@ -1757,7 +1757,7 @@ or all of these tasks:
to your bootloader's configuration.
You have to take care of some or all of the tasks yourself, if your
-distribution lacks a installkernel script or does only handle part of them.
+distribution lacks an installkernel script or does only handle part of them.
Consult the distribution's documentation for details. If in doubt, install the
kernel manually::
diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index a18328a5fb93..c85cd327af28 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -34,22 +34,6 @@ When mounting an XFS filesystem, the following options are accepted.
to the file. Specifying a fixed ``allocsize`` value turns off
the dynamic behaviour.
- attr2 or noattr2
- The options enable/disable an "opportunistic" improvement to
- be made in the way inline extended attributes are stored
- on-disk. When the new form is used for the first time when
- ``attr2`` is selected (either when setting or removing extended
- attributes) the on-disk superblock feature bit field will be
- updated to reflect this format being in use.
-
- The default behaviour is determined by the on-disk feature
- bit indicating that ``attr2`` behaviour is active. If either
- mount option is set, then that becomes the new default used
- by the filesystem.
-
- CRC enabled filesystems always use the ``attr2`` format, and so
- will reject the ``noattr2`` mount option if it is set.
-
discard or nodiscard (default)
Enable/disable the issuing of commands to let the block
device reclaim space freed by the filesystem. This is
@@ -75,12 +59,6 @@ When mounting an XFS filesystem, the following options are accepted.
across the entire filesystem rather than just on directories
configured to use it.
- ikeep or noikeep (default)
- When ``ikeep`` is specified, XFS does not delete empty inode
- clusters and keeps them around on disk. When ``noikeep`` is
- specified, empty inode clusters are returned to the free
- space pool.
-
inode32 or inode64 (default)
When ``inode32`` is specified, it indicates that XFS limits
inode creation to locations which will not result in inode
@@ -253,9 +231,8 @@ latest version and try again.
The deprecation will take place in two parts. Support for mounting V4
filesystems can now be disabled at kernel build time via Kconfig option.
-The option will default to yes until September 2025, at which time it
-will be changed to default to no. In September 2030, support will be
-removed from the codebase entirely.
+These options were changed to default to no in September 2025. In
+September 2030, support will be removed from the codebase entirely.
Note: Distributors may choose to withdraw V4 format support earlier than
the dates listed above.
@@ -268,8 +245,6 @@ Deprecated Mount Options
============================ ================
Mounting with V4 filesystem September 2030
Mounting ascii-ci filesystem September 2030
-ikeep/noikeep September 2025
-attr2/noattr2 September 2025
============================ ================
@@ -285,6 +260,8 @@ Removed Mount Options
osyncisdsync/osyncisosync v4.0
barrier v4.19
nobarrier v4.19
+ ikeep/noikeep v6.18
+ attr2/noattr2 v6.18
=========================== =======
sysctls
@@ -312,9 +289,6 @@ The following sysctls are available for the XFS filesystem:
removes unused preallocation from clean inodes and releases
the unused space back to the free pool.
- fs.xfs.speculative_cow_prealloc_lifetime
- This is an alias for speculative_prealloc_lifetime.
-
fs.xfs.error_level (Min: 0 Default: 3 Max: 11)
A volume knob for error reporting when internal errors occur.
This will generate detailed messages & backtraces for filesystem
@@ -341,17 +315,6 @@ The following sysctls are available for the XFS filesystem:
This option is intended for debugging only.
- fs.xfs.irix_symlink_mode (Min: 0 Default: 0 Max: 1)
- Controls whether symlinks are created with mode 0777 (default)
- or whether their mode is affected by the umask (irix mode).
-
- fs.xfs.irix_sgid_inherit (Min: 0 Default: 0 Max: 1)
- Controls files created in SGID directories.
- If the group ID of the new file does not match the effective group
- ID or one of the supplementary group IDs of the parent dir, the
- ISGID bit is cleared if the irix_sgid_inherit compatibility sysctl
- is set.
-
fs.xfs.inherit_sync (Min: 0 Default: 1 Max: 1)
Setting this to "1" will cause the "sync" flag set
by the **xfs_io(8)** chattr command on a directory to be
@@ -387,24 +350,20 @@ The following sysctls are available for the XFS filesystem:
Deprecated Sysctls
==================
-=========================================== ================
- Name Removal Schedule
-=========================================== ================
-fs.xfs.irix_sgid_inherit September 2025
-fs.xfs.irix_symlink_mode September 2025
-fs.xfs.speculative_cow_prealloc_lifetime September 2025
-=========================================== ================
-
+None currently.
Removed Sysctls
===============
-============================= =======
- Name Removed
-============================= =======
- fs.xfs.xfsbufd_centisec v4.0
- fs.xfs.age_buffer_centisecs v4.0
-============================= =======
+========================================== =======
+ Name Removed
+========================================== =======
+ fs.xfs.xfsbufd_centisec v4.0
+ fs.xfs.age_buffer_centisecs v4.0
+ fs.xfs.irix_symlink_mode v6.18
+ fs.xfs.irix_sgid_inherit v6.18
+ fs.xfs.speculative_cow_prealloc_lifetime v6.18
+========================================== =======
Error handling
==============