summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--Documentation/filesystems/ext4/inodes.rst2
-rw-r--r--Documentation/filesystems/ext4/super.rst4
-rw-r--r--Documentation/filesystems/fscrypt.rst2
-rw-r--r--Documentation/filesystems/gfs2/glocks.rst (renamed from Documentation/filesystems/gfs2-glocks.rst)0
-rw-r--r--Documentation/filesystems/gfs2/index.rst (renamed from Documentation/filesystems/gfs2.rst)12
-rw-r--r--Documentation/filesystems/gfs2/uevents.rst (renamed from Documentation/filesystems/gfs2-uevents.rst)0
-rw-r--r--Documentation/filesystems/index.rst4
-rw-r--r--Documentation/filesystems/iomap/operations.rst50
-rw-r--r--Documentation/filesystems/nfs/index.rst1
-rw-r--r--Documentation/filesystems/nfs/nfsd-io-modes.rst153
-rw-r--r--Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst547
-rw-r--r--Documentation/filesystems/porting.rst27
-rw-r--r--Documentation/filesystems/proc.rst5
-rw-r--r--Documentation/filesystems/ramfs-rootfs-initramfs.rst12
-rw-r--r--Documentation/filesystems/resctrl.rst134
-rw-r--r--Documentation/filesystems/vfs.rst4
-rw-r--r--Documentation/filesystems/xfs/xfs-online-fsck-design.rst238
17 files changed, 920 insertions, 275 deletions
diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst
index cfc6c1659931..55cd5c380e92 100644
--- a/Documentation/filesystems/ext4/inodes.rst
+++ b/Documentation/filesystems/ext4/inodes.rst
@@ -297,6 +297,8 @@ The ``i_flags`` field is a combination of these values:
- Inode has inline data (EXT4_INLINE_DATA_FL).
* - 0x20000000
- Create children with the same project ID (EXT4_PROJINHERIT_FL).
+ * - 0x40000000
+ - Use case-insensitive lookups for directory contents (EXT4_CASEFOLD_FL).
* - 0x80000000
- Reserved for ext4 library (EXT4_RESERVED_FL).
* -
diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst
index 1b240661bfa3..9a59cded9bd7 100644
--- a/Documentation/filesystems/ext4/super.rst
+++ b/Documentation/filesystems/ext4/super.rst
@@ -671,7 +671,9 @@ following:
* - 0x8000
- Data in inode (INCOMPAT_INLINE_DATA).
* - 0x10000
- - Encrypted inodes are present on the filesystem. (INCOMPAT_ENCRYPT).
+ - Encrypted inodes can be present. (INCOMPAT_ENCRYPT).
+ * - 0x20000
+ - Directories can be marked case-insensitive. (INCOMPAT_CASEFOLD).
.. _super_rocompat:
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index 696a5844bfa3..70af896822e1 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -450,9 +450,7 @@ API, but the filenames mode still does.
- CONFIG_CRYPTO_HCTR2
- Recommended:
- arm64: CONFIG_CRYPTO_AES_ARM64_CE_BLK
- - arm64: CONFIG_CRYPTO_POLYVAL_ARM64_CE
- x86: CONFIG_CRYPTO_AES_NI_INTEL
- - x86: CONFIG_CRYPTO_POLYVAL_CLMUL_NI
- Adiantum
- Mandatory:
diff --git a/Documentation/filesystems/gfs2-glocks.rst b/Documentation/filesystems/gfs2/glocks.rst
index ce5ff08cbd59..ce5ff08cbd59 100644
--- a/Documentation/filesystems/gfs2-glocks.rst
+++ b/Documentation/filesystems/gfs2/glocks.rst
diff --git a/Documentation/filesystems/gfs2.rst b/Documentation/filesystems/gfs2/index.rst
index 1bc48a13430c..e5e195403561 100644
--- a/Documentation/filesystems/gfs2.rst
+++ b/Documentation/filesystems/gfs2/index.rst
@@ -4,6 +4,9 @@
Global File System 2
====================
+Overview
+========
+
GFS2 is a cluster file system. It allows a cluster of computers to
simultaneously use a block device that is shared between them (with FC,
iSCSI, NBD, etc). GFS2 reads and writes to the block device like a local
@@ -50,3 +53,12 @@ The following man pages are available from gfs2-utils:
gfs2_convert to convert a gfs filesystem to GFS2 in-place
mkfs.gfs2 to make a filesystem
============ =============================================
+
+Implementation Notes
+====================
+
+.. toctree::
+ :maxdepth: 1
+
+ glocks
+ uevents
diff --git a/Documentation/filesystems/gfs2-uevents.rst b/Documentation/filesystems/gfs2/uevents.rst
index f162a2c76c69..f162a2c76c69 100644
--- a/Documentation/filesystems/gfs2-uevents.rst
+++ b/Documentation/filesystems/gfs2/uevents.rst
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index af516e528ded..f4873197587d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,9 +89,7 @@ Documentation for filesystem implementations.
ext3
ext4/index
f2fs
- gfs2
- gfs2-uevents
- gfs2-glocks
+ gfs2/index
hfs
hfsplus
hpfs
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 387fd9cc72ca..da982ca7e413 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -135,6 +135,27 @@ These ``struct kiocb`` flags are significant for buffered I/O with iomap:
* ``IOCB_DONTCACHE``: Turns on ``IOMAP_DONTCACHE``.
+``struct iomap_read_ops``
+--------------------------
+
+.. code-block:: c
+
+ struct iomap_read_ops {
+ int (*read_folio_range)(const struct iomap_iter *iter,
+ struct iomap_read_folio_ctx *ctx, size_t len);
+ void (*submit_read)(struct iomap_read_folio_ctx *ctx);
+ };
+
+iomap calls these functions:
+
+ - ``read_folio_range``: Called to read in the range. This must be provided
+ by the caller. If this succeeds, iomap_finish_folio_read() must be called
+ after the range is read in, regardless of whether the read succeeded or
+ failed.
+
+ - ``submit_read``: Submit any pending read requests. This function is
+ optional.
+
Internal per-Folio State
------------------------
@@ -182,6 +203,28 @@ The ``flags`` argument to ``->iomap_begin`` will be set to zero.
The pagecache takes whatever locks it needs before calling the
filesystem.
+Both ``iomap_readahead`` and ``iomap_read_folio`` pass in a ``struct
+iomap_read_folio_ctx``:
+
+.. code-block:: c
+
+ struct iomap_read_folio_ctx {
+ const struct iomap_read_ops *ops;
+ struct folio *cur_folio;
+ struct readahead_control *rac;
+ void *read_ctx;
+ };
+
+``iomap_readahead`` must set:
+ * ``ops->read_folio_range()`` and ``rac``
+
+``iomap_read_folio`` must set:
+ * ``ops->read_folio_range()`` and ``cur_folio``
+
+``ops->submit_read()`` and ``read_ctx`` are optional. ``read_ctx`` is used to
+pass in any custom data the caller needs accessible in the ops callbacks for
+fulfilling reads.
+
Buffered Writes
---------------
@@ -317,6 +360,9 @@ The fields are as follows:
delalloc reservations to avoid having delalloc reservations for
clean pagecache.
This function must be supplied by the filesystem.
+ If this succeeds, iomap_finish_folio_write() must be called once writeback
+ completes for the range, regardless of whether the writeback succeeded or
+ failed.
- ``writeback_submit``: Submit the previous built writeback context.
Block based file systems should use the iomap_ioend_writeback_submit
@@ -444,10 +490,6 @@ These ``struct kiocb`` flags are significant for direct I/O with iomap:
Only meaningful for asynchronous I/O, and only if the entire I/O can
be issued as a single ``struct bio``.
- * ``IOCB_DIO_CALLER_COMP``: Try to run I/O completion from the caller's
- process context.
- See ``linux/fs.h`` for more details.
-
Filesystems should call ``iomap_dio_rw`` from ``->read_iter`` and
``->write_iter``, and set ``FMODE_CAN_ODIRECT`` in the ``->open``
function for the file.
diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
index 95c2c009874c..a29a212b5b4d 100644
--- a/Documentation/filesystems/nfs/index.rst
+++ b/Documentation/filesystems/nfs/index.rst
@@ -13,5 +13,6 @@ NFS
rpc-cache
rpc-server-gss
nfs41-server
+ nfsd-io-modes
knfsd-stats
reexport
diff --git a/Documentation/filesystems/nfs/nfsd-io-modes.rst b/Documentation/filesystems/nfs/nfsd-io-modes.rst
new file mode 100644
index 000000000000..0fd6e82478fe
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfsd-io-modes.rst
@@ -0,0 +1,153 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+NFSD IO MODES
+=============
+
+Overview
+========
+
+NFSD has historically always used buffered IO when servicing READ and
+WRITE operations. BUFFERED is NFSD's default IO mode, but it is possible
+to override that default to use either DONTCACHE or DIRECT IO modes.
+
+Experimental NFSD debugfs interfaces are available to allow the NFSD IO
+mode used for READ and WRITE to be configured independently. See both:
+
+- /sys/kernel/debug/nfsd/io_cache_read
+- /sys/kernel/debug/nfsd/io_cache_write
+
+The default value for both io_cache_read and io_cache_write reflects
+NFSD's default IO mode (which is NFSD_IO_BUFFERED=0).
+
+Based on the configured settings, NFSD's IO will either be:
+
+- cached using page cache (NFSD_IO_BUFFERED=0)
+- cached but removed from page cache on completion (NFSD_IO_DONTCACHE=1)
+- not cached stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)
+
+To set an NFSD IO mode, write a supported value (0 - 2) to the
+corresponding IO operation's debugfs interface, e.g.::
+
+ echo 2 > /sys/kernel/debug/nfsd/io_cache_read
+ echo 2 > /sys/kernel/debug/nfsd/io_cache_write
+
+To check which IO mode NFSD is using for READ or WRITE, simply read the
+corresponding IO operation's debugfs interface, e.g.::
+
+ cat /sys/kernel/debug/nfsd/io_cache_read
+ cat /sys/kernel/debug/nfsd/io_cache_write
+
+If you experiment with NFSD's IO modes on a recent kernel and have
+interesting results, please report them to linux-nfs@vger.kernel.org
+
+NFSD DONTCACHE
+==============
+
+DONTCACHE offers a hybrid approach to servicing IO that aims to offer
+the benefits of using DIRECT IO without any of the strict alignment
+requirements that DIRECT IO imposes. To achieve this buffered IO is used
+but the IO is flagged to "drop behind" (meaning associated pages are
+dropped from the page cache) when IO completes.
+
+DONTCACHE aims to avoid what has proven to be a fairly significant
+limition of Linux's memory management subsystem if/when large amounts of
+data is infrequently accessed (e.g. read once _or_ written once but not
+read until much later). Such use-cases are particularly problematic
+because the page cache will eventually become a bottleneck to servicing
+new IO requests.
+
+For more context on DONTCACHE, please see these Linux commit headers:
+
+- Overview: 9ad6344568cc3 ("mm/filemap: change filemap_create_folio()
+ to take a struct kiocb")
+- for READ: 8026e49bff9b1 ("mm/filemap: add read support for
+ RWF_DONTCACHE")
+- for WRITE: 974c5e6139db3 ("xfs: flag as supporting FOP_DONTCACHE")
+
+NFSD_IO_DONTCACHE will fall back to NFSD_IO_BUFFERED if the underlying
+filesystem doesn't indicate support by setting FOP_DONTCACHE.
+
+NFSD DIRECT
+===========
+
+DIRECT IO doesn't make use of the page cache, as such it is able to
+avoid the Linux memory management's page reclaim scalability problems
+without resorting to the hybrid use of page cache that DONTCACHE does.
+
+Some workloads benefit from NFSD avoiding the page cache, particularly
+those with a working set that is significantly larger than available
+system memory. The pathological worst-case workload that NFSD DIRECT has
+proven to help most is: NFS client issuing large sequential IO to a file
+that is 2-3 times larger than the NFS server's available system memory.
+The reason for such improvement is NFSD DIRECT eliminates a lot of work
+that the memory management subsystem would otherwise be required to
+perform (e.g. page allocation, dirty writeback, page reclaim). When
+using NFSD DIRECT, kswapd and kcompactd are no longer commanding CPU
+time trying to find adequate free pages so that forward IO progress can
+be made.
+
+The performance win associated with using NFSD DIRECT was previously
+discussed on linux-nfs, see:
+https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@kernel.org/
+
+But in summary:
+
+- NFSD DIRECT can significantly reduce memory requirements
+- NFSD DIRECT can reduce CPU load by avoiding costly page reclaim work
+- NFSD DIRECT can offer more deterministic IO performance
+
+As always, your mileage may vary and so it is important to carefully
+consider if/when it is beneficial to make use of NFSD DIRECT. When
+assessing comparative performance of your workload please be sure to log
+relevant performance metrics during testing (e.g. memory usage, cpu
+usage, IO performance). Using perf to collect perf data that may be used
+to generate a "flamegraph" for work Linux must perform on behalf of your
+test is a really meaningful way to compare the relative health of the
+system and how switching NFSD's IO mode changes what is observed.
+
+If NFSD_IO_DIRECT is specified by writing 2 (or 3 and 4 for WRITE) to
+NFSD's debugfs interfaces, ideally the IO will be aligned relative to
+the underlying block device's logical_block_size. Also the memory buffer
+used to store the READ or WRITE payload must be aligned relative to the
+underlying block device's dma_alignment.
+
+But NFSD DIRECT does handle misaligned IO in terms of O_DIRECT as best
+it can:
+
+Misaligned READ:
+ If NFSD_IO_DIRECT is used, expand any misaligned READ to the next
+ DIO-aligned block (on either end of the READ). The expanded READ is
+ verified to have proper offset/len (logical_block_size) and
+ dma_alignment checking.
+
+Misaligned WRITE:
+ If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
+ middle and end as needed. The large middle segment is DIO-aligned
+ and the start and/or end are misaligned. Buffered IO is used for the
+ misaligned segments and O_DIRECT is used for the middle DIO-aligned
+ segment. DONTCACHE buffered IO is _not_ used for the misaligned
+ segments because using normal buffered IO offers significant RMW
+ performance benefit when handling streaming misaligned WRITEs.
+
+Tracing:
+ The nfsd_read_direct trace event shows how NFSD expands any
+ misaligned READ to the next DIO-aligned block (on either end of the
+ original READ, as needed).
+
+ This combination of trace events is useful for READs::
+
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_vector/enable
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_direct/enable
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_read_io_done/enable
+ echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable
+
+ The nfsd_write_direct trace event shows how NFSD splits a given
+ misaligned WRITE into a DIO-aligned middle segment.
+
+ This combination of trace events is useful for WRITEs::
+
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_direct/enable
+ echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
+ echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
diff --git a/Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst b/Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
new file mode 100644
index 000000000000..4d6b57dbab2a
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfsd-maintainer-entry-profile.rst
@@ -0,0 +1,547 @@
+NFSD Maintainer Entry Profile
+=============================
+
+A Maintainer Entry Profile supplements the top-level process
+documents (found in Documentation/process/) with customs that are
+specific to a subsystem and its maintainers. A contributor may use
+this document to set their expectations and avoid common mistakes.
+A maintainer may use these profiles to look across subsystems for
+opportunities to converge on best common practices.
+
+Overview
+--------
+The Network File System (NFS) is a standardized family of network
+protocols that enable access to files across a set of network-
+connected peer hosts. Applications on NFS clients access files that
+reside on file systems that are shared by NFS servers. A single
+network peer can act as both an NFS client and an NFS server.
+
+NFSD refers to the NFS server implementation included in the Linux
+kernel. An in-kernel NFS server has fast access to files stored
+in file systems local to that server. NFSD can share files stored
+on most of the file system types native to Linux, including xfs,
+ext4, btrfs, and tmpfs.
+
+Mailing list
+------------
+The linux-nfs@vger.kernel.org mailing list is a public list. Its
+purpose is to enable collaboration among developers working on the
+Linux NFS stack, both client and server. It is not a place for
+conversations that are not related directly to the Linux NFS stack.
+
+The linux-nfs mailing list is archived on `lore.kernel.org <https://lore.kernel.org/linux-nfs/>`_.
+
+The Linux NFS community does not have any chat room.
+
+Reporting bugs
+--------------
+If you experience an NFSD-related bug on a distribution-built
+kernel, please start by working with your Linux distributor.
+
+Bug reports against upstream Linux code bases are welcome on the
+linux-nfs@vger.kernel.org mailing list, where some active triage
+can be done. NFSD bugs may also be reported in the Linux kernel
+community's bugzilla at:
+
+ https://bugzilla.kernel.org
+
+Please file NFSD-related bugs under the "Filesystems/NFSD"
+component. In general, including as much detail as possible is a
+good start, including pertinent system log messages from both
+the client and server.
+
+User space software related to NFSD, such as mountd or the exportfs
+command, is contained in the nfs-utils package. Report problems
+with those components to linux-nfs@vger.kernel.org. You might be
+directed to move the report to a specific bug tracker.
+
+Contributor's Guide
+-------------------
+
+Standards compliance
+~~~~~~~~~~~~~~~~~~~~
+The priority is for NFSD to interoperate fully with the Linux NFS
+client. We also test against other popular NFS client implementa-
+tions regularly at NFS bake-a-thon events (also known as plug-
+fests). Non-Linux NFS clients are not part of upstream NFSD CI/CD.
+
+The NFSD community strives to provide an NFS server implementation
+that interoperates with all standards-compliant NFS client
+implementations. This is done by staying as close as is sensible to
+the normative mandates in the IETF's published NFS, RPC, and GSS-API
+standards.
+
+It is always useful to reference an RFC and section number in a code
+comment where behavior deviates from the standard (and even when the
+behavior is compliant but the implementation is obfuscatory).
+
+On the rare occasion when a deviation from standard-mandated
+behavior is needed, brief documentation of the use case or
+deficiencies in the standard is a required part of in-code
+documentation.
+
+Care must always be taken to avoid leaking local error codes (ie,
+errnos) to clients of NFSD. A proper NFS status code is always
+required in NFS protocol replies.
+
+NFSD administrative interfaces
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+NFSD administrative interfaces include:
+
+- an NFSD or SUNRPC module parameter
+
+- export options in /etc/exports
+
+- files under /proc/fs/nfsd/ or /proc/sys/sunrpc/
+
+- the NFSD netlink protocol
+
+Frequently, a request is made to introduce or modify one of NFSD's
+traditional administrative interfaces. Certainly it is technically
+easy to introduce a new administrative setting. However, there are
+good reasons why the NFSD maintainers prefer to leave that as a last
+resort:
+
+- As with any API, administrative interfaces are difficult to get
+ right.
+
+- Once they are documented and have a legacy of use, administrative
+ interfaces become difficult to modify or remove.
+
+- Every new administrative setting multiplies the NFSD test matrix.
+
+- The cost of one administrative interface is incremental, but costs
+ add up across all of the existing interfaces.
+
+It is often better for everyone if effort is made up front to
+understanding the underlying requirement of the new setting, and
+then trying to make it tune itself (or to become otherwise
+unnecessary).
+
+If a new setting is indeed necessary, first consider adding it to
+the NFSD netlink protocol. Or if it doesn't need to be a reliable
+long term user space feature, it can be added to NFSD's menagerie of
+experimental settings which reside under /sys/kernel/debug/nfsd/ .
+
+Field observability
+~~~~~~~~~~~~~~~~~~~
+NFSD employs several different mechanisms for observing operation,
+including counters, printks, WARNings, and static trace points. Each
+have their strengths and weaknesses. Contributors should select the
+most appropriate tool for their task.
+
+- BUG must be avoided if at all possible, as it will frequently
+ result in a full system crash.
+
+- WARN is appropriate only when a full stack trace is useful.
+
+- printk can show detailed information. These must not be used
+ in code paths where they can be triggered repeatedly by remote
+ users.
+
+- dprintk can show detailed information, but can be enabled only
+ in pre-set groups. The overhead of emitting output makes dprintk
+ inappropriate for frequent operations like I/O.
+
+- Counters are always on, but provide little information about
+ individual events other than how frequently they occur.
+
+- static trace points can be enabled individually or in groups
+ (via a glob). These are generally low overhead, and thus are
+ favored for use in hot paths.
+
+- dynamic tracing, such as kprobes or eBPF, are quite flexible but
+ cannot be used in certain environments (eg, full kernel lock-
+ down).
+
+Testing
+~~~~~~~
+The kdevops project
+
+ https://github.com/linux-kdevops/kdevops
+
+contains several NFS-specific workflows, as well as the community
+standard fstests suite. These workflows are based on open source
+testing tools such as ltp and fio. Contributors are encouraged to
+use these tools without kdevops, or contributors should install and
+use kdevops themselves to verify their patches before submission.
+
+Coding style
+~~~~~~~~~~~~
+Follow the coding style preferences described in
+
+ Documentation/process/coding-style.rst
+
+with the following exceptions:
+
+- Add new local variables to a function in reverse Christmas tree
+ order
+
+- Use the kdoc comment style for
+ + non-static functions
+ + static inline functions
+ + static functions that are callbacks/virtual functions
+
+- All new function names start with ``nfsd_`` for non-NFS-version-
+ specific functions.
+
+- New function names that are specific to NFSv2 or NFSv3, or are
+ used by all minor versions of NFSv4, use ``nfsdN_`` where N is
+ the version.
+
+- New function names specific to an NFSv4 minor version can be
+ named with ``nfsd4M_`` where M is the minor version.
+
+Patch preparation
+~~~~~~~~~~~~~~~~~
+Read and follow all guidelines in
+
+ Documentation/process/submitting-patches.rst
+
+Use tagging to identify all patch authors. However, reviewers and
+testers should be added by replying to the email patch submission.
+Email is extensively used in order to publicly archive review and
+testing attributions. These tags are automatically inserted into
+your patches when they are applied.
+
+The code in the body of the diff already shows /what/ is being
+changed. Thus it is not necessary to repeat that in the patch
+description. Instead, the description should contain one or more
+of:
+
+- A brief problem statement ("what is this patch trying to fix?")
+ with a root-cause analysis.
+
+- End-user visible symptoms or items that a support engineer might
+ use to search for the patch, like stack traces.
+
+- A brief explanation of why the patch is the best way to address
+ the problem.
+
+- Any context that reviewers might need to understand the changes
+ made by the patch.
+
+- Any relevant benchmarking results, and/or functional test results.
+
+As detailed in Documentation/process/submitting-patches.rst,
+identify the point in history that the issue being addressed was
+introduced by using a Fixes: tag.
+
+Mention in the patch description if that point in history cannot be
+determined -- that is, no Fixes: tag can be provided. In this case,
+please make it clear to maintainers whether an LTS backport is
+needed even though there is no Fixes: tag.
+
+The NFSD maintainers prefer to add stable tagging themselves, after
+public discussion in response to the patch submission. Contributors
+may suggest stable tagging, but be aware that many version
+management tools add such stable Cc's when you post your patches.
+Don't add "Cc: stable" unless you are absolutely sure the patch
+needs to go to stable during the initial submission process.
+
+Patch submission
+~~~~~~~~~~~~~~~~
+Patches to NFSD are submitted via the kernel's email-based review
+process that is common to most other kernel subsystems.
+
+Just before each submission, rebase your patch or series on the
+nfsd-testing branch at
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
+
+The NFSD subsystem is maintained separately from the Linux in-kernel
+NFS client. The NFSD maintainers do not normally take submissions
+for client changes, nor can they respond authoritatively to bug
+reports or feature requests for NFS client code.
+
+This means that contributors might be asked to resubmit patches if
+they were emailed to the incorrect set of maintainers and reviewers.
+This is not a rejection, but simply a correction of the submission
+process.
+
+When in doubt, consult the NFSD entry in the MAINTAINERS file to
+see which files and directories fall under the NFSD subsystem.
+
+The proper set of email addresses for NFSD patches are:
+
+To: the NFSD maintainers and reviewers listed in MAINTAINERS
+Cc: linux-nfs@vger.kernel.org and optionally linux-kernel@
+
+If there are other subsystems involved in the patches (for example
+MM or RDMA) their primary mailing list address can be included in
+the Cc: field. Other contributors and interested parties may be
+included there as well.
+
+In general we prefer that contributors use common patch email tools
+such as "git send-email" or "stg email format/send", which tend to
+get the details right without a lot of fuss.
+
+A series consisting of a single patch is not required to have a
+cover letter. However, a cover letter can be included if there is
+substantial context that is not appropriate to include in the
+patch description.
+
+Please note that, with an e-mail based submission process, series
+cover letters are not part of the work that is committed to the
+kernel source code base or its commit history. Therefore always try
+to keep pertinent information in the patch descriptions.
+
+Design documentation is welcome, but as cover letters are not
+preserved, a perhaps better option is to include a patch that adds
+such documentation under Documentation/filesystems/nfs/.
+
+Reviewers will ask about test coverage and what use cases the
+patches are expected to address. Please be prepared to answer these
+questions.
+
+Review comments from maintainers might be politely stated, but in
+general, these are not optional to address when they are actionable.
+If necessary, the maintainers retain the right to not apply patches
+when contributors refuse to address reasonable requests.
+
+Post changes to kernel source code and user space source code as
+separate series. You can connect the two series with comments in
+your cover letters.
+
+Generally the NFSD maintainers ask for a reposts even for simple
+modifications in order to publicly archive the request and the
+resulting repost before it is pulled into the NFSD trees. This
+also enables us to rebuild a patch series quickly without missing
+changes that might have been discussed via email.
+
+Avoid frequently reposting large series with only small changes. As
+a rule of thumb, posting substantial changes more than once a week
+will result in reviewer overload.
+
+Remember, there are only a handful of subsystem maintainers and
+reviewers, but potentially many sources of contributions. The
+maintainers and reviewers, therefore, are always the less scalable
+resource. Be kind to your friendly neighborhood maintainer.
+
+Patch Acceptance
+~~~~~~~~~~~~~~~~
+There isn't a formal review process for NFSD, but we like to see
+at least two Reviewed-by: notices for patches that are more than
+simple clean-ups. Reviews are done in public on
+linux-nfs@vger.kernel.org and are archived on lore.kernel.org.
+
+Currently the NFSD patch queues are maintained in branches here:
+
+ https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
+
+The NFSD maintainers apply patches initially to the nfsd-testing
+branch, which is always open to new submissions. Patches can be
+applied while review is ongoing. nfsd-testing is a topic branch,
+so it can change frequently, it will be rebased, and your patch
+might get dropped if there is a problem with it.
+
+Generally a script-generated "thank you" email will indicate when
+your patch has been added to the nfsd-testing branch. You can track
+the progress of your patch using the linux-nfs patchworks instance:
+
+ https://patchwork.kernel.org/project/linux-nfs/list/
+
+While your patch is in nfsd-testing, it is exposed to a variety of
+test environments, including community zero-day bots, static
+analysis tools, and NFSD continuous integration testing. The soak
+period is three to four weeks.
+
+Each patch that survives in nfsd-testing for the soak period without
+changes is moved to the nfsd-next branch.
+
+The nfsd-next branch is automatically merged into linux-next and
+fs-next on a nightly basis.
+
+Patches that survive in nfsd-next are included in the next NFSD
+merge window pull request. These windows typically occur once every
+63 days (nine weeks).
+
+When the upstream merge window closes, the nfsd-next branch is
+renamed nfsd-fixes, and a new nfsd-next branch is created, based on
+the upstream -rc1 tag.
+
+Fixes that are destined for an upstream -rc release also run the
+nfsd-testing gauntlet, but are then applied to the nfsd-fixes
+branch. That branch is made available for Linus to pull after a
+short time. In order to limit the risk of introducing regressions,
+we limit such fixes to emergency situations or fixes to breakage
+that occurred during the most recent upstream merge.
+
+Please make it clear when submitting an emergency patch that
+immediate action (either application to -rc or LTS backport) is
+needed.
+
+Sensitive patch submissions and bug reports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+CVEs are generated by specific members of the Linux kernel community
+and several external entities. The Linux NFS community does not emit
+or assign CVEs. CVEs are assigned after an issue and its fix are
+known.
+
+However, the NFSD maintainers sometimes receive sensitive security
+reports, and at times these are significant enough to need to be
+embargoed. In such rare cases, fixes can be developed and reviewed
+out of the public eye.
+
+Please be aware that many version management tools add the stable
+Cc's when you post your patches. This is generally a nuisance, but
+it can result in outing an embargoed security issue accidentally.
+Don't add "Cc: stable" unless you are absolutely sure the patch
+needs to go to stable@ during the initial submission process.
+
+Patches that are merged without ever appearing on any list, and
+which carry a Reported-by: or Fixes: tag are detected as suspicious
+by security-focused people. We encourage that, after any private
+review, security-sensitive patches should be posted to linux-nfs@
+for the usual public review, archiving, and test period.
+
+LLM-generated submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~
+The Linux kernel community as a whole is still exploring the new
+world of LLM-generated code. The NFSD maintainers will entertain
+submission of patches that are partially or wholly generated by
+LLM-based development tools. Such submissions are held to the
+same standards as submissions created entirely by human authors:
+
+- The human contributor identifies themselves via a Signed-off-by:
+ tag. This tag counts as a DoC.
+
+- The human contributor is solely responsible for code provenance
+ and any contamination by inadvertently-included code with a
+ conflicting license, as usual.
+
+- The human contributor must be able to answer and address review
+ questions. A patch description such as "This fixed my problem
+ but I don't know why" is not acceptable.
+
+- The contribution is subjected to the same test regimen as all
+ other submissions.
+
+- An indication (via a Generated-by: tag or otherwise) that the
+ contribution is LLM-generated is not required.
+
+It is easy to address review comments and fix requests in LLM
+generated code. So easy, in fact, that it becomes tempting to repost
+refreshed code immediately. Please resist that temptation.
+
+As always, please avoid reposting series revisions more than once
+every 24 hours.
+
+Clean-up patches
+~~~~~~~~~~~~~~~~
+The NFSD maintainers discourage patches which perform simple clean-
+ups, which are not in the context of other work. For example:
+
+* Addressing ``checkpatch.pl`` warnings after merge
+* Addressing :ref:`Local variable ordering<rcs>` issues
+* Addressing long-standing whitespace damage
+
+This is because it is felt that the churn that such changes produce
+comes at a greater cost than the value of such clean-ups.
+
+Conversely, spelling and grammar fixes are encouraged.
+
+Stable and LTS support
+----------------------
+Upstream NFSD continuous integration testing runs against LTS trees
+whenever they are updated.
+
+Please indicate when a patch containing a fix needs to be considered
+for LTS kernels, either via a Fixes: tag or explicit mention.
+
+Feature requests
+----------------
+There is no one way to make an official feature request, but
+discussion about the request should eventually make its way to
+the linux-nfs@vger.kernel.org mailing list for public review by
+the community.
+
+Subsystem boundaries
+~~~~~~~~~~~~~~~~~~~~
+NFSD itself is not much more than a protocol engine. This means its
+primary responsibility is to translate the NFS protocol into API
+calls in the Linux kernel. For example, NFSD is not responsible for
+knowing exactly how bytes or file attributes are managed on a block
+device. It relies on other kernel subsystems for that.
+
+If the subsystems on which NFSD relies do not implement a particular
+feature, even if the standard NFS protocols do support that feature,
+that usually means NFSD cannot provide that feature without
+substantial development work in other areas of the kernel.
+
+Specificity
+~~~~~~~~~~~
+Feature requests can come from anywhere, and thus can often be
+nebulous. A requester might not understand what a "use case" or
+"user story" is. These descriptive paradigms are often used by
+developers and architects to understand what is required of a
+design, but are terms of art in the software trade, not used in
+the everyday world.
+
+In order to prevent contributors and maintainers from becoming
+overwhelmed, we won't be afraid of saying "no" politely to
+underspecified requests.
+
+Community roles and their authority
+-----------------------------------
+The purpose of Linux subsystem communities is to provide expertise
+and active stewardship of a narrow set of source files in the Linux
+kernel. This can include managing user space tooling as well.
+
+To contextualize the structure of the Linux NFS community that
+is responsible for stewardship of the NFS server code base, we
+define the community roles here.
+
+- **Contributor** : Anyone who submits a code change, bug fix,
+ recommendation, documentation fix, and so on. A contributor can
+ submit regularly or infrequently.
+
+- **Outside Contributor** : A contributor who is not a regular actor
+ in the Linux NFS community. This can mean someone who contributes
+ to other parts of the kernel, or someone who just noticed a
+ misspelling in a comment and sent a patch.
+
+- **Reviewer** : Someone who is named in the MAINTAINERS file as a
+ reviewer is an area expert who can request changes to contributed
+ code, and expects that contributors will address the request.
+
+- **External Reviewer** : Someone who is not named in the
+ MAINTAINERS file as a reviewer, but who is an area expert.
+ Examples include Linux kernel contributors with networking,
+ security, or persistent storage expertise, or developers who
+ contribute primarily to other NFS implementations.
+
+One or more people will take on the following roles. These people
+are often generically referred to as "maintainers", and are
+identified in the MAINTAINERS file with the "M:" tag under the NFSD
+subsystem.
+
+- **Upstream Release Manager** : This role is responsible for
+ curating contributions into a branch, reviewing test results, and
+ then sending a pull request during merge windows. There is a
+ trust relationship between the release manager and Linus.
+
+- **Bug Triager** : Someone who is a first responder to bug reports
+ submitted to the linux-nfs mailing list or bug trackers, and helps
+ troubleshoot and identify next steps.
+
+- **Security Lead** : The security lead handles contacts from the
+ security community to resolve immediate issues, as well as dealing
+ with long-term security issues such as supply chain concerns. For
+ upstream, that's usually whether contributions violate licensing
+ or other intellectual property agreements.
+
+- **Testing Lead** : The testing lead builds and runs the test
+ infrastructure for the subsystem. The testing lead may ask for
+ patches to be dropped because of ongoing high defect rates.
+
+- **LTS Maintainer** : The LTS maintainer is responsible for managing
+ the Fixes: and Cc: stable annotations on patches, and seeing that
+ patches that cannot be automatically applied to LTS kernels get
+ proper manual backports as necessary.
+
+- **Community Manager** : This umpire role can be asked to call balls
+ and strikes during conflicts, but is also responsible for ensuring
+ the health of the relationships within the community and for
+ facilitating discussions on long-term topics such as how to manage
+ growing technical debt.
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 7233b04668fc..3397937ed838 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -211,7 +211,7 @@ test and set for you.
e.g.::
inode = iget_locked(sb, ino);
- if (inode->i_state & I_NEW) {
+ if (inode_state_read_once(inode) & I_NEW) {
err = read_inode_from_disk(inode);
if (err < 0) {
iget_failed(inode);
@@ -1286,6 +1286,11 @@ The vm_area_desc provides the minimum required information for a filesystem
to initialise state upon memory mapping of a file-backed region, and output
parameters for the file system to set this state.
+In nearly all cases, this is all that is required for a filesystem. However, if
+a filesystem needs to perform an operation such a pre-population of page tables,
+then that action can be specified in the vm_area_desc->action field, which can
+be configured using the mmap_action_*() helpers.
+
---
**mandatory**
@@ -1309,3 +1314,23 @@ a different length, use
vfs_parse_fs_qstr(fc, key, &QSTR_LEN(value, len))
instead.
+
+---
+
+**mandatory**
+
+vfs_mkdir() now returns a dentry - the one returned by ->mkdir(). If
+that dentry is different from the dentry passed in, including if it is
+an IS_ERR() dentry pointer, the original dentry is dput().
+
+When vfs_mkdir() returns an error, and so both dputs() the original
+dentry and doesn't provide a replacement, it also unlocks the parent.
+Consequently the return value from vfs_mkdir() can be passed to
+end_creating() and the parent will be unlocked precisely when necessary.
+
+---
+
+**mandatory**
+
+kill_litter_super() is gone; convert to DCACHE_PERSISTENT use (as all
+in-tree filesystems have done).
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 0b86a8022fa1..8256e857e2d7 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -553,7 +553,7 @@ otherwise.
kernel flags associated with the particular virtual memory area in two letter
encoded manner. The codes are the following:
- == =======================================
+ == =============================================================
rd readable
wr writeable
ex executable
@@ -591,7 +591,8 @@ encoded manner. The codes are the following:
sl sealed
lf lock on fault pages
dp always lazily freeable mapping
- == =======================================
+ gu maybe contains guard regions (if not set, definitely doesn't)
+ == =============================================================
Note that there is no guarantee that every flag and associated mnemonic will
be present in all further kernel releases. Things get changed, the flags may
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
index fa4f81099cb4..a9d271e171c3 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.rst
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -290,11 +290,11 @@ Why cpio rather than tar?
This decision was made back in December, 2001. The discussion started here:
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
+- https://lore.kernel.org/lkml/a03cke$640$1@cesium.transmeta.com/
And spawned a second thread (specifically on tar vs cpio), starting here:
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
+- https://lore.kernel.org/lkml/3C25A06D.7030408@zytor.com/
The quick and dirty summary version (which is no substitute for reading
the above threads) is:
@@ -310,7 +310,7 @@ the above threads) is:
either way about the archive format, and there are alternative tools,
such as:
- http://freecode.com/projects/afio
+ https://linux.die.net/man/1/afio
2) The cpio archive format chosen by the kernel is simpler and cleaner (and
thus easier to create and parse) than any of the (literally dozens of)
@@ -331,12 +331,12 @@ the above threads) is:
5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
supported on the kernel side"):
- http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222109050.21702-100000@weyl.math.psu.edu/
explained his reasoning:
- - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
- - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112222240530.21702-100000@weyl.math.psu.edu/
+ - https://lore.kernel.org/lkml/Pine.GSO.4.21.0112230849550.23300-100000@weyl.math.psu.edu/
and, most importantly, designed and implemented the initramfs code.
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index b7f35b07876a..8c8ce678148a 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -17,17 +17,18 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
flag bits:
-=============================================== ================================
-RDT (Resource Director Technology) Allocation "rdt_a"
-CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
-CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
-CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
-MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
-MBA (Memory Bandwidth Allocation) "mba"
-SMBA (Slow Memory Bandwidth Allocation) ""
-BMEC (Bandwidth Monitoring Event Configuration) ""
-ABMC (Assignable Bandwidth Monitoring Counters) ""
-=============================================== ================================
+=============================================================== ================================
+RDT (Resource Director Technology) Allocation "rdt_a"
+CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
+CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
+CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
+MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
+MBA (Memory Bandwidth Allocation) "mba"
+SMBA (Slow Memory Bandwidth Allocation) ""
+BMEC (Bandwidth Monitoring Event Configuration) ""
+ABMC (Assignable Bandwidth Monitoring Counters) ""
+SDCIAE (Smart Data Cache Injection Allocation Enforcement) ""
+=============================================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
resulted in the feature flags becoming hard to parse by humans. Adding a new
@@ -72,6 +73,11 @@ The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.
+Most of the files in the resource's subdirectory are read-only, and
+describe properties of the resource. Resources that support global
+configuration options also include writable files that can be used
+to modify those settings.
+
Each subdirectory contains the following files with respect to
allocation:
@@ -90,12 +96,19 @@ related to allocation:
must be set when writing a mask.
"shareable_bits":
- Bitmask of shareable resource with other executing
- entities (e.g. I/O). User can use this when
- setting up exclusive cache partitions. Note that
- some platforms support devices that have their
- own settings for cache use which can over-ride
- these bits.
+ Bitmask of shareable resource with other executing entities
+ (e.g. I/O). Applies to all instances of this resource. User
+ can use this when setting up exclusive cache partitions.
+ Note that some platforms support devices that have their
+ own settings for cache use which can over-ride these bits.
+
+ When "io_alloc" is enabled, a portion of each cache instance can
+ be configured for shared use between hardware and software.
+ "bit_usage" should be used to see which portions of each cache
+ instance is configured for hardware use via "io_alloc" feature
+ because every cache instance can have its "io_alloc" bitmask
+ configured independently via "io_alloc_cbm".
+
"bit_usage":
Annotated capacity bitmasks showing how all
instances of the resource are used. The legend is:
@@ -109,16 +122,16 @@ related to allocation:
"H":
Corresponding region is used by hardware only
but available for software use. If a resource
- has bits set in "shareable_bits" but not all
- of these bits appear in the resource groups'
- schematas then the bits appearing in
- "shareable_bits" but no resource group will
- be marked as "H".
+ has bits set in "shareable_bits" or "io_alloc_cbm"
+ but not all of these bits appear in the resource
+ groups' schemata then the bits appearing in
+ "shareable_bits" or "io_alloc_cbm" but no
+ resource group will be marked as "H".
"X":
Corresponding region is available for sharing and
- used by hardware and software. These are the
- bits that appear in "shareable_bits" as
- well as a resource group's allocation.
+ used by hardware and software. These are the bits
+ that appear in "shareable_bits" or "io_alloc_cbm"
+ as well as a resource group's allocation.
"S":
Corresponding region is used by software
and available for sharing.
@@ -136,6 +149,77 @@ related to allocation:
"1":
Non-contiguous 1s value in CBM is supported.
+"io_alloc":
+ "io_alloc" enables system software to configure the portion of
+ the cache allocated for I/O traffic. File may only exist if the
+ system supports this feature on some of its cache resources.
+
+ "disabled":
+ Resource supports "io_alloc" but the feature is disabled.
+ Portions of cache used for allocation of I/O traffic cannot
+ be configured.
+ "enabled":
+ Portions of cache used for allocation of I/O traffic
+ can be configured using "io_alloc_cbm".
+ "not supported":
+ Support not available for this resource.
+
+ The feature can be modified by writing to the interface, for example:
+
+ To enable::
+
+ # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
+
+ To disable::
+
+ # echo 0 > /sys/fs/resctrl/info/L3/io_alloc
+
+ The underlying implementation may reduce resources available to
+ general (CPU) cache allocation. See architecture specific notes
+ below. Depending on usage requirements the feature can be enabled
+ or disabled.
+
+ On AMD systems, io_alloc feature is supported by the L3 Smart
+ Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
+ io_alloc is the highest CLOSID supported by the resource. When
+ io_alloc is enabled, the highest CLOSID is dedicated to io_alloc and
+ no longer available for general (CPU) cache allocation. When CDP is
+ enabled, io_alloc routes I/O traffic using the highest CLOSID allocated
+ for the instruction cache (CDP_CODE), making this CLOSID no longer
+ available for general (CPU) cache allocation for both the CDP_CODE
+ and CDP_DATA resources.
+
+"io_alloc_cbm":
+ Capacity bitmasks that describe the portions of cache instances to
+ which I/O traffic from supported I/O devices are routed when "io_alloc"
+ is enabled.
+
+ CBMs are displayed in the following format:
+
+ <cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+ Example::
+
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=ffff;1=ffff
+
+ CBMs can be configured by writing to the interface.
+
+ Example::
+
+ # echo 1=ff > /sys/fs/resctrl/info/L3/io_alloc_cbm
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=ffff;1=00ff
+
+ # echo "0=ff;1=f" > /sys/fs/resctrl/info/L3/io_alloc_cbm
+ # cat /sys/fs/resctrl/info/L3/io_alloc_cbm
+ 0=00ff;1=000f
+
+ When CDP is enabled "io_alloc_cbm" associated with the CDP_DATA and CDP_CODE
+ resources may reflect the same values. For example, values read from and
+ written to /sys/fs/resctrl/info/L3DATA/io_alloc_cbm may be reflected by
+ /sys/fs/resctrl/info/L3CODE/io_alloc_cbm and vice versa.
+
Memory bandwidth(MB) subdirectory contains the following files
with respect to allocation:
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 4f13b01e42eb..670ba66b60e4 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -1213,6 +1213,10 @@ otherwise noted.
file-backed memory mapping, most notably establishing relevant
private state and VMA callbacks.
+ If further action such as pre-population of page tables is required,
+ this can be specified by the vm_area_desc->action field and related
+ parameters.
+
Note that the file operations are implemented by the specific
filesystem in which the inode resides. When opening a device node
(character or block special) most filesystems will call special
diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 8cbcd3c26434..3d9233f403db 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -105,10 +105,8 @@ occur; this capability aids both strategies.
TLDR; Show Me the Code!
-----------------------
-Code is posted to the kernel.org git trees as follows:
-`kernel changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-symlink>`_,
-`userspace changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_, and
-`QA test changes <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=repair-dirs>`_.
+Kernel and userspace code has been fully merged as of October 2025.
+
Each kernel patchset adding an online repair function will use the same branch
name across the kernel, xfsprogs, and fstests git repos.
@@ -249,7 +247,7 @@ sharing and lock acquisition rules as the regular filesystem.
This means that scrub cannot take *any* shortcuts to save time, because doing
so could lead to concurrency problems.
In other words, online fsck is not a complete replacement for offline fsck, and
-a complete run of online fsck may take longer than online fsck.
+a complete run of online fsck may take longer than offline fsck.
However, both of these limitations are acceptable tradeoffs to satisfy the
different motivations of online fsck, which are to **minimize system downtime**
and to **increase predictability of operation**.
@@ -764,12 +762,8 @@ allow the online fsck developers to compare online fsck against offline fsck,
and they enable XFS developers to find deficiencies in the code base.
Proposed patchsets include
-`general fuzzer improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
`fuzzing baselines
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
-and `improvements in fuzz testing comprehensiveness
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_.
Stress Testing
--------------
@@ -801,11 +795,6 @@ Success is defined by the ability to run all of these tests without observing
any unexpected filesystem shutdowns due to corrupted metadata, kernel hang
check warnings, or any other sort of mischief.
-Proposed patchsets include `general stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
-and the `evolution of existing per-function stress testing
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
-
4. User Interface
=================
@@ -886,10 +875,6 @@ apply as nice of a priority to IO and CPU scheduling as possible.
This measure was taken to minimize delays in the rest of the filesystem.
No such hardening has been performed for the cron job.
-Proposed patchset:
-`Enabling the xfs_scrub background service
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-scan-service>`_.
-
Health Reporting
----------------
@@ -912,13 +897,6 @@ notifications and initiate a repair?
*Answer*: These questions remain unanswered, but should be a part of the
conversation with early adopters and potential downstream users of XFS.
-Proposed patchsets include
-`wiring up health reports to correction returns
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=corruption-health-reports>`_
-and
-`preservation of sickness info during memory reclaim
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=indirect-health-reporting>`_.
-
5. Kernel Algorithms and Data Structures
========================================
@@ -1310,21 +1288,6 @@ Space allocation records are cross-referenced as follows:
are there the same number of reverse mapping records for each block as the
reference count record claims?
-Proposed patchsets are the series to find gaps in
-`refcount btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-refcount-gaps>`_,
-`inode btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-inobt-gaps>`_, and
-`rmap btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-rmapbt-gaps>`_ records;
-to find
-`mergeable records
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-detect-mergeable-records>`_;
-and to
-`improve cross referencing with rmap
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-strengthen-rmap-checking>`_
-before starting a repair.
-
Checking Extended Attributes
````````````````````````````
@@ -1756,10 +1719,6 @@ For scrub, the drain works as follows:
To avoid polling in step 4, the drain provides a waitqueue for scrub threads to
be woken up whenever the intent count drops to zero.
-The proposed patchset is the
-`scrub intent drain series
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-drain-intents>`_.
-
.. _jump_labels:
Static Keys (aka Jump Label Patching)
@@ -2036,10 +1995,6 @@ The ``xfarray_store_anywhere`` function is used to insert a record in any
null record slot in the bag; and the ``xfarray_unset`` function removes a
record from the bag.
-The proposed patchset is the
-`big in-memory array
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=big-array>`_.
-
Iterating Array Elements
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2172,10 +2127,6 @@ However, it should be noted that these repair functions only use blob storage
to cache a small number of entries before adding them to a temporary ondisk
file, which is why compaction is not required.
-The proposed patchset is at the start of the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_ series.
-
.. _xfbtree:
In-Memory B+Trees
@@ -2214,11 +2165,6 @@ xfiles enables reuse of the entire btree library.
Btrees built atop an xfile are collectively known as ``xfbtrees``.
The next few sections describe how they actually work.
-The proposed patchset is the
-`in-memory btree
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=in-memory-btrees>`_
-series.
-
Using xfiles as a Buffer Cache Target
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2459,14 +2405,6 @@ This enables the log to release the old EFI to keep the log moving forwards.
EFIs have a role to play during the commit and reaping phases; please see the
next section and the section about :ref:`reaping<reaping>` for more details.
-Proposed patchsets are the
-`bitmap rework
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-bitmap-rework>`_
-and the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_.
-
-
Writing the New Tree
````````````````````
@@ -2623,11 +2561,6 @@ The number of records for the inode btree is the number of xfarray records,
but the record count for the free inode btree has to be computed as inode chunk
records are stored in the xfarray.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding the Space Reference Counts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2716,11 +2649,6 @@ Reverse mappings are added to the bag using ``xfarray_store_anywhere`` and
removed via ``xfarray_unset``.
Bag members are examined through ``xfarray_iter`` loops.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding File Fork Mapping Indices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2757,11 +2685,6 @@ EXTENTS format instead of BMBT, which may require a conversion.
Third, the incore extent map must be reloaded carefully to avoid disturbing
any delayed allocation extents.
-The proposed patchset is the
-`file mapping repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-file-mappings>`_
-series.
-
.. _reaping:
Reaping Old Metadata Blocks
@@ -2843,11 +2766,6 @@ blocks.
As stated earlier, online repair functions use very large transactions to
minimize the chances of this occurring.
-The proposed patchset is the
-`preparation for bulk loading btrees
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-prep-for-bulk-loading>`_
-series.
-
Case Study: Reaping After a Regular Btree Repair
````````````````````````````````````````````````
@@ -2943,11 +2861,6 @@ When the walk is complete, the bitmap disunion operation ``(ag_owner_bitmap &
btrees.
These blocks can then be reaped using the methods outlined above.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
.. _rmap_reap:
Case Study: Reaping After Repairing Reverse Mapping Btrees
@@ -2972,11 +2885,6 @@ methods outlined above.
The rest of the process of rebuildng the reverse mapping btree is discussed
in a separate :ref:`case study<rmap_repair>`.
-The proposed patchset is the
-`AG btree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-ag-btrees>`_
-series.
-
Case Study: Rebuilding the AGFL
```````````````````````````````
@@ -3024,11 +2932,6 @@ more complicated, because computing the correct value requires traversing the
forks, or if that fails, leaving the fields invalid and waiting for the fork
fsck functions to run.
-The proposed patchset is the
-`inode
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-inodes>`_
-repair series.
-
Quota Record Repairs
--------------------
@@ -3045,11 +2948,6 @@ checking are obviously bad limits and timer values.
Quota usage counters are checked, repaired, and discussed separately in the
section about :ref:`live quotacheck <quotacheck>`.
-The proposed patchset is the
-`quota
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quota>`_
-repair series.
-
.. _fscounters:
Freezing to Fix Summary Counters
@@ -3145,11 +3043,6 @@ long enough to check and correct the summary counters.
| This bug was fixed in Linux 5.17. |
+--------------------------------------------------------------------------+
-The proposed patchset is the
-`summary counter cleanup
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-fscounters>`_
-series.
-
Full Filesystem Scans
---------------------
@@ -3277,15 +3170,6 @@ Second, if the incore inode is stuck in some intermediate state, the scan
coordinator must release the AGI and push the main filesystem to get the inode
back into a loadable state.
-The proposed patches are the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-The first user of the new functionality is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
Inode Management
````````````````
@@ -3381,12 +3265,6 @@ To capture these nuances, the online fsck code has a separate ``xchk_irele``
function to set or clear the ``DONTCACHE`` flag to get the required release
behavior.
-Proposed patchsets include fixing
-`scrub iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iget-fixes>`_ and
-`dir iget usage
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-dir-iget-fixes>`_.
-
.. _ilocking:
Locking Inodes
@@ -3443,11 +3321,6 @@ If the dotdot entry changes while the directory is unlocked, then a move or
rename operation must have changed the child's parentage, and the scan can
exit early.
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
.. _fshooks:
Filesystem Hooks
@@ -3594,11 +3467,6 @@ The inode scan APIs are pretty simple:
- ``xchk_iscan_teardown`` to finish the scan
-This functionality is also a part of the
-`inode scanner
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-iscan>`_
-series.
-
.. _quotacheck:
Case Study: Quota Counter Checking
@@ -3686,11 +3554,6 @@ needing to hold any locks for a long duration.
If repairs are desired, the real and shadow dquots are locked and their
resource counts are set to the values in the shadow dquot.
-The proposed patchset is the
-`online quotacheck
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-quotacheck>`_
-series.
-
.. _nlinks:
Case Study: File Link Count Checking
@@ -3744,11 +3607,6 @@ shadow information.
If no parents are found, the file must be :ref:`reparented <orphanage>` to the
orphanage to prevent the file from being lost forever.
-The proposed patchset is the
-`file link count repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-nlinks>`_
-series.
-
.. _rmap_repair:
Case Study: Rebuilding Reverse Mapping Records
@@ -3828,11 +3686,6 @@ scan for reverse mapping records.
12. Free the xfbtree now that it not needed.
-The proposed patchset is the
-`rmap repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rmap-btree>`_
-series.
-
Staging Repairs with Temporary Files on Disk
--------------------------------------------
@@ -3971,11 +3824,6 @@ Once a good copy of a data file has been constructed in a temporary file, it
must be conveyed to the file being repaired, which is the topic of the next
section.
-The proposed patches are in the
-`repair temporary files
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-tempfiles>`_
-series.
-
Logged File Content Exchanges
-----------------------------
@@ -4025,11 +3873,6 @@ The new ``XFS_SB_FEAT_INCOMPAT_EXCHRANGE`` incompatible feature flag
in the superblock protects these new log item records from being replayed on
old kernels.
-The proposed patchset is the
-`file contents exchange
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates>`_
-series.
-
+--------------------------------------------------------------------------+
| **Sidebar: Using Log-Incompatible Feature Flags** |
+--------------------------------------------------------------------------+
@@ -4323,11 +4166,6 @@ To repair the summary file, write the xfile contents into the temporary file
and use atomic mapping exchange to commit the new contents.
The temporary file is then reaped.
-The proposed patchset is the
-`realtime summary repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-rtsummary>`_
-series.
-
Case Study: Salvaging Extended Attributes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4369,11 +4207,6 @@ Salvaging extended attributes is done as follows:
4. Reap the temporary file.
-The proposed patchset is the
-`extended attribute repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-xattrs>`_
-series.
-
Fixing Directories
------------------
@@ -4448,11 +4281,6 @@ Unfortunately, the current dentry cache design doesn't provide a means to walk
every child dentry of a specific directory, which makes this a hard problem.
There is no known solution.
-The proposed patchset is the
-`directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-dirs>`_
-series.
-
Parent Pointers
```````````````
@@ -4612,11 +4440,6 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
7. Reap the temporary directory.
-The proposed patchset is the
-`parent pointers directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
Case Study: Repairing Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4662,11 +4485,6 @@ directory reconstruction:
8. Reap the temporary file.
-The proposed patchset is the
-`parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
-series.
-
Digression: Offline Checking of Parent Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4755,11 +4573,6 @@ connectivity checks:
4. Move on to examining link counts, as we do today.
-The proposed patchset is the
-`offline parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_
-series.
-
Rebuilding directories from parent pointers in offline repair would be very
challenging because xfs_repair currently uses two single-pass scans of the
filesystem during phases 3 and 4 to decide which files are corrupt enough to be
@@ -4903,12 +4716,6 @@ Repairing the directory tree works as follows:
6. If the subdirectory has zero paths, attach it to the lost and found.
-The proposed patches are in the
-`directory tree repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_
-series.
-
-
.. _orphanage:
The Orphanage
@@ -4973,11 +4780,6 @@ Orphaned files are adopted by the orphanage as follows:
7. If a runtime error happens, call ``xrep_adoption_cancel`` to release all
resources.
-The proposed patches are in the
-`orphanage adoption
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-orphanage>`_
-series.
-
6. Userspace Algorithms and Data Structures
===========================================
@@ -5091,14 +4893,6 @@ first workqueue's workers until the backlog eases.
This doesn't completely solve the balancing problem, but reduces it enough to
move on to more pressing issues.
-The proposed patchsets are the scrub
-`performance tweaks
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-performance-tweaks>`_
-and the
-`inode scan rebalance
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-iscan-rebalance>`_
-series.
-
.. _scrubrepair:
Scheduling Repairs
@@ -5179,20 +4973,6 @@ immediately.
Corrupt file data blocks reported by phase 6 cannot be recovered by the
filesystem.
-The proposed patchsets are the
-`repair warning improvements
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-better-repair-warnings>`_,
-refactoring of the
-`repair data dependency
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-data-deps>`_
-and
-`object tracking
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-object-tracking>`_,
-and the
-`repair scheduling
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-repair-scheduling>`_
-improvement series.
-
Checking Names for Confusable Unicode Sequences
-----------------------------------------------
@@ -5372,6 +5152,8 @@ The extra flexibility enables several new use cases:
This emulates an atomic device write in software, and can support arbitrary
scattered writes.
+(This functionality was merged into mainline as of 2025)
+
Vectorized Scrub
----------------
@@ -5393,13 +5175,7 @@ It is hoped that ``io_uring`` will pick up enough of this functionality that
online fsck can use that instead of adding a separate vectored scrub system
call to XFS.
-The relevant patchsets are the
-`kernel vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub>`_
-and
-`userspace vectorized scrub
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub>`_
-series.
+(This functionality was merged into mainline as of 2025)
Quality of Service Targets for Scrub
------------------------------------