summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2 daysMerge tag 'vfs-6.19-rc1.directory.delegations' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory delegations update from Christian Brauner: "This contains the work for recall-only directory delegations for knfsd. Add support for simple, recallable-only directory delegations. This was decided at the fall NFS Bakeathon where the NFS client and server maintainers discussed how to merge directory delegation support. The approach starts with recallable-only delegations for several reasons: 1. RFC8881 has gaps that are being addressed in RFC8881bis. In particular, it requires directory position information for CB_NOTIFY callbacks, which is difficult to implement properly under Linux. The spec is being extended to allow that information to be omitted. 2. Client-side support for CB_NOTIFY still lags. The client side involves heuristics about when to request a delegation. 3. Early indication shows simple, recallable-only delegations can help performance. Anna Schumaker mentioned seeing a multi-minute speedup in xfstests runs with them enabled. With these changes, userspace can also request a read lease on a directory that will be recalled on conflicting accesses. This may be useful for applications like Samba. Users can disable leases altogether via the fs.leases-enable sysctl if needed. VFS changes: - Dedicated Type for Delegations Introduce struct delegated_inode to track inodes that may have delegations that need to be broken. This replaces the previous approach of passing raw inode pointers through the delegation breaking code paths, providing better type safety and clearer semantics for the delegation machinery. - Break parent directory delegations in open(..., O_CREAT) codepath - Allow mkdir to wait for delegation break on parent - Allow rmdir to wait for delegation break on parent - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(), and vfs_unlink() - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations on parent directory - Clean up argument list for vfs_create() - Expose delegation support to userland Filelock changes: - Make lease_alloc() take a flags argument - Rework the __break_lease API to use flags - Add struct delegated_inode - Push the S_ISREG check down to ->setlease handlers - Lift the ban on directory leases in generic_setlease NFSD changes: - Allow filecache to hold S_IFDIR files - Allow DELEGRETURN on directories - Wire up GET_DIR_DELEGATION handling Fixes: - Fix kernel-doc warnings in __fcntl_getlease - Add needed headers for new struct delegation definition" * tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: add needed headers for new struct delegation definition filelock: __fcntl_getlease: fix kernel-doc warnings vfs: expose delegation support to userland nfsd: wire up GET_DIR_DELEGATION handling nfsd: allow DELEGRETURN on directories nfsd: allow filecache to hold S_IFDIR files filelock: lift the ban on directory leases in generic_setlease vfs: make vfs_symlink break delegations on parent dir vfs: make vfs_mknod break delegations on parent directory vfs: make vfs_create break delegations on parent directory vfs: clean up argument list for vfs_create() vfs: break parent dir delegations in open(..., O_CREAT) codepath vfs: allow rmdir to wait for delegation break on parent vfs: allow mkdir to wait for delegation break on parent vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink} filelock: push the S_ISREG check down to ->setlease handlers filelock: add struct delegated_inode filelock: rework the __break_lease API to use flags filelock: make lease_alloc() take a flags argument
2 daysMerge tag 'kernel-6.19-rc1.cred' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred guard updates from Christian Brauner: "This contains substantial credential infrastructure improvements adding guard-based credential management that simplifies code and eliminates manual reference counting in many subsystems. Features: - Kernel Credential Guards Add with_kernel_creds() and scoped_with_kernel_creds() guards that allow using the kernel credentials without allocating and copying them. This was requested by Linus after seeing repeated prepare_kernel_creds() calls that duplicate the kernel credentials only to drop them again later. The new guards completely avoid the allocation and never expose the temporary variable to hold the kernel credentials anywhere in callers. - Generic Credential Guards Add scoped_with_creds() guards for the common override_creds() and revert_creds() pattern. This builds on earlier work that made override_creds()/revert_creds() completely reference count free. - Prepare Credential Guards Add prepare credential guards for the more complex pattern of preparing a new set of credentials and overriding the current credentials with them: - prepare_creds() - modify new creds - override_creds() - revert_creds() - put_cred() Cleanups: - Make init_cred static since it should not be directly accessed - Add kernel_cred() helper to properly access the kernel credentials - Fix scoped_class() macro that was introduced two cycles ago - coredump: split out do_coredump() from vfs_coredump() for cleaner credential handling - coredump: move revert_cred() before coredump_cleanup() - coredump: mark struct mm_struct as const - coredump: pass struct linux_binfmt as const - sev-dev: use guard for path" * tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) trace: use override credential guard trace: use prepare credential guard coredump: use override credential guard coredump: use prepare credential guard coredump: split out do_coredump() from vfs_coredump() coredump: mark struct mm_struct as const coredump: pass struct linux_binfmt as const coredump: move revert_cred() before coredump_cleanup() sev-dev: use override credential guards sev-dev: use prepare credential guard sev-dev: use guard for path cred: add prepare credential guard net/dns_resolver: use credential guards in dns_query() cgroup: use credential guards in cgroup_attach_permissions() act: use credential guards in acct_write_process() smb: use credential guards in cifs_get_spnego_key() nfs: use credential guards in nfs_idmap_get_key() nfs: use credential guards in nfs_local_call_write() nfs: use credential guards in nfs_local_call_read() erofs: use credential guards ...
3 daysMerge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-12filelock: push the S_ISREG check down to ->setlease handlersJeff Layton
When nfsd starts requesting directory delegations, setlease handlers may see requests for leases on directories. Push the !S_ISREG check down into the non-trivial setlease handlers, so we can selectively enable them where they're supported. FUSE is special: It's the only filesystem that supports atomic_open and allows kernel-internal leases. atomic_open is issued when the VFS doesn't know the state of the dentry being opened. If the file doesn't exist, it may be created, in which case the dir lease should be broken. The existing kernel-internal lease implementation has no provision for this. Ensure that we don't allow directory leases by default going forward by explicitly disabling them there. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-4-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-10NFS: Fix LTP test failures when timestamps are delegatedDai Ngo
The utimes01 and utime06 tests fail when delegated timestamps are enabled, specifically in subtests that modify the atime and mtime fields using the 'nobody' user ID. The problem can be reproduced as follow: # echo "/media *(rw,no_root_squash,sync)" >> /etc/exports # export -ra # mount -o rw,nfsvers=4.2 127.0.0.1:/media /tmpdir # cd /opt/ltp # ./runltp -d /tmpdir -s utimes01 # ./runltp -d /tmpdir -s utime06 This issue occurs because nfs_setattr does not verify the inode's UID against the caller's fsuid when delegated timestamps are permitted for the inode. This patch adds the UID check and if it does not match then the request is sent to the server for permission checking. Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()Trond Myklebust
The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(), which is clearly incorrect. Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs to be passed down to the RPC layer. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Fixes: 76998ebb9158 ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFS: sysfs: fix leak when nfs_client kobject add failsYang Xiuwei
If adding the second kobject fails, drop both references to avoid sysfs residue and memory leak. Fixes: e96f9268eea6 ("NFS: Make all of /sys/fs/nfs network-namespace unique") Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Reviewed-by: Benjamin Coddington <ben.coddington@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFSv2/v3: Fix error handling in nfs_atomic_open_v23()Trond Myklebust
When nfs_do_create() returns an EEXIST error, it means that a regular file could not be created. That could mean that a symlink needs to be resolved. If that's the case, a lookup needs to be kicked off. Reported-by: Stephen Abbene <sabbene87@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220710 Fixes: 7c6c5249f061 ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: do not issue misaligned DIO out-of-orderMike Snitzer
From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/ On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote: > On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote: > > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT > > middle (via AIO completion of that aligned middle). So out of order > > relative to file offset. > > That's in general a really bad idea. It will obviously work, but > both on SSDs and out of place write file systems it is a sure way > to increase your garbage collection overhead a lot down the line. Fix this by never issuing misaligned DIO out of order. This fix means the DIO-aligned middle will only use AIO completion if there is no misaligned end segment. Otherwise, all 3 segments of a misaligned DIO will be issued without AIO completion to ensure file offset increases properly for all partial READ or WRITE situations. Factoring out nfs_local_iter_setup() helps standardize repetitive nfs_local_iters_setup_dio() code and is inspired by cleanup work that Chuck Lever did on the NFSD Direct code. Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: Ensure DIO WRITE's IO on stable storage upon completionMike Snitzer
LOCALIO's misaligned DIO WRITE support requires synchronous IO for any misaligned head and/or tail that are issued using buffered IO. In addition, it is important that the O_DIRECT middle be on stable storage upon its completion via AIO. Otherwise, a misaligned DIO WRITE could mix buffered IO for the head/tail and direct IO for the DIO-aligned middle -- which could lead to problems associated with deferred writes to stable storage (such as out of order partial completions causing incorrect advancement of the file's offset, etc). Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: backfill missing partial read support for misaligned DIOMike Snitzer
Misaligned DIO read can be split into 3 IOs, must handle potential for short read from each component IO (follows same pattern used for handling partial writes, except upper layer read code handles advancing offset before retry). Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: add refcounting for each iocb IO associated with NFS pgio headerMike Snitzer
Improve completion handling of as many as 3 IOs associated with each misaligned DIO by using a atomic_t to track completion of each IO. Update nfs_local_pgio_done() to use precise atomic_t accounting for remaining iov_iter (up to 3) associated with each iocb, so that each NFS LOCALIO pgio header is only released after all IOs have completed. But also allow early return if/when a short read or write occurs. Fixes reported BUG: KASAN: slab-use-after-free in nfs_local_call_read: https://lore.kernel.org/linux-nfs/aPSvi5Yr2lGOh5Jh@dell-per750-06-vm-07.rhts.eng.pek2.redhat.com/ Reported-by: Yongcheng Yang <yoyang@redhat.com> Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE supportMike Snitzer
Each filesystem is meant to fallback to retrying DIO in terms buffered IO when it might encounter -ENOTBLK when issuing DIO (which can happen if the VFS cannot invalidate the page cache). So NFS doesn't need special handling for -ENOTBLK. Also, explicitly initialize a couple DIO related iocb members rather than simply rely on data structure zeroing. Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFS: Check the TLS certificate fields in nfs_match_client()Trond Myklebust
If the TLS security policy is of type RPC_XPRTSEC_TLS_X509, then the cert_serial and privkey_serial fields need to match as well since they define the client's identity, as presented to the server. Fixes: 90c9550a8d65 ("NFS: support the kernel keyring for TLS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLSTrond Myklebust
The default setting for the transport security policy must be RPC_XPRTSEC_NONE, when using a TCP or RDMA connection without TLS. Conversely, when using TLS, the security policy needs to be set. Fixes: 6c0a8c5fcf71 ("NFS: Have struct nfs_client carry a TLS policy field") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()Trond Myklebust
Don't try to add an RDMA transport to a client that is already marked as being a TCP/TLS transport. Fixes: a35518cae4b3 ("NFSv4.1/pnfs: fix NFS with TLS in pnfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()Trond Myklebust
Don't try to add an RDMA transport to a client that is already marked as being a TCP/TLS transport. Fixes: 04a15263662a ("pnfs/flexfiles: connect to NFSv3 DS using TLS if MDS connection uses TLS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-04nfs: use credential guards in nfs_idmap_get_key()Christian Brauner
Use credential guards for scoped credential override with automatic restoration on scope exit. Link: https://patch.msgid.link/20251103-work-creds-guards-simple-v1-12-a3e156839e7f@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04nfs: use credential guards in nfs_local_call_write()Christian Brauner
Use credential guards for scoped credential override with automatic restoration on scope exit. Link: https://patch.msgid.link/20251103-work-creds-guards-simple-v1-11-a3e156839e7f@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-04nfs: use credential guards in nfs_local_call_read()Christian Brauner
Use credential guards for scoped credential override with automatic restoration on scope exit. Link: https://patch.msgid.link/20251103-work-creds-guards-simple-v1-10-a3e156839e7f@kernel.org Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20Coccinelle-based conversion to use ->i_state accessorsMateusz Guzik
All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 | flag2) @@ expression inode, flags; @@ - inode->i_state |= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-13NFS4: Fix state renewals missing after bootJoshua Watt
Since the last renewal time was initialized to 0 and jiffies start counting at -5 minutes, any clients connected in the first 5 minutes after a reboot would have their renewal timer set to a very long interval. If the connection was idle, this would result in the client state timing out on the server and the next call to the server would return NFS4ERR_BADSESSION. Fix this by initializing the last renewal time to the current jiffies instead of 0. Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-10-13NFS: check if suid/sgid was cleared after a write as neededScott Mayhew
I noticed xfstests generic/193 and generic/355 started failing against knfsd after commit e7a8ebc305f2 ("NFSD: Offer write delegation for OPEN with OPEN4_SHARE_ACCESS_WRITE"). I ran those same tests against ONTAP (which has had write delegation support for a lot longer than knfsd) and they fail there too... so while it's a new failure against knfsd, it isn't an entirely new failure. Add the NFS_INO_REVAL_FORCED flag so that the presence of a delegation doesn't keep the inode from being revalidated to fetch the updated mode. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-10-13NFS4: Apply delay_retrans to async operationsJoshua Watt
The setting of delay_retrans is applied to synchronous RPC operations because the retransmit count is stored in same struct nfs4_exception that is passed each time an error is checked. However, for asynchronous operations (READ, WRITE, LOCKU, CLOSE, DELEGRETURN), a new struct nfs4_exception is made on the stack each time the task callback is invoked. This means that the retransmit count is always zero and thus delay_retrans never takes effect. Apply delay_retrans to these operations by tracking and updating their retransmit count. Change-Id: Ieb33e046c2b277cb979caa3faca7f52faf0568c9 Signed-off-by: Joshua Watt <jpewhacker@gmail.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-10-13NFSv4/flexfiles: fix to allocate mirror->dss before useMike Snitzer
Move mirror_array's dss_count initialization and dss allocation to ff_layout_alloc_mirror(), just before the loop that initializes each nfs4_ff_layout_ds_stripe's nfs_file_localio. Also handle NULL return from kcalloc() and remove one level of indent in ff_layout_alloc_mirror(). This commit fixes dangling nfsd_serv refcount issues seen when using NFS LOCALIO and then attempting to stop the NFSD service. Fixes: 20b1d75fb840 ("NFSv4/flexfiles: Add support for striped layouts") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-10-03Merge tag 'pull-f_path' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file->f_path constification from Al Viro: "Only one thing was modifying ->f_path of an opened file - acct(2). Massaging that away and constifying a bunch of struct path * arguments in functions that might be given &file->f_path ends up with the situation where we can turn ->f_path into an anon union of const struct path f_path and struct path __f_path, the latter modified only in a few places in fs/{file_table,open,namei}.c, all for struct file instances that are yet to be opened" * tag 'pull-f_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (23 commits) Have cc(1) catch attempts to modify ->f_path kernel/acct.c: saner struct file treatment configfs:get_target() - release path as soon as we grab configfs_item reference apparmor/af_unix: constify struct path * arguments ovl_is_real_file: constify realpath argument ovl_sync_file(): constify path argument ovl_lower_dir(): constify path argument ovl_get_verity_digest(): constify path argument ovl_validate_verity(): constify {meta,data}path arguments ovl_ensure_verity_loaded(): constify datapath argument ksmbd_vfs_set_init_posix_acl(): constify path argument ksmbd_vfs_inherit_posix_acl(): constify path argument ksmbd_vfs_kern_path_unlock(): constify path argument ksmbd_vfs_path_lookup_locked(): root_share_path can be const struct path * check_export(): constify path argument export_operations->open(): constify path argument rqst_exp_get_by_name(): constify path argument nfs: constify path argument of __vfs_getattr() bpf...d_path(): constify path argument done_path_create(): constify path argument ...
2025-10-03Merge tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add a Kconfig option to redirect dfprintk() to the trace buffer - Enable use of the RWF_DONTCACHE flag on the NFS client - Add striped layout handling to pNFS flexfiles - Add proper localio handling for READ and WRITE O_DIRECT Bugfixes: - Handle NFS4ERR_GRACE errors during delegation recall - Fix NFSv4.1 backchannel max_resp_sz verification check - Fix mount hang after CREATE_SESSION failure - Fix d_parent->d_inode locking in nfs4_setup_readdir() Other Cleanups and Improvements: - Improvements to write handling tracepoints - Fix a few trivial spelling mistakes - Cleanups to the rpcbind cleanup call sites - Convert the SUNRPC xdr_buf to use a scratch folio instead of scratch page - Remove unused NFS_WBACK_BUSY() macro - Remove __GFP_NOWARN flags - Unexport rpc_malloc() and rpc_free()" * tag 'nfs-for-6.18-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (46 commits) NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN support nfs/localio: add tracepoints for misaligned DIO READ and WRITE support nfs/localio: add proper O_DIRECT support for READ and WRITE nfs/localio: refactor iocb initialization nfs/localio: refactor iocb and iov_iter_bvec initialization nfs/localio: avoid issuing misaligned IO using O_DIRECT nfs/localio: make trace_nfs_local_open_fh more useful NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support sunrpc: unexport rpc_malloc() and rpc_free() NFSv4/flexfiles: Add support for striped layouts NFSv4/flexfiles: Update layout stats & error paths for striped layouts NFSv4/flexfiles: Write path updates for striped layouts NFSv4/flexfiles: Commit path updates for striped layouts NFSv4/flexfiles: Read path updates for striped layouts NFSv4/flexfiles: Update low level helper functions to be DS stripe aware. NFSv4/flexfiles: Add data structure support for striped layouts NFSv4/flexfiles: Use ds_commit_idx when marking a write commit NFSv4/flexfiles: Remove cred local variable dependency nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencing NFS: Enable use of the RWF_DONTCACHE flag on the NFS client ...
2025-10-03Merge tag 'pull-finish_no_open' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull finish_no_open updates from Al Viro: "finish_no_open calling conventions change to simplify callers" * tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: slightly simplify nfs_atomic_open() simplify gfs2_atomic_open() simplify fuse_atomic_open() simplify nfs_atomic_open_v23() simplify vboxsf_dir_atomic_open() simplify cifs_atomic_open() 9p: simplify v9fs_vfs_atomic_open_dotl() 9p: simplify v9fs_vfs_atomic_open() allow finish_no_open(file, ERR_PTR(-E...))
2025-10-03Merge tag 'pull-fs_context' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs_context updates from Al Viro: "Change vfs_parse_fs_string() calling conventions Get rid of the length argument (almost all callers pass strlen() of the string argument there), add vfs_parse_fs_qstr() for the cases that do want separate length" * tag 'pull-fs_context' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: do_nfs4_mount(): switch to vfs_parse_fs_string() change the calling conventions for vfs_parse_fs_string()
2025-09-30NFS: add basic STATX_DIOALIGN and STATX_DIO_READ_ALIGN supportMike Snitzer
NFS doesn't have DIO alignment constraints, so have NFS respond with accommodating DIO alignment attributes (rather than plumb in GETATTR support for STATX_DIOALIGN and STATX_DIO_READ_ALIGN). The most coarse-grained dio_offset_align is the most accommodating (e.g. PAGE_SIZE, in future larger may be supported). Now that NFS has support, NFS reexport will now handle unaligned DIO (NFSD's NFSD_IO_DIRECT support requires the underlying filesystem support STATX_DIOALIGN and/or STATX_DIO_READ_ALIGN). Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: add tracepoints for misaligned DIO READ and WRITE supportMike Snitzer
Add nfs_local_dio_class and use it to create nfs_local_dio_read, nfs_local_dio_write and nfs_local_dio_misaligned trace events. These trace events show how NFS LOCALIO splits a given misaligned IO into a mix of misaligned head and/or tail extents and a DIO-aligned middle extent. The misaligned head and/or tail extents are issued using buffered IO and the DIO-aligned middle is issued using O_DIRECT. This combination of trace events is useful for LOCALIO DIO READs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_read/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_readpage_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable This combination of trace events is useful for LOCALIO DIO WRITEs: echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_local_dio_misaligned/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: add proper O_DIRECT support for READ and WRITEMike Snitzer
Because the NFS client will already happily handle misaligned O_DIRECT IO (by sending it out to NFSD via RPC) this commit's new capabilities are for the benefit of LOCALIO. LOCALIO will make best effort to transform misaligned IO to DIO-aligned extents when possible. LOCALIO's READ and WRITE DIO that is misaligned will be split into as many as 3 component IOs (@start, @middle and @end) as needed -- IFF the @middle extent is verified to be DIO-aligned, and then the @start and/or @end are misaligned (due to each being a partial page). Otherwise if the @middle isn't DIO-aligned the code will fallback to issuing only a single contiguous buffered IO. The @middle is only DIO-aligned if both the memory and on-disk offsets for the IO are aligned relative to the underlying local filesystem's block device limits (@dma_alignment and @logical_block_size respectively). The misaligned @start and/or @end extents are issued using buffered IO and the DIO-aligned @middle is issued using O_DIRECT. The @start and @end IOs are issued first using buffered IO with IOCB_SYNC and then the @middle is issued last using direct IO with async completion (AIO). This out of order IO completion means that LOCALIO's IO completion code (nfs_local_read_done and nfs_local_write_done) is only called for the IO's last associated iov_iter completion. And in the case of DIO-aligned @middle it completes last using AIO. nfs_local_pgio_done() is updated to handle piece-wise partial completion of each iov_iter. This implementation for LOCALIO's misaligned DIO handling uses 3 iov_iter that share the same backing pages in their bio_vecs (so unfortunately 'struct nfs_local_kiocb' has 3 instead of only 1). [Reducing LOCALIO's per-IO (struct nfs_local_kiocb) memory use can be explored in the future. One logical progression to improve this code, and eliminate explicit loops over up to 3 iov_iter, is by extending 'struct iov_iter' to support iov_iter_clone() and iov_iter_chain() interfaces that are comparable to what 'struct bio' is able to support in the block layer. But even that wouldn't avoid the need to allocate/use up to 3 iov_iter] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: refactor iocb initializationMike Snitzer
The goal of this commit's various refactoring is to have LOCALIO's per IO initialization occur in process context so that we don't get into a situation where IO fails to be issued from workqueue (e.g. due to lack of memory, etc). Better to have LOCALIO's iocb initialization fail early. There isn't immediate need but this commit makes it possible for LOCALIO to fallback to NFS pagelist code in process context to allow for immediate retry over RPC. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: refactor iocb and iov_iter_bvec initializationMike Snitzer
nfs_local_iter_init() is updated to follow the same pattern to initializing LOCALIO's iov_iter_bvec as was established by nfsd_iter_read(). Other LOCALIO iocb initialization refactoring in this commit offers incremental cleanup that will be taken further by the next commit. No functional change. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: avoid issuing misaligned IO using O_DIRECTMike Snitzer
Add nfsd_file_dio_alignment and use it to avoid issuing misaligned IO using O_DIRECT. Any misaligned DIO falls back to using buffered IO. Because misaligned DIO is now handled safely, remove the nfs modparam 'localio_O_DIRECT_semantics' that was added to require users opt-in to the requirement that all O_DIRECT be properly DIO-aligned. Also, introduce nfs_iov_iter_aligned_bvec() which is a variant of iov_iter_aligned_bvec() that also verifies the offset associated with an iov_iter is DIO-aligned. NOTE: in a parallel effort, iov_iter_aligned_bvec() is being removed along with iov_iter_is_aligned(). Lastly, add pr_info_ratelimited if underlying filesystem returns -EINVAL because it was made to try O_DIRECT for IO that is not DIO-aligned (shouldn't happen, so its best to be louder if it does). Fixes: 3feec68563d ("nfs/localio: add direct IO enablement with sync and async IO support") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-30nfs/localio: make trace_nfs_local_open_fh more usefulMike Snitzer
Always trigger trace event when LOCALIO opens a file. Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-29Merge tag 'vfs-6.18-rc1.workqueue' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs workqueue updates from Christian Brauner: "This contains various workqueue changes affecting the filesystem layer. Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This replaces the use of system_wq and system_unbound_wq. system_wq is a per-CPU workqueue which isn't very obvious from the name and system_unbound_wq is to be used when locality is not required. So this renames system_wq to system_percpu_wq, and system_unbound_wq to system_dfl_wq. This also adds a new WQ_PERCPU flag to allow the fs subsystem users to explicitly request the use of per-CPU behavior. Both WQ_UNBOUND and WQ_PERCPU flags coexist for one release cycle to allow callers to transition their calls. WQ_UNBOUND will be removed in a next release cycle" * tag 'vfs-6.18-rc1.workqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: WQ_PERCPU added to alloc_workqueue users fs: replace use of system_wq with system_percpu_wq fs: replace use of system_unbound_wq with system_dfl_wq
2025-09-29Merge tag 'vfs-6.18-rc1.inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "This contains a series I originally wrote and that Eric brought over the finish line. It moves out the i_crypt_info and i_verity_info pointers out of 'struct inode' and into the fs-specific part of the inode. So now the few filesytems that actually make use of this pay the price in their own private inode storage instead of forcing it upon every user of struct inode. The pointer for the crypt and verity info is simply found by storing an offset to its address in struct fsverity_operations and struct fscrypt_operations. This shrinks struct inode by 16 bytes. I hope to move a lot more out of it in the future so that struct inode becomes really just about very core stuff that we need, much like struct dentry and struct file, instead of the dumping ground it has become over the years. On top of this are a various changes associated with the ongoing inode lifetime handling rework that multiple people are pushing forward: - Stop accessing inode->i_count directly in f2fs and gfs2. They simply should use the __iget() and iput() helpers - Make the i_state flags an enum - Rework the iput() logic Currently, if we are the last iput, and we have the I_DIRTY_TIME bit set, we will grab a reference on the inode again and then mark it dirty and then redo the put. This is to make sure we delay the time update for as long as possible We can rework this logic to simply dec i_count if it is not 1, and if it is do the time update while still holding the i_count reference Then we can replace the atomic_dec_and_lock with locking the ->i_lock and doing atomic_dec_and_test, since we did the atomic_add_unless above - Add an icount_read() helper and convert everyone that accesses inode->i_count directly for this purpose to use the helper - Expand dump_inode() to dump more information about an inode helping in debugging - Add some might_sleep() annotations to iput() and associated helpers" * tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: add might_sleep() annotation to iput() and more fs: expand dump_inode() inode: fix whitespace issues fs: add an icount_read helper fs: rework iput logic fs: make the i_state flags an enum fs: stop accessing ->i_count directly in f2fs and gfs2 fsverity: check IS_VERITY() in fsverity_cleanup_inode() fs: remove inode::i_verity_info btrfs: move verity info pointer to fs-specific part of inode f2fs: move verity info pointer to fs-specific part of inode ext4: move verity info pointer to fs-specific part of inode fsverity: add support for info in fs-specific part of inode fs: remove inode::i_crypt_info ceph: move crypt info pointer to fs-specific part of inode ubifs: move crypt info pointer to fs-specific part of inode f2fs: move crypt info pointer to fs-specific part of inode ext4: move crypt info pointer to fs-specific part of inode fscrypt: add support for info in fs-specific part of inode fscrypt: replace raw loads of info pointer with helper function
2025-09-29Merge tag 'vfs-6.18-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add "initramfs_options" parameter to set initramfs mount options. This allows to add specific mount options to the rootfs to e.g., limit the memory size - Add RWF_NOSIGNAL flag for pwritev2() Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets - Allow to pass pid namespace as procfs mount option Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed) This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include: * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs) But this requires forking off a helper process because you cannot just one-shot this using mount(2) * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process) While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate Things would be much easier if there was a way for userspace to just specify the pidns they want. So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2): fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); Cleanups: - Remove the last references to EXPORT_OP_ASYNC_LOCK - Make file_remove_privs_flags() static - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used - Use try_cmpxchg() in start_dir_add() - Use try_cmpxchg() in sb_init_done_wq() - Replace offsetof() with struct_size() in ioctl_file_dedupe_range() - Remove vfs_ioctl() export - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels - Make ns_entries in fs/proc/namespaces const - Use a switch() statement() in init_special_inode() just like we do in may_open() - Use struct_size() in dir_add() in the initramfs code - Use str_plural() in rd_load_image() - Replace strcpy() with strscpy() in find_link() - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop() - Remove unused arguments from fcntl_{g,s}et_rw_hint() Fixes: - Document @name parameter for name_contains_dotdot() helper - Fix spelling mistake - Always return zero from replace_fd() instead of the file descriptor number - Limit the size for copy_file_range() in compat mode to prevent a signed overflow - Fix debugfs mount options not being applied - Verify the inode mode when loading it from disk in minixfs - Verify the inode mode when loading it from disk in cramfs - Don't trigger automounts with RESOLVE_NO_XDEV If openat2() was called with RESOLVE_NO_XDEV it didn't traverse through automounts, but could still trigger them - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in tracepoints - Fix unused variable warning in rd_load_image() on s390 - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD - Use ns_capable_noaudit() when determining net sysctl permissions - Don't call path_put() under namespace semaphore in listmount() and statmount()" * tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits) fcntl: trim arguments listmount: don't call path_put() under namespace semaphore statmount: don't call path_put() under namespace semaphore pid: use ns_capable_noaudit() when determining net sysctl permissions fs: rename generic_delete_inode() and generic_drop_inode() init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD initramfs: Replace strcpy() with strscpy() in find_link() initrd: Use str_plural() in rd_load_image() initramfs: Use struct_size() helper to improve dir_add() initrd: Fix unused variable warning in rd_load_image() on s390 fs: use the switch statement in init_special_inode() fs/proc/namespaces: make ns_entries const filelock: add FL_RECLAIM to show_fl_flags() macro eventpoll: Replace rwlock with spinlock selftests/proc: add tests for new pidns APIs procfs: add "pidns" mount option pidns: move is-ancestor logic to helper openat2: don't trigger automounts with RESOLVE_NO_XDEV namei: move cross-device check to __traverse_mounts namei: remove LOOKUP_NO_XDEV check from handle_mounts ...
2025-09-26NFSv4/flexfiles: Add support for striped layoutsJonathan Curley
Updates lseg creation path to parse and add striped layouts. Enable support for striped layouts. Limitations: 1. All mirrors must have the same number of stripes. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Update layout stats & error paths for striped layoutsJonathan Curley
Updates the layout stats logic to be stripe aware. Read and write stats are accumulated on a per DS stripe basis. Also updates error paths to use dss_id where appropraite. Limitations: 1. The layout stats structure is still statically sized to 4 and there is no deduplication logic for deviceids that may appear more than once in a striped layout. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Write path updates for striped layoutsJonathan Curley
Updates write path to calculate and use dss_id to direct IO to the appropriate stripe DS. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Commit path updates for striped layoutsJonathan Curley
Updates the commit path to be stripe aware. This required updating the ds_commit_idx to be stripe aware. ds_commit_idx == mirror_idx * dss_count + dss_id. Updates code paths to utilize the new ds_commit_idx and derive dss_id & mirror_idx where appropriate to contact the correct DS using the corresponding parameters. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Read path updates for striped layoutsJonathan Curley
Updates read path to calculate and use dss_id to direct IO to the appropriate stripe DS. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Update low level helper functions to be DS stripe aware.Jonathan Curley
Updates common helper functions to be dss_id aware. Most cases simply add a dss_id parameter. The has_available functions have been updated with a loop. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Add data structure support for striped layoutsJonathan Curley
Adds a new struct nfs4_ff_layout_ds_stripe that represents a data server stripe within a layout. A new dynamically allocated array of this type has been added to nfs4_ff_layout_mirror and per stripe configuration information has been moved from the mirror type to the stripe based on the RFC. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Use ds_commit_idx when marking a write commitJonathan Curley
Correct this path to use ds_commit_idx. Another noop preparation change. In current code commit_idx == mirror_idx but when striping is enabled that will not be true. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-26NFSv4/flexfiles: Remove cred local variable dependencyJonathan Curley
No-op preparation change to remove dependency on cred local variable. Subsequent striping diff has a cred per stripe so this local variable can't be trusted to be the same. Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23nfs4_setup_readdir(): insufficient locking for ->d_parent->d_inode dereferencingAl Viro
Theoretically it's an oopsable race, but I don't believe one can manage to hit it on real hardware; might become doable on a KVM, but it still won't be easy to attack. Anyway, it's easy to deal with - since xdr_encode_hyper() is just a call of put_unaligned_be64(), we can put that under ->d_lock and be done with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-09-23NFS: Enable use of the RWF_DONTCACHE flag on the NFS clientTrond Myklebust
The NFS client needs to defer dropbehind until after any writes to the folio have been persisted on the server. Since this may be a 2 step process, use folio_end_writeback_no_dropbehind() to allow release of the writeback flag, and then call folio_end_dropbehind() once the COMMIT is done. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>