<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/uapi/linux/fcntl.h, branch v6.17.8</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.8</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.17.8'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2025-06-24T14:58:42Z</updated>
<entry>
<title>uapi/fcntl: add FD_PIDFS_ROOT</title>
<updated>2025-06-24T14:58:42Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-06-24T14:57:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3941e37f62fe2c3c8b8675c12183185f20450539'/>
<id>urn:sha1:3941e37f62fe2c3c8b8675c12183185f20450539</id>
<content type='text'>
Add a special file descriptor indicating the root of the pidfs
filesystem.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>uapi/fcntl: add FD_INVALID</title>
<updated>2025-06-24T13:50:15Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-06-24T08:29:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cd5d2006327b6d8488612cb8c03ad7304417c8f2'/>
<id>urn:sha1:cd5d2006327b6d8488612cb8c03ad7304417c8f2</id>
<content type='text'>
Add a marker for an invalid file descriptor.

Link: https://lore.kernel.org/20250624-work-pidfs-fhandle-v2-7-d02a04858fe3@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP</title>
<updated>2025-06-24T13:50:15Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-06-24T13:48:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=67fcec2919e4ed31ab845eb456ad7d6f1e85505c'/>
<id>urn:sha1:67fcec2919e4ed31ab845eb456ad7d6f1e85505c</id>
<content type='text'>
Don't jump somewhere into the middle of the reserved range. We're still
able to change that value it won't be that widely used yet. If not, we
can revert.

Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>uapi/fcntl: mark range as reserved</title>
<updated>2025-06-24T13:50:06Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-06-24T13:48:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a4c746f06853f91d3759ae8aca514d135b6aa56d'/>
<id>urn:sha1:a4c746f06853f91d3759ae8aca514d135b6aa56d</id>
<content type='text'>
Mark the range from -10000 to -40000 as a range reserved for special
in-kernel values. Move the PIDFD_SELF_*/PIDFD_THREAD_* sentinels over so
all the special values are in one place.

Link: https://lore.kernel.org/20250624-work-pidfs-fhandle-v2-6-d02a04858fe3@kernel.org
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>exec: Add a new AT_EXECVE_CHECK flag to execveat(2)</title>
<updated>2024-12-19T01:00:29Z</updated>
<author>
<name>Mickaël Salaün</name>
<email>mic@digikod.net</email>
</author>
<published>2024-12-12T17:42:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a5874fde3c0884a33ed4145101052318c5e17c74'/>
<id>urn:sha1:a5874fde3c0884a33ed4145101052318c5e17c74</id>
<content type='text'>
Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
be allowed for execution.  The main use case is for script interpreters
and dynamic linkers to check execution permission according to the
kernel's security policy. Another use case is to add context to access
logs e.g., which script (instead of interpreter) accessed a file.  As
any executable code, scripts could also use this check [1].

This is different from faccessat(2) + X_OK which only checks a subset of
access rights (i.e. inode permission and mount options for regular
files), but not the full context (e.g. all LSM access checks).  The main
use case for access(2) is for SUID processes to (partially) check access
on behalf of their caller.  The main use case for execveat(2) +
AT_EXECVE_CHECK is to check if a script execution would be allowed,
according to all the different restrictions in place.  Because the use
of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
execution, user space gets the same error codes.

An interesting point of using execveat(2) instead of openat2(2) is that
it decouples the check from the enforcement.  Indeed, the security check
can be logged (e.g. with audit) without blocking an execution
environment not yet ready to enforce a strict security policy.

LSMs can control or log execution requests with
security_bprm_creds_for_exec().  However, to enforce a consistent and
complete access control (e.g. on binary's dependencies) LSMs should
restrict file executability, or measure executed files, with
security_file_open() by checking file-&gt;f_flags &amp; __FMODE_EXEC.

Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
doesn't make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_EXECVE_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
being executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
SECBIT_EXEC_DENY_INTERACTIVE securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Reviewed-by: Serge Hallyn &lt;serge@hallyn.com&gt;
Reviewed-by: Jeff Xu &lt;jeffxu@chromium.org&gt;
Tested-by: Jeff Xu &lt;jeffxu@chromium.org&gt;
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün &lt;mic@digikod.net&gt;
Link: https://lore.kernel.org/r/20241212174223.389435-2-mic@digikod.net
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-11-26T21:26:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-26T21:26:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1675db5c42b780f8a6d45d080d5ac037d9714f7a'/>
<id>urn:sha1:1675db5c42b780f8a6d45d080d5ac037d9714f7a</id>
<content type='text'>
Pull vfs exportfs updates from Christian Brauner:
 "This contains work to bring NFS connectable file handles to userspace
  servers.

  The name_to_handle_at() system call is extended to encode connectable
  file handles. Such file handles can be resolved to an open file with a
  connected path. So far userspace NFS servers couldn't make use of this
  functionality even though the kernel does already support it. This is
  achieved by introducing a new flag for name_to_handle_at().

  Similarly, the open_by_handle_at() system call is tought to understand
  connectable file handles explicitly created via name_to_handle_at()"

* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: open_by_handle_at() support for decoding "explicit connectable" file handles
  fs: name_to_handle_at() support for "explicit connectable" file handles
  fs: prepare for "explicit connectable" file handles
</content>
</entry>
<entry>
<title>fs: name_to_handle_at() support for "explicit connectable" file handles</title>
<updated>2024-11-15T10:34:57Z</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2024-10-11T09:00:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c374196b2b9f4b803fccd59ed82f0712041e21e1'/>
<id>urn:sha1:c374196b2b9f4b803fccd59ed82f0712041e21e1</id>
<content type='text'>
nfsd encodes "connectable" file handles for the subtree_check feature,
which can be resolved to an open file with a connected path.
So far, userspace nfs server could not make use of this functionality.

Introduce a new flag AT_HANDLE_CONNECTABLE to name_to_handle_at(2).
When used, the encoded file handle is "explicitly connectable".

The "explicitly connectable" file handle sets bits in the high 16bit of
the handle_type field, so open_by_handle_at(2) will know that it needs
to open a file with a connected path.

old kernels will now recognize the handle_type with high bits set,
so "explicitly connectable" file handles cannot be decoded by
open_by_handle_at(2) on old kernels.

The flag AT_HANDLE_CONNECTABLE is not allowed together with either
AT_HANDLE_FID or AT_EMPTY_PATH.

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20241011090023.655623-3-amir73il@gmail.com
Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export")
Acked-by:
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag</title>
<updated>2024-11-13T16:46:29Z</updated>
<author>
<name>Stefan Berger</name>
<email>stefanb@linux.ibm.com</email>
</author>
<published>2024-11-01T19:37:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95f567f81e43a1bcb5fbf0559e55b7505707300d'/>
<id>urn:sha1:95f567f81e43a1bcb5fbf0559e55b7505707300d</id>
<content type='text'>
Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.

There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.

Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:

- ecryptfs:  Simplify by always calling vfs_getattr_nosec in
             ecryptfs_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

- overlayfs: Simplify by always calling vfs_getattr_nosec in
             ovl_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.

The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.

Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks &lt;code@tyhicks.com&gt;
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Stefan Berger &lt;stefanb@linux.ibm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fhandle: expose u64 mount id to name_to_handle_at(2)</title>
<updated>2024-09-05T09:39:17Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2024-08-28T10:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4356d575ef0f39a3e8e0ce0c40d84ce900ac3b61'/>
<id>urn:sha1:4356d575ef0f39a3e8e0ce0c40d84ce900ac3b61</id>
<content type='text'>
Now that we provide a unique 64-bit mount ID interface in statx(2), we
can now provide a race-free way for name_to_handle_at(2) to provide a
file handle and corresponding mount without needing to worry about
racing with /proc/mountinfo parsing or having to open a file just to do
statx(2).

While this is not necessary if you are using AT_EMPTY_PATH and don't
care about an extra statx(2) call, users that pass full paths into
name_to_handle_at(2) need to know which mount the file handle comes from
(to make sure they don't try to open_by_handle_at a file handle from a
different filesystem) and switching to AT_EMPTY_PATH would require
allocating a file for every name_to_handle_at(2) call, turning

  err = name_to_handle_at(-EBADF, "/foo/bar/baz", &amp;handle, &amp;mntid,
                          AT_HANDLE_MNT_ID_UNIQUE);

into

  int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
  err1 = name_to_handle_at(fd, "", &amp;handle, &amp;unused_mntid, AT_EMPTY_PATH);
  err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &amp;statxbuf);
  mntid = statxbuf.stx_mnt_id;
  close(fd);

Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-2-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>uapi: explain how per-syscall AT_* flags should be allocated</title>
<updated>2024-09-05T09:37:45Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2024-08-28T10:19:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b4fef22c2fb97fa204f0c99c7c7f1c6b422ef0aa'/>
<id>urn:sha1:b4fef22c2fb97fa204f0c99c7c7f1c6b422ef0aa</id>
<content type='text'>
Unfortunately, the way we have gone about adding new AT_* flags has
been a little messy. In the beginning, all of the AT_* flags had generic
meanings and so it made sense to share the flag bits indiscriminately.
However, we inevitably ran into syscalls that needed their own
syscall-specific flags. Due to the lack of a planned out policy, we
ended up with the following situations:

 * Existing syscalls adding new features tended to use new AT_* bits,
   with some effort taken to try to re-use bits for flags that were so
   obviously syscall specific that they only make sense for a single
   syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).

   Given the constraints of bitflags, this works well in practice, but
   ideally (to avoid future confusion) we would plan ahead and define a
   set of "per-syscall bits" ahead of time so that when allocating new
   bits we don't end up with a complete mish-mash of which bits are
   supposed to be per-syscall and which aren't.

 * New syscalls dealt with this in several ways:

   - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
     fspick(2)) created their separate own flag spaces that have no
     overlap with the AT_* flags. Most of these ended up allocating
     their bits sequentually.

     In the case of move_mount(2) and fspick(2), several flags have
     identical meanings to AT_* flags but were allocated in their own
     flag space.

     This makes sense for syscalls that will never share AT_* flags, but
     for some syscalls this leads to duplication with AT_* flags in a
     way that could cause confusion (if renameat2(2) grew a
     RENAME_EMPTY_PATH it seems likely that users could mistake it for
     AT_EMPTY_PATH since it is an *at(2) syscall).

   - Some syscalls unfortunately ended up both creating their own flag
     space while also using bits from other flag spaces. The most
     obvious example is open_tree(2), where the standard usage ends up
     using flags from *THREE* separate flag spaces:

       open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);

     (Note that O_CLOEXEC is also platform-specific, so several future
     OPEN_TREE_* bits are also made unusable in one fell swoop.)

It's not entirely clear to me what the "right" choice is for new
syscalls. Just saying that all future VFS syscalls should use AT_* flags
doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
don't make much sense to burn generic AT_* flags for) and move_mount(2)
has separate AT_*-like flags for both the source and target so separate
flags are needed anyway (though it seems possible that renameat2(2)
could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
they can't be reused).

But at least for syscalls that _do_ choose to use AT_* flags, we should
explicitly state the policy that 0x2ff is currently intended for
per-syscall flags and that new flags should err on the side of
overlapping with existing flag bits (so we can extend the scope of
generic flags in the future if necessary).

And add AT_* aliases for the RENAME_* flags to further cement that
renameat2(2) is an *at(2) flag, just with its own per-syscall flags.

Suggested-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
