<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/uapi/linux/fcntl.h, branch v6.13.5</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.13.5</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v6.13.5'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2024-11-26T21:26:15Z</updated>
<entry>
<title>Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2024-11-26T21:26:15Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-26T21:26:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1675db5c42b780f8a6d45d080d5ac037d9714f7a'/>
<id>urn:sha1:1675db5c42b780f8a6d45d080d5ac037d9714f7a</id>
<content type='text'>
Pull vfs exportfs updates from Christian Brauner:
 "This contains work to bring NFS connectable file handles to userspace
  servers.

  The name_to_handle_at() system call is extended to encode connectable
  file handles. Such file handles can be resolved to an open file with a
  connected path. So far userspace NFS servers couldn't make use of this
  functionality even though the kernel does already support it. This is
  achieved by introducing a new flag for name_to_handle_at().

  Similarly, the open_by_handle_at() system call is tought to understand
  connectable file handles explicitly created via name_to_handle_at()"

* tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: open_by_handle_at() support for decoding "explicit connectable" file handles
  fs: name_to_handle_at() support for "explicit connectable" file handles
  fs: prepare for "explicit connectable" file handles
</content>
</entry>
<entry>
<title>fs: name_to_handle_at() support for "explicit connectable" file handles</title>
<updated>2024-11-15T10:34:57Z</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2024-10-11T09:00:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c374196b2b9f4b803fccd59ed82f0712041e21e1'/>
<id>urn:sha1:c374196b2b9f4b803fccd59ed82f0712041e21e1</id>
<content type='text'>
nfsd encodes "connectable" file handles for the subtree_check feature,
which can be resolved to an open file with a connected path.
So far, userspace nfs server could not make use of this functionality.

Introduce a new flag AT_HANDLE_CONNECTABLE to name_to_handle_at(2).
When used, the encoded file handle is "explicitly connectable".

The "explicitly connectable" file handle sets bits in the high 16bit of
the handle_type field, so open_by_handle_at(2) will know that it needs
to open a file with a connected path.

old kernels will now recognize the handle_type with high bits set,
so "explicitly connectable" file handles cannot be decoded by
open_by_handle_at(2) on old kernels.

The flag AT_HANDLE_CONNECTABLE is not allowed together with either
AT_HANDLE_FID or AT_EMPTY_PATH.

Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Link: https://lore.kernel.org/r/20241011090023.655623-3-amir73il@gmail.com
Fixes: 570df4e9c23f ("ceph: snapshot nfs re-export")
Acked-by:
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag</title>
<updated>2024-11-13T16:46:29Z</updated>
<author>
<name>Stefan Berger</name>
<email>stefanb@linux.ibm.com</email>
</author>
<published>2024-11-01T19:37:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=95f567f81e43a1bcb5fbf0559e55b7505707300d'/>
<id>urn:sha1:95f567f81e43a1bcb5fbf0559e55b7505707300d</id>
<content type='text'>
Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.

There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.

Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:

- ecryptfs:  Simplify by always calling vfs_getattr_nosec in
             ecryptfs_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

- overlayfs: Simplify by always calling vfs_getattr_nosec in
             ovl_getattr. From there the flag is passed to no other
             function and this function is not called otherwise.

The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.

The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.

Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks &lt;code@tyhicks.com&gt;
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Stefan Berger &lt;stefanb@linux.ibm.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fhandle: expose u64 mount id to name_to_handle_at(2)</title>
<updated>2024-09-05T09:39:17Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2024-08-28T10:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4356d575ef0f39a3e8e0ce0c40d84ce900ac3b61'/>
<id>urn:sha1:4356d575ef0f39a3e8e0ce0c40d84ce900ac3b61</id>
<content type='text'>
Now that we provide a unique 64-bit mount ID interface in statx(2), we
can now provide a race-free way for name_to_handle_at(2) to provide a
file handle and corresponding mount without needing to worry about
racing with /proc/mountinfo parsing or having to open a file just to do
statx(2).

While this is not necessary if you are using AT_EMPTY_PATH and don't
care about an extra statx(2) call, users that pass full paths into
name_to_handle_at(2) need to know which mount the file handle comes from
(to make sure they don't try to open_by_handle_at a file handle from a
different filesystem) and switching to AT_EMPTY_PATH would require
allocating a file for every name_to_handle_at(2) call, turning

  err = name_to_handle_at(-EBADF, "/foo/bar/baz", &amp;handle, &amp;mntid,
                          AT_HANDLE_MNT_ID_UNIQUE);

into

  int fd = openat(-EBADF, "/foo/bar/baz", O_PATH | O_CLOEXEC);
  err1 = name_to_handle_at(fd, "", &amp;handle, &amp;unused_mntid, AT_EMPTY_PATH);
  err2 = statx(fd, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &amp;statxbuf);
  mntid = statxbuf.stx_mnt_id;
  close(fd);

Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-2-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>uapi: explain how per-syscall AT_* flags should be allocated</title>
<updated>2024-09-05T09:37:45Z</updated>
<author>
<name>Aleksa Sarai</name>
<email>cyphar@cyphar.com</email>
</author>
<published>2024-08-28T10:19:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b4fef22c2fb97fa204f0c99c7c7f1c6b422ef0aa'/>
<id>urn:sha1:b4fef22c2fb97fa204f0c99c7c7f1c6b422ef0aa</id>
<content type='text'>
Unfortunately, the way we have gone about adding new AT_* flags has
been a little messy. In the beginning, all of the AT_* flags had generic
meanings and so it made sense to share the flag bits indiscriminately.
However, we inevitably ran into syscalls that needed their own
syscall-specific flags. Due to the lack of a planned out policy, we
ended up with the following situations:

 * Existing syscalls adding new features tended to use new AT_* bits,
   with some effort taken to try to re-use bits for flags that were so
   obviously syscall specific that they only make sense for a single
   syscall (such as the AT_EACCESS/AT_REMOVEDIR/AT_HANDLE_FID triplet).

   Given the constraints of bitflags, this works well in practice, but
   ideally (to avoid future confusion) we would plan ahead and define a
   set of "per-syscall bits" ahead of time so that when allocating new
   bits we don't end up with a complete mish-mash of which bits are
   supposed to be per-syscall and which aren't.

 * New syscalls dealt with this in several ways:

   - Some syscalls (like renameat2(2), move_mount(2), fsopen(2), and
     fspick(2)) created their separate own flag spaces that have no
     overlap with the AT_* flags. Most of these ended up allocating
     their bits sequentually.

     In the case of move_mount(2) and fspick(2), several flags have
     identical meanings to AT_* flags but were allocated in their own
     flag space.

     This makes sense for syscalls that will never share AT_* flags, but
     for some syscalls this leads to duplication with AT_* flags in a
     way that could cause confusion (if renameat2(2) grew a
     RENAME_EMPTY_PATH it seems likely that users could mistake it for
     AT_EMPTY_PATH since it is an *at(2) syscall).

   - Some syscalls unfortunately ended up both creating their own flag
     space while also using bits from other flag spaces. The most
     obvious example is open_tree(2), where the standard usage ends up
     using flags from *THREE* separate flag spaces:

       open_tree(AT_FDCWD, "/foo", OPEN_TREE_CLONE|O_CLOEXEC|AT_RECURSIVE);

     (Note that O_CLOEXEC is also platform-specific, so several future
     OPEN_TREE_* bits are also made unusable in one fell swoop.)

It's not entirely clear to me what the "right" choice is for new
syscalls. Just saying that all future VFS syscalls should use AT_* flags
doesn't seem practical. openat2(2) has RESOLVE_* flags (many of which
don't make much sense to burn generic AT_* flags for) and move_mount(2)
has separate AT_*-like flags for both the source and target so separate
flags are needed anyway (though it seems possible that renameat2(2)
could grow *_EMPTY_PATH flags at some point, and it's a bit of a shame
they can't be reused).

But at least for syscalls that _do_ choose to use AT_* flags, we should
explicitly state the policy that 0x2ff is currently intended for
per-syscall flags and that new flags should err on the side of
overlapping with existing flag bits (so we can extend the scope of
generic flags in the future if necessary).

And add AT_* aliases for the RENAME_* flags to further cement that
renameat2(2) is an *at(2) flag, just with its own per-syscall flags.

Suggested-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Signed-off-by: Aleksa Sarai &lt;cyphar@cyphar.com&gt;
Link: https://lore.kernel.org/r/20240828-exportfs-u64-mount-id-v3-1-10c2c4c16708@cyphar.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fcntl: add F_CREATED_QUERY</title>
<updated>2024-08-19T11:44:34Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-07-24T13:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=820a185896b77814557302b981b092a9e7b36814'/>
<id>urn:sha1:820a185896b77814557302b981b092a9e7b36814</id>
<content type='text'>
Systemd has a helper called openat_report_new() that returns whether a
file was created anew or it already existed before for cases where
O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't
something that's specific to systemd but it's where I noticed it.

The current logic is that it first attempts to open the file without
O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both
flags. If that succeeds all is well. If it now reports EEXIST it
retries.

That works fairly well but some corner cases make this more involved. If
this operates on a dangling symlink the first openat() without O_CREAT |
O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL
will fail with EEXIST. The reason is that openat() without O_CREAT |
O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security
reasons. So it's not something we can really change unless we add an
explicit opt-in via O_FOLLOW which seems really ugly.

The caller could try and use fanotify() to register to listen for
creation events in the directory before calling openat(). The caller
could then compare the returned tid to its own tid to ensure that even
in threaded environments it actually created the file. That might work
but is a lot of work for something that should be fairly simple and I'm
uncertain about it's reliability.

The caller could use a bpf lsm hook to hook into security_file_open() to
figure out whether they created the file. That also seems a bit wild.

So let's add F_CREATED_QUERY which allows the caller to check whether
they actually did create the file. That has caveats of course but I
don't think they are problematic:

* In multi-threaded environments a thread can only be sure that it did
  create the file if it calls openat() with O_CREAT. In other words,
  it's obviously not enough to just go through it's fdtable and check
  these fds because another thread could've created the file.

* If there's any codepaths where an openat() with O_CREAT would yield
  the same struct file as that of another thread it would obviously
  cause wrong results. I'm not aware of any such codepaths from openat()
  itself. Imho, that would be a bug.

* Related to the previous point, calling the new fcntl() on files created
  and opened via special-purpose system calls or ioctl()s would cause
  wrong results only if the affected subsystem a) raises FMODE_CREATED
  and b) may return the same struct file for two different calls. I'm
  not seeing anything outside of regular VFS code that raises
  FMODE_CREATED.

  There is code for b) in e.g., the drm layer where the same struct file
  is resurfaced but again FMODE_CREATED isn't used and it would be very
  misleading if it did.

Link: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 [1]
Link: https://lore.kernel.org/r/20240724-work-fcntl-v1-1-e8153a2f1991@kernel.org
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fcntl: add F_DUPFD_QUERY fcntl()</title>
<updated>2024-05-10T06:26:31Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-05-09T11:04:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c62b758bae6af16fee94f556091fa74883a96b1e'/>
<id>urn:sha1:c62b758bae6af16fee94f556091fa74883a96b1e</id>
<content type='text'>
Often userspace needs to know whether two file descriptors refer to the
same struct file. For example, systemd uses this to filter out duplicate
file descriptors in it's file descriptor store (cf. [1]) and vulkan uses
it to compare dma-buf fds (cf. [2]).

The only api we provided for this was kcmp() but that's not generally
available or might be disallowed because it is way more powerful (allows
ordering of file pointers, operates on non-current task) etc. So give
userspace a simple way of comparing two file descriptors for sameness
adding a new fcntl() F_DUDFD_QUERY.

Link: https://github.com/systemd/systemd/blob/a4f0e0da3573a10bc5404142be8799418760b1d1/src/basic/fd-util.c#L517 [1]
Link: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/texture.c#L490 [2]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[brauner: commit message]
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: Pass AT_GETATTR_NOSEC flag to getattr interface function</title>
<updated>2023-11-18T13:54:07Z</updated>
<author>
<name>Stefan Berger</name>
<email>stefanb@linux.ibm.com</email>
</author>
<published>2023-10-02T12:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8a924db2d7b5eb69ba08b1a0af46e9f1359a9bdf'/>
<id>urn:sha1:8a924db2d7b5eb69ba08b1a0af46e9f1359a9bdf</id>
<content type='text'>
When vfs_getattr_nosec() calls a filesystem's getattr interface function
then the 'nosec' should propagate into this function so that
vfs_getattr_nosec() can again be called from the filesystem's gettattr
rather than vfs_getattr(). The latter would add unnecessary security
checks that the initial vfs_getattr_nosec() call wanted to avoid.
Therefore, introduce the getattr flag GETATTR_NOSEC and allow to pass
with the new getattr_flags parameter to the getattr interface function.
In overlayfs and ecryptfs use this flag to determine which one of the
two functions to call.

In a recent code change introduced to IMA vfs_getattr_nosec() ended up
calling vfs_getattr() in overlayfs, which in turn called
security_inode_getattr() on an exiting process that did not have
current-&gt;fs set anymore, which then caused a kernel NULL pointer
dereference. With this change the call to security_inode_getattr() can
be avoided, thus avoiding the NULL pointer dereference.

Reported-by: &lt;syzbot+a67fc5321ffb4b311c98@syzkaller.appspotmail.com&gt;
Fixes: db1d1e8b9867 ("IMA: use vfs_getattr_nosec to get the i_version")
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;linux-fsdevel@vger.kernel.org&gt;
Cc: Miklos Szeredi &lt;miklos@szeredi.hu&gt;
Cc: Amir Goldstein &lt;amir73il@gmail.com&gt;
Cc: Tyler Hicks &lt;code@tyhicks.com&gt;
Cc: Mimi Zohar &lt;zohar@linux.ibm.com&gt;
Suggested-by: Christian Brauner &lt;brauner@kernel.org&gt;
Co-developed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Stefan Berger &lt;stefanb@linux.ibm.com&gt;
Link: https://lore.kernel.org/r/20231002125733.1251467-1-stefanb@linux.vnet.ibm.com
Reviewed-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>exportfs: allow exporting non-decodeable file handles to userspace</title>
<updated>2023-05-25T11:16:57Z</updated>
<author>
<name>Amir Goldstein</name>
<email>amir73il@gmail.com</email>
</author>
<published>2023-05-02T12:48:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=96b2b072ee62be8ae68c8ecf14854c4d0505a8f8'/>
<id>urn:sha1:96b2b072ee62be8ae68c8ecf14854c4d0505a8f8</id>
<content type='text'>
Some userspace programs use st_ino as a unique object identifier, even
though inode numbers may be recycable.

This issue has been addressed for NFS export long ago using the exportfs
file handle API and the unique file handle identifiers are also exported
to userspace via name_to_handle_at(2).

fanotify also uses file handles to identify objects in events, but only
for filesystems that support NFS export.

Relax the requirement for NFS export support and allow more filesystems
to export a unique object identifier via name_to_handle_at(2) with the
flag AT_HANDLE_FID.

A file handle requested with the AT_HANDLE_FID flag, may or may not be
usable as an argument to open_by_handle_at(2).

To allow filesystems to opt-in to supporting AT_HANDLE_FID, a struct
export_operations is required, but even an empty struct is sufficient
for encoding FIDs.

Acked-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Acked-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Amir Goldstein &lt;amir73il@gmail.com&gt;
Acked-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Message-Id: &lt;20230502124817.3070545-4-amir73il@gmail.com&gt;
</content>
</entry>
<entry>
<title>mm/memfd: add F_SEAL_EXEC</title>
<updated>2023-01-19T01:12:37Z</updated>
<author>
<name>Daniel Verkamp</name>
<email>dverkamp@chromium.org</email>
</author>
<published>2022-12-15T00:12:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6fd7353829cafc4067aad9eea0dc95da67e7df16'/>
<id>urn:sha1:6fd7353829cafc4067aad9eea0dc95da67e7df16</id>
<content type='text'>
Patch series "mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC", v8.

Since Linux introduced the memfd feature, memfd have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting it
differently.

However, in a secure by default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass and
enables “confused deputy attack”.  E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code and
root escalation.  [2] lists more VRP in this kind.

On the other hand, executable memfd has its legit use, runc uses memfd’s
seal and executable feature to copy the contents of the binary then
execute them, for such system, we need a solution to differentiate runc's
use of executable memfds and an attacker's [3].

To address those above, this set of patches add following:
1&gt; Let memfd_create() set X bit at creation time.
2&gt; Let memfd to be sealed for modifying X bit.
3&gt; A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
   X bit.For example, if a container has vm.memfd_noexec=2, then
   memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4&gt; A new security hook in memfd_create(). This make it possible to a new
   LSM, which rejects or allows executable memfd based on its security policy.


This patch (of 5):

The new F_SEAL_EXEC flag will prevent modification of the exec bits:
written as traditional octal mask, 0111, or as named flags, S_IXUSR |
S_IXGRP | S_IXOTH.  Any chmod(2) or similar call that attempts to modify
any of these bits after the seal is applied will fail with errno EPERM.

This will preserve the execute bits as they are at the time of sealing, so
the memfd will become either permanently executable or permanently
un-executable.

Link: https://lkml.kernel.org/r/20221215001205.51969-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20221215001205.51969-2-jeffxu@google.com
Signed-off-by: Daniel Verkamp &lt;dverkamp@chromium.org&gt;
Co-developed-by: Jeff Xu &lt;jeffxu@google.com&gt;
Signed-off-by: Jeff Xu &lt;jeffxu@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Dmitry Torokhov &lt;dmitry.torokhov@gmail.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Jorge Lucangeli Obes &lt;jorgelo@chromium.org&gt;
Cc: Shuah Khan &lt;skhan@linuxfoundation.org&gt;
Cc: David Herrmann &lt;dh.herrmann@gmail.com&gt;
Cc: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
</feed>
