<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/nsproxy.c, branch next/HEAD</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=next%2FHEAD</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=next%2FHEAD'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2026-04-01T13:15:28Z</updated>
<entry>
<title>Merge branch 'namespaces-7.1.misc' into vfs.all</title>
<updated>2026-04-01T13:15:28Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-04-01T13:15:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8370d1a0baa48b0da0821837a1e4643b5440733c'/>
<id>urn:sha1:8370d1a0baa48b0da0821837a1e4643b5440733c</id>
<content type='text'>
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;

# Conflicts:
#	kernel/fork.c
#	kernel/nsproxy.c
</content>
</entry>
<entry>
<title>nsproxy: Add FOR_EACH_NS_TYPE() X-macro and CLONE_NS_ALL</title>
<updated>2026-03-26T14:22:41Z</updated>
<author>
<name>Mickaël Salaün</name>
<email>mic@digikod.net</email>
</author>
<published>2026-03-12T10:04:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=935a04923ad293cd89bf6ec23fc4efc9cf1a0142'/>
<id>urn:sha1:935a04923ad293cd89bf6ec23fc4efc9cf1a0142</id>
<content type='text'>
Introduce the FOR_EACH_NS_TYPE(X) macro as the single source of truth
for the set of (struct type, CLONE_NEW* flag) pairs that define Linux
namespace types.

Currently, the list of CLONE_NEW* flags is duplicated inline in
multiple call sites and would need another copy in each new consumer.
This makes it easy to miss one when a new namespace type is added.

Derive two things from the X-macro:

- CLONE_NS_ALL: Bitmask of all known CLONE_NEW* flags, usable as a
  validity mask or iteration bound.

- ns_common_type(): Rewritten to use the X-macro via a leading-comma
  _Generic pattern, so the struct-to-flag mapping stays in sync with the
  flag set automatically.

Replace the inline flag enumerations in copy_namespaces(),
unshare_nsproxy_namespaces(), check_setns_flags(), and
ksys_unshare() with CLONE_NS_ALL.

When a new namespace type is added, only FOR_EACH_NS_TYPE needs to
be updated; CLONE_NS_ALL, ns_common_type(), and all the call sites
pick up the change automatically.

Cc: Christian Brauner &lt;brauner@kernel.org&gt;
Cc: Günther Noack &lt;gnoack@google.com&gt;
Signed-off-by: Mickaël Salaün &lt;mic@digikod.net&gt;
Link: https://patch.msgid.link/20260312100444.2609563-4-mic@digikod.net
Reviewed-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>namespace: allow creating empty mount namespaces</title>
<updated>2026-03-12T12:33:55Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2026-03-06T16:28:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9d4e752a24f740b31ca827bfab07010e4e7f34b0'/>
<id>urn:sha1:9d4e752a24f740b31ca827bfab07010e4e7f34b0</id>
<content type='text'>
Add support for creating a mount namespace that contains only a copy of
the root mount from the caller's mount namespace, with none of the
child mounts.  This is useful for containers and sandboxes that want to
start with a minimal mount table and populate it from scratch rather
than inheriting and then tearing down the full mount tree.

Two new flags are introduced:

- CLONE_EMPTY_MNTNS for clone3(), using the 64-bit flag space.

- UNSHARE_EMPTY_MNTNS for unshare(), reusing the
  CLONE_PARENT_SETTID bit which has no meaning for unshare.

Both flags imply CLONE_NEWNS.  For the unshare path,
UNSHARE_EMPTY_MNTNS is converted to CLONE_EMPTY_MNTNS in
unshare_nsproxy_namespaces() before it reaches copy_mnt_ns(), so the
mount namespace code only needs to handle a single flag.

In copy_mnt_ns(), when CLONE_EMPTY_MNTNS is set, clone_mnt() is used
instead of copy_tree() to clone only the root mount.  The caller's root
and working directory are both reset to the root dentry of the new
mount.

The cleanup variables are changed from vfsmount pointers with
__free(mntput) to struct path with __free(path_put) because the empty
mount namespace path needs to release both mount and dentry references
when replacing the caller's root and pwd.  In the normal (non-empty)
path only the mount component is set, and dput(NULL) is a no-op so
path_put remains correct there as well.

Link: https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-1-6eb30529bbb0@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>nsproxy: fix free_nsproxy() and simplify create_new_namespaces()</title>
<updated>2025-11-14T12:10:38Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-11-11T21:29:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cefd55bd2159f427228d44864747243946296739'/>
<id>urn:sha1:cefd55bd2159f427228d44864747243946296739</id>
<content type='text'>
Make it possible to handle NULL being passed to the reference count
helpers instead of forcing the caller to handle this. Afterwards we can
nicely allow a cleanup guard to handle nsproxy freeing.

Active reference count handling is not done in nsproxy_free() but rather
in free_nsproxy() as nsproxy_free() is also called from setns() failure
paths where a new nsproxy has been prepared but has not been marked as
active via switch_task_namespaces().

Link: https://lore.kernel.org/690bfb9e.050a0220.2e3c35.0013.GAE@google.com
Link: https://patch.msgid.link/20251111-sakralbau-guthaben-7dcc277d337f@brauner
Fixes: 3c9820d5c64a ("ns: add active reference count")
Reported-by: syzbot+0b2e79f91ff6579bfa5b@syzkaller.appspotmail.com
Reported-by: syzbot+0a8655a80e189278487e@syzkaller.appspotmail.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>ns: add active reference count</title>
<updated>2025-11-03T16:41:17Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-10-29T12:20:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3a18f809184bc5a1cfad7cde5b8b026e2ff61587'/>
<id>urn:sha1:3a18f809184bc5a1cfad7cde5b8b026e2ff61587</id>
<content type='text'>
The namespace tree is, among other things, currently used to support
file handles for namespaces. When a namespace is created it is placed on
the namespace trees and when it is destroyed it is removed from the
namespace trees.

While a namespace is on the namespace trees with a valid reference count
it is possible to reopen it through a namespace file handle. This is all
fine but has some issues that should be addressed.

On current kernels a namespace is visible to userspace in the
following cases:

(1) The namespace is in use by a task.
(2) The namespace is persisted through a VFS object (namespace file
    descriptor or bind-mount).
    Note that (2) only cares about direct persistence of the namespace
    itself not indirectly via e.g., file-&gt;f_cred file references or
    similar.
(3) The namespace is a hierarchical namespace type and is the parent of
    a single or multiple child namespaces.

Case (3) is interesting because it is possible that a parent namespace
might not fulfill any of (1) or (2), i.e., is invisible to userspace but
it may still be resurrected through the NS_GET_PARENT ioctl().

Currently namespace file handles allow much broader access to namespaces
than what is currently possible via (1)-(3). The reason is that
namespaces may remain pinned for completely internal reasons yet are
inaccessible to userspace.

For example, a user namespace my remain pinned by get_cred() calls to
stash the opener's credentials into file-&gt;f_cred. As it stands file
handles allow to resurrect such a users namespace even though this
should not be possible via (1)-(3). This is a fundamental uapi change
that we shouldn't do if we don't have to.

Consider the following insane case: Various architectures support the
CONFIG_MMU_LAZY_TLB_REFCOUNT option which uses lazy TLB destruction.
When this option is set a userspace task's struct mm_struct may be used
for kernel threads such as the idle task and will only be destroyed once
the cpu's runqueue switches back to another task. But because of ptrace()
permission checks struct mm_struct stashes the user namespace of the
task that struct mm_struct originally belonged to. The kernel thread
will take a reference on the struct mm_struct and thus pin it.

So on an idle system user namespaces can be persisted for arbitrary
amounts of time which also means that they can be resurrected using
namespace file handles. That makes no sense whatsoever. The problem is
of course excarabted on large systems with a huge number of cpus.

To handle this nicely we introduce an active reference count which
tracks (1)-(3). This is easy to do as all of these things are already
managed centrally. Only (1)-(3) will count towards the active reference
count and only namespaces which are active may be opened via namespace
file handles.

The problem is that namespaces may be resurrected. Which means that they
can become temporarily inactive and will be reactived some time later.
Currently the only example of this is the SIOGCSKNS socket ioctl. The
SIOCGSKNS ioctl allows to open a network namespace file descriptor based
on a socket file descriptor.

If a socket is tied to a network namespace that subsequently becomes
inactive but that socket is persisted by another process in another
network namespace (e.g., via SCM_RIGHTS of pidfd_getfd()) then the
SIOCGSKNS ioctl will resurrect this network namespace.

So calls to open_related_ns() and open_namespace() will end up
resurrecting the corresponding namespace tree.

Note that the active reference count does not regulate the lifetime of
the namespace itself. This is still done by the normal reference count.
The active reference count can only be elevated if the regular reference
count is elevated.

The active reference count also doesn't regulate the presence of a
namespace on the namespace trees. It only regulates its visiblity to
namespace file handles (and in later patches to listns()).

A namespace remains on the namespace trees from creation until its
actual destruction. This will allow the kernel to always reach any
namespace trivially and it will also enable subsystems like bpf to walk
the namespace lists on the system for tracing or general introspection
purposes.

Note that different namespaces have different visibility lifetimes on
current kernels. While most namespace are immediately released when the
last task using them exits, the user- and pid namespace are persisted
and thus both remain accessible via /proc/&lt;pid&gt;/ns/&lt;ns_type&gt;.

The user namespace lifetime is aliged with struct cred and is only
released through exit_creds(). However, it becomes inaccessible to
userspace once the last task using it is reaped, i.e., when
release_task() is called and all proc entries are flushed. Similarly,
the pid namespace is also visible until the last task using it has been
reaped and the associated pid numbers are freed.

The active reference counts of the user- and pid namespace are
decremented once the task is reaped.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-11-2e6f823ebdc0@kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>ns: rename to exit_nsproxy_namespaces()</title>
<updated>2025-11-03T16:41:17Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-10-29T12:20:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b06b70c8244b442d58ae0fb59870cf31fdb422e'/>
<id>urn:sha1:4b06b70c8244b442d58ae0fb59870cf31fdb422e</id>
<content type='text'>
The current naming is very misleading as this really isn't exiting all
of the task's namespaces. It is only exiting the namespaces that hang of
off nsproxy. Reflect that in the name.

Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-10-2e6f823ebdc0@kernel.org
Reviewed-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2025-09-29T18:20:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-09-29T18:20:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=18b19abc3709b109676ffd1f48dcd332c2e477d4'/>
<id>urn:sha1:18b19abc3709b109676ffd1f48dcd332c2e477d4</id>
<content type='text'>
Pull namespace updates from Christian Brauner:
 "This contains a larger set of changes around the generic namespace
  infrastructure of the kernel.

  Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
  ns_common which carries the reference count of the namespace and so
  on.

  We open-coded and cargo-culted so many quirks for each namespace type
  that it just wasn't scalable anymore. So given there's a bunch of new
  changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every
  namespace uniformly and derive the correct initialization settings
  from the type of the namespace such as namespace operations, namespace
  type and so on. This leaves the new ns_common_init() function with a
  single parameter which is the specific namespace type which derives
  the correct parameters statically. This also means the compiler will
  yell as soon as someone does something remotely fishy.

  The ns_common_init() addition also allows us to remove ns_alloc_inum()
  and drops any special-casing of the initial network namespace in the
  network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference
  counting was open-coded and copy-pasted for each namespace type even
  though they all followed the same rules. This also removes all open
  accesses to the reference count and makes it private and only uses a
  very small set of dedicated helpers to manipulate them just like we do
  for e.g., files.

  In addition this generalizes the mount namespace iteration
  infrastructure introduced a few cycles ago. As reminder, the vfs makes
  it possible to iterate sequentially and bidirectionally through all
  mount namespaces on the system or all mount namespaces that the caller
  holds privilege over. This allow userspace to iterate over all mounts
  in all mount namespaces using the listmount() and statmount() system
  call.

  Each mount namespace has a unique identifier for the lifetime of the
  systems that is exposed to userspace. The network namespace also has a
  unique identifier working exactly the same way. This extends the
  concept to all other namespace types.

  The new nstree type makes it possible to lookup namespaces purely by
  their identifier and to walk the namespace list sequentially and
  bidirectionally for all namespace types, allowing userspace to iterate
  through all namespaces. Looking up namespaces in the namespace tree
  works completely locklessly.

  This also means we can move the mount namespace onto the generic
  infrastructure and remove a bunch of code and members from struct
  mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for
  now this uses the generic namespace tree to extend a concept
  introduced first for pidfs a few cycles ago. For a while now we have
  supported pidfs file handles for pidfds. This has proven to be very
  useful.

  This extends the concept to cover namespaces as well. It is possible
  to encode and decode namespace file handles using the common
  name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive,
  meaning it is not required to actually hold a reference to nsfs in
  able to decode aka open_by_handle_at() a namespace file handle.
  Instead the FD_NSFS_ROOT constant can be passed which will let the
  kernel grab a reference to the root of nsfs internally and thus decode
  the file handle.

  Namespaces file descriptors can already be derived from pidfds which
  means they aren't subject to overmount protection bugs. IOW, it's
  irrelevant if the caller would not have access to an appropriate
  /proc/&lt;pid&gt;/ns/ directory as they could always just derive the
  namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for
  the lifetime of the system refer to a namespace without pinning any
  resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the
  namespace the file handle refers to they are able to open it otherwise
  they must hold privilege over the owning namespace of the relevant
  namespace.

  The namespace file handle layout is exposed as uapi and has a stable
  and extensible format. For now it simply contains the namespace
  identifier, the namespace type, and the inode number. The stable
  format means that userspace may construct its own namespace file
  handles without going through name_to_handle_at() as they are already
  allowed for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...
</content>
</entry>
<entry>
<title>ns: move ns type into struct ns_common</title>
<updated>2025-09-25T07:23:54Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2025-09-24T11:33:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4055526d35746ce8b04bfa5e14e14f28bb163186'/>
<id>urn:sha1:4055526d35746ce8b04bfa5e14e14f28bb163186</id>
<content type='text'>
It's misplaced in struct proc_ns_operations and ns-&gt;ops might be NULL if
the namespace is compiled out but we still want to know the type of the
namespace for the initial namespace struct.

Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>copy_process: pass clone_flags as u64 across calltree</title>
<updated>2025-09-01T13:31:34Z</updated>
<author>
<name>Simon Schuster</name>
<email>schuster.simon@siemens-energy.com</email>
</author>
<published>2025-09-01T13:09:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=edd3cb05c00a040dc72bed20b14b5ba865188bce'/>
<id>urn:sha1:edd3cb05c00a040dc72bed20b14b5ba865188bce</id>
<content type='text'>
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add
clone3") the effective bit width of clone_flags on all architectures was
increased from 32-bit to 64-bit, with a new type of u64 for the flags.
However, for most consumers of clone_flags the interface was not
changed from the previous type of unsigned long.

While this works fine as long as none of the new 64-bit flag bits
(CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still
undesirable in terms of the principle of least surprise.

Thus, this commit fixes all relevant interfaces of callees to
sys_clone3/copy_process (excluding the architecture-specific
copy_thread) to consistently pass clone_flags as u64, so that
no truncation to 32-bit integers occurs on 32-bit architectures.

Signed-off-by: Simon Schuster &lt;schuster.simon@siemens-energy.com&gt;
Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com
Acked-by: David Hildenbrand &lt;david@redhat.com&gt;
Reviewed-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reviewed-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
<entry>
<title>kernel/nsproxy: remove unnecessary guards</title>
<updated>2025-05-09T11:13:54Z</updated>
<author>
<name>Joel Savitz</name>
<email>jsavitz@redhat.com</email>
</author>
<published>2025-05-08T18:49:29Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5caa2d89b7f1c1249ca7cc47e10930d2cdabd8a5'/>
<id>urn:sha1:5caa2d89b7f1c1249ca7cc47e10930d2cdabd8a5</id>
<content type='text'>
In free_nsproxy() and the error path of create_new_namesapces() the
put_*_ns() calls are guarded by unnecessary NULL checks.

put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never
receive a NULL argument unless their namespace type is disabled, and in
this case all four become no-ops at compile time anyway. put_mnt_ns()
will never receive a null argument at any time.

This unguarded usage is in line with other call sites of put_*_ns().

Signed-off-by: Joel Savitz &lt;jsavitz@redhat.com&gt;
Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
</entry>
</feed>
