| Age | Commit message | Author |
|
This patch fixes a race found by Ram in mark_mounts_for_expiry() in
fs/namespace.c.
The bug can only be triggered with simultaneous exiting of a process having
a private namespace, and expiry of a mount from within that namespace.
It's practically impossible to trigger, and I haven't even tried. But
still, a bug is a bug.
The race happens when put_namespace() is called by another task, while
mark_mounts_for_expiry() is between atomic_read() and get_namespace(). In
that case get_namespace() will be called on an already dead namespace with
unforeseeable results.
The solution was suggested by Al Viro, with his own words:
Instead of screwing with atomic_read() in there, why don't we
simply do the following:
    a) atomic_dec_and_lock() in put_namespace()
    b) __put_namespace() called without dropping lock
    c) the first thing done by __put_namespace would be
           struct vfsmount *root = namespace->root;
           namespace->root = NULL;
           spin_unlock(...);
           ....
           umount_tree(root);
           ...
    d) check in mark_... would be simply namespace && namespace->root.
And we are all set; no screwing around with atomic_read(), no magic
at all. Dying namespace gets NULL ->root.
All changes of ->root happen under spinlock.
If under a spinlock we see non-NULL ->mnt_namespace, it won't be
freed until we drop the lock (we will set ->mnt_namespace to NULL
under that lock before we get to freeing namespace).
If under a spinlock we see non-NULL ->mnt_namespace and
->mnt_namespace->root, we can grab a reference to namespace and be
sure that it won't go away.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
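
The locking scheme quoted in the race fix above can be modelled in userspace.
What follows is only a sketch under stated assumptions, not the kernel code:
the sketch_* names are hypothetical, a C11 atomic_flag stands in for the
vfsmount spinlock, and umount_tree() is reduced to clearing one back pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct namespace and struct vfsmount. */
struct sketch_ns;
struct sketch_mnt {
    struct sketch_ns *mnt_namespace;  /* cleared under the lock on teardown */
};
struct sketch_ns {
    atomic_int count;                 /* reference count */
    struct sketch_mnt *root;          /* NULL once the namespace is dying */
};

/* A trivial spinlock standing in for the kernel's vfsmount_lock. */
static atomic_flag sketch_lock_flag = ATOMIC_FLAG_INIT;

static void sketch_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&sketch_lock_flag,
                                             memory_order_acquire))
        ;  /* spin */
}

static void sketch_unlock(void)
{
    atomic_flag_clear_explicit(&sketch_lock_flag, memory_order_release);
}

/* Model of atomic_dec_and_lock(): true means count hit zero, lock held. */
static bool sketch_dec_and_lock(atomic_int *count)
{
    int old = atomic_load(count);

    /* Fast path: drop a reference locklessly while others remain. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
    }
    /* Slow path: the final decrement is done under the lock. */
    sketch_lock();
    if (atomic_fetch_sub(count, 1) == 1)
        return true;
    sketch_unlock();
    return false;
}

static struct sketch_ns *sketch_new_ns(struct sketch_mnt *root)
{
    struct sketch_ns *ns = malloc(sizeof(*ns));

    if (!ns)
        return NULL;
    atomic_init(&ns->count, 1);
    ns->root = root;
    root->mnt_namespace = ns;
    return ns;
}

/* Steps (a) to (c). Returns true if this call tore the namespace down. */
static bool sketch_put_ns(struct sketch_ns *ns)
{
    if (!sketch_dec_and_lock(&ns->count))
        return false;

    struct sketch_mnt *root = ns->root;
    ns->root = NULL;                  /* dying namespace gets NULL ->root */
    sketch_unlock();

    /* Stand-in for umount_tree(root): detach the mount under the lock. */
    sketch_lock();
    if (root)
        root->mnt_namespace = NULL;
    sketch_unlock();

    free(ns);
    return true;
}

/* Step (d): take a reference only if the namespace is alive. */
static struct sketch_ns *sketch_try_get_ns(struct sketch_mnt *mnt)
{
    struct sketch_ns *ns = NULL;

    sketch_lock();
    if (mnt->mnt_namespace && mnt->mnt_namespace->root) {
        ns = mnt->mnt_namespace;
        atomic_fetch_add(&ns->count, 1);
    }
    sketch_unlock();
    return ns;
}
```

Because the final decrement and the clearing of ->root both happen under the
lock, a reader holding the lock sees either a live namespace it can safely
pin or a NULL pointer, never a namespace whose count has already hit zero.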
|
|
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Here's a patch that I worked out with Al Viro that adds support for a
filesystem (such as kAFS) to perform automounting intrinsically without the
need for a userspace daemon. It also adds support for such mountpoints to be
degraded at the filesystem's behest until they've been untouched long enough
that they'll be removed.
I've a patch (to follow) that removes some #ifdef's from fs/afs/* thus
allowing it to make use of this facility.
There are five pieces to this:
(1) Any interested filesystem needs to have at least one list to which
expirable mountpoints can be added.
Access to this list is governed by the vfsmount_lock.
(2) When a filesystem wants to create an expirable mount, it calls
do_kern_mount() to get a handle on the filesystem it wants mounting, and
then calls do_add_mount() to mount that filesystem on the designated
mountpoint, supplying the list mentioned in (1) to which the vfsmount
will be added.
In kAFS's case, the mountpoint is a directory with a follow_link() method
defined (fs/afs/mntpt.c). This uses the struct nameidata supplied as an
argument to determine where the new filesystem should be mounted.
(3) When something using a vfsmount finishes dealing with it, it calls
mntput(). This unmarks the vfsmount for immediate expiry.
There are two criteria for determining if a vfsmount may be expired - it
mustn't be marked as in use for anything other than being a child of
another vfsmount, and it must have an expiry mark against it already.
(4) The filesystem then determines the policy on expiring the mounts created
in (2). When it feels the need to, it passes the list mentioned in (1) to
mark_mounts_for_expiry() to request everything on the list be expired.
This function examines each mount listed. If the vfsmount meets the
criteria mentioned in (3), then the vfsmount is deleted from the
namespace and disposed of as for unmounting; otherwise the vfsmount is
left untouched apart from now bearing an expiration mark if it didn't
before.
kAFS's expiration policy is simply to invoke this process at regular
intervals for all the mounts on its list.
(5) An expiration facility is also provided to userspace: by calling umount()
with a MNT_EXPIRE flag, it can make a request to unmount only if the
mountpoint hasn't been used since the last request and isn't in use now.
This allows expiration to be driven by userspace instead of by the
kernel if that is desirable.
This also means that do_umount() has to use a different version of
path_release() to everyone else... it can't call mntput() as that clears
the expiration flag, thus rendering this unachievable; so its version of
path_release() calls _mntput(), which doesn't do the clear.
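
The mark-then-expire behaviour described in (3) and (4) can be illustrated
with a small userspace model. This is a sketch under stated assumptions, not
the kernel implementation: every sketch_* name is hypothetical, the list is a
plain array, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vfsmount on a filesystem's expiry list. */
struct sketch_mount {
    int users;          /* uses other than being a child of another mount */
    bool expiry_mark;   /* set by an expiry pass, cleared by normal use */
    bool mounted;
};

static void sketch_mntget(struct sketch_mount *m)
{
    m->users++;
}

/* Models _mntput(): drops the reference but leaves the mark alone. */
static void sketch_mntput_nomark(struct sketch_mount *m)
{
    m->users--;
}

/* Models mntput(): dropping a reference also clears the expiry mark. */
static void sketch_mntput(struct sketch_mount *m)
{
    m->expiry_mark = false;
    sketch_mntput_nomark(m);
}

/*
 * Models one pass of mark_mounts_for_expiry(): a mount that is unused and
 * already marked is removed; anything else is left marked for next time.
 * Returns the number of mounts expired by this pass.
 */
static size_t sketch_expire_pass(struct sketch_mount *list, size_t n)
{
    size_t expired = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_mount *m = &list[i];

        if (!m->mounted)
            continue;
        if (m->users == 0 && m->expiry_mark) {
            m->mounted = false;
            expired++;
        } else {
            m->expiry_mark = true;
        }
    }
    return expired;
}
```

In this model a mount survives for as long as something touches it between
two consecutive passes, which is the policy kAFS gets by running the pass at
regular intervals.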
My original idea was to give the kernel more knowledge of automounted
things. This avoids a certain problem with stat() on a mountpoint causing it
to mount (for example, do "ls -l /afs" on a machine with kAFS), but Al wanted
it done this way.
> Why is autofs unsuitable?
Because:
(1) Autofs is flat; AFS requires a tree - mounts on mounts on mounts on
mounts...
(2) AFS holds the data as to what the mountpoints are and where they go, and
these may be cross-links to subtrees beyond your control. It's also not
trivial to extract a list of mountpoints as is required for autofs.
(3) Autofs is not namespace safe.
(4) Ducking back to userspace to get that to do the mount is pretty tricky if
namespaces are involved.
In fact, autofs may well want to make use of this facility.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
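
For the userspace side mentioned in (5) of the automount commit above, a
minimal sketch might look like the following. It assumes a Linux system where
umount2() and MNT_EXPIRE are exposed through <sys/mount.h>, and it assumes the
behaviour documented in the current umount2(2) man page, where the first
request on an idle mount fails with EAGAIN and only marks it. The helper name
request_expiry() is hypothetical, and the call needs mount privileges.

```c
#include <errno.h>
#include <sys/mount.h>

/*
 * Ask the kernel to expire the mount at `path`.
 * Returns 0 if it was unmounted, 1 if it was only marked as expired
 * (EAGAIN), and -1 on any other error (for example EBUSY or EPERM).
 */
static int request_expiry(const char *path)
{
    if (umount2(path, MNT_EXPIRE) == 0)
        return 0;
    if (errno == EAGAIN)
        return 1;
    return -1;
}
```

A userspace expiry daemon could call such a helper periodically; a mount that
is used between two calls loses its mark and so is not unmounted.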
|
|
It has five callsites, and is big.
|
|
This fixes one place where I missed replacing dcache_lock with
vfsmount_lock in put_namespace().
Tested with CLONE_NEWNS flag also.
|
|
(i) The prototypes for free_vfsmnt(), alloc_vfsmnt(), do_kern_mount()
have so far appeared in several individual .c files. Now they are in
<linux/mount.h>.
(ii) do_kern_mount() has a third argument name that is typically a
constant. It is called with "rootfs", "nfsd", type->name,
"capifs", "usbdevfs", "binfmt_misc" etc. So, it should have a
prototype that expresses this:
do_kern_mount(const char *fstype, int flags, const char *name, void *data);
This makes the ugly cast
- return do_kern_mount(type->name, 0, (char *)type->name, NULL);
+ return do_kern_mount(type->name, 0, type->name, NULL);
go away. Now do_kern_mount() calls type->get_sb(), so also get_sb()
must have a const third argument. That is what the patch below does.
If I am not mistaken, precisely two filesystems do not treat this
argument as a constant, namely afs and cifs. A separate patch
gives some cleanup there.
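
The effect of the const-qualified prototype can be seen in a small standalone
sketch. The sketch_* names below are hypothetical stand-ins, not the kernel's
definitions; only the shape of the argument list follows the prototype given
above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct file_system_type; only name matters. */
struct sketch_fs_type {
    const char *name;
};

/* Stand-in for do_kern_mount() with a const third argument. */
static size_t sketch_kern_mount(const char *fstype, int flags,
                                const char *name, void *data)
{
    (void)flags;
    (void)data;
    /* The name is only read, so a const pointer is all that is needed. */
    return strlen(fstype) + strlen(name);
}

static size_t sketch_mount_by_type(const struct sketch_fs_type *type)
{
    /* No (char *) cast: type->name already matches the parameter type. */
    return sketch_kern_mount(type->name, 0, type->name, NULL);
}
```

With a non-const third parameter the same call would discard the const
qualifier, which is why the original code needed the cast.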
|
|
This one slipped through the last fix for the redeclarations I sent;
please apply this on top of the other one.
Description:
umount_tree() is only used in namespace.[ch], so its declaration
belongs in namespace.h and not fs.h
|
|
Since namespace.h needs the contents of dcache, task struct and
semaphores, it seems sensible to include these two files into
namespace.h.
For the future: If the task_struct in sched.h is split into its own
include file, namespace.h could include this file, but namespace.h
will also need asm/semaphore.h
|
|
Big bits first, I'll redo the smaller bits tomorrow after some sleep.
Same as last time, rediffed against pre5
|
|
- Al Viro: task-private namespaces, more cleanups
|