| Age | Commit message | Author |
|
Turn noatime and nodiratime into per-mount instead of per-sb flags.
After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for a getattr optimization.
While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
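The check described above (per-mount MNT_ flags consulted in addition to the remaining MS_ flags and the inode's S_NOATIME) can be sketched in user-space C. This is only an illustration, not the kernel's touch_atime(): the SK_ constants, their bit values, struct sk_access and both functions are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; the kernel headers define the real ones. */
#define SK_S_NOATIME      0x01u /* inode flag: never touch atime (nfs sets this) */
#define SK_MS_NOATIME     0x02u /* superblock flag: filesystem is always noatime */
#define SK_MS_NODIRATIME  0x04u /* superblock flag: filesystem is always nodiratime */
#define SK_MNT_NOATIME    0x08u /* per-mount flag: this mount was made noatime */
#define SK_MNT_NODIRATIME 0x10u /* per-mount flag: this mount was made nodiratime */

/* Hypothetical bundle of the state such a check would consult. */
struct sk_access {
    unsigned inode_flags;
    unsigned sb_flags;
    unsigned mnt_flags;
    bool is_dir;
};

/* Returns true when an access through this mount should update atime. */
static bool sk_should_update_atime(const struct sk_access *a)
{
    assert(a != NULL);
    if (a->inode_flags & SK_S_NOATIME)
        return false;
    if ((a->sb_flags & SK_MS_NOATIME) || (a->mnt_flags & SK_MNT_NOATIME))
        return false;
    if (a->is_dir &&
        ((a->sb_flags & SK_MS_NODIRATIME) || (a->mnt_flags & SK_MNT_NODIRATIME)))
        return false;
    return true;
}

/* The nfs case: S_NOATIME is set on every inode, so only the mount flag
 * tells whether the user really asked for noatime. */
static bool sk_is_real_noatime_mount(const struct sk_access *a)
{
    assert(a != NULL);
    return (a->mnt_flags & SK_MNT_NOATIME) != 0;
}
```

With the flags kept per mount, two mounts of the same superblock can give different answers from the same inode, which a per-superblock flag cannot express.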
|
|
Small cleanups in shared mounts code.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
An unbindable mount does not forward or receive propagation. In addition,
an unbindable mount cannot be bind mounted. The semantics are as follows.
Bind semantics:
It is invalid to bind mount an unbindable mount.
Move semantics:
It is invalid to move an unbindable mount under shared mount.
Clone-namespace semantics:
If a mount is unbindable in the parent namespace, the corresponding
cloned mount in the child namespace becomes unbindable too. Note:
there is a subtle difference: unbindable mounts cannot be bind mounted
but can be cloned during clone-namespace.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
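The three rules above can be written down as small predicates. This is a hedged user-space model, not the kernel code: enum sk_prop and the sk_ functions are hypothetical names, and the model only encodes what the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical propagation types for this sketch. */
enum sk_prop { SK_PRIVATE, SK_SHARED, SK_SLAVE, SK_UNBINDABLE };

/* Bind semantics: an unbindable mount may not be the source of a bind mount. */
static bool sk_may_bind(enum sk_prop src)
{
    assert(src >= SK_PRIVATE && src <= SK_UNBINDABLE);
    return src != SK_UNBINDABLE;
}

/* Move semantics: an unbindable mount may not be moved under a shared mount. */
static bool sk_may_move(enum sk_prop src, enum sk_prop dest_parent)
{
    assert(src >= SK_PRIVATE && src <= SK_UNBINDABLE);
    return !(src == SK_UNBINDABLE && dest_parent == SK_SHARED);
}

/* Clone-namespace semantics: cloning is always allowed, and the clone of an
 * unbindable mount is itself unbindable. */
static bool sk_clone_is_unbindable(enum sk_prop src)
{
    assert(src >= SK_PRIVATE && src <= SK_UNBINDABLE);
    return src == SK_UNBINDABLE;
}
```

The asymmetry noted in the message shows up directly: sk_may_bind() rejects an unbindable source, while the clone path has no rejection case at all.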
|
|
A slave mount always has a master mount from which it receives
mount/umount events. Unlike a shared mount, a slave mount does not
propagate events back to its master.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
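The one-way flow from master to slave, as opposed to the two-way flow between shared peers, can be modelled with a single predicate. This is a simplified one-hop sketch under stated assumptions: struct sk_mount, its integer group ids and sk_event_reaches() are hypothetical, and transitive propagation is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a mount is described by two group ids. */
struct sk_mount {
    int peer_group; /* nonzero: shared, member of this peer group */
    int master;     /* nonzero: slave of this peer group */
};

/* Does a mount/umount event on 'from' reach 'to'? */
static bool sk_event_reaches(const struct sk_mount *from, const struct sk_mount *to)
{
    assert(from != NULL && to != NULL);
    if (from->peer_group == 0)
        return false;                      /* only shared mounts forward events */
    if (to->peer_group == from->peer_group)
        return true;                       /* peers propagate to each other */
    return to->master == from->peer_group; /* slaves only receive */
}
```

A mount with both fields zero neither forwards nor receives, which matches the description of a private mount elsewhere in this log.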
|
|
This creates shared mounts. When a shared mount is bind-mounted to some
mountpoint, the original and the new mount propagate mount/umount events
to each other. All the shared mounts that propagate events to each other
belong to the same peer-group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A private mount does not forward or receive propagation. This patch
gives the user the ability to convert any mount to private.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and
a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody does anything funny just as we turn quota off
Moreover, LSM provides hooks for doing the same sort of broken logic.
The proper way to deal with that is to introduce a second kind of
reference to vfsmount. Semantics:
- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).
The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...
The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.
quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
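The two kinds of reference described above can be modelled with two counters. This is a single-threaded user-space sketch, not the kernel implementation: struct sk_vfsmount, its fields and the sk_ functions are hypothetical, locking is omitted, and the inner loop merely stands in for the holders (process accounting, an LSM) dropping their references from the cleanup hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vfsmount with normal and special references. */
struct sk_vfsmount {
    int normal;  /* ordinary references; these keep the mount busy */
    int special; /* special references; these must not obstruct umount */
    int closing; /* set once the point of no return has been passed */
    int freed;   /* set once the last reference is gone */
};

/* A normal reference can be cloned into a special one. */
static void sk_pin(struct sk_vfsmount *m)
{
    assert(m != NULL && m->normal > 0);
    m->special++;
}

/* A special reference can be converted back to a normal one; this is a
 * no-op if the special references have already been converted. */
static void sk_unpin(struct sk_vfsmount *m)
{
    assert(m != NULL);
    if (m->special > 0) {
        m->special--;
        m->normal++;
    }
}

/* Drop a normal reference. */
static void sk_put(struct sk_vfsmount *m)
{
    assert(m != NULL && m->normal > 0);
    if (--m->normal > 0)
        return;
    if (m->special == 0) {
        m->freed = 1;
        return;
    }
    /* Last normal reference gone: convert special ones and run cleanup. */
    m->normal = m->special;
    m->special = 0;
    m->closing = 1;
    /* Stand-in for the cleanup hooks: each former special holder drops
     * its now-normal reference. */
    while (m->normal > 0)
        sk_put(m);
}
```

In this model a pinned holder never delays the final sk_put(); it is simply told to let go once the last normal reference disappears.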
|
|
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch renames _mntput() to something a little more descriptive:
mntput_no_expire().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch renames vfsmount->mnt_fslink to something a little more
descriptive: vfsmount->mnt_expire.
Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
<linux/mount.h> uses atomic_t and spinlock_t, but doesn't include either
<asm/atomic.h> or <linux/spinlock.h>, which means that any users of
<linux/mount.h> have to include them. This patch adds the necessary
#includes to avoid this.
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Here's a patch that I worked out with Al Viro that adds support for a
filesystem (such as kAFS) to perform automounting intrinsically without the
need for a userspace daemon. It also adds support for such mountpoints to be
degraded at the filesystem's behest until they've been untouched long enough
that they'll be removed.
I've a patch (to follow) that removes some #ifdef's from fs/afs/* thus
allowing it to make use of this facility.
There are five pieces to this:
(1) Any interested filesystem needs to have at least one list to which
expirable mountpoints can be added.
Access to this list is governed by the vfsmount_lock.
(2) When a filesystem wants to create an expirable mount, it calls
do_kern_mount() to get a handle on the filesystem it wants mounting, and
then calls do_add_mount() to mount that filesystem on the designated
mountpoint, supplying the list mentioned in (1) to which the vfsmount
will be added.
In kAFS's case, the mountpoint is a directory with a follow_link() method
defined (fs/afs/mntpt.c). This uses the struct nameidata supplied as an
argument as a determination of where the new filesystem should be
mounted.
(3) When something using a vfsmount finishes dealing with it, it calls
mntput(). This unmarks the vfsmount for immediate expiry.
There are two criteria for determining if a vfsmount may be expired - it
mustn't be marked as in use for anything other than being a child of
another vfsmount, and it must have an expiry mark against it already.
(4) The filesystem then determines the policy on expiring the mounts created
in (2). When it feels the need to, it passes the list mentioned in (1) to
mark_mounts_for_expiry() to request everything on the list be expired.
This function examines each mount listed. If the vfsmount meets the
criteria mentioned in (3), then the vfsmount is deleted from the
namespace and disposed of as for unmounting; otherwise the vfsmount is
left untouched apart from now bearing an expiration mark if it didn't
before.
kAFS's expiration policy is simply to invoke this process at regular
intervals for all the mounts on its list.
(5) An expiration facility is also provided to userspace: by calling umount()
with a MNT_EXPIRE flag, it can make a request to unmount only if the
mountpoint hasn't been used since the last request and isn't in use now.
This allows expiration to be driven by userspace instead of by the
kernel if that is desirable.
This also means that do_umount() has to use a different version of
path_release() to everyone else... it can't call mntput() as that clears
the expiration flag, thus rendering this unachievable; so its version of
path_release() calls _mntput(), which doesn't do the clear.
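The mark-then-expire protocol in (3) to (5) can be sketched as a small user-space model. It is an illustration under stated assumptions only: struct sk_mnt, its fields and the sk_ functions are hypothetical, and the real mark_mounts_for_expiry() works on kernel lists under vfsmount_lock rather than on an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an expirable mount. */
struct sk_mnt {
    int users;        /* uses other than being a child of another vfsmount */
    bool expiry_mark; /* set by an expiry pass, cleared by sk_mntput() */
    bool mounted;
};

static void sk_mntget(struct sk_mnt *m)
{
    assert(m != NULL && m->mounted);
    m->users++;
}

/* Finishing with a mount clears its expiry mark, as mntput() is described
 * as doing in (3). */
static void sk_mntput(struct sk_mnt *m)
{
    assert(m != NULL && m->users > 0);
    m->users--;
    m->expiry_mark = false;
}

/* One expiry pass over the list: unmount what is idle and already marked,
 * mark everything else. Returns the number of mounts expired. */
static size_t sk_mark_for_expiry(struct sk_mnt *list, size_t n)
{
    size_t expired = 0;
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!list[i].mounted)
            continue;
        if (list[i].users == 0 && list[i].expiry_mark) {
            list[i].mounted = false;
            expired++;
        } else {
            list[i].expiry_mark = true;
        }
    }
    return expired;
}
```

A mount therefore needs two consecutive passes without any intervening use before it goes away, which is why a release path that must not reset the mark needs a put variant that leaves it alone.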
My original idea was to give the kernel more knowledge of automounted
things. This avoids a certain problem with stat() on a mountpoint causing it
to mount (for example, do "ls -l /afs" on a machine with kAFS), but Al wanted
it done this way.
> Why is autofs unsuitable?
Because:
(1) Autofs is flat; AFS requires a tree - mounts on mounts on mounts on
mounts...
(2) AFS holds the data as to what the mountpoints are and where they go, and
these may be cross-links to subtrees beyond your control. It's also not
trivial to extract a list of mountpoints as is required for autofs.
(3) Autofs is not namespace safe.
(4) Ducking back to userspace to get that to do the mount is pretty tricky if
namespaces are involved.
In fact, autofs may well want to make use of this facility.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Maneesh Soni <maneesh@in.ibm.com>
While path walking we do follow_mount or follow_down which uses
dcache_lock for serialisation. vfsmount related operations also use
dcache_lock for all updates. I think we can use a separate lock for
vfsmount related work and can improve path walking.
The following two patches do just that. The first one replaces
dcache_lock with new vfsmount_lock in namespace.c. The lock is
local to namespace.c and is not required outside. The second patch
uses RCU to have lock free lookup_mnt(). The patches are quite simple
and straightforward.
The lockmeter results show reduced contention and fewer lock acquisitions
for dcache_lock while running dcachebench* on a 4-way SMP box:
SPINLOCKS         HOLD            WAIT
   UTIL   CON   MEAN(  MAX )   MEAN(  MAX )(% CPU)     TOTAL NOWAIT  SPIN RJECT  NAME
baselkm-2569:
  20.7% 20.9%  0.5us( 146us)  2.9us( 144us)(0.81%)  31590840  79.1% 20.9%    0%  dcache_lock
mntlkm-2569:
  14.3% 13.6%  0.4us( 170us)  2.9us( 187us)(0.42%)  23071746  86.4% 13.6%    0%  dcache_lock
We get more than 8% improvement on 4-way SMP and 44% improvement on 16-way
NUMAQ while running dcachebench*.
                Average (usecs/iteration)   Std. Deviation
                (lower is better)
4-way SMP
  2.5.69              15739.3                   470.90
  2.5.69-mnt          14459.6                   298.51
16-way NUMAQ
  2.5.69             120426.5                   363.78
  2.5.69-mnt          63225.8                   427.60
*dcachebench is a microbenchmark written by Bill Hartner and is available at
http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html
vfsmount_lock.patch
-------------------
- Patch for replacing dcache_lock with new vfsmount_lock for all mount
related operation. This removes the need to take dcache_lock while
doing follow_mount or follow_down operations in path walking.
I re-ran dcachebench with 2.5.70 as base on 16-way NUMAQ box.
                          Average (usecs/iteration)   Std. Deviation
                          (lower is better)
16-way NUMAQ
  2.5.70                       120710.9                   230.67
  + vfsmount_lock.patch         65209.6                   242.97
  + lookup_mnt-rcu.patch        64042.3                   416.61
So just the lock splitting (vfsmount_lock.patch) gives almost similar benefits
|
|
(i) The prototypes for free_vfsmnt(), alloc_vfsmnt(), do_kern_mount()
were so far duplicated in several individual .c files. Now they are in
<linux/mount.h>.
(ii) do_kern_mount() has a third argument name that is typically a
constant. It is called with "rootfs", "nfsd", type->name,
"capifs", "usbdevfs", "binfmt_misc" etc. So, it should have a
prototype that expresses this:
do_kern_mount(const char *fstype, int flags, const char *name, void *data);
This makes the ugly cast
- return do_kern_mount(type->name, 0, (char *)type->name, NULL);
+ return do_kern_mount(type->name, 0, type->name, NULL);
go away. Now do_kern_mount() calls type->get_sb(), so also get_sb()
must have a const third argument. That is what the patch below does.
If I am not mistaken, precisely two filesystems do not treat this
argument as a constant, namely afs and cifs. A separate patch
gives some cleanup there.
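The effect of the const-qualified prototype can be seen in a stand-alone sketch. Nothing here is kernel code: struct sk_fs_type, sk_kern_mount() and sk_mount_by_type() are hypothetical stand-ins that only mirror the shape of the call quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filesystem type whose name is a constant string. */
struct sk_fs_type {
    const char *name;
};

/* Stand-in with the const-qualified third argument. It only reports
 * whether the name repeats the filesystem type, as in the quoted call. */
static int sk_kern_mount(const char *fstype, int flags, const char *name, void *data)
{
    (void)flags;
    (void)data;
    assert(fstype != NULL && name != NULL);
    return strcmp(fstype, name) == 0;
}

/* type->name is passed for both string arguments with no cast. */
static int sk_mount_by_type(const struct sk_fs_type *type)
{
    assert(type != NULL);
    return sk_kern_mount(type->name, 0, type->name, NULL);
}
```

Had the third parameter been a plain char *, the call in sk_mount_by_type() would need the (char *) cast that the patch removes, or it would draw a discarded-qualifier diagnostic.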
|
|
- Include dcache.h/namei.h in fs/autofs/autofs_i.h not dirhash.c
- Include list.h and spinlock.h in dcache.h
- Include list.h in mount.h and namei.h
|
|
- Keith Owens: module exporting error checking
- Greg KH: USB update
- Paul Mackerras: clean up wait_init_idle(), ppc prefetch macros
- Jan Kara: quota fixes
- Abraham vd Merwe: agpgart support for Intel 830M
- Jakub Jelinek: ELF loader cleanups
- Al Viro: more cleanups
- David Miller: sparc64 fix, netfilter fixes
- me: tweak resurrected oom handling
|
|
- Greg KH: start migration to new "min()/max()"
- Roman Zippel: move affs over to "min()/max()".
- Vojtech Pavlik: VIA update (make sure not to IRQ-unmask a vt82c576)
- Jan Kara: quota bug-fix (don't decrement quota for non-counted inode)
- Anton Altaparmakov: more NTFS updates
- Al Viro: make nosuid/noexec/nodev be per-mount flags, not per-filesystem
- Alan Cox: merge input/joystick layer differences, driver and alpha merge
- Keith Owens: scsi Makefile cleanup
- Trond Myklebust: fix oopsable race in locking code
- Jean Tourrilhes: IrDA update
|
|
- Jeff Hartmann: serverworks AGP gart unload memory leak fix
- Marcelo Tosatti: make zone_inactive_shortage() return how big the shortage is.
- Hugh Dickins: tidy up age_page_down()
- Al Viro: super block handling cleanups
|
|
- Takanori Kawano: brlock indexing bugfix
- Ingo Molnar, Jeff Garzik: softirq updates and fixes
- Al Viro: rampage of superblock cleanups.
- Jean Tourrilhes: Orinoco driver update v6, IrNET update
- Trond Myklebust: NFS brown-paper-bag thing
- Tim Waugh: parport update
- David Miller: networking and sparc updates
- Jes Sorensen: m68k update.
- Ben Fennema: UDF update
- Geert Uytterhoeven: fbdev logo updates
- Willem Riede: osst driver updates
- Paul Mackerras: PPC update
- Marcelo Tosatti: unlazy swap cache
- Mikulas Patocka: hpfs update
|
|
- Alan Cox: camera conversion missed parts
- Neil Brown: md graceful alloc failure
- Andrea Arcangeli: more alpha fixups, bounce buffer deadlock avoidance
- Adam Fritzler: tms380tr driver update
- Al Viro: VFS layer cleanups
|
|
- Johannes Erdfelt: OHCI hash-chain corruption fix, USB updates
- Richard Henderson, Ivan Kokshaysky: alpha PCI iommu fixes
- Tim Waugh: parport changelogs and printk levels
- Andrew Morton: vmalloc off-by-one (overly sensitive) test
- Al Viro: VFS layer cleanups
- Cort Dougan: PPC updates (big bootloader re-org)
- Alan Cox: more merges, remove Philips camera conversion code
- Andrea Arcangeli: alpha fixups
- OGAWA Hirofumi: big-sector support with FAT
- Neil Brown: more md fixes
|
|
|