| Age | Commit message | Author |
|
This is the first patch in a series of patches that removes devfs
support from the kernel. This patch removes the core devfs code, and
its private header file.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience functions is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing is currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
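A minimal userspace sketch of the calling convention described above. The demo_* types are stand-ins invented for illustration, not the kernel structures, and the foo_get_sb() argument list is simplified. The reference taken on the root dentry is modelled as a plain counter.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; the real ones live in the kernel. */
struct demo_dentry { int d_count; };
struct demo_super_block { struct demo_dentry *s_root; };
struct demo_vfsmount {
    struct demo_super_block *mnt_sb;
    struct demo_dentry *mnt_root;
};

/* Models what the message says simple_set_mnt() does: set the superblock
 * pointer, then root the mount at the superblock's s_root. Always returns 0. */
static int demo_simple_set_mnt(struct demo_vfsmount *mnt,
                               struct demo_super_block *sb)
{
    assert(mnt != NULL && sb != NULL && sb->s_root != NULL);
    mnt->mnt_sb = sb;
    sb->s_root->d_count++;          /* stands in for taking a dentry reference */
    mnt->mnt_root = sb->s_root;
    return 0;
}

/* Hypothetical filesystem "foo" with a single, statically allocated superblock. */
static struct demo_dentry foo_root;
static struct demo_super_block foo_sb = { &foo_root };

/* New-style get_sb(): returns an integer and fills in the vfsmount
 * instead of returning the superblock pointer. */
static int foo_get_sb(int flags, const char *dev_name,
                      struct demo_vfsmount *mnt)
{
    (void)flags;
    (void)dev_name;
    /* Tail call is fine because demo_simple_set_mnt() cannot fail. */
    return demo_simple_set_mnt(mnt, &foo_sb);
}
```

A filesystem that shares one superblock between several mounts would skip the helper and set mnt_sb and mnt_root itself, as the message notes for NFS.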
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups.
The goal is both to increase correctness (harder to accidentally write to
shared data structures) and to reduce the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read-only and thus
cache clean).
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>
(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
This is the fs/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in fs/.
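A small userspace sketch of the pattern being removed. It uses free() from the C library, which, like kfree(), does nothing when handed NULL. struct foo_info and foo_info_release() are hypothetical names, not code from fs/.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-object data with two optional allocations. */
struct foo_info {
    char *name;
    char *data;
};

/*
 * Before the cleanup this would typically read:
 *
 *     if (info->name)
 *         free(info->name);
 *
 * The guard is redundant because free(NULL) is defined to be a no-op.
 */
void foo_info_release(struct foo_info *info)
{
    assert(info != NULL);
    free(info->name);   /* safe even when name was never allocated */
    free(info->data);
    info->name = NULL;
    info->data = NULL;
}
```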
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This fixes up the symlink functions for the calling convention change:
* afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
* befs, smbfs, xfs - same for ->put_link()
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use the new lock initializers DEFINE_SPINLOCK and DEFINE_RWLOCK.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch corrects a problem that was originally added with the nanosecond
timestamps in stat patch. The problem is that some file systems don't have
enough space in their on disk inode to save nanosecond timestamps, so they
truncate the c/a/mtime to seconds when flushing a dirty inode. In core the
inode would have full jiffies granularity.
This can be observed by programs as a timestamp that jumps backwards under
specific loads when an inode is flushed and then reloaded from disk.
The problem was already known when the original patch went in, but it
wasn't deemed important enough at that time. So far there has been only
one report of it causing problems. Now Tridge is worried that it will
break running Excel over samba4 because Excel seems to do very anal
timestamp checking and samba4 will supply 100ns timestamps over the
network.
This patch solves it by putting the time resolution into the superblock of
a fs and always rounding the in core timestamps to that granularity.
This also supersedes some previous ext2/3 hacks to flush the inode less
often when only the subsecond timestamp changes.
I tried to keep the overhead low, in particular it tries to keep divisions
out of fast paths as far as possible.
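A userspace sketch of the rounding step, assuming the granularity is given in nanoseconds and lies between 1 and one second. demo_time_trunc() is a name made up here; it illustrates the idea rather than reproducing the patch. The two special cases show one way to keep the modulo out of the common paths.

```c
#include <assert.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000L

/* Truncate t down to a multiple of gran_ns nanoseconds. */
struct timespec demo_time_trunc(struct timespec t, long gran_ns)
{
    assert(gran_ns >= 1 && gran_ns <= DEMO_NSEC_PER_SEC);

    if (gran_ns == 1) {
        /* Full nanosecond resolution: nothing to do, no division. */
    } else if (gran_ns == DEMO_NSEC_PER_SEC) {
        /* Seconds-only on-disk format: drop the fraction, no division. */
        t.tv_nsec = 0;
    } else {
        /* General case, e.g. 100ns units. */
        t.tv_nsec -= t.tv_nsec % gran_ns;
    }
    return t;
}
```

With the in-core value truncated this way, reloading the inode from disk yields the same timestamp that was visible before the flush, so it no longer appears to jump backwards.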
The patch is quite big but 99% of it is just relatively straightforward
search'n'replace in a lot of fs. Unconverted filesystems will default to a
1ns granularity, but may still show the problem if they continue to use
CURRENT_TIME. I converted all in tree fs.
One possible future extension of this would be to have two time
granularities per superblock - one that specifies the visible resolution,
and the other to specify how often timestamps should be flushed to disk,
which could be tuned with a mount option per fs (e.g. often m/atimes don't
need to be flushed every second). Would be easy to do as an addon if
someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There's no reason to directly #include <asm/bitops.h> since it's
available on all architectures and also included by
#include <linux/bitops.h>.
This patch changes #include <asm/bitops.h> to #include <linux/bitops.h>.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Only legit user is the partitioning code, in addition some uml code is
still using it despite the uml people being told to fix it at least two
times.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This cleans up and simplifies drivers, and also allows us future
simplification in the VFS layer, since it removes knowledge about
internal VFS layer handling of "f_pos".
|
|
trivial cases - ones where we have no need to clean up after pathname
traversal (link body embedded into inode, etc.).
Plugged leak in devfs_follow_link(), while we are at it.
|
|
From: Mika Kukkonen <mika@osdl.org>
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
Nobody seems to have any outstanding work against devfs, so...
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
- use struct nameidata in devfs_d_revalidate_wait to detect when it is
called without i_sem held; take i_sem on parent in this case. This
prevents both deadlock with devfs_lookup by allowing it to drop i_sem
consistently and oops in d_instantiate by ensuring that it always runs
protected
- remove dead code that deals with major number allocation. The only
remaining user was devfs itself, and the patch changes it to use
register_chrdev to get the device number for the internal /dev/.devfsd and
/dev/.statd.
- remove dead auto allocation flag as well
- remove code that does module get on dev open - it is handled by fops_get.
Use init_special_inode consistently
- get rid of struct cdev_type and bdev_type - both have just single dev_t
now
|
|
From: James Morris <jmorris@redhat.com>
devfs is passing an empty string to do_mount when it expects a page.
|
|
From: viro@parcelfarce.linux.theplanet.co.uk <viro@parcelfarce.linux.theplanet.co.uk>
bd_acquire() made static, switched to returning the block_device it had
found. Callers updated.
|
|
From: Michael Still <mikal@stillhq.com>
The patch squelches build errors in the kernel-doc make targets by adding
documentation to arguments previously not documented, and updating the
argument names where they have changed.
|
|
From: Jeremy Fitzhardinge <jeremy@goop.org>
I'm resending my patch to fix this problem. To recap: every task_struct
has its own copy of the thread group's pgrp. Only the thread group
leader is allowed to change the tgrp's pgrp, but it only updates its own
copy of pgrp, while all the other threads in the tgrp use the old value
they inherited on creation.
This patch simply updates all the other threads' pgrp when the tgrp
leader changes pgrp. Ulrich has already expressed reservations about
this patch since it is (1) incomplete (it doesn't cover the case of
other ids which have similar problems), (2) racy (it doesn't synchronize
with other threads looking at the task pgrp, so they could see an
inconsistent view) and (3) slow (it takes linear time with respect to
the number of threads in the tgrp).
My reaction is that (1) it fixes the actual bug I'm encountering in a
real program. (2) doesn't really matter for pgrp, since it is mostly an
issue with respect to the terminal job-control code (which is even more
broken without this patch). Regarding (3), I think there are very few
programs which have a large number of threads which change process group
id on a regular basis (a heavily multi-threaded job-control shell?).
Ulrich also said he has a (proposed?) much better fix, which I've been
looking forward to. I'm submitting this patch as a stop-gap fix for a
real bug, and perhaps to prompt the improved patch.
An alternative fix, at least for pgrp, is to change all references to
->pgrp to group_leader->pgrp. This may be sufficient on its own, but it
would be a reasonably intrusive patch (I count 95 instances in 32 files
in the 2.6.0-test3-mm3 tree).
|
|
From: Randy Hron <rwhron@earthlink.net>
remove unneeded linux/version.h usage & some duplicate
#includes;
|
|
the last kdev_t object is gone; ->i_rdev switched to dev_t.
|
|
misc trivial cleanups
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
_devfs_walk_path does not check whether the de it is about to scan is a directory.
Next step is spinlock on non-spinlock memory. It requires either artificial
setup or really broken driver but fairly easy to reproduce once you know how.
It is likely to exist in 2.4 as well.
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
devfs_mk_dir freed the wrong de and incorrectly passed an already freed de
to devfsd. Besides, it did not even check whether the entry found was
actually a directory.
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
A while back Andrey fixed a devfs bug in which we were running
remove_wait_queue() against a wait_queue_head which was on another process's
stack, and which had gone out of scope.
The patch reverts that fix and does it the same way as 2.4: just leave the
waitqueue struct dangling on the waitqueue_head: there is no need to touch it
at all.
It adds a big comment explaining why we are doing this nasty thing.
|
|
As noted by Gergely Nagy:
"devfs_mk_cdev() first checks the mode passed to it, and if it thinks
it is not a char device, it prints a warning and aborts. Now, this
printing involves the local variable `buf' (char buf[64]), which is
not initialised at that point."
The same problem also affects devfs_mk_bdev.
Fixed thus.
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
The code that did proper check existed in 2.4 and was removed in 2.5 for
whatever reason. The patch restores it slightly modified as below.
2.4 code looks somewhat unclean in that
- it traverses task list without lock.
- it starts from current->real_parent but nothing prevents current be
init_task itself. This hung for me on 2.5 during boot. May be 2.4 does
something differently.
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
I finally hit a painfully trivial way to reproduce another long standing devfs
problem - deadlock between devfs_lookup and devfs_d_revalidate_wait. When
devfs_lookup releases directory i_sem devfs_d_revalidate_wait grabs it (it
happens not for every path) and goes to wait to be woken up. Unfortunately,
devfs_lookup attempts to acquire directory i_sem before ever waking it up ...
To reproduce (2.5.74 UP or SMP - does not matter, single CPU system)
ls /dev/foo & rm -f /dev/foo &
or possibly in a loop but then it easily fills up process table. In my case it
hangs 100% reliably - on 2.5 OR 2.4.
The current fix is to move re-acquire of i_sem after all
devfs_d_revalidate_wait waiters have been woken up. Much better fix would be
to ensure that ->d_revalidate either is always called under i_sem or always
without. But that means the very heart of VFS and I do not dare to touch it.
The fix has been tested on 2.4 (and is part of unofficial Mandrake Club
kernel); I expected the same bug is in 2.5; I just was stupid not seeing the
way to reproduce it before.
|
|
From: Andrey Borzenkov <arvidjaar@mail.ru>
Doing concurrent lookups for the same name in devfs with devfsd and modules
enabled may result in stack corruption.
When devfs_lookup needs to call devfsd it arranges for other lookups for the
same name to wait. It is using local variable as wait queue head. After
devfsd returns devfs_lookup wakes up all waiters and returns. Unfortunately
there is no guarantee all waiters will actually get chance to run and clean up
before devfs_lookup returns, so some of them attempt to access already freed
storage on stack.
It is trivial to trigger with SMP kernel (I have single-CPU system if it
matters) doing
while true
do
ls /dev/foo &
done
Without spinlock debug system usually hung dead with reset button as the only
possibility.
I was not able to reproduce it on 2.4 on single-CPU system - in 2.4
devfs_d_revalidate_wait does not attempt to remove itself from wait queue
so it appears to be safe.
The patch makes lookup struct be allocated from heap and adds reference
counter to free it when no more needed.
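A userspace sketch of the idea: the shared state lives on the heap and is freed by whoever drops the last reference, so no waiter can touch a stack frame that has already gone away. The demo_* names are invented for illustration, and C11 atomics stand in for the kernel's atomic counters. The wait queue itself is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared lookup state; the wait queue head would live here too. */
struct demo_lookup_info {
    atomic_int refcount;
    int completed;
};

/* Allocate with one reference, owned by the thread that starts the lookup. */
struct demo_lookup_info *demo_lookup_alloc(void)
{
    struct demo_lookup_info *info = calloc(1, sizeof(*info));

    if (info)
        atomic_init(&info->refcount, 1);
    return info;
}

/* Each waiter takes its own reference before it goes to sleep. */
void demo_lookup_get(struct demo_lookup_info *info)
{
    assert(info != NULL);
    atomic_fetch_add(&info->refcount, 1);
}

/* Drop a reference; returns 1 if this call freed the structure. */
int demo_lookup_put(struct demo_lookup_info *info)
{
    assert(info != NULL);
    if (atomic_fetch_sub(&info->refcount, 1) == 1) {
        free(info);
        return 1;
    }
    return 0;
}
```

The initiating lookup can now return as soon as it has woken the waiters and dropped its own reference; the structure outlives it until the last waiter is done.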
|
|
- Add open intent information to the 'struct nameidata'.
- Pass the struct nameidata as an optional parameter to the
lookup() inode operation.
- Pass the struct nameidata as an optional parameter to the
d_revalidate() dentry operation.
- Make link_path_walk() set the LOOKUP_CONTINUE flag in nd->flags instead
of passing it as an extra parameter to d_revalidate().
- Make open_namei() and sys_uselib() set the open()/create() intent
data.
|
|
From: Christoph Hellwig <hch@lst.de>
There may be multiple gendisks with the same .devfs_name in scsi and we
call devfs_mk_dir on each of them. At present that causes a nasty error
message. It is better to permit devfs_mk_dir() to appear to have succeeded.
ie: it's a `mkdir -p'.
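The `mkdir -p' behaviour can be sketched with a tiny in-memory name table. Everything here (demo_table, demo_find(), demo_add(), demo_mk_dir()) is hypothetical and only illustrates the return-value policy: an existing directory counts as success, while an existing non-directory is still an error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 16
#define DEMO_NAME_LEN 64

struct demo_entry {
    char name[DEMO_NAME_LEN];
    int is_dir;
};

static struct demo_entry demo_table[DEMO_MAX_ENTRIES];
static int demo_count;

/* Linear search by full path name; returns NULL when not present. */
static struct demo_entry *demo_find(const char *name)
{
    for (int i = 0; i < demo_count; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return &demo_table[i];
    return NULL;
}

/* Append a new entry to the table. */
static int demo_add(const char *name, int is_dir)
{
    if (demo_count == DEMO_MAX_ENTRIES)
        return -ENOSPC;
    if (strlen(name) >= DEMO_NAME_LEN)
        return -ENAMETOOLONG;
    strcpy(demo_table[demo_count].name, name);
    demo_table[demo_count].is_dir = is_dir;
    demo_count++;
    return 0;
}

/* Create a directory, treating "already exists as a directory" as success. */
int demo_mk_dir(const char *name)
{
    struct demo_entry *e;

    assert(name != NULL);
    e = demo_find(name);
    if (e)
        return e->is_dir ? 0 : -EEXIST;
    return demo_add(name, 1);
}
```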
|
|
From: Pavel Roskin <proski@gnu.org>, via Christoph Hellwig <hch@infradead.org>
It's already the second time that I encounter a kernel panic in the same
place. When devfs_remove() is called on a non-existent file entry, the
kernel panics and I have to reboot the system.
First time it was unregistering of pseudoterminals. This time it's
ide-floppy module that doesn't register devfs entries if the media is absent
but still tries to unregister them. The bug in ide-floppy will be reported
separately.
The point of this message is that, firstly, a failure in devfs_remove() is
possible, especially with rarely used drivers. Secondly, it is not fatal enough to
justify an immediate panic and reboot. Thirdly, devfs misses a chance to
tell the user what's going wrong.
|
|
(i) The prototypes for free_vfsmnt(), alloc_vfsmnt(), do_kern_mount()
so far occurred in several individual c files. Now they are in
<linux/mount.h>.
(ii) do_kern_mount() has a third argument name that is typically a
constant. It is called with "rootfs", "nfsd", type->name,
"capifs", "usbdevfs", "binfmt_misc" etc. So, it should have a
prototype that expresses this:
do_kern_mount(const char *fstype, int flags, const char *name, void *data);
This makes the ugly cast
- return do_kern_mount(type->name, 0, (char *)type->name, NULL);
+ return do_kern_mount(type->name, 0, type->name, NULL);
go away. Now do_kern_mount() calls type->get_sb(), so also get_sb()
must have a const third argument. That is what the patch below does.
If I am not mistaken, precisely two filesystems do not treat this
argument as a constant, namely afs and cifs. A separate patch
gives some cleanup there.
|
|
From: Christoph Hellwig <hch@lst.de>
Whee! devfs_register isn't used anymore in the whole tree and with
it some other devfs crap. Kill it for good.
|
|
From: Christoph Hellwig <hch@lst.de>
Some people may already have noticed that I've been revamping the devfs API
recently. The worst offender still left is devfs_register, its prototype
is:
devfs_handle_t devfs_register(devfs_handle_t dir,
const char *name, unsigned int flags,
unsigned int major, unsigned int minor,
umode_t mode, void *ops, void *info)
Of these:
- dir and flags are always zero
- the return value is never used
- info is only used in one driver which doesn't even need it for
operation
- umode_t always describes a character device
- name very often comes from a stack buffer we sprintf'ed into
so obviously we really want a much simpler API instead. My first draft for
this was:
int devfs_mk_cdev(dev_t dev, umode_t mode,
struct file_operations *fops, void *info,
const char *fmt, ...)
this removes the unused arguments, switches to a proper dev_t for the device
number and allows to directly use a printf-like expression as name, getting
rid of the temporary buffers.
Now Al has reappeared and put the first steps of his CIDR for character device
on public ftp and we'll soon have a similar lookup object + fops mechanism in
generic code as we already have for blockdevices, i.e. the devfs code to
assign fops from an entry will become superfluous as generic code already does
it. That means the fops and info arguments are obsolete before they were
introduced, so I'd like to propose the following API instead:
int devfs_mk_cdev(dev_t dev, umode_t mode, const char *fmt, ...)
which is much nicer anyway. The educated reader will notice that this is
exactly the same prototype devfs_mk_bdev has so I'll probably get suggestions
to merge those two into some kind of devfs_mk_node soon. Personally I don't
like that as character and blockdevices are two really separate entities
and I'd like to keep them as separate as possible.
Example patch that introduces the API and converts drivers/input attached.
Every driver which calls devfs_mk_cdev (about 50) needs conversion. Note
that the transition can happen in pieces - devfs_register continues to work
after this patch, it's just the plan to get rid of it in the end.
|
|
Return an error code instead of a devfs_handle_t. The handle isn't
useful for anything and the !CONFIG_DEVFS_FS stub in fact returned
NULL which made it entirely useless. Thus only one driver is actually
checking the retval in the current tree..
|
|
|
|
|
|
Previously gendisk.devfs_name was used only for partitioned devices
or CD-ROMs, and for the latter it was slightly broken. Fix it to
work generically for all gendisks.
|
|
Replaces devfs_register for block devices. Note that we do NOT pass in
an operation vector here - it was unused in devfs_register already
and our block device code fundamentally ties the operations to the
gendisk. There will be only very few callers of this one anyway..
|
|
|
|
There's just one caller in fs/devfs/base.c left.
|
|
Always pass around the pathnames for the devfs entries / directories
instead of the devfs_handle_ts. Cleans up the code massively.
|
|
Pass in the path directly instead of getting it from a devfs_handle_t.
|
|
As several people found out while I was asleep I sent you
a bogus patch version and devfs didn't compile in your
tree since then. Fix it.
|
|
Okay, all flags are gone from devfs callers, time to remove the gunk
handling it. devfs_register prototype will change later.
|
|
.. by moving a bunch of devfs-related code from fs/partition/check.c
to fs/devfs/base.c. Also has the nice side effect of getting rid of
a bunch of ugly ifdefs.
[This is the new and improved, rediffed, applying and compilable
version. In short it's perfect]
|
|
All arguments except the name are unused - remove them and make the
name printf-like to avoid a few snprintf in the surrounding code.
(also fixes compilation due to a superfluous endif in dvb core)
|
|
All devfs_mk_symlink arguments except the from and to strings are
unused. Bring the prototype in shape.
|