user/sven/linux.git - Linux Kernel

Age	Commit message	Author
2006-06-23	[PATCH] vfs: add lock owner argument to flush operation	Miklos Szeredi
	Pass the POSIX lock owner ID to the flush operation. This is useful for filesystems which don't want to store any locking state in inode->i_flock but want to handle locking/unlocking POSIX locks internally. FUSE is one such filesystem but I think it possible that some network filesystems would need this also. Also add a flag to indicate that a POSIX locking request was generated by close(), so filesystems using the above feature won't send an extra locking request in this case. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23	[PATCH] VFS: Permit filesystem to override root dentry on mount	David Howells
	Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience functions is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing is currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. 
[*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-20	[PATCH] Audit of POSIX Message Queue Syscalls v.2	George C. Wilson
	This patch adds audit support to POSIX message queues. It applies cleanly to the lspp.b15 branch of Al Viro's git tree. There are new auxiliary data structures, and collection and emission routines in kernel/auditsc.c. New hooks in ipc/mqueue.c collect arguments from the syscalls. I tested the patch by building the examples from the POSIX MQ library tarball. Build them with -lrt, not against the old MQ library in the tarball. Here's the URL: http://www.geocities.com/wronski12/posix_ipc/libmqueue-4.41.tar.gz Do auditctl -a exit,always -S for mq_open, mq_timedsend, mq_timedreceive, mq_notify, mq_getsetattr. mq_unlink has no new hooks. Please see the corresponding userspace patch to get correct output from auditd for the new record types. [fixes folded] Signed-off-by: George Wilson <ltcgcw@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-28	[PATCH] mqueue comment typo fix	Serge E. Hallyn
	(akpm: I don't do comment typos patches. This one snuck through by accident) Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] one ipc/sem.c->mutex.c converstion too many..	Manfred Spraul
	Ingo's sem2mutex patch incorrectly replaced one reference to ipc/sem.c with ipc/mutex.c in a comment. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] sem2mutex: ipc, id.sem	Ingo Molnar
	Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22	Remove superfluous NOTIFY_COOKIE_LEN define	Michal Wronski
	NOTIFY_COOKIE_LEN is defined in mqueue.h as well as mqueue.c. This patch removes the redundant definition from mqueue.c. Signed-off-by: Michal Wronski <Michal.Wronski@motorola.com> Signed-Off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-02-09	[NETLINK]: Fix a severe bug	Alexey Kuznetsov
	Netlink overrun handling was broken during the netlink improvements. The destination socket is used where the source socket was meant, so overrun is now never sent to user netlink sockets when it should be, and it can even be set on a kernel socket, which results in a complete deadlock of rtnetlink. The suggested fix is to restore the status quo by passing the source socket as an additional argument to netlink_attachskb(). A little explanation: overrun is set on a socket when it failed to receive some message and the sender of this message does not handle, or even has no way to handle, this error. This happens in two cases: 1. when the kernel sends something. The kernel never retransmits and cannot wait for buffer space. 2. when a user sends a broadcast and the message was not delivered to some recipients. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-14	[PATCH] Fix double decrement of mqueue_mnt->mnt_count in sys_mq_open	Alexander Viro
	Fixed the refcounting on failure exits in sys_mq_open() and cleaned up the logic. Rules are actually pretty simple - dentry_open() expects vfsmount and dentry to be pinned down and it either transfers them into the created struct file or drops them. Old code had been very confused in that area - if dentry_open() had failed either in do_open() or do_create(), we ended up with dentry and mqueue_mnt dropped twice, once by dentry_open() cleanup and then by sys_mq_open(). The fix consists of making the rules for do_create() and do_open() the same as for dentry_open() and updating sys_mq_open() accordingly; that actually leads to more straightforward code and less work on the normal path. Signed-off-by: Al Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] move capable() to capability.h	Randy.Dunlap
	- Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09	[PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem	Jes Sorensen
	This patch converts the inode semaphore to a mutex. I have tested it on XFS and compiled as much as one can consider on an ia64. Anyway your luck with it might be different. Modified-by: Ingo Molnar <mingo@elte.hu> (finished the conversion) Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2005-11-06	Update Michal Wronski contact info	Michal Wronski

2005-09-27	[PATCH] Make POSIX message queue sys_mq_open() honor umask	Krzysztof Benedyczak
	We ignored umask when creating new queues via mq_open (when creating with open() on mqueue fs it is ok of course). According to the specification this is a bug. This trivial patch fixes this. Signed-off-by: Krzysztof Benedyczak <golbi@mat.uni.torun.pl> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
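A minimal sketch of what "honoring umask" means for the permission bits of a new queue. The helper name apply_umask() is hypothetical and is not taken from the kernel source; it only shows the masking arithmetic.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a kernel function): the permission bits a new
 * queue ends up with once the creating process's umask is applied.
 */
mode_t apply_umask(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}

/* Usage sketch: with umask 022, a request for 0666 yields 0644. */
void umask_example(void)
{
    assert(apply_umask(0666, 022) == 0644);
    assert(apply_umask(0600, 077) == 0600);
}
```

Before the fix described above, a queue created through mq_open() would have behaved as if the mask were 0.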
2005-09-10	[PATCH] merge some from Rusty's trivial patches	Adrian Bunk
	This patch contains the most trivial from Rusty's trivial patches: - spelling fixes - remove duplicate includes Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01	[PATCH] convert that currently tests _NSIG directly to use valid_signal()	Jesper Juhl
	Convert most of the current code that uses _NSIG directly to instead use valid_signal(). This avoids gcc -W warnings and off-by-one errors. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
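A sketch of the kind of off-by-one this conversion avoids, assuming the helper is a plain `sig <= _NSIG` comparison. DEMO_NSIG, demo_valid_signal() and open_coded_check() are illustrative names, and 64 is a placeholder for the per-architecture _NSIG value.

```c
#include <assert.h>

#define DEMO_NSIG 64 /* placeholder; the real _NSIG depends on the architecture */

/*
 * Assumed shape of the helper: signal numbers run up to and including
 * _NSIG, so _NSIG itself must be accepted.
 */
int demo_valid_signal(unsigned long sig)
{
    return sig <= DEMO_NSIG ? 1 : 0;
}

/* An open-coded test that is off by one: it rejects the highest signal. */
int open_coded_check(int sig)
{
    return sig >= 0 && sig < DEMO_NSIG;
}

/* Usage sketch: the two checks disagree exactly at the boundary. */
void valid_signal_example(void)
{
    assert(demo_valid_signal(DEMO_NSIG) == 1);
    assert(open_coded_check(DEMO_NSIG) == 0);
}
```

Centralizing the comparison in one helper means the boundary is decided once instead of at every call site.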
2005-05-01	[PATCH] use smp_mb/wmb/rmb where possible	akpm@osdl.org
	Replace a number of memory barriers with smp_ variants. This means we won't take the unnecessary hit on UP machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-27	[PATCH] handle posix message queues with /proc/sys disabled	Manfred Spraul
	register_sysctl_table() fails if sysctl support is not compiled into the kernel. The POSIX message queue subsystem aborted its initialization if register_sysctl_table() fails, and that causes an oops in sys_mq_open(). The patch fixes that by ignoring failures from register_sysctl_table(). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18	[PATCH] add missing linux/syscalls.h includes	Arnd Bergmann
	I found that the prototypes for sys_waitid and sys_fcntl in <linux/syscalls.h> don't match the implementation. In order to keep all prototypes in sync in the future, now include the header from each file implementing any syscall. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18	[PATCH] make rlimit settings per-process instead of per-thread	Roland McGrath
	POSIX specifies that the limit settings provided by getrlimit/setrlimit are shared by the whole process, not specific to individual threads. This patch changes the behavior of those calls to comply with POSIX. I've moved the struct rlimit array from task_struct to signal_struct, as it has the correct sharing properties. (This reduces kernel memory usage per thread in multithreaded processes by around 100/200 bytes for 32/64 machines respectively.) I took a fairly minimal approach to the locking issues with the newly shared struct rlimit array. It turns out that all the code that is checking limits really just needs to look at one word at a time (one rlim_cur field, usually). It's only the few places like getrlimit itself (and fork), that require atomicity in accessing a whole struct rlimit, so I just used a spin lock for them and no locking for most of the checks. If it turns out that readers of struct rlimit need more atomicity where they are now cheap, or less overhead where they are now atomic (e.g. fork), then seqcount is certainly the right thing to use for them instead of readers using the spin lock. Though it's in signal_struct, I didn't use siglock since the access to rlimits never needs to disable irqs and doesn't overlap with other siglock uses. Instead of adding something new, I overloaded task_lock(task->group_leader) for this; it is used for other things that are not likely to happen simultaneously with limit tweaking. To me that seems preferable to adding a word, but it would be trivial (and arguably cleaner) to add a separate lock for these users (or e.g. just use seqlock, which adds two words but is optimal for readers). Most of the changes here are just the trivial s/->rlim/->signal->rlim/. I stumbled across what must be a long-standing bug, in reparent_to_init. 
It does: memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim))); when surely it was intended to be: memcpy(current->rlim, init_task.rlim, sizeof(current->rlim)); As rlim is an array, the * in the sizeof expression gets the size of the first element, so this just changes the first limit (RLIMIT_CPU). This is for kernel threads, where it's clear that resetting all the rlimits is what you want. With that fixed, the setting of RLIMIT_FSIZE in nfsd is superfluous since it will now already have been reset to RLIM_INFINITY. The other subtlety is removing: tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY; in exit_notify, which was to avoid a race signalling during self-reaping exit. As the limit is now shared, a dying thread should not change it for others. Instead, I avoid that race by checking current->state before the RLIMIT_CPU check. (Adding one new conditional in that path is now required one way or another, since if not for this check there would also be a new race with self-reaping exit later on clearing current->signal that would have to be checked for.) The one loose end left by this patch is with process accounting. do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the accounting record. I left this as it was, but it is now changing a limit that might be shared by other threads still running. I left this in a dubious state because it seems to me that process accounting may already be more generally in a dubious state when it comes to NPTL threads. I would think you would want one record per process, with aggregate data about all threads that ever lived in it, not a separate record for each thread. I don't use process accounting myself, but if anyone is interested in testing it out I could provide a patch to change it this way. One final note, this does not reach 100% POSIX compliance with regard to rlimits. POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not to each individual thread. 
I will provide patches later on to achieve that change, assuming this patch goes in first. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
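The reparent_to_init sizeof bug described in this message can be reproduced in user space. The sketch below uses hypothetical stand-in types (demo_rlimit, demo_task) and a made-up array length; only the two sizeof expressions correspond to the commit text.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NLIMITS 4 /* hypothetical; stands in for the real number of limits */

struct demo_rlimit { unsigned long cur, max; };
struct demo_task   { struct demo_rlimit rlim[DEMO_NLIMITS]; };

/* Buggy form: sizeof(*(array)) is the size of ONE element. */
void reset_buggy(struct demo_task *t, const struct demo_task *init)
{
    memcpy(t->rlim, init->rlim, sizeof(*(t->rlim)));
}

/* Intended form: sizeof(array) is the size of the whole array. */
void reset_fixed(struct demo_task *t, const struct demo_task *init)
{
    memcpy(t->rlim, init->rlim, sizeof(t->rlim));
}
```

With the buggy form only rlim[0] is copied and the remaining entries keep their old values, which matches the observation that only the first limit was being reset.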
2004-08-22	[PATCH] IS_ERR() unlikeliness cleanup	Andrew Morton
	Remove now-unneeded open-coded unlikelies around IS_ERR(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-17	[PATCH] RLIM: adjust default mqueue sizes	Chris Wright
	Lower default sizes for POSIX mqueue allocation now that rlimits are in place. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-17	[PATCH] RLIM: enforce rlimits for POSIX mqueue allocation	Chris Wright
	Add a user_struct to the mq_inode_info structure. Charge the maximum number of bytes that could be allocated to a mqueue to the user who creates the mqueue. This is checked against the per user rlimit. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-17	[PATCH] RLIM: add mq_attr_ok() helper	Chris Wright
	Add helper function mq_attr_ok() to do mq_attr sanity checking, and do some extra overflow checking. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
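A rough sketch of what overflow checking on queue attributes can look like. This is not the kernel's mq_attr_ok(); the struct, the function name demo_mq_attr_ok() and the per-message pointer overhead are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the two mq_attr fields that drive allocation. */
struct demo_mq_attr {
    long mq_maxmsg;  /* maximum number of queued messages */
    long mq_msgsize; /* maximum size of one message in bytes */
};

/*
 * Returns 1 and stores the worst-case byte count in *total_bytes if the
 * attributes are sane, 0 otherwise. The multiplication is guarded by a
 * division so the product cannot silently wrap around.
 */
int demo_mq_attr_ok(const struct demo_mq_attr *attr, unsigned long *total_bytes)
{
    unsigned long n, per_msg;

    if (attr->mq_maxmsg <= 0 || attr->mq_msgsize <= 0)
        return 0;

    n = (unsigned long)attr->mq_maxmsg;
    per_msg = (unsigned long)attr->mq_msgsize;

    /* Assumed overhead: one pointer per message slot. */
    if (per_msg > ULONG_MAX - sizeof(void *))
        return 0;
    per_msg += sizeof(void *);

    if (n > ULONG_MAX / per_msg)
        return 0; /* n * per_msg would overflow */

    *total_bytes = n * per_msg;
    return 1;
}
```

A worst-case total computed this way is the kind of figure that could then be compared against a per-user limit.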
2004-05-28	[PATCH] sparse: ipc __user annotation	Alexander Viro

2004-05-10	[PATCH] simplify mqueue_inode_info->messages allocation	Andrew Morton
	From: Chris Wright <chrisw@osdl.org> Currently, if a user creates an mqueue and passes an mq_attr, the info->messages will be created twice (and the extra one is properly freed). This patch simply delays the allocation so that it only ever happens once. The relevant mq_attr data is passed to lower levels via the dentry->d_fsdata fs private data. This also helps isolate the areas we'd need to touch to do rlimits on mqueues.
2004-05-04	[PATCH] fix queues_count accounting in mqueue_delete_inode()	Chris Wright
	During mqueue_get_inode(), it's possible that kmalloc() of the info->messages array will fail. This failure mode will cause the queues_count to be (incorrectly) decremented twice. This patch uses info->messages on mqueue_delete_inode() to determine whether the mqueue was ever truly created, and hence proper accounting is needed on destruction.
2004-05-04	[PATCH] fix memleak in sys_mq_timedsend	Chris Wright
	Move error handling to capture all three possible error conditions on sending to a full queue. Without this fix any unprivileged user can leak arbitrary amounts of kernel memory.
2004-04-17	[PATCH] mqueue permission fix	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Any user can delete any entries in a mqueue mounted filesystem. The attached patch prevents that. - remove the writable test from mq_unlink. - set the sticky bit in the root inode. This affects both mq_unlink and sys_unlink: only the owner (and root) should be allowed to remove queues.
2004-04-14	[PATCH] mq_open() and close_on_exec	Andrew Morton
	From: Chris Wright <chrisw@osdl.org> SUSv3 doesn't seem to specify one way or the other. I don't have the POSIX specs, and the old docs I have suggest that mq_open() creates an object which is to be closed upon exec. Jakub said: I think it is valid and required: http://www.opengroup.org/onlinepubs/007904975/functions/exec.html "All open message queue descriptors in the calling process shall be closed, as described in mq_close()." I'll add a new test for this into the glibc testsuite.
2004-04-14	[PATCH] Fix mq_notify with SIGEV_NONE notification	Andrew Morton
	From: Jakub Jelinek <jakub@redhat.com> mq_notify (q, NULL) and struct sigevent ev = { .sigev_notify = SIGEV_NONE }; mq_notify (q, &ev) are not the same thing in POSIX, yet the kernel treats them the same. Only the former makes the notification available to other processes immediately, see http://www.opengroup.org/onlinepubs/007904975/functions/mq_notify.html Without the patch below, http://sources.redhat.com/ml/libc-hacker/2004-04/msg00028.html glibc test fails. I looked at mq in Solaris and they behave the same in this regard as Linux with this patch. Kernel with this patch passes both Intel POSIX testsuite (with testsuite fixes from Ulrich) and glibc mq testsuite.
2004-04-11	[PATCH] posix message queues: send notifications via netlink	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> SIGEV_THREAD means that a given callback should be called in the context of a new thread. This must be done by the C library. The kernel must deliver a notice of the event to the C library when the callback should be called. This patch switches to a new, simpler interface: User space creates a socket with socket(PF_NETLINK, SOCK_RAW,0) and passes the fd to the mq_notify call together with a cookie. When the mq_notify() condition is satisfied, the kernel "writes" the cookie to the socket. User space then reads the cookie and calls the appropriate callback.
2004-04-11	[PATCH] security bugfix for mqueue	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> I found a security bug in the new mqueue code: a process that has only write permissions to a message queue could call mq_notify(SIGEV_THREAD) and use the returned notification file descriptor to read from the message queue.
2004-04-11	[PATCH] posix message queue update	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> My discussion with Ulrich had one result: - mq_setattr can accept implementation defined flags. Right now we have none, but we might add some later (e.g. switch to CLOCK_MONOTONIC for mq_timed{send,receive} or something similar). When we add flags, we might need the fields for additional information. And they don't hurt. Therefore add four __reserved fields to mq_attr. - fail mq_setattr if we get unknown flags - otherwise glibc can't detect if it's running on a future kernel that supports new features. - use memset to initialize the mq_attr structure - theoretically we could leak kernel memory. - Only set O_NONBLOCK in mq_attr, explicitly clear O_RDWR & friends. openposix uses getattr, attr |= O_NONBLOCK, setattr - a sane approach. Without clearing O_RDWR, this fails. I've retested all openposix conformance tests with the new patch - the two new FAILED tests check undefined behavior. Note that I won't have net access until Sunday - if the message queue patch breaks something important either ask Krzysztof or drop it. Ulrich had another good idea for SIGEV_THREAD, but I must think about it. It would mean less complexity in glibc, but more code in the kernel. I'm not yet convinced that it's overall better.
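The interaction between "fail on unknown flags" and "only report O_NONBLOCK" in this message can be sketched as follows. Both function names are hypothetical, and the assumption that O_NONBLOCK is the only accepted flag comes from the message's "right now we have none" remark.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical setattr-side check: anything other than O_NONBLOCK is an
 * unknown flag and is rejected, so user space can detect unsupported features.
 */
int demo_check_setattr_flags(long mq_flags)
{
    return (mq_flags & ~(long)O_NONBLOCK) ? -EINVAL : 0;
}

/*
 * Hypothetical getattr-side filter: report only O_NONBLOCK and drop
 * access-mode bits such as O_RDWR from the file flags.
 */
long demo_reported_flags(long file_flags)
{
    return file_flags & (long)O_NONBLOCK;
}

/* Usage sketch of the getattr, attr |= O_NONBLOCK, setattr round trip. */
void setattr_roundtrip_example(void)
{
    long attr = demo_reported_flags(O_RDWR);
    attr |= O_NONBLOCK;
    assert(demo_check_setattr_flags(attr) == 0);
}
```

If the getattr side passed O_RDWR through unchanged, the same round trip would hit the unknown-flag check and fail, which is the failure the message describes.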
2004-04-11	[PATCH] posix message queues: made user mountable	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Make the posix message queue mountable by the user. This replaces ipcs and ipcrm for posix message queue: The admin can check which queues exist with ls and remove stale queues with rm. I'd like a final confirmation from Ulrich that our SIGEV_THREAD approach is the right thing(tm): He's aware of the design and didn't object, but I think he hasn't seen the final API yet.
2004-04-11	[PATCH] posix message queues: linux-specific poll extension	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Linux specific extension: make the message queue identifiers pollable. It's simple and could be useful.
2004-04-11	[PATCH] posix message queues: implementation	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Actual implementation of the posix message queues, written by Krzysztof Benedyczak and Michal Wronski. The complete implementation is dependent on CONFIG_POSIX_MQUEUE. It passed the openposix test suite with two exceptions: one mq_unlink test was bad and tested undefined behavior. And Linux succeeds mq_close(open(,,,)). The spec mandates EBADF, but we have decided to ignore that: we would have to add a new syscall just for the right error code. The patch intentionally doesn't use all helpers from fs/libfs for kernel-only filesystems: step 5 allows user space mounts of the file system. Signal changes: The patch redefines SI_MESGQ using __SI_CODE: The generic Linux ABI uses a negative value (i.e. from user) for SI_MESGQ, but the kernel internal value must be positive to pass check_kill_value. Additionally, the patch adds support into copy_siginfo_to_user to copy the "new" signal type to user space. Changes in signal code caused by POSIX message queues patch: General & rationale: mqueues generated signals (only upon notification) must have si_code == SI_MESGQ. In fact such a signal is sent from one process which caused notification (== sent message to empty message queue) to another which requested it. Both processes can be of course unrelated in terms of uids/euids. So SI_MESGQ signals must be classified as SI_FROMKERNEL to pass check_kill_permissions (needless to say, these signals ARE from the kernel). Signals generated by message queues notification need the same fields in siginfo struct's union _sifields as POSIX.1b signals and we can reuse its union entry. SI_MESGQ was previously defined to -3 in kernel and also in glibc. So in userspace SI_MESGQ must be still visible as -3. Solution: SI_MESGQ is defined in the same style as SI_TIMER using __SI_CODE macro. Details: Fortunately copy_siginfo_to_user copies si_code as short. So we can use remaining part of int value freely. __SI_CODE does the work. 
SI_MESGQ is in kernel: 6<<16 | (-3 & 0xffff) which is > 0 but to userspace is copied (short) SI_MESGQ == -3 Actual changes: Changes in include/asm-generic/siginfo.h __SI_MESGQ added in signal.h to represent inside-kernel prefix of SI_MESGQ. SI_MESGQ is redefined from -3 to __SI_CODE(__SI_MESGQ, -3) Except mips architecture those changes should be arch independent (asm-generic/siginfo.h is included in arch versions). On mips SI_MESGQ is redefined to -4 in order to be compatible with IRIX. But the same schema can be used. Change in copy_siginfo_to_user: We only add one line to get the same copy semantics as for _SI_RT. This change isn't very portable - some archs have their own copy_siginfo_to_user. All those should have similar change (but possibly not one-line as _SI_RT case was sometimes ignored because it wasn't used yet, e.g. see ia64 signal.c). Update: mq: only fail with invalid timespec if mq_timed{send,receive} needs to block From: Jakub Jelinek <jakub@redhat.com> POSIX requires EINVAL to be set if: "The process or thread would have blocked, and the abs_timeout parameter specified a nanoseconds field value less than zero or greater than or equal to 1000 million." but 2.6.5-mm3 returns -EINVAL even if the process or thread would not block (if the queue is not empty for timedreceive or not full for timedsend).
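The SI_MESGQ encoding described in this message can be checked with a few lines of arithmetic. The DEMO_ macros below are stand-ins written from the formula quoted in the message (6<<16 | (-3 & 0xffff)); they are not copied from the kernel headers.

```c
#include <assert.h>

/* Stand-ins for the in-kernel prefix and the __SI_CODE-style packing. */
#define DEMO_SI_MESGQ_PREFIX (6 << 16)
#define DEMO_SI_CODE(prefix, code) ((prefix) | ((code) & 0xffff))
#define DEMO_SI_MESGQ DEMO_SI_CODE(DEMO_SI_MESGQ_PREFIX, -3)

/*
 * What user space sees when si_code is copied out as a short: only the low
 * 16 bits survive. Narrowing an out-of-range int to short is
 * implementation-defined in C; common compilers such as gcc and clang
 * truncate modulo 2^16, which is what this sketch relies on.
 */
int si_code_seen_by_user(int kernel_code)
{
    return (short)kernel_code;
}

/* Usage sketch: positive inside the kernel, -3 once truncated. */
void si_mesgq_example(void)
{
    assert(DEMO_SI_MESGQ == 0x6fffd);
    assert(si_code_seen_by_user(DEMO_SI_MESGQ) == -3);
}
```

The upper 16 bits carry the kernel-internal classification while the lower 16 bits preserve the value user space has always seen.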