user/sven/linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
2010-03-30	include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵	Tejun Heo
	implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-04-30	signals: implement sys_rt_tgsigqueueinfo	Thomas Gleixner
	sys_kill has the per thread counterpart sys_tgkill. sigqueueinfo is missing a thread directed counterpart. Such an interface is important for migrating applications from other OSes which have the per thread delivery implemented. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Ulrich Drepper <drepper@redhat.com>
2009-01-06	Allow times and time system calls to return small negative values	Paul Mackerras
	At the moment, the times() system call will appear to fail for a period shortly after boot, while the value it want to return is between -4095 and -1. The same thing will also happen for the time() system call on 32-bit platforms some time in 2106 or so. On some platforms, such as x86, this is unavoidable because of the system call ABI, but other platforms such as powerpc have a separate error indication from the return value, so system calls can in fact return small negative values without indicating an error. On those platforms, force_successful_syscall_return() provides a way to indicate that the system call return value should not be treated as an error even if it is in the range which would normally be taken as a negative error number. This adds a force_successful_syscall_return() call to the time() and times() system calls plus their 32-bit compat versions, so that they don't erroneously indicate an error on those platforms whose system call ABI has a separate error indication. This will not affect anything on other platforms. Joakim Tjernlund added the fix for time() and the compat versions of time() and times(), after I did the fix for times(). Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-01	cpumask: convert kernel/compat.c	Rusty Russell
	Impact: Reduce stack usage, use new cpumask API. Straightforward conversion; cpumasks' size is given by cpumask_size() (now a variable rather than fixed) and on-stack cpu masks use cpumask_var_t. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-10-20	Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', ↵	Thomas Gleixner
	'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus
2008-10-16	compat: generic compat get/settimeofday	Christoph Hellwig
	Nothing arch specific in get/settimeofday. The details of the timeval conversion varied a little from arch to arch, but all with the same results. Also add an extern declaration for sys_tz to linux/time.h because externs in .c files are fowned upon. I'll kill the externs in various other files in a sparate patch. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ] Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-14	timers: fix itimer/many thread hang	Frank Mayhar
	Overview This patch reworks the handling of POSIX CPU timers, including the ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together with the help of Roland McGrath, the owner and original writer of this code. The problem we ran into, and the reason for this rework, has to do with using a profiling timer in a process with a large number of threads. It appears that the performance of the old implementation of run_posix_cpu_timers() was at least O(n3) (where "n" is the number of threads in a process) or worse. Everything is fine with an increasing number of threads until the time taken for that routine to run becomes the same as or greater than the tick time, at which point things degrade rather quickly. This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF." Code Changes This rework corrects the implementation of run_posix_cpu_timers() to make it run in constant time for a particular machine. (Performance may vary between one machine and another depending upon whether the kernel is built as single- or multiprocessor and, in the latter case, depending upon the number of running processors.) To do this, at each tick we now update fields in signal_struct as well as task_struct. The run_posix_cpu_timers() function uses those fields to make its decisions. We define a new structure, "task_cputime," to contain user, system and scheduler times and use these in appropriate places: struct task_cputime { cputime_t utime; cputime_t stime; unsigned long long sum_exec_runtime; }; This is included in the structure "thread_group_cputime," which is a new substructure of signal_struct and which varies for uniprocessor versus multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as a simple substructure, while for multiprocessor kernels it is a pointer: struct thread_group_cputime { struct task_cputime totals; }; struct thread_group_cputime { struct task_cputime totals; }; We also add a new task_cputime substructure directly to signal_struct, to cache the earliest expiration of process-wide timers, and task_cputime also replaces the it_*_expires fields of task_struct (used for earliest expiration of thread timers). The "thread_group_cputime" structure contains process-wide timers that are updated via account_user_time() and friends. In the non-SMP case the structure is a simple aggregator; unfortunately in the SMP case that simplicity was not achievable due to cache-line contention between CPUs (in one measured case performance was actually _worse_ on a 16-cpu system than the same test on a 4-cpu system, due to this contention). For SMP, the thread_group_cputime counters are maintained as a per-cpu structure allocated using alloc_percpu(). The timer functions update only the timer field in the structure corresponding to the running CPU, obtained using per_cpu_ptr(). We define a set of inline functions in sched.h that we use to maintain the thread_group_cputime structure and hide the differences between UP and SMP implementations from the rest of the kernel. The thread_group_cputime_init() function initializes the thread_group_cputime structure for the given task. The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the out-of-line function thread_group_cputime_alloc_smp() to allocate and fill in the per-cpu structures and fields. The thread_group_cputime_free() function, also a no-op for UP, in SMP frees the per-cpu structures. The thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls thread_group_cputime_alloc() if the per-cpu structures haven't yet been allocated. The thread_group_cputime() function fills the task_cputime structure it is passed with the contents of the thread_group_cputime fields; in UP it's that simple but in SMP it must also safely check that tsk->signal is non-NULL (if it is it just uses the appropriate fields of task_struct) and, if so, sums the per-cpu values for each online CPU. Finally, the three functions account_group_user_time(), account_group_system_time() and account_group_exec_runtime() are used by timer functions to update the respective fields of the thread_group_cputime structure. Non-SMP operation is trivial and will not be mentioned further. The per-cpu structure is always allocated when a task creates its first new thread, via a call to thread_group_cputime_clone_thread() from copy_signal(). It is freed at process exit via a call to thread_group_cputime_free() from cleanup_signal(). All functions that formerly summed utime/stime/sum_sched_runtime values from from all threads in the thread group now use thread_group_cputime() to snapshot the values in the thread_group_cputime structure or the values in the task structure itself if the per-cpu structure hasn't been allocated. Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit. The run_posix_cpu_timers() function has been split into a fast path and a slow path; the former safely checks whether there are any expired thread timers and, if not, just returns, while the slow path does the heavy lifting. With the dedicated thread group fields, timers are no longer "rebalanced" and the process_timer_rebalance() function and related code has gone away. All summing loops are gone and all code that used them now uses the thread_group_cputime() inline. When process-wide timers are set, the new task_cputime structure in signal_struct is used to cache the earliest expiration; this is checked in the fast path. Performance The fix appears not to add significant overhead to existing operations. It generally performs the same as the current code except in two cases, one in which it performs slightly worse (Case 5 below) and one in which it performs very significantly better (Case 2 below). Overall it's a wash except in those two cases. I've since done somewhat more involved testing on a dual-core Opteron system. Case 1: With no itimer running, for a test with 100,000 threads, the fixed kernel took 1428.5 seconds, 513 seconds more than the unfixed system, all of which was spent in the system. There were twice as many voluntary context switches with the fix as without it. Case 2: With an itimer running at .01 second ticks and 4000 threads (the most an unmodified kernel can handle), the fixed kernel ran the test in eight percent of the time (5.8 seconds as opposed to 70 seconds) and had better tick accuracy (.012 seconds per tick as opposed to .023 seconds per tick). Case 3: A 4000-thread test with an initial timer tick of .01 second and an interval of 10,000 seconds (i.e. a timer that ticks only once) had very nearly the same performance in both cases: 6.3 seconds elapsed for the fixed kernel versus 5.5 seconds for the unfixed kernel. With fewer threads (eight in these tests), the Case 1 test ran in essentially the same time on both the modified and unmodified kernels (5.2 seconds versus 5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds versus 5.4 seconds but again with much better tick accuracy, .013 seconds per tick versus .025 seconds per tick for the unmodified kernel. Since the fix affected the rlimit code, I also tested soft and hard CPU limits. Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer running), the modified kernel was very slightly favored in that while it killed the process in 19.997 seconds of CPU time (5.002 seconds of wall time), only .003 seconds of that was system time, the rest was user time. The unmodified kernel killed the process in 20.001 seconds of CPU (5.014 seconds of wall time) of which .016 seconds was system time. Really, though, the results were too close to call. The results were essentially the same with no itimer running. Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds (where the hard limit would never be reached) and an itimer running, the modified kernel exhibited worse tick accuracy than the unmodified kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise, performance was almost indistinguishable. With no itimer running this test exhibited virtually identical behavior and times in both cases. In times past I did some limited performance testing. those results are below. On a four-cpu Opteron system without this fix, a sixteen-thread test executed in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On the same system with the fix, user and elapsed time were about the same, but system time dropped to 0.007 seconds. Performance with eight, four and one thread were comparable. Interestingly, the timer ticks with the fix seemed more accurate: The sixteen-thread test with the fix received 149543 ticks for 0.024 seconds per tick, while the same test without the fix received 58720 for 0.061 seconds per tick. Both cases were configured for an interval of 0.01 seconds. Again, the other tests were comparable. Each thread in this test computed the primes up to 25,000,000. I also did a test with a large number of threads, 100,000 threads, which is impossible without the fix. In this case each thread computed the primes only up to 10,000 (to make the runtime manageable). System time dominated, at 1546.968 seconds out of a total 2176.906 seconds (giving a user time of 629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite accurate. There is obviously no comparable test without the fix. Signed-off-by: Frank Mayhar <fmayhar@google.com> Cc: Roland McGrath <roland@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-01	ntp: support for TAI	Roman Zippel
	This adds support for setting the TAI value (International Atomic Time). The value is reported back to userspace via timex (as we don't have a ntp_gettime() syscall). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30	signals: add set_restore_sigmask	Roland McGrath
	This adds the set_restore_sigmask() inline in <linux/thread_info.h> and replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No change, but abstracts the details of the flag protocol from all the calls. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-19	generic: reduce stack pressure in sched_affinity	Mike Travis
	* Modify sched_affinity functions to pass cpumask_t variables by reference instead of by value. * Use new set_cpus_allowed_ptr function. Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: Paul Jackson <pj@sgi.com> Cc: Cliff Wickman <cpw@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17	hrtimer: use nanosleep specific restart_block fields	Thomas Gleixner
	Convert all the nanosleep related users of restart_block to the new nanosleep specific restart_block fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-10	hrtimer: don't modify restart_block->fn in restart functions	Oleg Nesterov
	hrtimer_nanosleep_restart() clears/restores restart_block->fn. This is pointless and complicates its usage. Note that if sys_restart_syscall() doesn't actually happen, we have a bogus "pending" restart->fn anyway, this is harmless. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-10	hrtimer: fix *rmtp/restarts handling in compat_sys_nanosleep()	Oleg Nesterov
	Spotted by Pavel Emelyanov and Alexey Dobriyan. compat_sys_nanosleep() implicitly uses hrtimer_nanosleep_restart(), this can't work. Make a suitable compat_nanosleep_restart() helper. Introduced by commit c70878b4e0b6cf8d2f1e46319e48e821ef4a8aba hrtimer: hook compat_sys_nanosleep up to high res timer code Also, set ->addr_limit = KERNEL_DS before doing hrtimer_nanosleep(), this func was changed by the previous patch and now takes the "__user *" parameter. Thanks to Ingo Molnar for fixing the bug in this patch. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Pavel Emelyanov <xemul@sw.ru> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Toyo Abe <toyoa@mvista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-18	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt	Linus Torvalds
	* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: hrtimer: hook compat_sys_nanosleep up to high res timer code hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
2007-10-18	whitespace fixes: compat syscalls	Daniel Walker
	Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18	hrtimer: hook compat_sys_nanosleep up to high res timer code	Anton Blanchard
	Now we have high res timers on ppc64 I thought Id test them. It turns out compat_sys_nanosleep hasnt been converted to the hrtimer code and so is limited to HZ resolution. The follow patch converts compat_sys_nanosleep to use high res timers. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-05-11	signal/timer/event: timerfd compat code	Davide Libenzi
	This patch implements the necessary compat code for the timerfd system call. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11	[PATCH] Common compat_sys_sysinfo	Kyle McMartin
	I noticed that almost all architectures implemented exactly the same sys32_sysinfo... except parisc, where a bug was to be found in handling of the uptime. So let's remove a whole whack of code for fun and profit. Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it would be the best tested. This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but instead extracting out the common code from sys_sysinfo. Cc: Christoph Hellwig <hch@infradead.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-11-03	[PATCH] Create compat_sys_migrate_pages	Stephen Rothwell
	This is needed on bigendian 64bit architectures. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-28	[PATCH] Constify compat_get_bitmap argument	Stephen Rothwell
	This means we can call it when the bitmap we want to fetch is declared const. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02	[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h	David Howells
	Revert Andrew Morton's patch to temporarily hack around the lack of a declaration of sigset_t in linux/compat.h to make the block-disablement patches build on IA64. This got accidentally pushed to Linus and should be fixed in a different manner. Also make linux/compat.h #include asm/signal.h to gain a definition of sigset_t so that it can externally declare sigset_from_compat(). This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and ppc64 and run-tested on frv. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-30	[PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6]	David Howells
	Create a new header file, fs/internal.h, for common definitions local to the sources in the fs/ directory. Move extern definitions that should be in header files from fs/*.c to fs/internal.h or other main header files where they span directories. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-29	[PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining ↵	Toyo Abe
	time in compatibility mode The clock_nanosleep() function does not return the time remaining when the sleep is interrupted by a signal. This patch creates a new call out, compat_clock_nanosleep_restart(), which handles returning the remaining time after a sleep is interrupted. This patch revives clock_nanosleep_restart(). It is now accessed via the new call out. The compat_clock_nanosleep_restart() is used for compatibility access. Since this is implemented in compatibility mode the normal path is virtually unaffected - no real performance impact. Signed-off-by: Toyo Abe <toyoa@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25	[PATCH] N32 sigset and __COMPAT_ENDIAN_SWAP__	akpm@osdl.org
	I'm testing glibc on MIPS64, little-endian, N32, O32 and N64 multilibs. Among the NPTL test failures seen are some arising from sigsuspend problems for N32: it blocks the wrong signals, so SIGCANCEL (SIGRTMIN) is blocked despite glibc's carefully excluding it from sets of signals to block. Specifically, testing suggests it blocks signal N^32 instead of signal N, so (in the example tested) blocking SIGUSR1 (17) blocks signal 49 instead. glibc's sigset_t uses an array of unsigned long, as does the kernel. In both cases, signal N+1 is represented as (1UL << (N % (8 * sizeof (unsigned long)))) in word number (N / (8 * sizeof (unsigned long))). Thus the N32 glibc uses an array of 32-bit words and the N64 kernel uses an array of 64-bit words. For little-endian, the layout is the same, with signals 1-32 in the first 4 bytes, signals 33-64 in the second, etc.; for big-endian, userspace has that layout while in the kernel each 8 bytes have the two halves swapped from the userspace layout. The N32 sigsuspend syscall uses sigset_from_compat to convert the userspace sigset to kernel format. If __COMPAT_ENDIAN_SWAP__ is not set, this uses logic of the form set->sig[0] = compat->sig[0] \| (((long)compat->sig[1]) << 32 ) to convert the userspace sigset to a kernel one. This looks correct to me for both big and little endian, given that in userspace compat->sig[1] will represent signals 33-64, and so will the high 32 bits of set->sig[0] in the kernel. If however __COMPAT_ENDIAN_SWAP__ is set, as it is for __MIPSEL__, it uses set->sig[0] = compat->sig[1] \| (((long)compat->sig[0]) << 32 ); which seems incorrect for both big and little endian, and would explain the observed symptoms. This code is the only use of __COMPAT_ENDIAN_SWAP__, so if incorrect then that macro serves no purpose, in which case something like the following patch would seem appropriate to remove it. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23	[PATCH] move_pages: fix 32 -> 64 bit compat function	Christoph Lameter
	The definition of the third parameter is a pointer to an array of virtual addresses which give us some trouble. The existing code calculated the wrong address in the array since I used void to avoid having to specify a type. I now use the correct type "compat_uptr_t __user *" in the definition of the function in kernel/compat.c. However, I used __u32 in syscalls.h. Would have to include compat.h there in order to provide the same definition which would generate an ugly include situation. On both ia64 and x86_64 compat_uptr_t is u32. So this works although parameter declarations differ. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23	[PATCH] sys_move_pages: 32bit support (i386, x86_64)	Christoph Lameter
	sys_move_pages() support for 32bit (i386 plus x86_64 compat layer) Add support for move_pages() on i386 and also add the compat functions necessary to run 32 bit binaries on x86_64. Add compat_sys_move_pages to the x86_64 32bit binary layer. Note that it is not up to date so I added the missing pieces. Not sure if this is done the right way. [akpm@osdl.org: compile fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27	[PATCH] lightweight robust futexes: compat	Ingo Molnar
	32-bit syscall compatibility support. (This patch also moves all futex related compat functionality into kernel/futex_compat.c.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26	[PATCH] consolidate sys32/compat_adjtimex	Stephen Rothwell
	Create compat_sys_adjtimex and use it an all appropriate places. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07	[PATCH] remove bogus asm/bug.h includes.	Al Viro
	A bunch of asm/bug.h includes are both not needed (since it will get pulled anyway) and bogus (since they are done too early). Removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-01-18	[PATCH] Generic sys_rt_sigsuspend()	David Woodhouse
	The TIF_RESTORE_SIGMASK flag allows us to have a generic implementation of sys_rt_sigsuspend() instead of duplicating it for each architecture. This provides such an implementation and makes arch/powerpc use it. It also tidies up the ppc32 sys_sigsuspend() to use TIF_RESTORE_SIGMASK. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10	[PATCH] common compat_sys_timer_create	Christoph Hellwig
	The comment in compat.c is wrong, every architecture provides a get_compat_sigevent() for the IPC compat code already. This basically moves the x86_64 version to common code and removes all the others. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10	[PATCH] kernel: fix-up schedule_timeout() usage	Nishanth Aravamudan
	Use schedule_timeout_{,un}interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16	[PATCH] Fix get_compat_sigevent()	David S. Miller
	I have no idea how a bug like this lasted so long. Anyways, obvious memset()'ing of incorrect pointer. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13	[PATCH] verify_area cleanup : i386 and misc.	Jesper Juhl
	This patch converts verify_area to access_ok in arch/i386, fs/, kernel/ and a few other bits that didn't fit in the other patches or that I actually was able to test on my hardware - this is by far the best tested of all the patches. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-09	[PATCH] consolidate the last of the compat sigevent structs	Stephen Rothwell
	This patch pulls together the compat_sigevent structs. It also consolidates the copying of these structures into the kernel. The only part of the second union in sigevent that the kernel looks at currently is the _tid, so that is the only bit we copy. This patch depends on my previous two patches "add and use COMPAT_SIGEV_PAD_SIZE" and "Consolidate the last compat sigvals". Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-06	[PATCH] Consolidate compat_sys_waitid	Stephen Rothwell
	This patch does: - consolidate the three implementations of compat_sys_waitid (some were called sys32_waitid). - adds sys_waitid syscall to ppc - adds sys_waitid and compat_sys_waitid syscalls to ppc64 I have left PARISC and MIPS to their own devices (by request). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-14	[PATCH] Make compat_rt_sigtimedwait conform	Matthew Wilcox
	Compat syscalls need to start compat_sys_ otherwise PA-RISC's compat syscall wrappers don't work. Not that the individual involved bothered to patch PA-RISC ... Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-11	[PATCH] cputime: introduce cputime	Martin Schwidefsky
	This patch introduces the concept of (virtual) cputime. Each architecture can define its method to measure cputime. The main idea is to define a cputime_t type and a set of operations on it (see asm-generic/cputime.h). Then use the type for utime, stime, cutime, cstime, it_virt_value, it_virt_incr, it_prof_value and it_prof_incr and use the cputime operations for each access to these variables. The default implementation is jiffies based and the effect of this patch for architectures which use the default implementation should be neglectible. There is a second type cputime64_t which is necessary for the kernel_stat cpu statistics. The default cputime_t is 32 bit and based on HZ, this will overflow after 49.7 days. This is not enough for kernel_stat (ihmo not enough for a processes too), so it is necessary to have a 64 bit type. The third thing that gets introduced by this patch is an additional field for the /proc/stat interface: cpu steal time. An architecture can account cpu steal time by calls to the account_stealtime function. The cpu which backs a virtual processor doesn't spent all of its time for the virtual cpu. To get meaningful cpu usage numbers this involuntary wait time needs to be accounted and exported to user space. From: Hugh Dickins <hugh@veritas.com> The p->signal check in account_system_time is insufficient. If the timer interrupt hits near the end of exit_notify, after EXIT_ZOMBIE has been set, another cpu may release_task (NULLifying p->signal) in between account_system_time's check and check_rlimit's dereference. Nor should account_it_prof risk send_sig. But surely account_user_time is safe? Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04	[PATCH] compat: sigtimedwait	Zou Nanhai
	- Merge sys32_rt_sigtimedwait function in X86_64, IA64, PPC64, MIPS, SPARC64, S390 32 bit layer into 1 compat_rt_sigtimedwait function. It will also fix a bug of copy wrong information to 32 bit userspace siginfo structure on X86_64, IA64 and SPARC64 when calling sigtimedwait on 32 bit layer. - Change all name the of siginfo_t32 structure in X86_64, IA64, MIPS, SPARC64 and S390 to the name compat_siginfo_t as used in PPC64. - Patch introduced a macro __COMPAT_ENDIAN_SWAP__ in include/asm-mips/compat.h when MIPS kernel is compiled in little-endian mode. This macro is used to do byte swapping in function sigset_from_compat. - This patch is only tested on X86_64 and IA_64. Signed-off-by: Zou Nan hai <Nanhai.zou@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04	[PATCH] sys_stime needs a compat function	Martin Schwidefsky
	I realized that the best way to get the sys_time/sys_stime problem fixed is to make sys_time 64 bit safe by using "time_t " instead of "int " and to introduce two proper compat functions compat_sys_time and compat_sys_stime. The prototype change of sys_time is transparent for 32 bit architectures because both "int" and "time_t" are 32 bit. For 64 bit the type change would be wrong but luckily no 64 bit architecture uses sys_time/sys_stime in 64 bit mode. The patch makes the following change: ia64 : Remove sys32_time, use compat_sys_time and add (!!) compat_sys_stime to compat syscall table. mips : Use compat_sys_time/compat_sys_stime in 32 bit syscall table. Add #ifdef magic to compile sys_time/sys_stime and compat_sys_time/compat_sys_stime only if needed. parisc : Remove sys32_time, use compat_sys_time and compat_sys_stime. ppc64 : remove sys32_time, ppc64_sys32_stime and ppc64_sys_stime. Use common compat_sys_time, compat_sys_stime and sys_stime. s390 : Use compat_sys_stime. Add #ifdef magic to compile sys_time/sys_stime and compat_sys_time/compat_sys_stime only if needed. sparc64 : Use compat_sys_time/compat_Sys_stime in 32 bit syscall table. um : Remove um_time and um_stime. Use common functions sys_time and sys_stime. This adds a CAP_SYS_TIME check to UMs stime call. x86_64 : Remove sys32_time. Use compat_sys_time and compat_sys_stime in 32 bit syscall table. The original stime bug is fixed for mips, parisc, s390, sparc64 and x86_64. Can the arch-maintainers please take a look at this? From: Martin Schwidefsky <schwidefsky@de.ibm.com> Convert compat_time_t to time_t in 32 bit emulation for sys_stime and consolidate all the different implementation of sys_time, sys_stime and their 32-bit emulation parts. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-11-10	[PATCH] compat syscalls naming standardisation	Matthew Wilcox
	On PA-RISC, we have a unified syscall table for 32 and 64 bit that uses macros to generate the appropriate syscall names (native vs compat). For this to work, we need consistent compat syscall names. Unfortunately, some recent additions drop the 'sys_'. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-13	[PATCH] Backward compatibility for compat sched_getaffinity	Anton Blanchard
	The follow patch special cases the NR_CPUS <= BITS_PER_COMPAT_LONG case. Without this patch, a 32bit task would be required to have a 64bit cpumask no matter what value of NR_CPUS are used. With this patch a compat long sized bitmask is allowed if NR_CPUS is small enough to fit within it. Of course applications should be using the glibc wrappers that use an opaque cpu_mask_t type, but there could be older applications using the syscalls directly. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-13	[PATCH] Clean up compat sched affinity syscalls	Anton Blanchard
	Remove the set_fs hack in the compat affinity calls. Create sched_getaffinity and sched_setaffinity helper functions that both the native and compat affinity syscalls use. Also make the compat functions match what the native ones are doing now, setaffinity calls succeed no matter what length the bitmask is, but getaffinity calls must pass in bitmasks at least as long as the kernel type. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-07	[PATCH] ppc64: compat_get_bitmap/compat_put_bitmap	Anton Blanchard
	Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-07	[PATCH] ppc64: fix compat cpu affinity on big endian 64bit	Anton Blanchard
	Add compat sched affinity code. We can argue about how USE_COMPAT_ULONG_CPUMASK works now that the non compat interface has changed. The old non compat behaviour was to require a bitmap long enough in both setaffinity and getaffinity, now its only required in getaffinity. I could do the same for the 32bit interfaces. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-30	[PATCH] fix rusage semantics	Roland McGrath
	This patch changes the rusage bookkeeping and the semantics of the getrusage and times calls in a couple of ways. The first change is in the c* fields counting dead child processes. POSIX requires that children that have died be counted in these fields when they are reaped by a wait* call, and that if they are never reaped (e.g. because of ignoring SIGCHLD, or exitting yourself first) then they are never counted. These were counted in release_task for all threads. I've changed it so they are counted in wait_task_zombie, i.e. exactly when being reaped. POSIX also specifies for RUSAGE_CHILDREN that the report include the reaped child processes of the calling process, i.e. whole thread group in Linux, not just ones forked by the calling thread. POSIX specifies tms_c[us]time fields in the times call the same way. I've moved the c* fields that contain this information into signal_struct, where the single set of counters accumulates data from any thread in the group that calls wait*. Finally, POSIX specifies getrusage and times as returning cumulative totals for the whole process (aka thread group), not just the calling thread. I've added fields in signal_struct to accumulate the stats of detached threads as they die. The process stats are the sums of these records plus the stats of remaining each live/zombie thread. The times and getrusage calls, and the internal uses for filling in wait4 results and siginfo_t, now iterate over the threads in the thread group and sum up their stats along with the stats recorded for threads already dead and gone. I added a new value RUSAGE_GROUP (-3) for the getrusage system call rather than changing the behavior of the old RUSAGE_SELF (0). POSIX specifies RUSAGE_SELF to mean all threads, so the glibc getrusage call will just translate it to RUSAGE_GROUP for new kernels. I did this thinking that someone somewhere might want the old behavior with an old glibc and a new kernel (it is only different if they are using CLONE_THREAD anyway). However, I've changed the times system call to conform to POSIX as well and did not provide any backward compatibility there. In that case there is nothing easy like a parameter value to use, it would have to be a new system call number. That seems pretty pointless. Given that, I wonder if it is worth bothering to preserve the compatible RUSAGE_SELF behavior by introducing RUSAGE_GROUP instead of just changing RUSAGE_SELF's meaning. Comments? I've done some basic testing on x86 and x86-64, and all the numbers come out right after these fixes. (I have a test program that shows a few Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-30	[PATCH] waitid system call	Roland McGrath
	This patch adds a new system call `waitid'. This is a new POSIX call that subsumes the rest of the wait* family and can do some things the older calls cannot. A minor addition is the ability to select what kinds of status to check for with a mask of independent bits, so you can wait for just stops and not terminations, for example. A more significant improvement is the WNOWAIT flag, which allows for polling child status without reaping. This interface fills in a siginfo_t with the same details that a SIGCHLD for the status change has; some of that info (e.g. si_uid) is not available via wait4 or other calls. I've added a new system call that has the parameter conventions of the POSIX function because that seems like the cleanest thing. This patch includes the actual system call table additions for i386 and x86-64; other architectures will need to assign the system call number, and 64-bit ones may need to implement 32-bit compat support for it as I did for x86-64. The new features could instead be provided by some new kludge inventions in the wait4 system call interface (that's what BSD did). If kludges are preferable to adding a system call, I can work up something different. I added a struct rusage field si_rusage to siginfo_t in the SIGCHLD case (this does not affect the size or layout of the struct). This is not part of the POSIX interface, but it makes it so that `waitid' subsumes all the functionality of `wait4'. Future kernel ABIs (new arch's or whatnot) can have only the `waitid' system call and the rest of the wait* family including wait3 and wait4 can be implemented in user space using waitid. There is nothing in user space as yet that would make use of the new field. Most of the new functionality is implemented purely in the waitid system call itself. POSIX also provides for the WCONTINUED flag to report when a child process had been stopped by job control and then resumed with SIGCONT. Corresponding to this, a SIGCHLD is now generated when a child resumes (unless SA_NOCLDSTOP is set), with the value CLD_CONTINUED in siginfo_t.si_code. To implement this, some additional bookkeeping is required in the signal code handling job control stops. The motivation for this work is to make it possible to implement the POSIX semantics of the `waitid' function in glibc completely and correctly. If changing either the system call interface used to accomplish that, or any details of the kernel implementation work, would improve the chances of getting this incorporated, I am more than happy to work through any issues. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-07-28	[PATCH] compat_clock_getres shouldn't return -EFAULT if res == NULL	Arun Sharma
	For clock_getres(clockid_t clock_id, struct timespec *res), the specification says "If res is NULL, the clock resolution is not returned." So this kind of call should succeed. The current implementation returns -EFAULT. The patch fixes the bug in compat_clock_getres(). Signed-off-by: Gordon Jin <gordon.jin@intel.com> Signed-off-by: Arun Sharma <arun.sharma@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-04	[COMPAT]: Add __user attributes for pointers passed while KERNEL_DS.	David S. Miller

2004-06-02	Merge bk://kernel.bkbits.net/davem/sparc-2.6	Linus Torvalds
	into ppc970.osdl.org:/home/torvalds/v2.6/linux