user/sven/linux.git - Linux Kernel

Age	Commit message	Author
2004-06-30	[PATCH] sparse: NULL vs 0 - the rest of it	Mika Kukkonen

2004-04-11	[PATCH] fix posix-timers to have proper per-process scope	Andrew Morton
	From: Roland McGrath <roland@redhat.com>

	The posix-timers implementation associates timers with the creating thread and destroys timers when their creator thread dies. POSIX clearly specifies that these timers are per-process, and a timer should not be torn down when the thread that created it exits. I hope there won't be any controversy on what the correct semantics are here, since POSIX is clear and the Linux feature is called "posix-timers".

	The attached program built with NPTL -lrt -lpthread demonstrates the bug. The program is correct by POSIX, but fails on Linux. Note that until just the other day, NPTL had a trivial bug that always disabled its use of kernel timer syscalls (check strace for lack of timer_create/SYS_259). So unless you have built your own NPTL libs very recently, you probably won't see the kernel calls actually used by this program.

	Also attached is my patch to fix this. It (you guessed it) moves the posix_timers field from task_struct to signal_struct. Access is now governed by the siglock instead of the task lock. exit_itimers is called from __exit_signal, i.e. only on the death of the last thread in the group, rather than from do_exit for every thread. Timers' it_process fields store the group leader's pointer, which won't die. For the case of SIGEV_THREAD_ID, I hold a ref on the task_struct for it_process to stay robust in case the target thread dies; the ref is released and the dangling pointer cleared when the timer fires and the target thread is dead. (This should only come up in a buggy user program, so no one cares exactly how the kernel handles that case. But I think what I did is robust and sensical.)

	/* Test for bogus per-thread deletion of timers.  */

	#include <stdio.h>
	#include <error.h>
	#include <time.h>
	#include <signal.h>
	#include <stdint.h>
	#include <sys/time.h>
	#include <sys/resource.h>
	#include <unistd.h>
	#include <pthread.h>

	/* Creating timers in another thread should work too.  */
	static void *do_timer_create(void *arg)
	{
		struct sigevent *const sigev = arg;
		timer_t *const timerId = sigev->sigev_value.sival_ptr;

		if (timer_create(CLOCK_REALTIME, sigev, timerId) < 0) {
			perror("timer_create");
			return NULL;
		}
		return timerId;
	}

	int main(void)
	{
		int i, res;
		timer_t timerId;
		struct itimerspec itval;
		struct sigevent sigev;

		itval.it_interval.tv_sec = 2;
		itval.it_interval.tv_nsec = 0;
		itval.it_value.tv_sec = 2;
		itval.it_value.tv_nsec = 0;

		sigev.sigev_notify = SIGEV_SIGNAL;
		sigev.sigev_signo = SIGALRM;
		sigev.sigev_value.sival_ptr = (void *)&timerId;

		for (i = 0; i < 100; i++) {
			printf("cnt = %d\n", i);

			pthread_t thr;
			res = pthread_create(&thr, NULL, &do_timer_create, &sigev);
			if (res) {
				error(0, res, "pthread_create");
				continue;
			}

			void *val;
			res = pthread_join(thr, &val);
			if (res) {
				error(0, res, "pthread_join");
				continue;
			}
			if (val == NULL)
				continue;

			res = timer_settime(timerId, 0, &itval, NULL);
			if (res < 0)
				perror("timer_settime");

			res = timer_delete(timerId);
			if (res < 0)
				perror("timer_delete");
		}
		return 0;
	}
2004-02-18	[PATCH] NGROUPS 2.6.2rc2 + fixups	Andrew Morton
	From: Tim Hockin <thockin@sun.com>, Neil Brown <neilb@cse.unsw.edu.au>, me

	New groups infrastructure. task->groups and task->ngroups are replaced by task->group_info. group_info is a refcounted, dynamic struct with an array of pages. This allows for large numbers of groups. The current limit of 32 groups has been raised to 64k groups. It can be raised more by changing the NGROUPS_MAX constant in limits.h.
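A rough user-space sketch of the idea described above, not the kernel's code: a reference-counted structure that stores group IDs in fixed-size blocks standing in for pages. Every name here (`sketch_group_info`, `SKETCH_GIDS_PER_BLOCK`, the helper functions) is hypothetical, and the plain `int` reference count is an assumption made for brevity; shared kernel data would need an atomic counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical block size; a real implementation would size blocks by page. */
#define SKETCH_GIDS_PER_BLOCK 4

struct sketch_group_info {
	int usage;             /* reference count (not atomic in this sketch) */
	int ngroups;           /* number of group IDs stored */
	int nblocks;           /* number of allocated blocks */
	unsigned int **blocks; /* each block holds SKETCH_GIDS_PER_BLOCK IDs */
};

/* Allocate storage for ngroups IDs, returned with one reference held. */
static struct sketch_group_info *sketch_groups_alloc(int ngroups)
{
	struct sketch_group_info *gi;
	int i;

	assert(ngroups >= 0);
	gi = malloc(sizeof(*gi));
	if (!gi)
		return NULL;
	gi->usage = 1;
	gi->ngroups = ngroups;
	gi->nblocks = (ngroups + SKETCH_GIDS_PER_BLOCK - 1) / SKETCH_GIDS_PER_BLOCK;
	gi->blocks = calloc(gi->nblocks ? gi->nblocks : 1, sizeof(*gi->blocks));
	if (!gi->blocks) {
		free(gi);
		return NULL;
	}
	for (i = 0; i < gi->nblocks; i++) {
		gi->blocks[i] = calloc(SKETCH_GIDS_PER_BLOCK, sizeof(unsigned int));
		if (!gi->blocks[i]) {
			while (i--)
				free(gi->blocks[i]);
			free(gi->blocks);
			free(gi);
			return NULL;
		}
	}
	return gi;
}

/* Address of the i-th group ID: pick the block, then the slot inside it. */
static unsigned int *sketch_group_at(struct sketch_group_info *gi, int i)
{
	assert(i >= 0 && i < gi->ngroups);
	return &gi->blocks[i / SKETCH_GIDS_PER_BLOCK][i % SKETCH_GIDS_PER_BLOCK];
}

/* Take an additional reference. */
static void sketch_get_group_info(struct sketch_group_info *gi)
{
	gi->usage++;
}

/* Drop a reference; returns 1 if this was the last one and memory was freed. */
static int sketch_put_group_info(struct sketch_group_info *gi)
{
	int i;

	if (--gi->usage > 0)
		return 0;
	for (i = 0; i < gi->nblocks; i++)
		free(gi->blocks[i]);
	free(gi->blocks);
	free(gi);
	return 1;
}
```

Splitting the array into blocks means no single large contiguous allocation is needed as the number of groups grows, which appears to be the point of the "array of pages" in the description.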
2004-02-03	[PATCH] initialise cpu_vm_mask in init_mm	Andrew Morton
	From: Anton Blanchard <anton@samba.org>

	Some architectures use cpu_vm_mask to optimise TLB flushes. On ppc64 we are now using a common flush infrastructure that handles both userspace and kernelspace (vmalloc) pages. In order to avoid triggering this optimisation we need to mark the init mm as having scheduled on all cpus. Things currently work by luck (we check for the cpu only having run on the local cpu, and the field is initialised to 0), but it would be safer to initialise it to CPU_MASK_ALL.
2003-08-18	[PATCH] cpumask_t: allow more than BITS_PER_LONG CPUs	Andrew Morton
	From: William Lee Irwin III <wli@holomorphy.com>

	Contributions from:
	Jan Dittmer <jdittmer@sfhq.hn.org>
	Arnd Bergmann <arnd@arndb.de>
	"Bryan O'Sullivan" <bos@serpentine.com>
	"David S. Miller" <davem@redhat.com>
	Badari Pulavarty <pbadari@us.ibm.com>
	"Martin J. Bligh" <mbligh@aracnet.com>
	Zwane Mwaikambo <zwane@linuxpower.ca>

	It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.

	cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their cpus by creating an abstract data type dedicated to representing cpu bitmasks, similar to fd sets from userspace, and sweeping the appropriate code to update callers to the access API. The fd set-like structure is according to Linus' own suggestion; the macro calling convention to ambiguate representations with minimal code impact is my own invention.

	Specifically, a new set of inline functions for manipulating arbitrary-width bitmaps is introduced with a relatively simple implementation, in tandem with a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose accessor functions are defined in terms of the bitmap manipulation inlines. This bitmap ADT found an additional use in i386 arch code handling sparse physical APIC ID's, which was convenient to use in this case as the accounting structure was required to be wider to accommodate the physids consumed by larger numbers of cpus.

	For the sake of simplicity and low code impact, these cpu bitmasks are passed primarily by value; however, an additional set of accessors along with an auxiliary data type with const call-by-reference semantics is provided to address performance concerns raised in connection with very large systems, such as SGI's larger models, where copying and call-by-value overhead would be prohibitive. Few (if any) users of the call-by-reference API are immediately introduced.

	Also, in order to avoid calling convention overhead on architectures where structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is special-cased so that cpumask_t falls back to an unsigned long and the accessors perform the usual bit twiddling on unsigned longs as opposed to arrays thereof. Audits were done with the structure overhead in-place, restoring this special-casing only afterward so as to ensure a more complete API conversion while undergoing the majority of its end-user exposure in -mm. More -mm's were shipped after its restoration to be sure that was tested, too.

	The immediate users of this functionality are Sun sparc64 systems, SGI mips64 and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the ppc64 machines needing the functionality have yet to be released; all others have had systems requiring it for full functionality for at least 6 months, and in some cases, since the initial Linux port to the affected architecture.
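The fd_set-like design described above can be illustrated with a small user-space sketch. This is not the kernel's cpumask_t API: the type `cpumask_sketch_t`, the `SKETCH_*` constants and the accessor names are all hypothetical, and the CPU count of 130 is chosen only so that the mask needs more than one unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical CPU count, chosen to exceed the width of one unsigned long. */
#define SKETCH_NR_CPUS 130
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_LONGS \
	((SKETCH_NR_CPUS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* A fixed-width bitmap wrapped in a struct, in the style of an fd_set. */
typedef struct {
	unsigned long bits[SKETCH_MASK_LONGS];
} cpumask_sketch_t;

/* Clear every bit in the mask. */
static void sketch_cpus_clear(cpumask_sketch_t *mask)
{
	memset(mask->bits, 0, sizeof(mask->bits));
}

/* Mark one CPU as present in the mask. */
static void sketch_cpu_set(int cpu, cpumask_sketch_t *mask)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	mask->bits[cpu / SKETCH_BITS_PER_LONG] |=
		1UL << (cpu % SKETCH_BITS_PER_LONG);
}

/* Return 1 if the CPU is in the mask, 0 otherwise. */
static int sketch_cpu_isset(int cpu, const cpumask_sketch_t *mask)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return (int)((mask->bits[cpu / SKETCH_BITS_PER_LONG] >>
		      (cpu % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* Count set bits; takes the mask by value to mirror the by-value style. */
static int sketch_cpus_weight(cpumask_sketch_t mask)
{
	int cpu, weight = 0;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		weight += sketch_cpu_isset(cpu, &mask);
	return weight;
}
```

Because callers only touch the mask through accessors, the underlying representation could be swapped for a single unsigned long when the CPU count fits in one, which is the special case the commit message describes.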
2003-06-02	[PATCH] preallocate signal queue resource - Posix timers	Jim Houston
	This adds a new interface to kernel/signal.c which allows signals to be sent using preallocated sigqueue structures. It also modifies kernel/posix-timers.c to use this interface.

	The current timer code may fail to deliver a timer expiry signal if there are no sigqueue structures available at the time of the expiry. The Posix specification is clear that the signal queuing resource should be allocated at timer_create time. This allows the error to be returned to the application rather than silently losing the signal. This patch does not change the sigqueue structure allocation policy. I hope to revisit that in another patch.

	Here is the definition for the new interface:

	struct sigqueue *sigqueue_alloc(void)
		Preallocate a sigqueue structure for use with the functions described below.

	void sigqueue_free(struct sigqueue *q)
		Free a preallocated sigqueue structure. If the sigqueue structure being freed is still queued, it will be removed from the queue. I currently leave the signal pending. It may be delivered without the siginfo structure.

	int send_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
		This function is equivalent to send_sig_info(). It queues a signal to the specified thread using the supplied sigqueue structure. The caller is expected to fill in the siginfo_t which is part of the sigqueue structure.

	int send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
		This function is equivalent to send_group_sig_info(). It queues the signal to a process allowing the system to select which thread will receive the signal in a multi-threaded process. Again, the sigqueue structure is used to queue the signal.

	Both send_sigqueue() and send_group_sigqueue() return 0 if the signal is queued. They return 1 if the signal was not queued because the process is ignoring the signal. Both versions include code to increment the si_overrun count if the sigqueue entry is for a Posix timer and they are called while the sigqueue entry is still queued. Yes, I know that the current code doesn't rearm the timer until the signal is delivered. Having this extra bit of code doesn't do any harm, and I plan to use it.

	These routines do not check if there already is a legacy (non-realtime) signal pending. They always queue the signal. This requires that collect_signal() always checks if there is another matching siginfo before clearing the signal bit.
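The return-value and overrun rules described above can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the kernel implementation: the `sketch_` types and functions are hypothetical, the "ignored" state is reduced to a bitmask, and returning 0 when only the overrun count is bumped is an assumption about a case the description does not spell out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a preallocated queue entry. */
struct sketch_sigqueue {
	int queued;     /* non-zero while the entry sits on a pending queue */
	int si_overrun; /* sends that arrived while the entry was still queued */
	int sig;        /* signal number carried by the entry */
};

/* Hypothetical stand-in for the target task. */
struct sketch_task {
	unsigned long ignored; /* bit n set => signal n is ignored */
};

/* Allocate the entry up front, so a later send cannot fail for lack of memory. */
static struct sketch_sigqueue *sketch_sigqueue_alloc(void)
{
	return calloc(1, sizeof(struct sketch_sigqueue));
}

/* Release a preallocated entry. */
static void sketch_sigqueue_free(struct sketch_sigqueue *q)
{
	free(q);
}

/*
 * Returns 1 if the signal was not queued because it is ignored, 0 otherwise.
 * If the entry is already queued, only the overrun count is incremented
 * (returning 0 in that case is an assumption of this sketch).
 */
static int sketch_send_sigqueue(int sig, struct sketch_sigqueue *q,
				const struct sketch_task *p)
{
	assert(sig > 0 && sig < 32);
	if (p->ignored & (1UL << sig))
		return 1;
	if (q->queued) {
		q->si_overrun++;
		return 0;
	}
	q->sig = sig;
	q->queued = 1;
	return 0;
}
```

The design point is that the only allocation happens in the alloc call, where a failure can be reported to the caller, while the send path just flips state on memory that already exists.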
2003-05-25	[PATCH] Fix dcache_lock/tasklist_lock ranking bug	Andrew Morton
	__unhash_process acquires the dcache_lock while holding the tasklist_lock for writing. This can deadlock. Additionally, fs/proc/base.c incorrectly assumed that p->pid would be set to 0 during release_task.

	The patch fixes that by adding a new spinlock to the task structure and fixing all references to (!p->pid). The alternative to the new spinlock would be to hold dcache_lock around __unhash_process.

	- fs/proc/base.c assumed that p->pid is reset to 0 during exit. This is not the case anymore. I now look at the count of the pid structure for PIDTYPE_PID.
	- de_thread now tested - as broken as it was before: open handles to /proc/<pid> are either stale or invalid after an exec of an nptl process, if the exec was called from a secondary thread.
	- a few lock_kernels removed - that part of /proc doesn't need it.
	- additional instances of 'if(current->pid)' replaced with pid_alive.
2003-04-12	[PATCH] convert file_lock to a spinlock	Andrew Morton
	Time to write a 2M file, one byte at a time:

	Before:
	1.09s user 4.92s system 99% cpu 6.014 total
	0.74s user 5.28s system 99% cpu 6.023 total
	1.03s user 4.97s system 100% cpu 5.991 total

	After:
	0.79s user 5.17s system 99% cpu 5.993 total
	0.79s user 5.17s system 100% cpu 5.957 total
	0.84s user 5.11s system 100% cpu 5.942 total
2003-02-20	Fix x86 "switch_to()" to properly set the previous task information,	Linus Torvalds
	which is needed to keep track of process usage counts correctly and efficiently.
2003-02-17	[PATCH] POSIX clocks & timers	George Anzinger
	This is version 23 or so of the POSIX timer code.

	Internal changelog:
	- Changed the signals code to match the new order of things. Also the new xtime_lock code needed to be picked up. It made some things a lot simpler.
	- Fixed a spin lock hand off problem in locking timers (thanks to Randy).
	- Fixed nanosleep to test for out of bound nanoseconds (thanks to Julie).
	- Fixed a couple of id deallocation bugs that left old ids laying around (hey I get this one).
	- This version has a new timer id manager. Andrew Morton suggested elimination of recursion (done) and I added code to allow it to release unused nodes. The prior version only released the leaf nodes. (The id manager uses radix tree type nodes.) Also added is a reuse count so ids will not repeat for at least 256 alloc/free cycles.
	- The changes for the new sys_call restart now allow one restart function to handle both nanosleep and clock_nanosleep. Saves a bit of code, nice.
	- All the requested changes and Lindent too :).
	- I also broke clock_nanosleep() apart much the same way nanosleep() was with the 2.5.50-bk5 changes.

	TIMER STORMS

	The POSIX clocks and timers code prevents "timer storms" by not putting repeating timers back in the timer list until the signal is delivered for the prior expiry. Timer events missed by this delay are accounted for in the timer overrun count. The net result is MUCH lower system overhead while presenting the same info to the user as would be the case if an interrupt and timer processing were required for each increment in the overrun count.
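The overrun accounting in the TIMER STORMS paragraph comes down to a small piece of arithmetic, sketched here in user-space C. The struct and function names are hypothetical, times are plain integers in arbitrary units, and counting an expiry that falls exactly at the delivery time as missed is a choice made for this sketch, not something the commit message specifies.

```c
#include <assert.h>

/* Result of re-arming a periodic timer once its signal has been delivered. */
struct sketch_rearm {
	long long overrun;     /* expiries that passed while the signal was pending */
	long long next_expiry; /* first expiry strictly after the delivery time */
};

/*
 * expiry:   the expiry whose signal was just delivered
 * interval: the timer period (must be positive)
 * now:      the time of delivery (must not precede expiry)
 */
static struct sketch_rearm sketch_rearm_on_delivery(long long expiry,
						    long long interval,
						    long long now)
{
	struct sketch_rearm r;

	assert(interval > 0);
	assert(now >= expiry);
	/* Whole periods that elapsed between the expiry and its delivery. */
	r.overrun = (now - expiry) / interval;
	/* Skip the missed periods in one step instead of firing for each. */
	r.next_expiry = expiry + (r.overrun + 1) * interval;
	return r;
}
```

For example, with an expiry at 1000, an interval of 100 and delivery at 1350, the expiries at 1100, 1200 and 1300 were missed, so the overrun is 3 and the timer is re-armed for 1400. One division replaces three interrupts.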
2003-02-06	Split up "struct signal_struct" into "signal" and "sighand" parts.	Linus Torvalds
	This is required to make the old LinuxThread semantics work together with the fixed-for-POSIX full signal sharing. A traditional CLONE_SIGHAND thread (LinuxThread) will not see any other shared signal state, while a new-style CLONE_THREAD thread will share all of it. This way the two methods don't confuse each other.
2002-12-20	[PATCH] Fix CPU bitmask truncation	William Lee Irwin III
	Fix task->cpus_allowed bitmask truncations on 64-bit architectures. Originally by Bjorn Helgaas for 2.4.x.
2002-09-28	[PATCH] atomic-thread-signals	Ingo Molnar
	Avoid racing on signal delivery with thread signal blocking in thread groups. The method to do this is to eliminate the per-thread sigmask_lock, and use the per-group (per 'process') siglock for all signal related activities. This immensely simplified some of the locking interactions within signal.c, and enabled the fixing of the above category of signal delivery races.

	This became possible due to the former thread-signal patch, which made siglock an irq-safe thing. (it used to be a process-context-only spinlock.) And this is even a speedup for non-threaded applications: only one lock is used.

	I fixed all places within the kernel except the non-x86 arch sections. Even for them the transition is very straightforward, in almost every case the following is sufficient in arch/*/kernel/signal.c:

		:1,$s/->sigmask_lock/->sig->siglock/g
2002-09-22	[PATCH] pidhash cleanups, tgid-2.5.38-F3	Ingo Molnar
	This does the following things:
	- removes the ->thread_group list and uses a new PIDTYPE_TGID pid class to handle thread groups. This cleans up lots of code in signal.c and elsewhere.
	- fixes sys_execve() if a non-leader thread calls it. (2.5.38 crashed in this case.)
	- renames list_for_each_noprefetch to __list_for_each.
	- cleans up delayed-leader parent notification.
	- introduces link_pid() to optimize PIDTYPE_TGID installation in the thread-group case.

	I've tested the patch with a number of threaded and non-threaded workloads, and it works just fine. Compiles & boots on UP and SMP x86.

	The session/pgrp bugs reported to lkml are probably still open, they are the next on my todo - now that we have a clean pidhash architecture they should be easier to fix.
2002-09-13	[PATCH] Use a sync iocb for generic_file_read	Andrew Morton
	This adds support for synchronous iocbs and converts generic_file_read to use a sync iocb to call into generic_file_aio_read. The tests I've run with lmbench on a piii-866 showed no difference in file re-read speed when forced to use a completion path via aio_complete and an -EIOCBQUEUED return from generic_file_aio_read -- people with slower machines might want to test this to see if we can tune it any better. Also, a bug fix to correct a missing call into the aio code from the fork code is present. This patch sets things up for making generic_file_aio_read actually asynchronous.
2002-09-12	[PATCH] sys_exit() threading improvements, BK-curr	Ingo Molnar
	This implements the 'keep the initial thread around until every thread in the group exits' concept in a different, less intrusive way, along your suggestions. There is no exit_done completion handling anymore, freeing of the task is still done by wait4(). This has the following side-effect: detached threads/processes can only be started within a thread group, not in a standalone way. (This also fixes the bugs introduced by the ->exit_done code, which made it possible for a zombie task to be reactivated.) I've introduced the p->group_leader pointer, which can/will be used for other purposes in the future as well - since from now on the thread group leader is always existent. Right now it's used to notify the parent of the thread group leader from the last non-leader thread that exits [if the thread group leader is a zombie already].
2002-09-08	[PATCH] Re: pinpointed: PANIC caused by dequeue_signal() in current Linus	Ingo Molnar
	This fixes the bootup crash. There were two initialization bugs:
	- INIT_SIGNAL needs to set shared_pending.
	- exec() needs to set up newsig properly.

	The second one caused the crash Anton saw.
2002-08-18	[PATCH] O(1) sys_exit(), threading, scalable-exit-2.5.31-B4	Ingo Molnar
	The attached patch updates a number of items:
	- adds cleanups suggested by Christoph Hellwig: needed unlikely() statements, a superfluous #define and line length problems.
	- splits up the global ptrace list into per-task ptrace lists. This was pretty straightforward, and this makes the worst-case exit() latency O(nr_children).

	The per-task ptrace lists unearthed a bug that the previous code did not take care of: tasks on the ptrace list have to be correctly reparented as well. This patch passed my stresstests as well.
2002-08-12	[PATCH] designated initializers for include/linux	Rusty Russell
	These are the completely generic bits (linux/init_task.h and linux/wait.h).

	From: Art Haas <ahaas@neosoft.com>

	Here's the latest diffs for the files in include/linux. Patches are against 2.5.31.
2002-07-23	[PATCH] scheduler fixes	Ingo Molnar
	- introduce new type of context-switch locking, this is a must-have for ia64 and sparc64.
	- load_balance() bug noticed by Scott Rhine and myself: scan the whole list to find imbalance number of tasks, not just the tail of the list.
	- sched_yield() fix: use current->array not rq->active.
2002-05-17	[PATCH] clean up maximum priorities	Robert Love
	This patch further cleans up and separates the code in an effort to allow setting (a) a larger maximum real-time priority than default and (b) a maximum kernel RT priority that is separate from the maximum priority exported to user-space.
2002-03-14	Cleanup: use list macros for task list	Linus Torvalds

2002-03-14	[PATCH] wait4() WIFSTOPPED starvation fix #1/2	David Howells
	This patch (#1) just converts the task_struct to use struct list_head rather than direct pointers for maintaining the children list.
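For readers unfamiliar with the pattern, here is a minimal user-space sketch of an intrusive `struct list_head` children list. It is not the kernel's list.h or task_struct: `struct sketch_task`, its field layout and every `sketch_` helper are hypothetical and only illustrate how embedding list nodes replaces hand-maintained child and sibling pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Doubly linked, circular list node meant to be embedded in other structs. */
struct list_head {
	struct list_head *next, *prev;
};

/* Recover the containing struct from a pointer to its embedded node. */
#define sketch_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void sketch_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void sketch_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink entry and leave it as an empty, self-linked node. */
static void sketch_list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	sketch_list_init(entry);
}

/* Hypothetical task: 'children' heads the list, 'sibling' links into a parent's. */
struct sketch_task {
	int pid;
	struct list_head children;
	struct list_head sibling;
};

static void sketch_task_init(struct sketch_task *t, int pid)
{
	t->pid = pid;
	sketch_list_init(&t->children);
	sketch_list_init(&t->sibling);
}

static void sketch_add_child(struct sketch_task *parent, struct sketch_task *child)
{
	sketch_list_add_tail(&child->sibling, &parent->children);
}

/* Walk the parent's children list and count the entries. */
static int sketch_count_children(const struct sketch_task *parent)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = parent->children.next; pos != &parent->children; pos = pos->next)
		n++;
	return n;
}

/* Oldest child, or NULL if the list is empty. */
static struct sketch_task *sketch_first_child(struct sketch_task *parent)
{
	if (parent->children.next == &parent->children)
		return NULL;
	return sketch_list_entry(parent->children.next, struct sketch_task, sibling);
}
```

With this layout, removing a child is a constant-time unlink of its own node and needs no knowledge of its neighbours' types, which is the usual argument for replacing direct pointers with list macros.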
2002-02-23	- new, less intrusive and faster migration method:	Ingo Molnar
	/*
	 * This is how migration works:
	 *
	 * 1) we queue a migration_req_t structure in the source CPU's
	 *    runqueue and wake up that CPU's migration thread.
	 * 2) we down() the locked semaphore => thread blocks.
	 * 3) migration thread wakes up (implicitly it forces the migrated
	 *    thread off the CPU)
	 * 4) it gets the migration request and checks whether the migrated
	 *    task is still in the wrong runqueue.
	 * 5) if it's in the wrong runqueue then the migration thread removes
	 *    it and puts it into the right queue.
	 * 6) migration thread up()s the semaphore.
	 * 7) we wake up and the migration is done.
	 */
2002-02-21	cleanups, speedups and fixes. Added support for non-current set_cpus_allowed().	Ingo Molnar

2002-02-11	merge to the -K3 scheduler.	Ingo Molnar

2002-02-06	[PATCH] thread information block	David Howells
	syscall latency improvement

	* There's now an asm/thread_info.h header file with the basic structure def and asm offsets in it.
	* There's now a linux/thread_info.h header file which includes the asm version and wraps some bitops calls to make convenience functions for accessing the low-level flags.
	* The task_struct has had some fields removed (and some flags), and has acquired a pointer to the thread_info struct.
	* task_struct's are now allocated on slabs in kernel/fork.c, whereas thread_info structs are allocated at the bottom of the stack pages.
	* Some more convenience functions are provided at the end of linux/sched.h to access flags in other tasks (these are here because they need to access the task_struct).
2002-02-05	v2.5.2.6 -> v2.5.3	Linus Torvalds
	- Doug Ledford: i810 audio driver update
	- Evgeniy Polyakov: update various SCSI drivers to new locking
	- David Howells: syscall latency improvement, try 2
	- Francois Romieu: dscc4 driver update
	- Patrick Mochel: driver model fixes
	- Andrew Morton: clean up a few details in ext3 inode initialization
	- Pete Wyckoff: make x86 machine check print out right address..
	- Hans Reiser: reiserfs update
	- Richard Gooch: devfs update
	- Greg KH: USB updates
	- Dave Jones: PNPBIOS
	- Nathan Scott: extended attributes
	- Corey Minyard: clean up zlib duplication (triplication..)
2002-02-05	v2.5.2.5 -> v2.5.2.6	Linus Torvalds
	- Asit Mallick: mtrr update
	- Patrick Mochel: split up kernel/device.c into drivers/base
	- Mikael Pettersson/Al Viro: fix missing in-core inode initialization in ext2 introduced by Al's inode trimming
	- David Miller: sparc and network updates
	- Frank Davis: firewire video mmap page remapping fix
	- me: fix configure help scripts to fix breakage noticed by Dave Jones
	- Greg KH: USB updates
	- Kai Germaschewski: ISDN fixes, Config.help entries
	- Douglas Gilbert: SCSI doc update
	- Ingo Molnar: x86 taskswitch optimizations, scheduler updates
	- Mikael Pettersson: make APIC work on old external setups
	- Al Viro: more inode trimming
2002-02-05	v2.5.2.1 -> v2.5.2.1.1	Linus Torvalds
	- David Howells: abstract out "current->need_resched" as "need_resched()"
	- Frank Davis: ide-tape update for bio
	- various: header file fixups
	- Jens Axboe: fix up bio/ide/highmem issues
	- Kai Germaschewski: ISDN update
	- Tim Waugh: parport update
	- Patrick Mochel: initcall update
	- Greg KH: USB and Compaq PCI hotplug updates
2002-02-05	v2.5.2 -> v2.5.2.1	Linus Torvalds
	- Al Viro: fix up silly problem in swapfile filp cleanups in 2.5.2
	- Tachino Nobuhiro: fix another error return for swapfile filp code
	- Robert Love: merge some of Ingo's scheduler fixes
	- David Miller: networking, sparc and some scsi driver fixes
	- Tim Waugh: parport update
	- OGAWA Hirofumi: fatfs cleanups and bugfixes
	- Roland Dreier: fix vsscanf buglets.
	- Ben LaHaise: include file cleanup
	- Andre Hedrick: IDE taskfile update