| Age | Commit message (Collapse) | Author |
|
Sync iocbs have a life cycle that don't need a kioctx. Their retrying, if
any, is done in the context of their owner who has allocated them on the
stack.
The sole user of a sync iocb's ctx reference was aio_complete() checking for
an elevated iocb ref count that could never happen. No path which grabs an
iocb ref has access to sync iocbs.
If we were to implement sync iocb cancelation it would be done by the owner of
the iocb using its on-stack reference.
Removing this chunk from aio_complete allows us to remove the entire kioctx
instance from mm_struct, reducing its size by a third. On a i386 testing box
the slab size went from 768 to 504 bytes and from 5 to 8 per page.
Signed-off-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Patch to eliminate struct files_struct.file_lock spinlock on the reader side
and use rcu refcounting rcuref_xxx api for the f_count refcounter. The
updates to the fdtable are done by allocating a new fdtable structure and
setting files->fdt to point to the new structure. The fdtable structure is
protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that
are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A
global list of fdtable to be freed is not scalable, so we use a per-cpu list.
If keventd is already handling the current cpu's work, we use a timer to defer
queueing of that work.
Since the last publication, this patch has been re-written to avoid using
explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
premitives instead. This required that the fd information is kept in a
separate structure (fdtable) and updated atomically.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically. Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure. This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded withing files_struct. It also changes all
the users to refer to the file table using files_fdtable() macro. Subsequent
applciation of RCU becomes easier after this.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Instead of requiring architecture code to interact with the scheduler's
locking implementation, provide a couple of defines that can be used by the
architecture to request runqueue unlocked context switches, and ask for
interrupts to be enabled over the context switch.
Also replaces the "switch_lock" used by these architectures with an oncpu
flag (note, not a potentially slow bitflag). This eliminates one bus
locked memory operation when context switching, and simplifies the
task_running function.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
POSIX requires that setitimer, getitimer, and alarm work on a per-process
basis. Currently, Linux implements these for individual threads. This patch
fixes these semantics for the ITIMER_REAL timer (which generates SIGALRM),
making it shared by all threads in a process (thread group).
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
POSIX requires that when you claim _POSIX_CPUTIME and _POSIX_THREAD_CPUTIME,
not only the clock_* calls but also timer_* calls must support the thread and
process CPU time clocks. This patch provides that support, building on my
recent additions to support these clocks in the POSIX clock_* interfaces.
This patch will not work without those changes, as well as the patch fixing
the timer lock-siglock deadlock problem.
The apparent pervasive changes to posix-timers.c are simply that some fields
of struct k_itimer have changed name and moved into a union. This was
appropriate since the data structures required for the existing real-time
timer support and for the new thread/process CPU-time timers are quite
different.
The glibc patches to support CPU time clocks using the new kernel support is
in http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch, and that
includes tests for the timer support (if you build glibc with NPTL).
From: Christoph Lameter <clameter@sgi.com>
Your patch breaks the mmtimer driver because it used k_itimer values for
its own purposes. Here is a fix by defining an additional structure in
k_itimer (same approach for mmtimer as the cpu timers):
From: Roland McGrath <roland@redhat.com>
Fix bug identified by Alexander Nyberg <alexn@dsv.su.se>
> The problem arises from code touching the union in alloc_posix_timer()
> which makes firing go non-zero. When firing is checked in
> posix_cpu_timer_set() it will be positive causing an infinite loop.
>
> So either the below fix or preferably move the INIT_LIST_HEAD(x) from
> alloc_posix_timer() to somewhere later where it doesn't disturb the other
> union members.
Thanks for finding this problem. The latter is what I think is the right
solution. This patch does that, and also removes some superfluous rezeroing.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
sa_handler isn't the first member of struct sigaction on mips. Use C99
initializers to avoid a compiler warning. (There don't seem to be more
serious problems as mips worked with that warning for ages)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There is really no point in each task_struct having its own waitchld_exit.
In the only use of it, the waitchld_exit of each thread in a group gets
woken up at the same time. So, there might as well just be one wait queue
for the whole thread group. This patch does that by moving the field from
task_struct to signal_struct. It should have no effect on the behavior,
but saves a little work and a little storage in the multithreaded case.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
POSIX specifies that the limit settings provided by getrlimit/setrlimit are
shared by the whole process, not specific to individual threads. This
patch changes the behavior of those calls to comply with POSIX.
I've moved the struct rlimit array from task_struct to signal_struct, as it
has the correct sharing properties. (This reduces kernel memory usage per
thread in multithreaded processes by around 100/200 bytes for 32/64
machines respectively.) I took a fairly minimal approach to the locking
issues with the newly shared struct rlimit array. It turns out that all
the code that is checking limits really just needs to look at one word at a
time (one rlim_cur field, usually). It's only the few places like
getrlimit itself (and fork), that require atomicity in accessing a whole
struct rlimit, so I just used a spin lock for them and no locking for most
of the checks. If it turns out that readers of struct rlimit need more
atomicity where they are now cheap, or less overhead where they are now
atomic (e.g. fork), then seqcount is certainly the right thing to use for
them instead of readers using the spin lock. Though it's in signal_struct,
I didn't use siglock since the access to rlimits never needs to disable
irqs and doesn't overlap with other siglock uses. Instead of adding
something new, I overloaded task_lock(task->group_leader) for this; it is
used for other things that are not likely to happen simultaneously with
limit tweaking. To me that seems preferable to adding a word, but it would
be trivial (and arguably cleaner) to add a separate lock for these users
(or e.g. just use seqlock, which adds two words but is optimal for readers).
Most of the changes here are just the trivial s/->rlim/->signal->rlim/.
I stumbled across what must be a long-standing bug, in reparent_to_init.
It does:
memcpy(current->rlim, init_task.rlim, sizeof(*(current->rlim)));
when surely it was intended to be:
memcpy(current->rlim, init_task.rlim, sizeof(current->rlim));
As rlim is an array, the * in the sizeof expression gets the size of the
first element, so this just changes the first limit (RLIMIT_CPU). This is
for kernel threads, where it's clear that resetting all the rlimits is what
you want. With that fixed, the setting of RLIMIT_FSIZE in nfsd is
superfluous since it will now already have been reset to RLIM_INFINITY.
The other subtlety is removing:
tsk->rlim[RLIMIT_CPU].rlim_cur = RLIM_INFINITY;
in exit_notify, which was to avoid a race signalling during self-reaping
exit. As the limit is now shared, a dying thread should not change it for
others. Instead, I avoid that race by checking current->state before the
RLIMIT_CPU check. (Adding one new conditional in that path is now required
one way or another, since if not for this check there would also be a new
race with self-reaping exit later on clearing current->signal that would
have to be checked for.)
The one loose end left by this patch is with process accounting.
do_acct_process temporarily resets the RLIMIT_FSIZE limit while writing the
accounting record. I left this as it was, but it is now changing a limit
that might be shared by other threads still running. I left this in a
dubious state because it seems to me that processing accounting may already
be more generally a dubious state when it comes to NPTL threads. I would
think you would want one record per process, with aggregate data about all
threads that ever lived in it, not a separate record for each thread.
I don't use process accounting myself, but if anyone is interested in
testing it out I could provide a patch to change it this way.
One final note, this is not 100% to POSIX compliance in regards to rlimits.
POSIX specifies that RLIMIT_CPU refers to a whole process in aggregate, not
to each individual thread. I will provide patches later on to achieve that
change, assuming this patch goes in first.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
From: Roland McGrath <roland@redhat.com>
The posix-timers implementation associates timers with the creating thread
and destroys timers when their creator thread dies. POSIX clearly
specifies that these timers are per-process, and a timer should not be torn
down when the thread that created it exits. I hope there won't be any
controversy on what the correct semantics are here, since POSIX is clear
and the Linux feature is called "posix-timers".
The attached program built with NPTL -lrt -lpthread demonstrates the bug.
The program is correct by POSIX, but fails on Linux. Note that a until
just the other day, NPTL had a trivial bug that always disabled its use of
kernel timer syscalls (check strace for lack of timer_create/SYS_259). So
unless you have built your own NPTL libs very recently, you probably won't
see the kernel calls actually used by this program.
Also attached is my patch to fix this. It (you guessed it) moves the
posix_timers field from task_struct to signal_struct. Access is now
governed by the siglock instead of the task lock. exit_itimers is called
from __exit_signal, i.e. only on the death of the last thread in the
group, rather than from do_exit for every thread. Timers' it_process
fields store the group leader's pointer, which won't die. For the case of
SIGEV_THREAD_ID, I hold a ref on the task_struct for it_process to stay
robust in case the target thread dies; the ref is released and the dangling
pointer cleared when the timer fires and the target thread is dead. (This
should only come up in a buggy user program, so noone cares exactly how the
kernel handles that case. But I think what I did is robust and sensical.)
/* Test for bogus per-thread deletion of timers. */
#include <stdio.h>
#include <error.h>
#include <time.h>
#include <signal.h>
#include <stdint.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>
#include <pthread.h>
/* Creating timers in another thread should work too. */
static void *do_timer_create(void *arg)
{
struct sigevent *const sigev = arg;
timer_t *const timerId = sigev->sigev_value.sival_ptr;
if (timer_create(CLOCK_REALTIME, sigev, timerId) < 0) {
perror("timer_create");
return NULL;
}
return timerId;
}
int main(void)
{
int i, res;
timer_t timerId;
struct itimerspec itval;
struct sigevent sigev;
itval.it_interval.tv_sec = 2;
itval.it_interval.tv_nsec = 0;
itval.it_value.tv_sec = 2;
itval.it_value.tv_nsec = 0;
sigev.sigev_notify = SIGEV_SIGNAL;
sigev.sigev_signo = SIGALRM;
sigev.sigev_value.sival_ptr = (void *)&timerId;
for (i = 0; i < 100; i++) {
printf("cnt = %d\n", i);
pthread_t thr;
res = pthread_create(&thr, NULL, &do_timer_create, &sigev);
if (res) {
error(0, res, "pthread_create");
continue;
}
void *val;
res = pthread_join(thr, &val);
if (res) {
error(0, res, "pthread_join");
continue;
}
if (val == NULL)
continue;
res = timer_settime(timerId, 0, &itval, NULL);
if (res < 0)
perror("timer_settime");
res = timer_delete(timerId);
if (res < 0)
perror("timer_delete");
}
return 0;
}
|
|
From: Tim Hockin <thockin@sun.com>,
Neil Brown <neilb@cse.unsw.edu.au>,
me
New groups infrastructure. task->groups and task->ngroups are replaced by
task->group_info. Group)info is a refcounted, dynamic struct with an array
of pages. This allows for large numbers of groups. The current limit of
32 groups has been raised to 64k groups. It can be raised more by changing
the NGROUPS_MAX constant in limits.h
|
|
From: Anton Blanchard <anton@samba.org>
Some architectures use cpu_vm_mask to optimise TLB flushes. On ppc64 we
are now using a common flush infrastructure that handles both userspace and
kernelspace (vmalloc) pages. In order to avoid triggering this
optimisation we need to mark the init mm as having scheduled on all cpus.
Things currently work by luck (we check for the cpu only having run on the
local cpu, and the field is initialised to 0), but it would be safer to
initialise it CPU_MASK_ALL.
|
|
From: William Lee Irwin III <wli@holomorphy.com>
Contributions from:
Jan Dittmer <jdittmer@sfhq.hn.org>
Arnd Bergmann <arnd@arndb.de>
"Bryan O'Sullivan" <bos@serpentine.com>
"David S. Miller" <davem@redhat.com>
Badari Pulavarty <pbadari@us.ibm.com>
"Martin J. Bligh" <mbligh@aracnet.com>
Zwane Mwaikambo <zwane@linuxpower.ca>
It has ben tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.
cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their
cpus by creating an abstract data type dedicated to representing cpu
bitmasks, similar to fd sets from userspace, and sweeping the appropriate
code to update callers to the access API. The fd set-like structure is
according to Linus' own suggestion; the macro calling convention to ambiguate
representations with minimal code impact is my own invention.
Specifically, a new set of inline functions for manipulating arbitrary-width
bitmaps is introduced with a relatively simple implementation, in tandem with
a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose
accessor functions are defined in terms of the bitmap manipulation inlines.
This bitmap ADT found an additional use in i386 arch code handling sparse
physical APIC ID's, which was convenient to use in this case as the
accounting structure was required to be wider to accommodate the physids
consumed by larger numbers of cpus.
For the sake of simplicity and low code impact, these cpu bitmasks are passed
primarily by value; however, an additional set of accessors along with an
auxiliary data type with const call-by-reference semantics is provided to
address performance concerns raised in connection with very large systems,
such as SGI's larger models, where copying and call-by-value overhead would
be prohibitive. Few (if any) users of the call-by-reference API are
immediately introduced.
Also, in order to avoid calling convention overhead on architectures where
structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is
special-cased so that cpumask_t falls back to an unsigned long and the
accessors perform the usual bit twiddling on unsigned longs as opposed to
arrays thereof. Audits were done with the structure overhead in-place,
restoring this special-casing only afterward so as to ensure a more complete
API conversion while undergoing the majority of its end-user exposure in -mm.
More -mm's were shipped after its restoration to be sure that was tested,
too.
The immediate users of this functionality are Sun sparc64 systems, SGI mips64
and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the
ppc64 machines needing the functionality have yet to be released; all others
have had systems requiring it for full functionality for at least 6 months,
and in some cases, since the initial Linux port to the affected architecture.
|
|
This adds a new interface to kernel/signal.c which allows signals to be
sent using preallocated sigqueue structures. It also modifies
kernel/posix-timers.c to use this interface.
The current timer code may fail to deliver a timer expiry signal if
there are no sigqueue structures available at the time of the expiry.
The Posix specification is clear that the signal queuing resource should
be allocated at timer_create time. This allows the error to be returned
to the application rather than silently losing the signal.
This patch does not change the sigqueue structure allocation policy. I
hope to revisit that in another patch.
Here is the definition for the new interface:
struct sigqueue *sigqueue_alloc(void)
Preallocate a sigqueue structure for use with the functions
described below.
void sigqueue_free(struct sigqueue *q)
Free a preallocated sigqueue structure. If the sigqueue
structure being freed is still queued, it will be removed
from the queue. I currently leave the signal pending.
It may be delivered without the siginfo structure.
int send_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
This function is equivalent to send_sig_info(). It queues
a signal to the specified thread using the supplied sigqueue
structure. The caller is expected to fill in the siginfo_t
which is part of the sigqueue structure.
int send_group_sigqueue(int sig, struct sigqueue *q, struct task_struct *p)
This function is equivalent to send_group_sig_info(). It queues
the signal to a process allowing the system to select which thread
will receive the signal in a multi-threaded process.
Again, the sigqueue structure is used to queue the signal.
Both send_sigqueue() and send_group_sigqueue() return 0 if the signal
is queued. They return 1 if the signal was not queued because the
process is ignoring the signal.
Both versions include code to increment the si_overrun count if the
sigqueue entry is for a Posix timer and they are called while the
sigqueue entry is still queued. Yes, I know that the current code
doesn't rearm the timer until the signal is delivered. Having this
extra bit of code doesn't do any harm, and I plan to use it.
These routines do not check if there already is a legacy (non-realtime)
signal pending. They always queue the signal. This requires that
collect_signal() always checks if there is another matching siginfo
before clearing the signal bit.
|
|
__unhash_process acquires the dcache_lock while holding the
tasklist_lock for writing. This can deadlock. Additionally,
fs/proc/base.c incorrectly assumed that p->pid would be set to 0 during
release_task.
The patch fixes that by adding a new spinlock to the task structure and
fixing all references to (!p->pid).
The alternative to the new spinlock would be to hold dcache_lock around
__unhash_process.
- fs/proc/base.c assumed that p->pid is reset to 0 during exit. This is
not the case anymore. I now look at the count of the pid structure for
PIDTYPE_PID.
- de_thread now tested - as broken as it was before: open handles to
/proc/<pid> are either stale or invalid after an exec of a nptl process,
if the exec was call from a secondary thread.
- a few lock_kernels removed - that part of /proc doesn't need it.
- additional instances of 'if(current->pid)' replaced with pid_alive.
|
|
Time to write a 2M file, one byte at a time:
Before:
1.09s user 4.92s system 99% cpu 6.014 total
0.74s user 5.28s system 99% cpu 6.023 total
1.03s user 4.97s system 100% cpu 5.991 total
After:
0.79s user 5.17s system 99% cpu 5.993 total
0.79s user 5.17s system 100% cpu 5.957 total
0.84s user 5.11s system 100% cpu 5.942 total
|
|
which is needed to keep track of process usage counts correctly and
efficiently.
|
|
This is version 23 or so of the POSIX timer code.
Internal changelog:
- Changed the signals code to match the new order of things. Also the
new xtime_lock code needed to be picked up. It made some things a lot
simpler.
- Fixed a spin lock hand off problem in locking timers (thanks
to Randy).
- Fixed nanosleep to test for out of bound nanoseconds
(thanks to Julie).
- Fixed a couple of id deallocation bugs that left old ids
laying around (hey I get this one).
- This version has a new timer id manager. Andrew Morton
suggested elimination of recursion (done) and I added code
to allow it to release unused nodes. The prior version only
released the leaf nodes. (The id manager uses radix tree
type nodes.) Also added is a reuse count so ids will not
repeat for at least 256 alloc/ free cycles.
- The changes for the new sys_call restart now allow one
restart function to handle both nanosleep and clock_nanosleep.
Saves a bit of code, nice.
- All the requested changes and Lindent too :).
- I also broke clock_nanosleep() apart much the same way
nanosleep() was with the 2.5.50-bk5 changes.
TIMER STORMS
The POSIX clocks and timers code prevents "timer storms" by
not putting repeating timers back in the timer list until
the signal is delivered for the prior expiry. Timer events
missed by this delay are accounted for in the timer overrun
count. The net result is MUCH lower system overhead while
presenting the same info to the user as would be the case if
an interrupt and timer processing were required for each
increment in the overrun count.
|
|
This is required to get make the old LinuxThread semantics work
together with the fixed-for-POSIX full signal sharing. A traditional
CLONE_SIGHAND thread (LinuxThread) will not see any other shared
signal state, while a new-style CLONE_THREAD thread will share all
of it.
This way the two methods don't confuse each other.
|
|
Fix task->cpus_allowed bitmask truncations on 64.bit architectures.
Originally by Bjorn Helgaas for 2.4.x.
|
|
Avoid racing on signal delivery with thread signal blocking in thread
groups.
The method to do this is to eliminate the per-thread sigmask_lock, and
use the per-group (per 'process') siglock for all signal related
activities. This immensely simplified some of the locking interactions
within signal.c, and enabled the fixing of the above category of signal
delivery races.
This became possible due to the former thread-signal patch, which made
siglock an irq-safe thing. (it used to be a process-context-only
spinlock.) And this is even a speedup for non-threaded applications:
only one lock is used.
I fixed all places within the kernel except the non-x86 arch sections.
Even for them the transition is very straightforward, in almost every
case the following is sufficient in arch/*/kernel/signal.c:
:1,$s/->sigmask_lock/->sig->siglock/g
|
|
This does the following things:
- removes the ->thread_group list and uses a new PIDTYPE_TGID pid class
to handle thread groups. This cleans up lots of code in signal.c and
elsewhere.
- fixes sys_execve() if a non-leader thread calls it. (2.5.38 crashed in
this case.)
- renames list_for_each_noprefetch to __list_for_each.
- cleans up delayed-leader parent notification.
- introduces link_pid() to optimize PIDTYPE_TGID installation in the
thread-group case.
I've tested the patch with a number of threaded and non-threaded
workloads, and it works just fine. Compiles & boots on UP and SMP x86.
The session/pgrp bugs reported to lkml are probably still open, they are
the next on my todo - now that we have a clean pidhash architecture they
should be easier to fix.
|
|
This adds support for synchronous iocbs and converts generic_file_read
to use a sync iocb to call into generic_file_aio_read.
The tests I've run with lmbench on a piii-866 showed no difference in
file re-read speed when forced to use a completion path via aio_complete
and an -EIOCBQUEUED return from generic_file_aio_read -- people with
slower machines might want to test this to see if we can tune it any
better. Also, a bug fix to correct a missing call into the aio code
from the fork code is present. This patch sets things up for making
generic_file_aio_read actually asynchronous.
|
|
This implements the 'keep the initial thread around until every thread
in the group exits' concept in a different, less intrusive way, along
your suggestions. There is no exit_done completion handling anymore,
freeing of the task is still done by wait4(). This has the following
side-effect: detached threads/processes can only be started within a
thread group, not in a standalone way.
(This also fixes the bugs introduced by the ->exit_done code, which made
it possible for a zombie task to be reactivated.)
I've introduced the p->group_leader pointer, which can/will be used for
other purposes in the future as well - since from now on the thread
group leader is always existent. Right now it's used to notify the
parent of the thread group leader from the last non-leader thread that
exits [if the thread group leader is a zombie already].
|
|
This fixes the bootup crash. There were two initialization bugs:
- INIT_SIGNAL needs to set shared_pending.
- exec() needs to set up newsig properly.
the second one caused the crash Anton saw.
|
|
the attached patch updates a number of items:
- adds cleanups suggested by Christoph Hellwig: needed unlikely()
statements, a superfluous #define and line length problems.
- splits up the global ptrace list into per-task ptrace lists. This was
pretty straightforward, and this makes the worst-case exit() latency
O(nr_children).
the per-task ptrace lists unearthed a bug that the previous code did not
take care of: tasks on the ptrace list have to be correctly reparented as
well. This patch passed my stresstests as well.
|
|
These are the completely generic bits (linux/init_task.h and linux/wait.h).
From: Art Haas <ahaas@neosoft.com>
Here's the latest diffs for the files in include/linux.
Patches are against 2.5.31.
|
|
- introduce new type of context-switch locking, this is a must-have for
ia64 and sparc64.
- load_balance() bug noticed by Scott Rhine and myself: scan the
whole list to find imbalance number of tasks, not just the tail
of the list.
- sched_yield() fix: use current->array not rq->active.
|
|
This patch further cleans up and separates the code in an effort to
allow setting (a) a larger maximum real-time priority than default and
(b) a maximum kernel RT priority that is separate than the maximum
priority exported to user-space.
|
|
|
|
This patch (#1) just converts the task_struct to use struct list_head rather
than direct pointers for maintaining the children list.
|
|
/*
* This is how migration works:
*
* 1) we queue a migration_req_t structure in the source CPU's
* runqueue and wake up that CPU's migration thread.
* 2) we down() the locked semaphore => thread blocks.
* 3) migration thread wakes up (implicitly it forces the migrated
* thread off the CPU)
* 4) it gets the migration request and checks whether the migrated
* task is still in the wrong runqueue.
* 5) if it's in the wrong runqueue then the migration thread removes
* it and puts it into the right queue.
* 6) migration thread up()s the semaphore.
* 7) we wake up and the migration is done.
*/
|
|
|
|
|
|
syscall latency improvement
* There's now an asm/thread_info.h header file with the basic structure
def and asm offsets in it.
* There's now a linux/thread_info.h header file which includes the asm
version and wraps some bitops calls to make convenience functions for
accessing the low-level flags.
* The task_struct has had some fields removed (and some flags), and has
acquired a pointer to the thread_info struct.
* task_struct's are now allocated on slabs in kernel/fork.c, whereas
thread_info structs are allocated at the bottom of the stack pages.
* Some more convenience functions are provided at the end of linux/sched.h to
access flags in other tasks (these are here because they need to access the
task_struct).
|
|
- Doug Ledford: i810 audio driver update
- Evgeniy Polyakov: update various SCSI drivers to new locking
- David Howells: syscall latency improvement, try 2
- Francois Romieu: dscc4 driver update
- Patrick Mochel: driver model fixes
- Andrew Morton: clean up a few details in ext3 inode initialization
- Pete Wyckoff: make x86 machine check print out right address..
- Hans Reiser: reiserfs update
- Richard Gooch: devfs update
- Greg KH: USB updates
- Dave Jones: PNPBIOS
- Nathan Scott: extended attributes
- Corey Minyard: clean up zlib duplication (triplication..)
|
|
- Asit Mallick: mtrr update
- Patrick Mochel: split up kernel/device.c into drivers/base
- Mikael Pettersson/Al Viro: fix missing in-core inode initialization
in ext2 introduced by Al's inode trimming
- David Miller: sparc and network updates
- Frank Davis: firewire video mmap page remapping fix
- me: fix configure help scripts to fix breakage noticed by Dave Jones
- Greg KH: USB updates
- Kai Germaschewski: ISDN fixes, Config.help entries
- Douglas Gilbert: SCSI doc update
- Ingo Molnar: x86 taskswitch optimizations, scheduler updates
- Mikael Pettersson: make APIC work on old external setups
- Al Viro: more inode trimming
|
|
- David Howells: abtract out "current->need_resched" as "need_resched()"
- Frank Davis: ide-tape update for bio
- various: header file fixups
- Jens Axboe: fix up bio/ide/highmem issues
- Kai Germaschewski: ISDN update
- Tim Waugh: parport update
- Patrik Mochel: initcall update
- Greg KH: USB and Compaq PCI hotplug updates
|
|
- Al Viro: fix up silly problem in swapfile filp cleanups in 2.5.2
- Tachino Nobuhiro: fix another error return for swapfile filp code
- Robert Love: merge some of Ingo's scheduler fixes
- David Miller: networking, sparc and some scsi driver fixes
- Tim Waugh: parport update
- OGAWA Hirofumi: fatfs cleanups and bugfixes
- Roland Dreier: fix vsscanf buglets.
- Ben LaHaise: include file cleanup
- Andre Hedrick: IDE taskfile update
|