| Age | Commit message (Collapse) | Author |
|
This is the next iteration of the workqueue abstraction.
The framework includes:
- per-CPU queueing support.
on SMP there is a per-CPU worker thread (bound to its CPU) and per-CPU
work queues - this feature is completely transparent to workqueue-users.
keventd automatically uses this feature. XFS can now update to work-queues
and have the same per-CPU performance as it had with its per-CPU worker
threads.
- delayed work submission
there's a new queue_delayed_work(wq, work, delay) function and a new
schedule_delayed_work(work, delay) function. The later one is used to
correctly fix former tq_timer users. I've reverted those changes in 2.5.40
that changed tq_timer uses to schedule_work() - eg. in the case of
random.c or the tty flip queue it was definitely the wrong thing to do.
delayed work means a timer embedded in struct work_struct. I considered
using split struct work_struct and delayed_work_struct types, but lots
of code actively uses task-queues in both delayed and non-delayed mode,
so i went for the more generic approach that allows both methods of work
submission. Delayed timers do not cause any other overhead in the
normal submission path otherwise.
- multithreaded run_workqueue() implementation
the run_workqueue() function can now be called from multiple contexts, and
a worker thread will only use up a single entryy - this property is used
by the flushing code, and can potentially be used in the future to extend
the number of per-CPU worker threads.
- more reliable flushing
there's now a 'pending work' counter, which is used to accurately detect
when the last work-function has finished execution. It's also used to
correctly flush against timed requests. I'm not convinced whether the old
keventd implementation got this detail right.
- i switched the arguments of the queueing function(s) per Jeff's
suggestion, it's more straightforward this way.
Driver fixes:
i have converted almost every affected driver to the new framework. This
cleaned up tons of code. I also fixed a number of drivers that were still
using BHs (these drivers did not compile in 2.5.40).
while this means lots of changes, it might ease the QA decision whether to
put this patch into 2.5.
The pach converts roughly 80% of all tqueue-using code to workqueues - and
all the places that are not converted to workqueues yet are places that do
not compile in vanilla 2.5.40 anyway, due to unrelated changes. I've
converted a fair number of drivers that do not compile in 2.5.40, and i
think i've managed to convert every driver that compiles under 2.5.40.
|
|
This is the smptimers patch plus the removal of old BHs and a rewrite of
task-queue handling.
Basically with the removal of TIMER_BH i think the time is right to get
rid of old BHs forever, and to do a massive cleanup of all related
fields. The following five basic 'execution context' abstractions are
supported by the kernel:
- hardirq
- softirq
- tasklet
- keventd-driven task-queues
- process contexts
I've done the following cleanups/simplifications to task-queues:
- removed the ability to define your own task-queue, what can be done is
to schedule_task() a given task to keventd, and to flush all pending
tasks.
This is actually a quite easy transition, since 90% of all task-queue
users in the kernel used BH_IMMEDIATE - which is very similar in
functionality to keventd.
I believe task-queues should not be removed from the kernel altogether.
It's true that they were written as a candidate replacement for BHs
originally, but they do make sense in a different way: it's perhaps the
easiest interface to do deferred processing from IRQ context, in
performance-uncritical code areas. They are easier to use than
tasklets.
code that cares about performance should convert to tasklets - as the
timer code and the serial subsystem has done already. For extreme
performance softirqs should be used - the net subsystem does this.
and we can do this for 2.6 - there are only a couple of areas left after
fixing all the BH_IMMEDIATE places.
i have moved all the taskqueue handling code into kernel/context.c, and
only kept the basic 'queue a task' definitions in include/linux/tqueue.h.
I've converted three of the most commonly used BH_IMMEDIATE users:
tty_io.c, floppy.c and random.c. [random.c might need more thought
though.]
i've also cleaned up kernel/timer.c over that of the stock smptimers
patch: privatized the timer-vec definitions (nothing needs it,
init_timer() used it mistakenly) and cleaned up the code. Plus i've moved
some code around that does not belong into timer.c, and within timer.c
i've organized data and functions along functionality and further
separated the base timer code from the NTP bits.
net_bh_lock: i have removed it, since it would synchronize to nothing. The
old protocol handlers should still run on UP, and on SMP the kernel prints
a warning upon use. Alexey, is this approach fine with you?
scalable timers: i've further improved the patch ported to 2.5 by wli and
Dipankar. There is only one pending issue i can see, the question of
whether to migrate timers in mod_timer() or not. I'm quite convinced that
they should be migrated, but i might be wrong. It's a 10 lines change to
switch between migrating and non-migrating timers, we can do performance
tests later on. The current, more complex migration code is pretty fast
and has been stable under extremely high networking loads in the past 2
years, so we can immediately switch to the simpler variant if someone
proves it improves performance. (I'd say if non-migrating timers improve
Apache performance on one of the bigger NUMA boxes then the point is
proven, no further though will be needed.)
|
|
Avoid racing on signal delivery with thread signal blocking in thread
groups.
The method to do this is to eliminate the per-thread sigmask_lock, and
use the per-group (per 'process') siglock for all signal related
activities. This immensely simplified some of the locking interactions
within signal.c, and enabled the fixing of the above category of signal
delivery races.
This became possible due to the former thread-signal patch, which made
siglock an irq-safe thing. (it used to be a process-context-only
spinlock.) And this is even a speedup for non-threaded applications:
only one lock is used.
I fixed all places within the kernel except the non-x86 arch sections.
Even for them the transition is very straightforward, in almost every
case the following is sufficient in arch/*/kernel/signal.c:
:1,$s/->sigmask_lock/->sig->siglock/g
|
|
This is actually part of the work I've been doing to remove BHs, but it
stands by itself.
|
|
Here's suspend-to-{RAM,disk} combined patch for
2.5.17. Suspend-to-disk is pretty stable and was tested in
2.4-ac. Suspend-to-RAM is little more experimental, but works for me,
and is certainly better than disk-eating version currently in kernel.
Major parts are: process stopper, S3 specific code, S4 specific
code.
|
|
|
|
- Trond Myklebust: deadlock checking in lockd server
- Tim Waugh: fix up parport wrong #define
- Christoph Hellwig: i2c update, ext2 cleanup
- Al Viro: fix partition handling sanity check.
- Trond Myklebust: make NFS use SLAB_NOFS, and not play games with PF_MEMALLOC
- Ben Fennema: UDF update
- Alan Cox: continued merging
- Chris Mason: get /proc buffer memory sizes right after buf-in-page-cache
|
|
- Russell King: ARM updates
- Al Viro: more init cleanups
- Cort Dougan: more PPC updates
- David Miller: cleanups, pci mmap updates
- Neil Brown: raid resync by sector
- Alan Cox: more merging with -ac
- Johannes Erdfelt: USB updates
- Kai Germaschewski: ISDN updates
- Tobias Ringstrom: dmfe.c network driver update
- Trond Myklebust: NFS client updates and cleanups
|
|
- ReiserFS merge
- fix DRM R128/AGP dependency
|
|
|