| Age | Commit message | Author |
|
Headers touched: linux/interrupt.h, linux/sched.h, linux/timer.h
|
|
Add some infrastructure for statically initialising timers, and
use it in workqueues.
|
|
If two CPUs run mod_timer against the same not-pending timer then they
have no locking relationship. They can both see the timer as
not-pending and they both add the timer to their cpu-local list. The
CPU which gets there second corrupts the first CPU's lists.
This was causing Dave Hansen's 8-way to oops after a couple of minutes
of specweb testing.
I believe that to fix this we need locking which is associated with the
timer itself. The easy fix is hashed spinlocking based on the timer's
address. The hard fix is a lock inside the timer itself.
It is hard because init_timer() becomes compulsory, to initialise that
spinlock. An unknown number of code paths in the kernel just wipe the
timer to all-zeroes and start using it.
I chose the hard way - it is cleaner and more idiomatic. The patch
also adds a "magic number" to the timer so we can detect when a timer
was not correctly initialised. A warning and stack backtrace are
generated and the timer is fixed up. After 16 such warnings the
warning mechanism shuts itself up until a reboot.
It took six patches to my kernel to stop the warnings from coming out.
The uninitialised timers are extremely easy to find and fix. But it
will take some time to weed them all out. Maybe we should go for
the hashed locking...
Note that the new timer->lock means that we can clean up some awkward
"oh we raced, let's try again" code in timer.c. But to do that we'd
also need to take timer->lock in the commonly-called del_timer(), so I
left it as-is.
The lock is not needed in add_timer() because concurrent
add_timer()/add_timer() and concurrent add_timer()/mod_timer() are
illegal.
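The "hard fix" described above can be sketched in user space roughly as follows. This is a minimal illustration, not the kernel patch: every `sketch_` name, the magic value, and the C11 `atomic_flag` standing in for a kernel spinlock are assumptions made for the example. Only the ideas come from the message: a lock inside the timer, a magic number to detect timers that were zeroed rather than initialised, a fixup, and a cap of 16 warnings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_TIMER_MAGIC  0x4b87ad6eU /* arbitrary value picked for this sketch */
#define SKETCH_MAX_WARNINGS 16          /* the message says warnings stop after 16 */

struct sketch_timer {
    atomic_flag lock;       /* toy stand-in for a per-timer spinlock */
    unsigned int magic;     /* set only by sketch_init_timer() */
    bool pending;
    unsigned long expires;
};

static int sketch_warnings_issued;

static void sketch_init_timer(struct sketch_timer *t)
{
    atomic_flag_clear(&t->lock);
    t->magic = SKETCH_TIMER_MAGIC;
    t->pending = false;
    t->expires = 0;
}

/* Detect a timer that skipped init, warn (at most 16 times) and repair it.
 * Returns true when a fixup was needed. */
static bool sketch_check_timer(struct sketch_timer *t)
{
    if (t->magic == SKETCH_TIMER_MAGIC)
        return false;
    if (sketch_warnings_issued < SKETCH_MAX_WARNINGS) {
        sketch_warnings_issued++;
        fprintf(stderr, "uninitialised timer at %p, fixing up\n", (void *)t);
    }
    sketch_init_timer(t);
    return true;
}

/* Two callers racing on the same timer now serialise on the timer's own lock. */
static void sketch_mod_timer(struct sketch_timer *t, unsigned long expires)
{
    sketch_check_timer(t);
    assert(t->magic == SKETCH_TIMER_MAGIC);

    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    t->expires = expires;
    t->pending = true;      /* the kernel would (re)queue on a per-CPU list here */
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}
```

A timer wiped with `memset()` and then passed to `sketch_mod_timer()` triggers one warning and is repaired before the lock is taken, which mirrors the behaviour the message describes for uninitialised timers.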
|
|
add_timer_on is like add_timer, except it takes a target CPU on which
to add the timer.
The slab code needs per-cpu timers for shrinking the per-cpu caches.
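As a rough user-space model of the idea (not the kernel implementation), a timer can be queued on an explicitly chosen CPU's list instead of the caller's own. The `pcpu_` names, the fixed CPU count, and the singly linked list are all assumptions made for this sketch; the real code also needs per-CPU locking, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define PCPU_NR_CPUS 4  /* hypothetical CPU count for the sketch */

struct pcpu_timer {
    struct pcpu_timer *next;
    unsigned long expires;
    int cpu;                /* CPU whose list currently holds the timer */
};

/* One pending-timer list per CPU. */
static struct pcpu_timer *pcpu_list[PCPU_NR_CPUS];

/* Like a plain add, except the caller names the target CPU. */
static void pcpu_add_timer_on(struct pcpu_timer *t, int cpu)
{
    assert(cpu >= 0 && cpu < PCPU_NR_CPUS);
    t->cpu = cpu;
    t->next = pcpu_list[cpu];
    pcpu_list[cpu] = t;
}

/* Count the timers queued on one CPU. */
static int pcpu_count(int cpu)
{
    int n = 0;
    for (struct pcpu_timer *t = pcpu_list[cpu]; t; t = t->next)
        n++;
    return n;
}
```

A per-CPU cache shrinker in the style the message mentions would call `pcpu_add_timer_on()` once per CPU, each with its own timer.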
|
|
This does a number of timer subsystem enhancements:
- simplified timer initialization, now it's the cheapest possible thing:
    static inline void init_timer(struct timer_list *timer)
    {
            timer->base = NULL;
    }
since the timer functions already did a !timer->base check this did not
have any effect on their fastpath.
- the rule from now on is that timer->base is set upon activation of the
timer, and cleared upon deactivation. This also made it possible to:
- reorganize all the timer handling code to not assume anything about
timer->entry.next and timer->entry.prev - this also removed lots of
unnecessary cleaning of these fields and lots of unnecessary list
operations from the fastpath.
- simplified del_timer_sync(): it now uses del_timer() plus some simple
synchronization code. Note that this also fixes a bug: if mod_timer (or
add_timer) moves a currently executing timer to another CPU's timer
vector, then del_timer_sync() does not synchronize with the handler
properly.
- bugfix: moved run_local_timers() from scheduler_tick() into
update_process_times() .. scheduler_tick() might be called from the fork
code which will not quite have the intended effect ...
- removed the APIC-timer-IRQ shifting done on SMP, Dipankar Sarma's
testing shows no negative effects.
- cleaned up include/linux/timer.h:
- removed the timer_t typedef, and fixed up kernel/workqueue.c to use
the 'struct timer_list' name instead.
- removed unnecessary includes
- renamed the 'list' field to 'entry' (it's an entry not a list head)
- exchanged the 'function' and 'data' fields. This, besides being
more logical, also unearthed the last few remaining places that
initialized timers by assuming some given field ordering, the patch
also fixes these places. (fs/xfs/pagebuf/page_buf.c,
net/core/profile.c and net/ipv4/inetpeer.c)
- removed the defunct sync_timers(), timer_enter() and timer_exit()
prototypes.
- added docbook-style comments.
- other kernel/timer.c changes:
- base->running_timer does not have to be volatile ...
- added consistent comments to all the important functions.
- made the sync-waiting in del_timer_sync preempt- and lowpower-
friendly.
i've compiled, booted & tested the patched kernel on x86 UP and SMP. I
have tried moderately high networking load as well, to make sure the timer
changes are correct - they appear to be.
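The "base is set upon activation, cleared upon deactivation" rule from the list above can be modelled in a few lines. This is only a sketch under stated assumptions: the `bt_` names and the counter standing in for the per-CPU timer vectors are invented here, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a per-CPU timer base; the real one holds the timer vectors. */
struct bt_base {
    int cpu;
    int active;             /* number of timers currently queued here */
};

struct bt_timer {
    struct bt_base *base;   /* non-NULL exactly while the timer is active */
    unsigned long expires;
};

/* Initialisation only has to clear the base pointer. */
static void bt_init(struct bt_timer *t)
{
    t->base = NULL;
}

/* "Pending" is simply "has a base". */
static int bt_pending(const struct bt_timer *t)
{
    return t->base != NULL;
}

/* Activation sets the base. Adding an already-pending timer is treated as a bug. */
static void bt_activate(struct bt_base *base, struct bt_timer *t,
                        unsigned long expires)
{
    assert(!bt_pending(t));
    t->expires = expires;
    t->base = base;
    base->active++;
}

/* Deactivation clears the base. Returns 1 if the timer was pending, else 0. */
static int bt_deactivate(struct bt_timer *t)
{
    if (!bt_pending(t))
        return 0;
    t->base->active--;
    t->base = NULL;
    return 1;
}
```

Because the pending test never looks at list pointers, nothing has to clean `entry.next` or `entry.prev` on removal, which is the simplification the message points to.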
|
|
This is the smptimers patch plus the removal of old BHs and a rewrite of
task-queue handling.
Basically with the removal of TIMER_BH i think the time is right to get
rid of old BHs forever, and to do a massive cleanup of all related
fields. The following five basic 'execution context' abstractions are
supported by the kernel:
- hardirq
- softirq
- tasklet
- keventd-driven task-queues
- process contexts
I've done the following cleanups/simplifications to task-queues:
- removed the ability to define your own task-queue, what can be done is
to schedule_task() a given task to keventd, and to flush all pending
tasks.
This is actually a quite easy transition, since 90% of all task-queue
users in the kernel used BH_IMMEDIATE - which is very similar in
functionality to keventd.
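The reduced interface described above (queue a task for later, flush everything pending) can be sketched in user space like this. All `sketch_` names are hypothetical, there is no keventd thread and no locking here, and the flush is called by hand; it is meant only to show the shape of the two operations.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_task {
    struct sketch_task *next;
    void (*routine)(void *);    /* function to run later */
    void *data;                 /* argument passed to routine */
    int queued;                 /* guards against double queueing */
};

static struct sketch_task *sketch_head;
static struct sketch_task **sketch_tail = &sketch_head;

/* Queue a task for deferred execution. Returns 0 if it was already queued. */
static int sketch_schedule_task(struct sketch_task *t)
{
    if (t->queued)
        return 0;
    t->queued = 1;
    t->next = NULL;
    *sketch_tail = t;
    sketch_tail = &t->next;
    return 1;
}

/* Run every pending task in FIFO order. Returns how many were run. */
static int sketch_flush_tasks(void)
{
    struct sketch_task *t = sketch_head;
    int ran = 0;

    /* Detach the list first so routines may safely requeue themselves. */
    sketch_head = NULL;
    sketch_tail = &sketch_head;

    while (t) {
        struct sketch_task *next = t->next;
        t->queued = 0;
        t->routine(t->data);
        t = next;
        ran++;
    }
    return ran;
}

/* Example routine for the usage below: bump a counter. */
static void sketch_count_routine(void *data)
{
    ++*(int *)data;
}
```

In the kernel the flush side would run in a worker thread's process context rather than being invoked by the caller, which is what makes the interface usable from IRQ context.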
I believe task-queues should not be removed from the kernel altogether.
It's true that they were written as a candidate replacement for BHs
originally, but they do make sense in a different way: it's perhaps the
easiest interface to do deferred processing from IRQ context, in
performance-uncritical code areas. They are easier to use than
tasklets.
code that cares about performance should convert to tasklets - as the
timer code and the serial subsystem have done already. For extreme
performance softirqs should be used - the net subsystem does this.
and we can do this for 2.6 - there are only a couple of areas left after
fixing all the BH_IMMEDIATE places.
i have moved all the taskqueue handling code into kernel/context.c, and
only kept the basic 'queue a task' definitions in include/linux/tqueue.h.
I've converted three of the most commonly used BH_IMMEDIATE users:
tty_io.c, floppy.c and random.c. [random.c might need more thought
though.]
i've also cleaned up kernel/timer.c over that of the stock smptimers
patch: privatized the timer-vec definitions (nothing needs it,
init_timer() used it mistakenly) and cleaned up the code. Plus i've moved
some code around that does not belong in timer.c, and within timer.c
i've organized data and functions along functionality and further
separated the base timer code from the NTP bits.
net_bh_lock: i have removed it, since it would synchronize to nothing. The
old protocol handlers should still run on UP, and on SMP the kernel prints
a warning upon use. Alexey, is this approach fine with you?
scalable timers: i've further improved the patch ported to 2.5 by wli and
Dipankar. There is only one pending issue i can see, the question of
whether to migrate timers in mod_timer() or not. I'm quite convinced that
they should be migrated, but i might be wrong. It's a 10-line change to
switch between migrating and non-migrating timers, we can do performance
tests later on. The current, more complex migration code is pretty fast
and has been stable under extremely high networking loads in the past 2
years, so we can immediately switch to the simpler variant if someone
proves it improves performance. (I'd say if non-migrating timers improve
Apache performance on one of the bigger NUMA boxes then the point is
proven, no further thought will be needed.)
|
|
include/linux/timer.h needs to include <linux/stddef.h>
to get the definition of NULL.
|
|
Nobody's using it any more, kill:
|
|
Trivial patch update against 2.5.17:
Tim Schmielau <tim@physik3.uni-rostock.de>: move jiffies from sched.h to its own jiffies.h:
Move 'jiffies' from sched.h to its own header.
Then pull the sched.h dependency from 67 files that include sched.h for
no apparent reason other than the jiffies declaration.
Move the time_{before,after}{_eq}() macros from timer.h to jiffies.h,
since there are *no* files using them that don't also use jiffies.
Many more sched.h dependencies can be killed after capable(),
request_irq(), and free_irq() are moved out of <linux/sched.h>.
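For readers unfamiliar with the time_{before,after}{_eq}() family, the point of those macros is to compare tick counts correctly even after the counter wraps. The sketch below shows the technique with inline functions; the `sketch_` names are made up for this example, and it does the subtraction in unsigned arithmetic before casting, assuming a two's-complement `long` as on all common platforms.

```c
#include <limits.h>
#include <stdbool.h>

/* True if tick count a is later than b, even across a counter wrap.
 * Valid as long as the two values are less than half the counter range apart. */
static inline bool sketch_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static inline bool sketch_time_before(unsigned long a, unsigned long b)
{
    return sketch_time_after(b, a);
}

/* True if a is later than or equal to b. */
static inline bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

static inline bool sketch_time_before_eq(unsigned long a, unsigned long b)
{
    return sketch_time_after_eq(b, a);
}
```

A plain `a > b` gives the wrong answer when `b` is just below `ULONG_MAX` and `a` has already wrapped to a small value, while `sketch_time_after(a, b)` still reports that `a` is later. Since these helpers only make sense next to the tick counter, keeping them in the same header as `jiffies` matches the reasoning in the message.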
Tim Schmielau <tim@physik3.uni-rostock.de>
|
|
- Andrea: fix races in do_wp_page, free_swap_and_cache
- me: clean up page dirty handling
- Tim Waugh: parport IRQ probing and documentation fixes
- Greg KH: USB updates
- Michael Warfield: computone driver update
- Randy Dunlap: add knowledge about some new io-apics
- Richard Henderson: alpha updates
- Trond Myklebust: make readdir xdr verify the reply packet
- Paul Mackerras: PPC update
- Jens Axboe: make cpqarray and cciss play nice with the request layer
- Massimo Dal Zotto: SMM driver for Dell Inspiron 8000
- Richard Gooch: devfs symlink deadlock fix
- Anton Altaparmakov: make NTFS compile on sparc
|
|
- me/Al Viro: fix bdget() oops with block device modules that don't
clean up after they exit
- Alan Cox: continued merging (drivers, license tags)
- David Miller: sparc update, network fixes
- Christoph Hellwig: work around broken drivers that add a gendisk more
than once
- Jakub Jelinek: handle more ELF loading special cases
- Trond Myklebust: NFS client and lockd reclaimer cleanups/fixes
- Greg KH: USB updates
- Mikael Pettersson: separate out local APIC / IO-APIC config options
|
|
- Neil Brown: md cleanups/fixes
- Andrew Morton: console locking merge
- Andrea Arcangeli: major VM merge
|
|
- Al Viro: clean up driver "invalidate_device()" mess
- Andries Brouwer: make sd.c work with USB Dane-Elec CompactFlash Card
Reader
- me: fix nasty lazy kernel page table update problem
- me: undo fork changes. Too many user-level bugs and unresolved issues.
- Peter Anvin: iso9660 cleanups
- Alan Cox: big merge
- Johannes Erdfelt: UHCI pci DMA setup fix
|
|
|