| Age | Commit message (Collapse) | Author |
|
FASTCALL() is always expanded to empty, remove it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Dave Young reported warnings from lockdep that the workqueue API
can sometimes try to register lockdep classes with the same key
but different names. This is not permitted in lockdep.
Unfortunately, I was unaware of that restriction when I wrote
the code to debug workqueue problems with lockdep and used the
workqueue name as the lockdep class name. This can obviously
lead to the problem if the workqueue name is dynamic.
This patch solves the problem by always using a constant name
for the workqueue's lockdep class, namely either the constant
name that was passed in or a string consisting of the variable
name.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
|
|
In the following scenario:
code path 1:
my_function() -> lock(L1); ...; flush_workqueue(); ...
code path 2:
run_workqueue() -> my_work() -> ...; lock(L1); ...
you can get a deadlock when my_work() is queued or running
but my_function() has acquired L1 already.
This patch adds a pseudo-lock to each workqueue to make lockdep
warn about this scenario.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean
indicating whether the work was actually cancelled. A zero return value means
that the work was not pending/queued.
Without that kind of change it is not possible to avoid flush_workqueue()
sometimes, see the next patch as an example.
Also, this patch unifies both functions and kills the (unlikely) busy-wait
loop.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Imho, the current naming of cancel_xxx workqueue functions is very confusing.
cancel_delayed_work()
cancel_rearming_delayed_work()
cancel_rearming_delayed_workqueue() // obsolete
cancel_work_sync()
This looks as if the first 2 functions differ in "type" of their argument
which is not true any longer, nowadays the difference is the behaviour.
The semantics of cancel_rearming_delayed_work(dwork) was changed
significantly, it doesn't require that dwork rearms itself, and cancels dwork
synchronously.
Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work()
and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple
inline obsolete wrapper, like cancel_rearming_delayed_workqueue().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As pointed out by Jarek Poplawski, the patch
[WORKQUEUE]: cancel_delayed_work: use del_timer() instead of del_timer_sync()
commit: 071b638689464c6b39407025eedd810d5b5e6f5d
was wrong, it was merged by mistake after that.
From the changelog:
after this patch:
...
delayed_work_timer_fn->__queue_work() in progress.
The latter doesn't differ from the caller's POV,
it does make a difference if the caller calls flush_workqueue() after
cancel_delayed_work(), in that case flush_workqueue() can miss this
work_struct.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jarek Poplawski <jarkao2@o2.pl>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It is a known fact that freezeable multithreaded workqueues doesn't like
CPU_DEAD. We keep them only for the incoming CPU-hotplug rework.
Sadly, we can't just kill create_freezeable_workqueue() right now, make
them singlethread.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginnig, I missed this). So we can unify
flush_work_keventd and flush_work.
Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.
(akpm: this is why the earlier patches bypassed maintainers)
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We don't have any users, and it is not so trivial to use NOAUTOREL works
correctly. It is better to simplify API.
Delete NOAUTOREL support and rename work_release to work_clear_pending to
avoid a confusion.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first
parameter. We don't hang on un-queued dwork any longer, and work->data
doesn't change its type. This means we can always figure out "wq" from
dwork when it is needed.
Remove this parameter, and rename the function to
cancel_rearming_delayed_work(). Re-create an inline "obsolete"
cancel_rearming_delayed_workqueue(wq) which just calls
cancel_rearming_delayed_work().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Because it has no callers.
Actually, I think the whole idea of run_scheduled_work() was not right, not
good to mix "unqueue this work and execute its ->func()" in one function.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A basic problem with flush_scheduled_work() is that it blocks behind _all_
presently-queued works, rather than just the work whcih the caller wants to
flush. If the caller holds some lock, and if one of the queued work happens
to want that lock as well then accidental deadlocks can occur.
One example of this is the phy layer: it wants to flush work while holding
rtnl_lock(). But if a linkwatch event happens to be queued, the phy code will
deadlock because the linkwatch callback function takes rtnl_lock.
So we implement a new function which will flush a *single* work - just the one
which the caller wants to free up. Thus we avoid the accidental deadlocks
which can arise from unrelated subsystems' callbacks taking shared locks.
flush_work() non-blockingly dequeues the work_struct which we want to kill,
then it waits for its handler to complete on all CPUs.
Add ->current_work to the "struct cpu_workqueue_struct", it points to
currently running "struct work_struct". When flush_work(work) detects
->current_work == work, it inserts a barrier at the _head_ of ->worklist
(and thus right _after_ that work) and waits for completition. This means
that the next work fired on that CPU will be this barrier, or another
barrier queued by concurrent flush_work(), so the caller of flush_work()
will be woken before any "regular" work has a chance to run.
When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
against CPU hotplug), CPU may go away. But in that case take_over_work() will
move a barrier we queued to another CPU, it will be fired sometime, and
wait_on_work() will be woken.
Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
take_over_work(), so cwq->thread should complete its ->worklist (and thus
the barrier), because currently we don't check kthread_should_stop() in
run_workqueue(). But even if we did, everything should be ok.
[akpm@osdl.org: cleanup]
[akpm@osdl.org: add flush_work_keventd() wrapper]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a new deferrable delayed work init. This can be used to schedule work
that are 'unimportant' when CPU is idle and can be called later, when CPU
eventually comes out of idle.
Use this init in cpufreq ondemand governor.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
del_timer_sync() buys nothing for cancel_delayed_work(), but it is less
efficient since it locks the timer unconditionally, and may wait for the
completion of the delayed_work_timer_fn().
cancel_delayed_work() == 0 means:
before this patch:
work->func may still be running or queued
after this patch:
work->func may still be running or queued, or
delayed_work_timer_fn->__queue_work() in progress.
The latter doesn't differ from the caller's POV,
delayed_work_timer_fn() is called with _PENDING
bit set.
cancel_delayed_work() == 1 with this patch adds a new possibility:
delayed_work->work was cancelled, but delayed_work_timer_fn
is still running (this is only possible for the re-arming
works on single-threaded workqueue).
In this case the timer was re-started by work->func(), nobody
else can do this. This in turn means that delayed_work_timer_fn
has already passed __queue_work() (and wont't touch delayed_work)
because nobody else can queue delayed_work->work.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On architectures where the atomicity of the bit operations is handled by
external means (ie a separate spinlock to protect concurrent accesses),
just doing a direct assignment on the workqueue data field (as done by
commit 4594bf159f1962cec3b727954b7c598b07e2e737) can cause the
assignment to be lost due to lack of serialization with the bitops on
the same word.
So we need to serialize the assignment with the locks on those
architectures (notably older ARM chips, PA-RISC and sparc32).
So rather than using an "unsigned long", let's use "atomic_long_t",
which already has a safe assignment operation (atomic_long_set()) on
such architectures.
This requires that the atomic operations use the same atomicity locks as
the bit operations do, but that is largely the case anyway. Sparc32
will probably need fixing.
Architectures (including modern ARM with LL/SC) that implement sane
atomic operations for SMP won't see any of this matter.
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: David Miller <davem@davemloft.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Nobody uses it, but it was still wrong. Using the macro argument name
'work' meant that when we used 'work' as a member name, that would also
get replaced by the macro argument.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This allows workqueue users to run just their own pending work, rather
than wait for the whole workqueue to finish running. This solves the
deadlock with networking libphy that was due to other workqueue entries
possibly needing a lock that was held by the routine that wanted to
flush its own work.
It's not wonderful: if you absolutely need to synchronize with the work
function having been executed, any user strictly speaking should have
its own completion tracking logic, since when we run things explicitly
by hand, the generic workqueue layer can no longer help us synchronize.
Also, this is strictly only usable for work that has been scheduled
without any delayed timers. You can not mix the new interface with
schedule_delayed_work().
But it's better than what we had currently.
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Make it possible to create a workqueue the worker thread of which will be
frozen during suspend, along with other kernel threads.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.
For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.
To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.
Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).
However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().
In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).
Signed-Off-By: David Howells <dhowells@redhat.com>
|
|
Reclaim a word from the size of the work_struct by folding the pending bit and
the wq_data pointer together. This shouldn't cause misalignment problems as
all pointers should be at least 4-byte aligned.
Signed-Off-By: David Howells <dhowells@redhat.com>
|
|
Define a type for the work function prototype. It's not only kept in the
work_struct struct, it's also passed as an argument to several functions.
This makes it easier to change it.
Signed-Off-By: David Howells <dhowells@redhat.com>
|
|
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.
The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.
Signed-Off-By: David Howells <dhowells@redhat.com>
|
|
Add queue_delayed_work_on() interface for workqueues.
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
|
|
We have several points in the SCSI stack (primarily for our device
functions) where we need to guarantee process context, but (given the
place where the last reference was released) we cannot guarantee this.
This API gets around the issue by executing the function directly if
the caller has process context, but scheduling a workqueue to execute
in process context if the caller doesn't have it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
swap migration's isolate_lru_page() currently uses an IPI to notify other
processors that the lru caches need to be drained if the page cannot be
found on the LRU. The IPI interrupt may interrupt a processor that is just
processing lru requests and cause a race condition.
This patch introduces a new function run_on_each_cpu() that uses the
keventd() to run the LRU draining on each processor. Processors disable
preemption when dealing the LRU caches (these are per processor) and thus
executing LRU draining from another process is safe.
Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This was unexported by Arjan because we have no current users.
However, during a conversion from tasklets to workqueues of the parisc led
functions, we ran across a case where this was needed. In particular, the
open coded equivalent of cancel_rearming_delayed_workqueue was implemented
incorrectly, which is, I think, all the evidence necessary that this is a
useful API.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Arjan van de Ven <arjan@infradead.org>
cancel_rearming_delayed_workqueue() is only used inside workqueue.c; make
this function static (the more useful wrapper around it later in that
function remains non-static and exported)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I'm submitting two patches associated with moving cache_reap functionality
out of timer context. Note that these patches do not make any further
optimizations to cache_reap at this time.
The first patch adds a function similiar to schedule_delayed_work to allow
work to be scheduled on another cpu.
The second patch makes use of schedule_delayed_work_on to schedule
cache_reap to run from keventd.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
cancel_delayed_work() forgets to clear the workqueue's pending flag. This
makes the workqueue appear to be permanently busy, so any subsequent attempts
to use it will fail.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Workqueues are a great primitive for running things from user context from
a completely clean environment. Unfortunately, they currently insist on
creating one thread per CPU, which is overkill for many situations, so the
more generic keventd workqueue is used for these. Recently deadlocks using
keventd were demonstrated, showing that it is not suitable for all uses.
1) Clean up CPU iterators. Always a nice touch.
2) Add __create_workqueue() and create_singlethread_workqueue(),
keeping source compatibility.
3) Put workqueues in workqueue list even if !CONFIG_HOTPLUG_CPU (means
we need a lock to protect that list). Now we can tell if a wq is
single-threaded using list_empty(&wq->list).
4) For single-threaded workqueues, override CPU in queue_work,
delayed_work_timer_fn and flush_workqueue to be 0. flush_workqueue
now does redundant passes for single-threaded workqueues, but the
code remains simple.
5) Make create_workqueue_thread return the thread, so we can easily
kthread_bind for multi-threaded workqueues.
akpm fixes:
- Fix up is_single_threaded() handling
- single-threaded wq thread does not have "/0" appended.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
These two patches provide the framework for stopping kernel threads to
allow hotplug CPU. This one just adds kthread.c and kthread.h, next
one uses it.
Most importantly, adds a Monty Python quote to the kernel.
Details:
The hotplug CPU code introduces two major problems:
1) Threads which previously never stopped (migration thread,
ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
2) Threads which previously never had to be created now have
to be created when a CPU goes online.
Unfortunately, stopping a thread is fairly baroque, involving memory
barriers, a completion and spinning until the task is actually dead
(for example, complete_and_exit() must be used if inside a module).
There are also three problems in starting a thread:
1) Doing it from a random process context risks environment contamination:
better to do it from keventd to guarantee a clean environment, a-la
call_usermodehelper.
2) Getting the task struct without races is a hard: see kernel/sched.c
migration_call(), kernel/workqueue.c create_workqueue_thread().
3) There are races in starting a thread for a CPU which is not yet
online: migration thread does a complex dance at the moment for
a similar reason (there may be no migration thread to migrate us).
Place all this logic in some primitives to make life easier:
kthread_create() and kthread_stop(). These primitives require no
extra data-structures in the caller: they operate on normal "struct
task_struct"s.
Other changes:
- Expose keventd_up(), as keventd and migration threads will use kthread to
launch, and kthread normally uses workqueues and must recognize this case.
- Kthreads created at boot before "keventd" are spawned directly. However,
this means that they don't have all signals blocked, and hence can be
killed. The simplest solution is to always explicitly block all signals in
the kthread.
- Change over the migration threads, the workqueue threads and the
ksoftirqd threads to use kthread.
- module.c currently spawns threads directly to stop the machine, so a
module can be atomically tested for removal.
- Unfortunately, this means that the current task is manipulated (which
races with set_cpus_allowed, for example), and it can't set its priority
artificially high. Using a kernel thread can solve this cleanly, and with
kthread_run, it's simple.
- kthreads use keventd, so they inherit its cpus_allowed mask. Unset it.
All current users set it explicity anyway, but it's nice to fix.
- call_usermode_helper uses keventd, so the process created inherits its
cpus_allowed mask. Unset it.
- Prevent errors in boot when cpus_possible() contains a cpu which is not
online (ie. a cpu didn't come up). This doesn't happen on x86, since a
boot failure makes that CPU no longer possible (hacky, but it works).
- When the cpu fails to come up, some callbacks do kthread_stop(), which
doesn't work without keventd (which hasn't started yet). Call it directly,
and take care that it restores signal state (note: do_sigaction does a
flush on blocked signals, so we don't need to repeat it).
|
|
The workqueue code currently has a notion of a per-cpu queue being "busy".
flush_scheduled_work()'s responsibility is to wait for a queue to be not busy.
Problem is, flush_scheduled_work() can easily hang up.
- The workqueue is deemed "busy" when there are pending delayed
(timer-based) works. But if someone repeatedly schedules new delayed work
in the callback, the queue will never fall idle, and flush_scheduled_work()
will not terminate.
- If someone reschedules work (not delayed work) in the work function, that
too will cause the queue to never go idle, and flush_scheduled_work() will
not terminate.
So what this patch does is:
- Create a new "cancel_delayed_work()" which will try to kill off any
timer-based delayed works.
- Change flush_scheduled_work() so that it is immune to people re-adding
work in the work callout handler.
We can do this by recognising that the caller does *not* want to wait
until the workqueue is "empty". The caller merely wants to wait until all
works which were pending at the time flush_scheduled_work() was called have
completed.
The patch uses a couple of sequence numbers for that.
So now, if someone wants to reliably remove delayed work they should do:
/*
* Make sure that my work-callback will no longer schedule new work
*/
my_driver_is_shutting_down = 1;
/*
* Kill off any pending delayed work
*/
cancel_delayed_work(&my_work);
/*
* OK, there will be no new works scheduled. But there may be one
* currently queued or in progress. So wait for that to complete.
*/
flush_scheduled_work();
The patch also changes the flush_workqueue() sleep to be uninterruptible.
We cannot legally bale out if a signal is delivered anyway.
|
|
Teach DECLARE_WORK about __TIMER_INITIALIZER. So all statically
initialised workqueues have valid timers. eg:
drivers/char/random.c:batch_work.
|
|
|
|
This does a number of timer subsystem enhancements:
- simplified timer initialization, now it's the cheapest possible thing:
static inline void init_timer(struct timer_list * timer)
{
timer->base = NULL;
}
since the timer functions already did a !timer->base check this did not
have any effect on their fastpath.
- the rule from now on is that timer->base is set upon activation of the
timer, and cleared upon deactivation. This also made it possible to:
- reorganize all the timer handling code to not assume anything about
timer->entry.next and timer->entry.prev - this also removed lots of
unnecessery cleaning of these fields. Removed lots of unnecessary list
operations from the fastpath.
- simplified del_timer_sync(): it now uses del_timer() plus some simple
synchronization code. Note that this also fixes a bug: if mod_timer (or
add_timer) moves a currently executing timer to another CPU's timer
vector, then del_timer_sync() does not synchronize with the handler
properly.
- bugfix: moved run_local_timers() from scheduler_tick() into
update_process_times() .. scheduler_tick() might be called from the fork
code which will not quite have the intended effect ...
- removed the APIC-timer-IRQ shifting done on SMP, Dipankar Sarma's
testing shows no negative effects.
- cleaned up include/linux/timer.h:
- removed the timer_t typedef, and fixes up kernel/workqueue.c to use
the 'struct timer_list' name instead.
- removed unnecessery includes
- renamed the 'list' field to 'entry' (it's an entry not a list head)
- exchanged the 'function' and 'data' fields. This, besides being
more logical, also unearthed the last few remaining places that
initialized timers by assuming some given field ordering, the patch
also fixes these places. (fs/xfs/pagebuf/page_buf.c,
net/core/profile.c and net/ipv4/inetpeer.c)
- removed the defunct sync_timers(), timer_enter() and timer_exit()
prototypes.
- added docbook-style comments.
- other kernel/timer.c changes:
- base->running_timer does not have to be volatile ...
- added consistent comments to all the important functions.
- made the sync-waiting in del_timer_sync preempt- and lowpower-
friendly.
i've compiled, booted & tested the patched kernel on x86 UP and SMP. I
have tried moderately high networking load as well, to make sure the timer
changes are correct - they appear to be.
|
|
This is the next iteration of the workqueue abstraction.
The framework includes:
- per-CPU queueing support.
on SMP there is a per-CPU worker thread (bound to its CPU) and per-CPU
work queues - this feature is completely transparent to workqueue-users.
keventd automatically uses this feature. XFS can now update to work-queues
and have the same per-CPU performance as it had with its per-CPU worker
threads.
- delayed work submission
there's a new queue_delayed_work(wq, work, delay) function and a new
schedule_delayed_work(work, delay) function. The later one is used to
correctly fix former tq_timer users. I've reverted those changes in 2.5.40
that changed tq_timer uses to schedule_work() - eg. in the case of
random.c or the tty flip queue it was definitely the wrong thing to do.
delayed work means a timer embedded in struct work_struct. I considered
using split struct work_struct and delayed_work_struct types, but lots
of code actively uses task-queues in both delayed and non-delayed mode,
so i went for the more generic approach that allows both methods of work
submission. Delayed timers do not cause any other overhead in the
normal submission path otherwise.
- multithreaded run_workqueue() implementation
the run_workqueue() function can now be called from multiple contexts, and
a worker thread will only use up a single entryy - this property is used
by the flushing code, and can potentially be used in the future to extend
the number of per-CPU worker threads.
- more reliable flushing
there's now a 'pending work' counter, which is used to accurately detect
when the last work-function has finished execution. It's also used to
correctly flush against timed requests. I'm not convinced whether the old
keventd implementation got this detail right.
- i switched the arguments of the queueing function(s) per Jeff's
suggestion, it's more straightforward this way.
Driver fixes:
i have converted almost every affected driver to the new framework. This
cleaned up tons of code. I also fixed a number of drivers that were still
using BHs (these drivers did not compile in 2.5.40).
while this means lots of changes, it might ease the QA decision whether to
put this patch into 2.5.
The pach converts roughly 80% of all tqueue-using code to workqueues - and
all the places that are not converted to workqueues yet are places that do
not compile in vanilla 2.5.40 anyway, due to unrelated changes. I've
converted a fair number of drivers that do not compile in 2.5.40, and i
think i've managed to convert every driver that compiles under 2.5.40.
|