|
Switch the memory policy of the kevent threads to MPOL_DEFAULT while
leaving the kzalloc of the workqueue structure on interleave. This means
that all code executed in the context of the kevent thread allocates
node-local memory.
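A rough sketch of the idea in the worker thread (illustrative only, not the
literal patch):

    static int worker_thread(void *__cwq)
    {
            /* Drop any inherited interleave policy so that everything
             * the work functions allocate from here on is node local. */
            numa_default_policy();

            /* ... existing loop: wait for queued work and run it ... */
            return 0;
    }

The workqueue_struct allocation in the caller keeps the interleave policy, as
noted above.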
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Alok Kataria <alok.kataria@calsoftinc.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <pj@sgi.com>
Cc: <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
s/until until/until/
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
|
Use a private lock instead. It protects all per-cpu data structures in
workqueue.c, including the workqueues list.
Fix a bug in schedule_on_each_cpu(): it was forgetting to lock down the
per-cpu resources.
Unfixed long-standing bug: if someone unplugs the CPU identified by
`singlethread_cpu' the kernel will get very sick.
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
kernel/workqueue.c was omitted from kernel documentation generation. This
adds a new section "Workqueues and Kevents" and documents some of the
functions.
Some functions in this file already had DocBook-style comments, now they
finally become visible.
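For reference, the kernel-doc comment shape that now gets picked up (a
representative example, not quoted from the patch):

    /**
     * queue_work - queue work on a workqueue
     * @wq: workqueue to use
     * @work: work to queue
     *
     * Returns non-zero if @work was successfully added.
     */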
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
Move workqueue exports to where the functions are defined.
[CPUFREQ] Misc cleanups in ondemand.
[CPUFREQ] Make ondemand sampling per CPU and remove the mutex usage in sampling path.
[CPUFREQ] Add queue_delayed_work_on() interface for workqueues.
[CPUFREQ] Remove slowdown from ondemand sampling path.
|
|
cleanup: remove task_t and convert all its uses to struct task_struct. I
introduced it for the scheduler years ago and it was a mistake.
Conversion was mostly scripted, the result was reviewed and all
secondary whitespace and style impact (if any) was fixed up by hand.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Dave Jones <davej@redhat.com>
|
|
Add queue_delayed_work_on() interface for workqueues.
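A hedged usage sketch (the exact prototype has varied across kernel versions;
the workqueue and work names below are made up):

    static struct workqueue_struct *my_wq;      /* hypothetical */
    static struct work_struct my_work;          /* hypothetical */

    static void kick_cpu(int cpu)
    {
            /* run my_work on `cpu` roughly one second from now */
            queue_delayed_work_on(cpu, my_wq, &my_work, HZ);
    }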
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
|
|
In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a
band-aid solution to solve that problem. In the process, I undid all the
changes you both were making to ensure that these notifiers were available
only at init time (unless CONFIG_HOTPLUG_CPU is defined).
We deferred the real fix to 2.6.18. Here is a set of patches that fixes the
XFS problem cleanly and makes the cpu notifiers available only at init time
(unless CONFIG_HOTPLUG_CPU is defined).
If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
time.
This patch reverts the notifier_call changes made in 2.6.17
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If a cpu hotplug callback fails on CPU_UP_PREPARE, all callbacks will be
called with CPU_UP_CANCELED. A few of these callbacks assume that on
CPU_UP_PREPARE a pointer to the task has been stored in a percpu array. This
assumption does not hold if CPU_UP_PREPARE fails, and the subsequent calls to
kthread_bind() in CPU_UP_CANCELED will then cause an addressing exception
because a NULL pointer is passed.
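Sketch of the kind of guard that avoids the oops (the per-cpu variable and
callback are hypothetical, not the exact callbacks that were fixed):

    static DEFINE_PER_CPU(struct task_struct *, helper_task);

    static int hotplug_cb(struct notifier_block *nb, unsigned long action,
                          void *hcpu)
    {
            int cpu = (long)hcpu;

            if (action == CPU_UP_CANCELED) {
                    /* CPU_UP_PREPARE may have failed before a task was
                     * stored, so the per-cpu slot can still be NULL. */
                    if (!per_cpu(helper_task, cpu))
                            return NOTIFY_OK;
                    kthread_bind(per_cpu(helper_task, cpu),
                                 any_online_cpu(cpu_online_map));
            }
            return NOTIFY_OK;
    }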
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
schedule_on_each_cpu() presently does a large kmalloc - 96 kbytes on a
1024-CPU 64-bit system.
Rework it so that we do one 8192-byte allocation and then a pile of tiny ones,
via alloc_percpu(). This has a much higher chance of success (100% in the
current VM).
This also has the effect of reducing the memory requirements from NR_CPUS*n to
num_possible_cpus()*n.
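A minimal sketch of the allocation pattern (the structure name is made up):

    struct my_percpu {
            struct work_struct work;
    };

    static struct my_percpu *objs;

    static int __init my_init(void)
    {
            /* one small object per possible cpu instead of a single
             * NR_CPUS-sized kmalloc */
            objs = alloc_percpu(struct my_percpu);
            if (!objs)
                    return -ENOMEM;
            /* per_cpu_ptr(objs, cpu) yields this cpu's copy */
            return 0;
    }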
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
list_splice_init(list, head) does unneeded work when it is known that
list_empty(head) == 1. We can use list_replace_init() instead.
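In sketch form (assuming the destination list is known to be empty; the list
name is illustrative):

    static LIST_HEAD(pending);              /* hypothetical source list */

    static void grab_pending(struct list_head *local)
    {
            /* was: list_splice_init(&pending, local); with local empty */
            list_replace_init(&pending, local);
    }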
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A few of the notifier_chain_register() callers use __init in the definition
of notifier_call. This is incorrect, as the function must remain available
after initialization (the callers do not unregister the notifiers during
initialization).
This patch fixes all such usages so that notifier_call is _not_ placed in the
__init section.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We have several points in the SCSI stack (primarily for our device
functions) where we need to guarantee process context, but (given the
place where the last reference was released) we cannot guarantee this.
This API gets around the issue by executing the function directly if
the caller has process context, but scheduling a workqueue to execute
in process context if the caller doesn't have it.
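A hedged usage sketch; the early prototype took a function, a data pointer and
a struct execute_work, and later kernels changed it, so treat the call below
as illustrative only:

    static struct execute_work release_ew;          /* hypothetical */

    static void do_final_release(void *data)
    {
            /* always runs in process context, either directly
             * or via a workqueue */
    }

    static void drop_last_reference(void *data)
    {
            execute_in_process_context(do_final_release, data, &release_ew);
    }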
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
|
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use first_cpu(cpu_possible_map) for the single-thread workqueue case. We
used to hardcode 0, but that broke on systems where !cpu_possible(0) when
workqueue_struct->cpu_workqueue_struct was changed from a static array to
alloc_percpu.
Commit id bce61dd49d6ba7799be2de17c772e4c701558f14 ("Fix hardcoded cpu=0 in
workqueue for per_cpu_ptr() calls") fixed that for Ben's funky sparc64
system, but it regressed my Power5. Offlining cpu 0 oopses upon the next
call to queue_work for a single-thread workqueue, because now we try to
manipulate per_cpu_ptr(wq->cpu_wq, 1), which is uninitialized.
So we need to establish an unchanging "slot" for single-thread workqueues
which will have a valid percpu allocation. Since alloc_percpu keys off of
cpu_possible_map, which must not change after initialization, make this
slot == first_cpu(cpu_possible_map).
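Sketched, the fix looks something like this (not the literal hunk):

    static int singlethread_cpu;

    static int __init init_workqueues_sketch(void)
    {
            /* a slot that is always present in the per-cpu allocation
             * and never changes after boot */
            singlethread_cpu = first_cpu(cpu_possible_map);
            return 0;
    }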
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
__create_workqueue() was not checking the return value of alloc_percpu(), so
a NULL dereference was possible.
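Roughly, the missing check (an illustrative function, not the exact diff):

    static struct workqueue_struct *create_wq_sketch(void)
    {
            struct workqueue_struct *wq = kzalloc(sizeof(*wq), GFP_KERNEL);

            if (!wq)
                    return NULL;

            wq->cpu_wq = alloc_percpu(struct cpu_workqueue_struct);
            if (!wq->cpu_wq) {              /* the check that was missing */
                    kfree(wq);
                    return NULL;
            }
            /* ... create threads, add to the workqueue list ... */
            return wq;
    }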
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
swap migration's isolate_lru_page() currently uses an IPI to notify other
processors that the lru caches need to be drained if the page cannot be
found on the LRU. The IPI interrupt may interrupt a processor that is just
processing lru requests and cause a race condition.
This patch introduces a new function run_on_each_cpu() that uses keventd to
run the LRU draining on each processor. Processors disable preemption when
dealing with the LRU caches (these are per-processor), and thus executing LRU
draining from another process is safe.
Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
condition.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Tracked this down on an Ultra Enterprise 3000. It's a 6-way machine. The odd
thing about this machine (and it's good for finding bugs like this) is that
the CPU IDs are not 0-based. For instance, on my machine the CPUs are
6/7/10/11/14/15.
This caused a NULL pointer dereference in kernel/workqueue.c because, for
single_threaded workqueues, it hardcoded the cpu to 0.
I changed the 0's to any_online_cpu(cpu_online_mask), which cpumask.h
claims is "First cpu in mask". So this fits the same usage.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Replace smp_processor_id() with any_online_cpu(cpu_online_map) in order to
avoid lots of "BUG: using smp_processor_id() in preemptible [00000001]
code:..." messages in case taking a cpu online fails.
All the traces start at the last notifier_call_chain(...) in kernel/cpu.c.
Since we hold the cpu_control semaphore it shouldn't be a problem to access
cpu_online_map.
The reason why cpu_up failed is simply that the cpu that was supposed to be
taken online wasn't even there. That is because on s390 we never know when a
new cpu will arrive, and therefore cpu_possible_map consists of only ones and
doesn't reflect reality.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes the workqueues use alloc_percpu instead of an array. The
workqueues are placed on nodes local to each processor.
The workqueue structure can grow to a significant size on a system with
lots of processors if this patch is not applied. 64 bit architectures with
all debugging features enabled and configured for 512 processors will not
be able to boot without this patch.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch introduces a kzalloc wrapper and converts kernel/ to use it. It
saves a little program text.
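The conversion pattern, sketched as a before/after fragment:

    /* before: two calls */
    wq = kmalloc(sizeof(*wq), GFP_KERNEL);
    if (wq)
            memset(wq, 0, sizeof(*wq));

    /* after: one call that allocates and zeroes */
    wq = kzalloc(sizeof(*wq), GFP_KERNEL);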
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
With "-W -Wno-unused -Wno-sign-compare" I get the following compile warning:
CC kernel/workqueue.o
kernel/workqueue.c: In function `workqueue_cpu_callback':
kernel/workqueue.c:504: warning: ordered comparison of pointer with integer zero
On error create_workqueue_thread() returns NULL, not a negative pointer, so
the following trivial patch suggests itself.
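i.e. roughly (a sketch, not the verbatim hunk):

    p = create_workqueue_thread(wq, hotcpu);
    /* was: if (p < 0) -- an ordered comparison of a pointer with zero */
    if (p == NULL) {
            printk("workqueue for %i failed\n", hotcpu);
            return NOTIFY_BAD;
    }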
Signed-off-by: Mika Kukkonen <mikukkon@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We have a check in there to make sure that the name won't overflow
task_struct.comm[], but it's triggering for scsi with lots of HBAs, even
though scsi only uses single-threaded workqueues, which don't append the
"/%d" anyway.
All too hard. Just kill the BUG_ON.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ kthread_create() uses vsnprintf() and limits the thing, so no
actual overflow can actually happen regardless ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This was unexported by Arjan because we have no current users.
However, during a conversion from tasklets to workqueues of the parisc led
functions, we ran across a case where this was needed. In particular, the
open coded equivalent of cancel_rearming_delayed_workqueue was implemented
incorrectly, which is, I think, all the evidence necessary that this is a
useful API.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into nuts.davemloft.net:/disk1/BK/net-2.6.12
|
|
We weren't binding new worker threads to their cpu when onlining. Using
preempt and the debug version of smp_processor_id found this.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Arjan van de Ven <arjan@infradead.org>
cancel_rearming_delayed_workqueue() is only used inside workqueue.c; make
this function static (the more useful wrapper around it later in that
function remains non-static and exported)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Kernel core files converted to use the new lock initializers.
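The conversion being applied, in sketch form (lock names are illustrative):

    /* before: static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; */
    static DEFINE_SPINLOCK(my_lock);

    /* before: static rwlock_t my_rwlock = RW_LOCK_UNLOCKED; */
    static DEFINE_RWLOCK(my_rwlock);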
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Timeslice proportion has been increased substantially for -niced tasks. As
a result of this, kernel threads have much larger timeslices than they
previously had.
Change kernel threads' nice value to -5 to bring their timeslice back in
line with previous behaviour. This means kernel threads will be less
likely to cause large latencies under periods of system stress for normal
nice 0 tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The work address is increasingly unreliable and incompetently run.
Time to remove all visible instances of it and rely only on one
which isn't run by crack-monkeys.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
I noticed a large machine was doing about 400,000 context switches per
second when oprofile was enabled. Upon closer inspection it looks like we
were rearming the buffer sync timer without modifying the expire time.
Now that we have schedule_delayed_work_on I believe we can remove the timer
completely. Each cpu should be offset by 1 jiffy so they don't all fire at
the same time. I bumped DEFAULT_TIMER_EXPIRE from 2 to 10 times a second
to be sure we reap cpu buffers.
With the following patch the same large machine gets about 4000 context
switches per second.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I'm submitting two patches associated with moving cache_reap functionality
out of timer context. Note that these patches do not make any further
optimizations to cache_reap at this time.
The first patch adds a function similar to schedule_delayed_work to allow
work to be scheduled on another cpu.
The second patch makes use of schedule_delayed_work_on to schedule
cache_reap to run from keventd.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: "Anil" <anil.s.keshavamurthy@intel.com>
We don't need lock_cpu_hotplug()/unlock_cpu_hotplug for singlethreaded
workqueues.
Signed-off-by: Anil Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: "Anil" <anil.s.keshavamurthy@intel.com>
In flush_workqueue() for the single_threaded_workqueue case the code flushes
the same cpu_workqueue_struct for each online cpu.
Change things so that we only perform the flush once in this case.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix some silliness in there.
|
|
Fix a waitqueue-handling race in worker_thread().
|
|
Fix bug identified by Srivatsa Vaddagiri <vatsa@in.ibm.com>:
There's a deadlock in __create_workqueue when CONFIG_HOTPLUG_CPU is set. This
can happen when create_workqueue_thread fails to create a worker thread. In
that case, we call destroy_workqueue with the cpu hotplug lock held.
destroy_workqueue, however, also attempts to take the same lock.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Workqueues are a great primitive for running things from user context from
a completely clean environment. Unfortunately, they currently insist on
creating one thread per CPU, which is overkill for many situations, so the
more generic keventd workqueue is used for these. Recently deadlocks using
keventd were demonstrated, showing that it is not suitable for all uses.
1) Clean up CPU iterators. Always a nice touch.
2) Add __create_workqueue() and create_singlethread_workqueue(),
keeping source compatibility (see the usage sketch after this list).
3) Put workqueues in workqueue list even if !CONFIG_HOTPLUG_CPU (means
we need a lock to protect that list). Now we can tell if a wq is
single-threaded using list_empty(&wq->list).
4) For single-threaded workqueues, override CPU in queue_work,
delayed_work_timer_fn and flush_workqueue to be 0. flush_workqueue
now does redundant passes for single-threaded workqueues, but the
code remains simple.
5) Make create_workqueue_thread return the thread, so we can easily
kthread_bind for multi-threaded workqueues.
akpm fixes:
- Fix up is_single_threaded() handling
- single-threaded wq thread does not have "/0" appended.
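Usage sketch for the new single-threaded entry point, as referenced in item 2
above (the driver name is made up):

    static struct workqueue_struct *mydrv_wq;

    static int __init mydrv_init(void)
    {
            mydrv_wq = create_singlethread_workqueue("mydrv");
            if (!mydrv_wq)
                    return -ENOMEM;
            return 0;
    }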
|
|
From: Nigel Cunningham <ncunningham@users.sourceforge.net>
A few weeks ago, Pavel and I agreed that PF_IOTHREAD should be renamed to
PF_NOFREEZE. This reflects the fact that some threads so marked aren't
actually used for IO while suspending, but simply shouldn't be frozen.
This patch, against 2.6.5 vanilla, applies that change. In the
refrigerator calls, the actual value doesn't matter (so long as it's
non-zero) and it makes more sense to use PF_FREEZE so I've used that.
|
|
Workqueues need to bring up/destroy the per-cpu thread on cpu up/down.
1) Add a global list of workqueues, and keep the name in the structure
(to name the newly created thread).
2) Remove BUG_ON in run_workqueue, since thread is dragged off CPU when
it goes down.
3) Lock out cpu up/down in flush_workqueue, create_workqueue and
destroy_workqueue.
4) Add notifier to add/destroy workqueue threads, and take over work.
|
|
Add a debug check for workqueues nested more than three deep via the
direct-run-workqueue() path.
|
|
Because keventd is a resource which is shared between unrelated parts of the
kernel it is possible for one person's workqueue handler to accidentally call
another person's flush_scheduled_work(). thockin managed it by calling
mntput() from a workqueue handler. It deadlocks.
It's simple enough to fix: teach flush_scheduled_work() to go direct when it
discovers that the calling thread is the one which should be running the
work.
Note that this can cause recursion. The depth of that recursion is equal to
the number of currently-queued works which themselves want to call
flush_scheduled_work(). If this ever exceeds three I'll eat my hat.
|
|
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
current_is_keventd() doesn't need to search across all the CPUs to identify
itself.
|
|
From: Gerd Knorr <kraxel@suse.de>
Current versions of gcc error out if a function's declaration and definition
disagree about the register passing convention.
The patch adds a new `fastcall' declaration primitive, and uses that in all
the FASTCALL functions which we could find. A number of inconsistencies were
fixed up along the way.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Move duplicated code to __queue_work(), and don't set the CPU for
queue_delayed_work() until the timer goes off. The second one only has an
effect on CONFIG_HOTPLUG_CPU where the CPU goes down and the timer goes off
on a different CPU than it was scheduled on.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
These two patches provide the framework for stopping kernel threads to
allow hotplug CPU. This one just adds kthread.c and kthread.h, next
one uses it.
Most importantly, adds a Monty Python quote to the kernel.
Details:
The hotplug CPU code introduces two major problems:
1) Threads which previously never stopped (migration thread,
ksoftirqd, keventd) have to be stopped cleanly as CPUs go offline.
2) Threads which previously never had to be created now have
to be created when a CPU goes online.
Unfortunately, stopping a thread is fairly baroque, involving memory
barriers, a completion and spinning until the task is actually dead
(for example, complete_and_exit() must be used if inside a module).
There are also three problems in starting a thread:
1) Doing it from a random process context risks environment contamination:
better to do it from keventd to guarantee a clean environment, a-la
call_usermodehelper.
2) Getting the task struct without races is hard: see kernel/sched.c
migration_call(), kernel/workqueue.c create_workqueue_thread().
3) There are races in starting a thread for a CPU which is not yet
online: migration thread does a complex dance at the moment for
a similar reason (there may be no migration thread to migrate us).
Place all this logic in some primitives to make life easier:
kthread_create() and kthread_stop(). These primitives require no
extra data-structures in the caller: they operate on normal "struct
task_struct"s.
Other changes:
- Expose keventd_up(), as keventd and migration threads will use kthread to
launch, and kthread normally uses workqueues and must recognize this case.
- Kthreads created at boot before "keventd" are spawned directly. However,
this means that they don't have all signals blocked, and hence can be
killed. The simplest solution is to always explicitly block all signals in
the kthread.
- Change over the migration threads, the workqueue threads and the
ksoftirqd threads to use kthread.
- module.c currently spawns threads directly to stop the machine, so a
module can be atomically tested for removal.
- Unfortunately, this means that the current task is manipulated (which
races with set_cpus_allowed, for example), and it can't set its priority
artificially high. Using a kernel thread can solve this cleanly, and with
kthread_run, it's simple.
- kthreads use keventd, so they inherit its cpus_allowed mask. Unset it.
All current users set it explicitly anyway, but it's nice to fix.
- call_usermodehelper uses keventd, so the process created inherits its
cpus_allowed mask. Unset it.
- Prevent errors in boot when cpus_possible() contains a cpu which is not
online (ie. a cpu didn't come up). This doesn't happen on x86, since a
boot failure makes that CPU no longer possible (hacky, but it works).
- When the cpu fails to come up, some callbacks do kthread_stop(), which
doesn't work without keventd (which hasn't started yet). Call it directly,
and take care that it restores signal state (note: do_sigaction does a
flush on blocked signals, so we don't need to repeat it).
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
It turns out that run_workqueue never has signal_pending(), since setting
the handler to SIG_IGN means "don't make zombies, I'm ignoring them". Fix
the comment, don't allow the signal, and remove the unused waitpid loop.
This also allows simpler conversion of workqueues to the kthread mechanism,
which uses signals to indicate it's time to stop.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Some places use cpu_online() where they should be using cpu_possible(), most
commonly for tallying statistics. This makes no difference without hotplug
CPU.
Use the for_each_cpu() macro in those places, providing good examples (and
making the external hotplug CPU patch smaller).
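In sketch form (the per-cpu counter is hypothetical):

    static DEFINE_PER_CPU(unsigned long, event_count);

    static unsigned long total_events(void)
    {
            unsigned long sum = 0;
            int cpu;

            /* the single-argument for_each_cpu() of this era walks
             * cpu_possible_map, not just the online cpus */
            for_each_cpu(cpu)
                    sum += per_cpu(event_count, cpu);
            return sum;
    }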
|
|
From: William Lee Irwin III <wli@holomorphy.com>
Contributions from:
Jan Dittmer <jdittmer@sfhq.hn.org>
Arnd Bergmann <arnd@arndb.de>
"Bryan O'Sullivan" <bos@serpentine.com>
"David S. Miller" <davem@redhat.com>
Badari Pulavarty <pbadari@us.ibm.com>
"Martin J. Bligh" <mbligh@aracnet.com>
Zwane Mwaikambo <zwane@linuxpower.ca>
It has been tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64.
cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their
cpus by creating an abstract data type dedicated to representing cpu
bitmasks, similar to fd sets from userspace, and sweeping the appropriate
code to update callers to the access API. The fd set-like structure is
according to Linus' own suggestion; the macro calling convention to ambiguate
representations with minimal code impact is my own invention.
Specifically, a new set of inline functions for manipulating arbitrary-width
bitmaps is introduced with a relatively simple implementation, in tandem with
a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose
accessor functions are defined in terms of the bitmap manipulation inlines.
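A tiny illustration of the type and its accessors (a minimal sketch using the
API names this patch introduces):

    static int count_my_cpus(void)
    {
            cpumask_t mask = CPU_MASK_NONE;
            int cpu, n = 0;

            cpu_set(first_cpu(cpu_online_map), mask);   /* mark one cpu */

            for_each_cpu_mask(cpu, mask)                /* walk the set bits */
                    n++;

            return n;
    }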
This bitmap ADT found an additional use in i386 arch code handling sparse
physical APIC IDs, which was convenient to use in this case as the
accounting structure was required to be wider to accommodate the physids
consumed by larger numbers of cpus.
For the sake of simplicity and low code impact, these cpu bitmasks are passed
primarily by value; however, an additional set of accessors along with an
auxiliary data type with const call-by-reference semantics is provided to
address performance concerns raised in connection with very large systems,
such as SGI's larger models, where copying and call-by-value overhead would
be prohibitive. Few (if any) users of the call-by-reference API are
immediately introduced.
Also, in order to avoid calling convention overhead on architectures where
structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is
special-cased so that cpumask_t falls back to an unsigned long and the
accessors perform the usual bit twiddling on unsigned longs as opposed to
arrays thereof. Audits were done with the structure overhead in-place,
restoring this special-casing only afterward so as to ensure a more complete
API conversion while undergoing the majority of its end-user exposure in -mm.
More -mm's were shipped after its restoration to be sure that was tested,
too.
The immediate users of this functionality are Sun sparc64 systems, SGI mips64
and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the
ppc64 machines needing the functionality have yet to be released; all others
have had systems requiring it for full functionality for at least 6 months,
and in some cases, since the initial Linux port to the affected architecture.
|