| Age | Commit message | Author |
|
In the cpufreq hotplug notify path, some call chains end in a function that
calls lock_cpu_hotplug(). The lock is already held during the cpu_up() and
cpu_down() calls, when the notifications are broadcast to registered clients.
Ideally we would call preempt_disable() at the highest caller and make sure
we don't sleep anywhere on the way down to the cpufreq->driver_target()
calls, but the calls are so intertwined that they are cumbersome to clean up.
Hence we consistently use lock_cpu_hotplug() and unlock_cpu_hotplug() in
all places.
- Removed export of cpucontrol semaphore and made it static.
- Removed explicit uses of up/down in favour of lock_cpu_hotplug(),
  so we can keep track of the callers in the same thread context and
  just keep refcounts without calling a down() that causes a deadlock.
- Removed current_in_hotplug() uses.
- Removed PF_HOTPLUG_CPU in sched.h, introduced for the current_in_hotplug()
  temporary workaround.
Tested with insmod of cpufreq_stat.ko, and logical online/offline
to make sure we don't have any hang situations.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
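The refcounting idea in this entry can be sketched in user space. This is a minimal model, assuming an owner pointer plus a nesting depth stand in for the cpucontrol semaphore. The names struct hotplug_lock, model_lock_cpu_hotplug and model_unlock_cpu_hotplug are hypothetical and this is not the kernel's code; a return value of 0 marks the point where real code would sleep in down().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of a re-entrant hotplug lock. */
struct hotplug_lock {
    const void *owner;   /* context currently holding the lock, or NULL */
    int depth;           /* nesting count for that owner */
};

/* Returns 1 if the caller now holds the lock, 0 if it would have to sleep. */
static int model_lock_cpu_hotplug(struct hotplug_lock *l, const void *task)
{
    if (l->owner == task) {
        l->depth++;          /* nested call from the same context: just count */
        return 1;
    }
    if (l->owner != NULL)
        return 0;            /* real code would block in down() here */
    l->owner = task;
    l->depth = 1;
    return 1;
}

static void model_unlock_cpu_hotplug(struct hotplug_lock *l, const void *task)
{
    assert(l->owner == task && l->depth > 0);
    if (--l->depth == 0)
        l->owner = NULL;     /* real code would call up() here */
}
```

With this shape, a notifier callback that runs in the context already holding the lock only bumps the depth, which is why the nested call no longer deadlocks.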
|
|
When calling target drivers to set frequency, we take the cpucontrol lock.
When we modified the code to accommodate CPU hotplug, there was an attempt
to take the cpucontrol lock twice, leading to a deadlock. Since the
current thread context is already holding the cpucontrol lock, we don't need
to make another attempt to acquire it.
Now we leave a trace in current->flags indicating that the current thread
already holds the cpucontrol lock, so we don't attempt to take it again.
Thanks to Andrew Morton for the beating :-)
From: Brice Goglin <Brice.Goglin@ens-lyon.org>
Build fix
(akpm: this patch is still unpleasant. Ashok continues to look for a cleaner
solution, doesn't he? ;))
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
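The flag-based workaround in this entry can be sketched as follows. This is a minimal model under stated assumptions: MODEL_PF_HOTPLUG_CPU is a made-up bit value, struct model_task stands in for current, and an integer depth stands in for the cpucontrol lock. None of these names come from the kernel source.

```c
#include <assert.h>

#define MODEL_PF_HOTPLUG_CPU 0x1u   /* hypothetical bit value */

struct model_task {
    unsigned int flags;             /* stands in for current->flags */
};

static int model_current_in_hotplug(const struct model_task *t)
{
    return (t->flags & MODEL_PF_HOTPLUG_CPU) != 0;
}

/* Hotplug path: take the lock once and leave a trace in the task flags. */
static void model_hotplug_begin(struct model_task *t, int *lock_depth)
{
    assert(*lock_depth == 0);
    (*lock_depth)++;                /* stands in for taking cpucontrol */
    t->flags |= MODEL_PF_HOTPLUG_CPU;
}

static void model_hotplug_end(struct model_task *t, int *lock_depth)
{
    t->flags &= ~MODEL_PF_HOTPLUG_CPU;
    (*lock_depth)--;
}

/* Frequency-setting path: returns 1 if it took the lock itself, 0 if it skipped. */
static int model_set_target(struct model_task *t, int *lock_depth)
{
    if (model_current_in_hotplug(t))
        return 0;                   /* already under the lock; do not take it again */
    assert(*lock_depth == 0);
    (*lock_depth)++;                /* take the lock */
    /* ... the target driver would be called here ... */
    (*lock_depth)--;                /* release the lock */
    return 1;
}
```

The flag only records that this thread holds the lock; it does not count nesting.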
|
|
cpufreq entries in sysfs should only be populated when the CPU is online.
When we boot with maxcpus=x and then bring up the other cpus by echoing
to the sysfs online file, these entries should be created, and they should
be destroyed when CPU_DEAD is notified. This is the same treatment as the
cache entries under sysfs.
We place the processor in the lowest frequency, so hw managed P-State
transitions can still work on the other threads to save power.
Primary goal was to just make these directories appear/disappear dynamically.
There is one thing in this patch I had to do which I really don't like
myself, but it is probably best if someone handling the cpufreq
infrastructure gives this code the right treatment if this is not
acceptable. I guess it's probably good enough for the first cut.
- Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt.
  The locking was smack in the middle of the notification path, when the
  hotplug code is already holding the lock. I tried another solution that
  avoids taking the lock if we know we are in the notification path. That
  solution was getting very ugly and I decided this was probably good for this
  iteration until someone who understands cpufreq can do a better job than me.
(akpm: export cpucontrol to GPL modules: drivers/cpufreq/cpufreq_stats.c now
does lock_cpu_hotplug())
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Zwane Mwaikambo <zwane@holomorphy.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
(The i386 CPU hotplug patch provides infrastructure for some work which Pavel
is doing as well as for ACPI S3 (suspend-to-RAM) work which Li Shaohua
<shaohua.li@intel.com> is doing)
The following provides i386 architecture support for safely unregistering and
registering processors during runtime, updated for the current -mm tree. In
order to avoid dumping cpu hotplug code into kernel/irq/*, I dropped the
cpu_online check in do_IRQ() by modifying fixup_irqs(). The difference is
that on cpu offline, fixup_irqs() is called before we clear the cpu from
cpu_online_map, and there is a long delay in order to ensure that we never
have any queued external interrupts on the APICs. There are additional changes
to s390 and ppc64 to account for this change.
1) Add CONFIG_HOTPLUG_CPU
2) Disable local APIC timer on dead cpus.
3) Disable preempt around irq balancing to prevent CPUs going down.
4) Print irq stats for all possible cpus.
5) Debugging check for interrupts on offline cpus.
6) Hacky fixup_irqs() to redirect irqs when cpus go off/online.
7) play_dead() for offline cpus to spin inside.
8) Handle offline cpus set in flush_tlb_others().
9) Grab lock earlier in smp_call_function() to prevent CPUs going down.
10) Implement __cpu_disable() and __cpu_die().
11) Enable local interrupts in cpu_enable() after fixup_irqs().
12) Don't fiddle with NMI on dead cpu, but leave intact on other cpus.
13) Program IRQ affinity whilst cpu is still in cpu_online_map on offline.
Signed-off-by: Zwane Mwaikambo <zwane@linuxpower.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch introduces the concept of (virtual) cputime. Each architecture
can define its method to measure cputime. The main idea is to define a
cputime_t type and a set of operations on it (see asm-generic/cputime.h).
Then use the type for utime, stime, cutime, cstime, it_virt_value,
it_virt_incr, it_prof_value and it_prof_incr and use the cputime operations
for each access to these variables. The default implementation is jiffies
based and the effect of this patch for architectures which use the default
implementation should be negligible.
There is a second type, cputime64_t, which is necessary for the kernel_stat
cpu statistics. The default cputime_t is 32 bit and based on HZ; with HZ=1000
it will overflow after 49.7 days. This is not enough for kernel_stat (imho not
enough for a process either), so it is necessary to have a 64 bit type.
The third thing that gets introduced by this patch is an additional field
for the /proc/stat interface: cpu steal time. An architecture can account
cpu steal time by calls to the account_stealtime function. The cpu which
backs a virtual processor doesn't spend all of its time on the virtual
cpu. To get meaningful cpu usage numbers this involuntary wait time needs
to be accounted and exported to user space.
From: Hugh Dickins <hugh@veritas.com>
The p->signal check in account_system_time is insufficient. If the timer
interrupt hits near the end of exit_notify, after EXIT_ZOMBIE has been set,
another cpu may release_task (NULLifying p->signal) in between
account_system_time's check and check_rlimit's dereference. Nor should
account_it_prof risk send_sig. But surely account_user_time is safe?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
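The 49.7-day figure in this entry can be checked with a short calculation. This sketch assumes a 32-bit unsigned tick counter and a tick rate of HZ=1000, which is the rate that reproduces the quoted number. The model_ typedefs and helpers are hypothetical stand-ins for a jiffies-based cputime_t and cputime64_t, not the contents of asm-generic/cputime.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical jiffies-style types: 32-bit per-task, 64-bit for statistics. */
typedef uint32_t model_cputime_t;
typedef uint64_t model_cputime64_t;

/* Days until an unsigned 32-bit tick counter wraps at `hz` ticks per second. */
static double model_days_until_wrap32(unsigned int hz)
{
    const double ticks = 4294967296.0;   /* 2^32 */
    assert(hz > 0);
    return ticks / hz / 86400.0;         /* 86400 seconds per day */
}

/* Accumulate a 32-bit delta into a 64-bit total, as a statistics counter would. */
static model_cputime64_t model_cputime64_add(model_cputime64_t total,
                                             model_cputime_t delta)
{
    return total + delta;
}
```

At 1000 ticks per second the 32-bit counter wraps after roughly 49.7 days, and at 100 ticks per second after roughly 497 days. The 64-bit accumulator does not wrap on any practical timescale at those rates.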
|
|
Fix (harmless?) smp_processor_id() usage in preemptible section of
cpu_down.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Move hotplug_path[] out of kmod.[ch] to kobject_uevent.[ch] where
it belongs now. At some time in the future we should fix the remaining bad
hotplug calls (no SEQNUM, no netlink uevent):
./drivers/input/input.c (no DEVPATH on some hotplug events!)
./drivers/pnp/pnpbios/core.c
./drivers/s390/crypto/z90main.c
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
From: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Remove cpu_run_sbin_hotplug() - use kobject_hotplug() instead.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Introduce CPU_DOWN_FAILED notifier, so we can cope with a failure after a
CPU_DOWN_PREPARE notice.
This fixes 3/8 "add CPU_DOWN_PREPARE notifier" to be useful
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add a CPU_DOWN_PREPARE hotplug CPU notifier. This is needed so we can detach
all sched-domains before a CPU goes down, thus we can build domains from
online cpumasks, and not have to check for the possibility of a CPU coming up
or going down.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add $DEVPATH to the environmental variables during /sbin/hotplug call.
Signed-off-by: Josef 'Jeff' Sipek <jeffpc@optonline.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Paul Jackson <pj@sgi.com>
This patch makes cpu_present_map a real map for all configurations, instead of
a constant for non-SMP. It also moves the definition of cpu_present_map out
of kernel/cpu.c into kernel/sched.c, because cpu.c isn't compiled into non-SMP
kernels.
The pattern is that each of the possible, present and online cpu maps is an
actual kernel global cpumask_t variable, for all configurations. They are
documented in include/linux/cpumask.h. Some of the UP (NR_CPUS=1) code
cheats, and hardcodes the assumption that the single bit position of these
maps is always set, as an optimization.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Paul Jackson <pj@sgi.com>
With a hotplug capable kernel, there is a requirement to distinguish a
possible CPU from one actually present. The set of possible CPU numbers
doesn't change during a single system boot, but the set of present CPUs
changes as CPUs are physically inserted into or removed from a system. The
cpu_possible_map does not change once initialized at boot, but the
cpu_present_map changes dynamically as CPUs are inserted or removed.
Paul Jackson <pj@sgi.com> provided an expanded explanation:
Ashok's cpu hot plug patch adds a cpu_present_map, resulting in the following
cpu maps being available. All the following maps are fixed size bitmaps of
size NR_CPUS.
    #ifdef CONFIG_HOTPLUG_CPU
        cpu_possible_map - map with all NR_CPUS bits set
        cpu_present_map  - map with bit 'cpu' set iff cpu is populated
        cpu_online_map   - map with bit 'cpu' set iff cpu available to scheduler
    #else
        cpu_possible_map - map with bit 'cpu' set iff cpu is populated
        cpu_present_map  - copy of cpu_possible_map
        cpu_online_map   - map with bit 'cpu' set iff cpu available to scheduler
    #endif
In either case, NR_CPUS is fixed at compile time, as the static size of these
bitmaps. The cpu_possible_map is fixed at boot time, as the set of CPU ids
that could ever be plugged in at any time during the life of that system
boot. The cpu_present_map is dynamic(*), representing which CPUs are
currently plugged in. And cpu_online_map is the dynamic subset of
cpu_present_map, indicating those CPUs available for scheduling.
If HOTPLUG is enabled, then cpu_possible_map is forced to have all NR_CPUS
bits set, otherwise it is just the set of CPUs that ACPI reports present at
boot.
If HOTPLUG is enabled, then cpu_present_map varies dynamically, depending on
what ACPI reports as currently plugged in, otherwise cpu_present_map is just a
copy of cpu_possible_map.
(*) Well, cpu_present_map is dynamic in the hotplug case. If not hotplug,
it's the same as cpu_possible_map, hence fixed at boot.
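The relationship between the three maps described above can be sketched with plain bitmasks. This is a minimal model, assuming MODEL_NR_CPUS is 8 and an unsigned int stands in for cpumask_t. The struct and function names are hypothetical, and the `populated` argument stands in for whatever the platform reports as plugged in.

```c
#include <assert.h>

#define MODEL_NR_CPUS 8   /* hypothetical compile-time limit */

struct model_cpu_maps {
    unsigned int possible;   /* fixed at boot */
    unsigned int present;    /* dynamic only in the hotplug case */
    unsigned int online;     /* available to the scheduler */
};

/* Set up the maps as in the table above, for the hotplug and non-hotplug cases. */
static void model_init_maps(struct model_cpu_maps *m, unsigned int populated,
                            int hotplug)
{
    const unsigned int all = (1u << MODEL_NR_CPUS) - 1u;

    populated &= all;
    m->possible = hotplug ? all : populated;
    m->present = populated;   /* equals possible when hotplug is off */
    m->online = 0;
}

/* Bring a cpu online; only a present cpu may become online. */
static int model_online(struct model_cpu_maps *m, int cpu)
{
    unsigned int bit;

    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    bit = 1u << cpu;
    if (!(m->present & bit))
        return -1;
    m->online |= bit;
    return 0;
}

/* Invariant: online is a subset of present, which is a subset of possible. */
static int model_maps_consistent(const struct model_cpu_maps *m)
{
    return (m->present & ~m->possible) == 0 &&
           (m->online & ~m->present) == 0;
}
```

In the hotplug case a cpu can be possible without being present, so model_online() refuses it until the present bit is set.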
|
|
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
migrate_all_tasks is currently run with the rest of the machine stopped.
It iterates through the complete task table, turning off cpu affinity of any
task that it finds affine to the dying cpu. Depending on the task table
size this can take considerable time. All this time the machine is stopped,
doing nothing.
Stopping the machine for such extended periods can be avoided if we do
task migration in CPU_DEAD notification and that's precisely what this patch
does.
The patch puts the idle task at the _front_ of the dying CPU's runqueue at the
highest priority possible. This causes the idle thread to run _immediately_
after the kstopmachine thread yields. The idle thread notices that its cpu is
offline and dies quickly. Task migration can then be done at leisure in the
CPU_DEAD notification, when the rest of the CPUs are running.
Some advantages with this approach are:
- More scalable. Predictable amount of time that the machine is stopped.
- No changes to hot path/core code. We are just exploiting scheduler
  rules which run the next high-priority task on the runqueue. Also,
  since I put the idle task at the _front_ of the runqueue, there
  are no races when an equally high priority task is woken up
  and added to the runqueue. It gets in at the back of the runqueue,
  _after_ the idle task!
- The cpu_is_offline check that is presently required in try_to_wake_up,
  idle_balance and rebalance_tick can be removed, thus speeding them
  up a bit.
From: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Rusty mentioned that the unlikely hints against cpu_is_offline are
redundant since the macro already has that hint. Patch below removes those
redundant hints I added.
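The front-of-runqueue placement described in this entry can be sketched with a single FIFO standing in for the highest-priority level of the dying CPU's runqueue. This is a simplified user-space model, not the scheduler's priority arrays. The struct model_rq type and its helpers are hypothetical, and tasks are represented by plain integer ids.

```c
#include <assert.h>
#include <string.h>

#define MODEL_RQ_MAX 16

/* One priority level of a runqueue, modelled as a small array-backed FIFO. */
struct model_rq {
    int task[MODEL_RQ_MAX];
    int len;
};

/* Normal wakeup: the task joins at the back of the queue. */
static void model_enqueue_tail(struct model_rq *rq, int task)
{
    assert(rq->len < MODEL_RQ_MAX);
    rq->task[rq->len++] = task;
}

/* Special case for the idle task on a dying cpu: insert at the front. */
static void model_enqueue_head(struct model_rq *rq, int task)
{
    assert(rq->len < MODEL_RQ_MAX);
    memmove(&rq->task[1], &rq->task[0], (size_t)rq->len * sizeof(rq->task[0]));
    rq->task[0] = task;
    rq->len++;
}

/* The scheduler picks whatever is at the front of the queue. */
static int model_pick_next(struct model_rq *rq)
{
    int t;

    assert(rq->len > 0);
    t = rq->task[0];
    rq->len--;
    memmove(&rq->task[0], &rq->task[1], (size_t)rq->len * sizeof(rq->task[0]));
    return t;
}
```

A task woken at the same priority after the idle task was inserted still lands behind it, which is the no-race property the entry relies on.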
|
|
Implement cpu_down(): uses stop_machine to freeze the machine, then
uses (arch-specific) __cpu_disable() and migrate_all_tasks().
Whole thing under CONFIG_HOTPLUG_CPU, so doesn't break archs which
don't define that.
|
|
|
|
The registration and unregistration of CPU notifiers should be done
under the cpucontrol sem. They should also be exported.
|
|
From: Jes Sorensen <jes@trained-monkey.org>
I'd like to propose the following for 2.6.1-mm/2.6.2. On systems with a
large number of CPUs the number of printk's flowing by for each CPU
booting starts becoming a real console hog.
The following patch eliminates a couple of them (already sent a patch to
David for the ia64 specific ones) as well as changing the
"Building zonelist : X" into "Built Y zonelists". IMHO it doesn't make any
sense to print for each zonelist since it's run in a for loop running
from 0 to Y-1 anyway.
The patch nukes a few new printk's that were introduced with the
scheduler changes to the NUMA code in -mm3, if these are still needed
then I won't fight for that part of the patch.
|
|
From: Jes Sorensen <jes@trained-monkey.org>
The following patch removes a couple of null-ilizers of global variables.
Not a big deal, but every byte helps in the .data segment ;-)
|
|
Trivial patch: when these were introduced cpu.h didn't exist.
|
|
Patch from Dipankar Sarma <dipankar@in.ibm.com>
This is Manfred's patch which provides a CPU_UP_PREPARE cpu notifier to
allow initialization of per_cpu data just before the cpu becomes fully
functional.
It also provides a facility for the CPU_UP_PREPARE handler to return
NOTIFY_BAD to signify that the CPU is not permitted to come up. If
that happens, a CPU_UP_CANCELLED message is passed to all the handlers.
The patch also fixes a bogus NOTIFY_BAD return from the softirq setup
code.
Patch has been acked by Rusty.
We need this mechanism in slab for starting per-cpu timers and for
allocating the per-cpu slab head arrays *before* the CPU has come up
and started using slab.
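The veto-and-cancel flow described in this entry can be sketched with a small notifier chain. This is a user-space model under stated assumptions: the MODEL_ enums, the model_notifier_fn type and both example handlers are hypothetical, and the chain is a plain array rather than the kernel's notifier list.

```c
#include <assert.h>
#include <stddef.h>

enum model_event { MODEL_CPU_UP_PREPARE, MODEL_CPU_UP_CANCELLED, MODEL_CPU_ONLINE };
enum model_ret { MODEL_NOTIFY_OK, MODEL_NOTIFY_BAD };

typedef enum model_ret (*model_notifier_fn)(enum model_event ev, int cpu);

static int model_cancel_seen;   /* cancel events seen by the first handler */

/* Example handler: would allocate per-cpu data on prepare, free it on cancel. */
static enum model_ret model_percpu_handler(enum model_event ev, int cpu)
{
    (void)cpu;
    if (ev == MODEL_CPU_UP_CANCELLED)
        model_cancel_seen++;
    return MODEL_NOTIFY_OK;
}

/* Example handler: refuses to let odd-numbered cpus come up. */
static enum model_ret model_veto_odd_handler(enum model_event ev, int cpu)
{
    if (ev == MODEL_CPU_UP_PREPARE && (cpu % 2) != 0)
        return MODEL_NOTIFY_BAD;
    return MODEL_NOTIFY_OK;
}

static void model_broadcast(const model_notifier_fn *chain, size_t n,
                            enum model_event ev, int cpu)
{
    for (size_t i = 0; i < n; i++)
        (void)chain[i](ev, cpu);
}

/* Returns 0 if the cpu came up, -1 if a handler vetoed it. */
static int model_cpu_up(const model_notifier_fn *chain, size_t n, int cpu)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (chain[i](MODEL_CPU_UP_PREPARE, cpu) == MODEL_NOTIFY_BAD) {
            /* Tell every handler to undo its prepare work. */
            model_broadcast(chain, n, MODEL_CPU_UP_CANCELLED, cpu);
            return -1;
        }
    }
    model_broadcast(chain, n, MODEL_CPU_ONLINE, cpu);
    return 0;
}
```

Sending the cancel message to all handlers, including ones that never saw the prepare event, follows the wording of the entry; handlers therefore need to tolerate a cancel without a matching prepare.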
|
|
|
|
This patch alters the boot sequence to "plug in" each CPU, one at a
time. You need the patch for each architecture, as well. The
interface used to be "smp_boot_cpus()", "smp_commence()", and each
arch implemented the "maxcpus" boot arg itself. With this patch,
it is:
  smp_prepare_cpus(maxcpus): probe for cpus and set up cpu_possible(cpu).
  __cpu_up(cpu): called *after* initcalls, for each cpu where
      cpu_possible(cpu) is true.
  smp_cpus_done(maxcpus): called after every cpu has been brought up.
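The three-step boot interface above can be sketched as a loop that plugs in one cpu at a time. This is a user-space model under stated assumptions: struct model_smp, the `detected` bitmask, and the rule that the loop stops once max_cpus cpus are online are illustrative choices, not taken from the patch.

```c
#include <assert.h>

#define MODEL_NR_CPUS 8   /* hypothetical compile-time limit */

struct model_smp {
    unsigned int possible;   /* cpus found by the probe */
    unsigned int online;     /* cpus brought up so far */
    int done;                /* set once the final step has run */
};

static int model_count_bits(unsigned int mask)
{
    int n = 0;

    for (; mask != 0; mask >>= 1)
        n += (int)(mask & 1u);
    return n;
}

/* Step 1: probe for cpus; the boot cpu (bit 0) is always possible and online. */
static void model_smp_prepare_cpus(struct model_smp *s, unsigned int detected)
{
    s->possible = detected | 1u;
    s->online = 1u;
    s->done = 0;
}

/* Step 2: bring up a single possible cpu. */
static int model_boot_cpu_up(struct model_smp *s, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (!(s->possible & (1u << cpu)))
        return -1;
    s->online |= 1u << cpu;
    return 0;
}

/* Driver loop: plug in cpus one at a time, then run step 3. */
static void model_smp_init(struct model_smp *s, unsigned int max_cpus)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if ((unsigned int)model_count_bits(s->online) >= max_cpus)
            break;
        if ((s->possible & (1u << cpu)) && !(s->online & (1u << cpu)))
            (void)model_boot_cpu_up(s, cpu);
    }
    s->done = 1;   /* stands in for smp_cpus_done() */
}
```

Keeping the maxcpus limit in the generic loop, rather than in each architecture, is the point of the interface change; the per-arch part shrinks to the probe and the single-cpu bring-up.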
|