| Age | Commit message | Author |
|
It was a stupid workaround for the "static inline" vs.
"extern inline" issues of long ago, and it is what causes
schedule() to be inlined like crazy into kernel/sched.c
when -Os is specified.
MIPS and S390 should probably do the same.
Now CC_OPTIMIZE_FOR_SIZE can be safely used on sparc64
once more.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Also, disable on sparc64 - a number of people report breakage. Probably
a compiler bug, but it's quite possible that it tickles some latent
kernel problem too.
It still defaults to 'y' everywhere else (when enabled through
EXPERIMENTAL), and Dave Jones points out that Fedora (and RHEL4) has
been building with size optimizations for a long time on x86, x86-64,
ia64, s390, s390x, ppc32 and ppc64. So it is really only moderately
experimental, but the sparc64 breakage certainly shows that it can
trigger "issues".
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Let's put my money where my mouth is. Smaller code is almost always
faster, if only because a single I$ miss ends up leaving a lot of cycles
to make up for. And system software - kernels in particular - is known
for taking more cache misses than most other kinds.
On my random config, this made the kernel about 10% smaller, and lmbench
seems to say that it's pretty uniformly faster too. Your mileage may vary.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
KOBJECT_UEVENT=n seems to be a common pitfall for udev users in 2.6.14.
-mm already contains a bigger patch removing this option that is IMHO
too big to be applied to 2.6.15-rc now.
This patch simply allows KOBJECT_UEVENT=n only if EMBEDDED.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Run idle threads with preempt disabled.
Also corrected a bug in arm26's cpu_idle (make it actually call schedule()).
How did it ever work before?
Might fix the CPU hotplugging hang which Nigel Cunningham noted.
We think the bug hits if the idle thread is preempted after checking
need_resched() and before going to sleep, and the CPU is then offlined.
After calling stop_machine_run, the CPU eventually returns from preemption and
into the idle thread and goes to sleep. The CPU will continue executing
previous idle and have no chance to call play_dead.
By disabling preemption until we are ready to explicitly schedule, this bug is
fixed and the idle threads generally become more robust.
From: alexs <ashepard@u.washington.edu>
PPC build fix
From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
MIPS build fix
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
drivers/block/ is right now a mix of core and driver parts. Let's move
the core parts to a new top level directory. Al will move the fs/
related block parts to block/ next.
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
Commit f2b36db692b7ff6972320ad9839ae656a3b0ee3e causes a bootup hang on
at least one machine. Revert for now until we understand why. The old
code may be ugly, but it works.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Try to make the INIT_ENV_ARG_LIMIT help text more readable and
understandable.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
All kinds of ugliness exists because we don't initialize
the apics during init_IRQs.
- We calibrate jiffies in non apic mode even when we are using apics.
- We have to have special code to initialize the apics when non-smp.
- The legacy i8259 must exist and be setup correctly, even
when we won't use it past initialization.
- The kexec on panic code must restore the state of the io_apics.
- init/main.c needs a special case for !smp smp_init on x86
In addition to pure code movement I needed a couple
of non-obvious changes:
- Move setup_boot_APIC_clock into APIC_late_time_init for
simplicity.
- Use cpu_khz to generate a better approximation of loops_per_jiffies
so I can verify the timer interrupt is working.
- Call setup_apic_nmi_watchdog again after cpu_khz is initialized on
the boot cpu.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Besides freeing initrd memory, also clear out the now dangling pointers to
it, to make sure accidental late use attempts can be detected.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Since early userspace was added, there's no way to override which init to
run from it. Some people tack on an extra cpio archive with a link from
/init depending on what they want to run, but that's sometimes impractical.
Changing the "init=" to also override the early userspace isn't feasible,
since it is still used to indicate what init to run from disk when early
userspace has completed doing whatever it's doing (i.e. load filesystem
modules and drivers).
Instead, introduce "rdinit=" and make it override the default "/init" if
specified.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
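
A minimal sketch of the selection rule described above, not the kernel's own parser: "rdinit=" overrides the default "/init" for early userspace, while "init=" is left alone for the later on-disk init. The function name pick_early_init is hypothetical.

```python
def pick_early_init(cmdline, default="/init"):
    """Return the early-userspace init path for a kernel command line.

    Hypothetical helper: "rdinit=" overrides the default, "init=" is ignored
    here because it still names the init to run from disk later on.
    """
    path = default
    for token in cmdline.split():
        if token.startswith("rdinit="):
            path = token[len("rdinit="):]
    return path
```

For example, a command line of "rdinit=/bin/sh init=/sbin/init" selects /bin/sh for early userspace, and a command line without "rdinit=" keeps /init.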
|
|
I passed init=/mylinuxrc to the kernel on the command line. The kernel
silently dropped down to exec /sbin/init. It turned out that /mylinuxrc
had improper permissions. Without any warning message from the kernel that
something was wrong it took a while to find the issue. The patch below adds
a warning.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
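
The behaviour described above could be sketched as follows. This is an illustration only: choose_init, the can_exec predicate and the warning wording are assumptions made for the example, not the kernel's actual code or message.

```python
def choose_init(requested, fallbacks, can_exec):
    """Pick the init program to run and collect warnings along the way.

    requested: path given with init=, or None
    fallbacks: default paths to try in order (e.g. ["/sbin/init"])
    can_exec:  hypothetical predicate telling whether a path can be executed
    """
    warnings = []
    if requested is not None:
        if can_exec(requested):
            return requested, warnings
        # Warn instead of silently dropping to the defaults.
        warnings.append("Failed to execute %s. Attempting defaults..." % requested)
    for path in fallbacks:
        if can_exec(path):
            return path, warnings
    return None, warnings
```

The point of the sketch is only that the fallback is no longer silent: a bad init= path leaves a message behind.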
|
|
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.
When enabled, per-CPU watchdog threads are started, which try to run
once per second. If they get delayed for more than 10 seconds, a
callback from the timer interrupt detects this condition and prints out a
warning message and a stack dump (once per lockup incident). The feature
is otherwise non-intrusive: it doesn't try to unlock the box in any way, it
only gets the debug info out, automatically, and on all CPUs affected by
the lockup.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
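
The detection logic described above can be modelled roughly like this. It is a toy simulation with explicit timestamps in seconds; the class and method names are hypothetical and nothing here is taken from the kernel source.

```python
class SoftLockupDetector:
    """Toy model: a per-CPU watchdog thread 'touches' a timestamp whenever it
    runs, and the timer interrupt checks how stale that timestamp is."""

    def __init__(self, ncpus, threshold=10):
        self.threshold = threshold          # seconds of delay before warning
        self.last_touch = [0] * ncpus       # last time each watchdog ran
        self.reported = [False] * ncpus     # warn only once per incident

    def touch(self, cpu, now):
        """Called by the watchdog thread of `cpu` whenever it gets to run."""
        self.last_touch[cpu] = now
        self.reported[cpu] = False

    def timer_tick(self, cpu, now):
        """Called from the timer interrupt; True means 'print a warning'."""
        stale = now - self.last_touch[cpu] > self.threshold
        if stale and not self.reported[cpu]:
            self.reported[cpu] = True
            return True
        return False
```

Keeping a per-CPU "reported" flag is one simple way to get the "once per lockup incident" behaviour: the flag is cleared as soon as the watchdog thread runs again.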
|
|
|
|
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in this file.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Of this type, mostly:
CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
upon the git commit
If CONFIG_AUTO_LOCALVERSION is set, the user is using a git-based tree, and the
current HEAD is not referred to by any tags in .git/refs/tags/, append -g and
the first 8 characters of the commit to the version string. This makes it
easier to use git-bisect, and/or to do a daily build, without trampling on your
older, working builds, or accidentally setting up conflicting sets of modules.
Signed-off-by: Ryan Anderson <ryan@michonline.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
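
As a rough illustration of the rule above (not the actual build script): the suffix is "-g" plus the first 8 characters of the HEAD commit, and it is only added when no tag refers to HEAD. The function name and the commit id used in the usage note are made up for the example.

```python
def local_version(head_commit, tagged_commits, auto_localversion=True):
    """Return the suffix to append to the version string.

    head_commit:       full hex id of the current HEAD
    tagged_commits:    set of commit ids that tags point at
    auto_localversion: stands in for CONFIG_AUTO_LOCALVERSION
    """
    if not auto_localversion or head_commit in tagged_commits:
        return ""
    return "-g" + head_commit[:8]
```

With a hypothetical HEAD of 0123456789abcdef0123456789abcdef01234567 and no matching tag, the suffix would be "-g01234567".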
|
|
Move initramfs options from Device Drivers | Block Drivers to General Setup
This is a more natural place for this option.
Furthermore, separate out initramfs options to usr/Kconfig
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Minor cleanup.
Move things into their include files, remove obsolete includes, fix
indentation, remove obsolete special cases etc.
I also added the per cpu section to asm-generic/sections.h and fixed
init/main.c to use it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add kerneldoc to kernel/cpuset.c
Fix cpuset typos in init/Kconfig
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Clarify the KALLSYMS_ALL help text slightly.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
|
|
From: Matt Mackall <mpm@selenic.com>
Add PREEMPT to UTS_VERSION where enabled as is done for SMP to make
preempt kernels easily identifiable.
Added SMP PREEMPT as comment in compile.h to force it to be
updated when they change (sam).
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Remove ROOT_DEV after unexporting it in the previous patch, as requested some
time ago by Christoph Hellwig.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On system boot up, there was a failure reported to boot.msg:
<5>Trying to move old root to /initrd ... failed
According to initrd(4) man page, step #7 of BOOT-UP OPERATION
is described as below:
7. If the normal root file has directory /initrd, device
/dev/ram0 is moved from / to /initrd. Otherwise if
directory /initrd does not exist device /dev/ram0 is
unmounted.
We got service calls from customers concerned about this failure message
at boot time. Many systems do not have /initrd, so the message can be
changed for the case of a non-existent /initrd so that it does not sound like
a failure of the system.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch tweaks idle thread setup semantics a bit: instead of setting
NEED_RESCHED in init_idle(), we do an explicit schedule() before calling
into cpu_idle().
This patch, while having no negative side-effects, enables wider use of
cond_resched()s (which might happen in the stock kernel too, but it's
particularly important for voluntary-preempt).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Issue:
The current tsc-based delay calibration can produce significant errors in the
loops_per_jiffy count when platform events like SMIs (System Management
Interrupts, which are non-maskable) occur. This can lead to a kernel panic().
The issue is becoming more visible with the 2.6 kernel (as the default HZ is
1000) and on platforms with higher SMI handling latencies. During boot, SMIs
are mostly used by the BIOS (for things like legacy keyboard emulation).
Description:
The pseudocode for the current tsc-based delay calibration looks like:
(0) Estimate a value for loops_per_jiffy
(1) While (loops_per_jiffy estimate is not accurate enough)
(2)   wait for jiffy transition (jiffy1)
(3)   Note down current tsc (tsc1)
(4)   loop until tsc becomes tsc1 + loops_per_jiffy
(5)   check whether jiffy changed since jiffy1 or not and refine
      loops_per_jiffy estimate
Consider the following cases
Case 1:
If SMIs happen between (2) and (3) above, we can end up with a
loops_per_jiffy value that is too low. This results in shorter delays, and the
kernel can panic() during boot (mostly at IOAPIC timer initialization in
timer_irq_works(), as we don't get enough timer interrupts in a specified
interval).
Case 2:
If SMIs happen between (3) and (4) above, then we can end up with a
loops_per_jiffy value that is too high. With the current i386 code, too
high an lpj value (greater than 17M) can result in an overflow in
delay.c:__const_udelay(), again resulting in a shorter delay and panic().
Solution:
The patch below makes the calibration routine aware of asynchronous events
like SMIs. We increase the delay calibration time and also identify any
significant errors (greater than 12.5%) in the calibration and report them to
the user.
Patch below changes both i386 and x86-64 architectures to use this
new and improved calibrate_delay_direct() routine.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
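
One way to picture the 12.5% error check mentioned above is to take several independent measurements and flag the run when they disagree too much. The message does not say how the real calibrate_delay_direct() detects errors, so the min/max comparison below is purely an assumption for illustration, as is the function name.

```python
def check_calibration(samples, max_error=0.125):
    """Combine repeated loops_per_jiffy measurements.

    Returns (estimate, suspicious). `suspicious` is True when the spread
    between the lowest and highest sample exceeds max_error (12.5%) of the
    lowest one, which is what an SMI landing inside one measurement would
    tend to cause. Assumed heuristic, not the kernel's algorithm.
    """
    lo, hi = min(samples), max(samples)
    estimate = sum(samples) / len(samples)
    suspicious = (hi - lo) > max_error * lo
    return estimate, suspicious
```

Measuring over a longer window, as the patch does, makes a single SMI a smaller fraction of each sample, so fewer runs would be flagged in the first place.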
|
|
This patch modifies the way pagesets in struct zone are managed.
Each zone has a per-cpu array of pagesets. So any particular CPU has some
memory in each zone structure which belongs to itself. Even if that CPU is
not local to that zone.
So the patch relocates the pagesets for each cpu to the node that is nearest
to the cpu instead of allocating the pagesets in the (possibly remote) target
zone. This means that the operations to manage pages on remote zone can be
done with information available locally.
We play a macro trick so that non-NUMA machines avoid the additional
pointer chase on the page allocator fastpath.
AIM7 benchmark on a 32 CPU SGI Altix
w/o patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005
100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005
200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005
300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005
400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005
500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005
600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005
700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005
800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005
900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005
1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005
with slab API changes and pageset patch:
Tasks jobs/min jti jobs/min/task real cpu
1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005
100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005
200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005
300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005
400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005
500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005
600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005
700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005
800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005
900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005
1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Shai Fultheim <Shai@Scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
Actually, the real support was added by some earlier patches. Now we simply
re-enable the config. option. I've actually tested it and it works well.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Audit now actually requires netlink. So make it depend on CONFIG_NET,
and remove the inline dependencies on CONFIG_NET.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
|
|
This patch is for -mm only. It should probably be included in git-audit,
and should be forwarded to Linus iff git-audit is.
It updates the audit-syscall-{entry,exit} calls to current -mm.
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
Arrange for all kernel printks to be no-ops. Only available if
CONFIG_EMBEDDED.
This patch saves about 375k on my laptop config and nearly 100k on minimal
configs.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch eliminates all kernel BUGs, trims about 35k off the typical
kernel, and makes the system slightly faster.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Disable some hardware-only configuration options when configuring for ARCH=um.
By the way, we rename CONFIG_USERMODE to CONFIG_UML, as requested some time
ago by the UML maintainer Jeff Dike.
We also update defconfig as a consequence of all this.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: "Catalin(ux aka Dino) BOIE" <util@deuroconsult.ro>, Paolo 'Blaisorblade'
Giarrusso <blaisorblade@yahoo.it>, Jeff Dike <jdike@addtoit.com>
Increase UML command line size, and fix a crash from passing an overly long
command line to UML.
XXX: check that init can handle 128 params and 128 env. vars. The original
patch set this limit to 256, but that seems too much to me. Think!
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch contains the following cleanups on several architectures:
- make some needlessly global code static
- remove the following write-only (except for printk's) variables:
- cache_decay_ticks
- smp_threads_ready
- cacheflush_time
I've only tried the compilation on i386, but I hope all mistakes I made
are on unimportant architectures. ;-)
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch modifies a few of the printk() loglevels used in init/main.c in
an attempt to make them a bit more appropriate.
The default loglevel is KERN_WARNING, but a few printk's without explicit
loglevel are not (in my opinion) warnings, so add proper loglevels -
for instance, telling the user how many CPU's were brought up is hardly a
warning, so make it KERN_INFO instead. The initial printing of linux_banner
is not a warning condition; I'd say it's more of a NOTICE or even INFO
condition - I've made it KERN_NOTICE, just like the printing of the kernel
command line. A few printk's without explicit loglevel do match the
default one, but I've made them explicit (the default could change in the
future, and if it does then explicitly setting the proper loglevel is a
nice thing).
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Clarify the BASE_FULL help text.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: <mjg59@scrf.ucam.org>
When using a fully modularized kernel it is necessary to activate resume
manually as the device node might not be available during kernel init.
This patch implements a new sysfs attribute '/sys/power/resume' which allows
for manual activation of software resume. When read, it prints the
configured resume device in 'major:minor' format. When written to, it expects
a device in 'major:minor' format. This device is then checked for a suspended
image and resume is started if a valid image is found. The original
functionality is left in place.
It should be used from initramfs, or with care.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
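
The 'major:minor' text format used by the attribute can be illustrated with a small parse/format pair. This is a sketch of the format only; the helper names are hypothetical and it does not touch sysfs or mirror the kernel's parsing code.

```python
def parse_resume_device(text):
    """Parse 'major:minor' (as written to the attribute) into two ints."""
    major, sep, minor = text.strip().partition(":")
    if not sep or not major.isdecimal() or not minor.isdecimal():
        raise ValueError("expected 'major:minor', got %r" % text)
    return int(major), int(minor)


def format_resume_device(major, minor):
    """Format a device number pair the way the attribute prints it."""
    return "%d:%d" % (major, minor)
```

Stripping whitespace before parsing lets the sketch accept the trailing newline that a shell 'echo' would add.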
|
|
o This properly indents the kernel hacking menu.
o Move LOG_BUF_SHIFT into kernel hacking menu (it already depended on DEBUG_KERNEL).
o Add DEBUG_KERNEL dependency to EARLY_PRINTK, DEBUG_PREEMPT and FRAME_POINTER.
o Remove overlong dependency, which included practically every arch.
o Merge the two MAGIC_SYSRQ menu entries.
o Remove unnecessary "default n" options.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
This is my cpuset patch, with the following changes in the last two weeks:
1) Updated to 2.6.8.1-mm1
2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty,
not copied from parent. Needed to avoid breaking exclusive property.
3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top
cpuset from cpu_possible_map after smp_init() called.
4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages()
if the current task's cpuset mems_allowed has changed. Use a cpuset
generation number, bumped on any cpuset memory placement change,
to make this check efficient. Update the task's mems_allowed from
its cpuset, if the cpuset has changed.
5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset,
then update its cpus_allowed, using set_cpus_allowed().
6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to
reflect above changes (4) and (5).
I continue to recommend the following patch for inclusion in your 2.6.9-*mm
series, when that opens. It provides an important facility for high
performance computing on large systems. Simon Derr of Bull (France) and
myself are the primary authors. Erich Focht has indicated that NEC is also
a potential user of this patch on the TX-7 NUMA machines, and that he
"would very much welcome the inclusion of cpusets."
I offer this update to lkml, in order to invite continued feedback.
The one prerequisite patch for this cpuset patch was just posted before this
one. That was a patch to provide a new bitmap list format, of which
cpusets is the first user.
This patch has been built on top of 2.6.8.1-mm1, for the arch's:
i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64
with and without CONFIG_CPUSETS. It has been booted and tested on ia64
(sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for
what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in
the final link step.
===
Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to
a set of tasks.
Cpusets constrain the CPU and Memory placement of tasks to only the
processor and memory resources within a task's current cpuset. They form a
nested hierarchy visible in a virtual file system. These are the essential
hooks, beyond what is already present, required to manage dynamic job
placement on large systems.
Cpusets require small kernel hooks in init, exit, fork, mempolicy,
sched_setaffinity, page_alloc and vmscan. And they require a "struct
cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t
(to go along with the "cpus_allowed" cpumask_t that's already there) in
each task struct.
These hooks:
1) establish and propagate cpusets,
2) enforce CPU placement in sched_setaffinity,
3) enforce Memory placement in mbind and sys_set_mempolicy,
4) restrict page allocation and scanning to mems_allowed, and
5) restrict migration and set_cpus_allowed to cpus_allowed.
The other required hook, restricting task scheduling to CPUs in a task's
cpus_allowed mask, is already present.
Cpusets extend the usefulness of the existing placement support that was
added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and
mbind() and set_mempolicy() for memory placement. On smaller or dedicated
use systems, the existing calls are often sufficient.
On larger NUMA systems running more than one performance-critical job,
it is necessary to be able to manage jobs in their entirety. This includes
providing a job with exclusive CPU and memory that no other job can use,
and being able to list all tasks currently in a cpuset.
A given job running within a cpuset would likely use the existing
placement calls to manage its CPU and memory placement in more detail.
Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is
represented by a directory in the cpuset virtual file system, normally
mounted at /dev/cpuset.
Each cpuset directory provides the following files, which can be
read and written:
  cpus:
      List of CPUs allowed to tasks in that cpuset.
  mems:
      List of Memory Nodes allowed to tasks in that cpuset.
  tasks:
      List of pid's of tasks in that cpuset.
  cpu_exclusive:
      Flag (0 or 1) - if set, cpuset has exclusive use of
      its CPUs (no sibling or cousin cpuset may overlap CPUs).
  mem_exclusive:
      Flag (0 or 1) - if set, cpuset has exclusive use of
      its Memory Nodes (no sibling or cousin may overlap).
  notify_on_release:
      Flag (0 or 1) - if set, then /sbin/cpuset_release_agent
      will be invoked, with the name (/dev/cpuset relative path)
      of that cpuset in argv[1], when the last user of it (task
      or child cpuset) goes away. This supports automatic
      cleanup of abandoned cpusets.
In addition one new filetype is added to the /proc file system:
  /proc/<pid>/cpuset:
      For each task (pid), list its cpuset path, relative to the
      root of the cpuset file system. This file is read-only.
New cpusets are created using 'mkdir' (at the shell or in C). Old ones are
removed using 'rmdir'. The above files are accessed using read(2) and
write(2) system calls, or shell commands such as 'cat' and 'echo'.
The CPUs and Memory Nodes in a given cpuset are always a subset of its
parent. The root cpuset has all possible CPUs and Memory Nodes in the
system. A cpuset may be exclusive (cpu or memory) only if its parent is
similarly exclusive.
See further Documentation/cpusets.txt, at the top of the following
patch.
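
The subset and exclusivity rules above can be sketched as a small validity check. This is a simplified model with hypothetical names (Cpuset, can_create); it only covers the cpu_exclusive flag and only compares against siblings, on the assumption that cousins are already kept apart by the subset rule when the parents themselves do not overlap.

```python
class Cpuset:
    """Toy cpuset node: sets of CPU and memory node numbers plus one flag."""

    def __init__(self, cpus, mems, parent=None, cpu_exclusive=False):
        self.cpus = set(cpus)
        self.mems = set(mems)
        self.cpu_exclusive = cpu_exclusive
        self.children = []
        if parent is not None:
            parent.children.append(self)


def can_create(parent, cpus, mems, cpu_exclusive=False):
    """Check whether a child with these resources would obey the rules."""
    cpus, mems = set(cpus), set(mems)
    # A cpuset's CPUs and Memory Nodes are always a subset of its parent's.
    if not (cpus <= parent.cpus and mems <= parent.mems):
        return False
    # A cpuset may be exclusive only if its parent is similarly exclusive.
    if cpu_exclusive and not parent.cpu_exclusive:
        return False
    # An exclusive cpuset may not share CPUs with any sibling.
    for sibling in parent.children:
        if (cpu_exclusive or sibling.cpu_exclusive) and cpus & sibling.cpus:
            return False
    return True
```

In this model the root would be created with all CPUs and Memory Nodes and with cpu_exclusive set, so that exclusive children are possible at all.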
/proc interface:
It is useful, when learning and making new uses of cpusets and placement, to be
able to see the current values of a task's cpus_allowed and
mems_allowed, which are the actual placement used by the kernel scheduler and
memory allocator.
The cpus_allowed and mems_allowed values are needed by user space apps that
are micromanaging placement, such as when moving an app to a obtained by
that app within its cpuset using sched_setaffinity, mbind and
set_mempolicy.
The cpus_allowed value is also available via the sched_getaffinity system
call. But since the entire rest of the cpuset API, including the display
of mems_allowed added here, is via an ascii style presentation in /proc and
/dev/cpuset, it is worth the extra couple lines of code to display
cpus_allowed in the same way.
This patch adds the display of these two fields to the 'status' file in the
/proc/<pid> directory of each task. The fields are only added if
CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed
field of each task). The new output lines look like:
$ tail -2 /proc/1/status
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
Mems_allowed: ffffffff,ffffffff
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
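
The Cpus_allowed and Mems_allowed lines shown above are comma-separated groups of 32-bit hex words. A small sketch of producing such a string from a set of bit numbers follows; the function name is hypothetical, and the ordering (most significant word first) is an assumption that matches the sample output only trivially, since every word there is ffffffff.

```python
def format_mask(bits, nbits):
    """Render a set of bit numbers as comma-separated 32-bit hex words.

    bits:  iterable of set bit positions (CPU or memory node numbers)
    nbits: total width of the mask, e.g. 128 for a 128-CPU cpus_allowed
    """
    value = 0
    for b in bits:
        value |= 1 << b
    nwords = (nbits + 31) // 32
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(nwords)]
    # Assumed ordering: highest-numbered word is printed first.
    return ",".join("%08x" % w for w in reversed(words))
```

Under that assumption, a 64-bit mask with all bits set renders as the Mems_allowed line in the sample.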
|
|
The attached patch causes process and session keyrings to be shared
properly when CLONE_THREAD is in force. It does this by moving the keyring
pointers into struct signal_struct[*].
[*] I have a patch to rename this to struct thread_group that I'll revisit
after the advent of 2.6.11.
Furthermore, once this patch is applied, process keyrings will no longer be
allocated at fork, but will instead only be allocated when needed.
Allocating them at fork was a way of half getting around the sharing across
threads problem, but that's no longer necessary.
This revision of the patch has the documentation changes patch rolled into it
and no longer abstracts the locking for signal_struct into a pair of macros.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
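
The two ideas above, sharing through the per-thread-group structure and allocating only when needed, can be modelled in a few lines. All names here are hypothetical stand-ins; a plain dict plays the role of a keyring.

```python
class ThreadGroup:
    """Stand-in for the shared per-thread-group structure (signal_struct)."""

    def __init__(self):
        self.process_keyring = None  # nothing is allocated at "fork"


class Thread:
    def __init__(self, group=None):
        # CLONE_THREAD case: reuse the caller's group; otherwise a new one.
        self.group = group if group is not None else ThreadGroup()


def get_process_keyring(thread):
    """Allocate the group's process keyring on first use, then share it."""
    if thread.group.process_keyring is None:
        thread.group.process_keyring = {}  # placeholder for a real keyring
    return thread.group.process_keyring
```

Because the pointer lives in the shared structure, every thread in the group sees the same keyring, and a process that never asks for one never pays for the allocation.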
|
|
This patch adds a generic array sorting library routine. This is meant
to replace qsort, which has two problem areas for kernel use.
The first issue is quadratic worst-case performance. While quicksort
worst-case datasets are rarely encountered in normal scenarios, it is
in fact quite easy to construct worst cases for almost all quicksort
algorithms given source or access to an element comparison callback.
This could allow attackers to cause sorts that would otherwise take
less than a millisecond to take seconds and sorts that should take
less than a second to take weeks or months. Fixing this problem
requires randomizing pivot selection with a secure random number
generator, which is rather expensive.
The second is that quicksort's recursion tracking requires either
nontrivial amounts of stack space or dynamic memory allocation and out
of memory error handling.
By comparison, heapsort has both O(n log n) average and worst-case
performance and practically no extra storage requirements. This
version runs within 70-90% of the average performance of optimized
quicksort so it should be an acceptable replacement wherever quicksort
would be used in the kernel.
Note that this function has an extra parameter for passing in an
optimized swapping function. This is worth 10% or more over the
typical byte-by-byte exchange functions.
Benchmarks:
qsort: glibc variant 1189 bytes (+ 256/1024 stack)
qsort_3f: my simplified variant 459 bytes (+ 256/1024 stack)
heapsort: the version below 346 bytes
shellsort: an optimized shellsort 196 bytes
P4 1.8GHz Opteron 1.4GHz (32-bit)
size algorithm cycles relative cycles relative
100:
qsort: 38682 100.00% 27631 100.00%
qsort_3f: 36277 106.63% 22406 123.32%
heapsort: 43574 88.77% 30301 91.19%
shellsort: 39087 98.97% 25139 109.91%
200:
qsort: 86468 100.00% 61148 100.00%
qsort_3f: 78918 109.57% 48959 124.90%
heapsort: 98040 88.20% 68235 89.61%
shellsort: 95688 90.36% 62279 98.18%
400:
qsort: 187720 100.00% 131313 100.00%
qsort_3f: 174905 107.33% 107954 121.64%
heapsort: 223896 83.84% 154241 85.13%
shellsort: 223037 84.17% 148990 88.14%
800:
qsort: 407060 100.00% 287460 100.00%
qsort_3f: 385106 105.70% 239131 120.21%
heapsort: 484662 83.99% 340099 84.52%
shellsort: 537110 75.79% 354755 81.03%
1600:
qsort: 879596 100.00% 621331 100.00%
qsort_3f: 861568 102.09% 522013 119.03%
heapsort: 1079750 81.46% 746677 83.21%
shellsort: 1234243 71.27% 820782 75.70%
3200:
qsort: 1903902 100.00% 1342126 100.00%
qsort_3f: 1908816 99.74% 1131496 118.62%
heapsort: 2515493 75.69% 1630333 82.32%
shellsort: 2985339 63.78% 1964794 68.31%
6400:
qsort: 4046370 100.00% 2909215 100.00%
qsort_3f: 4164468 97.16% 2468393 117.86%
heapsort: 5150659 78.56% 3533585 82.33%
shellsort: 6650225 60.85% 4429849 65.67%
12800:
qsort: 8729730 100.00% 6185097 100.00%
qsort_3f: 8776885 99.46% 5288826 116.95%
heapsort: 11064224 78.90% 7603061 81.35%
shellsort: 15487905 56.36% 10305163 60.02%
25600:
qsort: 18357770 100.00% 13172205 100.00%
qsort_3f: 18687842 98.23% 11337115 116.19%
heapsort: 24121241 76.11% 16612122 79.29%
shellsort: 35552814 51.64% 24106987 54.64%
51200:
qsort: 38658883 100.00% 28008505 100.00%
qsort_3f: 39498463 97.87% 24339675 115.07%
heapsort: 50553552 76.47% 37013828 75.67%
shellsort: 82602416 46.80% 56201889 49.84%
102400:
qsort: 81197794 100.00% 58918933 100.00%
qsort_3f: 84257930 96.37% 51986219 113.34%
heapsort: 110540577 73.46% 81419675 72.36%
shellsort: 191303132 42.44% 129786472 45.40%
From: Zou Nan hai <nanhai.zou@intel.com>
The new sort routine only works if there are an even number of entries in
the ia64 exception fix-up tables. If the number of entries is odd the sort
fails, and then random get_user/put_user calls can fail.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
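
For readers who want to see the shape of the approach, here is a generic in-place heapsort that takes both a comparison callback and a swap callback, as the message describes. It is a sketch under my own naming, not the kernel routine: the signature, the callbacks and the list-based interface are all assumptions.

```python
def heapsort(items, cmp, swap):
    """Sort `items` in place using O(1) extra storage.

    cmp(a, b): negative, zero or positive, like a C comparison callback
    swap(items, i, j): exchanges two elements; callers may pass a faster,
                       type-specific version
    """
    n = len(items)

    def sift_down(root, end):
        # Push items[root] down until the max-heap property holds in [0, end).
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and cmp(items[child], items[child + 1]) < 0:
                child += 1
            if cmp(items[root], items[child]) >= 0:
                return
            swap(items, root, child)
            root = child

    # Build the heap, then repeatedly move the maximum to the end.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n)
    for end in range(n - 1, 0, -1):
        swap(items, 0, end)
        sift_down(0, end)


def generic_swap(items, i, j):
    """Default element exchange."""
    items[i], items[j] = items[j], items[i]
```

The loop has no recursion and no auxiliary buffer, which is the property the message cares about; it is worth exercising such a routine with both odd and even element counts, given the ia64 fix noted above.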
|
|
This patch series introduces a new pair of CONFIG_EMBEDDED options called
CONFIG_BASE_FULL/CONFIG_BASE_SMALL. Disabling CONFIG_BASE_FULL sets the
boolean CONFIG_BASE_SMALL to 1, which is used to shrink a number of core data
structures. The space savings for the current batch are around 14k.
This patch:
Add CONFIG_BASE_SMALL for miscellaneous core size reductions that don't warrant
their own options. Example users to follow.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
With hotplug cpu and preempt, we tend to see smp_processor_id warnings from
idle loop code because it's always checking whether its cpu has gone
offline. Replacing every use of smp_processor_id with _smp_processor_id in
all idle loop code is one solution; another way is explicitly binding idle
threads to their cpus (the smp_processor_id warning does not fire if the
caller is bound only to the calling cpu). This has the (admittedly slight)
advantage of letting us know if an idle thread ever runs on the wrong cpu.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|