| Age | Commit message | Author |
|
It has caused more problems than it ever really solved, and is
apparently not getting cleaned up and fixed. We can put it back when
it's stable and isn't likely to make warning or bug events worse.
In the meantime, enable frame pointers for more readable stack traces.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Updated patch adding a variant of RCU that permits sleeping in read-side
critical sections. SRCU is as follows:
o Each use of SRCU creates its own srcu_struct, and each
srcu_struct has its own set of grace periods. This is
critical, as it prevents one subsystem with a blocking
reader from holding up SRCU grace periods for other
subsystems.
o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(),
and synchronize_srcu()) all take a pointer to a srcu_struct.
o The SRCU primitives must be called from process context.
o srcu_read_lock() returns an int that must be passed to
the matching srcu_read_unlock(). Realtime RCU avoids the
need for this by storing the state in the task struct,
but SRCU needs to allow a given code path to pass through
multiple SRCU domains -- storing state in the task struct
would therefore require either arbitrary space in the
task struct or arbitrary limits on SRCU nesting. So I
kicked the state-storage problem up to the caller.
Of course, it is not permitted to call synchronize_srcu()
while in an SRCU read-side critical section.
o There is no call_srcu(). It would not be hard to implement
one, but it seems like too easy a way to OOM the system.
(Hey, we have enough trouble with call_rcu(), which does
-not- permit readers to sleep!!!) So, if you want it,
please tell me why...
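The index-passing rule described in the list above can be illustrated with a small single-threaded userspace model. This is only a sketch, not the kernel implementation: the `toy_` names are hypothetical, and per-CPU counters, memory barriers and actual sleeping are all left out. It assumes two counter banks per domain, selected by the low bit of a grace-period count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of one SRCU domain (hypothetical names). */
struct toy_srcu {
    int completed;   /* number of grace periods started so far */
    int readers[2];  /* active readers counted in each bank */
};

/* Enter a read-side section; the caller must keep the returned index. */
static int toy_read_lock(struct toy_srcu *sp)
{
    int idx = sp->completed & 1;

    sp->readers[idx]++;
    return idx;
}

/* Leave a read-side section using the index from the matching lock. */
static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
    assert(idx == 0 || idx == 1);
    assert(sp->readers[idx] > 0);
    sp->readers[idx]--;
}

/*
 * Start a grace period: readers arriving from now on use the other bank.
 * Returns the bank in which pre-existing readers are counted.
 */
static int toy_flip(struct toy_srcu *sp)
{
    int old = sp->completed & 1;

    sp->completed++;
    return old;
}

/* True once every reader that predates the flip has unlocked. */
static bool toy_drained(const struct toy_srcu *sp, int old)
{
    return sp->readers[old] == 0;
}
```

Because each `toy_srcu` carries its own counters, a long-running reader in one domain never delays `toy_drained()` for another domain, and because the index travels with the caller, a code path can hold read-side sections in several domains at once without any per-task storage.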
[josht@us.ibm.com: sparse notation]
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch defines the uts namespace and some manipulators.
Adds the uts namespace to task_struct, and initializes a
system-wide init namespace.
It leaves a #define for system_utsname so sysctl will compile.
This define will be removed in a separate patch.
[akpm@osdl.org: build fix, cleanup]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds a nsproxy structure to the task struct. Later patches will
move the fs namespace pointer into this structure, and introduce a new utsname
namespace into the nsproxy.
The vserver and openvz functionality, then, would be implemented in large part
by virtualizing/isolating more and more resources into namespaces, each
contained in the nsproxy.
[akpm@osdl.org: build fix]
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add some basic accounting fields to the taskstats struct, and add a new
kernel/tsacct.c to handle basic accounting data upon exit. A hook is added
to taskstats.c to invoke the basic accounting data handling.
Signed-off-by: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Add infrastructure to track "maximum allowable latency" for power saving
policies.
The reason for adding this infrastructure is that power management in the
idle loop needs to make a tradeoff between latency and power savings
(deeper power save modes have a longer latency before code runs again). The
code that makes this tradeoff today uses a rather simple algorithm;
however, this is not good enough: there are devices and use cases that
require a lower latency than the higher power saving states provide.
An example would be audio playback, but another example is the ipw2100
wireless driver that right now has a very direct and ugly acpi hook to
disable some higher power states randomly when it gets certain types of
error.
The proposed solution is to have an interface where drivers can
* announce the maximum latency (in microseconds) that they can deal with
* modify this latency
* give up their constraint
and a function where the code that decides on power saving strategy can
query the current global desired maximum.
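The shape of such an interface can be sketched in userspace as a small table of per-driver constraints whose minimum is the global answer. This is an illustrative model only: the `toy_` names, the fixed table size and the use of `INT_MAX` for "no constraint" are assumptions made for the sketch, not the API added by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CONSTRAINTS 8
#define TOY_NO_LIMIT INT_MAX   /* assumed marker for "nobody cares" */

struct toy_constraint {
    const char *owner;  /* NULL marks a free slot; string must outlive the entry */
    int usecs;          /* maximum latency the owner can tolerate */
};

static struct toy_constraint toy_table[TOY_MAX_CONSTRAINTS];

static struct toy_constraint *toy_find(const char *owner)
{
    for (size_t i = 0; i < TOY_MAX_CONSTRAINTS; i++)
        if (toy_table[i].owner && strcmp(toy_table[i].owner, owner) == 0)
            return &toy_table[i];
    return NULL;
}

/* Announce a new constraint or modify an existing one; -1 if the table is full. */
static int toy_set_latency(const char *owner, int usecs)
{
    struct toy_constraint *c = toy_find(owner);

    assert(owner != NULL && usecs >= 0);
    for (size_t i = 0; i < TOY_MAX_CONSTRAINTS && !c; i++)
        if (!toy_table[i].owner)
            c = &toy_table[i];
    if (!c)
        return -1;
    c->owner = owner;
    c->usecs = usecs;
    return 0;
}

/* Give up a constraint; unknown owners are ignored. */
static void toy_remove_latency(const char *owner)
{
    struct toy_constraint *c = toy_find(owner);

    if (c)
        c->owner = NULL;
}

/* Consumer side: the tightest latency any registered owner can tolerate. */
static int toy_system_latency(void)
{
    int min = TOY_NO_LIMIT;

    for (size_t i = 0; i < TOY_MAX_CONSTRAINTS; i++)
        if (toy_table[i].owner && toy_table[i].usecs < min)
            min = toy_table[i].usecs;
    return min;
}
```

In this model an idle loop would call `toy_system_latency()` and skip any power state whose wakeup latency exceeds the returned value.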
This patch has a user of each side: on the consumer side, ACPI is patched
to use this, on the producer side the ipw2100 driver is patched.
A generic maximum latency of 2 timer ticks is also registered (any more and
you lose accurate time tracking, after all).
While the existing users of the patch are x86 specific, the infrastructure
is not. I'd like to ask the arch maintainers of other architectures if the
infrastructure is generic enough for their use (assuming the architecture
has such a tradeoff as a concept at all), and the sound/multimedia driver
owners to look at the driver facing API to see if this is something they
can use.
[akpm@osdl.org: cleanups]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jesse.barnes@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC
family), for getting statistics of tasks and thread groups during their
lifetime and when they exit. The interface is intended for use by multiple
accounting packages though it is being created in the context of delay
accounting.
This patch creates the interface without populating the fields of the data
that is sent to the user in response to a command or upon the exit of a task.
Each accounting package interested in using taskstats has to provide an
additional patch to add its stats to the common structure.
[akpm@osdl.org: cleanups, Kconfig fix]
Signed-off-by: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Initialization code for the collection of per-task "delay" statistics, which
measure how long a task had to wait for CPU, sync block I/O, swapping, etc. The
collection of statistics and the interface are in other patches. This patch
sets up the data structures and allows the statistics collection to be
disabled through a kernel boot parameter.
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Peter Chubb <peterc@gelato.unsw.edu.au>
Cc: Erich Focht <efocht@ess.nec.de>
Cc: Levent Serinol <lserinol@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use the lock validator framework to prove spinlock and rwlock locking
correctness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Use the lock validator framework to prove rwsem locking correctness.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Lock validator /proc/lockdep and /proc/lockdep_stats support.
(FIXME: should go into debugfs)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.
Typically, if the lock validator finds a problem, it prints voluminous
debug output that begins with "BUG: ..."; kernel developers can use this
syslog output to figure out the precise locking scenario.
What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.
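The rule-checking idea in the paragraph above can be sketched as a directed graph of "lock B was taken while lock A was held" edges, where a new edge is rejected if it would close a cycle. This is a deliberately reduced userspace model with hypothetical `toy_` names; it ignores irq contexts, read/write lock semantics and everything else the real validator tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_CLASSES 16

/* toy_dep[a][b] is true once "b acquired while a held" has been observed. */
static bool toy_dep[TOY_MAX_CLASSES][TOY_MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded rules? */
static bool toy_reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < TOY_MAX_CLASSES; next++)
        if (toy_dep[from][next] && !seen[next] &&
            toy_reachable(next, to, seen))
            return true;
    return false;
}

/*
 * Record the rule "acquire 'next' while holding 'held'".
 * Returns false, and records nothing, if the rule would create a cycle
 * (a possible deadlock); taking the same class twice is also rejected.
 */
static bool toy_add_dependency(int held, int next)
{
    bool seen[TOY_MAX_CLASSES];

    assert(held >= 0 && held < TOY_MAX_CLASSES);
    assert(next >= 0 && next < TOY_MAX_CLASSES);
    memset(seen, 0, sizeof(seen));
    if (toy_reachable(next, held, seen))
        return false;  /* a path next -> ... -> held already exists */
    toy_dep[held][next] = true;
    return true;
}
```

Note that the conflicting orders never have to run at the same time: observing A then B in one code path and B then A in another, even on a single CPU, is enough for the second call to fail.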
When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming an arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]
Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via simple single-context rules, increasing the likelihood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemeal", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.
To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persists for the lifetime of the kernel.
To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:
lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176
The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.
More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also be found at:
http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Framework to generate and save stacktraces quickly, without printing anything
to the console.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
RT-mutex tester: scriptable tester for rt mutexes, which allows userspace
scripting of mutex unit-tests (and dynamic tests as well), using the actual
rt-mutex implementation of the kernel.
[akpm@osdl.org: fixlet]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Runtime debugging functionality for rt-mutexes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Core functions for the rt-mutex subsystem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* x86-64: (83 commits)
[PATCH] x86_64: x86_64 stack usage debugging
[PATCH] x86_64: (resend) x86_64 stack overflow debugging
[PATCH] x86_64: msi_apic.c build fix
[PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs
[PATCH] x86_64: Avoid broadcasting NMI IPIs
[PATCH] x86_64: fix apic error on bootup
[PATCH] x86_64: enlarge window for stack growth
[PATCH] x86_64: Minor string functions optimizations
[PATCH] x86_64: Move export symbols to their C functions
[PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR
[PATCH] x86_64: Fix modular pc speaker
[PATCH] x86_64: remove sys32_ni_syscall()
[PATCH] x86_64: Do not use -ffunction-sections for modules
[PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle
[PATCH] x86_64: adjust kstack_depth_to_print default
[PATCH] i386/x86-64: adjust /proc/interrupts column headings
[PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels
[PATCH] x86_64: Fix fast check in safe_smp_processor_id
[PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information
[PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
...
Manual resolve of trivial conflict in arch/i386/kernel/Makefile
|
|
These are the generic bits needed to enable reliable stack traces based
on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will
enable x86-64 and i386 to make use of this.
Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities
for improvement.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Modify the update_wall_time function so it increments time using the
clocksource abstraction instead of jiffies. Since the only clocksource driver
currently provided is the jiffies clocksource, this should result in no
functional change. Additionally, timekeeping_init and timekeeping_resume
functions have been added to initialize and maintain some of the new
timekeeping state.
[hirofumi@mail.parknet.co.jp: fixlet]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This was already a bad plan when I argued against adding it in the first
place. Good riddance.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
32-bit syscall compatibility support. (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
|
|
Original patch from Paul Mundt, sysfs parts removed by me since they
were broken.
Signed-off-by: Jens Axboe <axboe@suse.de>
|
|
This fixes the per-user and per-message-type filtering when syscall
auditing isn't enabled.
[AV: folded followup fix from the same author]
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Build kernel/intermodule.c only when required.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
hrtimer subsystem core. It is initialized at bootup and expired by the timer
interrupt, but is otherwise not utilized by any other subsystem yet.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- Move the crash_dump.c file to the arch-dependent part, as kmap_atomic_pfn is
specific to i386 and highmem may not exist on other archs.
- Use ioremap for x86_64 to map the previous kernel memory.
- In copy_oldmem_page(), we now directly copy to the user/kernel buffer and
avoid the unnecessary copy to a kmalloc'd page.
Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
mutex implementation - add debugging code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
|
|
mutex implementation, core files: just the basic subsystem, no users of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
|
|
This patch is a rewrite of the one submitted on October 1st, using modules
(http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2).
This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an
intense torture test of the RCU infrastructure. This is needed due to the
continued changes to the RCU infrastructure to accommodate dynamic ticks,
CPU hotplug, realtime, and so on. Most of the code is in a separate file
that is compiled only if the CONFIG variable is set. Documentation on how
to run the test and interpret the output is also included.
This code has been tested on i386 and ppc64, and an earlier version of the
code has received extensive testing on a number of architectures as part of
the PREEMPT_RT patchset.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Since CONFIG_IKCONFIG_PROC already depends on CONFIG_IKCONFIG, adding
configs.o again is redundant.
Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code. It does the following
things:
- consolidates and enhances the spinlock/rwlock debugging code
- simplifies the asm/spinlock.h files
- encapsulates the raw spinlock type and moves generic spinlock
features (such as ->break_lock) into the generic code.
- cleans up the spinlock code hierarchy to get rid of the spaghetti.
Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c. (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)
Also, I've enhanced the rwlock debugging facility; it will now track
write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.
The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:
include/asm-i386/spinlock_types.h | 16
include/asm-x86_64/spinlock_types.h | 16
I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:
SMP                         | UP
----------------------------|-----------------------------------
asm/spinlock_types_smp.h    | linux/spinlock_types_up.h
linux/spinlock_types.h      | linux/spinlock_types.h
asm/spinlock_smp.h          | linux/spinlock_up.h
linux/spinlock_api_smp.h    | linux/spinlock_api_up.h
linux/spinlock.h            | linux/spinlock.h
/*
* here's the role of the various spinlock/rwlock related include files:
*
* on SMP builds:
*
* asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
* initializers
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel
* implementations, mostly inline assembly code
*
* (also included on UP-debug builds:)
*
* linux/spinlock_api_smp.h:
* contains the prototypes for the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*
* on UP builds:
*
* linux/spinlock_types_up.h:
* contains the generic, simplified UP spinlock type.
* (which is an empty structure on non-debug builds)
*
* linux/spinlock_types.h:
* defines the generic type and initializers
*
* linux/spinlock_up.h:
* contains the __raw_spin_*()/etc. version of UP
* builds. (which are NOPs on non-debug, non-preempt
* builds)
*
* (included on UP-non-debug builds:)
*
* linux/spinlock_api_up.h:
* builds the _spin_*() APIs.
*
* linux/spinlock.h: builds the final spin_*() APIs.
*/
All SMP and UP architectures are converted by this patch.
arm, i386, ia64, ppc, ppc64, s390/s390x, x64 were build-tested via
crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.
From: Grant Grundler <grundler@parisc-linux.org>
Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
Builds 32-bit SMP kernel (not booted or tested). I did not try to build
non-SMP kernels. That should be trivial to fix up later if necessary.
I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids
some ugly nesting of linux/*.h and asm/*.h files. Those particular locks
are well tested and contained entirely inside arch specific code. I do NOT
expect any new issues to arise with them.
If someone does ever need to use debug/metrics with them, then they will
need to unravel this hairball between spinlocks, atomic ops, and bit ops
that exist only because parisc has exactly one atomic instruction: LDCW
(load and clear word).
From: "Luck, Tony" <tony.luck@intel.com>
ia64 fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.
When enabled, per-CPU watchdog threads are started, which try to run
once per second. If they get delayed for more than 10 seconds then a
callback from the timer interrupt detects this condition and prints out a
warning message and a stack dump (once per lockup incident). The feature
is otherwise non-intrusive: it doesn't try to unlock the box in any way, it
only gets the debug info out, automatically, and on all CPUs affected by
the lockup.
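The detection logic described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the patch's code: the `toy_` names are hypothetical, time is passed in as plain seconds, and the caller is expected to print the warning and stack dump when the tick function returns true.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOCKUP_SECS 10  /* threshold taken from the description above */

struct toy_cpu_watchdog {
    long touched_at;  /* last time the watchdog thread ran, in seconds */
    bool reported;    /* a warning was already issued for this lockup */
};

/* Called by the per-CPU watchdog thread, nominally once per second. */
static void toy_watchdog_touch(struct toy_cpu_watchdog *w, long now)
{
    w->touched_at = now;
    w->reported = false;  /* the CPU recovered; re-arm the warning */
}

/*
 * Called from the timer tick. Returns true exactly once per lockup
 * incident, when the watchdog thread has been starved for more than
 * TOY_LOCKUP_SECS seconds.
 */
static bool toy_watchdog_tick(struct toy_cpu_watchdog *w, long now)
{
    assert(now >= w->touched_at);
    if (now - w->touched_at <= TOY_LOCKUP_SECS)
        return false;
    if (w->reported)
        return false;
    w->reported = true;
    return true;
}
```

Keeping the check in the tick path and the heartbeat in an ordinary thread is what lets the check fire when the thread is starved: the tick still runs as long as interrupts are delivered.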
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch introduces the architecture-independent implementation of the
sys_kexec_load and compat_sys_kexec_load system calls.
Kexec on panic support has been integrated into the core patch and is
relatively clean.
In addition, the hopefully architecture-independent option
crashkernel=size@location has been documented. Its purpose is to reserve
space where the panic kernel can live, and which no DMA transfer will ever
be set up to access.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
While looking at code generated by gcc4.0 I noticed some functions still
had frame pointers, even after we stopped ppc64 from defining
CONFIG_FRAME_POINTER. It turns out kernel/Makefile hardwires
-fno-omit-frame-pointer on when compiling sched.c.
Create CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER and define it on architectures
that don't require frame pointers in sched.c code.
(akpm: blame me for the name)
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into mars.ravnborg.org:/home/sam/bk/kbuild
|
|
This is my cpuset patch, with the following changes in the last two weeks:
1) Updated to 2.6.8.1-mm1
2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty,
not copied from parent. Needed to avoid breaking exclusive property.
3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top
cpuset from cpu_possible_map after smp_init() called.
4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages()
if the current task's cpuset mems_allowed has changed. Use a cpuset
generation number, bumped on any cpuset memory placement change,
to make this check efficient. Update the task's mems_allowed from
its cpuset, if the cpuset has changed.
5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset,
then update its cpus_allowed, using set_cpus_allowed().
6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to
reflect above changes (4) and (5).
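The generation-number check from change (4) can be sketched as follows. This is a userspace illustration only: the `toy_` types and the single global counter are assumptions for the sketch, and the field names merely echo the mems_allowed and cpuset_mems_generation fields named in this message.

```c
#include <assert.h>
#include <stdint.h>

struct toy_cpuset {
    uint64_t mems_allowed;  /* bit n set = memory node n allowed */
    int mems_generation;    /* bumped on every placement change */
};

struct toy_task {
    const struct toy_cpuset *cpuset;
    uint64_t mems_allowed;       /* task's cached copy */
    int cpuset_mems_generation;  /* generation the copy was taken at */
};

/* Assumed global source of generation numbers; 0 is never handed out. */
static int toy_generation_counter;

/* Change a cpuset's memory placement and stamp it with a new generation. */
static void toy_cpuset_set_mems(struct toy_cpuset *cs, uint64_t mems)
{
    cs->mems_allowed = mems;
    cs->mems_generation = ++toy_generation_counter;
}

/*
 * Cheap check meant for the allocation fast path: a single integer
 * comparison unless the cpuset changed. Returns 1 if the task's copy
 * was refreshed, 0 if it was already current.
 */
static int toy_task_update_mems(struct toy_task *t)
{
    assert(t->cpuset);
    if (t->cpuset_mems_generation == t->cpuset->mems_generation)
        return 0;
    t->mems_allowed = t->cpuset->mems_allowed;
    t->cpuset_mems_generation = t->cpuset->mems_generation;
    return 1;
}
```

The point of the design is that the common case costs one comparison per allocation, while the more expensive copy happens only after an administrator actually changes the cpuset.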
I continue to recommend the following patch for inclusion in your 2.6.9-*mm
series, when that opens. It provides an important facility for high
performance computing on large systems. Simon Derr of Bull (France) and
myself are the primary authors. Erich Focht has indicated that NEC is also
a potential user of this patch on the TX-7 NUMA machines, and that he
"would very much welcome the inclusion of cpusets."
I offer this update to lkml, in order to invite continued feedback.
The one prerequisite patch for this cpuset patch was just posted before this
one. That was a patch to provide a new bitmap list format, of which
cpusets is the first user.
This patch has been built on top of 2.6.8.1-mm1, for the architectures:
i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64
with and without CONFIG_CPUSET. It has been booted and tested on ia64
(sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for
what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in
the final link step.
===
Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to
a set of tasks.
Cpusets constrain the CPU and Memory placement of tasks to only the
processor and memory resources within a task's current cpuset. They form a
nested hierarchy visible in a virtual file system. These are the essential
hooks, beyond what is already present, required to manage dynamic job
placement on large systems.
Cpusets require small kernel hooks in init, exit, fork, mempolicy,
sched_setaffinity, page_alloc and vmscan. And they require a "struct
cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t
(to go along with the "cpus_allowed" cpumask_t that's already there) in
each task struct.
These hooks:
1) establish and propagate cpusets,
2) enforce CPU placement in sched_setaffinity,
3) enforce Memory placement in mbind and sys_set_mempolicy,
4) restrict page allocation and scanning to mems_allowed, and
5) restrict migration and set_cpus_allowed to cpus_allowed.
The other required hook, restricting task scheduling to CPUs in a task's
cpus_allowed mask, is already present.
Cpusets extend the usefulness of the existing placement support that was
added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and
mbind() and set_mempolicy() for memory placement. On smaller or dedicated
use systems, the existing calls are often sufficient.
On larger NUMA systems running more than one performance-critical job,
it is necessary to be able to manage jobs in their entirety. This includes
providing a job with exclusive CPU and memory that no other job can use,
and being able to list all tasks currently in a cpuset.
A given job running within a cpuset would likely use the existing
placement calls to manage its CPU and memory placement in more detail.
Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is
represented by a directory in the cpuset virtual file system, normally
mounted at /dev/cpuset.
Each cpuset directory provides the following files, which can be
read and written:
cpus:
List of CPUs allowed to tasks in that cpuset.
mems:
List of Memory Nodes allowed to tasks in that cpuset.
tasks:
List of pid's of tasks in that cpuset.
cpu_exclusive:
Flag (0 or 1) - if set, cpuset has exclusive use of
its CPUs (no sibling or cousin cpuset may overlap CPUs).
mem_exclusive:
Flag (0 or 1) - if set, cpuset has exclusive use of
its Memory Nodes (no sibling or cousin may overlap).
notify_on_release:
Flag (0 or 1) - if set, then /sbin/cpuset_release_agent
will be invoked, with the name (/dev/cpuset relative path)
of that cpuset in argv[1], when the last user of it (task
or child cpuset) goes away. This supports automatic
cleanup of abandoned cpusets.
In addition, one new file is added to the /proc file system:
/proc/<pid>/cpuset:
For each task (pid), list its cpuset path, relative to the
root of the cpuset file system. This file is read-only.
New cpusets are created using 'mkdir' (at the shell or in C). Old ones are
removed using 'rmdir'. The above files are accessed using read(2) and
write(2) system calls, or shell commands such as 'cat' and 'echo'.
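The mkdir/write sequence described above could look roughly like this in C. This is a minimal sketch: the helper names cpuset_write() and cpuset_create() are hypothetical, and the cpuset directory is passed in by the caller (for example a path below /dev/cpuset) rather than hard-coded.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a string to <dir>/<file>, e.g. "cpus" or "mems".
 * Returns 0 on success, -1 on error. */
static int cpuset_write(const char *dir, const char *file, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* The files already exist in a cpuset directory, so no O_CREAT here. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len);
    int saved = errno;
    close(fd);
    errno = saved;
    return (written == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical helper: create a cpuset directory (tolerating one that
 * already exists), then set its CPUs and Memory Nodes. */
static int cpuset_create(const char *dir, const char *cpus, const char *mems)
{
    if (mkdir(dir, 0755) < 0 && errno != EEXIST)
        return -1;
    if (cpuset_write(dir, "cpus", cpus) < 0)
        return -1;
    return cpuset_write(dir, "mems", mems);
}
```

A caller might use cpuset_create("/dev/cpuset/myjob", "2-3", "1") and then write a pid to the "tasks" file with cpuset_write(); the path and values here are only illustrative.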
The CPUs and Memory Nodes in a given cpuset are always a subset of its
parent. The root cpuset has all possible CPUs and Memory Nodes in the
system. A cpuset may be exclusive (cpu or memory) only if its parent is
similarly exclusive.
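The subset and exclusivity rules above can be expressed as simple bitmask checks. The following is only a sketch: struct cpuset_sketch and both functions are hypothetical stand-ins limited to 64 CPUs and 64 Memory Nodes, not the kernel's own data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified cpuset: one bit per CPU or Memory Node. */
struct cpuset_sketch {
    uint64_t cpus;
    uint64_t mems;
    bool cpu_exclusive;
    bool mem_exclusive;
};

/* True if every bit set in child is also set in parent. */
static bool is_subset(uint64_t child, uint64_t parent)
{
    return (child & ~parent) == 0;
}

/* A child is acceptable if its CPUs and Memory Nodes are subsets of the
 * parent's, and it is exclusive only where the parent is exclusive too. */
static bool child_is_valid(const struct cpuset_sketch *child,
                           const struct cpuset_sketch *parent)
{
    if (!is_subset(child->cpus, parent->cpus))
        return false;
    if (!is_subset(child->mems, parent->mems))
        return false;
    if (child->cpu_exclusive && !parent->cpu_exclusive)
        return false;
    if (child->mem_exclusive && !parent->mem_exclusive)
        return false;
    return true;
}
```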
See further Documentation/cpusets.txt, at the top of the following
patch.
/proc interface:
It is useful, when learning and making new uses of cpusets and placement, to
be able to see the current values of a task's cpus_allowed and
mems_allowed, which are the actual placement used by the kernel scheduler and
memory allocator.
The cpus_allowed and mems_allowed values are needed by user space apps that
are micromanaging placement, such as when moving an app to a obtained by
that app within its cpuset using sched_setaffinity, mbind and
set_mempolicy.
The cpus_allowed value is also available via the sched_getaffinity system
call. But since the entire rest of the cpuset API, including the display
of mems_allowed added here, is via an ASCII-style presentation in /proc and
/dev/cpuset, it is worth the extra couple lines of code to display
cpus_allowed in the same way.
This patch adds the display of these two fields to the 'status' file in the
/proc/<pid> directory of each task. The fields are only added if
CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed
field of each task). The new output lines look like:
$ tail -2 /proc/1/status
Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
Mems_allowed: ffffffff,ffffffff
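A user space tool could count the CPUs or Memory Nodes in such a mask by parsing the comma-separated hexadecimal words. The sketch below assumes the value has already been extracted from the status line; mask_weight() is a hypothetical helper name.

```c
#include <stdlib.h>

/* Count the set bits in a mask string such as "ffffffff,0000000f".
 * Returns -1 if the string is not comma-separated hexadecimal words. */
static int mask_weight(const char *mask)
{
    int count = 0;
    const char *p = mask;

    while (*p) {
        char *end;
        unsigned long word = strtoul(p, &end, 16);

        if (end == p)
            return -1;              /* no hex digits where a word was expected */

        for (; word; word &= word - 1)
            count++;                /* clear the lowest set bit each round */

        if (*end == ',')
            p = end + 1;            /* move on to the next word */
        else if (*end == '\0' || *end == '\n')
            break;                  /* end of the mask */
        else
            return -1;
    }
    return count;
}
```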
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch provides support for thread and process CPU time clocks in the
POSIX clock interface. Both the existing utime and utime+stime information
(already available via getrusage et al) can be used, as well as a new
(potentially) more precise and accurate clock (which cannot distinguish user
from system time). The clock used is that provided by the `sched_clock'
function already used internally by the scheduler. This gives a way for
platforms to provide the highest-resolution CPU time tracking that is
available cheaply, and some already do so (such as x86 using TSC). Because
this clock is already sampled internally by the scheduler, this new tracking
adds only the tiniest new overhead to accomplish the bookkeeping.
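From user space, thread and process CPU time clocks are read through the standard POSIX clock_gettime() call. The sketch below assumes a libc that exposes CLOCK_THREAD_CPUTIME_ID and CLOCK_PROCESS_CPUTIME_ID; the two wrapper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Convert a timespec to nanoseconds. */
static long long timespec_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* CPU time consumed by the calling thread, in ns, or -1 on error. */
static long long thread_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}

/* CPU time consumed by the whole process, in ns, or -1 on error. */
static long long process_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}
```

Sampling thread_cpu_ns() before and after a piece of work gives the CPU time that work consumed on the calling thread.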
Some notes:
This allows per-thread clocks to be accessed only by other threads in the same
process. The only POSIX calls that access these are defined only for
in-process use, and having this check is necessary for the userland
implementations of the POSIX clock functions to robustly refuse stale
clockid_t's in the face of potential PID reuse.
This places no constraint on who can see whose per-process clocks. The VIRT
and PROF (i.e. utime and stime) information is already available via /proc.
I am open to suggestions on if/how security
constraints on who can see whose clocks should be imposed.
The SCHED clock information is now available only via clock_* syscalls. This
means that per-thread information is not available outside the process.
Perhaps /proc should show sched_time as well? This would let ps et al show
this more-accurate information.
When this code is merged, it will be supported in glibc. I have written the
support and added some test programs for glibc, which are what I mainly used
to test the new kernel code. You can get those here:
http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I'd need it merged into mainline at some point, unless anybody has strong
arguments against it. All I can guarantee here is that I'll back it out
myself in the future, iff Cpushare fails and nobody else has started using
it in the meantime for similar security purposes.
(akpm: project details are at http://www.cpushare.com/technical. It seems
like a good idea to me, and one which is worth supporting. I agree that for
this to be successful, the added robustness of Andrea's simple and specific
jail is worthwhile).
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch makes a needlessly global variable static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
Sticking the not-implemented syscall stuff in sys.c is a pain because the
cond_syscall()s explode when certain prototypes are in scope. And we need
those prototypes' header files for the C code in sys.c.
Fix all that up by moving all the sys_ni_syscall code into its own .c file.
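The general idea behind a conditional syscall can be sketched with a weak alias. This is an illustration using GCC function attributes on an ELF target, not the kernel's actual cond_syscall() macro, and sys_optional_feature is a hypothetical name. If a header had already declared the function with a different prototype, the alias declaration would conflict with it, which is presumably the kind of breakage described above.

```c
#include <errno.h>

/* Fallback used for any syscall that is not implemented. */
long sys_ni_syscall(void)
{
    return -ENOSYS;
}

/* Hypothetical optional syscall, declared as a weak alias of the fallback.
 * If another object file provides a strong definition with this name, the
 * linker picks that one; otherwise calls land in sys_ni_syscall(). */
long sys_optional_feature(void)
    __attribute__((weak, alias("sys_ni_syscall")));
```

Keeping such declarations in a file that does not include the real syscall prototypes avoids the type clash.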
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
A simple ringbuffer implementation for various character drivers.
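For readers unfamiliar with the data structure, a byte ring buffer can be sketched as follows. This is a generic illustration with hypothetical names and a fixed power-of-two size, not the code added by this patch.

```c
#include <stddef.h>

#define RING_SIZE 16u               /* must be a power of two */

struct ring {
    unsigned char data[RING_SIZE];
    unsigned int head;              /* total bytes ever written */
    unsigned int tail;              /* total bytes ever read */
};

/* Copy up to len bytes into the ring; returns how many were stored. */
static size_t ring_put(struct ring *r, const unsigned char *src, size_t len)
{
    size_t n = 0;

    while (n < len && r->head - r->tail < RING_SIZE) {
        r->data[r->head & (RING_SIZE - 1)] = src[n++];
        r->head++;
    }
    return n;
}

/* Copy up to len bytes out of the ring; returns how many were read. */
static size_t ring_get(struct ring *r, unsigned char *dst, size_t len)
{
    size_t n = 0;

    while (n < len && r->head != r->tail) {
        dst[n++] = r->data[r->tail & (RING_SIZE - 1)];
        r->tail++;
    }
    return n;
}
```

Because head and tail only ever increase and are masked on access, their unsigned difference is the fill level even after the counters wrap.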
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
The following patch series consolidates the various instances of waitqueue
hashing to use a uniform structure and share the per-zone hashtable among all
waitqueue hashers. This is expected to increase the number of hashtable
buckets available for waiting on bh's and inodes and eliminate statically
allocated kernel data structures for greater node locality and reduced kernel
image size. Some attempt was made to look similar to Oleg Nesterov's
suggested API in order to provide some kind of credit for independent
invention of something very similar (the original versions of these patches
predated my public postings on the subject of filtered waitqueues).
These patches have the further benefit and intention of enabling aio to use
filtered wakeups by standardizing the data structure passed to wake functions
so that embedded waitqueue elements in aio structures may be successfully
passed to the filtered wakeup wake functions, though this patch series doesn't
implement that particular functionality.
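The combination of a shared hash table and filtered wakeups can be illustrated with a single-threaded simulation. Everything below is a hypothetical sketch (names, table size, hash constant); it only shows why a waiter must carry a key once unrelated objects share a bucket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAIT_TABLE_BITS 4
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

/* Hypothetical waiter: remembers which object it is waiting on. */
struct waiter {
    const void *key;
    bool woken;
    struct waiter *next;
};

/* One shared table of buckets instead of one wait queue per object. */
static struct waiter *wait_table[WAIT_TABLE_SIZE];

/* Multiplicative hash of the object's address, keeping the top bits. */
static unsigned int hash_key(const void *key)
{
    uint64_t h = (uint64_t)(uintptr_t)key * 0x9E3779B97F4A7C15ULL;
    return (unsigned int)(h >> (64 - WAIT_TABLE_BITS));
}

/* Queue a waiter on the bucket its key hashes to. */
static void add_waiter(struct waiter *w, const void *key)
{
    unsigned int bucket = hash_key(key);

    w->key = key;
    w->woken = false;
    w->next = wait_table[bucket];
    wait_table[bucket] = w;
}

/* Wake only the waiters in the bucket whose key matches, so waiters that
 * merely collided in the hash are left alone. Returns the number woken. */
static int wake_filtered(const void *key)
{
    int n = 0;

    for (struct waiter *w = wait_table[hash_key(key)]; w; w = w->next) {
        if (w->key == key && !w->woken) {
            w->woken = true;
            n++;
        }
    }
    return n;
}
```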
Successfully stress-tested on x86-64, and ia64 in recent prior versions.
This patch:
Move waitqueue-related functions not needing static functions in sched.c
to kernel/wait.c
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The main goal of this patch is to consolidate all the different but still
fundamentally similar arch/*/kernel/irq.c code into the kernel/irq/ subsystem.
There are 6 new files in the kernel/irq/ directory:
- handle.c: core bits: __do_IRQ() and handle_IRQ_event(),
callable from arch-specific irq.c code.
- manage.c: the main driver apis
- spurious.c: the handling of buggy interrupt sources.
- autoprobe.c: probing of interrupts - older code but still in use.
- proc.c: /proc/irq/ code.
- internals.h for irq-core-internal interfaces not visible to drivers
nor arch PIC code.
An architecture enables the generic hardirq code by defining
CONFIG_GENERIC_HARDIRQS in its arch Kconfig. People doing this conversion
should check out the x86/x64/ppc/ppc64 patches for details - the conversion is
quite straightforward but every converted function (i.e. every function
removed from the arch irq.c) _must_ be matched to the generic version, and
any detail that the generic code needs to handle has to be added to the
generic code. All 4 currently converted architectures were converted
like that, and the generic code was extended/fixed along the way.
Other changes related to this patchset:
- clean up the irq include files (linux/irq.h, linux/interrupt.h,
linux/hardirq.h) and consolidate asm-*/[hard]irq.h. Note, to keep all
non-touched architectures in an untouched state this consolidation is
done carefully and strictly under CONFIG_GENERIC_HARDIRQS.
Once the consolidation is done we can do a couple of final cleanups
to reach the following logical splitup of 3 include files:
linux/interrupt.h: driver-visible APIs and details
linux/irq.h: core irq and arch-PIC code, internals
asm-*/irq.h: arch PIC and irq delivery details
the following include files will likely vanish:
linux/hardirq.h merges into linux/irq.h
asm-*/hardirq.h: merges into asm-*/irq.h
asm-*/hw_irq.h: merges into asm-*/irq.h
Christoph would like to do these once the current wave of
cleanups gets in.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into kroah.com:/home/greg/linux/BK/driver-2.6
|
|
Thanks to Kay Sievers for pointing this out.
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
This patch achieves out of line spinlocks by creating kernel/spinlock.c
and using the _raw_* inline locking functions.
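The inline versus out-of-line distinction can be sketched in user space with C11 atomics. This is an illustration only: the type and function names are hypothetical, and the noinline attribute assumes GCC or Clang.

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock built on a C11 atomic flag. */
typedef struct {
    atomic_flag locked;
} spinlock_sketch_t;

/* Inline primitives: every call site gets its own copy of this code. */
static inline void raw_spin_lock_sketch(spinlock_sketch_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static inline void raw_spin_unlock_sketch(spinlock_sketch_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Out-of-line wrappers: the locking code exists once in the binary and
 * each call site shrinks to a function call. */
__attribute__((noinline)) void spin_lock_sketch(spinlock_sketch_t *lock)
{
    raw_spin_lock_sketch(lock);
}

__attribute__((noinline)) void spin_unlock_sketch(spinlock_sketch_t *lock)
{
    raw_spin_unlock_sketch(lock);
}
```

The trade-off is one extra call per lock operation in exchange for smaller text, which is what the size figures in this message measure.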
Now, as much as this is supposed to be arch agnostic, there was still a
fair amount of rummaging about in archs, mostly for the cases where the
arch already has out of line locks and I wanted to avoid the extra call.
Saving that extra call also makes lock profiling easier. PPC32/64 was
an example of such an arch and I have added the necessary profile_pc()
function as an example.
Size differences are with CONFIG_PREEMPT enabled since we wanted to
determine how much could be saved by moving that lot out of line too.
ppc64 = 259897 bytes:
   text    data     bss     dec     hex filename
5489808 1962724  709064 8161596  7c893c vmlinux-after
5749577 1962852  709064 8421493  808075 vmlinux-before
sparc64 = 193368 bytes:
   text    data     bss     dec     hex filename
3472037  633712  308920 4414669  435ccd vmlinux-after
3665285  633832  308920 4608037  465025 vmlinux-before
i386 = 416075 bytes:
   text    data     bss     dec     hex filename
5808371  867442  326864 7002677  6ada35 vmlinux-after
6221254  870634  326864 7418752  713380 vmlinux-before
x86-64 = 282446 bytes:
   text    data     bss     dec     hex filename
4598025 1450644  523632 6572301  64490d vmlinux-after
4881679 1449436  523632 6854747  68985b vmlinux-before
It has been compile tested (UP, SMP, PREEMPT) on i386, x86-64, sparc,
sparc64, ppc64, ppc32 and runtime tested on i386, x86-64 and sparc64.
Signed-off-by: Zwane Mwaikambo <zwane@fsmlabs.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|