summaryrefslogtreecommitdiff
path: root/kernel/Makefile
AgeCommit message (Collapse)Author
2006-12-15Remove stack unwinder for nowLinus Torvalds
It has caused more problems than it ever really solved, and is apparently not getting cleaned up and fixed. We can put it back when it's stable and isn't likely to make warning or bug events worse. In the meantime, enable frame pointers for more readable stack traces. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] srcu-3: RCU variant permitting read-side blockingPaul E. McKenney
Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. (Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... [josht@us.ibm.com: sparse notation] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: utsname: implement utsname namespacesSerge E. Hallyn
This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] namespaces: add nsproxySerge E. Hallyn
This patch adds a nsproxy structure to the task struct. Later patches will move the fs namespace pointer into this structure, and introduce a new utsname namespace into the nsproxy. The vserver and openvz functionality, then, would be implemented in large part by virtualizing/isolating more and more resources into namespaces, each contained in the nsproxy. [akpm@osdl.org: build fix] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] csa: basic accounting over taskstatsJay Lan
Add some basic accounting fields to the taskstats struct, add a new kernel/tsacct.c to handle basic accounting data handling upon exit. A handle is added to taskstats.c to invoke the basic accounting data handling. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] maximum latency tracking infrastructureArjan van de Ven
Add infrastructure to track "maximum allowable latency" for power saving policies. The reason for adding this infrastructure is that power management in the idle loop needs to make a tradeoff between latency and power savings (deeper power save modes have a longer latency to running code again). The code that today makes this tradeoff just does a rather simple algorithm; however this is not good enough: There are devices and use cases where a lower latency is required than that the higher power saving states provide. An example would be audio playback, but another example is the ipw2100 wireless driver that right now has a very direct and ugly acpi hook to disable some higher power states randomly when it gets certain types of error. The proposed solution is to have an interface where drivers can * announce the maximum latency (in microseconds) that they can deal with * modify this latency * give up their constraint and a function where the code that decides on power saving strategy can query the current global desired maximum. This patch has a user of each side: on the consumer side, ACPI is patched to use this, on the producer side the ipw2100 driver is patched. A generic maximum latency is also registered of 2 timer ticks (more and you lose accurate time tracking after all). While the existing users of the patch are x86 specific, the infrastructure is not. I'd like to ask the arch maintainers of other architectures if the infrastructure is generic enough for their use (assuming the architecture has such a tradeoff as concept at all), and the sound/multimedia driver owners to look at the driver facing API to see if this is something they can use. [akpm@osdl.org: cleanups] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jesse.barnes@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14[PATCH] per-task-delay-accounting: taskstats interfaceShailabh Nagar
Create a "taskstats" interface based on generic netlink (NETLINK_GENERIC family), for getting statistics of tasks and thread groups during their lifetime and when they exit. The interface is intended for use by multiple accounting packages though it is being created in the context of delay accounting. This patch creates the interface without populating the fields of the data that is sent to the user in response to a command or upon the exit of a task. Each accounting package interested in using taskstats has to provide an additional patch to add its stats to the common structure. [akpm@osdl.org: cleanups, Kconfig fix] Signed-off-by: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-14[PATCH] per-task-delay-accounting: setupShailabh Nagar
Initialization code related to collection of per-task "delay" statistics which measure how long it had to wait for cpu, sync block io, swapping etc. The collection of statistics and the interface are in other patches. This patch sets up the data structures and allows the statistics collection to be disabled through a kernel boot parameter. Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Peter Chubb <peterc@gelato.unsw.edu.au> Cc: Erich Focht <efocht@ess.nec.de> Cc: Levent Serinol <lserinol@gmail.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: prove spinlock rwlock locking correctnessIngo Molnar
Use the lock validator framework to prove spinlock and rwlock locking correctness. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: prove rwsem locking correctnessIngo Molnar
Use the lock validator framework to prove rwsem locking correctness. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: procfsIngo Molnar
Lock validator /proc/lockdep and /proc/lockdep_stats support. (FIXME: should go into debugfs) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: coreIngo Molnar
Do 'make oldconfig' and accept all the defaults for new config options - reboot into the kernel and if everything goes well it should boot up fine and you should have /proc/lockdep and /proc/lockdep_stats files. Typically if the lock validator finds some problem it will print out voluminous debug output that begins with "BUG: ..." and which syslog output can be used by kernel developers to figure out the precise locking scenario. What does the lock validator do? It "observes" and maps all locking rules as they occur dynamically (as triggered by the kernel's natural use of spinlocks, rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a new locking scenario, it validates this new rule against the existing set of rules. If this new rule is consistent with the existing set of rules then the new rule is added transparently and the kernel continues as normal. If the new rule could create a deadlock scenario then this condition is printed out. When determining validity of locking, all possible "deadlock scenarios" are considered: assuming arbitrary number of CPUs, arbitrary irq context and task context constellations, running arbitrary combinations of all the existing locking scenarios. In a typical system this means millions of separate scenarios. This is why we call it a "locking correctness" validator - for all rules that are observed the lock validator proves it with mathematical certainty that a deadlock could not occur (assuming that the lock validator implementation itself is correct and its internal data structures are not corrupted by some other kernel subsystem). [see more details and conditionals of this statement in include/linux/lockdep.h and Documentation/lockdep-design.txt] Furthermore, this "all possible scenarios" property of the validator also enables the finding of complex, highly unlikely multi-CPU multi-context races via single single-context rules, increasing the likelyhood of finding bugs drastically. In practical terms: the lock validator already found a bug in the upstream kernel that could only occur on systems with 3 or more CPUs, and which needed 3 very unlikely code sequences to occur at once on the 3 CPUs. That bug was found and reported on a single-CPU system (!). So in essence a race will be found "piecemail-wise", triggering all the necessary components for the race, without having to reproduce the race scenario itself! In its short existence the lock validator found and reported many bugs before they actually caused a real deadlock. To further increase the efficiency of the validator, the mapping is not per "lock instance", but per "lock-class". For example, all struct inode objects in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached, then there are 10,000 lock objects. But ->inotify_mutex is a single "lock type", and all locking activities that occur against ->inotify_mutex are "unified" into this single lock-class. The advantage of the lock-class approach is that all historical ->inotify_mutex uses are mapped into a single (and as narrow as possible) set of locking rules - regardless of how many different tasks or inode structures it took to build this set of rules. The set of rules persist during the lifetime of the kernel. To see the rough magnitude of checking that the lock validator does, here's a portion of /proc/lockdep_stats, fresh after bootup: lock-classes: 694 [max: 2048] direct dependencies: 1598 [max: 8192] indirect dependencies: 17896 all direct dependencies: 16206 dependency chains: 1910 [max: 8192] in-hardirq chains: 17 in-softirq chains: 105 in-process chains: 1065 stack-trace entries: 38761 [max: 131072] combined max dependencies: 2033928 hardirq-safe locks: 24 hardirq-unsafe locks: 176 softirq-safe locks: 53 softirq-unsafe locks: 137 irq-safe locks: 59 irq-unsafe locks: 176 The lock validator has observed 1598 actual single-thread locking patterns, and has validated all possible 2033928 distinct locking scenarios. More details about the design of the lock validator can be found in Documentation/lockdep-design.txt, which can also found at: http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt [bunk@stusta.de: cleanups] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03[PATCH] lockdep: stacktrace subsystem, coreIngo Molnar
Framework to generate and save stacktraces quickly, without printing anything to the console. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex testerThomas Gleixner
RT-mutex tester: scriptable tester for rt mutexes, which allows userspace scripting of mutex unit-tests (and dynamic tests as well), using the actual rt-mutex implementation of the kernel. [akpm@osdl.org: fixlet] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex debugIngo Molnar
Runtime debugging functionality for rt-mutexes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex coreIngo Molnar
Core functions for the rt-mutex subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26Merge branch 'x86-64'Linus Torvalds
* x86-64: (83 commits) [PATCH] x86_64: x86_64 stack usage debugging [PATCH] x86_64: (resend) x86_64 stack overflow debugging [PATCH] x86_64: msi_apic.c build fix [PATCH] x86_64: i386/x86-64 Add nmi watchdog support for new Intel CPUs [PATCH] x86_64: Avoid broadcasting NMI IPIs [PATCH] x86_64: fix apic error on bootup [PATCH] x86_64: enlarge window for stack growth [PATCH] x86_64: Minor string functions optimizations [PATCH] x86_64: Move export symbols to their C functions [PATCH] x86_64: Standardize i386/x86_64 handling of NMI_VECTOR [PATCH] x86_64: Fix modular pc speaker [PATCH] x86_64: remove sys32_ni_syscall() [PATCH] x86_64: Do not use -ffunction-sections for modules [PATCH] x86_64: Add cpu_relax to apic_wait_icr_idle [PATCH] x86_64: adjust kstack_depth_to_print default [PATCH] i386/x86-64: adjust /proc/interrupts column headings [PATCH] x86_64: Fix race in cpu_local_* on preemptible kernels [PATCH] x86_64: Fix fast check in safe_smp_processor_id [PATCH] x86_64: x86_64 setup.c - printing cmp related boottime information [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status ... Manual resolve of trivial conflict in arch/i386/kernel/Makefile
2006-06-26[PATCH] x86_64: reliable stack trace supportJan Beulich
These are the generic bits needed to enable reliable stack traces based on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will enable x86-64 and i386 to make use of this. Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities for improvement. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] Time: Use clocksource infrastructure for update_wall_timejohn stultz
Modify the update_wall_time function so it increments time using the clocksource abstraction instead of jiffies. Since the only clocksource driver currently provided is the jiffies clocksource, this should result in no functional change. Additionally, a timekeeping_init and timekeeping_resume function has been added to initialize and maintain some of the new timekeping state. [hirofumi@mail.parknet.co.jp: fixlet] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-08Finally remove the obnoxious inter_module_xxx()David Woodhouse
This was already a bad plan when I argued against adding it in the first place. Good riddance. Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2006-03-27[PATCH] lightweight robust futexes: compatIngo Molnar
32-bit syscall compatibility support. (This patch also moves all futex related compat functionality into kernel/futex_compat.c.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25Merge branch 'audit.b3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits) [PATCH] fix audit_init failure path [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format [PATCH] sem2mutex: audit_netlink_sem [PATCH] simplify audit_free() locking [PATCH] Fix audit operators [PATCH] promiscuous mode [PATCH] Add tty to syscall audit records [PATCH] add/remove rule update [PATCH] audit string fields interface + consumer [PATCH] SE Linux audit events [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL [PATCH] Fix IA64 success/failure indication in syscall auditing. [PATCH] Miscellaneous bug and warning fixes [PATCH] Capture selinux subject/object context information. [PATCH] Exclude messages by message type [PATCH] Collect more inode information during syscall processing. [PATCH] Pass dentry, not just name, in fsnotify creation hooks. [PATCH] Define new range of userspace messages. [PATCH] Filter rule comparators ... Fixed trivial conflict in security/selinux/hooks.c
2006-03-23[PATCH] relay: migrate from relayfs to a generic relay APIJens Axboe
Original patch from Paul Mundt, sysfs parts removed by me since they were broken. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-20[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALLDavid Woodhouse
This fixes the per-user and per-message-type filtering when syscall auditing isn't enabled. [AV: folded followup fix from the same author] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-01-16[PATCH] build kernel/intermodule.c only when requiredAdrian Bunk
Build kernel/intermodule.c only when required. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] hrtimer: hrtimer core codeThomas Gleixner
hrtimer subsystem core. It is initialized at bootup and expired by the timer interrupt, but is otherwise not utilized by any other subsystem yet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10[PATCH] kdump: read previous kernel's memoryVivek Goyal
- Moving the crash_dump.c file to arch dependent part as kmap_atomic_pfn is specific to i386 and highmem may not exist in other archs. - Use ioremap for x86_64 to map the previous kernel memory. - In copy_oldmem_page(), we now directly copy to the user/kernel buffer and avoid the unneccesary copy to a kmalloc'd page. Signed-off-by: Rachita Kothiyal <rachita@in.ibm.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] mutex subsystem, debugging codeIngo Molnar
mutex implementation - add debugging code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org>
2006-01-09[PATCH] mutex subsystem, coreIngo Molnar
mutex implementation, core files: just the basic subsystem, no users of it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org>
2005-10-30[PATCH] RCU torture-testing kernel modulePaul E. McKenney
This patch is a rewrite of the one submitted on October 1st, using modules (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2). This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an intense torture test of the RCU infratructure. This is needed due to the continued changes to the RCU infrastructure to accommodate dynamic ticks, CPU hotplug, realtime, and so on. Most of the code is in a separate file that is compiled only if the CONFIG variable is set. Documentation on how to run the test and interpret the output is also included. This code has been tested on i386 and ppc64, and an earlier version of the code has received extensive testing on a number of architectures as part of the PREEMPT_RT patchset. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] Remove redundant configs.oBrian Gerst
Since CONFIG_IKCONFIG_PROC already depends on CONFIG_IKCONFIG, adding configs.o again is redundant. Signed-off-by: Brian Gerst <bgerst@didntduck.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10[PATCH] spinlock consolidationIngo Molnar
This patch (written by me and also containing many suggestions of Arjan van de Ven) does a major cleanup of the spinlock code. It does the following things: - consolidates and enhances the spinlock/rwlock debugging code - simplifies the asm/spinlock.h files - encapsulates the raw spinlock type and moves generic spinlock features (such as ->break_lock) into the generic code. - cleans up the spinlock code hierarchy to get rid of the spaghetti. Most notably there's now only a single variant of the debugging code, located in lib/spinlock_debug.c. (previously we had one SMP debugging variant per architecture, plus a separate generic one for UP builds) Also, i've enhanced the rwlock debugging facility, it will now track write-owners. There is new spinlock-owner/CPU-tracking on SMP builds too. All locks have lockup detection now, which will work for both soft and hard spin/rwlock lockups. The arch-level include files now only contain the minimally necessary subset of the spinlock code - all the rest that can be generalized now lives in the generic headers: include/asm-i386/spinlock_types.h | 16 include/asm-x86_64/spinlock_types.h | 16 I have also split up the various spinlock variants into separate files, making it easier to see which does what. The new layout is: SMP | UP ----------------------------|----------------------------------- asm/spinlock_types_smp.h | linux/spinlock_types_up.h linux/spinlock_types.h | linux/spinlock_types.h asm/spinlock_smp.h | linux/spinlock_up.h linux/spinlock_api_smp.h | linux/spinlock_api_up.h linux/spinlock.h | linux/spinlock.h /* * here's the role of the various spinlock/rwlock related include files: * * on SMP builds: * * asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the * initializers * * linux/spinlock_types.h: * defines the generic type and initializers * * asm/spinlock.h: contains the __raw_spin_*()/etc. lowlevel * implementations, mostly inline assembly code * * (also included on UP-debug builds:) * * linux/spinlock_api_smp.h: * contains the prototypes for the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. * * on UP builds: * * linux/spinlock_type_up.h: * contains the generic, simplified UP spinlock type. * (which is an empty structure on non-debug builds) * * linux/spinlock_types.h: * defines the generic type and initializers * * linux/spinlock_up.h: * contains the __raw_spin_*()/etc. version of UP * builds. (which are NOPs on non-debug, non-preempt * builds) * * (included on UP-non-debug builds:) * * linux/spinlock_api_up.h: * builds the _spin_*() APIs. * * linux/spinlock.h: builds the final spin_*() APIs. */ All SMP and UP architectures are converted by this patch. arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via crosscompilers. m32r, mips, sh, sparc, have not been tested yet, but should be mostly fine. From: Grant Grundler <grundler@parisc-linux.org> Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU). Builds 32-bit SMP kernel (not booted or tested). I did not try to build non-SMP kernels. That should be trivial to fix up later if necessary. I converted bit ops atomic_hash lock to raw_spinlock_t. Doing so avoids some ugly nesting of linux/*.h and asm/*.h files. Those particular locks are well tested and contained entirely inside arch specific code. I do NOT expect any new issues to arise with them. If someone does ever need to use debug/metrics with them, then they will need to unravel this hairball between spinlocks, atomic ops, and bit ops that exist only because parisc has exactly one atomic instruction: LDCW (load and clear word). From: "Luck, Tony" <tony.luck@intel.com> ia64 fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjanv@infradead.org> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se> Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-07[PATCH] detect soft lockupsIngo Molnar
This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP. When enabled then per-CPU watchdog threads are started, which try to run once per second. If they get delayed for more than 10 seconds then a callback from the timer interrupt detects this condition and prints out a warning message and a stack dump (once per lockup incident). The feature is otherwise non-intrusive, it doesnt try to unlock the box in any way, it only gets the debug info out, automatically, and on all CPUs affected by the lockup. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-Off-By: Matthias Urlichs <smurf@smurf.noris.de> Signed-off-by: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25[PATCH] kdump: Routines for copying dump pagesVivek Goyal
This patch provides the interfaces necessary to read the dump contents, treating it as a high memory device. Signed off by Hariprasad Nellitheertha <hari@in.ibm.com> Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25[PATCH] kexec: add kexec syscallsEric W. Biederman
This patch introduces the architecture independent implementation the sys_kexec_load, the compat_sys_kexec_load system calls. Kexec on panic support has been integrated into the core patch and is relatively clean. In addition the hopefully architecture independent option crashkernel=size@location has been docuemented. It's purpose is to reserve space for the panic kernel to live, and where no DMA transfer will ever be setup to access. Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05[PATCH] ppc64: remove hidden -fno-omit-frame-pointer for schedule.cAnton Blanchard
While looking at code generated by gcc4.0 I noticed some functions still had frame pointers, even after we stopped ppc64 from defining CONFIG_FRAME_POINTER. It turns out kernel/Makefile hardwires -fno-omit-frame-pointer on when compiling schedule.c. Create CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER and define it on architectures that dont require frame pointers in sched.c code. (akpm: blame me for the name) Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-12Merge mars.ravnborg.org:/home/sam/bk/linux-2.6Sam Ravnborg
into mars.ravnborg.org:/home/sam/bk/kbuild
2005-03-09[PATCH] cpusets - big numa cpu and memory placementPaul Jackson
This my cpuset patch, with the following changes in the last two weeks: 1) Updated to 2.6.8.1-mm1 2) [Simon Derr <Simon.Derr@bull.net>] Fix new cpuset to begin empty, not copied from parent. Needed to avoid breaking exclusive property. 3) [Dinakar Guniguntala <dino@in.ibm.com>] Finish initializing top cpuset from cpu_possible_map after smp_init() called. 4) [Paul Jackson <pj@sgi.com>] Check on each call to __alloc_pages() if the current tasks cpuset mems_allowed has changed. Use a cpuset generation number, bumped on any cpuset memory placement change, to make this check efficient. Update the tasks mems_allowed from its cpuset, if the cpuset has changed. 5) [Paul Jackson <pj@sgi.com>] If a task is moved to another cpuset, then update its cpus_allowed, using set_cpus_allowed(). 6) [Paul Jackson <pj@sgi.com>] Update Documentation/cpusets.txt to reflect above changes (4) and (5). I continue to recommend the following patch for inclusion in your 2.6.9-*mm series, when that opens. It provides an important facility for high performance computing on large systems. Simon Derr of Bull (France) and myself are the primary authors. Erich Focht has indicated that NEC is also a potential user of this patch on the TX-7 NUMA machines, and that he "would very much welcome the inclusion of cpusets." I offer this update to lkml, in order to invite continued feedback. The one prerequiste patch for this cpuset patch was just posted before this one. That was a patch to provide a new bitmap list format, of which cpusets is the first user. This patch has been built on top of 2.6.8.1-mm1, for the arch's: i386 x86_64 sparc ia64 powerpc-405 powerpc-750 sparc64 with and without CONFIG_CPUSET. It has been booted and tested on ia64 (sn2_defconfig, SN2 hardware). The 'alpha' arch also built, except for what seems to be an unrelated toolchain problem (crosstool ld sigsegv) in the final link step. === Cpusets provide a mechanism for assigning a set of CPUs and Memory Nodes to a set of tasks. Cpusets constrain the CPU and Memory placement of tasks to only the processor and memory resources within a tasks current cpuset. They form a nested hierarchy visible in a virtual file system. These are the essential hooks, beyond what is already present, required to manage dynamic job placement on large systems. Cpusets require small kernel hooks in init, exit, fork, mempolicy, sched_setaffinity, page_alloc and vmscan. And they require a "struct cpuset" pointer, a cpuset_mems_generation, and a "mems_allowed" nodemask_t (to go along with the "cpus_allowed" cpumask_t that's already there) in each task struct. These hooks: 1) establish and propagate cpusets, 2) enforce CPU placement in sched_setaffinity, 3) enforce Memory placement in mbind and sys_set_mempolicy, 4) restrict page allocation and scanning to mems_allowed, and 5) restrict migration and set_cpus_allowed to cpus_allowed. The other required hook, restricting task scheduling to CPUs in a tasks cpus_allowed mask, is already present. Cpusets extend the usefulness of, the existing placement support that was added to Linux 2.6 kernels: sched_setaffinity() for CPU placement, and mbind() and set_mempolicy() for memory placement. On smaller or dedicated use systems, the existing calls are often sufficient. On larger NUMA systems, running more than one, performance critical, job, it is necessary to be able to manage jobs in their entirety. This includes providing a job with exclusive CPU and memory that no other job can use, and being able to list all tasks currently in a cpuset. A given job running within a cpuset, would likely use the existing placement calls to manage its CPU and memory placement in more detail. Cpusets are named, nested sets of CPUs and Memory Nodes. Each cpuset is represented by a directory in the cpuset virtual file system, normally mounted at /dev/cpuset. Each cpuset directory provides the following files, which can be read and written: cpus: List of CPUs allowed to tasks in that cpuset. mems: List of Memory Nodes allowed to tasks in that cpuset. tasks: List of pid's of tasks in that cpuset. cpu_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its CPUs (no sibling or cousin cpuset may overlap CPUs). mem_exclusive: Flag (0 or 1) - if set, cpuset has exclusive use of its Memory Nodes (no sibling or cousin may overlap). notify_on_release: Flag (0 or 1) - if set, then /sbin/cpuset_release_agent will be invoked, with the name (/dev/cpuset relative path) of that cpuset in argv[1], when the last user of it (task or child cpuset) goes away. This supports automatic cleanup of abandoned cpusets. In addition one new filetype is added to the /proc file system: /proc/<pid>/cpuset: For each task (pid), list its cpuset path, relative to the root of the cpuset file system. This file is read-only. New cpusets are created using 'mkdir' (at the shell or in C). Old ones are removed using 'rmdir'. The above files are accessed using read(2) and write(2) system calls, or shell commands such as 'cat' and 'echo'. The CPUs and Memory Nodes in a given cpuset are always a subset of its parent. The root cpuset has all possible CPUs and Memory Nodes in the system. A cpuset may be exclusive (cpu or memory) only if its parent is similarly exclusive. See further Documentation/cpusets.txt, at the top of the following patch. /proc interface: It is useful, when learning and making new uses of cpusets and placement to be able to see what are the current value of a tasks cpus_allowed and mems_allowed, which are the actual placement used by the kernel scheduler and memory allocator. The cpus_allowed and mems_allowed values are needed by user space apps that are micromanaging placement, such as when moving an app to a obtained by that app within its cpuset using sched_setaffinity, mbind and set_mempolicy. The cpus_allowed value is also available via the sched_getaffinity system call. But since the entire rest of the cpuset API, including the display of mems_allowed added here, is via an ascii style presentation in /proc and /dev/cpuset, it is worth the extra couple lines of code to display cpus_allowed in the same way. This patch adds the display of these two fields to the 'status' file in the /proc/<pid> directory of each task. The fields are only added if CONFIG_CPUSETS is enabled (which is also needed to define the mems_allowed field of each task). The new output lines look like: $ tail -2 /proc/1/status Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff Mems_allowed: ffffffff,ffffffff Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Simon Derr <simon.derr@bull.net> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-07[PATCH] posix-timers: high-resolution CPU clocks for POSIX clock_* syscallsRoland McGrath
This patch provides support for thread and process CPU time clocks in the POSIX clock interface. Both the existing utime and utime+stime information (already available via getrusage et al) can be used, as well as a new (potentially) more precise and accurate clock (which cannot distinguish user from system time). The clock used is that provided by the `sched_clock' function already used internally by the scheduler. This gives a way for platforms to provide the highest-resolution CPU time tracking that is available cheaply, and some already do so (such as x86 using TSC). Because this clock is already sampled internally by the scheduler, this new tracking adds only the tiniest new overhead to accomplish the bookkeeping. Some notes: This allows per-thread clocks to be accessed only by other threads in the same process. The only POSIX calls that access these are defined only for in-process use, and having this check is necessary for the userland implementations of the POSIX clock functions to robustly refuse stale clockid_t's in the face of potential PID reuse. This makes no constraint on who can see whose per-process clocks. This information is already available for the VIRT and PROF (i.e. utime and stime) information via /proc. I am open to suggestions on if/how security constraints on who can see whose clocks should be imposed. The SCHED clock information is now available only via clock_* syscalls. This means that per-thread information is not available outside the process. Perhaps /proc should show sched_time as well? This would let ps et al show this more-accurate information. When this code is merged, it will be supported in glibc. I have written the support and added some test programs for glibc, which are what I mainly used to test the new kernel code. You can get those here: http://people.redhat.com/roland/glibc/kernel-cpuclocks.patch Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-07[PATCH] seccomp: secure computing supportAndrea Arcangeli
I'd need it merged into mainline at some point, unless anybody has strong arguments against it. All I can guarantee here, is that I'll back it out myself in the future, iff Cpushare will fail and nobody else started using it in the meantime for similar security purposes. (akpm: project details are at http://www.cpushare.com/technical. It seems like a good idea to me, and one which is worth supporting. I agree that for this to be successful, the added robustness of Andrea's simple and specific jail is worthwhile). Signed-off-by: Andrea Arcangeli <andrea@cpushare.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-31kernel/configs.c: make a variable staticAdrian Bunk
This patch makes a needlessly global variable static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Randy Dunlap <rddunlap@osdl.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2004-11-01[PATCH] standalone sys_ni.c for not-implemented syscallsPeter Chubb
Sticking the not-implemented syscall stuff in sys.c is a pain because the cond_syscall()s explode when certain prototypes are in scope. And we need those prototypes' header files for the C code in sys.c. Fix all that up by moving all the sys_ni_syscall code into its own .c file. Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18Trivial Makefile mergeLinus Torvalds
2004-10-18[PATCH] A simple FIFO implementationStelian Pop
A simple ringbuffer implementation for various character drivers. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18mergeGreg Kroah-Hartman
2004-10-18[PATCH] move waitqueue functions to kernel/wait.cWilliam Lee Irwin III
The following patch series consolidates the various instances of waitqueue hashing to use a uniform structure and share the per-zone hashtable among all waitqueue hashers. This is expected to increase the number of hashtable buckets available for waiting on bh's and inodes and eliminate statically allocated kernel data structures for greater node locality and reduced kernel image size. Some attempt was made to look similar to Oleg Nesterov's suggested API in order to provide some kind of credit for independent invention of something very similar (the original versions of these patches predated my public postings on the subject of filtered waitqueues). These patches have the further benefit and intention of enabling aio to use filtered wakeups by standardizing the data structure passed to wake functions so that embedded waitqueue elements in aio structures may be succesfully passed to the filtered wakeup wake functions, though this patch series doesn't implement that particular functionality. Successfully stress-tested on x86-64, and ia64 in recent prior versions. This patch: Move waitqueue -related functions not needing static functions in sched.c to kernel/wait.c Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18[PATCH] generic irq subsystem: coreIngo Molnar
The main goal of this patch is to consolidate all the different but still fundamentally similar arch/*/kernel/irq.c code into the kernel/irq/ subsystem. There are 4 new files in the kernel/irq/ directory: - handle.c: core bits: __do_IRQ() and handle_IRQ_event(), callable from arch-specific irq.c code. - manage.c: the main driver apis - spurious.c: the handling of buggy interrupt sources. - autoprobe.c: probing of interrupts - older code but still in use. - proc.c: /proc/irq/ code. - internals.h for irq-core-internal interfaces not visible to drivers nor arch PIC code. An architecture enables the generic hardirq code by defining CONFIG_GENERIC_HARDIRQS in its arch Kconfig. People doing this conversion should check out the x86/x64/ppc/ppc64 patches for details - the conversion is quite straightforward but every converted function (i.e. every function removed from the arch irq.c) _must_ be matched to the generic version and if there is any detail that the generic code should do it has to be added to the generic code. All of the currently converted 4 architectures were converted like that, and the generic code was extended/fixed along the way. Other changes related to this patchset: - clean up the irq include files (linux/irq.h, linux/interrupt.h, linux/hardirq.h) and consolidate asm-*/[hard]irq.h. Note, to keep all non-touched architectures in an untouched state this consolidation is done carefully and strictly under CONFIG_GENERIC_HARDIRQS. Once the consolidation is done we can do a couple of final cleanups to reach the following logical splitup of 3 include files: linux/interrupt.h: driver-visible APIs and details linux/irq.h: core irq and arch-PIC code, internals asm-*/irq.h: arch PIC and irq delivery details the following include files will likely vanish: linux/hardirq.h merges into linux/irq.h asm-*/hardirq.h: merges into asm-*/irq.h asm-*/hw_irq.h: merges into asm-*/irq.h Christoph would like to do these once the current wave of cleanups gets in. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-12Merge kroah.com:/home/greg/linux/BK/bleed-2.6Greg Kroah-Hartman
into kroah.com:/home/greg/linux/BK/driver-2.6
2004-09-04ksysfs: don't build ksysfs if CONFIG_SYSFS is not enabled.Greg Kroah-Hartman
Thanks to Kay Sievers for pointing this out. Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
2004-09-04[PATCH] out-of-line locks / genericZwane Mwaikambo
This patch achieves out of line spinlocks by creating kernel/spinlock.c and using the _raw_* inline locking functions. Now, as much as this is supposed to be arch agnostic, there was still a fair amount of rummaging about in archs, mostly for the cases where the arch already has out of line locks and i wanted to avoid the extra call, saving that extra call also makes lock profiling easier. PPC32/64 was an example of such an arch and i have added the necessary profile_pc() function as an example. Size differences are with CONFIG_PREEMPT enabled since we wanted to determine how much could be saved by moving that lot out of line too. ppc64 = 259897 bytes: text data bss dec hex filename 5489808 1962724 709064 8161596 7c893c vmlinux-after 5749577 1962852 709064 8421493 808075 vmlinux-before sparc64 = 193368 bytes: text data bss dec hex filename 3472037 633712 308920 4414669 435ccd vmlinux-after 3665285 633832 308920 4608037 465025 vmlinux-before i386 = 416075 bytes text data bss dec hex filename 5808371 867442 326864 7002677 6ada35 vmlinux-after 6221254 870634 326864 7418752 713380 vmlinux-before x86-64 = 282446 bytes text data bss dec hex filename 4598025 1450644 523632 6572301 64490d vmlinux-after 4881679 1449436 523632 6854747 68985b vmlinux-before It has been compile tested (UP, SMP, PREEMPT) on i386, x86-64, sparc, sparc64, ppc64, ppc32 and runtime tested on i386, x86-64 and sparc64. Signed-off-by: Zwane Mwaikambo <zwane@fsmlabs.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>