summaryrefslogtreecommitdiff
path: root/include/linux/kernel.h
AgeCommit message (Collapse)Author
2006-06-26[PATCH] x86_64: reliable stack trace supportJan Beulich
These are the generic bits needed to enable reliable stack traces based on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will enable x86-64 and i386 to make use of this. Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities for improvement. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-25[PATCH] Implement kasprintfJeremy Fitzhardinge
Implement kasprintf, a kernel version of asprintf. This allocates the memory required for the formatted string, including the trailing '\0'. Returns NULL on allocation failure. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] remove unlikely() in might_sleep_if()Hua Zhong
The likely() profiling tools show that __alloc_page() causes a lot of misses: ! 132 119193 __alloc_pages():mm/page_alloc.c@937 Because most __alloc_page() calls are not atomic. Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] writeback: fix range handlingOGAWA Hirofumi
When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-15[PATCH] symbol_put_addr() locks kernelTrent Piepho
Even since a previous patch: Fix race between CONFIG_DEBUG_SLABALLOC and modules Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000) http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe The function symbol_put_addr() will deadlock the kernel. symbol_put_addr() would acquire modlist_lock, then while holding the lock call two functions kernel_text_address() and module_text_address() which also try to acquire the same lock. This deadlocks the kernel of course. This patch changes symbol_put_addr() to not acquire the modlist_lock, it doesn't need it since it never looks at the module list directly. Also, it now uses core_kernel_text() instead of kernel_text_address(). The latter has an additional check for addr inside a module, but we don't need to do that since we call module_text_address() (the same function kernel_text_address uses) ourselves. Signed-off-by: Trent Piepho <xyzzy@speakeasy.org> Cc: Zwane Mwaikambo <zwane@fsmlabs.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-11[PATCH] the scheduled unexport of panic_timeoutAdrian Bunk
Implement the scheduled unexport of panic_timeout. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] Notifier chain update: API changesAlan Stern
The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25[PATCH] roundup_pow_of_two() 64-bit fixAndrew Morton
fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger than 2^32-1. Fix this by implementing and using fls_long(). (Why does roundup_pow_of_two() return a long?) (Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is __attribute_pure__?) (Why does long_log2() suck so much? Because we were missing fls_long()?) Cc: Roland Dreier <rdreier@cisco.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23[PATCH] pause_on_oops command line optionAndrew Morton
Attempt to fix the problem wherein people's oops reports scroll off the screen due to repeated oopsing or to oopses on other CPUs. If this happens the user can reboot with the `pause_on_oops=<seconds>' option. It will allow the first oopsing CPU to print an oops record just a single time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs to enter a tight loop until the specified number of seconds have elapsed. The patch implements the infrastructure generically in the expectation that architectures other than x86 will find it useful. Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17[PATCH] x86_64: Add boot option to disable randomized mappings and cleanupAndi Kleen
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution installation it's easiest if it's a boot time option. Also I moved the variable to a more appropiate place and make it independent from sysctl And marked __read_mostly which it is. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-24[ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi ↵Len Brown
branches into release Signed-off-by: Len Brown <len.brown@intel.com>
2006-01-17[IPV6]: Preserve procfs IPV6 address output formatYOSHIFUJI Hideaki
Procfs always output IPV6 addresses without the colon characters, and we cannot change that. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-13[NET]: Use NIP6_FMT in kernel.hJoe Perches
There are errors and inconsistency in the display of NIP6 strings. ie: net/ipv6/ip6_flowlabel.c There are errors and inconsistency in the display of NIPQUAD strings too. ie: net/netfilter/nf_conntrack_ftp.c This patch: adds NIP6_FMT to kernel.h changes all code to use NIP6_FMT fixes net/ipv6/ip6_flowlabel.c adds NIPQUAD_FMT to kernel.h fixes net/netfilter/nf_conntrack_ftp.c changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[PATCH] dump_thread() cleanupakpm@osdl.org
) From: Adrian Bunk <bunk@stusta.de> - create one common dump_thread() prototype in kernel.h - dump_thread() is only used in fs/binfmt_aout.c and can therefore be removed on all architectures where CONFIG_BINFMT_AOUT is not available Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09[PATCH] mutex subsystem, add typecheck_fn(type, function)Chuck Ebbert
add typecheck_fn(type, function) to do type-checking of function pointers. Modified-by: Ingo Molnar <mingo@elte.hu> (made it typeof() based, instead of typedef based.) Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2006-01-08[PATCH] remove gcc-2 checksAndrew Morton
Remove various things which were checking for gcc-1.x and gcc-2.x compilers. From: Adrian Bunk <bunk@stusta.de> Some documentation updates and removes some code paths for gcc < 3.2. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-15[ACPI] fix reboot upon suspend-to-diskAlexey Starikovskiy
http://bugzilla.kernel.org/show_bug.cgi?id=4320 Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>
2005-11-07[PATCH] __deprecated_for_modules: panic_timeoutAdrian Bunk
This looks like something which out-of-tree code could possibly be using. Give panic_timeout the twelve-month treatment. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07[PATCH] kernel-docs: fix kernel-doc format problemsRandy Dunlap
Convert to proper kernel-doc format. Some have extra blank lines (not allowed immed. after the function name) or need blank lines (after all parameters). Function summary must be only one line. Colon (":") in a function description does weird things (causes kernel-doc to think that it's a new section head sadly). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30[PATCH] include/linux/kernel.h:BUILD_BUG_ON(): fix a commentNikita Danilov
Fix comment describing BUILD_BUG_ON: BUG_ON is not an assertion (unfortunately). Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13[PATCH] Make BUILD_BUG_ON fail at compile time.Andi Kleen
Force a compiler error instead of a link error, because they are easier to track down. Idea stolen from code by Jan Beulich <jbeulich@novell.com> If the argument to BUILD_BUG_ON evaluates to non-zero the compiler will do: t.c:6: error: size of array `type name' is negative (surprised that gcc doesn't have an extension for this) Signed-off-by: "Andi Kleen" <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25[PATCH] sched: voluntary kernel preemptionIngo Molnar
This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The 3 models can be selected from a new menu: (X) No Forced Preemption (Server) ( ) Voluntary Kernel Preemption (Desktop) ( ) Preemptible Kernel (Low-Latency Desktop) we still default to the stock (Server) preemption model. Voluntary preemption works by adding a cond_resched() (reschedule-if-needed) call to every might_sleep() check. It is lighter than CONFIG_PREEMPT - at the cost of not having as tight latencies. It represents a different latency/complexity/overhead tradeoff. It has no runtime impact at all if disabled. Here are size stats that show how the various preemption models impact the kernel's size: text data bss dec hex filename 3618774 547184 179896 4345854 424ffe vmlinux.stock 3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2% 3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5% voluntary-preempt is +0.2% of .text, preempt is +3.5%. This feature has been tested for many months by lots of people (and it's also included in the RHEL4 distribution and earlier variants were in Fedora as well), and it's intended for users and distributions who dont want to use full-blown CONFIG_PREEMPT for one reason or another. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] clean up kernel messagesMatt Mackall
Arrange for all kernel printks to be no-ops. Only available if CONFIG_EMBEDDED. This patch saves about 375k on my laptop config and nearly 100k on minimal configs. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] Enable gcc warnings for vsprintf/vsnprintf with "format" attributeSolar Designer
Extend the gcc printk format-string checking. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-11[PATCH] docbook: new kernel-doc comments for might_sleep & wait_event_*Martin Waitz
New kernel-doc comments for might_sleep & wait_event_* Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-04[PATCH] Randomisation: enable by defaultArjan van de Ven
Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-04[PATCH] Randomisation: global sysctlArjan van de Ven
This first patch of the series introduces a sysctl (default off) that enables/disables the randomisation feature globally. Since randomisation may make it harder to debug really tricky situations (reproducability goes down), the sysadmin needs a way to disable it globally. Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04[PATCH] panic_timeout: move to kernel.hRandy Dunlap
Move 'panic_timeout' to linux/kernel.h. ipmi_watchdog.c wanted to know why panic_timeout isn't in some header file. However, ipmi_watchdog.c doesn't even use it, so that reference was deleted. Other references now use kernel.h instead of straight extern int. Signed-off-by: Randy Dunlap <rddunlap@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04[PATCH] GP-REL data supportDavid Howells
The attached patch makes it possible to support gp-rel addressing for small variables. Since the FR-V cpu's have fixed-length instructions and plenty of general-purpose registers, one register is nominated as a base for the small data area. This makes it possible to use single-insn accesses to access global and static variables instead of having to use multiple instructions. This, however, causes problems with small variables used to pinpoint the beginning and end of sections. The compiler assumes it can use gp-rel addressing for these, but the linker then complains because the displacement is out of range. By declaring certain variables as arrays or by forcing them into named sections, the compiler is persuaded to access them as if they can be outside the displacement range. Declaring the variables as "const void" type also works. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-11-03x86: regparm calling convention for exceptions and interrupts.Linus Torvalds
This clarifies more of the x86 caller/callee stack ownership issues by making the exception and interrupt handler assembler interfaces use register calling conventions. System calls still use the stack. Tested with "crashme" on UP/SMP.
2004-11-01[PATCH] Add panic blinking to 2.6Andi Kleen
This patch readds the panic blinking that was in 2.4 to 2.6. This is useful to see when you're in X that the machine has paniced It addresses previously criticism. It should work now when the keyboard interrupt is off. It doesn't fully emulate the handler, but has a timeout for this case. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18[PATCH] taint on bad_pageNick Piggin
Hugh and I both thought this would be generally useful. Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-18[PATCH] x86-64/i386: add mce taintingAndi Kleen
This patch adds machine check tainting. When a handled machine check occurs the oops gets a new 'M' flag. This is useful to ignore machines with hardware problems in oops reports. On i386 a thermal failure also sets this flag. Done for x86-64 and i386 so far. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-09-22[PATCH] implement roundup_pow_two()Ryan Cumming
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22[PATCH] vprintk supportMatt Mackall
Add vprintk call. This lets us directly pass varargs stuff to the console without using vsnprintf to an intermediate buffer. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-26[PATCH] Fix race between CONFIG_DEBUG_SLABALLOC and modulesRusty Russell
store_stackinfo() does an unlocked module list walk during normal runtime which opens up a race with the module load/unload code. This can be triggered by simply unloading and loading a module in a loop with CONFIG_DEBUG_PAGEALLOC resulting in store_stackinfo() tripping over bad list pointers. kernel_text_address doesn't take any locks, because during an OOPS we don't want to deadlock. Rename that to __kernel_text_address, and make kernel_text_address take the lock. Signed-off-by: Zwane Mwaikambo <zwane@fsmlabs.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23[PATCH] abs() fixesAndrew Morton
OK, the pending abs() disaster has hit: drivers/usb/class/audio.c:404: warning: static declaration of 'abs' follows non-static declaration This is due to the declaration in kernel.h. AFAIK there's not even a matching definition for that. The patch implements abs() as a macro in kernel.h and kills off various private implementations. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-20[PATCH] Permit inode & dentry hash tables to be allocated > MAX_ORDER sizeDavid Howells
Here's a patch to allocate memory for big system hash tables with the bootmem allocator rather than with main page allocator. It is needed for three reasons: (1) So that the size can be bigger than MAX_ORDER. IBM have done some testing on their big PPC64 systems (64GB of RAM) with linux-2.4 and found that they get better performance if the sizes of the inode cache hash, dentry cache hash, buffer head hash and page cache hash are increased beyond MAX_ORDER (order 11). Now the main allocator can't allocate anything larger than MAX_ORDER, but the bootmem allocator can. In 2.6 it appears that only the inode and dentry hashes remain of those four, but there are other hash tables that could use this service. (2) Changing MAX_ORDER appears to have a number of effects beyond just limiting the maximum size that can be allocated in one go. (3) Should someone want a hash table in which each bucket isn't a power of two in size, memory will be wasted as the chunk of memory allocated will be a power of two in size (to hold a power of two number of buckets). On the other hand, using the bootmem allocator means the allocation will only take up sufficient pages to hold it, rather than the next power of two up. Admittedly, this point doesn't apply to the dentry and inode hashes, but it might to another hash table that might want to use this service. I've coelesced the meat of the inode and dentry allocation routines into one such routine in mm/page_alloc.c that the the respective initialisation functions now call before mem_init() is called. This routine gets it's approximation of memory size by counting up the ZONE_NORMAL and ZONE_DMA pages (and ZONE_HIGHMEM if requested) in all the nodes passed to the main allocator by paging_init() (or wherever the arch does it). It does not use max_low_pfn as that doesn't seem to be available on all archs, and it doesn't use num_physpages since that includes highmem pages not available to the kernel for allocating data structures upon - which may not be appropriate when calculating hash table size. On the off chance that the size of each hash bucket may not be exactly a power of two, the routine will only allocate as many pages as is necessary to ensure that the number of buckets is exactly a power of two, rather than allocating the smallest power-of-two sized chunk of memory that will hold the same array of buckets. The maximum size of any single hash table is given by MAX_SYS_HASH_TABLE_ORDER, as is now defined in linux/mmzone.h. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-19[PATCH] system_state splitupAndrew Morton
Split the system_state state `SYSTEM_SHUTDOWN' into SYSTEM_HALT, SYSTEM_POWER_OFF and SYSTEM_RESTART and export system_state to modules. This allows driver shutdown routines to know why they are being shutdown. The IDE subsystem wants this so that it knows to not spin the disks down across a reboot.
2004-04-11[PATCH] generalise system_runningAndrew Morton
From: Olof Johansson <olof@austin.ibm.com> It's currently a boolean, but that means that system_running goes to zero again when shutting down. So we then use code (in the page allocator) which is only designed to be used during bootup - it is marked __init. So we need to be able to distinguish early boot state from late shutdown state. Rename system_running to system_state and give it the three appropriate states.
2004-03-09Remove 'const' from min/max, to avoid gcc warning about double usage.Linus Torvalds
2004-02-18[PATCH] snprintf fixesAndrew Morton
From: Juergen Quade <quade@hsnr.de> Lots of places in the kernel are using [v]snprintf wrongly: they assume it returns the number of characters copied. It doesn't. It returns the number of characters which _would_ have been copied had the buffer not been filled up. So create new functions vscnprintf() and scnprintf() which have the expected (sane) semaptics, and migrate callers over to using them.
2004-02-05[NET]: Simply net_ratelimit().Andrew Morton
Reimplement net_ratelimit() in terms of the new printk_ratelimit(). As net_ratelimit() already has it own sysctls we generalise printk_ratelimit() a bit so that networking does not lose its existing sysctls and so that it can use different time constants from the more generic printk_ratelimit().
2004-01-18[PATCH] generalise net_ratelimit (printk_ratelimit)Andrew Morton
From: Anton Blanchard <anton@samba.org> Generate a global printk rate-limiting function, printk_ratelimit(). Also, use it in the page allocator warning code. Also add a dump_stack to that code. Later, we need to switch net_ratelimit() over to use printk_ratelimit().
2003-12-29[PATCH] sqrt() fixesAndrew Morton
It turns out that the int_sqrt() function in oom_kill.c gets it wrong. But fb_sqrt() in fbmon.c gets its math right. Move that function into lib/int_sqrt.c, and consolidate. (oom_kill.c fix from Thomas Schlichter <schlicht@uni-mannheim.de>)
2003-10-21[PATCH] export system_running to other filesAndrew Morton
There seems to be no header file which declares system_running.
2003-09-21[PATCH] ECC supportAndrew Morton
From: "Nakajima, Jun" <jun.nakajima@intel.com> Split the increasingly messy compiler.h file into per-compiler files and also add support for non-gcc compilers. With the current implementation: include/linux/compiler.h defines the compiler-dependent abstractions which can be overwritten by per-compiler definitions. include/linux/compiler-gcc.h contains the common definitions for all gcc versions. include/linux/compiler-gcc[2,3,+].h contains gcc major version specific definitions. include/linux/compiler-intel.h contains intel compiler specific definitions."
2003-09-03[PATCH] might_sleep() improvementsAndrew Morton
From: Mitchell Blank Jr <mitch@sfgoth.com> This patch makes the following improvements to might_sleep(): o Add a "might_sleep_if()" macro for when we might sleep only if some condition is met. It's a bit tidier, and has an unlikely() in it. o Add might_sleep checks to skb_share_check() and skb_unshare() which sometimes need to allocate memory. o Make all architectures call might_sleep() in both down() and down_interruptible(). Before only ppc, ppc64, and i386 did this check. (sh did the check on down() but not down_interruptible())
2003-06-13[PATCH] Fix typo in commentJörn Engel
2003-06-06[PATCH] Move BUG/BUG_ON/WARN_ON to asm headersPaul Mackerras
This patch moves the definitions of BUG, BUG_ON and WARN_ON from <linux/kernel.h> to <asm/bug.h> (which <linux/kernel.h> includes), and supplies a new implementation for PPC which uses a conditional trap instruction for BUG_ON and WARN_ON, thus avoiding a conditional branch. This patch trims over 50kB from the size of the kernel that I use on powermacs. With this patch, on PPC we have a __bug_table section in the vmlinux binary, and also in modules if they use BUG, BUG_ON or WARN_ON. The __bug_table section has one entry for each BUG/BUG_ON/WARN_ON, giving the address of the trap instruction and the corresponding line number, filename and function name. This information is used in the exception handler for the exception that the trap instruction produces. The arch-specific module code handles the __bug_table section so that BUG/BUG_ON/WARN_ON work correctly in modules. Several architecture maintainers have acked this change. It should be completely benign for all of the other architectures (though they may decide to do something similar if they have a conditional trap instruction available).