|
Move the headers to include/asm-x86 and fix up the
header install make rules.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The per-cpu data section contains two types of data: one set that is
accessed exclusively by the local CPU, and another set that is per-cpu
but also accessed by remote CPUs. In the current kernel these two sets
are not clearly separated, so the same cacheline can end up holding data
from both sets, which results in unnecessary bouncing of that cacheline
between CPUs.
One way to fix the problem is to cacheline-align the remotely accessed
per-cpu data at both its beginning and its end. Because of the padding at
both ends, this will likely waste some memory, and the interface to
achieve it is not clean either.
This patch moves the remotely accessed per-cpu data (currently marked
____cacheline_aligned_in_smp) into a separate section in which every
data element is cacheline aligned. This cleanly separates local-only
data from remotely accessed data.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
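A minimal user-space sketch of the layout problem and the fix, assuming 64-byte cachelines (the real line size is architecture-dependent). A C11 `_Alignas` struct member stands in for the separate cacheline-aligned section; `CACHELINE_SIZE`, the struct names and their fields are hypothetical and are not the kernel's macros or variables.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines; real hardware may differ. */
#define CACHELINE_SIZE 64

/* Data touched only by the owning CPU: packed together, no padding. */
struct percpu_local {
	unsigned long irq_count;
	unsigned long softirq_count;
	int last_task;
};

/* Data also accessed by remote CPUs: each element starts on its own
 * cacheline, mirroring a section whose elements are all aligned. */
struct percpu_shared {
	_Alignas(CACHELINE_SIZE) unsigned long runqueue_len;
	_Alignas(CACHELINE_SIZE) unsigned long ipi_pending;
};

/* One CPU's area: local data first, then the aligned shared region. */
struct percpu_area {
	struct percpu_local local;
	struct percpu_shared shared;
};

/* Index of the cacheline (relative to the area start) holding byte off. */
static size_t line_of(size_t off)
{
	return off / CACHELINE_SIZE;
}
```

Because each shared element starts a new cacheline, a remote CPU writing `ipi_pending` never touches the line that holds `irq_count`; the price is some padding at the boundary and inside the shared region.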
|
|
Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as
the sum of kernel_percpu and PERCPU_MODULE_RESERVE. This is now common
to all architectures; an architecture that wants a special
PERCPU_ENOUGH_ROOM may still define its own (ia64 is the only one that
does).
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <ak@suse.de>
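A sketch of a generic default that an architecture can override, assuming a 3000-byte static per-cpu section and an 8192-byte module reserve (both values are illustrative). `percpu_template`, `percpu_start` and `percpu_end` are hypothetical stand-ins for the linker-provided section bounds; an architecture such as ia64 would define `PERCPU_ENOUGH_ROOM` in its own header before the `#ifndef` is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the static per-cpu section. */
static char percpu_template[3000];
#define percpu_start (&percpu_template[0])
#define percpu_end   (&percpu_template[sizeof percpu_template])

/* Assumed module reserve (illustrative value). */
#define PERCPU_MODULE_RESERVE 8192

/* Generic default: computed from the section size plus the reserve.
 * An architecture-provided definition seen earlier takes precedence. */
#ifndef PERCPU_ENOUGH_ROOM
#define PERCPU_ENOUGH_ROOM \
	((size_t)(percpu_end - percpu_start) + PERCPU_MODULE_RESERVE)
#endif

static size_t percpu_room(void)
{
	return PERCPU_ENOUGH_ROOM;
}
```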
|
|
Trivial typo fix in the "syntax error if percpu macros are incorrectly
used" patch. I misspelled "identifier" in all places. D'Oh!
Thanks to Dirk Mueller for pointing this out.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
|
|
get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple
identifiers; otherwise the architecture-dependent implementations might break.
This patch enforces the correct usage of the macros by producing a syntax
error if the variable is not a simple identifier.
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
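The syntax-error trick can be modeled in user space as below. This is a sketch, not the kernel's macro: the per-cpu copies are a plain array indexed by CPU, and the `DEFINE_PER_CPU` definition, the `per_cpu__` name prefix and the `simple_identifier_` declaration are assumptions modeled on the description above. It relies on GNU C statement expressions, which gcc and clang accept.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-cpu storage: one array slot per CPU. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]

/*
 * The block-scope declaration pastes `var` into an identifier. If `var`
 * is not a simple identifier (e.g. `s.field` or `*p`), the declaration
 * is invalid and compilation fails. The function is never defined or
 * called, so it costs nothing at link or run time.
 */
#define per_cpu(var, cpu) (*({				\
	extern int simple_identifier_##var(void);	\
	&per_cpu__##var[(cpu)];				\
}))

DEFINE_PER_CPU(int, counter);
```

`per_cpu(counter, 1)` compiles, while `per_cpu(s.field, 1)` expands to the declaration `extern int simple_identifier_s.field(void);`, which the compiler rejects.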
|
|
Now for a completely different but trivial approach.
I just boot-tested it with 255 CPUs and everything worked.
Currently, everything we place in the per-cpu area (except module data)
is known at compile time. So instead of allocating a fixed size for the
per_cpu area, allocate the number of bytes we need plus a fixed constant
to be used by modules.
It isn't perfect, but it is much less of a pain to work with than what
we are doing now.
AK: fixed warning
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
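A user-space sketch of sizing each CPU's area as the static data plus a fixed module reserve. The template array, `MODULE_RESERVE` (256 bytes here) and this `setup_per_cpu_areas()` are hypothetical stand-ins; the kernel works from linker-defined section bounds and its own allocator rather than `malloc()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4
/* Assumed reserve for module per-cpu data (illustrative value). */
#define MODULE_RESERVE 256

/* Stand-in for the compile-time per-cpu template (the static section). */
static char percpu_template[100] = "initial per-cpu contents";

static char *percpu_area[NR_CPUS];

/* Size each CPU's copy as the static data plus a fixed module reserve,
 * instead of one hard-coded size. Returns 0 on success, -1 on failure. */
static int setup_per_cpu_areas(void)
{
	size_t static_size = sizeof(percpu_template);
	size_t size = static_size + MODULE_RESERVE;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		char *p = malloc(size);

		if (!p) {
			/* Undo the allocations made so far. */
			while (cpu-- > 0) {
				free(percpu_area[cpu]);
				percpu_area[cpu] = NULL;
			}
			return -1;
		}
		memcpy(p, percpu_template, static_size);    /* initial values */
		memset(p + static_size, 0, MODULE_RESERVE); /* room for modules */
		percpu_area[cpu] = p;
	}
	return 0;
}
```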
|
|
Add the generic per_cpu_offset() method (used by the lock validator).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
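A sketch of the accessor and of one plausible use: deciding whether an address lies inside some CPU's per-cpu area, the kind of question a lock validator needs to answer for statically allocated lock objects. The single backing buffer, `AREA_SIZE`, `percpu_offsets` and `in_percpu_area()` are assumptions for illustration; the actual lock validator code may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS   4
#define AREA_SIZE 128	/* hypothetical size of one CPU's area */

/* All CPUs' areas carved out of one buffer (an assumption of the sketch). */
static unsigned char percpu_base[NR_CPUS * AREA_SIZE];
static size_t percpu_offsets[NR_CPUS];

/* Generic accessor: callers ask for a CPU's offset instead of
 * indexing the table directly. */
#define per_cpu_offset(cpu) (percpu_offsets[(cpu)])

static void init_offsets(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		percpu_offsets[cpu] = (size_t)cpu * AREA_SIZE;
}

/* Does addr fall inside any CPU's per-cpu area? Integer comparison
 * avoids relational operators on pointers to unrelated objects. */
static bool in_percpu_area(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		uintptr_t start = (uintptr_t)(percpu_base + per_cpu_offset(cpu));
		uintptr_t end = start + AREA_SIZE;

		if (a >= start && a < end)
			return true;
	}
	return false;
}
```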
|
|
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() generates more, and slower,
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.
This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
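A sketch of how the macro can select between the two forms at compile time. `ARCH_HAS_OPTIMIZED_PERCPU` is a hypothetical switch (the kernel makes this choice in per-architecture headers), and `raw_smp_processor_id()` here is a stand-in that returns a test-controlled CPU number.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical: the CPU this code is "running on", set by the caller. */
static int current_cpu;

/* Stand-in for raw_smp_processor_id(): no preemption debug check. */
static int raw_smp_processor_id(void)
{
	return current_cpu;
}

#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]
#define per_cpu(var, cpu) (per_cpu__##var[(cpu)])

#ifdef ARCH_HAS_OPTIMIZED_PERCPU
/* Optimized architectures: use their direct local-copy access,
 * which the architecture would define as __get_cpu_var(). */
#define __raw_get_cpu_var(var) __get_cpu_var(var)
#else
/* Generic implementation: look up the current CPU explicitly. */
#define __raw_get_cpu_var(var) per_cpu(var, raw_smp_processor_id())
#endif

DEFINE_PER_CPU(long, events);
```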
|
|
for_each_cpu() actually iterates across all possible CPUs. We've had
mistakes in the past where people were using for_each_cpu() where they
should have been iterating across only online or present CPUs. This is
inefficient and possibly buggy.
We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this
in the future.
This patch replaces for_each_cpu with for_each_possible_cpu.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
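A small sketch of why the name matters: iterating over the possible mask visits CPUs that may be offline. The bitmask variables and the `for_each_cpu_in()` helper are hypothetical; the kernel uses its cpumask type and helpers instead.

```c
#include <assert.h>

#define NR_CPUS 8

/* Hypothetical masks: bit n set means CPU n is possible / online. */
static unsigned long cpu_possible_bits = 0x0F;	/* CPUs 0-3 */
static unsigned long cpu_online_bits   = 0x05;	/* CPUs 0 and 2 */

/* Visit every CPU whose bit is set in mask, lowest first. The
 * if/else form keeps a caller's trailing else from binding here. */
#define for_each_cpu_in(cpu, mask)				\
	for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)		\
		if (!((mask) & (1UL << (cpu)))) {} else

#define for_each_possible_cpu(cpu) for_each_cpu_in(cpu, cpu_possible_bits)
#define for_each_online_cpu(cpu)   for_each_cpu_in(cpu, cpu_online_bits)

static int count_possible(void)
{
	int cpu, n = 0;

	for_each_possible_cpu(cpu)
		n++;
	return n;
}

static int count_online(void)
{
	int cpu, n = 0;

	for_each_online_cpu(cpu)
		n++;
	return n;
}
```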
|
|
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all. The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().
This patch is a kernel-wide sweep of all instances of NR_CPUS. I found very
few instances of this bug, if any, but the patch converts lots of open-coded
tests to use the preferred helper macros.
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
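A sketch of the failure mode the sweep guards against: if storage exists only for possible CPUs, an open-coded `0..NR_CPUS` loop dereferences areas that were never allocated. The mask value, `percpu_counter[]` and the helper functions are hypothetical; in this model a not-possible CPU's slot is simply `NULL`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 8

/* Hypothetical: only CPUs 0-2 are possible, so only they get storage. */
static unsigned long possible_bits = 0x07;
static long *percpu_counter[NR_CPUS];	/* NULL for not-possible CPUs */

static bool cpu_possible(int cpu)
{
	return (possible_bits >> cpu) & 1UL;
}

/* Allocate a counter only for possible CPUs. Returns 0 on success. */
static int alloc_counters(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_possible(cpu))
			continue;
		percpu_counter[cpu] = calloc(1, sizeof(long));
		if (!percpu_counter[cpu])
			return -1;
	}
	return 0;
}

/* A plain loop over all NR_CPUS slots would dereference NULL for the
 * not-possible CPUs; testing cpu_possible() skips them. */
static long sum_counters(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cpu_possible(cpu))
			continue;
		sum += *percpu_counter[cpu];
	}
	return sum;
}
```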
|
|
Helper patch to change cpu_pda users to use macros to access cpu_pda
instead of the cpu_pda[] array.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
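A sketch of why routing every access through a macro helps: the storage can change from an array of structures to an array of pointers (for example, so each PDA can be allocated separately) without touching the callers. `struct pda`, its fields and `pda_ptrs[]` are hypothetical and do not reproduce the real x86-64 PDA layout.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Hypothetical per-processor data area. */
struct pda {
	int cpunumber;
	unsigned long irqcount;
};

/* Storage is an array of pointers, so each PDA can be allocated
 * separately. Callers never index this array directly. */
static struct pda *pda_ptrs[NR_CPUS];

/* All users go through this macro; changing the storage only requires
 * changing the definition above and this line, not the callers. */
#define cpu_pda(i) (pda_ptrs[(i)])

/* Allocate and initialize one PDA per CPU. Returns 0 on success. */
static int setup_pdas(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		cpu_pda(i) = calloc(1, sizeof(struct pda));
		if (!cpu_pda(i))
			return -1;
		cpu_pda(i)->cpunumber = i;
	}
	return 0;
}
```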
|
|
Fix (in the architectures I'm actually building for) the UP definition of
per_cpu so that the cpu specified may be any expression, not just an
identifier or a suffix expression.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
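A sketch of a UP `per_cpu()` that accepts any expression for `cpu`; `DEFINE_PER_CPU` and the `per_cpu__` prefix are assumptions here. The argument is evaluated for its side effects and discarded, and the result is a plain pointer dereference, so it is an ordinary lvalue.

```c
#include <assert.h>

/* UP sketch: only one copy exists, so `cpu` is evaluated and ignored.
 * The parentheses around `cpu` let any expression be passed. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name
#define per_cpu(var, cpu) (*((void)(cpu), &per_cpu__##var))

DEFINE_PER_CPU(int, hits);
```

Without the parentheses around `cpu`, `per_cpu(hits, i + 1)` would expand to `(void)i + 1`, which does not compile because a `void` value cannot be an operand of `+`.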
|
|
From: Mikael Pettersson <mikpe@csd.uu.se>
Here are some patches to fix compilation warnings from
gcc-3.4.0 in the 2.6.6-rc3 x86_64 kernel.
- puts() type conflict in boot/compressed/misc.c:
rename to putstr(), just like i386 did
- cast-as-lvalue in ia32_copy_siginfo_from_user():
use temporary
- code before declaration in io_apic.c:
move decl up
- code before declaration in ioremap.c:
move existing #ifndef up
- cast-as-lvalue (tons of them) from UP version of per_cpu():
merged asm-generic's version
|
|
Make everything compile and boot again.
- Update defconfig
- Some minor cleanup
- Introduce physid_t for APIC masks (fixes UP kernels)
- Add bandaid for CardBus bridges and broken BIOS (Vojtech)
- Add bandaid for unsynchronized TSCs (Vojtech)
- Fix ffs(0) return value (fixes XFS)
- Fix compilation with software suspend
|
|
Without these changes an x86-64 NUMA kernel won't boot in many
configurations.
The main change is the improved IOMMU code which supports merging of
mappings and has various bugfixes.
- Update defconfig
- Use argument ptregs in 32bit elf_core_copy_task_fpregs
- Harden aperture fixup code: read aperture from the AGP bridge if needed,
better error checking.
- Support nmi_watchdog=panic to panic on watchdog trigger
- IOMMU: Support panic on IOMMU overflow (iommu=panic)
- IOMMU: Force SAC for mappings >40bits when iommu=force is active
(this can potentially give better performance)
- IOMMU: Cache northbridges for faster TLB flush
- IOMMU: Fix SMP race in TLB flush
- IOMMU: Merge pci_alloc_consistent and pci_map_single
- IOMMU: Clean up leak tracing
- IOMMU: Rewrite pci_map_sg, support merging of mappings
On overflow fall back to piece-by-piece mapping.
- IOMMU: Tell block layer to assume merging when iommu force is active
(this gives better performance with MPT Fusion; the drawback is that the
overflow/fragmentation handling of the IOMMU area is still a bit
dubious with that)
- Fix/clean up per cpu data
- Add 64bit clean time(2)
- Export cpu_callout_map for IPv6
- Handle nodes with no own memory in NUMA discovery.
This fixes boot on various newer Opteron motherboards where the memory
is only connected to a single CPU.
- Fix fallback path for failed NUMA discovery. numnodes has to be reset.
- Check for enabled nodes in NUMA discovery (Eric Biederman)
- Remove NUMA emunodes support. Has badly bitrotted.
- Add __clear_bit_string for IOMMU code
- Add new 32bit system calls to ia32_unistd.h
- Remove duplicate default_do_nmi prototype
- Make PCI_DMA_BUS_IS_PHYS dependent on no_iommu
- Fix padding length of siginfo_t to match glibc
- More pci direct access functions.
|
|
Only bug fixes, compile fixes and a few minor features.
Also one security fix that got lost earlier.
- Document boot options
- Better cpu local data
- Emulate FIOQSIZE
- Fix return value of 32bit ipccall
- Various minor style fixes
- Save some memory in apic tables
- Merge with 2.6.0test2/i386
- Re-add ioport fix
- Sort exception tables at boot time
- Add local.h
- Fix for_each_cpu on UP
- Add utimes and tgkill system calls for 64bit
- Update defconfig
|
|
A few updates for x86-64 in 2.5.44. Some of the bugs fixed were serious.
- Don't count ACPI mappings in end_pfn. This shrinks mem_map a lot
on many setups.
- Fix mem= option. Remove custom mapping support.
- Revert per_cpu implementation to the generic version. The optimized one
that used %gs directly triggered too many toolkit problems and was a
constant source of bugs.
- Make sure pgd_offset_k works correctly for vmalloc mappings. This makes
modules work again properly.
- Export pci dma symbols
- Export other symbols to make more modules work
- Don't drop physical address bits >32bit on iommu free.
- Add more prototypes to fix warnings
- Resync pci subsystem with i386
- Fix pci dma kernel option parsing.
- Do PCI peer bus scanning after ACPI in case it missed some busses
(that's a workaround - 2.5 ACPI seems to have some problems here that
I need to investigate more closely)
- Remove the .eh_frame section at link time. This saves several hundred KB
in the bzImage.
- Fix MTRR initialization. It works properly now on SMP again.
- Fix kernel option parsing, it was broken by section name changes in
init.h
- A few other cleanups and fixes.
- Fix nonatomic warning in ioport.c
|
|
And here are all the other x86-64 changes that have accumulated in my tree.
It's various bugfixes and cleanups.
Changes:
- fix nmi watchdog
- remove local timer spreading over CPUs - it's useless here and caused many problems
- New offset.h computation from Kai
- Lots of changes for the C99 initializer syntax
- New MTRR driver from Dave & Mats
- Bugfix: kernel threads don't start with interrupts disabled anymore, which fixes
various boottime hangs (this was fixed a long time ago, but the bug crept in again
by the backdoor)
- Do %gs reload in context switch lockless
- Fix device_not_available entry point race
- New per CPU GDT layout following i386: the layout is not completely
compatible with i386, which may cause problems with Wine in theory.
Haven't seen any yet.
- Support disableapic option
- driverfs support removed for now because it caused crashes
- Updates for new signal setup
- Support for kallsyms
- Port TLS clone flags/syscalls: unfortunately made the context switch
even uglier than it already is.
- Security fixes for ptrace
- New in_interrupt()/atomic setup ported from i386
- New makefiles mostly from Kai
- Various updates ported from i386
|
|
This introduces get_cpu_var()/put_cpu_var(): get_cpu_var() gets a
per-cpu variable and disables preemption, and put_cpu_var() re-enables
it. It also renames the (unsafe under preemption) "this_cpu()" macro to
__get_cpu_var() and deletes the redundant definitions in linux/smp.h.
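A user-space model of the difference between `__get_cpu_var()` and `get_cpu_var()`. `preempt_count`, `current_cpu` and the stand-in `preempt_disable()`, `preempt_enable()` and `smp_processor_id()` functions are hypothetical simplifications; the comma operator is used so that the sketch stays in standard C.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical stand-ins for kernel state and primitives. */
static int preempt_count;	/* > 0 means preemption is disabled */
static int current_cpu;		/* CPU the "task" is running on */

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { assert(preempt_count > 0); preempt_count--; }
static int smp_processor_id(void) { return current_cpu; }

#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]
#define per_cpu(var, cpu) (per_cpu__##var[(cpu)])

/* Unsafe under preemption: the task could migrate between reading
 * the CPU number and touching the variable. */
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())

/* Safe form: disable preemption first, re-enable in put_cpu_var(). */
#define get_cpu_var(var) (*(preempt_disable(), &__get_cpu_var(var)))
#define put_cpu_var(var) preempt_enable()

DEFINE_PER_CPU(int, nr_events);
```

The comma operator guarantees that `preempt_disable()` runs before the CPU number is read, so in this model the task stays on one CPU from choosing the copy until `put_cpu_var()`.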
|
|
Here is the big 2.5.21 x86-64 sync patch. It only touches arch/x86_64
and include/asm-x86_64. It requires a few other changes that I'm sending
in separate mail.
Changes:
- merge with 2.5.21
- merge from 2.5.21/i386 (new PCI code, new LDT code etc.)
- sync with 2.4-x86_64 tree.
- minor updates to 32bit emulation
- better early console; including serial support.
- now set up dummy PDA for booting to avoid problems
- Fix GS reloading in context switch one instruction race
- Remove hardcoded names from mpparse code
- Fix inline assembly for RAID-5 xor (similar change needed for i386)
- Real per cpu data support based on PDA field
- Cleanup of offset.c generation requested by Kai: it only puts structure
offsets into offset.h now.
- Fix i387 fxsave signal frame problems.
- Add uname emulation via personality ("linux32")
- New SSE optimized checksum-copy, copy*user, memcpy, clear_page, copy_page
functions. Other tunings/cleanups in checksum and other user memory
access functions.
- check if exception table is really sorted
- Cleanups in page table handling in preparation of non executable pages
support.
- Cleanup PDA access to not require offset.h (thanks to Kai for kicking me
into this)
- use long long for u64/s64 to avoid more warnings
- remove CONFIG_ISA
- fix various bugs and other cleanups
|
|
This patch brings 2.5.8 in sync with the x86-64 2.4 development tree again
(excluding device drivers)
It has lots of bug fixes and enhancements. It only touches architecture
specific files.
- Sync with 2.5.8
- SMP/APIC supported now.
- Module loading works now.
- Time keeping bugs fixed.
- entry.S streamlined and some bugs fixed.
- modify_ldt works now
- mostly rewritten FPU support (including FXRSTOR for initial FPU
initialization based on the initial state)
- 32bit emulation enhanced and bugs fixed.
- rewrote mm initialization and lots of cleanups in the page table handling
__PAGE_OFFSET is now moved to 0x10000000000 and some vmalloc/ioremap
problems have been fixed. They now have their own PML4 slot.
- WCHAN reporting support for RIP (but not RSP)
- Lots of various other bug fixes and cleanups.
Currently broken:
- ACPI
- MTRR
It needs some other bugfixes outside architecture specific code. I sent
them all in separate mail.
|