summaryrefslogtreecommitdiff
path: root/include/asm-generic/pgtable.h
AgeCommit message (Collapse)Author
2012-07-04thp: avoid atomic64_read in pmd_read_atomic for 32bit PAEAndrea Arcangeli
commit e4eed03fd06578571c01d4f1478c874bb432c815 upstream. In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under Xen. So instead of dealing only with "consistent" pmdvals in pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals where the low 32bit and high 32bit could be inconsistent (to avoid having to use cmpxchg8b). The only guarantee we get from pmd_read_atomic is that if the low part of the pmd was found null, the high part will be null too (so the pmd will be considered unstable). And if the low part of the pmd is found "stable" later, then it means the whole pmd was read atomically (because after a pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore, and we read the high part after the low part). In the 32bit PAE x86 case, it is enough to read the low part of the pmdval atomically to declare the pmd as "stable" and that's true for THP and no THP, furthermore in the THP case we also have a barrier() that will prevent any inconsistent pmdvals to be cached by a later re-read of the *pmd. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Tested-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-04mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race conditionAndrea Arcangeli
commit 26c191788f18129af0eb32a358cdaea0c7479626 upstream. When holding the mmap_sem for reading, pmd_offset_map_lock should only run on a pmd_t that has been read atomically from the pmdp pointer, otherwise we may read only half of it leading to this crash. PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic" #0 [f06a9dd8] crash_kexec at c049b5ec #1 [f06a9e2c] oops_end at c083d1c2 #2 [f06a9e40] no_context at c0433ded #3 [f06a9e64] bad_area_nosemaphore at c043401a #4 [f06a9e6c] __do_page_fault at c0434493 #5 [f06a9eec] do_page_fault at c083eb45 #6 [f06a9f04] error_code (via page_fault) at c083c5d5 EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000 DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0 CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246 #7 [f06a9f38] _spin_lock at c083bc14 #8 [f06a9f44] sys_mincore at c0507b7d #9 [f06a9fb0] system_call at c083becd start len EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00 SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033 CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286 This should be a longstanding bug affecting x86 32bit PAE without THP. Only archs with 64bit large pmd_t and 32bit unsigned long should be affected. With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when the pmd transition from none to stable, by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is enabled a new set of problem arises by the fact could then transition freely in any of the none, pmd_trans_huge or pmd_trans_stable states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad() unconditional isn't good idea and it would be a flakey solution. This should be fully fixed by introducing a pmd_read_atomic that reads the pmd in order with THP disabled, or by reading the pmd atomically with cmpxchg8b with THP enabled. Luckily this new race condition only triggers in the places that must already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix is localized there but this bug is not related to THP. NOTE: this can trigger on x86 32bit systems with PAE enabled with more than 4G of ram, otherwise the high part of the pmd will never risk to be truncated because it would be zero at all times, in turn so hiding the SMP race. This bug was discovered and fully debugged by Ulrich, quote: ---- [..] pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and eax. 496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) 497 { 498 /* depend on compiler for an atomic pmd read */ 499 pmd_t pmdval = *pmd; // edi = pmd pointer 0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi ... // edx = PTE page table high address 0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx ... // eax = PTE page table low address 0xc0507a8e <sys_mincore+574>: mov (%edi),%eax [..] Please note that the PMD is not read atomically. These are two "mov" instructions where the high order bits of the PMD entry are fetched first. Hence, the above machine code is prone to the following race. - The PMD entry {high|low} is 0x0000000000000000. The "mov" at 0xc0507a84 loads 0x00000000 into edx. - A page fault (on another CPU) sneaks in between the two "mov" instructions and instantiates the PMD. - The PMD entry {high|low} is now 0x00000003fda38067. The "mov" at 0xc0507a8e loads 0xfda38067 into eax. ---- Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-04-02mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read modeAndrea Arcangeli
commit 1a5a9906d4e8d1976b701f889d8f35d54b928f25 upstream. In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge. It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge(). Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped. Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()). The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad). if (pmd_trans_huge(*pmd)) { if (next-addr != HPAGE_PMD_SIZE) { VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)); split_huge_page_pmd(vma->vm_mm, pmd); } else if (zap_huge_pmd(tlb, vma, pmd, addr)) continue; /* fall through */ } if (pmd_none_or_clear_bad(pmd)) Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179. The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference. ====== start quote ======= mapcount 0 page_mapcount 1 kernel BUG at mm/huge_memory.c:1384! At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console: mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7). The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry. 143 void pmd_clear_bad(pmd_t *pmd) 144 { -> 145 pmd_ERROR(*pmd); 146 pmd_clear(pmd); 147 } After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency. 1381 if (mapcount != page_mapcount(page)) 1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n", 1383 mapcount, page_mapcount(page)); -> 1384 BUG_ON(mapcount != page_mapcount(page)); The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range. virtual address space .---------------------. | | | | .-|---------------------| | | | | | |<-- B(fault) | | | 2 MB | |/////////////////////|-. huge < |/////////////////////| > A(range) page | |/////////////////////|-' | | | | | | '-|---------------------| | | | | '---------------------' - Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture. sys_madvise // Acquire the semaphore in shared mode. down_read(&current->mm->mmap_sem) ... madvise_vma switch (behavior) case MADV_DONTNEED: madvise_dontneed zap_page_range unmap_vmas unmap_page_range zap_pud_range zap_pmd_range // // Assume that this huge page has never been accessed. // I.e. content of the PMD entry is zero (not mapped). // if (pmd_trans_huge(*pmd)) { // We don't get here due to the above assumption. } // // Assume that Thread B incurred a page fault and .---------> // sneaks in here as shown below. | // | if (pmd_none_or_clear_bad(pmd)) | { | if (unlikely(pmd_bad(*pmd))) | pmd_clear_bad | { | pmd_ERROR | // Log "bad pmd ..." message here. | pmd_clear | // Clear the page's PMD entry. | // Thread B incremented the map count | // in page_add_new_anon_rmap(), but | // now the page is no longer mapped | // by a PMD entry (-> inconsistency). | } | } | v - Thread B is handling a page fault on virtual address "B(fault)" shown in the picture. ... do_page_fault __do_page_fault // Acquire the semaphore in shared mode. down_read_trylock(&mm->mmap_sem) ... handle_mm_fault if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) // We get here due to the above assumption (PMD entry is zero). do_huge_pmd_anonymous_page alloc_hugepage_vma // Allocate a new transparent huge page here. ... __do_huge_pmd_anonymous_page ... spin_lock(&mm->page_table_lock) ... page_add_new_anon_rmap // Here we increment the page's map count (starts at -1). atomic_set(&page->_mapcount, 0) set_pmd_at // Here we set the page's PMD entry which will be cleared // when Thread A calls pmd_clear_bad(). ... spin_unlock(&mm->page_table_lock) The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock. ====== end quote ======= [akpm@linux-foundation.org: checkpatch fixes] Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2011-06-15include/asm-generic/pgtable.h: fix unbalanced parenthesisNicolas Kaiser
Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-23[S390] merge page_test_dirty and page_clear_dirtyMartin Schwidefsky
The page_clear_dirty primitive always sets the default storage key which resets the access control bits and the fetch protection bit. That will surprise a KVM guest that sets non-zero access control bits or the fetch protection bit. Merge page_test_dirty and page_clear_dirty back to a single function and only clear the dirty bit from the storage key. In addition move the function page_test_and_clear_dirty and page_test_and_clear_young to page.h where they belong. This requires to change the parameter from a struct page * to a page frame number. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-02-28mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>Ben Hutchings
Commit e2cda3226481 ("thp: add pmd mangling generic functions") replaced some macros in <asm-generic/pgtable.h> with inline functions. If the functions are to be defined (not all architectures need them) then struct vm_area_struct must be defined first. So include <linux/mm_types.h>. Fixes a build failure seen in Debian: CC [M] drivers/media/dvb/mantis/mantis_pci.o In file included from arch/arm/include/asm/pgtable.h:460, from drivers/media/dvb/mantis/mantis_pci.c:25: include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young': include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16fix non-x86 build failure in pmdp_get_and_clearAndrea Arcangeli
pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as BUG() and they were defined only to diminish the risk of build issues on not-x86 archs and to be consistent with the generic pte methods previously defined in include/asm-generic/pgtable.h. But they are causing more trouble than they were supposed to solve, so it's simpler not to define them when THP is off. This is also correcting the export of pmdp_splitting_flush which is currently unused (x86 isn't using the generic implementation in mm/pgtable-generic.c and no other arch needs that [yet]). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Sam Ravnborg <sam@ravnborg.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: add pmd mangling generic functionsAndrea Arcangeli
Some are needed to build but not actually used on archs not supporting transparent hugepages. Others like pmdp_clear_flush are used by x86 too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13thp: special pmd_trans_* functionsAndrea Arcangeli
These returns 0 at compile time when the config option is disabled, to allow gcc to eliminate the transparent hugepage function calls at compile time without additional #ifdefs (only the export of those functions have to be visible to gcc but they won't be required at link time and huge_memory.o can be not built at all). _PAGE_BIT_UNUSED1 is never used for pmd, only on pte. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-25[S390] add support for nonquiescing sskeMartin Schwidefsky
Improve performance of the sske operation by using the nonquiescing variant if the affected page has no mappings established. On machines with no support for the new sske variant the mask bit will be ignored. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-23x86, mm: Avoid unnecessary TLB flushShaohua Li
In x86, access and dirty bits are set automatically by CPU when CPU accesses memory. When we go into the code path of below flush_tlb_fix_spurious_fault(), we already set dirty bit for pte and don't need flush tlb. This might mean tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When the CPUs do page write, they will automatically check the bit and no software involved. On the other hand, flush tlb in below position is harmful. Test creates CPU number of threads, each thread writes to a same but random address in same vma range and we measure the total time. Under a 4 socket system, original time is 1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is 20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for tlb flush. Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrea Archangeli <aarcange@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-23asm-generic: add dummy pgprot_noncached()Paul Mundt
Most architectures now provide a pgprot_noncached(), the remaining ones can simply use an dummy default implementation, except for cris and xtensa, which should override the default appropriately. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Chris Zankel <chris@zankel.net> Cc: Magnus Damm <magnus.damm@gmail.com>
2009-03-29x86/paravirt: finish change from lazy cpu to context switch start/endJeremy Fitzhardinge
Impact: fix lazy context switch API Pass the previous and next tasks into the context switch start end calls, so that the called functions can properly access the task state (esp in end_context_switch, in which the next task is not yet completely current). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-29x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switchJeremy Fitzhardinge
Impact: simplification, prepare for later changes Make lazy cpu mode more specific to context switching, so that it makes sense to do more context-switch specific things in the callbacks. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-01-13x86 PAT: change track_pfn_vma_new to take pgprot_t pointer paramvenkatesh.pallipadi@intel.com
Impact: cleanup Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer. Subsequent patch changes the x86 PAT handling to return a compatible memtype in pgprot_t, if what was requested cannot be allowed due to conflicts. No fuctionality change in this patch. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-19x86: PAT: move track untrack pfnmap stubs to asm-genericvenkatesh.pallipadi@intel.com
Impact: Cleanup and branch hints only. Move the track and untrack pfn stub routines from memory.c to asm-generic. Also add unlikely to pfnmap related calls in fork and exit path. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18x86: PAT: add pgprot_writecombine() interface for drivers - v3venkatesh.pallipadi@intel.com
Impact: New mm functionality. Add pgprot_writecombine. pgprot_writecombine will be aliased to pgprot_noncached when not supported by the architecture. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-15mm: fix build on non-mmu machinesSebastian Siewior
Commit 1ea0704e0d aka "mm: add a ptep_modify_prot transaction abstraction" caused: | CC init/main.o |In file included from include2/asm/pgtable.h:68, | from /home/bigeasy/git/linux-2.6-m68k/include/linux/mm.h:39, | from include2/asm/uaccess.h:8, | from /home/bigeasy/git/linux-2.6-m68k/include/linux/poll.h:13, | from /home/bigeasy/git/linux-2.6-m68k/include/linux/rtc.h:113, | from /home/bigeasy/git/linux-2.6-m68k/include/linux/efi.h:19, | from /home/bigeasy/git/linux-2.6-m68k/init/main.c:43: |/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_start': |/linux-2.6/include/asm-generic/pgtable.h:209: error: implicit declaration of function 'ptep_get_and_clear' |/linux-2.6/include/asm-generic/pgtable.h:209: error: incompatible types in return |/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_commit': |/linux-2.6/include/asm-generic/pgtable.h:220: error: implicit declaration of function 'set_pte_at' |make[2]: *** [init/main.o] Error 1 |make[1]: *** [init] Error 2 |make: *** [sub-make] Error 2 on my m68knommu box. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-25mm: add a ptep_modify_prot transaction abstractionJeremy Fitzhardinge
This patch adds an API for doing read-modify-write updates to a pte's protection bits which may race against hardware updates to the pte. After reading the pte, the hardware may asynchonously set the accessed or dirty bits on a pte, which would be lost when writing back the modified pte value. The existing technique to handle this race is to use ptep_get_and_clear() atomically fetch the old pte value and clear it in memory. This has the effect of marking the pte as non-present, which will prevent the hardware from updating its state. When the new value is written back, the pte will be present again, and the hardware can resume updating the access/dirty flags. When running in a virtualized environment, pagetable updates are relatively expensive, since they generally involve some trap into the hypervisor. To mitigate the cost of these updates, we tend to batch them. However, because of the atomic nature of ptep_get_and_clear(), it is inherently non-batchable. This new interface allows batching by giving the underlying implementation enough information to open a transaction between the read and write phases: ptep_modify_prot_start() returns the current pte value, and puts the pte entry into a state where either the hardware will not update the pte, or if it does, the updates will be preserved on commit. ptep_modify_prot_commit() writes back the updated pte, makes sure that any hardware updates made since ptep_modify_prot_start() are preserved. ptep_modify_prot_start() and _commit() must be exactly paired, and used while holding the appropriate pte lock. They do not protect against other software updates of the pte in any way. The current implementations of ptep_modify_prot_start and _commit are functionally unchanged from before: _start() uses ptep_get_and_clear() fetch the pte and zero the entry, preventing any hardware updates. _commit() simply writes the new pte value back knowing that the hardware has not updated the pte in the meantime. The only current user of this interface is mprotect Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-16flush icache before set_pte() on ia64: flush icache at set_pteKAMEZAWA Hiroyuki
Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after* set_pte(). This is too late. This patch removes lazy_mmu_prot_update and add modfied set_pte() for flushing if necessary. This patch flush icache of a page when new pte has exec bit. && new pte has present bit && new pte is user's page. && (old *ptep is not present || new pte's pfn is not same to old *ptep's ptn) && new pte's page has no Pg_arch_1 bit. Pg_arch_1 is set when a page is cache consistent. I think this condition checks are much easier to understand than considering "Where sync_icache_dcache() should be inserted ?". pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as clean-up. So, I added it again. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11changing include/asm-generic/pgtable.h for non-mmuGreg Ungerer
There are some parts of include/asm-generic/pgtable.h that are relevant to the non-mmu architectures. To make it easier to include this from them I would like to ifdef the relevant parts. Without this there is a handful of functions that are referenced in here that are not defined on many non-mmu architectures. They could be defined out of course, as an alternative approach. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirtyMartin Schwidefsky
Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty. Remove the functions from all architectures. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17mm: remove ptep_establish()Martin Schwidefsky
The last user of ptep_establish in mm/ is long gone. Remove the architecture primitive as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16Rework ptep_set_access_flags and fix sun4cBenjamin Herrenschmidt
Some changes done a while ago to avoid pounding on ptep_set_access_flags and update_mmu_cache in some race situations break sun4c which requires update_mmu_cache() to always be called on minor faults. This patch reworks ptep_set_access_flags() semantics, implementations and callers so that it's now responsible for returning whether an update is necessary or not (basically whether the PTE actually changed). This allow fixing the sparc implementation to always return 1 on sun4c. [akpm@linux-foundation.org: fixes, cleanups] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: David Miller <davem@davemloft.net> Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk> Acked-by: William Lee Irwin III <wli@holomorphy.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-27[S390] split page_test_and_clear_dirty.Martin Schwidefsky
The page_test_and_clear_dirty primitive really consists of two operations, page_test_dirty and the page_clear_dirty. The combination of the two is not an atomic operation, so it makes more sense to have two separate operations instead of one. In addition to the improved readability of the s390 version of SetPageUptodate, it now avoids the page_test_dirty operation which is an insert-storage-key-extended (iske) instruction which is an expensive operation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-04-08[PATCH] Proper fix for highmem kmap_atomic functions for VMI for 2.6.21Zachary Amsden
Since lazy MMU batching mode still allows interrupts to enter, it is possible for interrupt handlers to try to use kmap_atomic, which fails when lazy mode is active, since the PTE update to highmem will be delayed. The best workaround is to issue an explicit flush in kmap_atomic_functions case; this is the only way nested PTE updates can happen in the interrupt handler. Thanks to Jeremy Fitzhardinge for noting the bug and suggestions on a fix. This patch gets reverted again when we start 2.6.22 and the bug gets fixed differently. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Andi Kleen <ak@muc.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-13[PATCH] i386: paravirt CPU hypercall batching modeZachary Amsden
The VMI ROM has a mode where hypercalls can be queued and batched. This turns out to be a significant win during context switch, but must be done at a specific point before side effects to CPU state are visible to subsequent instructions. This is similar to the MMU batching hooks already provided. The same hooks could be used by the Xen backend to implement a context switch multicall. To explain a bit more about lazy modes in the paravirt patches, basically, the idea is that only one of lazy CPU or MMU mode can be active at any given time. Lazy MMU mode is similar to this lazy CPU mode, and allows for batching of multiple PTE updates (say, inside a remap loop), but to avoid keeping some kind of state machine about when to flush cpu or mmu updates, we just allow one or the other to be active. Although there is no real reason a more comprehensive scheme could not be implemented, there is also no demonstrated need for this extra complexity. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-10-01[PATCH] paravirt: remove set pte atomicZachary Amsden
Now that ptep_establish has a definition in PAE i386 3-level paging code, the only paging model which is insane enough to have multi-word hardware PTEs which are not efficient to set atomically, we can remove the ghost of set_pte_atomic from other architectures which falesly duplicated it, and remove all knowledge of it from the generic pgtable code. set_pte_atomic is now a private pte operator which is specific to i386 Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] paravirt: lazy mmu mode hooks.patchZachary Amsden
Implement lazy MMU update hooks which are SMP safe for both direct and shadow page tables. The idea is that PTE updates and page invalidations while in lazy mode can be batched into a single hypercall. We use this in VMI for shadow page table synchronization, and it is a win. It also can be used by PPC and for direct page tables on Xen. For SMP, the enter / leave must happen under protection of the page table locks for page tables which are being modified. This is because otherwise, you end up with stale state in the batched hypercall, which other CPUs can race ahead of. Doing this under the protection of the locks guarantees the synchronization is correct, and also means that spurious faults which are generated during this window by remote CPUs are properly handled, as the page fault handler must re-check the PTE under protection of the same lock. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01[PATCH] paravirt: pte clear not presentZachary Amsden
Change pte_clear_full to a more appropriately named pte_clear_not_present, allowing optimizations when not-present mapping changes need not be reflected in the hardware TLB for protected page table modes. There is also another case that can use it in the fremap code. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26[PATCH] x86: trivial pgtable.h __ASSEMBLY__ moveRusty Russell
Parsing generic pgtable.h in assembler is simply crazy. None of this file is needed in assembler code, and C inline functions and structures routine break one or more different compiles. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-01[SPARC64]: Fix D-cache corruption in mremapDavid S. Miller
If we move a mapping from one virtual address to another, and this changes the virtual color of the mapping to those pages, we can see corrupt data due to D-cache aliasing. Check for and deal with this by overriding the move_pte() macro. Set things up so that other platforms can cleanly override the move_pte() macro too. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-07[PATCH] fix remaining missing includesTim Schmielau
Fix more include file problems that surfaced since I submitted the previous fix-missing-includes.patch. This should now allow not to include sched.h from module.h, which is done by a followup patch. Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[PATCH] mm: update comments to pte lockHugh Dickins
Updated several references to page_table_lock in common code comments. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28[PATCH] mm: move_pte to remap ZERO_PAGENick Piggin
Move the ZERO_PAGE remapping complexity to the move_pte macro in asm-generic, have it conditionally depend on __HAVE_ARCH_MULTIPLE_ZERO_PAGE, which gets defined for MIPS. For architectures without __HAVE_ARCH_MULTIPLE_ZERO_PAGE, move_pte becomes a noop. From: Hugh Dickins <hugh@veritas.com> Fix nasty little bug we've missed in Nick's mremap move ZERO_PAGE patch. The "pte" at that point may be a swap entry or a pte_file entry: we must check pte_present before perhaps corrupting such an entry. Patch below against 2.6.14-rc2-mm1, but the same bug is in 2.6.14-rc2's mm/mremap.c, and more dangerous there since it's affecting all arches: I think the safest course is to send Nick's patch and Yoichi's build fix and this fix (build tested) on to Linus - so only MIPS can be affected. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-05[PATCH] x86: ptep_clear optimizationZachary Amsden
Add a new accessor for PTEs, which passes the full hint from the mmu_gather struct; this allows architectures with hardware pagetables to optimize away atomic PTE operations when destroying an address space. Removing the locked operation should allow better pipelining of memory access in this loop. I measured an average savings of 30-35 cycles per zap_pte_range on the first 500 destructions on Pentium-M, but I believe the optimization would win more on older processors which still assert the bus lock on xchg for an exclusive cacheline. Update: I made some new measurements, and this saves exactly 26 cycles over ptep_get_and_clear on Pentium M. On P4, with a PAE kernel, this saves 180 cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a full address space destruction. pte_clear_full is not yet used, but is provided for future optimizations (in particular, when running inside of a hypervisor that queues page table updates, the full hint allows us to avoid queueing unnecessary page table update for an address space in the process of being destroyed. This is not a huge win, but it does help a bit, and sets the stage for further hypervisor optimization of the mm layer on all architectures. Signed-off-by: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <christoph@lameter.com> Cc: <linux-mm@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] msync: check pte dirty earlierAbhijit Karmarkar
It's common practice to msync a large address range regularly, in which often only a few ptes have actually been dirtied since the previous pass. sync_pte_range then goes much faster if it tests whether pte is dirty before locating and accessing each struct page cacheline; and it is hardly slowed by ptep_clear_flush_dirty repeating that test in the opposite case, when every pte actually is dirty. But beware, s390's pte_dirty always says false, since its dirty bit is kept in the storage key, located via the struct page address. So skip this optimization in its case: use a pte_maybe_dirty macro which just says true if page_test_and_clear_dirty is implemented. Signed-off-by: Abhijit Karmarkar <abhijitk@veritas.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-19[PATCH] freepgt: remove arch pgd_addr_endHugh Dickins
ia64 and sparc64 hurriedly had to introduce their own variants of pgd_addr_end, to leapfrog over the holes in their virtual address spaces which the final clear_page_range suddenly presented when converted from pgd_index to pgd_addr_end. But now that free_pgtables respects the vma list, those holes are never presented, and the arch variants can go. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-28[PATCH] arch hook for notifying changes in PTE protections bitsSeth Rohit
Recently on IA-64, we have found an issue where old data could be used by apps. The sequence of operations includes few mprotects from user space (glibc) goes like this: 1- The text region of an executable is mmaped using PROT_READ|PROT_EXEC. As a result, a shared page is allocated to user. 2- User then requests the text region to be mprotected with PROT_READ|PROT_WRITE. Kernel removes the execute permission and leave the read permission on the text region. 3- Subsequent write operation by user results in page fault and eventually resulting in COW break. User gets a new private copy of the page. At this point kernel marks the new page for defered flush. 4- User then request the text region to be mprotected back with PROT_READ|PROT_EXEC. mprotect suppport code in kernel, flushes the caches, updates the PTEs and then flushes the TLBs. Though after updating the PTEs with new permissions, we don't let the arch specific code know about the new mappings (through update_mmu_cache like routine). IA-64 typically uses update_mmu_cache to check for the defered flush flag (that got set in step 3) to maintain cache coherency lazily (The local I and D caches on IA-64 are incoherent). DavidM suggeested that we would need to add a hook in the function change_pte_range in mm/mprotect.c This would let the architecture specific code to look at the new ptes to decide if it needs to update any other architectual/kernel state based on the updated (new permissions) PTE values. We have added a new hook lazy_mmu_prot_update(pte_t) that gets called protection bits in PTEs change. This hook provides an opportunity to arch specific code to do needful. On IA-64 this will be used for lazily making the I and D caches coherent. Signed-off-by: David Mosberger <davidm@hpl.hp.com> Signed-off-by: Rohit Seth <rohit.seth@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-15[SPARC64]: Override {pgd,pmd}_addr_end() to handle vaddr hole.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-03-13[PATCH] ptwalk: pud and pmd foldedHugh Dickins
Nick Piggin's patch to fold away most of the pud and pmd levels when not required. Adjusted to define minimal pud_addr_end (in the 4LEVEL_HACK case too) and pmd_addr_end. Responsible for half of the savings. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] ptwalk: change_protectionHugh Dickins
Begin the pagetable walker cleanup with a straightforward example, mprotect's change_protection. Started out from Nick Piggin's for_each proposal, but I prefer less hidden; and these are all do while loops, which degrade slightly when converted to for loops. Firmly agree with Andi and Nick that addr,end is the way to go: size is good at the user interface level, but unhelpful down in the loops. And the habit of an "address" which is actually an offset from some base has bitten us several times: use proper address at each level, whyever not? Don't apply each mask at two levels: all we need is a set of macros pgd_addr_end, pud_addr_end, pmd_addr_end to give the address of the end of each range. Which need to take the min of two addresses, with 0 as the greatest. Started out with a different macro, assumed end never 0; but clear_page_range (alone) might be passed end 0 by some out-of-tree memory layouts: could special case it, but this macro compiles smaller. Check "addr != end" instead of "addr < end" to work on that end 0 case. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-13[PATCH] ptwalk: p?d_none_or_clear_badHugh Dickins
Replace the repetitive p?d_none, p?d_bad, p?d_ERROR, p?d_clear clauses by pgd_none_or_clear_bad, pud_none_or_clear_bad, pmd_none_or_clear_bad inlines throughout common and i386 - avoids a sprinkling of "unlikely"s. Tests inline, but unlikely error handling in mm/memory.c - so the ERROR file and line won't tell much; but it comes too late anyway, and hardly ever seen outside development. Let mremap use them in get_one_pte_map, as it already did in _nested; but leave follow_page and untouched_anonymous page just skipping _bad as before - they don't have quite the same ownership of the mm. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-03-09[PATCH] ptep_test_and_clear_dirty typo fixAndrew Morton
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-02-22[MM]: Add set_pte_at() which takes 'mm' and 'addr' args.David S. Miller
I'm taking a slightly different approach this time around so things are easier to integrate. Here is the first patch which builds the infrastructure. Basically: 1) Add set_pte_at() which is set_pte() with 'mm' and 'addr' arguments added. All generic code uses set_pte_at(). Most platforms simply get this define: #define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval) I chose this method over simply changing all set_pte() call sites because many platforms implement this in assembler and it would take forever to preserve the build and stabilize things if modifying that was necessary. Soon, with platform maintainer's help, we can kill of set_pte() entirely. To be honest, there are only a handful of set_pte() call sites in the arch specific code. Actually, in this patch ppc64 is completely set_pte() free and does not define it. 2) pte_clear() gets 'mm' and 'addr' arguments now. This had a cascading effect on many ptep_test_and_*() routines. Specifically: a) ptep_test_and_clear_{young,dirty}() now take 'vma' and 'address' args. b) ptep_get_and_clear now take 'mm' and 'address' args. c) ptep_mkdirty was deleted, unused by any code. d) ptep_set_wrprotect now takes 'mm' and 'address' args. I've tested this patch as follows: 1) compile and run tested on sparc64/SMP 2) compile tested on: a) ppc64/SMP b) i386 both with and without PAE enabled Signed-off-by: David S. Miller <davem@davemloft.net>
2004-10-13[PATCH] ptep_establish smp race x86 PAE >4GAndrea Arcangeli
This avoid userspace mm corruption during COWs with threads (i.e. malloc;fork;clone) on x86 PAE with >4G of ram Signed-Off-By: Andrea Arcangeli <andrea@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-07-28[PATCH] Make get_user_pages() work again for ia64 gate areaDavid Mosberger
Changeset roland@redhat.com[torvalds]|ChangeSet|20040624165002|30880 inadvertently broke ia64 because the patch assumed that pgd_offset_k() is just an optimization of pgd_offset(), which it is not. This patch fixes the problem by introducing pgd_offset_gate(). On architectures on which the gate area lives in the user's address-space, this should be aliased to pgd_offset() and on architectures on which the gate area lives in the kernel-mapped segment, this should be aliased to pgd_offset_k(). This bug was found and tracked down by Peter Chubb. Signed-off-by: <davidm@hpl.hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-25Split ptep_establish into "establish" and "update_access_flags"Linus Torvalds
ptep_establish() is used to establish a new mapping at COW time, and it always replaces a non-writable page mapping with a totally new page mapping that is dirty (and likely writable, although ptrace may cause a non-writable new mapping). Because it was nonwritable, we don't have to worry about losing concurrent dirty page bit updates. ptep_update_access_flags() leaves the same page mapping, but updates the accessed/dirty/writable bits (it only ever sets them, and never removes any permissions). Often easier, but it may race with a dirty bit update on another CPU. Booted on x86 and ppc64. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-05-24Introduce architecture-specific "ptep_update_dirty_accessed()"Linus Torvalds
helper function to write-back the dirty and accessed bits from ptep_establish(). Right now this defaults to the same old "set_pte()" that we've always done, except for x86 where we now fix the (unlikely) race in updating accessed bits and dropping a concurrent dirty bit.
2004-05-24Pass in a "dirty" argument to ptep_establish in Linus Torvalds
preparation for pte update race fix. This does not actually use the information yet, but the next few patches will start to put it to some good use.