<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/hugetlb.h, branch v3.2.78</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2.78'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2014-02-15T19:20:17Z</updated>
<entry>
<title>mm: hugetlbfs: fix hugetlbfs optimization</title>
<updated>2014-02-15T19:20:17Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2013-11-21T22:32:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8ae94088319ea3047cdf68adc9fafdcf862e2790'/>
<id>urn:sha1:8ae94088319ea3047cdf68adc9fafdcf862e2790</id>
<content type='text'>
commit 27c73ae759774e63313c1fbfeb17ba076cea64c5 upstream.

Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause dereference of a dangling pointer if
split_huge_page runs during PageHuge() if there are updates to the
tail_page-&gt;private field.

Also it is repeating compound_head twice for hugetlbfs and it is running
compound_head+compound_trans_head for THP when a single one is needed in
both cases.

The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.

A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).

By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:

  echo 0 &gt;/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  BUG: Bad page state in process bash  pfn:59a01
  page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
  page flags: 0x1c00000000008000(tail)
  Modules linked in:
  CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x55/0x76
    bad_page+0xd5/0x130
    free_pages_prepare+0x213/0x280
    __free_pages+0x36/0x80
    update_and_free_page+0xc1/0xd0
    free_pool_huge_page+0xc2/0xe0
    set_max_huge_pages.part.58+0x14c/0x220
    nr_hugepages_store_common.isra.60+0xd0/0xf0
    nr_hugepages_store+0x13/0x20
    kobj_attr_store+0xf/0x20
    sysfs_write_file+0x189/0x1e0
    vfs_write+0xc5/0x1f0
    SyS_write+0x55/0xb0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Tested-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Cc: Pravin Shelar &lt;pshelar@nicira.com&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Ben Hutchings &lt;bhutchings@solarflare.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andi Kleen &lt;andi@firstfloor.org&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[Khalid Aziz: Backported to 3.4]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>futex: Take hugepages into account when generating futex_key</title>
<updated>2013-07-27T04:34:18Z</updated>
<author>
<name>Zhang Yi</name>
<email>wetpzy@gmail.com</email>
</author>
<published>2013-06-25T13:19:31Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=695742a1522eb8e7ed0231e85aec51bf92c4af30'/>
<id>urn:sha1:695742a1522eb8e7ed0231e85aec51bf92c4af30</id>
<content type='text'>
commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page-&gt;index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page-&gt;index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi &lt;zhang.yi20@zte.com.cn&gt;
Reviewed-by: Jiang Biao &lt;jiang.biao2@zte.com.cn&gt;
Tested-by: Ma Chenggong &lt;ma.chenggong@zte.com.cn&gt;
Reviewed-by: 'Mel Gorman' &lt;mgorman@suse.de&gt;
Acked-by: 'Darren Hart' &lt;dvhart@linux.intel.com&gt;
Cc: 'Peter Zijlstra' &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables</title>
<updated>2012-08-09T23:25:10Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-07-31T23:46:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6f72a41f67bb23a6478a0277d97f563830d3f25d'/>
<id>urn:sha1:6f72a41f67bb23a6478a0277d97f563830d3f25d</id>
<content type='text'>
commit d833352a4338dc31295ed832a30c9ccff5c7a183 upstream.

If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log.  The final teardown will trigger a
BUG_ON in mm/filemap.c.

This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37.  It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
look like this;

[  ..........] Lots of bad pmd messages followed by this
[  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[  127.186778] ------------[ cut here ]------------
[  127.186781] kernel BUG at mm/filemap.c:134!
[  127.186782] invalid opcode: 0000 [#1] SMP
[  127.186783] CPU 7
[  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[  127.186801]
[  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[  127.186804] RIP: 0010:[&lt;ffffffff810ed6ce&gt;]  [&lt;ffffffff810ed6ce&gt;] __delete_from_page_cache+0x15e/0x160
[  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
[  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[  127.186821] Stack:
[  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[  127.186827] Call Trace:
[  127.186829]  [&lt;ffffffff810ed83b&gt;] delete_from_page_cache+0x3b/0x80
[  127.186832]  [&lt;ffffffff811bc925&gt;] truncate_hugepages+0x115/0x220
[  127.186834]  [&lt;ffffffff811bca43&gt;] hugetlbfs_evict_inode+0x13/0x30
[  127.186837]  [&lt;ffffffff811655c7&gt;] evict+0xa7/0x1b0
[  127.186839]  [&lt;ffffffff811657a3&gt;] iput_final+0xd3/0x1f0
[  127.186840]  [&lt;ffffffff811658f9&gt;] iput+0x39/0x50
[  127.186842]  [&lt;ffffffff81162708&gt;] d_kill+0xf8/0x130
[  127.186843]  [&lt;ffffffff81162812&gt;] dput+0xd2/0x1a0
[  127.186845]  [&lt;ffffffff8114e2d0&gt;] __fput+0x170/0x230
[  127.186848]  [&lt;ffffffff81236e0e&gt;] ? rb_erase+0xce/0x150
[  127.186849]  [&lt;ffffffff8114e3ad&gt;] fput+0x1d/0x30
[  127.186851]  [&lt;ffffffff81117db7&gt;] remove_vma+0x37/0x80
[  127.186853]  [&lt;ffffffff81119182&gt;] do_munmap+0x2d2/0x360
[  127.186855]  [&lt;ffffffff811cc639&gt;] sys_shmdt+0xc9/0x170
[  127.186857]  [&lt;ffffffff81410a39&gt;] system_call_fastpath+0x16/0x1b
[  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff &lt;0f&gt; 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[  127.186868] RIP  [&lt;ffffffff810ed6ce&gt;] __delete_from_page_cache+0x15e/0x160
[  127.186870]  RSP &lt;ffff8804144b5c08&gt;
[  127.186871] ---[ end trace 7cbac5d1db69f426 ]---

The bug is a race and not always easy to reproduce.  To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.

$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) &gt; /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) &gt; /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done

On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted.  The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
hard reset or a sysrq-b.

The basic problem is a race between page table sharing and teardown.  For
the most part page table sharing depends on i_mmap_mutex.  In some cases,
it is also taking the mm-&gt;page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.

Unfortunately it appears to be also insufficient. Consider the following
situation

Process A					Process B
---------					---------
hugetlb_fault					shmdt
  						LockWrite(mmap_sem)
    						  do_munmap
						    unmap_region
						      unmap_vmas
						        unmap_single_vma
						          unmap_hugepage_range
      						            Lock(i_mmap_mutex)
							    Lock(mm-&gt;page_table_lock)
							    huge_pmd_unshare/unmap tables &lt;--- (1)
							    Unlock(mm-&gt;page_table_lock)
      						            Unlock(i_mmap_mutex)
  huge_pte_alloc				      ...
    Lock(i_mmap_mutex)				      ...
    vma_prio_walk, find svma, spte		      ...
    Lock(mm-&gt;page_table_lock)			      ...
    share spte					      ...
    Unlock(mm-&gt;page_table_lock)			      ...
    Unlock(i_mmap_mutex)			      ...
  hugetlb_no_page									  &lt;--- (2)
						      free_pgtables
						        unlink_file_vma
							hugetlb_free_pgd_range
						    remove_vma_list

In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down.  The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables.  At (1) above,
the page tables are not shared yet so it unmaps the PMDs.  Process A sets
up page table sharing and at (2) faults a new entry.  Process B then trips
up on it in free_pgtables.

This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed.  This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
ineligible for sharing and avoids the race.  Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).

This should be treated as a -stable candidate if it is merged.

Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.

==== CUT HERE ====

static size_t huge_page_size = (2UL &lt;&lt; 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;

unsigned int get_random(unsigned int max)
{
	struct timeval tv;

	gettimeofday(&amp;tv, NULL);
	srandom(tv.tv_usec);
	return random() % max;
}

static void play(void *addr, size_t size)
{
	unsigned char *start = addr,
		      *end = start + size,
		      *a;
	start += get_random(size/2);

	/* we could itterate on huge pages but let's give it more time. */
	for (a = start; a &lt; end; a += 4096)
		*a = 0;
}

int main(int argc, char **argv)
{
	key_t key = IPC_PRIVATE;
	size_t sizeA = nr_huge_page_A * huge_page_size;
	size_t sizeB = nr_huge_page_B * huge_page_size;
	int shmidA, shmidB;
	void *addrA = NULL, *addrB = NULL;
	int nr_children = 300, n = 0;

	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}
	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
		perror("shmget:");
		return 1;
	}

	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
		perror("shmat");
		return 1;
	}

fork_child:
	switch(fork()) {
		case 0:
			switch (n%3) {
			case 0:
				play(addrA, sizeA);
				break;
			case 1:
				play(addrB, sizeB);
				break;
			case 2:
				break;
			}
			break;
		case -1:
			perror("fork:");
			break;
		default:
			if (++n &lt; nr_children)
				goto fork_child;
			play(addrA, sizeA);
			break;
	}
	shmdt(addrA);
	shmdt(addrB);
	do {
		wait(NULL);
	} while (--n &gt; 0);
	shmctl(shmidA, IPC_RMID, NULL);
	shmctl(shmidB, IPC_RMID, NULL);
	return 0;
}

[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Reviewed-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2:
 - Adjust context
 - Drop the mmu_gather * parameters]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>hugepages: fix use after free bug in "quota" handling</title>
<updated>2012-07-25T03:11:01Z</updated>
<author>
<name>David Gibson</name>
<email>david@gibson.dropbear.id.au</email>
</author>
<published>2012-03-21T23:34:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8fc536fc7a502841b732edfc6d5a66f3c138132d'/>
<id>urn:sha1:8fc536fc7a502841b732edfc6d5a66f3c138132d</id>
<content type='text'>
commit 90481622d75715bfcb68501280a917dbfe516029 upstream.

hugetlbfs_{get,put}_quota() are badly named.  They don't interact with the
general quota handling code, and they don't much resemble its behaviour.
Rather than being about maintaining limits on on-disk block usage by
particular users, they are instead about maintaining limits on in-memory
page usage (including anonymous MAP_PRIVATE copied-on-write pages)
associated with a particular hugetlbfs filesystem instance.

Worse, they work by having callbacks to the hugetlbfs filesystem code from
the low-level page handling code, in particular from free_huge_page().
This is a layering violation of itself, but more importantly, if the
kernel does a get_user_pages() on hugepages (which can happen from KVM
amongst others), then the free_huge_page() can be delayed until after the
associated inode has already been freed.  If an unmount occurs at the
wrong time, even the hugetlbfs superblock where the "quota" limits are
stored may have been freed.

Andrew Barry proposed a patch to fix this by having hugepages, instead of
storing a pointer to their address_space and reaching the superblock from
there, had the hugepages store pointers directly to the superblock,
bumping the reference count as appropriate to avoid it being freed.
Andrew Morton rejected that version, however, on the grounds that it made
the existing layering violation worse.

This is a reworked version of Andrew's patch, which removes the extra, and
some of the existing, layering violation.  It works by introducing the
concept of a hugepage "subpool" at the lower hugepage mm layer - that is a
finite logical pool of hugepages to allocate from.  hugetlbfs now creates
a subpool for each filesystem instance with a page limit set, and a
pointer to the subpool gets added to each allocated hugepage, instead of
the address_space pointer used now.  The subpool has its own lifetime and
is only freed once all pages in it _and_ all other references to it (i.e.
superblocks) are gone.

subpools are optional - a NULL subpool pointer is taken by the code to
mean that no subpool limits are in effect.

Previous discussion of this bug found in:  "Fix refcounting in hugetlbfs
quota handling.". See:  https://lkml.org/lkml/2011/8/11/28 or
http://marc.info/?l=linux-mm&amp;m=126928970510627&amp;w=1

v2: Fixed a bug spotted by Hillf Danton, and removed the extra parameter to
alloc_huge_page() - since it already takes the vma, it is not necessary.

Signed-off-by: Andrew Barry &lt;abarry@cray.com&gt;
Signed-off-by: David Gibson &lt;david@gibson.dropbear.id.au&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context to apply after commit
 c50ac050811d6485616a193eb0f37bfbd191cc89 'hugetlb: fix resv_map leak in
 error path', backported in 3.2.20]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>hugetlb: remove dummy definitions of HPAGE_MASK and HPAGE_SIZE</title>
<updated>2011-11-19T11:15:59Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-11-19T10:33:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a5c86e986f0b2fe779f13cf53ce6e9f467b03950'/>
<id>urn:sha1:a5c86e986f0b2fe779f13cf53ce6e9f467b03950</id>
<content type='text'>
Dummy, non-zero definitions for HPAGE_MASK and HPAGE_SIZE were added in
51c6f666fceb ("mm: ZAP_BLOCK causes redundant work") to avoid a divide
by zero in generic kernel code.

That code has since been removed, but probably should never have been
added in the first place: we don't want HPAGE_SIZE to act like PAGE_SIZE
for code that is working with hugepages, for example, when the
dependency on CONFIG_HUGETLB_PAGE has not been fulfilled.

Because hugepage size can differ from architecture to architecture, each
is required to have their own definitions for both HPAGE_MASK and
HPAGE_SIZE.  This is always done in arch/*/include/asm/page.h.

So, just remove the dummy and dangerous definitions since they are no
longer needed and reveals the correct dependencies.  Tested on
architectures using the definitions with allyesconfig: x86 (even with
thp), hppa, mips, powerpc, s390, sh3, sh4, sparc, and sparc64, and with
defconfig on ia64.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>hugetlb: add phys addr to struct huge_bootmem_page</title>
<updated>2011-07-26T03:57:07Z</updated>
<author>
<name>Becky Bruce</name>
<email>beckyb@kernel.crashing.org</email>
</author>
<published>2011-07-26T00:11:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ee8f248d266ec6966c0ce6b7dec24de43dcc1b58'/>
<id>urn:sha1:ee8f248d266ec6966c0ce6b7dec24de43dcc1b58</id>
<content type='text'>
This is needed on HIGHMEM systems - we don't always have a virtual
address so store the physical address and map it in as needed.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Becky Bruce &lt;beckyb@kernel.crashing.org&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Fix build with !HUGETLBFS</title>
<updated>2011-05-26T19:03:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2011-05-26T19:03:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=be93d8cfbae1996052e91b2883d306a5d9d0fe18'/>
<id>urn:sha1:be93d8cfbae1996052e91b2883d306a5d9d0fe18</id>
<content type='text'>
I stupidly broke the case of CONFIG_HUGETLBFS=n when doing the
conversion to vm_flags_t in commit ca16d140af91 ("mm: don't access
vm_flags as 'int'").  And my 'allyesconfig' build didn't find it, for
obvious reasons..

Include &lt;linux/mm_types.h&gt; in &lt;linux/hugetlb.h&gt;.  The problem could have
been avoided by just turning the hugetlb_file_setup() error wrapper into
a macro, but mm_types.h is a reasonable include in this file.

Reported-by: Richard -rw- Weinberger &lt;richard.weinberger@gmail.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: don't access vm_flags as 'int'</title>
<updated>2011-05-26T16:20:31Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-05-26T10:16:19Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ca16d140af91febe25daeb9e032bf8bd46b8c31f'/>
<id>urn:sha1:ca16d140af91febe25daeb9e032bf8bd46b8c31f</id>
<content type='text'>
The type of vma-&gt;vm_flags is 'unsigned long'. Neither 'int' nor
'unsigned int'. This patch fixes such misuse.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
[ Changed to use a typedef - we'll extend it to cover more cases
  later, since there has been discussion about making it a 64-bit
  type..                      - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Encode huge page size for VM_FAULT_HWPOISON errors</title>
<updated>2010-10-08T07:32:46Z</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2010-10-06T19:45:00Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa50d3a7aa8147b9e14dc9d5972a5d2359db4ef8'/>
<id>urn:sha1:aa50d3a7aa8147b9e14dc9d5972a5d2359db4ef8</id>
<content type='text'>
This fixes a problem introduced with the hugetlb hwpoison handling

The user space SIGBUS signalling wants to know the size of the hugepage
that caused a HWPOISON fault.

Unfortunately the architecture page fault handlers do not have easy
access to the struct page.

Pass the information out in the fault error code instead.

I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode
the hpage index in some free upper bits of the fault code. The small
page hwpoison keeps stays with the VM_FAULT_HWPOISON name to minimize
changes.

Also add code to hugetlb.h to convert that index into a page shift.

Will be used in a further patch.

Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
</content>
</entry>
<entry>
<title>HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()</title>
<updated>2010-10-08T07:32:45Z</updated>
<author>
<name>Naoya Horiguchi</name>
<email>n-horiguchi@ah.jp.nec.com</email>
</author>
<published>2010-09-08T01:19:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6de2b1aab94355482bd2accdc115666509667458'/>
<id>urn:sha1:6de2b1aab94355482bd2accdc115666509667458</id>
<content type='text'>
This check is necessary to avoid race between dequeue and allocation,
which can cause a free hugepage to be dequeued twice and get kernel unstable.

Signed-off-by: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Signed-off-by: Wu Fengguang &lt;fengguang.wu@intel.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Reviewed-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
</content>
</entry>
</feed>
