<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/mm, branch v3.4.40</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.40</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.40'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-04-12T16:38:47Z</updated>
<entry>
<title>mm: prevent mmap_cache race in find_vma()</title>
<updated>2013-04-12T16:38:47Z</updated>
<author>
<name>Jan Stancek</name>
<email>jstancek@redhat.com</email>
</author>
<published>2013-04-08T20:00:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cdc8958a19b38773292cb6916f77f1e3bdf6d0fc'/>
<id>urn:sha1:cdc8958a19b38773292cb6916f77f1e3bdf6d0fc</id>
<content type='text'>
commit b6a9b7f6b1f21735a7456d534dc0e68e61359d2c upstream.

find_vma() can be called by multiple threads with read lock
held on mm-&gt;mmap_sem and any of them can update mm-&gt;mmap_cache.
Prevent compiler from re-fetching mm-&gt;mmap_cache, because other
readers could update it in the meantime:

               thread 1                             thread 2
                                        |
  find_vma()                            |  find_vma()
    struct vm_area_struct *vma = NULL;  |
    vma = mm-&gt;mmap_cache;               |
    if (!(vma &amp;&amp; vma-&gt;vm_end &gt; addr     |
        &amp;&amp; vma-&gt;vm_start &lt;= addr)) {    |
                                        |    mm-&gt;mmap_cache = vma;
    return vma;                         |
     ^^ compiler may optimize this      |
        local variable out and re-read  |
        mm-&gt;mmap_cache                  |

This issue can be reproduced with gcc-4.8.0-1 on s390x by running
mallocstress testcase from LTP, which triggers:

  kernel BUG at mm/rmap.c:1088!
    Call Trace:
     ([&lt;000003d100c57000&gt;] 0x3d100c57000)
      [&lt;000000000023a1c0&gt;] do_wp_page+0x2fc/0xa88
      [&lt;000000000023baae&gt;] handle_pte_fault+0x41a/0xac8
      [&lt;000000000023d832&gt;] handle_mm_fault+0x17a/0x268
      [&lt;000000000060507a&gt;] do_protection_exception+0x1e2/0x394
      [&lt;0000000000603a04&gt;] pgm_check_handler+0x138/0x13c
      [&lt;000003fffcf1f07a&gt;] 0x3fffcf1f07a
    Last Breaking-Event-Address:
      [&lt;000000000024755e&gt;] page_add_new_anon_rmap+0xc2/0x168

Thanks to Jakub Jelinek for his insight on gcc and helping to
track this down.

Signed-off-by: Jan Stancek &lt;jstancek@redhat.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.2: adjust context, indentation]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/hugetlb: fix total hugetlbfs pages count when using memory overcommit accouting</title>
<updated>2013-03-28T19:12:12Z</updated>
<author>
<name>Wanpeng Li</name>
<email>liwanp@linux.vnet.ibm.com</email>
</author>
<published>2013-03-22T22:04:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bea7cce4319edd8c15d78833f9b5bb752f997662'/>
<id>urn:sha1:bea7cce4319edd8c15d78833f9b5bb752f997662</id>
<content type='text'>
commit d00285884c0892bb1310df96bce6056e9ce9b9d9 upstream.

hugetlb_total_pages is used for overcommit calculations but the current
implementation considers only the default hugetlb page size (which is
either the first defined hugepage size or the one specified by
default_hugepagesz kernel boot parameter).

If the system is configured for more than one hugepage size, which is
possible since commit a137e1cc6d6e ("hugetlbfs: per mount huge page
sizes") then the overcommit estimation done by __vm_enough_memory()
(resp.  shown by meminfo_proc_show) is not precise - there is an
impression of more available/allowed memory.  This can lead to an
unexpected ENOMEM/EFAULT resp.  SIGSEGV when memory is accounted.

Testcase:
  boot: hugepagesz=1G hugepages=1
  the default overcommit ratio is 50
  before patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     55434168 kB

  after patch:

    egrep 'CommitLimit' /proc/meminfo
    CommitLimit:     54909880 kB

[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys</title>
<updated>2013-03-14T18:29:51Z</updated>
<author>
<name>Mathieu Desnoyers</name>
<email>mathieu.desnoyers@efficios.com</email>
</author>
<published>2013-02-25T15:20:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3126603e01babcec7cfe2f284099e2adff095bff'/>
<id>urn:sha1:3126603e01babcec7cfe2f284099e2adff095bff</id>
<content type='text'>
commit 8aec0f5d4137532de14e6554fd5dd201ff3a3c49 upstream.

Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
compat_process_vm_rw() shows that the compatibility code requires an
explicit "access_ok()" check before calling
compat_rw_copy_check_uvector(). The same difference seems to appear when
we compare fs/read_write.c:do_readv_writev() to
fs/compat.c:compat_do_readv_writev().

This subtle difference between the compat and non-compat requirements
should probably be debated, as it seems to be error-prone. In fact,
there are two others sites that use this function in the Linux kernel,
and they both seem to get it wrong:

Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
also ends up calling compat_rw_copy_check_uvector() through
aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
be missing. Same situation for
security/keys/compat.c:compat_keyctl_instantiate_key_iov().

I propose that we add the access_ok() check directly into
compat_rw_copy_check_uvector(), so callers don't have to worry about it,
and it therefore makes the compat call code similar to its non-compat
counterpart. Place the access_ok() check in the same location where
copy_from_user() can trigger a -EFAULT error in the non-compat code, so
the ABI behaviors are alike on both compat and non-compat.

While we are here, fix compat_do_readv_writev() so it checks for
compat_rw_copy_check_uvector() negative return values.

And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
handling.

Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Acked-by: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Signed-off-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages</title>
<updated>2013-02-28T14:59:01Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2013-02-23T00:35:59Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f4ce5b3f263c05e7af66fa4eba7c4eb8e1b006bf'/>
<id>urn:sha1:f4ce5b3f263c05e7af66fa4eba7c4eb8e1b006bf</id>
<content type='text'>
commit 67d46b296a1ba1477c0df8ff3bc5e0167a0b0732 upstream.

Rob van der Heij reported the following (paraphrased) on private mail.

	The scenario is that I want to avoid backups to fill up the page
	cache and purge stuff that is more likely to be used again (this is
	with s390x Linux on z/VM, so I don't give it as much memory that
	we don't care anymore). So I have something with LD_PRELOAD that
	intercepts the close() call (from tar, in this case) and issues
	a posix_fadvise() just before closing the file.

	This mostly works, except for small files (less than 14 pages)
	that remains in page cache after the face.

Unfortunately Rob has not had a chance to test this exact patch but the
test program below should be reproducing the problem he described.

The issue is the per-cpu pagevecs for LRU additions.  If the pages are
added by one CPU but fadvise() is called on another then the pages
remain resident as the invalidate_mapping_pages() only drains the local
pagevecs via its call to pagevec_release().  The user-visible effect is
that a program that uses fadvise() properly is not obeyed.

A possible fix for this is to put the necessary smarts into
invalidate_mapping_pages() to globally drain the LRU pagevecs if a
pagevec page could not be discarded.  The downside with this is that an
inode cache shrink would send a global IPI and memory pressure
potentially causing global IPI storms is very undesirable.

Instead, this patch adds a check during fadvise(POSIX_FADV_DONTNEED) to
check if invalidate_mapping_pages() discarded all the requested pages.
If a subset of pages are discarded it drains the LRU pagevecs and tries
again.  If the second attempt fails, it assumes it is due to the pages
being mapped, locked or dirty and does not care.  With this patch, an
application using fadvise() correctly will be obeyed but there is a
downside that a malicious application can force the kernel to send
global IPIs and increase overhead.

If accepted, I would like this to be considered as a -stable candidate.
It's not an urgent issue but it's a system call that is not working as
advertised which is weak.

The following test program demonstrates the problem.  It should never
report that pages are still resident but will without this patch.  It
assumes that CPU 0 and 1 exist.

int main() {
	int fd;
	int pagesize = getpagesize();
	ssize_t written = 0, expected;
	char *buf;
	unsigned char *vec;
	int resident, i;
	cpu_set_t set;

	/* Prepare a buffer for writing */
	expected = FILESIZE_PAGES * pagesize;
	buf = malloc(expected + 1);
	if (buf == NULL) {
		printf("ENOMEM\n");
		exit(EXIT_FAILURE);
	}
	buf[expected] = 0;
	memset(buf, 'a', expected);

	/* Prepare the mincore vec */
	vec = malloc(FILESIZE_PAGES);
	if (vec == NULL) {
		printf("ENOMEM\n");
		exit(EXIT_FAILURE);
	}

	/* Bind ourselves to CPU 0 */
	CPU_ZERO(&amp;set);
	CPU_SET(0, &amp;set);
	if (sched_setaffinity(getpid(), sizeof(set), &amp;set) == -1) {
		perror("sched_setaffinity");
		exit(EXIT_FAILURE);
	}

	/* open file, unlink and write buffer */
	fd = open("fadvise-test-file", O_CREAT|O_EXCL|O_RDWR);
	if (fd == -1) {
		perror("open");
		exit(EXIT_FAILURE);
	}
	unlink("fadvise-test-file");
	while (written &lt; expected) {
		ssize_t this_write;
		this_write = write(fd, buf + written, expected - written);

		if (this_write == -1) {
			perror("write");
			exit(EXIT_FAILURE);
		}

		written += this_write;
	}
	free(buf);

	/*
	 * Force ourselves to another CPU. If fadvise only flushes the local
	 * CPUs pagevecs then the fadvise will fail to discard all file pages
	 */
	CPU_ZERO(&amp;set);
	CPU_SET(1, &amp;set);
	if (sched_setaffinity(getpid(), sizeof(set), &amp;set) == -1) {
		perror("sched_setaffinity");
		exit(EXIT_FAILURE);
	}

	/* sync and fadvise to discard the page cache */
	fsync(fd);
	if (posix_fadvise(fd, 0, expected, POSIX_FADV_DONTNEED) == -1) {
		perror("posix_fadvise");
		exit(EXIT_FAILURE);
	}

	/* map the file and use mincore to see which parts of it are resident */
	buf = mmap(NULL, expected, PROT_READ, MAP_SHARED, fd, 0);
	if (buf == NULL) {
		perror("mmap");
		exit(EXIT_FAILURE);
	}
	if (mincore(buf, expected, vec) == -1) {
		perror("mincore");
		exit(EXIT_FAILURE);
	}

	/* Check residency */
	for (i = 0, resident = 0; i &lt; FILESIZE_PAGES; i++) {
		if (vec[i])
			resident++;
	}
	if (resident != 0) {
		printf("Nr unexpected pages resident: %d\n", resident);
		exit(EXIT_FAILURE);
	}

	munmap(buf, expected);
	close(fd);
	free(vec);
	exit(EXIT_SUCCESS);
}

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reported-by: Rob van der Heij &lt;rvdheij@gmail.com&gt;
Tested-by: Rob van der Heij &lt;rvdheij@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>tmpfs: fix use-after-free of mempolicy object</title>
<updated>2013-02-28T14:59:01Z</updated>
<author>
<name>Greg Thelen</name>
<email>gthelen@google.com</email>
</author>
<published>2013-02-23T00:36:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=06c7976c91566d676766ad57b574757f6d0abd02'/>
<id>urn:sha1:06c7976c91566d676766ad57b574757f6d0abd02</id>
<content type='text'>
commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request.  A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [&lt;          (null)&gt;]           (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault.  Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &amp;config, true)
    mpol_put(sbinfo-&gt;mpol)
    sbinfo-&gt;mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3.  I did
not look back further.

Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mmu_notifier_unregister NULL Pointer deref and multiple -&gt;release() callouts</title>
<updated>2013-02-28T14:59:00Z</updated>
<author>
<name>Robin Holt</name>
<email>holt@sgi.com</email>
</author>
<published>2013-02-23T00:35:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7fecb64cca9a98f60ff4bc794bb353512bd5c665'/>
<id>urn:sha1:7fecb64cca9a98f60ff4bc794bb353512bd5c665</id>
<content type='text'>
commit 751efd8610d3d7d67b7bdf7f62646edea7365dd7 upstream.

There is a race condition between mmu_notifier_unregister() and
__mmu_notifier_release().

Assume two tasks, one calling mmu_notifier_unregister() as a result of a
filp_close() -&gt;flush() callout (task A), and the other calling
mmu_notifier_release() from an mmput() (task B).

                A                               B
t1                                              srcu_read_lock()
t2              if (!hlist_unhashed())
t3                                              srcu_read_unlock()
t4              srcu_read_lock()
t5                                              hlist_del_init_rcu()
t6                                              synchronize_srcu()
t7              srcu_read_unlock()
t8              hlist_del_rcu()  &lt;--- NULL pointer deref.

Additionally, the list traversal in __mmu_notifier_release() is not
protected by the by the mmu_notifier_mm-&gt;hlist_lock which can result in
callouts to the -&gt;release() notifier from both mmu_notifier_unregister()
and __mmu_notifier_release().

-stable suggestions:

The stable trees prior to 3.7.y need commits 21a92735f660 and
70400303ce0c cherry-picked in that order prior to cherry-picking this
commit.  The 3.7.y tree already has those two commits.

Signed-off-by: Robin Holt &lt;holt@sgi.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Wanpeng Li &lt;liwanp@linux.vnet.ibm.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@linux.vnet.ibm.com&gt;
Cc: Avi Kivity &lt;avi@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Marcelo Tosatti &lt;mtosatti@redhat.com&gt;
Cc: Sagi Grimberg &lt;sagig@mellanox.co.il&gt;
Cc: Haggai Eran &lt;haggaie@mellanox.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: mmu_notifier: make the mmu_notifier srcu static</title>
<updated>2013-02-28T14:59:00Z</updated>
<author>
<name>Andrea Arcangeli</name>
<email>aarcange@redhat.com</email>
</author>
<published>2012-10-08T23:31:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d0730c2bf33c1cdab392f88013be6b5ebacf0f29'/>
<id>urn:sha1:d0730c2bf33c1cdab392f88013be6b5ebacf0f29</id>
<content type='text'>
commit 70400303ce0c4ced3139499c676d5c79636b0c72 upstream.

The variable must be static especially given the variable name.

s/RCU/SRCU/ over a few comments.

Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@linux.vnet.ibm.com&gt;
Cc: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Haggai Eran &lt;haggaie@mellanox.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may safely schedule</title>
<updated>2013-02-28T14:59:00Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagig@mellanox.co.il</email>
</author>
<published>2012-10-08T23:29:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4209ee0d3f7992af3903b5f9a3359f6d2f597c4b'/>
<id>urn:sha1:4209ee0d3f7992af3903b5f9a3359f6d2f597c4b</id>
<content type='text'>
commit 21a92735f660eaecf69a6f2e777f18463760ec32 upstream.

With an RCU based mmu_notifier implementation, any callout to
mmu_notifier_invalidate_range_{start,end}() or
mmu_notifier_invalidate_page() would not be allowed to call schedule()
as that could potentially allow a modification to the mmu_notifier
structure while it is currently being used.

Since srcu allocs 4 machine words per instance per cpu, we may end up
with memory exhaustion if we use srcu per mm.  So all mms share a global
srcu.  Note that during large mmu_notifier activity exit &amp; unregister
paths might hang for longer periods, but it is tolerable for current
mmu_notifier clients.

Signed-off-by: Sagi Grimberg &lt;sagig@mellanox.co.il&gt;
Signed-off-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Haggai Eran &lt;haggaie@mellanox.com&gt;
Cc: "Paul E. McKenney" &lt;paulmck@us.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: fix pageblock bitmap allocation</title>
<updated>2013-02-28T14:58:58Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-02-18T17:58:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=17c2f96bd1de5999900303b3eb923b8cdf6ee2ba'/>
<id>urn:sha1:17c2f96bd1de5999900303b3eb923b8cdf6ee2ba</id>
<content type='text'>
commit 7c45512df987c5619db041b5c9b80d281e26d3db upstream.

Commit c060f943d092 ("mm: use aligned zone start for pfn_to_bitidx
calculation") fixed out calculation of the index into the pageblock
bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.

However, the _allocation_ of that bitmap had never taken this alignment
requirement into accout, so depending on the exact size and alignment of
the zone, the use of that index could then access past the allocation,
resulting in some very subtle memory corruption.

This was reported (and bisected) by Ingo Molnar: one of his random
config builds would hang with certain very specific kernel command line
options.

In the meantime, commit c060f943d092 has been marked for stable, so this
fix needs to be back-ported to the stable kernels that backported the
commit to use the right alignment.

Bisected-and-tested-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>thp, memcg: split hugepage for memcg oom on cow</title>
<updated>2013-01-17T16:50:53Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-05-29T22:06:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9035b20988416173a48ad2f00190846faf15ccb7'/>
<id>urn:sha1:9035b20988416173a48ad2f00190846faf15ccb7</id>
<content type='text'>
commit 1f1d06c34f7675026326cd9f39ff91e4555cf355 upstream.

On COW, a new hugepage is allocated and charged to the memcg.  If the
system is oom or the charge to the memcg fails, however, the fault
handler will return VM_FAULT_OOM which results in an oom kill.

Instead, it's possible to fallback to splitting the hugepage so that the
COW results only in an order-0 page being allocated and charged to the
memcg which has a higher liklihood to succeed.  This is expensive
because the hugepage must be split in the page fault handler, but it is
much better than unnecessarily oom killing a process.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
