<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/mm/compaction.c, branch v3.4.93</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.93</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.4.93'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-01-17T16:50:43Z</updated>
<entry>
<title>mm: compaction: fix echo 1 &gt; compact_memory return error issue</title>
<updated>2013-01-17T16:50:43Z</updated>
<author>
<name>Jason Liu</name>
<email>r64343@freescale.com</email>
</author>
<published>2013-01-11T22:31:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c0b96525363543a1ba6a277546ebc26ad9a53aa1'/>
<id>urn:sha1:c0b96525363543a1ba6a277546ebc26ad9a53aa1</id>
<content type='text'>
commit 7964c06d66c76507d8b6b662bffea770c29ef0ce upstream.

When running the following command under a shell, it returns an error:

  sh/$ echo 1 &gt; /proc/sys/vm/compact_memory
  sh/$ sh: write error: Bad address

After strace, I found the following log:

  ...
  write(1, "1\n", 2)               = 3
  write(1, "", 4294967295)         = -1 EFAULT (Bad address)
  write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address
  ) = 31

This shows that the kernel returned 3 (COMPACT_COMPLETE) after the data was
written to compact_memory.

The fix is to make the kernel return 0 instead of 3 (COMPACT_COMPLETE) from
sysctl_compaction_handler() after compact_nodes() has finished.

Signed-off-by: Jason Liu &lt;r64343@freescale.com&gt;
Suggested-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm, thp: abort compaction if migration page cannot be charged to memcg</title>
<updated>2012-07-16T16:04:44Z</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2012-07-11T21:02:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a08b440fa93e036968102597c8a2ab809a9bdc4'/>
<id>urn:sha1:7a08b440fa93e036968102597c8a2ab809a9bdc4</id>
<content type='text'>
commit 4bf2bba3750f10aa9e62e6949bc7e8329990f01b upstream.

If page migration cannot charge the temporary page to the memcg,
migrate_pages() will return -ENOMEM.  Memory compaction does not consider
this, however, and its loop continues to iterate over all pageblocks,
trying to isolate and migrate pages.  If a small number of very large
memcgs happen to be oom, these attempts will mostly be futile, leading to
an enormous amount of CPU consumption due to the page migration failures.

This patch will short circuit and fail memory compaction if
migrate_pages() returns -ENOMEM.  COMPACT_PARTIAL is returned in case
some migrations were successful so that the page allocator will retry.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Kamezawa Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mm: compaction: make compact_control order signed</title>
<updated>2012-03-22T00:54:56Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2012-03-21T23:33:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aad6ec3777bf4930d4f7293745cc4c17a2d87947'/>
<id>urn:sha1:aad6ec3777bf4930d4f7293745cc4c17a2d87947</id>
<content type='text'>
"order" is -1 when compacting via /proc/sys/vm/compact_memory.  Making
it unsigned causes a bug in __compact_pgdat() when we test:

	if (cc-&gt;order &lt; 0 || !compaction_deferred(zone, cc-&gt;order))
		compact_zone(zone, cc);

[akpm@linux-foundation.org: make __compact_pgdat()'s comparison match other code sites]
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Minchan Kim &lt;minchan@kernel.org&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>compact_pgdat: workaround lockdep warning in kswapd</title>
<updated>2012-03-22T00:54:56Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2012-03-21T23:33:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8575ec29f61da83a2bf382c8c490499dc022101e'/>
<id>urn:sha1:8575ec29f61da83a2bf382c8c490499dc022101e</id>
<content type='text'>
I get this lockdep warning from swapping load on linux-next, due to
"vmscan: kswapd carefully call compaction".

=================================
[ INFO: inconsistent lock state ]
3.3.0-rc2-next-20120201 #5 Not tainted
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -&gt; {IN-RECLAIM_FS-W} usage.
kswapd0/28 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (pcpu_alloc_mutex){+.+.?.}, at: [&lt;ffffffff810d6684&gt;] pcpu_alloc+0x67/0x325
{RECLAIM_FS-ON-W} state was registered at:
  [&lt;ffffffff81099b75&gt;] mark_held_locks+0xd7/0x103
  [&lt;ffffffff8109a13c&gt;] lockdep_trace_alloc+0x85/0x9e
  [&lt;ffffffff810f6bdc&gt;] __kmalloc+0x6c/0x14b
  [&lt;ffffffff810d57fd&gt;] pcpu_mem_zalloc+0x59/0x62
  [&lt;ffffffff810d5d16&gt;] pcpu_extend_area_map+0x26/0xb1
  [&lt;ffffffff810d679f&gt;] pcpu_alloc+0x182/0x325
  [&lt;ffffffff810d694d&gt;] __alloc_percpu+0xb/0xd
  [&lt;ffffffff8142ebfd&gt;] snmp_mib_init+0x1e/0x2e
  [&lt;ffffffff8185cd8d&gt;] ipv4_mib_init_net+0x7a/0x184
  [&lt;ffffffff813dc963&gt;] ops_init.clone.0+0x6b/0x73
  [&lt;ffffffff813dc9cc&gt;] register_pernet_operations+0x61/0xa0
  [&lt;ffffffff813dca8e&gt;] register_pernet_subsys+0x29/0x42
  [&lt;ffffffff8185d044&gt;] inet_init+0x1ad/0x252
  [&lt;ffffffff810002e3&gt;] do_one_initcall+0x7a/0x12f
  [&lt;ffffffff81832bc5&gt;] kernel_init+0x9d/0x11e
  [&lt;ffffffff814e51e4&gt;] kernel_thread_helper+0x4/0x10
irq event stamp: 656613
hardirqs last  enabled at (656613): [&lt;ffffffff814e0ddc&gt;] __mutex_unlock_slowpath+0x104/0x128
hardirqs last disabled at (656612): [&lt;ffffffff814e0d34&gt;] __mutex_unlock_slowpath+0x5c/0x128
softirqs last  enabled at (655568): [&lt;ffffffff8105b4a5&gt;] __do_softirq+0x120/0x136
softirqs last disabled at (654757): [&lt;ffffffff814e52dc&gt;] call_softirq+0x1c/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(pcpu_alloc_mutex);
  &lt;Interrupt&gt;
    lock(pcpu_alloc_mutex);

 *** DEADLOCK ***

no locks held by kswapd0/28.

stack backtrace:
Pid: 28, comm: kswapd0 Not tainted 3.3.0-rc2-next-20120201 #5
Call Trace:
 [&lt;ffffffff810981f4&gt;] print_usage_bug+0x1bf/0x1d0
 [&lt;ffffffff81096c3e&gt;] ? print_irq_inversion_bug+0x1d9/0x1d9
 [&lt;ffffffff810982c0&gt;] mark_lock_irq+0xbb/0x22e
 [&lt;ffffffff810c5399&gt;] ? free_hot_cold_page+0x13d/0x14f
 [&lt;ffffffff81098684&gt;] mark_lock+0x251/0x331
 [&lt;ffffffff81098893&gt;] mark_irqflags+0x12f/0x141
 [&lt;ffffffff81098e32&gt;] __lock_acquire+0x58d/0x753
 [&lt;ffffffff810d6684&gt;] ? pcpu_alloc+0x67/0x325
 [&lt;ffffffff81099433&gt;] lock_acquire+0x54/0x6a
 [&lt;ffffffff810d6684&gt;] ? pcpu_alloc+0x67/0x325
 [&lt;ffffffff8107a5b8&gt;] ? add_preempt_count+0xa9/0xae
 [&lt;ffffffff814e0a21&gt;] mutex_lock_nested+0x5e/0x315
 [&lt;ffffffff810d6684&gt;] ? pcpu_alloc+0x67/0x325
 [&lt;ffffffff81098f81&gt;] ? __lock_acquire+0x6dc/0x753
 [&lt;ffffffff810c9fb0&gt;] ? __pagevec_release+0x2c/0x2c
 [&lt;ffffffff810d6684&gt;] pcpu_alloc+0x67/0x325
 [&lt;ffffffff810c9fb0&gt;] ? __pagevec_release+0x2c/0x2c
 [&lt;ffffffff810d694d&gt;] __alloc_percpu+0xb/0xd
 [&lt;ffffffff8106c35e&gt;] schedule_on_each_cpu+0x23/0x110
 [&lt;ffffffff810c9fcb&gt;] lru_add_drain_all+0x10/0x12
 [&lt;ffffffff810f126f&gt;] __compact_pgdat+0x20/0x182
 [&lt;ffffffff810f15c2&gt;] compact_pgdat+0x27/0x29
 [&lt;ffffffff810c306b&gt;] ? zone_watermark_ok+0x1a/0x1c
 [&lt;ffffffff810cdf6f&gt;] balance_pgdat+0x732/0x751
 [&lt;ffffffff810ce0ed&gt;] kswapd+0x15f/0x178
 [&lt;ffffffff810cdf8e&gt;] ? balance_pgdat+0x751/0x751
 [&lt;ffffffff8106fd11&gt;] kthread+0x84/0x8c
 [&lt;ffffffff814e51e4&gt;] kernel_thread_helper+0x4/0x10
 [&lt;ffffffff810787ed&gt;] ? finish_task_switch+0x85/0xea
 [&lt;ffffffff814e3861&gt;] ? retint_restore_args+0xe/0xe
 [&lt;ffffffff8106fc8d&gt;] ? __init_kthread_worker+0x56/0x56
 [&lt;ffffffff814e51e0&gt;] ? gs_change+0xb/0xb

The RECLAIM_FS notations indicate that it's doing the GFP_FS checking that
Nick hacked into lockdep a while back: I think we're intended to read that
"&lt;Interrupt&gt;" in the DEADLOCK scenario as "&lt;Direct reclaim&gt;".

I'm hazy; I have not reached any conclusion as to whether it's right to
complain or not, but I believe lockdep is uneasy about kswapd now doing the
mutex_lock(&amp;pcpu_alloc_mutex) which lru_add_drain_all() entails.  Nor have
I reached any conclusion as to whether it's important for kswapd to do
that draining or not.

But so as not to get blocked on this, with lockdep disabled from giving
further reports, here's a patch which removes the lru_add_drain_all() from
kswapd's callpath (and calls it only once from compact_nodes(), instead of
once per node).

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmscan: only defer compaction for failed order and higher</title>
<updated>2012-03-22T00:54:56Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2012-03-21T23:33:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aff622495c9a0b56148192e53bdec539f5e147f2'/>
<id>urn:sha1:aff622495c9a0b56148192e53bdec539f5e147f2</id>
<content type='text'>
Currently a failed order-9 (transparent hugepage) compaction can lead to
memory compaction being temporarily disabled for a memory zone, even if we
only need compaction for an order-2 allocation, e.g. for jumbo frame
networking.

The fix is relatively straightforward: keep track of the highest order at
which compaction is succeeding, and only defer compaction for orders at
which compaction is failing.

Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmscan: kswapd carefully call compaction</title>
<updated>2012-03-22T00:54:56Z</updated>
<author>
<name>Rik van Riel</name>
<email>riel@redhat.com</email>
</author>
<published>2012-03-21T23:33:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7be62de99adcab4449d416977b4274985c5fe023'/>
<id>urn:sha1:7be62de99adcab4449d416977b4274985c5fe023</id>
<content type='text'>
With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous
free pages, even when it is woken for a higher order request.

This can be bad for, e.g., jumbo frame network allocations, which are done
from interrupt context and cannot compact memory themselves.  Allocation
failure rates in the network receive path have been observed to be higher
than before in kernels with compaction enabled.

Teach kswapd to defragment the memory zones in a node, but only if
required and compaction is not deferred in a zone.

[akpm@linux-foundation.org: reduce scope of zones_need_compaction]
Signed-off-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: check for overlapping nodes during isolation for migration</title>
<updated>2012-02-09T03:03:51Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-02-09T01:13:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dc9086004b3d5db75997a645b3fe08d9138b7ad0'/>
<id>urn:sha1:dc9086004b3d5db75997a645b3fe08d9138b7ad0</id>
<content type='text'>
When isolating pages for migration, migration starts at the start of a
zone while the free scanner starts at the end of the zone.  Migration
avoids entering a new zone by never going beyond the free scanner.

Unfortunately, in very rare cases nodes can overlap.  When this happens,
migration isolates pages without the LRU lock held, corrupting lists
which will trigger errors in reclaim or during page free such as in the
following oops

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [&lt;ffffffff810f795c&gt;] free_pcppages_bulk+0xcc/0x450
  PGD 1dda554067 PUD 1e1cb58067 PMD 0
  Oops: 0000 [#1] SMP
  CPU 37
  Pid: 17088, comm: memcg_process_s Tainted: G            X
  RIP: free_pcppages_bulk+0xcc/0x450
  Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
  Call Trace:
    free_hot_cold_page+0x17e/0x1f0
    __pagevec_free+0x90/0xb0
    release_pages+0x22a/0x260
    pagevec_lru_move_fn+0xf3/0x110
    putback_lru_page+0x66/0xe0
    unmap_and_move+0x156/0x180
    migrate_pages+0x9e/0x1b0
    compact_zone+0x1f3/0x2f0
    compact_zone_order+0xa2/0xe0
    try_to_compact_pages+0xdf/0x110
    __alloc_pages_direct_compact+0xee/0x1c0
    __alloc_pages_slowpath+0x370/0x830
    __alloc_pages_nodemask+0x1b1/0x1c0
    alloc_pages_vma+0x9b/0x160
    do_huge_pmd_anonymous_page+0x160/0x270
    do_page_fault+0x207/0x4c0
    page_fault+0x25/0x30

The "X" in the taint flag means that external modules were loaded, but
that is unrelated to the bug triggering.  The real problem is that the
PFN layout looks like this:

  Zone PFN ranges:
    DMA      0x00000010 -&gt; 0x00001000
    DMA32    0x00001000 -&gt; 0x00100000
    Normal   0x00100000 -&gt; 0x01e80000
  Movable zone start PFN for each node
  early_node_map[14] active PFN ranges
      0: 0x00000010 -&gt; 0x0000009b
      0: 0x00000100 -&gt; 0x0007a1ec
      0: 0x0007a354 -&gt; 0x0007a379
      0: 0x0007f7ff -&gt; 0x0007f800
      0: 0x00100000 -&gt; 0x00680000
      1: 0x00680000 -&gt; 0x00e80000
      0: 0x00e80000 -&gt; 0x01080000
      1: 0x01080000 -&gt; 0x01280000
      0: 0x01280000 -&gt; 0x01480000
      1: 0x01480000 -&gt; 0x01680000
      0: 0x01680000 -&gt; 0x01880000
      1: 0x01880000 -&gt; 0x01a80000
      0: 0x01a80000 -&gt; 0x01c80000
      1: 0x01c80000 -&gt; 0x01e80000

The fix is straightforward.  isolate_migratepages() has to make a check
similar to isolate_freepages() to ensure that it never isolates pages
from a zone it does not hold the LRU lock for.

This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
and current mainline.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration</title>
<updated>2012-02-04T00:16:41Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-02-03T23:37:18Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0bf380bc70ecba68cb4d74dc656cc2fa8c4d801a'/>
<id>urn:sha1:0bf380bc70ecba68cb4d74dc656cc2fa8c4d801a</id>
<content type='text'>
When isolating for migration, migration starts at the start of a zone
which is not necessarily pageblock aligned.  Further, it stops isolating
when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
an invalid PFN which can result in a crash.  This was originally reported
against a 3.0-based kernel with the following trace in a crash dump.

PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
 #0 [d72d3ad0] crash_kexec at c028cfdb
 #1 [d72d3b24] oops_end at c05c5322
 #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
 #3 [d72d3bec] bad_area at c0227fb6
 #4 [d72d3c00] do_page_fault at c05c72ec
 #5 [d72d3c80] error_code (via page_fault) at c05c47a4
    EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
    DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
    CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
 #6 [d72d3cb4] isolate_migratepages at c030b15a
 #7 [d72d3d14] zone_watermark_ok at c02d26cb
 #8 [d72d3d2c] compact_zone at c030b8de
 #9 [d72d3d68] compact_zone_order at c030bba1
#10 [d72d3db4] try_to_compact_pages at c030bc84
#11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
#12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
#13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
#14 [d72d3eb8] alloc_pages_vma at c030a845
#15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
#16 [d72d3f00] handle_mm_fault at c02f36c6
#17 [d72d3f30] do_page_fault at c05c70ed
#18 [d72d3fb0] error_code (via page_fault) at c05c47a4
    EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
    DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
    SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
    CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202

It was also reported by Herbert van den Bergh against 3.1-based kernel
with the following snippet from the console log.

BUG: unable to handle kernel paging request at 01c00008
IP: [&lt;c0522399&gt;] isolate_migratepages+0x119/0x390
*pdpt = 000000002f7ce001 *pde = 0000000000000000

It is expected that it also affects 3.2.x and current mainline.

The problem is that pfn_valid is only called on the first PFN being
checked, and that PFN is not necessarily aligned.  Let's say we have a
case like this:

H = MAX_ORDER_NR_PAGES boundary
| = pageblock boundary
m = cc-&gt;migrate_pfn
f = cc-&gt;free_pfn
o = memory hole

H------|------H------|----m-Hoooooo|ooooooH-f----|------H

The migrate_pfn is just below a memory hole and the free scanner is beyond
the hole.  When isolate_migratepages() starts, it scans from migrate_pfn to
migrate_pfn+pageblock_nr_pages, which is now in a memory hole.  It checks
pfn_valid() on the first PFN but then scans into the hole, where there are
not necessarily valid struct pages.

This patch ensures that isolate_migratepages calls pfn_valid when
necessary.

Reported-by: Herbert van den Bergh &lt;herbert.van.den.bergh@oracle.com&gt;
Tested-by: Herbert van den Bergh &lt;herbert.van.den.bergh@oracle.com&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Acked-by: Michal Nazarewicz &lt;mina86@mina86.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: introduce sync-light migration for use by compaction</title>
<updated>2012-01-13T04:13:09Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-01-13T01:19:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a6bc32b899223a877f595ef9ddc1e89ead5072b8'/>
<id>urn:sha1:a6bc32b899223a877f595ef9ddc1e89ead5072b8</id>
<content type='text'>
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage.  Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.

This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
-&gt;writepages.

[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andy Isaacson &lt;adi@hexapodia.org&gt;
Cc: Nai Xia &lt;nai.xia@gmail.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: compaction: make isolate_lru_page() filter-aware again</title>
<updated>2012-01-13T04:13:09Z</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-01-13T01:19:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c82449352854ff09e43062246af86bdeb628f0c3'/>
<id>urn:sha1:c82449352854ff09e43062246af86bdeb628f0c3</id>
<content type='text'>
Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware")
noted that compaction does not migrate dirty or writeback pages and that
it was meaningless to pick such a page and re-add it to the LRU list.
This had to be partially reverted because some dirty pages can be
migrated by compaction without blocking.

This patch updates "mm: compaction: make isolate_lru_page()" by skipping
over pages that migration has no possibility of migrating, to minimise
LRU disruption.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Reviewed-by: Minchan Kim &lt;minchan@kernel.org&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Andy Isaacson &lt;adi@hexapodia.org&gt;
Cc: Nai Xia &lt;nai.xia@gmail.com&gt;
Cc: Johannes Weiner &lt;jweiner@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
