<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/memcontrol.h, branch v3.0.85</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.85</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.0.85'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2012-08-01T19:27:16Z</updated>
<entry>
<title>mm: change isolate mode from #define to bitwise type</title>
<updated>2012-08-01T19:27:16Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-11-01T00:06:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a15a3971cc49eefbde40b397a446c0fa9c5fed9c'/>
<id>urn:sha1:a15a3971cc49eefbde40b397a446c0fa9c5fed9c</id>
<content type='text'>
commit 4356f21d09283dc6d39a6f7287a65ddab61e2808 upstream.

Stable note: Not tracked in Bugzilla. This patch makes later patches
	easier to apply but has no other impact.

Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>memcg: add mem_cgroup_replace_page_cache() to fix LRU issue</title>
<updated>2012-01-26T01:24:43Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2012-01-13T01:17:44Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ea1c62778121f6ece5e0120250716b45e204cb13'/>
<id>urn:sha1:ea1c62778121f6ece5e0120250716b45e204cb13</id>
<content type='text'>
commit ab936cbcd02072a34b60d268f94440fd5cf1970b upstream.

Commit ef6a3c6311 ("mm: add replace_page_cache_page() function") added a
function replace_page_cache_page().  This function replaces a page in the
radix-tree with a new page.  WHen doing this, memory cgroup needs to fix
up the accounting information.  memcg need to check PCG_USED bit etc.

In some(many?) cases, 'newpage' is on LRU before calling
replace_page_cache().  So, memcg's LRU accounting information should be
fixed, too.

This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
 In that function, old pages will be unaccounted without touching
res_counter and new page will be accounted to the memcg (of old page).
WHen overwriting pc-&gt;mem_cgroup of newpage, take zone-&gt;lru_lock and avoid
races with LRU handling.

Background:
  replace_page_cache_page() is called by FUSE code in its splice() handling.
  Here, 'newpage' is replacing oldpage but this newpage is not a newly allocated
  page and may be on LRU. LRU mis-accounting will be critical for memory cgroup
  because rmdir() checks the whole LRU is empty and there is no account leak.
  If a page is on the other LRU than it should be, rmdir() will fail.

This bug was added in March 2011, but no bug report yet.  I guess there
are not many people who use memcg and FUSE at the same time with upstream
kernels.

The result of this bug is that admin cannot destroy a memcg because of
account leak.  So, no panic, no deadlock.  And, even if an active cgroup
exist, umount can succseed.  So no problem at shutdown.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Miklos Szeredi &lt;mszeredi@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
</entry>
<entry>
<title>vmscan,memcg: memcg aware swap token</title>
<updated>2011-06-16T03:03:59Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-06-15T22:08:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a433658c30974fc87ba3ff52d7e4e6299762aa3d'/>
<id>urn:sha1:a433658c30974fc87ba3ff52d7e4e6299762aa3d</id>
<content type='text'>
Currently, memcg reclaim can disable swap token even if the swap token mm
doesn't belong in its memory cgroup.  It's slightly risky.  If an admin
creates very small mem-cgroup and silly guy runs contentious heavy memory
pressure workload, every tasks are going to lose swap token and then
system may become unresponsive.  That's bad.

This patch adds 'memcg' parameter into disable_swap_token().  and if the
parameter doesn't match swap token, VM doesn't disable it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Rik van Riel&lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: add the pagefault count into memcg stats</title>
<updated>2011-05-27T00:12:36Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=456f998ec817ebfa254464be4f089542fa390645'/>
<id>urn:sha1:456f998ec817ebfa254464be4f089542fa390645</id>
<content type='text'>
Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.

  "pgfault"
  "pgmajfault"

They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.

It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.

Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:

  $ cat /proc/vmstat | grep fault
  pgfault 1070751
  pgmajfault 553

  $ cat /dev/cgroup/memory.stat | grep fault
  pgfault 1071138
  pgmajfault 553
  total_pgfault 1071142
  total_pgmajfault 553

  $ cat /dev/cgroup/A/memory.stat | grep fault
  pgfault 199
  pgmajfault 0
  total_pgfault 199
  total_pgmajfault 0

Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container.  There is no regression noticed on the
"flt/cpu/s"

Sample output from pft:

  TAG pft:anon-sys-default:
    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260

  +-------------------------------------------------------------------------+
      N           Min           Max        Median           Avg        Stddev
  x  10     16682.962     17344.027     16913.524     16928.812      166.5362
  +  10     16695.568     16923.896     16820.604     16824.652     84.816568
  No difference proven at 95.0% confidence

[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()</title>
<updated>2011-05-27T00:12:35Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1bac180bd29e03989f50054af97b53b8d37a364a'/>
<id>urn:sha1:1bac180bd29e03989f50054af97b53b8d37a364a</id>
<content type='text'>
The caller of the function has been renamed to zone_nr_lru_pages(), and
this is just fixing up in the memcg code.  The current name is easily to
be mis-read as zone's total number of pages.

Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: reclaim memory from nodes in round-robin order</title>
<updated>2011-05-27T00:12:35Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=889976dbcb1218119fdd950fb7819084e37d7d37'/>
<id>urn:sha1:889976dbcb1218119fdd950fb7819084e37d7d37</id>
<content type='text'>
Presently, memory cgroup's direct reclaim frees memory from the current
node.  But this has some troubles.  Usually when a set of threads works in
a cooperative way, they tend to operate on the same node.  So if they hit
limits under memcg they will reclaim memory from themselves, damaging the
active working set.

For example, assume 2 node system which has Node 0 and Node 1 and a memcg
which has 1G limit.  After some work, file cache remains and the usages
are

   Node 0:  1M
   Node 1:  998M.

and run an application on Node 0, it will eat its foot before freeing
unnecessary file caches.

This patch adds round-robin for NUMA and adds equal pressure to each node.
When using cpuset's spread memory feature, this will work very well.

But yes, a better algorithm is needed.

[akpm@linux-foundation.org: comment editing]
[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: count the soft_limit reclaim in global background reclaim</title>
<updated>2011-05-27T00:12:35Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=0ae5e89c60c9eb87da36a2614836bc434b0ec2ad'/>
<id>urn:sha1:0ae5e89c60c9eb87da36a2614836bc434b0ec2ad</id>
<content type='text'>
The global kswapd scans per-zone LRU and reclaims pages regardless of the
cgroup. It breaks memory isolation since one cgroup can end up reclaiming
pages from another cgroup. Instead we should rely on memcg-aware target
reclaim including per-memcg kswapd and soft_limit hierarchical reclaim under
memory pressure.

In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU. However, the return value is ignored. This patch is the first
step to skip shrink_zone() if soft_limit reclaim does enough work.

This is part of the effort which tries to reduce reclaiming pages in global
LRU in memcg. The per-memcg background reclaim patchset further enhances the
per-cgroup targetting reclaim, which I should have V4 posted shortly.

Try running multiple memory intensive workloads within seperate memcgs. Watch
the counters of soft_steal in memory.stat.

  $ cat /dev/cgroup/A/memory.stat | grep 'soft'
  soft_steal 240000
  soft_scan 240000
  total_soft_steal 240000
  total_soft_scan 240000

This patch:

In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU.  However, the return value is ignored.

We would like to skip shrink_zone() if soft_limit reclaim does enough
work.  Also, we need to make the memory pressure balanced across per-memcg
zones, like the logic vm-core.  This patch is the first step where we
start with counting the nr_scanned and nr_reclaimed from soft_limit
reclaim into the global scan_control.

Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Acked-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: fix mem_cgroup_rotate_reclaimable_page()</title>
<updated>2011-04-14T23:06:54Z</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2011-04-14T22:21:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=67954fe95705a8ff80335964bd7e621d13fbc499'/>
<id>urn:sha1:67954fe95705a8ff80335964bd7e621d13fbc499</id>
<content type='text'>
commit 3f58a8294333 ("move memcg reclaimable page into tail of inactive
list") added inline keyword twice in its prototype.

    CC      arch/x86/kernel/asm-offsets.s
  In file included from include/linux/swap.h:8,
                   from include/linux/suspend.h:4,
                   from arch/x86/kernel/asm-offsets.c:12:
  include/linux/memcontrol.h:220: error: duplicate `inline'

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: add memcg sanity checks at allocating and freeing pages</title>
<updated>2011-03-24T02:46:25Z</updated>
<author>
<name>Daisuke Nishimura</name>
<email>nishimura@mxp.nes.nec.co.jp</email>
</author>
<published>2011-03-23T23:42:25Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f212ad7cf9c73f8a7fa160e223dcb3f074441a72'/>
<id>urn:sha1:f212ad7cf9c73f8a7fa160e223dcb3f074441a72</id>
<content type='text'>
Add checks at allocating or freeing a page whether the page is used (iow,
charged) from the view point of memcg.

This check may be useful in debugging a problem and we did similar checks
before the commit 52d4b9ac(memcg: allocate all page_cgroup at boot).

This patch adds some overheads at allocating or freeing memory, so it's
enabled only when CONFIG_DEBUG_VM is enabled.

Signed-off-by: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Cc: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: move memcg reclaimable page into tail of inactive list</title>
<updated>2011-03-23T00:44:03Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-03-22T23:32:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3f58a82943337fb6e79acfa5346719a97d3c0b98'/>
<id>urn:sha1:3f58a82943337fb6e79acfa5346719a97d3c0b98</id>
<content type='text'>
The rotate_reclaimable_page function moves just written out pages, which
the VM wanted to reclaim, to the end of the inactive list.  That way the
VM will find those pages first next time it needs to free memory.

This patch applies the rule in memcg.  It can help to prevent unnecessary
working page eviction of memcg.

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
