<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/memcontrol.h, branch v3.2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2011-11-02T23:07:00Z</updated>
<entry>
<title>memcg: skip scanning active lists based on individual size</title>
<updated>2011-11-02T23:07:00Z</updated>
<author>
<name>Johannes Weiner</name>
<email>jweiner@redhat.com</email>
</author>
<published>2011-11-02T20:38:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9b272977e3b99a8699361d214b51f98c8a9e0e7b'/>
<id>urn:sha1:9b272977e3b99a8699361d214b51f98c8a9e0e7b</id>
<content type='text'>
Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison to leave the assumed
working set alone while there are still enough reclaim candidates around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.

Signed-off-by: Johannes Weiner &lt;jweiner@redhat.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Reviewed-by: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: rename mem variable to memcg</title>
<updated>2011-11-02T23:06:59Z</updated>
<author>
<name>Raghavendra K T</name>
<email>raghavendra.kt@linux.vnet.ibm.com</email>
</author>
<published>2011-11-02T20:38:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c0ff4b8540a5c158b8e5bafb7d767298b67b0b92'/>
<id>urn:sha1:c0ff4b8540a5c158b8e5bafb7d767298b67b0b92</id>
<content type='text'>
The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
"struct mem_cgroup *memcg".  Rename all mem variables to memcg in source
file.

Signed-off-by: Raghavendra K T &lt;raghavendra.kt@linux.vnet.ibm.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: change isolate mode from #define to bitwise type</title>
<updated>2011-11-01T00:30:44Z</updated>
<author>
<name>Minchan Kim</name>
<email>minchan.kim@gmail.com</email>
</author>
<published>2011-11-01T00:06:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4356f21d09283dc6d39a6f7287a65ddab61e2808'/>
<id>urn:sha1:4356f21d09283dc6d39a6f7287a65ddab61e2808</id>
<content type='text'>
Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: Revert "memcg: add memory.vmscan_stat"</title>
<updated>2011-09-15T01:09:38Z</updated>
<author>
<name>Johannes Weiner</name>
<email>jweiner@redhat.com</email>
</author>
<published>2011-09-14T23:21:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=185efc0f9a1f2d6ad6d4782c5d9e529f3290567f'/>
<id>urn:sha1:185efc0f9a1f2d6ad6d4782c5d9e529f3290567f</id>
<content type='text'>
Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.

Signed-off-by: Johannes Weiner &lt;jweiner@redhat.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>tmpfs: convert mem_cgroup shmem to radix-swap</title>
<updated>2011-08-04T00:25:24Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2011-08-03T23:21:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=aa3b189551ad8e5cc1d9c663735c131650238278'/>
<id>urn:sha1:aa3b189551ad8e5cc1d9c663735c131650238278</id>
<content type='text'>
Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
had to move swappage to filecache with GFP_NOWAIT.

Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
moving its call out from shmem_add_to_page_cache() to two of thats three
callers.  But leave it doing mem_cgroup_uncharge_cache_page() on error:
although asymmetrical, it's easier for all 3 callers to handle.

These two changes would also be appropriate if anyone were to start
using shmem_read_mapping_page_gfp() with GFP_NOWAIT.

Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
radix_tree_exceptional_entry() to get what it needs for itself.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: add memory.vmscan_stat</title>
<updated>2011-07-26T23:49:42Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2011-07-26T23:08:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82f9d486e59f588c7d100865c36510644abda356'/>
<id>urn:sha1:82f9d486e59f588c7d100865c36510644abda356</id>
<content type='text'>
The commit log of 0ae5e89c60c9 ("memcg: count the soft_limit reclaim
in...") says it adds scanning stats to memory.stat file.  But it doesn't
because we considered we needed to make a concensus for such new APIs.

This patch is a trial to add memory.scan_stat. This shows
  - the number of scanned pages(total, anon, file)
  - the number of rotated pages(total, anon, file)
  - the number of freed pages(total, anon, file)
  - the number of elaplsed time (including sleep/pause time)

  for both of direct/soft reclaim.

The biggest difference with oringinal Ying's one is that this file
can be reset by some write, as

  # echo 0 ...../memory.scan_stat

Example of output is here. This is a result after make -j 6 kernel
under 300M limit.

  [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
  [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
  scanned_pages_by_limit 9471864
  scanned_anon_pages_by_limit 6640629
  scanned_file_pages_by_limit 2831235
  rotated_pages_by_limit 4243974
  rotated_anon_pages_by_limit 3971968
  rotated_file_pages_by_limit 272006
  freed_pages_by_limit 2318492
  freed_anon_pages_by_limit 962052
  freed_file_pages_by_limit 1356440
  elapsed_ns_by_limit 351386416101
  scanned_pages_by_system 0
  scanned_anon_pages_by_system 0
  scanned_file_pages_by_system 0
  rotated_pages_by_system 0
  rotated_anon_pages_by_system 0
  rotated_file_pages_by_system 0
  freed_pages_by_system 0
  freed_anon_pages_by_system 0
  freed_file_pages_by_system 0
  elapsed_ns_by_system 0
  scanned_pages_by_limit_under_hierarchy 9471864
  scanned_anon_pages_by_limit_under_hierarchy 6640629
  scanned_file_pages_by_limit_under_hierarchy 2831235
  rotated_pages_by_limit_under_hierarchy 4243974
  rotated_anon_pages_by_limit_under_hierarchy 3971968
  rotated_file_pages_by_limit_under_hierarchy 272006
  freed_pages_by_limit_under_hierarchy 2318492
  freed_anon_pages_by_limit_under_hierarchy 962052
  freed_file_pages_by_limit_under_hierarchy 1356440
  elapsed_ns_by_limit_under_hierarchy 351386416101
  scanned_pages_by_system_under_hierarchy 0
  scanned_anon_pages_by_system_under_hierarchy 0
  scanned_file_pages_by_system_under_hierarchy 0
  rotated_pages_by_system_under_hierarchy 0
  rotated_anon_pages_by_system_under_hierarchy 0
  rotated_file_pages_by_system_under_hierarchy 0
  freed_pages_by_system_under_hierarchy 0
  freed_anon_pages_by_system_under_hierarchy 0
  freed_file_pages_by_system_under_hierarchy 0
  elapsed_ns_by_system_under_hierarchy 0

total_xxxx is for hierarchy management.

This will be useful for further memcg developments and need to be
developped before we do some complicated rework on LRU/softlimit
management.

This patch adds a new struct memcg_scanrecord into scan_control struct.
sc-&gt;nr_scanned at el is not designed for exporting information.  For
example, nr_scanned is reset frequentrly and incremented +2 at scanning
mapped pages.

To avoid complexity, I added a new param in scan_control which is for
exporting scanning score.

Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Cc: Andrew Bresticker &lt;abrestic@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: consolidate memory cgroup lru stat functions</title>
<updated>2011-07-26T23:49:42Z</updated>
<author>
<name>KAMEZAWA Hiroyuki</name>
<email>kamezawa.hiroyu@jp.fujitsu.com</email>
</author>
<published>2011-07-26T23:08:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bb2a0de92c891b8feeedc0178acb3ae009d899a8'/>
<id>urn:sha1:bb2a0de92c891b8feeedc0178acb3ae009d899a8</id>
<content type='text'>
In mm/memcontrol.c, there are many lru stat functions as..

  mem_cgroup_zone_nr_lru_pages
  mem_cgroup_node_nr_file_lru_pages
  mem_cgroup_nr_file_lru_pages
  mem_cgroup_node_nr_anon_lru_pages
  mem_cgroup_nr_anon_lru_pages
  mem_cgroup_node_nr_unevictable_lru_pages
  mem_cgroup_nr_unevictable_lru_pages
  mem_cgroup_node_nr_lru_pages
  mem_cgroup_nr_lru_pages
  mem_cgroup_get_local_zonestat

Some of them are under #ifdef MAX_NUMNODES &gt;1 and others are not.
This seems bad. This patch consolidates all functions into

  mem_cgroup_zone_nr_lru_pages()
  mem_cgroup_node_nr_lru_pages()
  mem_cgroup_nr_lru_pages()

For these functions, "which LRU?" information is passed by a mask.

example:
  mem_cgroup_nr_lru_pages(mem, BIT(LRU_ACTIVE_ANON))

And I added some macro as ALL_LRU, ALL_LRU_FILE, ALL_LRU_ANON.

example:
  mem_cgroup_nr_lru_pages(mem, ALL_LRU)

BTW, considering layout of NUMA memory placement of counters, this patch seems
to be better.

Now, when we gather all LRU information, we scan in following orer
    for_each_lru -&gt; for_each_node -&gt; for_each_zone.

This means we'll touch cache lines in different node in turn.

After patch, we'll scan
    for_each_node -&gt; for_each_zone -&gt; for_each_lru(mask)

Then, we'll gather information in the same cacheline at once.

[akpm@linux-foundation.org: fix warnigns, build error]
Signed-off-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Cc: Balbir Singh &lt;bsingharora@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>vmscan,memcg: memcg aware swap token</title>
<updated>2011-06-16T03:03:59Z</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2011-06-15T22:08:13Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a433658c30974fc87ba3ff52d7e4e6299762aa3d'/>
<id>urn:sha1:a433658c30974fc87ba3ff52d7e4e6299762aa3d</id>
<content type='text'>
Currently, memcg reclaim can disable swap token even if the swap token mm
doesn't belong in its memory cgroup.  It's slightly risky.  If an admin
creates very small mem-cgroup and silly guy runs contentious heavy memory
pressure workload, every tasks are going to lose swap token and then
system may become unresponsive.  That's bad.

This patch adds 'memcg' parameter into disable_swap_token().  and if the
parameter doesn't match swap token, VM doesn't disable it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Rik van Riel&lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: add the pagefault count into memcg stats</title>
<updated>2011-05-27T00:12:36Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=456f998ec817ebfa254464be4f089542fa390645'/>
<id>urn:sha1:456f998ec817ebfa254464be4f089542fa390645</id>
<content type='text'>
Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.

  "pgfault"
  "pgmajfault"

They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.

It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.

Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:

  $ cat /proc/vmstat | grep fault
  pgfault 1070751
  pgmajfault 553

  $ cat /dev/cgroup/memory.stat | grep fault
  pgfault 1071138
  pgmajfault 553
  total_pgfault 1071142
  total_pgmajfault 553

  $ cat /dev/cgroup/A/memory.stat | grep fault
  pgfault 199
  pgmajfault 0
  total_pgfault 199
  total_pgmajfault 0

Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container.  There is no regression noticed on the
"flt/cpu/s"

Sample output from pft:

  TAG pft:anon-sys-default:
    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260

  +-------------------------------------------------------------------------+
      N           Min           Max        Median           Avg        Stddev
  x  10     16682.962     17344.027     16913.524     16928.812      166.5362
  +  10     16695.568     16923.896     16820.604     16824.652     84.816568
  No difference proven at 95.0% confidence

[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Acked-by: Balbir Singh &lt;balbir@linux.vnet.ibm.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()</title>
<updated>2011-05-27T00:12:35Z</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2011-05-26T23:25:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=1bac180bd29e03989f50054af97b53b8d37a364a'/>
<id>urn:sha1:1bac180bd29e03989f50054af97b53b8d37a364a</id>
<content type='text'>
The caller of the function has been renamed to zone_nr_lru_pages(), and
this is just fixing up in the memcg code.  The current name is easily to
be mis-read as zone's total number of pages.

Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Reviewed-by: Minchan Kim &lt;minchan.kim@gmail.com&gt;
Cc: Balbir Singh &lt;balbir@in.ibm.com&gt;
Cc: Daisuke Nishimura &lt;nishimura@mxp.nes.nec.co.jp&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
