<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/mm.h, branch v4.1.9</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.9</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.1.9'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2015-09-29T17:26:06Z</updated>
<entry>
<title>mm: make page pfmemalloc check more robust</title>
<updated>2015-09-29T17:26:06Z</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2015-08-21T21:11:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c85ea6919f1cf092ad6418323e49faac6ac7cf3c'/>
<id>urn:sha1:c85ea6919f1cf092ad6418323e49faac6ac7cf3c</id>
<content type='text'>
commit 2f064f3485cd29633ad1b3cfb00cc519509a3d72 upstream.

Commit c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb") added
checks for page-&gt;pfmemalloc to __skb_fill_page_desc():

        if (page-&gt;pfmemalloc &amp;&amp; !page-&gt;mapping)
                skb-&gt;pfmemalloc = true;

It assumes page-&gt;mapping == NULL implies that page-&gt;pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set page-&gt;mapping
to NULL and leave the page-&gt;index value alone.  Because the two fields
share a union, a non-zero page-&gt;index will be interpreted as a true
page-&gt;pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with an NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case, so the page confuses __skb_fill_page_desc, which
interprets the index as the pfmemalloc flag.  The network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping.  That drop is what happens
here, and it leads to hangs when the NFS client waits for a response from
the server that has been dropped and thus never arrives.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to be an impossible value (-1UL).  This is the page
index so it should never hold a value that large.  Replace all direct
users of page-&gt;pfmemalloc with page_is_pfmemalloc, which will hide this
nastiness from unspoiled eyes.

The information will obviously get lost if somebody wants to use
page-&gt;index, but that was the case before, and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
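
The sentinel trick described above can be illustrated with a minimal
userspace sketch.  It assumes a simplified stand-in for struct page;
toy_page, TOY_PFMEMALLOC_MARK and the toy_* helpers are hypothetical
names chosen for illustration, not the kernel's definitions (the commit
itself names only page_is_pfmemalloc).

```c
/* Toy stand-in for struct page: only the two fields discussed here. */
struct toy_page {
	void *mapping;        /* NULL once the page has left the page cache */
	unsigned long index;  /* page-cache offset, or the marker below */
};

/* A page index should never be this large, so it can act as a marker. */
#define TOY_PFMEMALLOC_MARK (-1UL)

/* Non-zero only if the marker was stored explicitly. */
static inline int toy_page_is_pfmemalloc(const struct toy_page *page)
{
	return (*page).index == TOY_PFMEMALLOC_MARK;
}

/* Record that the page came from the emergency reserves. */
static inline void toy_set_page_pfmemalloc(struct toy_page *page)
{
	(*page).index = TOY_PFMEMALLOC_MARK;
}

/* Drop the marker; the index is free for ordinary use again. */
static inline void toy_clear_page_pfmemalloc(struct toy_page *page)
{
	(*page).index = 0;
}
```

With this scheme a stale non-zero index, such as the one left behind
after the mapping is cleared, no longer reads as a set flag; only the
explicit marker does.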

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Debugged-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Debugged-by: Jiri Bohac &lt;jbohac@suse.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
</entry>
<entry>
<title>Revert "mm: avoid tail page refcounting on non-THP compound pages"</title>
<updated>2015-04-22T16:44:36Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-04-22T16:44:36Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f3ca10dde49043fbbb055854278b26954b549b36'/>
<id>urn:sha1:f3ca10dde49043fbbb055854278b26954b549b36</id>
<content type='text'>
This reverts commit 8d63d99a5dfbdb997d12dd3c07b2070ca723db3b.

It causes VM mapping refcount errors:

  page:ffffea0010a15040 count:0 mapcount:1 mapping:          (null) index:0x0
  flags: 0x8000000000008014(referenced|dirty|tail)
  page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
  ------------[ cut here ]------------
  kernel BUG at mm/swap.c:134!

as reported by Borislav Petkov.

Reported-and-tested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Kirill A. Shutemov &lt;kirill@shutemov.name&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: linux-mm@kvack.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP</title>
<updated>2015-04-15T23:35:20Z</updated>
<author>
<name>Boaz Harrosh</name>
<email>boaz@plexistor.com</email>
</author>
<published>2015-04-15T23:15:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=dd9061846a3ba01b0fa45423aaa087e4a69187fa'/>
<id>urn:sha1:dd9061846a3ba01b0fa45423aaa087e4a69187fa</id>
<content type='text'>
This will allow a filesystem that uses VM_PFNMAP | VM_MIXEDMAP (no page
structs) to get notified when an access is a write to a read-only PFN.

This can happen if we mmap() a file, then first mmap-read from it to
page in a read-only PFN, then mmap-write to the same page.

We need this functionality to fix a DAX bug, where in the scenario above
we fail to set ctime/mtime though we modified the file.  An xfstest is
attached to this patchset that shows the failure and the fix.  (A DAX
patch will follow)

This functionality is extra important for us, because upon dirtying of a
pmem page we also want to RDMA the page to a remote cluster node.

We define a new pfn_mkwrite and do not reuse page_mkwrite because
  1 - The name ;-)
  2 - But mainly because it would take a very long and tedious
      audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
      users to make sure they do not now CRASH.  For example, the current
      DAX code (which this is for) would crash.
      If we wanted to reuse page_mkwrite, we would first need to
      patch all users so that they do not crash when there is no page,
      and only then enable this patch.  But even if I did that I would
      not sleep so well at night.
      Adding a new vector is the safest thing to do, and is not that
      expensive: an extra pointer in a static function vector per driver.
      Also, the new vector is better for performance, because otherwise we
      would call all current kernel vectors just so that they can
        check-ha-no-page-do-nothing and return.

No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.

Signed-off-by: Yigal Korman &lt;yigal@plexistor.com&gt;
Signed-off-by: Boaz Harrosh &lt;boaz@plexistor.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Matthew Wilcox &lt;matthew.r.wilcox@intel.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: uninline and cleanup page-mapping related helpers</title>
<updated>2015-04-15T23:35:19Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-04-15T23:14:53Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e39155ea11eac6da056b04669d7c9fc612e2065a'/>
<id>urn:sha1:e39155ea11eac6da056b04669d7c9fc612e2065a</id>
<content type='text'>
The most-used page-&gt;mapping helper, page_mapping(), has already been
uninlined.  Let's also uninline page_rmapping() and page_anon_vma().
Depending on configuration, this saves us around 400 bytes in text:

   text	   data	    bss	    dec	    hex	filename
 660318	  99254	 410000	1169572	 11d8a4	mm/built-in.o-before
 659854	  99254	 410000	1169108	 11d6d4	mm/built-in.o

I also tried to make the code a bit cleaner.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>include/linux/mm.h: simplify flag check</title>
<updated>2015-04-15T23:35:19Z</updated>
<author>
<name>Borislav Petkov</name>
<email>bp@suse.de</email>
</author>
<published>2015-04-15T23:14:47Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cdd7875e0c4db5c41e28b645d3bf7d41bd2cbb45'/>
<id>urn:sha1:cdd7875e0c4db5c41e28b645d3bf7d41bd2cbb45</id>
<content type='text'>
Flip the flag test so that it is the simplest.  No functional change, just
a small readability improvement.

No code changed:

  # arch/x86/kernel/sys_x86_64.o:

   text    data     bss     dec     hex filename
   1551      24       0    1575     627 sys_x86_64.o.before
   1551      24       0    1575     627 sys_x86_64.o.after

md5:
   70708d1b1ad35cc891118a69dc1a63f9  sys_x86_64.o.before.asm
   70708d1b1ad35cc891118a69dc1a63f9  sys_x86_64.o.after.asm

Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: avoid tail page refcounting on non-THP compound pages</title>
<updated>2015-04-15T23:35:17Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-04-15T23:13:12Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d63d99a5dfbdb997d12dd3c07b2070ca723db3b'/>
<id>urn:sha1:8d63d99a5dfbdb997d12dd3c07b2070ca723db3b</id>
<content type='text'>
THP uses tail page refcounting to be able to split huge pages at any time.
Tail page refcounting is not needed for other users of compound pages, and
it's harmful because of the overhead.

We try to exclude non-THP pages from tail page refcounting using the
__compound_tail_refcounted() check.  It excludes the most common non-THP
compound pages, SL*B and hugetlb, but it doesn't catch the rest of the
__GFP_COMP users: drivers.

And it's not only about overhead.

Drivers might want to use compound pages to get refcounting semantics
suitable for mapping high-order pages to userspace.  But tail page
refcounting breaks it.

Tail page refcounting uses -&gt;_mapcount in tail pages to store GUP pins on
them.  It means GUP pins would affect page_mapcount() for tail pages.
It's not a problem for THP, because it never maps tail pages.  But unlike
THP, drivers map parts of compound pages with PTEs, which makes
page_mapcount() get called for tail pages.

In particular, GUP pins would shift PSS up and affect /proc/kpagecount for
such pages.  But I'm not aware of anything that can lead to a crash or
other serious misbehaviour.

Since currently all THP pages are anonymous and all driver pages are not,
we can fix the __compound_tail_refcounted() check by requiring PageAnon()
to enable tail page refcounting.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: consolidate all page-flags helpers in &lt;linux/page-flags.h&gt;</title>
<updated>2015-04-15T23:35:17Z</updated>
<author>
<name>Kirill A. Shutemov</name>
<email>kirill.shutemov@linux.intel.com</email>
</author>
<published>2015-04-15T23:13:08Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e8c6158fef15a1532bd5242a0cd88565eedabe61'/>
<id>urn:sha1:e8c6158fef15a1532bd5242a0cd88565eedabe61</id>
<content type='text'>
Currently we take a naive approach to page flags on compound pages: we
set the flag on the page without considering whether the flag makes sense
for a tail page or for a compound page in general.  This patchset tries to
sort this out by defining a per-flag policy on what needs to be done if
a page-flag helper operates on a compound page.

The last patch in the patchset also sanitizes usage of page-&gt;mapping for
tail pages.  We don't define the meaning of page-&gt;mapping for tail
pages.  Currently it's always NULL, which can be inconsistent with the head
page and potentially lead to problems.

For now I caught one case of illegal usage of page flags or -&gt;mapping:
the sound subsystem allocates pages with __GFP_COMP and maps them with PTEs.
This leads to setting the dirty bit on tail pages and to accesses to the
tail page's -&gt;mapping.  I don't see any bad behaviour caused by this, but
it is worth fixing anyway.

This patchset makes more sense if you take my THP refcounting into
account: we will see more compound pages mapped with PTEs and we need to
define behaviour of flags on compound pages to avoid bugs.

This patch (of 16):

We have page-flags helper function declarations/definitions spread over
several header files.  Let's consolidate them in &lt;linux/page-flags.h&gt;.

Signed-off-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Naoya Horiguchi &lt;n-horiguchi@ah.jp.nec.com&gt;
Cc: Steve Capper &lt;steve.capper@linaro.org&gt;
Cc: "Aneesh Kumar K.V" &lt;aneesh.kumar@linux.vnet.ibm.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: Jerome Marchand &lt;jmarchan@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: completely remove dumping per-cpu lists from show_mem()</title>
<updated>2015-04-14T23:49:01Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>koct9i@gmail.com</email>
</author>
<published>2015-04-14T22:45:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=761b06771adeeb734e9eebc6f70f916cb9e2f643'/>
<id>urn:sha1:761b06771adeeb734e9eebc6f70f916cb9e2f643</id>
<content type='text'>
It seems nobody needs this.

Signed-off-by: Konstantin Khlebnikov &lt;koct9i@gmail.com&gt;
Cc: Michal Hocko &lt;mhocko@suse.cz&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: hide per-cpu lists in output of show_mem()</title>
<updated>2015-04-14T23:49:01Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-04-14T22:45:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1bfcdb8ce0ea6eb6034daa7ff02548e0bc9c21b'/>
<id>urn:sha1:d1bfcdb8ce0ea6eb6034daa7ff02548e0bc9c21b</id>
<content type='text'>
This makes show_mem() much less verbose on huge machines.  Instead of a huge
and almost useless dump of counters for each per-zone per-cpu list, this
patch prints the sum of these counters for each zone (free_pcp) and the size
of the per-cpu list for the current cpu (local_pcp).

The filter flag SHOW_MEM_PERCPU_LISTS reverts to the old verbose mode.

[akpm@linux-foundation.org: update show_free_areas comment]
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>page_writeback: clean up mess around cancel_dirty_page()</title>
<updated>2015-04-14T23:49:01Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-04-14T22:45:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b9ea25152e56365ce149b9a39637cd7a16eec556'/>
<id>urn:sha1:b9ea25152e56365ce149b9a39637cd7a16eec556</id>
<content type='text'>
This patch replaces cancel_dirty_page() with a helper function,
account_page_cleaned(), which only updates counters.  It's called from
truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
The page is locked in both cases; the page lock protects against concurrent
dirtiers: see commit 2d6d7f982846 ("mm: protect set_page_dirty() from
ongoing truncation").

delete_from_page_cache() shouldn't be called for dirty pages; they must
be handled by the caller (either written or truncated).  This patch treats
the final dirty accounting fixup at the end of __delete_from_page_cache() as
a debug check and adds WARN_ON_ONCE() around it.  If something removes
dirty pages without proper handling, that might be a bug and unwritten
data might be lost.

Hugetlbfs has no dirty page accounting, so ClearPageDirty() is enough
here.

cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is a
helper for nfs_invalidate_page() and it's called only in case of complete
invalidation.

The mess was started in v2.6.20 after commits 46d2277c796f ("Clean up
and make try_to_free_buffers() not race with dirty pages") and
3e67c0987d75 ("truncate: clear page dirtiness before running
try_to_free_buffers()").  The first was reverted right in v2.6.20 in commit
ecdfc9787fe5 ("Resurrect 'try_to_free_buffers()' VM hackery"), the second in
v2.6.25 commit a2b345642f53 ("Fix dirty page accounting leak with ext3
data=journal").

Custom fixes were introduced between these points: one for NFS in v2.6.23,
commit 1b3b4a1a2deb ("NFS: Fix a write request leak in
nfs_invalidate_page()"), and a kludge in __delete_from_page_cache() in
v2.6.24, commit 3a6927906f1b ("Do dirty page accounting when removing a
page from the page cache").  Since v2.6.25 all of them are redundant.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
