<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/xarray.h, branch v5.8.11</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.11</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.8.11'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-06-08T18:05:56Z</updated>
<entry>
<title>xarray.h: correct return code documentation for xa_store_{bh,irq}()</title>
<updated>2020-06-08T18:05:56Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2020-06-08T04:40:20Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=01f39c1c11ee5bf44a1df49e47eb53a86515b9dc'/>
<id>urn:sha1:01f39c1c11ee5bf44a1df49e47eb53a86515b9dc</id>
<content type='text'>
__xa_store() and xa_store() document that the functions can fail, and
that the return code can be an xa_err() encoded error code.

xa_store_bh() and xa_store_irq() do not document that the functions can
fail and that they can also return xa_err() encoded error codes.

Thus: Update the documentation.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Link: http://lkml.kernel.org/r/20200430111424.16634-1-manfred@colorfullife.com
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>xarray: Fix early termination of xas_for_each_marked</title>
<updated>2020-03-12T21:42:08Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2020-03-12T21:29:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7e934cf5ace1dceeb804f7493fa28bb697ed3c52'/>
<id>urn:sha1:7e934cf5ace1dceeb804f7493fa28bb697ed3c52</id>
<content type='text'>
xas_for_each_marked() is using entry == NULL as a termination condition
of the iteration. When xas_for_each_marked() is used protected only by
RCU, this can however race with xas_store(xas, NULL) in the following
way:

TASK1                                   TASK2
page_cache_delete()         	        find_get_pages_range_tag()
                                          xas_for_each_marked()
                                            xas_find_marked()
                                              off = xas_find_chunk()

  xas_store(&amp;xas, NULL)
    xas_init_marks(&amp;xas);
    ...
    rcu_assign_pointer(*slot, NULL);
                                              entry = xa_entry(off);

And thus xas_for_each_marked() terminates prematurely possibly leading
to missed entries in the iteration (translating to missing writeback of
some pages or a similar problem).

If we find a NULL entry that has been marked, skip it (unless we're trying
to allocate an entry).

Reported-by: Jan Kara &lt;jack@suse.cz&gt;
CC: stable@vger.kernel.org
Fixes: ef8e5717db01 ("page cache: Convert delete_batch to XArray")
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Fix incorrect comment in header file</title>
<updated>2020-01-31T20:09:49Z</updated>
<author>
<name>Chengguang Xu</name>
<email>cgxu519@mykernel.net</email>
</author>
<published>2019-12-06T10:29:03Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=24a448b165253b6f2ab1e0bcdba9a733007681d6'/>
<id>urn:sha1:24a448b165253b6f2ab1e0bcdba9a733007681d6</id>
<content type='text'>
256 means Retry entry and 257 means Zero entry,
so fix it.

Signed-off-by: Chengguang Xu &lt;cgxu519@mykernel.net&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Add xa_for_each_range</title>
<updated>2020-01-18T03:33:37Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2020-01-12T20:54:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=00ed452c210a0bc1ff3ee79e1ce6b199f00a0638'/>
<id>urn:sha1:00ed452c210a0bc1ff3ee79e1ce6b199f00a0638</id>
<content type='text'>
This function supports iterating over a range of an array.  Also add
documentation links for xa_for_each_start().

Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Add wrappers for nested spinlocks</title>
<updated>2020-01-18T03:32:17Z</updated>
<author>
<name>Matthew Wilcox (Oracle)</name>
<email>willy@infradead.org</email>
</author>
<published>2020-01-17T17:36:32Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=82a958497dc9120e0e8043da82273baedd255aaf'/>
<id>urn:sha1:82a958497dc9120e0e8043da82273baedd255aaf</id>
<content type='text'>
Some users need to take an xarray lock while holding another xarray lock.

Reported-by: Doug Gilbert &lt;dgilbert@interlog.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>xarray.h: fix kernel-doc warning</title>
<updated>2019-10-14T22:04:01Z</updated>
<author>
<name>Randy Dunlap</name>
<email>rdunlap@infradead.org</email>
</author>
<published>2019-10-14T21:12:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=13bea898cd9153be91baa92de68ec32ff0b99362'/>
<id>urn:sha1:13bea898cd9153be91baa92de68ec32ff0b99362</id>
<content type='text'>
Fix (Sphinx) kernel-doc warning in &lt;linux/xarray.h&gt;:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: fix page cache convergence regression</title>
<updated>2019-05-31T17:52:41Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-05-24T14:12:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b785645e8f13e17cbce492708cf6e7039d32e46'/>
<id>urn:sha1:7b785645e8f13e17cbce492708cf6e7039d32e46</id>
<content type='text'>
Since a28334862993 ("page cache: Finish XArray conversion"), on most
major Linux distributions, the page cache doesn't correctly transition
when the hot data set is changing, and leaves the new pages thrashing
indefinitely instead of kicking out the cold ones.

On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
running stock Arch Linux:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120086/153600 workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120268/153600 workingset-b

workingset-b is a 600M file on a 1G host that is otherwise entirely
idle. No matter how often it's being accessed, it won't get cached.

While investigating, I noticed that the non-resident information gets
aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
a problem because a workingset transition like this relies on the
non-resident information tracked in the page cache tree of evicted
file ranges: when the cache faults are refaults of recently evicted
cache, we challenge the existing active set, and that allows a new
workingset to establish itself.

Tracing the shrinker that maintains this memory revealed that all page
cache tree nodes were allocated to the root cgroup. This is a problem,
because 1) the shrinker sizes the amount of non-resident information
it keeps to the size of the cgroup's other memory and 2) on most major
Linux distributions, only kernel threads live in the root cgroup and
everything else gets put into services or session groups:

[root@ham ~]# cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-c1.scope

As a result, we basically maintain no non-resident information for the
workloads running on the system, thus breaking the caching algorithm.

Looking through the code, I found the culprit in the above-mentioned
patch: when switching from the radix tree to xarray, it dropped the
__GFP_ACCOUNT flag from the tree node allocations - the flag that
makes sure the allocated memory gets charged to and tracked by the
cgroup of the calling process - in this case, the one doing the fault.

To fix this, allow xarray users to specify per-tree flag that makes
xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache
tree annotation to request such cgroup tracking for the cache nodes.

With this patch applied, the page cache correctly converges on new
workingsets again after just a few iterations:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ ./mincore workingset-a workingset-b
124607/153600 workingset-a
87876/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
81313/153600 workingset-a
133321/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
63036/153600 workingset-a
153600/153600 workingset-b

Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Use xa_cmpxchg to implement xa_reserve</title>
<updated>2019-02-20T22:08:54Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-20T16:51:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=962033d55d0761e0716a01a715c6659c8c8dfc41'/>
<id>urn:sha1:962033d55d0761e0716a01a715c6659c8c8dfc41</id>
<content type='text'>
Jason feels this is clearer, and it saves a function and an exported
symbol.

Suggested-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Fix xa_release in allocating arrays</title>
<updated>2019-02-20T22:08:54Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-20T16:30:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b38f6c50270683abf35a388f82cafecce971a003'/>
<id>urn:sha1:b38f6c50270683abf35a388f82cafecce971a003</id>
<content type='text'>
xa_cmpxchg() was a little too magic in turning ZERO entries into NULL,
and would leave the entry set to the ZERO entry instead of releasing
it for future use.  After careful review of existing users of
xa_cmpxchg(), change the semantics so that it does not translate either
incoming argument from NULL into ZERO entries.

Add several tests to the test-suite to make sure this problem doesn't
come back.

Reported-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Mark xa_insert and xa_reserve as must_check</title>
<updated>2019-02-09T05:00:49Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-08T19:02:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f818b82b80164014d7ee3df89bb110808778c796'/>
<id>urn:sha1:f818b82b80164014d7ee3df89bb110808778c796</id>
<content type='text'>
If the user doesn't care about the return value from xa_insert(), then
they should be using xa_store() instead.  The point of xa_reserve() is
to get the return value early before taking another lock, so this should
also be __must_check.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
</feed>
