<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/include/linux/xarray.h, branch v5.3.14</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.3.14</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.3.14'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2019-05-31T17:52:41Z</updated>
<entry>
<title>mm: fix page cache convergence regression</title>
<updated>2019-05-31T17:52:41Z</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2019-05-24T14:12:46Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7b785645e8f13e17cbce492708cf6e7039d32e46'/>
<id>urn:sha1:7b785645e8f13e17cbce492708cf6e7039d32e46</id>
<content type='text'>
Since a28334862993 ("page cache: Finish XArray conversion"), on most
major Linux distributions, the page cache doesn't correctly transition
when the hot data set is changing, and leaves the new pages thrashing
indefinitely instead of kicking out the cold ones.

On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
running stock Arch Linux:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120086/153600 workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120268/153600 workingset-b

workingset-b is a 600M file on a 1G host that is otherwise entirely
idle. No matter how often it's being accessed, it won't get cached.

While investigating, I noticed that the non-resident information gets
aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
a problem because a workingset transition like this relies on the
non-resident information tracked in the page cache tree of evicted
file ranges: when the cache faults are refaults of recently evicted
cache, we challenge the existing active set, and that allows a new
workingset to establish itself.

Tracing the shrinker that maintains this memory revealed that all page
cache tree nodes were allocated to the root cgroup. This is a problem,
because 1) the shrinker sizes the amount of non-resident information
it keeps to the size of the cgroup's other memory and 2) on most major
Linux distributions, only kernel threads live in the root cgroup and
everything else gets put into services or session groups:

[root@ham ~]# cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-c1.scope

As a result, we basically maintain no non-resident information for the
workloads running on the system, thus breaking the caching algorithm.

Looking through the code, I found the culprit in the above-mentioned
patch: when switching from the radix tree to xarray, it dropped the
__GFP_ACCOUNT flag from the tree node allocations - the flag that
makes sure the allocated memory gets charged to and tracked by the
cgroup of the calling process - in this case, the one doing the fault.

To fix this, allow xarray users to specify per-tree flag that makes
xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache
tree annotation to request such cgroup tracking for the cache nodes.

With this patch applied, the page cache correctly converges on new
workingsets again after just a few iterations:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ ./mincore workingset-a workingset-b
124607/153600 workingset-a
87876/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
81313/153600 workingset-a
133321/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
63036/153600 workingset-a
153600/153600 workingset-b

Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reviewed-by: Shakeel Butt &lt;shakeelb@google.com&gt;
Signed-off-by: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Use xa_cmpxchg to implement xa_reserve</title>
<updated>2019-02-20T22:08:54Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-20T16:51:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=962033d55d0761e0716a01a715c6659c8c8dfc41'/>
<id>urn:sha1:962033d55d0761e0716a01a715c6659c8c8dfc41</id>
<content type='text'>
Jason feels this is clearer, and it saves a function and an exported
symbol.

Suggested-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Fix xa_release in allocating arrays</title>
<updated>2019-02-20T22:08:54Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-20T16:30:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b38f6c50270683abf35a388f82cafecce971a003'/>
<id>urn:sha1:b38f6c50270683abf35a388f82cafecce971a003</id>
<content type='text'>
xa_cmpxchg() was a little too magic in turning ZERO entries into NULL,
and would leave the entry set to the ZERO entry instead of releasing
it for future use.  After careful review of existing users of
xa_cmpxchg(), change the semantics so that it does not translate either
incoming argument from NULL into ZERO entries.

Add several tests to the test-suite to make sure this problem doesn't
come back.

Reported-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Mark xa_insert and xa_reserve as must_check</title>
<updated>2019-02-09T05:00:49Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-08T19:02:45Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f818b82b80164014d7ee3df89bb110808778c796'/>
<id>urn:sha1:f818b82b80164014d7ee3df89bb110808778c796</id>
<content type='text'>
If the user doesn't care about the return value from xa_insert(), then
they should be using xa_store() instead.  The point of xa_reserve() is
to get the return value early before taking another lock, so this should
also be __must_check.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Add cyclic allocation</title>
<updated>2019-02-06T18:32:25Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-11-06T19:13:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2fa044e51a1f35d7b04cbde07ec513b0ba195e38'/>
<id>urn:sha1:2fa044e51a1f35d7b04cbde07ec513b0ba195e38</id>
<content type='text'>
This differs slightly from the IDR equivalent in five ways.

1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
   like xa_alloc().  Also like xa_alloc(), it will write to the 'id'
   pointer before placing the entry in the XArray.
2. The 'next' cursor is allocated separately from the XArray instead
   of being part of the IDR.  This saves memory for all the users which
   do not use the cyclic allocation API and suits some users better.
3. It returns -EBUSY instead of -ENOSPC.
4. It will attempt to wrap back to the minimum value on memory allocation
   failure as well as on an -EBUSY error, assuming that a user would
   rather allocate a small ID than suffer an ID allocation failure.
5. It reports whether it has wrapped, which is important to some users.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Redesign xa_alloc API</title>
<updated>2019-02-06T18:32:23Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-12-31T15:41:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a3e4d3f97ec844de005a679585c04c5c03dfbdb6'/>
<id>urn:sha1:a3e4d3f97ec844de005a679585c04c5c03dfbdb6</id>
<content type='text'>
It was too easy to forget to initialise the start index.  Add an
xa_limit data structure which can be used to pass min &amp; max, and
define a couple of special values for common cases.  Also add some
more tests cribbed from the IDR test suite.  Change the return value
from -ENOSPC to -EBUSY to match xa_insert().

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Add support for 1s-based allocation</title>
<updated>2019-02-06T18:13:24Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2018-10-26T18:43:22Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3ccaf57a6a63ad171a951dcaddffc453b2414c7b'/>
<id>urn:sha1:3ccaf57a6a63ad171a951dcaddffc453b2414c7b</id>
<content type='text'>
A lot of places want to allocate IDs starting at 1 instead of 0.
While the xa_alloc() API supports this, it's not very efficient if lots
of IDs are allocated, due to having to walk down to the bottom of the
tree to see if ID 1 is available, then all the way over to the next
non-allocated ID.  This method marks ID 0 as being occupied which wastes
one slot in the XArray, but preserves xa_empty() as working.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Change xa_insert to return -EBUSY</title>
<updated>2019-02-06T18:12:15Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-02-06T18:07:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fd9dc93e36231fb6d520e0edd467058fad4fd12d'/>
<id>urn:sha1:fd9dc93e36231fb6d520e0edd467058fad4fd12d</id>
<content type='text'>
Userspace translates EEXIST to "File exists" which isn't a very good
error message for the problem.  "Device or resource busy" is a better
indication of what went wrong.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Update xa_erase family descriptions</title>
<updated>2019-02-05T04:16:58Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>willy@infradead.org</email>
</author>
<published>2019-01-26T05:52:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=809ab9371ca0a96b44d9866ad82849410759a45b'/>
<id>urn:sha1:809ab9371ca0a96b44d9866ad82849410759a45b</id>
<content type='text'>
xa_erase does not allocate memory and doesn't have a gfp parameter.
Update the descriptions of all four variants to be more useful.

Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
<entry>
<title>XArray: Fix an arithmetic error in xa_is_err</title>
<updated>2019-01-17T12:19:42Z</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2019-01-17T12:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=edcddd4c879af48ec922d680b2d56834c085683b'/>
<id>urn:sha1:edcddd4c879af48ec922d680b2d56834c085683b</id>
<content type='text'>
There is a math problem here which leads to a lot of static checker
warnings for me:

net/sunrpc/clnt.c:451 rpc_new_client() error: (-4096) too low for ERR_PTR

Error values are from -1 to -4095 or from 0xffffffff to 0xfffff001 in
hexadecimal.  (I am assuming a 32 bit system for simplicity).  We are
using the lowest two bits to hold some internal XArray data so the
error is shifted two spaces to the left.  0xfffff001 &lt;&lt; 2 is 0xffffc004.
And finally we want to check that BIT(1) is set so we add 2 which gives
us 0xffffc006.

In other words, we should be checking that "entry &gt;= 0xffffc006", but
the check is actually testing if "entry &gt;= 0xffffc002".

Fixes: 76b4e5299565 ("XArray: Permit storing 2-byte-aligned pointers")
Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
[Use xa_mk_internal() instead of changing the bracketing]
Signed-off-by: Matthew Wilcox &lt;willy@infradead.org&gt;
</content>
</entry>
</feed>
