<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/tools/testing/radix-tree, branch v4.18-rc2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.18-rc2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.18-rc2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-05-26T01:12:10Z</updated>
<entry>
<title>idr: fix invalid ptr dereference on item delete</title>
<updated>2018-05-26T01:12:10Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-05-25T21:47:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7a4deea1aa8bddfed4ef1b35fc2b6732563d8ad5'/>
<id>urn:sha1:7a4deea1aa8bddfed4ef1b35fc2b6732563d8ad5</id>
<content type='text'>
If the radix tree underlying the IDR happens to be full and we attempt
to remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which
point anything could happen.  This was easiest to hit with a single
entry at id 0 and attempting to remove a non-0 id, but it could have
happened with 64 entries and attempting to remove an id &gt;= 64.

Roman said:

  The syzcaller test boils down to opening /dev/kvm, creating an
  eventfd, and calling a couple of KVM ioctls. None of this requires
  superuser. And the result is dereferencing an uninitialized pointer
  which is likely a crash. The specific path caught by syzbot is via
  KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
  other user-triggerable paths, so cc:stable is probably justified.

Matthew added:

  We have around 250 calls to idr_remove() in the kernel today. Many of
  them pass an ID which is embedded in the object they're removing, so
  they're safe. Picking a few likely candidates:

  drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
  drivers/atm/nicstar.c could be taken down by a handcrafted packet

Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: &lt;syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com&gt;
Debugged-by: Roman Kagan &lt;rkagan@virtuozzo.com&gt;
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree test suite: multi-order iteration race</title>
<updated>2018-05-19T00:17:12Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2018-05-18T23:09:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fd8f58c40b703e47697c9f12bc16c31f14c161f1'/>
<id>urn:sha1:fd8f58c40b703e47697c9f12bc16c31f14c161f1</id>
<content type='text'>
Add a test which shows a race in the multi-order iteration code.  This
test reliably hits the race in under a second on my machine, and is the
result of a real bug report against kernel a production v4.15 based
kernel (4.15.6-300.fc27.x86_64).  With a real kernel this issue is hit
when using order 9 PMD DAX radix tree entries.

The race has to do with how we tear down multi-order sibling entries
when we are removing an item from the tree.  Remember that an order 2
entry looks like this:

  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]

where 'entry' is in some slot in the struct radix_tree_node, and the
three slots following 'entry' contain sibling pointers which point back
to 'entry.'

When we delete 'entry' from the tree, we call :

  radix_tree_delete()
    radix_tree_delete_item()
      __radix_tree_delete()
        replace_slot()

replace_slot() first removes the siblings in order from the first to the
last, then at then replaces 'entry' with NULL.  This means that for a
brief period of time we end up with one or more of the siblings removed,
so:

  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]

This causes an issue if you have a reader iterating over the slots in
the tree via radix_tree_for_each_slot() while only under
rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
mm/filemap.c.

The issue is that when __radix_tree_next_slot() =&gt; skip_siblings() tries
to skip over the sibling entries in the slots, it currently does so with
an exact match on the slot directly preceding our current slot.
Normally this works:

                                      V preceding slot
  struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                              ^ current slot

This lets you find the first sibling, and you skip them all in order.

But in the case where one of the siblings is NULL, that slot is skipped
and then our sibling detection is interrupted:

                                             V preceding slot
  struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                    ^ current slot

This means that the sibling pointers aren't recognized since they point
all the way back to 'entry', so we think that they are normal internal
radix tree pointers.  This causes us to think we need to walk down to a
struct radix_tree_node starting at the address of 'entry'.

In a real running kernel this will crash the thread with a GP fault when
you try and dereference the slots in your broken node starting at
'entry'.

In the radix tree test suite this will be caught by the address
sanitizer:

  ==27063==ERROR: AddressSanitizer: heap-buffer-overflow on address
  0x60c0008ae400 at pc 0x00000040ce4f bp 0x7fa89b8fcad0 sp 0x7fa89b8fcac0
  READ of size 8 at 0x60c0008ae400 thread T3
      #0 0x40ce4e in __radix_tree_next_slot /home/rzwisler/project/linux/tools/testing/radix-tree/radix-tree.c:1660
      #1 0x4022cc in radix_tree_next_slot linux/../../../../include/linux/radix-tree.h:567
      #2 0x4022cc in iterator_func /home/rzwisler/project/linux/tools/testing/radix-tree/multiorder.c:655
      #3 0x7fa8a088d50a in start_thread (/lib64/libpthread.so.0+0x750a)
      #4 0x7fa8a03bd16e in clone (/lib64/libc.so.6+0xf516e)

Link: http://lkml.kernel.org/r/20180503192430.7582-5-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: CR, Sapthagirish &lt;sapthagirish.cr@intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree test suite: add item_delete_rcu()</title>
<updated>2018-05-19T00:17:12Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2018-05-18T23:08:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3e252fa7d4f711798e7a3f5ff2d7b62f0e2987ce'/>
<id>urn:sha1:3e252fa7d4f711798e7a3f5ff2d7b62f0e2987ce</id>
<content type='text'>
Currently the lifetime of "struct item" entries in the radix tree are
not controlled by RCU, but are instead deleted inline as they are
removed from the tree.

In the following patches we add a test which has threads iterating over
items pulled from the tree and verifying them in an
rcu_read_lock()/rcu_read_unlock() section.  This means that though an
item has been removed from the tree it could still be being worked on by
other threads until the RCU grace period expires.  So, we need to
actually free the "struct item" structures at the end of the grace
period, just as we do with "struct radix_tree_node" items.

Link: http://lkml.kernel.org/r/20180503192430.7582-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: CR, Sapthagirish &lt;sapthagirish.cr@intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree test suite: fix mapshift build target</title>
<updated>2018-05-19T00:17:12Z</updated>
<author>
<name>Ross Zwisler</name>
<email>ross.zwisler@linux.intel.com</email>
</author>
<published>2018-05-18T23:08:51Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8d9fa88edd5e360b71765feeadb915d4066c9684'/>
<id>urn:sha1:8d9fa88edd5e360b71765feeadb915d4066c9684</id>
<content type='text'>
Commit c6ce3e2fe3da ("radix tree test suite: Add config option for map
shift") introduced a phony makefile target called 'mapshift' that ends
up generating the file generated/map-shift.h.  This phony target was
then added as a dependency of the top level 'targets' build target,
which is what is run when you go to tools/testing/radix-tree and just
type 'make'.

Unfortunately, this phony target doesn't actually work as a dependency,
so you end up getting:

  $ make
  make: *** No rule to make target 'generated/map-shift.h', needed by 'main.o'.  Stop.
  make: *** Waiting for unfinished jobs....

Fix this by making the file generated/map-shift.h our real makefile
target, and add this a dependency of the top level build target.

Link: http://lkml.kernel.org/r/20180503192430.7582-2-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler &lt;ross.zwisler@linux.intel.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: CR, Sapthagirish &lt;sapthagirish.cr@intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Jan Kara &lt;jack@suse.cz&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>radix tree: use GFP_ZONEMASK bits of gfp_t for flags</title>
<updated>2018-04-11T17:28:39Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-04-10T23:36:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=fa290cda102c096f5ca394277d65d3dbd689930b'/>
<id>urn:sha1:fa290cda102c096f5ca394277d65d3dbd689930b</id>
<content type='text'>
Patch series "XArray", v9.  (First part thereof).

This patchset is, I believe, appropriate for merging for 4.17.  It
contains the XArray implementation, to eventually replace the radix
tree, and converts the page cache to use it.

This conversion keeps the radix tree and XArray data structures in sync
at all times.  That allows us to convert the page cache one function at
a time and should allow for easier bisection.  Other than renaming some
elements of the structures, the data structures are fundamentally
unchanged; a radix tree walk and an XArray walk will touch the same
number of cachelines.  I have changes planned to the XArray data
structure, but those will happen in future patches.

Improvements the XArray has over the radix tree:

 - The radix tree provides operations like other trees do; 'insert' and
   'delete'. But what most users really want is an automatically
   resizing array, and so it makes more sense to give users an API that
   is like an array -- 'load' and 'store'. We still have an 'insert'
   operation for users that really want that semantic.

 - The XArray considers locking as part of its API. This simplifies a
   lot of users who formerly had to manage their own locking just for
   the radix tree. It also improves code generation as we can now tell
   RCU that we're holding a lock and it doesn't need to generate as much
   fencing code. The other advantage is that tree nodes can be moved
   (not yet implemented).

 - GFP flags are now parameters to calls which may need to allocate
   memory. The radix tree forced users to decide what the allocation
   flags would be at creation time. It's much clearer to specify them at
   allocation time.

 - Memory is not preloaded; we don't tie up dozens of pages on the off
   chance that the slab allocator fails. Instead, we drop the lock,
   allocate a new node and retry the operation. We have to convert all
   the radix tree, IDA and IDR preload users before we can realise this
   benefit, but I have not yet found a user which cannot be converted.

 - The XArray provides a cmpxchg operation. The radix tree forces users
   to roll their own (and at least four have).

 - Iterators take a 'max' parameter. That simplifies many users and will
   reduce the amount of iteration done.

 - Iteration can proceed backwards. We only have one user for this, but
   since it's called as part of the pagefault readahead algorithm, that
   seemed worth mentioning.

 - RCU-protected pointers are not exposed as part of the API. There are
   some fun bugs where the page cache forgets to use rcu_dereference()
   in the current codebase.

 - Value entries gain an extra bit compared to radix tree exceptional
   entries. That gives us the extra bit we need to put huge page swap
   entries in the page cache.

 - Some iterators now take a 'filter' argument instead of having
   separate iterators for tagged/untagged iterations.

The page cache is improved by this:

 - Shorter, easier to read code

 - More efficient iterations

 - Reduction in size of struct address_space

 - Fewer walks from the top of the data structure; the XArray API
   encourages staying at the leaf node and conducting operations there.

This patch (of 8):

None of these bits may be used for slab allocations, so we can use them
as radix tree flags as long as we mask them off before passing them to
the slab allocator. Move the IDR flag from the high bits to the
GFP_ZONEMASK bits.

Link: http://lkml.kernel.org/r/20180313132639.17387-3-willy@infradead.org
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Acked-by: Jeff Layton &lt;jlayton@kernel.org&gt;
Cc: Darrick J. Wong &lt;darrick.wong@oracle.com&gt;
Cc: Dave Chinner &lt;david@fromorbit.com&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@lab.ntt.co.jp&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>idr: Fix handling of IDs above INT_MAX</title>
<updated>2018-02-26T19:39:30Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-02-26T19:39:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=4b0ad07653ee94182e2d8f21404242c9e83ad0b4'/>
<id>urn:sha1:4b0ad07653ee94182e2d8f21404242c9e83ad0b4</id>
<content type='text'>
Khalid reported that the kernel selftests are currently failing:

selftests: test_bpf.sh
========================================
test_bpf: [FAIL]
not ok 1..8 selftests:  test_bpf.sh [FAIL]

He bisected it to 6ce711f2750031d12cec91384ac5cfa0a485b60a ("idr: Make
1-based IDRs more efficient").

The root cause is doing a signed comparison in idr_alloc_u32() instead
of an unsigned comparison.  I went looking for any similar problems and
found a couple (which would each result in the failure to warn in two
situations that aren't supposed to happen).

I knocked up a few test-cases to prove that I was right and added them
to the test-suite.

Reported-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Tested-by: Khalid Aziz &lt;khalid.aziz@oracle.com&gt;
Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
<entry>
<title>radix tree test suite: Fix build</title>
<updated>2018-02-25T11:00:11Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2018-02-25T11:00:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=3d4d5d618639c3155cfce57101d619a0935434d2'/>
<id>urn:sha1:3d4d5d618639c3155cfce57101d619a0935434d2</id>
<content type='text'>
 - Add an empty linux/compiler_types.h (now being included by kconfig.h)
 - Add __GFP_ZERO
 - Add kzalloc
 - Test __GFP_DIRECT_RECLAIM instead of __GFP_NOWARN

Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
<entry>
<title>idr: Make 1-based IDRs more efficient</title>
<updated>2018-02-06T21:41:29Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-11-30T18:45:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6ce711f2750031d12cec91384ac5cfa0a485b60a'/>
<id>urn:sha1:6ce711f2750031d12cec91384ac5cfa0a485b60a</id>
<content type='text'>
About 20% of the IDR users in the kernel want the allocated IDs to start
at 1.  The implementation currently searches all the way down the left
hand side of the tree, finds no free ID other than ID 0, walks all the
way back up, and then all the way down again.  This patch 'rebases' the
ID so we fill the entire radix tree, rather than leave a gap at 0.

Chris Wilson says: "I did the quick hack of allocating index 0 of the
idr and that eradicated idr_get_free() from being at the top of the
profiles for the many-object stress tests. This improvement will be
much appreciated."

Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
<entry>
<title>idr: Remove idr_alloc_ext</title>
<updated>2018-02-06T21:41:28Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-11-28T20:16:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=460488c58ca8b9167463ac22ec9a2e33db351962'/>
<id>urn:sha1:460488c58ca8b9167463ac22ec9a2e33db351962</id>
<content type='text'>
It has no more users, so remove it.  Move idr_alloc() back into idr.c,
move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
wrappers around idr_get_free_cmn() and rename it to idr_get_free().
While there is now no interface to allocate IDs larger than a u32,
the IDR internals remain ready to handle a larger ID should a need arise.

These changes make it possible to provide the guarantee that, if the
nextid pointer points into the object, the object's ID will be initialised
before a concurrent lookup can find the object.

Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
<entry>
<title>IDR test suite: Check handling negative end correctly</title>
<updated>2018-02-06T20:07:20Z</updated>
<author>
<name>Matthew Wilcox</name>
<email>mawilcox@microsoft.com</email>
</author>
<published>2017-11-28T19:27:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6e6d301490936789ff57daaaaf63f44d928a4028'/>
<id>urn:sha1:6e6d301490936789ff57daaaaf63f44d928a4028</id>
<content type='text'>
One of the charming quirks of the idr_alloc() interface is that you
can pass a negative end and it will be interpreted as "maximum".  Ensure
we don't break that.

Signed-off-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
</content>
</entry>
</feed>
