<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/locking, branch v4.4.175</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.175</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.4.175'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-11-21T08:27:31Z</updated>
<entry>
<title>locking/lockdep: Fix debug_locks off performance problem</title>
<updated>2018-11-21T08:27:31Z</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2018-10-19T01:45:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ef42ef8451faae09fe3a0b00c603ceeda8f5f5c5'/>
<id>urn:sha1:ef42ef8451faae09fe3a0b00c603ceeda8f5f5c5</id>
<content type='text'>
[ Upstream commit 9506a7425b094d2f1d9c877ed5a78f416669269b ]

It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.

Further analysis traced the slowdown to frequent calls to
debug_locks_off() from the __lock_acquired() function, probably due to
some inconsistent lockdep states with debug_locks off. The
debug_locks_off() function did an unconditional atomic xchg to write a
0 value into debug_locks, which had already been set to 0. This led to
severe contention on the cacheline that held debug_locks. As
debug_locks is referenced in quite a few different places in the
kernel, this greatly slowed down the system.

To prevent that thrashing of the debug_locks cacheline, lock_acquired()
and lock_contended() now check the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().
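
As a rough illustration only (a Python toy model, not the kernel code),
the effect of checking the flag before writing it can be sketched as
follows. The class and method names are hypothetical, and the atomic
xchg is modelled as a plain store that counts how often the shared flag
is written:

```python
class DebugLocks:
    """Toy model of the debug_locks flag; names are illustrative only."""

    def __init__(self):
        self.debug_locks = 1   # 1 = lock debugging enabled
        self.writes = 0        # number of stores to the shared flag

    def xchg_zero(self):
        # Stand-in for the atomic xchg: always stores, returns the old value.
        old = self.debug_locks
        self.debug_locks = 0
        self.writes += 1
        return old

    def off_unconditional(self):
        # Old behaviour: every call writes, even if the flag is already 0.
        return self.xchg_zero()

    def off_checked(self):
        # New behaviour: read first, write only while the flag is still set.
        if self.debug_locks != 0:
            return self.xchg_zero()
        return 0


old_style = DebugLocks()
new_style = DebugLocks()
for _ in range(1000):
    old_style.off_unconditional()
    new_style.off_checked()

print(old_style.writes, new_style.writes)   # prints "1000 1"
```

In the toy, the unconditional variant stores to the flag on every call,
while the checked variant stores once and only reads afterwards, which
is the access pattern that avoids bouncing the cacheline.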

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>locking/osq_lock: Fix osq_lock queue corruption</title>
<updated>2018-09-19T20:48:56Z</updated>
<author>
<name>Prateek Sood</name>
<email>prsood@codeaurora.org</email>
</author>
<published>2017-07-14T13:47:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d914882c936d9c3a1fa4c10d5950c5f0a7d32d79'/>
<id>urn:sha1:d914882c936d9c3a1fa4c10d5950c5f0a7d32d79</id>
<content type='text'>
commit 50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e upstream.

Fix ordering of link creation between node-&gt;prev and prev-&gt;next in
osq_lock(). Consider a case in which the optimistic spin queue is
CPU6-&gt;CPU2 and CPU6 has acquired the lock.

        tail
          v
  ,-. &lt;- ,-.
  |6|    |2|
  `-' -&gt; `-'

At this point if CPU0 comes in to acquire osq_lock, it will update the
tail count.

  CPU2			CPU0
  ----------------------------------

				       tail
				         v
			  ,-. &lt;- ,-.    ,-.
			  |6|    |2|    |0|
			  `-' -&gt; `-'    `-'

After the tail count update, if CPU2 starts to unqueue itself from the
optimistic spin queue, it will find the tail count updated by CPU0 and
update CPU2 node-&gt;next to NULL in osq_wait_next().

  unqueue-A

	       tail
	         v
  ,-. &lt;- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'

  unqueue-B

  -&gt;tail != curr &amp;&amp; !node-&gt;next

If the following stores are reordered, then prev-&gt;next, where prev is
CPU2, would be updated to point to the CPU0 node:

				       tail
				         v
			  ,-. &lt;- ,-.    ,-.
			  |6|    |2|    |0|
			  `-'    `-' -&gt; `-'

  osq_wait_next()
    node-&gt;next &lt;- 0
    xchg(node-&gt;next, NULL)

	       tail
	         v
  ,-. &lt;- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'

  unqueue-C

At this point, if the next instruction
	WRITE_ONCE(next-&gt;prev, prev);
in the CPU2 path is committed before the update of CPU0 node-&gt;prev = prev,
then CPU0 node-&gt;prev will point to the CPU6 node.

	       tail
    v----------. v
  ,-. &lt;- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'
     `----------^

At this point, if the CPU0 path's node-&gt;prev = prev is committed, CPU0
prev changes back to the CPU2 node, while CPU2 node-&gt;next is currently
NULL,

				       tail
			                 v
			  ,-. &lt;- ,-. &lt;- ,-.
			  |6|    |2|    |0|
			  `-'    `-'    `-'
			     `----------^

so if CPU0 gets into the unqueue path of osq_lock it will keep spinning
in an infinite loop, as the condition prev-&gt;next == node will never be true.

Signed-off-by: Prateek Sood &lt;prsood@codeaurora.org&gt;
[ Added pictures, rewrote comments. ]
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1500040076-27626-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Amit Pundir &lt;amit.pundir@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/rwsem-xadd: Fix missed wakeup due to reordering of load</title>
<updated>2018-09-19T20:48:56Z</updated>
<author>
<name>Prateek Sood</name>
<email>prsood@codeaurora.org</email>
</author>
<published>2017-09-07T14:30:58Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=70cc08c44fb55b587c7485a15549e9f9a12c9405'/>
<id>urn:sha1:70cc08c44fb55b587c7485a15549e9f9a12c9405</id>
<content type='text'>
commit 9c29c31830a4eca724e137a9339137204bbb31be upstream.

If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed:

 spinning writer                  up_write caller
 ---------------                  -----------------------
 [S] osq_unlock()                 [L] osq
  spin_lock(wait_lock)
  sem-&gt;count=0xFFFFFFFF00000001
            +0xFFFFFFFF00000000
  count=sem-&gt;count
  MB
                                   sem-&gt;count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
 rwsem_try_write_lock(count)
 spin_unlock(wait_lock)
 schedule()

Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause a missed
wakeup in up_write() context. In the spinning writer, sem-&gt;count
and the local variable count are both 0xFFFFFFFE00000001. This results
in rwsem_try_write_lock() failing to acquire the rwsem and the spinning
writer going to sleep in rwsem_down_write_failed().

The smp_rmb() will make sure that the spinner state is
consulted after sem-&gt;count is updated in up_write context.

Signed-off-by: Prateek Sood &lt;prsood@codeaurora.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: dave@stgolabs.net
Cc: longman@redhat.com
Cc: parri.andrea@gmail.com
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1504794658-15397-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Amit Pundir &lt;amit.pundir@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/lockdep: Do not record IRQ state within lockdep code</title>
<updated>2018-08-24T11:26:55Z</updated>
<author>
<name>Steven Rostedt (VMware)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2018-04-04T18:06:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c40dc96f7f7e29d7c1e520c46c87178c2d4b1dcc'/>
<id>urn:sha1:c40dc96f7f7e29d7c1e520c46c87178c2d4b1dcc</id>
<content type='text'>
[ Upstream commit fcc784be837714a9173b372ff9fb9b514590dad9 ]

While debugging where things were going wrong with mapping the
enabling/disabling of interrupts in the lockdep state to the actual
enabling and disabling of interrupts, I had to silence the IRQ
disabling/enabling in debug_check_no_locks_freed() because it was
always showing up, as it was called before the splat was.

Use raw_local_irq_save/restore() for not only debug_check_no_locks_freed()
but for all internal lockdep functions, as they hide useful information
about where interrupts were used incorrectly last.

Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>locking/qspinlock: Ensure node-&gt;count is updated before initialising node</title>
<updated>2018-05-30T05:48:57Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2018-02-13T13:22:57Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=abd9138a1b0a987498296f93be174aab0d41b051'/>
<id>urn:sha1:abd9138a1b0a987498296f93be174aab0d41b051</id>
<content type='text'>
[ Upstream commit 11dc13224c975efcec96647a4768a6f1bb7a19a8 ]

When queuing on the qspinlock, the count field for the current CPU's head
node is incremented. This needn't be atomic because locking in e.g. IRQ
context is balanced and so an IRQ will return with node-&gt;count as it
found it.

However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node-&gt;count, causing an
IRQ to overwrite the initialised node and potentially corrupt the lock
state.

Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node-&gt;count and the subsequent
node initialisation.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>locking/mutex: Allow next waiter lockless wakeup</title>
<updated>2018-01-17T08:35:27Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-01-25T02:23:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bd44e3f19d14e196fdd2635698ff5612e971dfa5'/>
<id>urn:sha1:bd44e3f19d14e196fdd2635698ff5612e971dfa5</id>
<content type='text'>
commit 1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6 upstream.

Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with the rtmutex top waiter,
slightly shortening the critical region and allowing other waiters to
acquire the wait_lock sooner. In low contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.
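
The idea can be sketched with a small Python toy model (not the kernel
implementation). Here a threading.Event stands in for a sleeping task
and a plain list stands in for the wake-queue; ToyMutex and its methods
are hypothetical names, and removing the waiter from the list inside
the unlock path is a simplification made for brevity:

```python
import threading


class ToyMutex:
    """Toy model: wakeups are collected under wait_lock, issued after it."""

    def __init__(self):
        self.wait_lock = threading.Lock()
        self.waiters = []   # one threading.Event per waiting task

    def add_waiter(self):
        ev = threading.Event()
        with self.wait_lock:
            self.waiters.append(ev)
        return ev

    def unlock_slowpath(self):
        wake_q = []
        with self.wait_lock:
            # Only pick the task to wake while holding wait_lock.
            if self.waiters:
                wake_q.append(self.waiters.pop(0))
        # wait_lock has been released; now perform the wakeups.
        held_at_wake = []
        for ev in wake_q:
            held_at_wake.append(self.wait_lock.locked())
            ev.set()
        return held_at_wake


m = ToyMutex()
ev = m.add_waiter()
held_at_wake = m.unlock_slowpath()
print(ev.is_set(), held_at_wake)   # prints "True [False]"
```

Because the wakeup happens after the with-block ends, the woken waiter
in this toy would find wait_lock free when it runs.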

Reviewed-by: Waiman Long &lt;Waiman.Long@hpe.com&gt;
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Cc: Jason Low &lt;jason.low2@hp.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Paul E. McKenney &lt;paulmck@us.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Waiman Long &lt;waiman.long@hpe.com&gt;
Cc: Will Deacon &lt;Will.Deacon@arm.com&gt;
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/lockdep: Add nest_lock integrity test</title>
<updated>2017-10-21T15:09:03Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2017-03-01T15:23:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=28eab3db727efb7ad4eb17aaa83df59c3d50e330'/>
<id>urn:sha1:28eab3db727efb7ad4eb17aaa83df59c3d50e330</id>
<content type='text'>
[ Upstream commit 7fb4a2cea6b18dab56d609530d077f168169ed6b ]

Boqun reported that hlock-&gt;references can overflow. Add a debug test
for that to generate a clear error when this happens.

Without this, lockdep is likely to report a mysterious failure on
unlock.
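
A minimal Python sketch of such a check follows. It is a toy model
only: the 12-bit width of the counter is an assumption not stated in
this message, and HeldLock and take_reference() are hypothetical names:

```python
MAX_REFERENCES = 2 ** 12 - 1   # assumed width of the counter: 12 bits


class HeldLock:
    """Toy stand-in for a held lock with a bounded reference count."""

    def __init__(self):
        self.references = 0


def take_reference(hlock):
    # Refuse, with a clear message, instead of letting the counter wrap.
    if hlock.references == MAX_REFERENCES:
        print("nest_lock reference count would overflow")
        return False
    hlock.references += 1
    return True


h = HeldLock()
results = [take_reference(h) for _ in range(MAX_REFERENCES + 1)]
print(h.references, results[-1])   # prints "4095 False"
```

Without the check, a fixed-width counter would wrap to a small value
and the later unlock accounting would no longer match.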

Reported-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Chris Wilson &lt;chris@chris-wilson.co.uk&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Nicolai Hähnle &lt;Nicolai.Haehnle@amd.com&gt;
Cc: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@verizon.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>locktorture: Fix potential memory leak with rw lock test</title>
<updated>2017-09-13T21:09:46Z</updated>
<author>
<name>Yang Shi</name>
<email>yang.shi@linaro.org</email>
</author>
<published>2016-11-10T21:06:39Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=10863607c242e970cfc14c42b35689737c397fe4'/>
<id>urn:sha1:10863607c242e970cfc14c42b35689737c397fe4</id>
<content type='text'>
commit f4dbba591945dc301c302672adefba9e2ec08dc5 upstream.

When running locktorture module with the below commands with kmemleak enabled:

$ modprobe locktorture torture_type=rw_lock_irq
$ rmmod locktorture

The below kmemleak got caught:

root@10:~# echo scan &gt; /sys/kernel/debug/kmemleak
[  323.197029] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@10:~# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffffffc07592d500 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 c3 7b 02 00 00 00 00 00  .........{......
    00 00 00 00 00 00 00 00 d7 9b 02 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffff80081e5a88&gt;] create_object+0x110/0x288
    [&lt;ffffff80086c6078&gt;] kmemleak_alloc+0x58/0xa0
    [&lt;ffffff80081d5acc&gt;] __kmalloc+0x234/0x318
    [&lt;ffffff80006fa130&gt;] 0xffffff80006fa130
    [&lt;ffffff8008083ae4&gt;] do_one_initcall+0x44/0x138
    [&lt;ffffff800817e28c&gt;] do_init_module+0x68/0x1cc
    [&lt;ffffff800811c848&gt;] load_module+0x1a68/0x22e0
    [&lt;ffffff800811d340&gt;] SyS_finit_module+0xe0/0xf0
    [&lt;ffffff80080836f0&gt;] el0_svc_naked+0x24/0x28
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff
unreferenced object 0xffffffc07592d480 (size 128):
  comm "modprobe", pid 368, jiffies 4294924118 (age 205.824s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 3b 6f 01 00 00 00 00 00  ........;o......
    00 00 00 00 00 00 00 00 23 6a 01 00 00 00 00 00  ........#j......
  backtrace:
    [&lt;ffffff80081e5a88&gt;] create_object+0x110/0x288
    [&lt;ffffff80086c6078&gt;] kmemleak_alloc+0x58/0xa0
    [&lt;ffffff80081d5acc&gt;] __kmalloc+0x234/0x318
    [&lt;ffffff80006fa22c&gt;] 0xffffff80006fa22c
    [&lt;ffffff8008083ae4&gt;] do_one_initcall+0x44/0x138
    [&lt;ffffff800817e28c&gt;] do_init_module+0x68/0x1cc
    [&lt;ffffff800811c848&gt;] load_module+0x1a68/0x22e0
    [&lt;ffffff800811d340&gt;] SyS_finit_module+0xe0/0xf0
    [&lt;ffffff80080836f0&gt;] el0_svc_naked+0x24/0x28
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

This is because cxt.lwsa and cxt.lrsa don't get freed in module_exit, so free
them in lock_torture_cleanup(), and free writer_tasks if the memory
allocation for reader_tasks fails.

Signed-off-by: Yang Shi &lt;yang.shi@linaro.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Reviewed-by: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: 石洋 &lt;yang.s@alibaba-inc.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()</title>
<updated>2016-12-15T16:49:22Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-11-30T21:04:42Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c6a5bf4cda12e4a3c097036ce7044ee65e4cf6d9'/>
<id>urn:sha1:c6a5bf4cda12e4a3c097036ce7044ee65e4cf6d9</id>
<content type='text'>
commit 1be5d4fa0af34fb7bafa205aeb59f5c7cc7a089d upstream.

While debugging the rtmutex unlock vs. dequeue race, Will suggested using
READ_ONCE() in rt_mutex_owner(), as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

A careful search did not unearth an actual problem in today's code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: David Daney &lt;ddaney@caviumnetworks.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>locking/rtmutex: Prevent dequeue vs. unlock race</title>
<updated>2016-12-15T16:49:22Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2016-11-30T21:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b27d9147f24a501d632916964f77c1221246fae9'/>
<id>urn:sha1:b27d9147f24a501d632916964f77c1221246fae9</id>
<content type='text'>
commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l-&gt;owner=T1
		rt_mutex_lock(l)
		lock(l-&gt;wait_lock)
		l-&gt;owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l-&gt;wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l-&gt;wait_lock)
				l-&gt;owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l-&gt;wait_lock)
				schedule()
		signal(-&gt;T2)	signal(-&gt;T3)
		lock(l-&gt;wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l-&gt;wait_lock)
				lock(l-&gt;wait_lock)
				dequeue(T3)
				  ===&gt; wait list is now empty
				deboost()
				 unlock(l-&gt;wait_lock)
		lock(l-&gt;wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l-&gt;owner &amp; ~HAS_WAITERS;
		    l-&gt;owner = owner
		     ==&gt; l-&gt;owner = T1
		  }

				lock(l-&gt;wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l-&gt;owner &amp; ~HAS_WAITERS;
cmpxchg(l-&gt;owner, T1, NULL)
 ===&gt; Success (l-&gt;owner = NULL)
				    l-&gt;owner = owner
				     ==&gt; l-&gt;owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters(), which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutex's rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation, then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
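
The interleaving and the fix can be replayed in a small Python toy model
(not the kernel code). Generators are used so that the load and the
store of the fixup can be separated by the fastpath unlock; T1 is a
hypothetical task address, the waiters bit is modelled as the lowest
bit of an integer, and all function names are illustrative:

```python
T1 = 0x1000   # hypothetical, aligned task address; lowest bit = waiters bit


class RtMutex:
    def __init__(self, owner):
        self.owner = owner
        self.wait_list = []


def has_waiters_bit(owner):
    return owner % 2 == 1


def clear_waiters_bit(owner):
    return owner - (owner % 2)


def fixup_old(lock):
    # Old behaviour: unconditional read-modify-write of the owner field.
    if lock.wait_list:
        return
    owner = clear_waiters_bit(lock.owner)   # load
    yield                                   # another CPU may run here
    lock.owner = owner                      # store


def fixup_new(lock):
    # Fixed behaviour: store only if the waiters bit was actually set.
    if lock.wait_list:
        return
    owner = lock.owner                      # load
    yield                                   # another CPU may run here
    if has_waiters_bit(owner):
        lock.owner = clear_waiters_bit(owner)


def fastpath_unlock(lock, task):
    # Stand-in for cmpxchg(owner, task, NULL).
    if lock.owner == task:
        lock.owner = 0
        return True
    return False


def race(fixup):
    lock = RtMutex(T1)        # waiters bit already cleared, no waiters left
    steps = fixup(lock)
    next(steps)               # CPU2: load the owner field
    unlocked = fastpath_unlock(lock, T1)   # CPU0: fastpath unlock succeeds
    for _ in steps:           # CPU2: finish the fixup
        pass
    return unlocked, lock.owner


old_result = race(fixup_old)
new_result = race(fixup_new)
print(old_result, new_result)   # prints "(True, 4096) (True, 0)"
```

With the old fixup the stale store resurrects T1 as owner after the
unlock; with the check in place the owner field stays NULL (0 here).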

It's remarkable that the test program provided by David triggers on ARM64
and MIPS64 really quickly, but it refuses to reproduce on x86-64, while the
problem exists there as well. That refusal might explain why this was not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which made it possible to decode this subtle problem.

Reported-by: David Daney &lt;ddaney@caviumnetworks.com&gt;
Tested-by: David Daney &lt;david.daney@cavium.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sebastian Siewior &lt;bigeasy@linutronix.de&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
</feed>
