<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/ipc, branch v3.16.58</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.58</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.16.58'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2017-08-26T01:14:49Z</updated>
<entry>
<title>mqueue: fix a use-after-free in sys_mq_notify()</title>
<updated>2017-08-26T01:14:49Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-07-09T20:19:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b6ec053486fa50c169eb377e53c72b2a481fbd00'/>
<id>urn:sha1:b6ec053486fa50c169eb377e53c72b2a481fbd00</id>
<content type='text'>
commit f991af3daabaecff34684fd51fac80319d1baad1 upstream.

The retry logic for netlink_attachskb() inside sys_mq_notify()
is nasty and vulnerable:

1) The sock refcnt is already released when retry is needed
2) The fd is controllable by user-space because we already
   release the file refcnt

so we when retry but the fd has been just closed by user-space
during this small window, we end up calling netlink_detachskb()
on the error path which releases the sock again, later when
the user-space closes this socket a use-after-free could be
triggered.

Setting 'sock' to NULL here should be sufficient to fix it.

Reported-by: GeneBlue &lt;geneblue.mail@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc/shm: Fix shmat mmap nil-page protection</title>
<updated>2017-03-16T02:27:27Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2017-02-27T22:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=13508cb3bf17a73b91957e7070112b710bdfadc2'/>
<id>urn:sha1:13508cb3bf17a73b91957e7070112b710bdfadc2</id>
<content type='text'>
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.

The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reported-by: Gareth Evans &lt;gareth.evans@contextis.co.uk&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: fix complex_count vs. simple op race</title>
<updated>2017-02-23T03:54:12Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2016-10-11T20:54:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=accb9f16adbaeb5878974abc7b0e334790e59c11'/>
<id>urn:sha1:accb9f16adbaeb5878974abc7b0e334790e59c11</id>
<content type='text'>
commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operations is ongoing (sma-&gt;sem_perm.lock held)
 - a complex operation is sleeping (sma-&gt;complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma-&gt;complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem-&gt;lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&amp;sem-&gt;lock), spin_is_locked(sma-&gt;sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: &lt;felixh@informatik.uni-bremen.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: &lt;1vier1@web.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - We missed out on some earlier memory barrier changes
 - Use set_mb instead of smp_store_mb]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc: remove use of seq_printf return value</title>
<updated>2017-02-23T03:54:11Z</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2015-04-15T23:17:54Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=5b11c133308bb386e8b6eb6dfc07f232bf0306a1'/>
<id>urn:sha1:5b11c133308bb386e8b6eb6dfc07f232bf0306a1</id>
<content type='text'>
commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>sysv, ipc: fix security-layer leaking</title>
<updated>2016-11-20T01:17:04Z</updated>
<author>
<name>Fabian Frederick</name>
<email>fabf@skynet.be</email>
</author>
<published>2016-08-02T21:03:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=08397b1a5cd433944e424de2be1f92eefb3836c2'/>
<id>urn:sha1:08397b1a5cd433944e424de2be1f92eefb3836c2</id>
<content type='text'>
commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 upstream.

Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.

Running LTP msgsnd06 with kmemleak gives the following:

  cat /sys/kernel/debug/kmemleak

  unreferenced object 0xffff88003c0a11f8 (size 8):
    comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
    hex dump (first 8 bytes):
      1b 00 00 00 01 00 00 00                          ........
    backtrace:
      kmemleak_alloc+0x23/0x40
      kmem_cache_alloc_trace+0xe1/0x180
      selinux_msg_queue_alloc_security+0x3f/0xd0
      security_msg_queue_alloc+0x2e/0x40
      newque+0x4e/0x150
      ipcget+0x159/0x1b0
      SyS_msgget+0x39/0x40
      entry_SYSCALL_64_fastpath+0x13/0x8f

Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()

Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: fully initialize sem_array before making it visible</title>
<updated>2015-10-28T10:33:14Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2014-12-02T23:59:34Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=35cfc2b3a9dafbe486e3198b80a5b25076d91ce1'/>
<id>urn:sha1:35cfc2b3a9dafbe486e3198b80a5b25076d91ce1</id>
<content type='text'>
commit e8577d1f0329d4842e8302e289fb2c22156abef4 upstream.

ipc_addid() makes a new ipc identifier visible to everyone.  New objects
start as locked, so that the caller can complete the initialization
after the call.  Within struct sem_array, at least sma-&gt;sem_base and
sma-&gt;sem_nsems are accessed without any locks, therefore this approach
doesn't work.

Thus: Move the ipc_addid() to the end of the initialization.

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>Initialize msg/shm IPC objects before doing ipc_addid()</title>
<updated>2015-10-28T10:33:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-09-30T16:48:40Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=792d3057eb976a74671c406a6b70a73652cd01e2'/>
<id>urn:sha1:792d3057eb976a74671c406a6b70a73652cd01e2</id>
<content type='text'>
commit b9a532277938798b53178d5a66af6e2915cb27cf upstream.

As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state.  Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f0329:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: update/correct memory barriers</title>
<updated>2015-09-03T12:29:54Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2015-08-14T22:35:10Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=69a9a86b645f2f9df3df71104adf88197628a0cb'/>
<id>urn:sha1:69a9a86b645f2f9df3df71104adf88197628a0cb</id>
<content type='text'>
commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.

sem_lock() did not properly pair memory barriers:

!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.

As no primitive exists inside &lt;include/spinlock.h&gt; and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.

With regards to -stable:

The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability).  The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).

Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reported-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Kirill Tkhai &lt;ktkhai@parallels.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()</title>
<updated>2015-09-03T12:29:53Z</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2014-12-13T00:58:11Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=30f995ba77ca406d872e0833276bb4f9b8bf740c'/>
<id>urn:sha1:30f995ba77ca406d872e0833276bb4f9b8bf740c</id>
<content type='text'>
commit 2e094abfd1f29a08a60523b42d4508281b8dee0e upstream.

When I fixed bugs in the sem_lock() logic, I was more conservative than
necessary.  Therefore it is safe to replace the smp_mb() with smp_rmb().
And: With smp_rmb(), semop() syscalls are up to 10% faster.

The race we must protect against is:

	sem-&gt;lock is free
	sma-&gt;complex_count = 0
	sma-&gt;sem_perm.lock held by thread B

thread A:

A: spin_lock(&amp;sem-&gt;lock)

			B: sma-&gt;complex_count++; (now 1)
			B: spin_unlock(&amp;sma-&gt;sem_perm.lock);

A: spin_is_locked(&amp;sma-&gt;sem_perm.lock);
A: XXXXX memory barrier
A: if (sma-&gt;complex_count == 0)

Thread A must read the increased complex_count value, i.e. the read must
not be reordered with the read of sem_perm.lock done by spin_is_locked().

Since it's about ordering of reads, smp_rmb() is sufficient.

[akpm@linux-foundation.org: update sem_lock() comment, from Davidlohr]
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Acked-by: Rafael Aquini &lt;aquini@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[ luis: 3.16 prereq for:
  3ed1f8a99d70 "ipc/sem.c: update/correct memory barriers" ]
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
<entry>
<title>ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits</title>
<updated>2015-09-03T12:29:53Z</updated>
<author>
<name>Herton R. Krzesinski</name>
<email>herton@redhat.com</email>
</author>
<published>2015-08-14T22:35:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=76ce4fe19d6b3ade32baaf8e6fbfbc0290c101a7'/>
<id>urn:sha1:76ce4fe19d6b3ade32baaf8e6fbfbc0290c101a7</id>
<content type='text'>
commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.

The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).

For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:

   Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
   000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
   010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
   Prev obj: start=ffff88003b45c180, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
   Next obj: start=ffff88003b45c200, len=64
   000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
   010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).&lt;....
   BUG: spinlock wrong CPU on CPU#2, test/18028
   general protection fault: 0000 [#1] SMP
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: spin_dump+0x53/0xc0
   Call Trace:
     spin_bug+0x30/0x40
     do_raw_spin_unlock+0x71/0xa0
     _raw_spin_unlock+0xe/0x10
     freeary+0x82/0x2a0
     ? _raw_spin_lock+0xe/0x10
     semctl_down.clone.0+0xce/0x160
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     SyS_semctl+0x236/0x2c0
     ? syscall_trace_leave+0xde/0x130
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e &lt;45&gt; 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
   RIP  [&lt;ffffffff810d6053&gt;] spin_dump+0x53/0xc0
    RSP &lt;ffff88003750fd68&gt;
   ---[ end trace 783ebb76612867a0 ]---
   NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
   Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
   CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
   RIP: native_read_tsc+0x0/0x20
   Call Trace:
     ? delay_tsc+0x40/0x70
     __delay+0xf/0x20
     do_raw_spin_lock+0x96/0x140
     _raw_spin_lock+0xe/0x10
     sem_lock_and_putref+0x11/0x70
     SYSC_semtimedop+0x7bf/0x960
     ? handle_mm_fault+0xbf6/0x1880
     ? dequeue_task_fair+0x79/0x4a0
     ? __do_page_fault+0x19a/0x430
     ? kfree_debugcheck+0x16/0x40
     ? __do_page_fault+0x19a/0x430
     ? __audit_syscall_entry+0xa8/0x100
     ? do_audit_syscall_entry+0x66/0x70
     ? syscall_trace_enter_phase1+0x139/0x160
     SyS_semtimedop+0xe/0x10
     SyS_semop+0x10/0x20
     entry_SYSCALL_64_fastpath+0x12/0x71
   Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 &lt;55&gt; 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
   Kernel panic - not syncing: softlockup: hung tasks

I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).

The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un-&gt;semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).

After the patch I'm unable to reproduce the problem using the test case
[1].

[1] Test case used below:

    #include &lt;stdio.h&gt;
    #include &lt;sys/types.h&gt;
    #include &lt;sys/ipc.h&gt;
    #include &lt;sys/sem.h&gt;
    #include &lt;sys/wait.h&gt;
    #include &lt;stdlib.h&gt;
    #include &lt;time.h&gt;
    #include &lt;unistd.h&gt;
    #include &lt;errno.h&gt;

    #define NSEM 1
    #define NSET 5

    int sid[NSET];

    void thread()
    {
            struct sembuf op;
            int s;
            uid_t pid = getuid();

            s = rand() % NSET;
            op.sem_num = pid % NSEM;
            op.sem_op = 1;
            op.sem_flg = SEM_UNDO;

            semop(sid[s], &amp;op, 1);
            exit(EXIT_SUCCESS);
    }

    void create_set()
    {
            int i, j;
            pid_t p;
            union {
                    int val;
                    struct semid_ds *buf;
                    unsigned short int *array;
                    struct seminfo *__buf;
            } un;

            /* Create and initialize semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                    if (sid[i] &lt; 0) {
                            perror("semget");
                            exit(EXIT_FAILURE);
                    }
            }
            un.val = 0;
            for (i = 0; i &lt; NSET; i++) {
                    for (j = 0; j &lt; NSEM; j++) {
                            if (semctl(sid[i], j, SETVAL, un) &lt; 0)
                                    perror("semctl");
                    }
            }

            /* Launch threads that operate on semaphore set */
            for (i = 0; i &lt; NSEM * NSET * NSET; i++) {
                    p = fork();
                    if (p &lt; 0)
                            perror("fork");
                    if (p == 0)
                            thread();
            }

            /* Free semaphore set */
            for (i = 0; i &lt; NSET; i++) {
                    if (semctl(sid[i], NSEM, IPC_RMID))
                            perror("IPC_RMID");
            }

            /* Wait for forked processes to exit */
            while (wait(NULL)) {
                    if (errno == ECHILD)
                            break;
            };
    }

    int main(int argc, char **argv)
    {
            pid_t p;

            srand(time(NULL));

            while (1) {
                    p = fork();
                    if (p &lt; 0) {
                            perror("fork");
                            exit(EXIT_FAILURE);
                    }
                    if (p == 0) {
                            create_set();
                            goto end;
                    }

                    /* Wait for forked processes to exit */
                    while (wait(NULL)) {
                            if (errno == ECHILD)
                                    break;
                    };
            }
    end:
            return 0;
    }

[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Rafael Aquini &lt;aquini@redhat.com&gt;
CC: Aristeu Rozanski &lt;aris@redhat.com&gt;
Cc: David Jeffery &lt;djeffery@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Luis Henriques &lt;luis.henriques@canonical.com&gt;
</content>
</entry>
</feed>
