<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/ipc, branch v4.9.145</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.145</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v4.9.145'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2018-05-30T05:50:17Z</updated>
<entry>
<title>ipc/shm: fix shmat() nil address after round-down when remapping</title>
<updated>2018-05-30T05:50:17Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2018-05-25T21:47:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=9c798bc19e1b42ca7ece9523fcb6a6751f689561'/>
<id>urn:sha1:9c798bc19e1b42ca7ece9523fcb6a6751f689561</id>
<content type='text'>
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for.  Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil.  As
of this patch, such cases will return -EINVAL.

Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reported-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Joe Lawrence &lt;joe.lawrence@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Revert "ipc/shm: Fix shmat mmap nil-page protection"</title>
<updated>2018-05-30T05:50:17Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2018-05-25T21:47:27Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2ef44a3c1a32656dbae30cd16ec5c22a996a4ca9'/>
<id>urn:sha1:2ef44a3c1a32656dbae30cd16ec5c22a996a4ca9</id>
<content type='text'>
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

Patch series "ipc/shm: shmat() fixes around nil-page".

These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.

The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.

I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour.  Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).

[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

This patch (of 2):

Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED.  However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.

For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].

[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reported-by: Joe Lawrence &lt;joe.lawrence@redhat.com&gt;
Reported-by: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc/shm: fix use-after-free of shm file via remap_file_pages()</title>
<updated>2018-04-24T07:34:09Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2018-04-13T22:35:30Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=570ef10de6304dc20239d30a47b36c12e341d4be'/>
<id>urn:sha1:570ef10de6304dc20239d30a47b36c12e341d4be</id>
<content type='text'>
commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.

syzbot reported a use-after-free of shm_file_data(file)-&gt;file-&gt;f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().

Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the -&gt;vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
-&gt;mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
-&gt;vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.

Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd-&gt;file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).

Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

	#include &lt;stdlib.h&gt;
	#include &lt;sys/shm.h&gt;
	#include &lt;sys/syscall.h&gt;
	#include &lt;unistd.h&gt;

	int main()
	{
		int is_parent = (fork() != 0);
		srand(getpid());
		for (;;) {
			int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
			if (is_parent) {
				void *addr = shmat(id, NULL, 0);
				usleep(rand() % 50);
				while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
			} else {
				usleep(rand() % 50);
				shmctl(id, IPC_RMID, NULL);
			}
		}
	}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
	PGD 0 P4D 0
	Oops: 0000 [#1] SMP NOPTI
	CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
	RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
	RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
	[...]
	Call Trace:
	 file_accessed include/linux/fs.h:2063 [inline]
	 shmem_mmap+0x25/0x40 mm/shmem.c:2149
	 call_mmap include/linux/fs.h:1789 [inline]
	 shm_mmap+0x34/0x80 ipc/shm.c:465
	 call_mmap include/linux/fs.h:1789 [inline]
	 mmap_region+0x309/0x5b0 mm/mmap.c:1712
	 do_mmap+0x294/0x4a0 mm/mmap.c:1483
	 do_mmap_pgoff include/linux/mm.h:2235 [inline]
	 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
	 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
	 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
	 entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ebiggers@google.com: add comment]
  Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Acked-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc/shm.c: add split function to shm_vm_ops</title>
<updated>2018-04-08T10:12:42Z</updated>
<author>
<name>Mike Kravetz</name>
<email>mike.kravetz@oracle.com</email>
</author>
<published>2018-03-28T23:01:01Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=86c8c892d38fd80c8acc4b44b75213758eb5d9de'/>
<id>urn:sha1:86c8c892d38fd80c8acc4b44b75213758eb5d9de</id>
<content type='text'>
commit 3d942ee079b917b24e2a0c5f18d35ac8ec9fee48 upstream.

If System V shmget/shmat operations are used to create a hugetlbfs
backed mapping, it is possible to munmap part of the mapping and split
the underlying vma such that it is not huge page aligned.  This will
untimately result in the following BUG:

  kernel BUG at /build/linux-jWa1Fv/linux-4.15.0/mm/hugetlb.c:3310!
  Oops: Exception in kernel mode, sig: 5 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: kcm nfc af_alg caif_socket caif phonet fcrypt
  CPU: 18 PID: 43243 Comm: trinity-subchil Tainted: G         C  E 4.15.0-10-generic #11-Ubuntu
  NIP:  c00000000036e764 LR: c00000000036ee48 CTR: 0000000000000009
  REGS: c000003fbcdcf810 TRAP: 0700   Tainted: G         C  E (4.15.0-10-generic)
  MSR:  9000000000029033 &lt;SF,HV,EE,ME,IR,DR,RI,LE&gt;  CR: 24002222  XER: 20040000
  CFAR: c00000000036ee44 SOFTE: 1
  NIP __unmap_hugepage_range+0xa4/0x760
  LR __unmap_hugepage_range_final+0x28/0x50
  Call Trace:
    0x7115e4e00000 (unreliable)
    __unmap_hugepage_range_final+0x28/0x50
    unmap_single_vma+0x11c/0x190
    unmap_vmas+0x94/0x140
    exit_mmap+0x9c/0x1d0
    mmput+0xa8/0x1d0
    do_exit+0x360/0xc80
    do_group_exit+0x60/0x100
    SyS_exit_group+0x24/0x30
    system_call+0x58/0x6c
  ---[ end trace ee88f958a1c62605 ]---

This bug was introduced by commit 31383c6865a5 ("mm, hugetlbfs:
introduce -&gt;split() to vm_operations_struct").  A split function was
added to vm_operations_struct to determine if a mapping can be split.
This was mostly for device-dax and hugetlbfs mappings which have
specific alignment constraints.

Mappings initiated via shmget/shmat have their original vm_ops
overwritten with shm_vm_ops.  shm_vm_ops functions will call back to the
original vm_ops if needed.  Add such a split function to shm_vm_ops.

Link: http://lkml.kernel.org/r/20180321161314.7711-1-mike.kravetz@oracle.com
Fixes: 31383c6865a5 ("mm, hugetlbfs: introduce -&gt;split() to vm_operations_struct")
Signed-off-by: Mike Kravetz &lt;mike.kravetz@oracle.com&gt;
Reported-by: Laurent Dufour &lt;ldufour@linux.vnet.ibm.com&gt;
Reviewed-by: Laurent Dufour &lt;ldufour@linux.vnet.ibm.com&gt;
Tested-by: Laurent Dufour &lt;ldufour@linux.vnet.ibm.com&gt;
Reviewed-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc: msg, make msgrcv work with LONG_MIN</title>
<updated>2018-01-31T11:55:51Z</updated>
<author>
<name>Jiri Slaby</name>
<email>jslaby@suse.cz</email>
</author>
<published>2016-12-14T23:06:07Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=542cde0e3cc27bd4d6cbfa596d9278d3c97bc193'/>
<id>urn:sha1:542cde0e3cc27bd4d6cbfa596d9278d3c97bc193</id>
<content type='text'>
commit 999898355e08ae3b92dfd0a08db706e0c6703d30 upstream.

When LONG_MIN is passed to msgrcv, one would expect to recieve any
message.  But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
undefined.  In particular, with my gcc -LONG_MIN produces -LONG_MIN
again.

So handle this case properly by assigning LONG_MAX to *msgtyp if
LONG_MIN was specified as msgtyp to msgrcv.

This code:
  long msg[] = { 100, 200 };
  int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
  msgsnd(m, &amp;msg, sizeof(msg), 0);
  msgrcv(m, &amp;msg, sizeof(msg), LONG_MIN, 0);

produces currently nothing:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 65538
  msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(65538, ...

Except a UBSAN warning:

  UBSAN: Undefined behaviour in ipc/msg.c:745:13
  negation of -9223372036854775808 cannot be represented in type 'long int':

With the patch, I see what I expect:

  msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 0
  msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
  msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16

Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>mqueue: fix a use-after-free in sys_mq_notify()</title>
<updated>2017-07-15T10:16:10Z</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2017-07-09T20:19:55Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e6952841ade0f937750c7748a812cb403bd744b0'/>
<id>urn:sha1:e6952841ade0f937750c7748a812cb403bd744b0</id>
<content type='text'>
commit f991af3daabaecff34684fd51fac80319d1baad1 upstream.

The retry logic for netlink_attachskb() inside sys_mq_notify()
is nasty and vulnerable:

1) The sock refcnt is already released when retry is needed
2) The fd is controllable by user-space because we already
   release the file refcnt

so we when retry but the fd has been just closed by user-space
during this small window, we end up calling netlink_detachskb()
on the error path which releases the sock again, later when
the user-space closes this socket a use-after-free could be
triggered.

Setting 'sock' to NULL here should be sufficient to fix it.

Reported-by: GeneBlue &lt;geneblue.mail@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc/shm: Fix shmat mmap nil-page protection</title>
<updated>2017-03-12T05:41:44Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2017-02-27T22:28:24Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=270e84a1e6effd6c0c6e9b13b196b5fdaa392954'/>
<id>urn:sha1:270e84a1e6effd6c0c6e9b13b196b5fdaa392954</id>
<content type='text'>
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.

The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Reported-by: Gareth Evans &lt;gareth.evans@contextis.co.uk&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Michael Kerrisk &lt;mtk.manpages@googlemail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>ipc: account for kmem usage on mqueue and msg</title>
<updated>2016-10-28T01:43:43Z</updated>
<author>
<name>Aristeu Rozanski</name>
<email>arozansk@redhat.com</email>
</author>
<published>2016-10-28T00:46:35Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=8c8d4d45204902e144abc0f15b7c658828028fa1'/>
<id>urn:sha1:8c8d4d45204902e144abc0f15b7c658828028fa1</id>
<content type='text'>
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.

The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster.  This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e.  accounted
for.

Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski &lt;arozansk@redhat.com&gt;
Reported-by: Stefan Schimanski &lt;sttts@redhat.com&gt;
Acked-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Vladimir Davydov &lt;vdavydov.dev@gmail.com&gt;
Cc: Stefan Schimanski &lt;sttts@redhat.com&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/sem.c: add cond_resched in exit_sme</title>
<updated>2016-10-11T22:06:33Z</updated>
<author>
<name>Nikolay Borisov</name>
<email>kernel@kyup.com</email>
</author>
<published>2016-10-11T20:55:05Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=2a1613a586de91457fa93c3e468a6e2438fe52a0'/>
<id>urn:sha1:2a1613a586de91457fa93c3e468a6e2438fe52a0</id>
<content type='text'>
In CONFIG_PREEMPT=n kernel a softlockup was observed while the for loop in
exit_sem.  Apparently it's possible for the loop to take quite a long time
and it doesn't have a scheduling point in it.  Since the codes is
executing under an rcu read section this may also cause rcu stalls, which
in turn block synchronize_rcu operations, which more or less de-stabilises
the whole system.

Fix this by introducing a cond_resched() at the beginning of the loop.

So this patch fixes the following:

  NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
  CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
  Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
  task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
  RIP: 0010:[&lt;ffffffff81614bc7&gt;]  [&lt;ffffffff81614bc7&gt;] _raw_spin_lock+0x17/0x30
  RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
  RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
  RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
  RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
  R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
  R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
  FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
  Call Trace:
    ? exit_sem+0x7c/0x280
    do_exit+0x338/0xb40
    do_group_exit+0x43/0xd0
    SyS_exit_group+0x14/0x20
    entry_SYSCALL_64_fastpath+0x16/0x6e

Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
Signed-off-by: Nikolay Borisov &lt;kernel@kyup.com&gt;
Cc: Herton R. Krzesinski &lt;herton@redhat.com&gt;
Cc: Fabian Frederick &lt;fabf@skynet.be&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/msg: avoid waking sender upon full queue</title>
<updated>2016-10-11T22:06:33Z</updated>
<author>
<name>Davidlohr Bueso</name>
<email>dave@stgolabs.net</email>
</author>
<published>2016-10-11T20:55:02Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=ed27f9122c541a1720898739ac55f824f820b7ff'/>
<id>urn:sha1:ed27f9122c541a1720898739ac55f824f820b7ff</id>
<content type='text'>
Blocked tasks queued in q_senders waiting for their message to fit in the
queue are blindly awoken every time we think there's a remote chance this
might happen.  This could cause numerous (and expensive -- thundering
herd-ish) bogus wakeups if the queue is still really full.  Adding to the
scheduling cost/overhead, there's also the fact that we need to take the
ipc object lock and requeue ourselves in the q_senders list.

By keeping track of the blocked sender's message size, we can know
previously if the wakeup ought to occur or not.  Otherwise, to maintain
the current wakeup order we just move it to the tail.  This is exactly
what occurs right now if the sender needs to go back to sleep.

The case of EIDRM is left completely untouched, as we need to wakeup all
the tasks, and shouldn't be playing games in the first place.

This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
switches (avg out of ten runs).  Although these tests are really about
functionality (as opposed to performance), is does show the direct
benefits of the optimization.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
