<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel, branch v3.11.2</title>
<subtitle>Linux Kernel
</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.11.2</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v3.11.2'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2013-09-27T00:21:57Z</updated>
<entry>
<title>pidns: fix vfork() after unshare(CLONE_NEWPID)</title>
<updated>2013-09-27T00:21:57Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:19:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=f230d52714f57e8154703e402d6d5afbba0f18fe'/>
<id>urn:sha1:f230d52714f57e8154703e402d6d5afbba0f18fe</id>
<content type='text'>
commit e79f525e99b04390ca4d2366309545a836c03bf1 upstream.

Commit 8382fcac1b81 ("pidns: Outlaw thread creation after
unshare(CLONE_NEWPID)") nacks CLONE_VM if the forking process unshared
pid_ns, this obviously breaks vfork:

	int main(void)
	{
		assert(unshare(CLONE_NEWUSER | CLONE_NEWPID) == 0);
		assert(vfork() &gt;= 0);
		_exit(0);
		return 0;
	}

fails without this patch.

Change this check to use CLONE_SIGHAND instead.  This also forbids
CLONE_THREAD automatically, and this is what the comment implies.

We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it
would be safer to not do this.  The current check denies CLONE_SIGHAND
implicitely and there is no reason to change this.

Eric said "CLONE_SIGHAND is fine.  CLONE_THREAD would be even better.
Having shared signal handling between two different pid namespaces is
the case that we are fundamentally guarding against."

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Reported-by: Colin Walters &lt;walters@redhat.com&gt;
Acked-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup</title>
<updated>2013-09-27T00:21:57Z</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2013-08-29T20:56:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cbe0a23a4e32d3c5fbec85b25eb4dfca6b219bd1'/>
<id>urn:sha1:cbe0a23a4e32d3c5fbec85b25eb4dfca6b219bd1</id>
<content type='text'>
commit a606488513543312805fab2b93070cefe6a3016c upstream.

Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt; writes:

&gt; Since commit af4b8a83add95ef40716401395b44a1b579965f4 it's been
&gt; possible to get into a situation where a pidns reaper is
&gt; &lt;defunct&gt;, reparented to host pid 1, but never reaped.  How to
&gt; reproduce this is documented at
&gt;
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526
&gt; (and see
&gt; https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1168526/comments/13)
&gt; In short, run repeated starts of a container whose init is
&gt;
&gt; Process.exit(0);
&gt;
&gt; sysrq-t when such a task is playing zombie shows:
&gt;
&gt; [  131.132978] init            x ffff88011fc14580     0  2084   2039 0x00000000
&gt; [  131.132978]  ffff880116e89ea8 0000000000000002 ffff880116e89fd8 0000000000014580
&gt; [  131.132978]  ffff880116e89fd8 0000000000014580 ffff8801172a0000 ffff8801172a0000
&gt; [  131.132978]  ffff8801172a0630 ffff88011729fff0 ffff880116e14650 ffff88011729fff0
&gt; [  131.132978] Call Trace:
&gt; [  131.132978]  [&lt;ffffffff816f6159&gt;] schedule+0x29/0x70
&gt; [  131.132978]  [&lt;ffffffff81064591&gt;] do_exit+0x6e1/0xa40
&gt; [  131.132978]  [&lt;ffffffff81071eae&gt;] ? signal_wake_up_state+0x1e/0x30
&gt; [  131.132978]  [&lt;ffffffff8106496f&gt;] do_group_exit+0x3f/0xa0
&gt; [  131.132978]  [&lt;ffffffff810649e4&gt;] SyS_exit_group+0x14/0x20
&gt; [  131.132978]  [&lt;ffffffff8170102f&gt;] tracesys+0xe1/0xe6
&gt;
&gt; Further debugging showed that every time this happened, zap_pid_ns_processes()
&gt; started with nr_hashed being 3, while we were expecting it to drop to 2.
&gt; Any time it didn't happen, nr_hashed was 1 or 2.  So the reaper was
&gt; waiting for nr_hashed to become 2, but free_pid() only wakes the reaper
&gt; if nr_hashed hits 1.

The issue is that when the task group leader of an init process exits
before other tasks of the init process when the init process finally
exits it will be a secondary task sleeping in zap_pid_ns_processes and
waiting to wake up when the number of hashed pids drops to two.  This
case waits forever as free_pid only sends a wake up when the number of
hashed pids drops to 1.

To correct this the simple strategy of sending a possibly unncessary
wake up when the number of hashed pids drops to 2 is adopted.

Sending one extraneous wake up is relatively harmless, at worst we
waste a little cpu time in the rare case when a pid namespace
appropaches exiting.

We can detect the case when the pid namespace drops to just two pids
hashed race free in free_pid.

Dereferencing pid_ns-&gt;child_reaper with the pidmap_lock held is safe
without out the tasklist_lock because it is guaranteed that the
detach_pid will be called on the child_reaper before it is freed and
detach_pid calls __change_pid which calls free_pid which takes the
pidmap_lock.  __change_pid only calls free_pid if this is the
last use of the pid.  For a thread that is not the thread group leader
the threads pid will only ever have one user because a threads pid
is not allowed to be the pid of a process, of a process group or
a session.  For a thread that is a thread group leader all of
the other threads of that process will be reaped before it is allowed
for the thread group leader to be reaped ensuring there will only
be one user of the threads pid as a process pid.  Furthermore
because the thread is the init process of a pid namespace all of the
other processes in the pid namespace will have also been already freed
leading to the fact that the pid will not be used as a session pid or
a process group pid for any other running process.

Acked-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Tested-by: Serge Hallyn &lt;serge.hallyn@canonical.com&gt;
Reported-by: Serge Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>uprobes: Fix utask-&gt;depth accounting in handle_trampoline()</title>
<updated>2013-09-27T00:21:56Z</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T15:47:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=003386069c5be7067c98172549780f428702d5b9'/>
<id>urn:sha1:003386069c5be7067c98172549780f428702d5b9</id>
<content type='text'>
commit 878b5a6efd38030c7a90895dc8346e8fb1e09b4c upstream.

Currently utask-&gt;depth is simply the number of allocated/pending
return_instance's in uprobe_task-&gt;return_instances list.

handle_trampoline() should decrement this counter every time we
handle/free an instance, but due to typo it does this only if
-&gt;chained == T. This means that in the likely case this counter
is never decremented and the probed task can't report more than
MAX_URETPROBE_DEPTH events.

Reported-by: Mikhail Kulemin &lt;Mikhail.Kulemin@ru.ibm.com&gt;
Reported-by: Hemant Kumar Shaw &lt;hkshaw@linux.vnet.ibm.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Anton Arapov &lt;anton@redhat.com&gt;
Cc: masami.hiramatsu.pt@hitachi.com
Cc: srikar@linux.vnet.ibm.com
Cc: systemtap@sourceware.org
Link: http://lkml.kernel.org/r/20130911154726.GA8093@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2013-08-31T00:43:17Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-08-31T00:43:17Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=a8787645e14ce7bbc3db9788526ed0be968c0df2'/>
<id>urn:sha1:a8787645e14ce7bbc3db9788526ed0be968c0df2</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) There was a simplification in the ipv6 ndisc packet sending
    attempted here, which avoided using memory accounting on the
    per-netns ndisc socket for sending NDISC packets.  It did fix some
    important issues, but it causes regressions so it gets reverted here
    too.  Specifically, the problem with this change is that the IPV6
    output path really depends upon there being a valid skb-&gt;sk
    attached.

    The reason we want to do this change in some form when we figure out
    how to do it right, is that if a device goes down the ndisc_sk
    socket send queue will fill up and block NDISC packets that we want
    to send to other devices too.  That's really bad behavior.

    Hopefully Thomas can come up with a better version of this change.

 2) Fix a severe TCP performance regression by reverting a change made
    to dev_pick_tx() quite some time ago.  From Eric Dumazet.

 3) TIPC returns wrongly signed error codes, fix from Erik Hugne.

 4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
    skb-&gt;sk too early.  Fix from Li Hongjun.

 5) RAW ipv4 sockets can use the wrong routing key during lookup, from
    Chris Clark.

 6) Similar to #1 revert an older change that tried to use plain
    alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
    mark which needs to see the skb-&gt;sk for such frames.  From Phil
    Oester.

 7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
    specifically in the handling of virtual functions.

 8) IPSEC path error propagations to sockets is not done properly when
    we have v4 in v6, and v6 in v4 type rules.  Fix from Hannes Frederic
    Sowa.

 9) Fix missing channel context release in mac80211, from Johannes Berg.

10) Fix network namespace handing wrt.  SCM_RIGHTS, from Andy
    Lutomirski.

11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
    drivers.  From Michal Schmidt.

12) Hopefully a complete and correct fix for the genetlink dump locking
    and module reference counting.  From Pravin B Shelar.

13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.

14) Fix handling of timestamp offset when restoring a snapshotted TCP
    socket.  From Andrew Vagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  net: fec: fix time stamping logic after napi conversion
  net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
  mISDN: return -EINVAL on error in dsp_control_req()
  net: revert 8728c544a9c ("net: dev_pick_tx() fix")
  Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
  ipv4 tunnels: fix an oops when using ipip/sit with IPsec
  tipc: set sk_err correctly when connection fails
  tcp: tcp_make_synack() should use sock_wmalloc
  bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
  ipv6: Don't depend on per socket memory for neighbour discovery messages
  ipv4: sendto/hdrincl: don't use destination address found in header
  tcp: don't apply tsoffset if rcv_tsecr is zero
  tcp: initialize rcv_tstamp for restored sockets
  net: xilinx: fix memleak
  net: usb: Add HP hs2434 device to ZLP exception table
  net: add cpu_relax to busy poll loop
  net: stmmac: fixed the pbl setting with DT
  genl: Hold reference on correct module while netlink-dump.
  genl: Fix genl dumpit() locking.
  xfrm: Fix potential null pointer dereference in xdst_queue_output
  ...
</content>
</entry>
<entry>
<title>Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2013-08-30T00:03:48Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-08-30T00:03:48Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=41615e811b3031728a003da077005e8dcf9d71cc'/>
<id>urn:sha1:41615e811b3031728a003da077005e8dcf9d71cc</id>
<content type='text'>
Pull cgroup fix from Tejun Heo:
 "During the percpu reference counting update which was merged during
  v3.11-rc1, the cgroup destruction path was updated so that a cgroup in
  the process of dying may linger on the children list, which was
  necessary as the cgroup should still be included in child/descendant
  iteration while percpu ref is being killed.

  Unfortunately, I forgot to update cgroup destruction path accordingly
  and cgroup destruction may fail spuriously with -EBUSY due to
  lingering dying children even when there's no live child left - e.g.
  "rmdir parent/child parent" will usually fail.

  This can be easily fixed by iterating through the children list to
  verify that there's no live child left.  While this is very late in
  the release cycle, this bug is very visible to userland and I believe
  the fix is relatively safe.

  Thanks Hugh for spotting and providing fix for the issue"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix rmdir EBUSY regression in 3.11
</content>
</entry>
<entry>
<title>cgroup: fix rmdir EBUSY regression in 3.11</title>
<updated>2013-08-29T15:05:07Z</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2013-08-28T23:31:23Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=bb78a92f47696b2da49f2692b6a9fa56d07c444a'/>
<id>urn:sha1:bb78a92f47696b2da49f2692b6a9fa56d07c444a</id>
<content type='text'>
On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed.  Here's a trivial reproducer:

cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy

It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.

Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.

(I didn't see any reason to keep the cgrp-&gt;children checks under the
unrelated css_set_lock, so moved them out.)

tj: Flattened nested ifs a bit and updated comment so that it's
    correct on both for-3.11-fixes and for-3.12.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: cond_resched() after processing each work item</title>
<updated>2013-08-29T13:19:28Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2013-08-28T21:33:37Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=b22ce2785d97423846206cceec4efee0c4afd980'/>
<id>urn:sha1:b22ce2785d97423846206cceec4efee0c4afd980</id>
<content type='text'>
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Jamie Liu &lt;jamieliu@google.com&gt;
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
 kernel/workqueue.c |    9 +++++++++
 1 file changed, 9 insertions(+)
</content>
</entry>
<entry>
<title>timer_list: correct the iterator for timer_list</title>
<updated>2013-08-29T02:26:38Z</updated>
<author>
<name>Nathan Zimmer</name>
<email>nzimmer@sgi.com</email>
</author>
<published>2013-08-28T23:35:14Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=84a78a6504f5c5394a8e558702e5b54131f01d14'/>
<id>urn:sha1:84a78a6504f5c5394a8e558702e5b54131f01d14</id>
<content type='text'>
Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer, 2k so
not really that small, there was one could get hung trying to read the
file a chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.

Signed-off-by: Nathan Zimmer &lt;nzimmer@sgi.com&gt;
Reported-by: Holger Hans Peter Freyther &lt;holger@freyther.de&gt;
Cc: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Berke Durak &lt;berke.durak@xiphos.com&gt;
Cc: Jeff Layton &lt;jlayton@redhat.com&gt;
Tested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: &lt;stable@vger.kernel.org&gt; # 3.10.x
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Rename nsproxy.pid_ns to nsproxy.pid_ns_for_children</title>
<updated>2013-08-27T17:52:52Z</updated>
<author>
<name>Andy Lutomirski</name>
<email>luto@amacapital.net</email>
</author>
<published>2013-08-22T18:39:16Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c2b1df2eb42978073ec27c99cc199d20ae48b849'/>
<id>urn:sha1:c2b1df2eb42978073ec27c99cc199d20ae48b849</id>
<content type='text'>
nsproxy.pid_ns is *not* the task's pid namespace.  The name should clarify
that.

This makes it more obvious that setns on a pid namespace is weird --
it won't change the pid namespace shown in procfs.

Signed-off-by: Andy Lutomirski &lt;luto@amacapital.net&gt;
Reviewed-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2013-08-23T17:58:50Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-08-23T17:58:50Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=e2982a04ede454b23ea2f5af11ba4d77d8a70155'/>
<id>urn:sha1:e2982a04ede454b23ea2f5af11ba4d77d8a70155</id>
<content type='text'>
Pull cgroup fix from Tejun Heo:
 "A late fix for cgroup.

  This fixes a behavior regression visible to userland which was created
  by a commit merged during -rc1.  While the behavior change isn't too
  likely to be noticeable, the fix is relatively low risk and we'll need
  to backport it through -stable anyway if the bug gets released"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: fix a regression in validating config change
</content>
</entry>
</feed>
