user/sven/linux.git/kernel/rcu/tree.h, branch v4.9.99

rcu: Drive expedited grace periods from workqueue

2016-08-22T16:30:25Z

The current implementation of expedited grace periods has the user task drive the grace period. This works, but has downsides: (1) the user task must awaken tasks piggybacking on this grace period, which can result in latencies rivaling that of the grace period itself, and (2) user tasks can receive signals, which interfere with RCU CPU stall warnings. This commit therefore uses workqueues to drive the grace periods, so that the user task need not do the awakening. A subsequent commit will remove the now-unnecessary code allowing for signals.

Signed-off-by: Paul E. McKenney
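The division of labor described above can be modeled in a few lines of userspace C. This is only a single-threaded sketch, not the kernel's code: `struct exp_work`, `struct waiter`, `exp_work_add_waiter()` and `exp_work_run()` are hypothetical names invented for illustration, and the "grace period" is a placeholder flag. The point it shows is that requesting tasks merely register interest, while the worker context both drives the grace period and performs every wakeup.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical stand-in for a task sleeping until a grace period ends. */
struct waiter {
    bool awake;
};

/* Hypothetical work item handed from a user task to a worker context. */
struct exp_work {
    struct waiter *waiters[MAX_WAITERS]; /* requester plus piggybackers */
    size_t nr_waiters;
    bool gp_done;
};

/* Runs in the user task's context: record interest only, drive nothing. */
static bool exp_work_add_waiter(struct exp_work *w, struct waiter *t)
{
    if (w->nr_waiters == MAX_WAITERS)
        return false;
    t->awake = false;
    w->waiters[w->nr_waiters++] = t;
    return true;
}

/* Runs in the worker's context: drive the grace period, then wake everyone. */
static void exp_work_run(struct exp_work *w)
{
    w->gp_done = true; /* placeholder for the real grace-period machinery */
    for (size_t i = 0; i < w->nr_waiters; i++)
        w->waiters[i]->awake = true;
}
```

In this model no waiter ever wakes another waiter, so the wakeup latency is paid by the worker rather than by whichever user task happened to start the grace period.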

rcu: Correctly handle sparse possible cpus

2016-06-15T23:00:05Z

In many cases in the RCU tree code, we iterate over the set of cpus for a leaf node described by rcu_node::grplo and rcu_node::grphi, checking per-cpu data for each cpu in this range. However, if the set of possible cpus is sparse, some cpus described in this range are not possible, and thus no per-cpu region will have been allocated (or initialised) for them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible cpus may fault or may hit other data depending on the address generated when the erroneous per-cpu offset is applied. In practice, both cases have been observed on arm64 hardware (the former being silent, but detectable with additional patches).

To avoid issues resulting from this, we must iterate over the set of *possible* cpus for a given leaf node. This patch adds a new helper, for_each_leaf_node_possible_cpu, to enable this. As iteration is often intertwined with rcu_node local bitmask manipulation, a new leaf_node_cpu_bit helper is added to make this simpler and more consistent. The RCU tree code is made to use both of these where appropriate.

Without this patch, running reboot at a shell can result in an oops like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [] lr : [] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [] synchronize_net+0x24/0x30
[ 3369.990499] [] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [] __dev_close_many+0x60/0xd0
[ 3370.002052] [] __dev_close+0x28/0x40
[ 3370.007178] [] __dev_change_flags+0x8c/0x158
[ 3370.012999] [] dev_change_flags+0x20/0x60
[ 3370.018558] [] do_setlink+0x288/0x918
[ 3370.023771] [] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [] netlink_unicast+0x160/0x238
[ 3370.051309] [] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [] sock_sendmsg+0x18/0x30
[ 3370.062168] [] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [] __sys_sendmsg+0x44/0x88
[ 3370.073027] [] SyS_sendmsg+0x10/0x20
[ 3370.078153] [] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland
Reported-by: Dennis Chen
Cc: Catalin Marinas
Cc: Josh Triplett
Cc: Lai Jiangshan
Cc: Mathieu Desnoyers
Cc: Steve Capper
Cc: Steven Rostedt
Cc: Will Deacon
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul E. McKenney
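The idea behind the two helpers can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's definitions: the set of possible cpus is reduced to a single 64-bit mask (so cpu numbers must be below 64), and `struct leaf_node`, `leaf_cpu_bit()`, `leaf_next_possible()` and `leaf_possible_mask()` are hypothetical names chosen for this example. It shows iteration that visits only possible cpus inside grplo..grphi, and the translation from a cpu number to its bit in the leaf-local mask.

```c
#include <stdint.h>

/* Hypothetical, simplified leaf node covering cpus grplo..grphi inclusive. */
struct leaf_node {
    int grplo;
    int grphi;
};

/* Bit for @cpu in the leaf-local mask: bit 0 corresponds to grplo. */
static uint64_t leaf_cpu_bit(const struct leaf_node *rnp, int cpu)
{
    return UINT64_C(1) << (cpu - rnp->grplo);
}

/* First possible cpu >= @cpu within the leaf, or grphi + 1 if there is none. */
static int leaf_next_possible(const struct leaf_node *rnp, uint64_t possible,
                              int cpu)
{
    while (cpu <= rnp->grphi && !(possible & (UINT64_C(1) << cpu)))
        cpu++;
    return cpu;
}

/* Build the leaf-local mask while never touching an impossible cpu. */
static uint64_t leaf_possible_mask(const struct leaf_node *rnp,
                                   uint64_t possible)
{
    uint64_t mask = 0;

    for (int cpu = leaf_next_possible(rnp, possible, rnp->grplo);
         cpu <= rnp->grphi;
         cpu = leaf_next_possible(rnp, possible, cpu + 1))
        mask |= leaf_cpu_bit(rnp, cpu);
    return mask;
}
```

With a leaf covering cpus 4..7 and only cpus 0, 1, 4 and 6 possible, the loop body runs for cpus 4 and 6 only, so per-cpu data for cpus 5 and 7 would never be dereferenced.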

Merge branches 'doc.2016.04.19a', 'exp.2016.03.31d', 'fixes.2016.03.31d' and 'torture.2016.04.21a' into HEAD

2016-04-21T20:48:20Z

doc.2016.04.19a: Documentation updates
exp.2016.03.31d: Expedited grace-period updates
fixes.2016.03.31d: Miscellaneous fixes
torture.2016.04.21a: Torture-test updates

rcu: Awaken grace-period kthread if too long since FQS

2016-03-31T20:34:50Z

Recent kernels can fail to awaken the grace-period kthread for quiescent-state forcing. This commit is a crude hack that does a wakeup if a scheduling-clock interrupt sees that it has been too long since force-quiescent-state (FQS) processing.

Signed-off-by: Paul E. McKenney
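The check described above amounts to a wraparound-safe deadline comparison made from the tick handler. The following is a minimal userspace sketch, not the commit's code: `struct gp_state`, `tick_after()`, `should_wake_gp_kthread()` and the `max_delay` parameter are hypothetical, and the commit message does not say what threshold counts as "too long".

```c
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Hypothetical snapshot of grace-period state seen by the tick handler. */
struct gp_state {
    bool gp_in_progress;
    unsigned long last_fqs; /* tick at which FQS processing last ran */
};

/*
 * Decide whether the tick handler should wake the grace-period kthread.
 * @max_delay is an assumed tunable: how many ticks may elapse since the
 * last FQS pass before the wakeup is forced.
 */
static bool should_wake_gp_kthread(const struct gp_state *s,
                                   unsigned long now,
                                   unsigned long max_delay)
{
    return s->gp_in_progress && tick_after(now, s->last_fqs + max_delay);
}
```

Comparing via a signed difference keeps the test correct when the tick counter wraps around, which a plain `now > deadline` comparison would not.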

rcu: Overlap wakeups with next expedited grace period

2016-03-31T20:34:11Z

The current expedited grace-period implementation makes subsequent grace periods wait on wakeups for the prior grace period. This does not fit the dictionary definition of "expedited", so this commit allows these two phases to overlap. Doing this requires four waitqueues rather than two because tasks can now be waiting on the previous, current, and next grace periods. The fourth waitqueue makes the bit masking work out nicely.

Signed-off-by: Paul E. McKenney
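The remark about bit masking can be made concrete with a short sketch. It assumes, which the commit message does not state, that the grace-period sequence counter uses its low-order bit as an "in progress" flag, so the grace-period number is the counter shifted right by one. `NR_EXP_WQ` and `exp_wq_index()` are hypothetical names for illustration.

```c
/* Number of wait queues; a power of two so that a mask selects the slot. */
#define NR_EXP_WQ 4

/*
 * Map a sequence value to a wait-queue slot. The low-order bit is assumed
 * to flag an in-progress grace period, so it is shifted out first.
 */
static unsigned int exp_wq_index(unsigned long seq)
{
    return (unsigned int)((seq >> 1) & (NR_EXP_WQ - 1));
}
```

Three consecutive grace periods (previous, current, next) always land in three distinct slots. Three queues would suffice for that, but would need a modulo by three; a fourth queue lets a simple mask do the job.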

rcu: Enforce expedited-GP fairness via funnel wait queue

2016-03-31T20:34:08Z

The current mutex-based funnel-locking approach used by expedited grace periods is subject to severe unfairness. The problem arises when a few tasks, making a path from leaves to root, all wake up before other tasks do. A new task can then follow this path all the way to the root, which needlessly delays tasks whose grace period is done, but who do not happen to acquire the lock quickly enough.

This commit avoids this problem by maintaining per-rcu_node wait queues, along with a per-rcu_node counter that tracks the latest grace period sought by an earlier task to visit this node. If that grace period would satisfy the current task, instead of proceeding up the tree, it waits on the current rcu_node structure using a pair of wait queues provided for that purpose. This decouples awakening of old tasks from the arrival of new tasks. If the wakeups prove to be a bottleneck, additional kthreads can be brought to bear for that purpose.

Signed-off-by: Paul E. McKenney
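The per-node counter scheme can be sketched as a walk from a leaf toward the root. This is a simplified single-threaded model, not the kernel's implementation: `struct fnode`, `seq_ge()` and `funnel_walk()` are hypothetical names, the wait queues themselves are omitted, and real code would need per-node locking around the counter check and update.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node carrying the latest grace period requested here. */
struct fnode {
    struct fnode *parent;
    unsigned long seq_rq; /* latest GP sought by an earlier visitor */
};

/* Wraparound-safe "a >= b" for sequence numbers. */
static bool seq_ge(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/*
 * Walk from @leaf toward the root on behalf of a task needing grace
 * period @s. Returns the node on which the task should wait, or NULL if
 * the task reached the root and must start the grace period itself.
 */
static struct fnode *funnel_walk(struct fnode *leaf, unsigned long s)
{
    for (struct fnode *n = leaf; n; n = n->parent) {
        if (seq_ge(n->seq_rq, s))
            return n;  /* an earlier request already covers @s: wait here */
        n->seq_rq = s; /* record this request for later arrivals */
    }
    return NULL;
}
```

A task that finds a sufficient request already recorded stops at that node instead of contending further up the tree, which is what keeps new arrivals from racing past tasks that are merely waiting to be awakened.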

rcu: Shorten expedited_workdone* to exp_workdone*

2016-03-31T20:34:08Z

Just a name change to save a few lines and a bit of typing.

Signed-off-by: Paul E. McKenney

rcu: Remove expedited GP funnel-lock bypass

2016-03-31T20:34:07Z

Commit #cdacbe1f91264 ("rcu: Add fastpath bypassing funnel locking") turns out to be a pessimization at high load because it forces a tree full of tasks to wait for an expedited grace period that they probably do not need. This commit therefore removes this optimization.

Signed-off-by: Paul E. McKenney

Merge commit 'fixes.2015.02.23a' into core/rcu

2016-03-15T08:01:06Z

Conflicts: kernel/rcu/tree.c

Signed-off-by: Ingo Molnar

rcu: Use simple wait queues where possible in rcutree

2016-02-25T10:27:16Z

As of commit dae6e64d2bcfd ("rcu: Introduce proper blocking to no-CBs kthreads GP waits") the RCU subsystem started making use of wait queues.

Here we convert all additions of RCU wait queues to use simple wait queues, since they don't need the extra overhead of the full wait queue features.

Originally this was done for RT kernels[1], since we would get things like...

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 1, pid: 8, name: rcu_preempt
Pid: 8, comm: rcu_preempt Not tainted
Call Trace:
[] __might_sleep+0xd0/0xf0
[] rt_spin_lock+0x24/0x50
[] __wake_up+0x36/0x70
[] rcu_gp_kthread+0x4d2/0x680
[] ? __init_waitqueue_head+0x50/0x50
[] ? rcu_gp_fqs+0x80/0x80
[] kthread+0xdb/0xe0
[] ? finish_task_switch+0x52/0x100
[] kernel_thread_helper+0x4/0x10
[] ? __init_kthread_worker+0x60/0x60
[] ? gs_change+0xb/0xb

...and hence simple wait queues were deployed on RT out of necessity (as simple wait uses a raw lock), but mainline might as well take advantage of the more streamlined support.

[1] This is a carry forward of work from v3.10-rt; the original conversion was by Thomas on an earlier -rt version, and Sebastian extended it to additional post-3.10 added RCU waiters; here I've added a commit log and unified the RCU changes into one, and uprev'd it to match mainline RCU.

Signed-off-by: Daniel Wagner
Acked-by: Peter Zijlstra (Intel)
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng
Cc: Marcelo Tosatti
Cc: Steven Rostedt
Cc: Paul Gortmaker
Cc: Paolo Bonzini
Cc: "Paul E. McKenney"
Link: http://lkml.kernel.org/r/1455871601-27484-6-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner