<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/sven/linux.git/kernel/rcu/tree.c, branch v5.4.55</title>
<subtitle>Linux Kernel</subtitle>
<id>https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.55</id>
<link rel='self' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/atom?h=v5.4.55'/>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/'/>
<updated>2020-02-24T07:36:23Z</updated>
<entry>
<title>rcu: Fix data-race due to atomic_t copy-by-value</title>
<updated>2020-02-24T07:36:23Z</updated>
<author>
<name>Marco Elver</name>
<email>elver@google.com</email>
</author>
<published>2019-10-09T15:57:43Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=255edefeb0b840f9f4959e1b96eabc70709f995b'/>
<id>urn:sha1:255edefeb0b840f9f4959e1b96eabc70709f995b</id>
<content type='text'>
[ Upstream commit 6cf539a87a61a4fbc43f625267dbcbcf283872ed ]

This fixes a data-race where `atomic_t dynticks` is copied by value. The
copy is performed non-atomically, resulting in a data-race if `dynticks`
is updated concurrently.
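
As a minimal userspace sketch of the same hazard (using C11
&lt;stdatomic.h&gt; in place of the kernel's atomic_t; the names below are
illustrative, not the kernel's):

  #include &lt;stdatomic.h&gt;

  struct state {
          atomic_int dynticks;
  };

  /* Racy: passing the struct by value copies the embedded atomic
   * with a plain, unannotated load, so the copy itself races with
   * any concurrent atomic update of dynticks. */
  static int snap_racy(struct state s)
  {
          return atomic_load(&amp;s.dynticks); /* the copy already raced */
  }

  /* Fixed: snapshot via an atomic load and pass plain ints around
   * instead of the atomic object. */
  static int snap_safe(struct state *s)
  {
          return atomic_load(&amp;s-&gt;dynticks);
  }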

This data-race was found with KCSAN:
==================================================================
BUG: KCSAN: data-race in dyntick_save_progress_counter / rcu_irq_enter

write to 0xffff989dbdbe98e0 of 4 bytes by task 10 on cpu 3:
 atomic_add_return include/asm-generic/atomic-instrumented.h:78 [inline]
 rcu_dynticks_snap kernel/rcu/tree.c:310 [inline]
 dyntick_save_progress_counter+0x43/0x1b0 kernel/rcu/tree.c:984
 force_qs_rnp+0x183/0x200 kernel/rcu/tree.c:2286
 rcu_gp_fqs kernel/rcu/tree.c:1601 [inline]
 rcu_gp_fqs_loop+0x71/0x880 kernel/rcu/tree.c:1653
 rcu_gp_kthread+0x22c/0x3b0 kernel/rcu/tree.c:1799
 kthread+0x1b5/0x200 kernel/kthread.c:255
 &lt;snip&gt;

read to 0xffff989dbdbe98e0 of 4 bytes by task 154 on cpu 7:
 rcu_nmi_enter_common kernel/rcu/tree.c:828 [inline]
 rcu_irq_enter+0xda/0x240 kernel/rcu/tree.c:870
 irq_enter+0x5/0x50 kernel/softirq.c:347
 &lt;snip&gt;

Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 154 Comm: kworker/7:1H Not tainted 5.3.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Workqueue: kblockd blk_mq_run_work_fn
==================================================================

Signed-off-by: Marco Elver &lt;elver@google.com&gt;
Cc: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Cc: Josh Triplett &lt;josh@joshtriplett.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Cc: Joel Fernandes &lt;joel@joelfernandes.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: rcu@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Joel Fernandes (Google) &lt;joel@joelfernandes.org&gt;
Signed-off-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2019-09-17T00:25:49Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-17T00:25:49Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7e67a859997aad47727aff9c5a32e160da079ce3'/>
<id>urn:sha1:7e67a859997aad47727aff9c5a32e160da079ce3</id>
<content type='text'>
Pull scheduler updates from Ingo Molnar:

 - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
   Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
   Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.

   As perf and the scheduler are getting bigger and more complex,
   document the status quo of current responsibilities and interests,
   and spread the review pain^H^H^H^H fun via an increase in the Cc:
   linecount generated by scripts/get_maintainer.pl. :-)

 - Add another series of patches that brings the -rt (PREEMPT_RT) tree
   closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
   into a new CONFIG_PREEMPTION category that will allow the eventual
   introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
   to go though.

 - Extend the CPU cgroup controller with uclamp.min and uclamp.max to
   allow the finer shaping of CPU bandwidth usage.

 - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).

 - Improve the behavior of high CPU count, high thread count
   applications running under cpu.cfs_quota_us constraints.

 - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.

 - Improve the NUMA locality of housekeeping CPU allocation for CPU
   isolation.

 - Fix deadline scheduler bandwidth calculations and logic when cpusets
   rebuild the topology, or when a CPU gets deadline-throttled while it
   is being offlined.

 - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
   setscheduler() system calls without creating global serialization.
   Add new synchronization between cpuset topology-changing events and
   the deadline acceptance tests in setscheduler(), which were broken
   before.

 - Rework the active_mm state machine to be less confusing and more
   optimal.

 - Rework (simplify) the pick_next_task() slowpath.

 - Improve load-balancing on AMD EPYC systems.

 - ... and misc cleanups, smaller fixes and improvements - please see
   the Git log for more details.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  sched/psi: Correct overly pessimistic size calculation
  sched/fair: Speed-up energy-aware wake-ups
  sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
  sched/uclamp: Update CPU's refcount on TG's clamp changes
  sched/uclamp: Use TG's clamps to restrict TASK's clamps
  sched/uclamp: Propagate system defaults to the root group
  sched/uclamp: Propagate parent clamps
  sched/uclamp: Extend CPU's cgroup controller
  sched/topology: Improve load balancing on AMD EPYC systems
  arch, ia64: Make NUMA select SMP
  sched, perf: MAINTAINERS update, add submaintainers and reviewers
  sched/fair: Use rq_lock/unlock in online_fair_sched_group
  cpufreq: schedutil: fix equation in comment
  sched: Rework pick_next_task() slow-path
  sched: Allow put_prev_task() to drop rq-&gt;lock
  sched/fair: Expose newidle_balance()
  sched: Add task_struct pointer to sched_class::set_curr_task
  sched: Rework CPU hotplug task selection
  sched/{rt,deadline}: Fix set_next_task vs pick_next_task
  sched: Fix kerneldoc comment for ia64_set_curr_task
  ...
</content>
</entry>
<entry>
<title>Merge branch 'sched/rt' into sched/core, to pick up -rt changes</title>
<updated>2019-09-16T12:05:04Z</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2019-09-16T12:04:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=563c4f85f9f0d63b712081d5b4522152cdcb8b6b'/>
<id>urn:sha1:563c4f85f9f0d63b712081d5b4522152cdcb8b6b</id>
<content type='text'>
Pick up the first couple of patches working towards PREEMPT_RT.

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
</entry>
<entry>
<title>rcu: Allow rcu_do_batch() to dynamically adjust batch sizes</title>
<updated>2019-08-13T21:38:24Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-07-25T01:07:52Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=cfcdef5e30469f3f2d6786ad35fc3fdef2a3833f'/>
<id>urn:sha1:cfcdef5e30469f3f2d6786ad35fc3fdef2a3833f</id>
<content type='text'>
The bimodal behavior of rcu_do_batch() is not really suited to Google
applications like gfe servers.

When a process with millions of sockets exits, closing all files
queues two rcu callbacks per socket.

This eventually reaches the point where RCU enters an emergency mode,
in which rcu_do_batch() does not return until the whole queue is
flushed.

Each rcu callback lasts at least 70 nsec, so with millions of
elements, we easily spend more than 100 msec without rescheduling.

The goal of this patch is to avoid infamous messages like the following:
"need_resched set for &gt; 51999388 ns (52 ticks) without schedule"

We dynamically adjust the number of elements we process: instead of
the old 10 / INFINITE choices, we use a floor of ~1% of the current
entries.

If the number is above 1000, we switch to a time-based limit of 3 msec
per batch, adjustable via /sys/module/rcutree/parameters/rcu_resched_ns.
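
As a simplified sketch of the limit policy described above (the helper
name and structure are hypothetical, not the actual tree.c code):

  #include &lt;stdbool.h&gt;

  /* Decide whether rcu_do_batch() should stop and reschedule.
   * count   = callbacks pending in the queue for this batch
   * done    = callbacks invoked so far in this batch
   * elapsed = nanoseconds spent in this batch so far
   */
  static bool batch_should_stop(long count, long done, long long elapsed)
  {
          long limit = count / 100;          /* floor of ~1% of entries */

          if (limit &lt; 10)
                  limit = 10;                /* never below the old fixed 10 */

          if (count &gt; 1000)                  /* large queues: time budget */
                  return elapsed &gt; 3000000LL;   /* 3 msec default */

          return done &gt;= limit;
  }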

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
[ paulmck: Forward-port and remove debug statements. ]
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks()</title>
<updated>2019-08-13T21:38:24Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-07-10T19:54:56Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=23651d9b9616060cf86af5e3b15defcf3bcd2642'/>
<id>urn:sha1:23651d9b9616060cf86af5e3b15defcf3bcd2642</id>
<content type='text'>
The rcutree_migrate_callbacks() function invokes rcu_advance_cbs() on
both the offlined CPU's -&gt;cblist and that of the surviving CPU, then
merges them.  However, after the merge, any of the offlined CPU's callbacks
that were not ready to be invoked will no longer be associated with a
grace-period number.  This commit therefore invokes rcu_advance_cbs()
one more time on the merged -&gt;cblist in order to assign a grace-period
number to these callbacks.
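
In kernel-style pseudocode, the resulting ordering is roughly the
following (signatures and variable names abbreviated, not the literal
tree.c code):

  /* Advance both lists while each is still tied to its own
   * grace-period numbers... */
  rcu_advance_cbs(rnp, offline_rdp);
  rcu_advance_cbs(rnp, my_rdp);

  /* ...merge the offlined CPU's callbacks into the survivor's... */
  rcu_segcblist_merge(&amp;my_rdp-&gt;cblist, &amp;offline_rdp-&gt;cblist);

  /* ...and, per this commit, advance once more so the merged-in
   * callbacks that were not yet ready get a grace-period number. */
  rcu_advance_cbs(rnp, my_rdp);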

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Add bypass callback queueing</title>
<updated>2019-08-13T21:37:32Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-07-02T23:03:33Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=d1b222c6be1f8bfc77099e034219732ecaeaaf96'/>
<id>urn:sha1:d1b222c6be1f8bfc77099e034219732ecaeaaf96</id>
<content type='text'>
Use of the rcu_data structure's segmented -&gt;cblist for no-CBs CPUs
takes advantage of unrelated grace periods, thus reducing the memory
footprint in the face of floods of call_rcu() invocations.  However,
the -&gt;cblist field is a more-complex rcu_segcblist structure which must
be protected via locking.  Even though there are only three entities
which can acquire this lock (the CPU invoking call_rcu(), the no-CBs
grace-period kthread, and the no-CBs callbacks kthread), the contention
on this lock is excessive under heavy stress.

This commit therefore greatly reduces contention by provisioning
an rcu_cblist structure field named -&gt;nocb_bypass within the
rcu_data structure.  Each no-CBs CPU is permitted only a limited
number of enqueues onto the -&gt;cblist per jiffy, controlled by a new
nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to
about 16 enqueues per millisecond (16 * 1000 / HZ).  When that limit is
exceeded, the CPU instead enqueues onto the new -&gt;nocb_bypass.

The -&gt;nocb_bypass is flushed into the -&gt;cblist every jiffy or when
the number of callbacks on -&gt;nocb_bypass exceeds qhimark, whichever
happens first.  During call_rcu() floods, this flushing is carried out
by the CPU during the course of its call_rcu() invocations.  However,
a CPU could simply stop invoking call_rcu() at any time.  The no-CBs
grace-period kthread therefore carries out less-aggressive flushing
(every few jiffies or when the number of callbacks on -&gt;nocb_bypass
exceeds (2 * qhimark), whichever comes first).  This means that the
no-CBs grace-period kthread cannot be permitted to do unbounded waits
while there are callbacks on -&gt;nocb_bypass.  A -&gt;nocb_bypass_timer is
used to provide the needed wakeups.
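
A hedged sketch of the resulting enqueue policy (the helpers
enqueue_cblist(), enqueue_bypass(), bypass_len() and flush_bypass(),
and the field -&gt;nobypass_count, are hypothetical; the per-jiffy reset
of the counter is elided):

  void nocb_enqueue(struct rcu_data *rdp, struct rcu_head *rhp)
  {
          /* Up to nocb_nobypass_lim_per_jiffy direct enqueues per
           * jiffy go straight onto the main -&gt;cblist. */
          if (rdp-&gt;nobypass_count++ &lt; nocb_nobypass_lim_per_jiffy) {
                  enqueue_cblist(rdp, rhp);
                  return;
          }

          /* Past the limit: queue onto -&gt;nocb_bypass instead. */
          enqueue_bypass(rdp, rhp);

          /* Flush into -&gt;cblist every jiffy, or sooner once more
           * than qhimark callbacks have accumulated. */
          if (time_after(jiffies, rdp-&gt;last_flush + 1) ||
              bypass_len(rdp) &gt; qhimark)
                  flush_bypass(rdp);
  }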

[ paulmck: Apply Coverity feedback reported by Colin Ian King. ]
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Reduce contention at no-CBs registry-time CB advancement</title>
<updated>2019-08-13T21:35:49Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-06-01T13:16:38Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=6608c3a027bcc0b34cc02bc764ea9f52b9dce46f'/>
<id>urn:sha1:6608c3a027bcc0b34cc02bc764ea9f52b9dce46f</id>
<content type='text'>
Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node
structure's -&gt;lock, and only afterwards does rcu_advance_cbs_nowake()
check to see if it is possible to advance callbacks without potentially
needing to awaken the grace-period kthread.  Given that the no-awaken
check can be done locklessly, this commit reverses the order, so that
rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node
structure's -&gt;lock and rcu_advance_cbs_nowake() checks the grace-period
state before conditionally acquiring that lock, thus reducing the number
of needless acquisitions of the leaf rcu_node structure's -&gt;lock.
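
Schematically, the change is (illustrative only, not the literal
code):

  /* Before: take the leaf rcu_node lock, then check. */
  raw_spin_lock_rcu_node(rnp);
  rcu_advance_cbs_nowake(rnp, rdp);   /* check runs under the lock */
  raw_spin_unlock_rcu_node(rnp);

  /* After: rcu_advance_cbs_nowake() first checks the grace-period
   * state locklessly, and only conditionally acquires the lock when
   * there is actually something to do. */
  rcu_advance_cbs_nowake(rnp, rdp);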

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread</title>
<updated>2019-08-13T21:35:49Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-05-28T12:54:26Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=7f36ef82e5cf0b401c2676fb3e56ad0633ed6ad5'/>
<id>urn:sha1:7f36ef82e5cf0b401c2676fb3e56ad0633ed6ad5</id>
<content type='text'>
Currently, the code provides an extra wakeup for the no-CBs grace-period
kthread if one of its CPUs is generating excessive numbers of callbacks.
But satisfying though it is to wake something up when things are going
south, unless the thing being awakened can actually help solve the
problem, that extra wakeup does nothing but consume additional CPU time,
which is exactly what you don't want during a call_rcu() flood.

This commit therefore avoids doing anything if the corresponding
no-CBs callback kthread is going full tilt.  Otherwise, if advancing
callbacks immediately might help and if the leaf rcu_node structure's
lock is immediately available, this commit invokes a new variant of
rcu_advance_cbs() that advances callbacks only if doing so won't require
awakening the grace-period kthread (not to be confused with any of the
no-CBs grace-period kthreads).
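
As a sketch of the policy (the condition helpers are hypothetical;
the real checks live in __call_rcu_nocb_wake()):

  /* The callbacks kthread is already going full tilt: an extra
   * wakeup cannot help, so do nothing. */
  if (nocb_cb_kthread_busy(rdp))
          return;

  /* Otherwise, advance callbacks only if it might help and only if
   * the leaf rcu_node lock can be taken without spinning; the
   * _nowake variant never awakens the grace-period kthread. */
  if (advancing_might_help(rdp) &amp;&amp; raw_spin_trylock_rcu_node(rnp)) {
          rcu_advance_cbs_nowake(rnp, rdp);
          raw_spin_unlock_rcu_node(rnp);
  }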

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Use build-time no-CBs check in rcu_pending()</title>
<updated>2019-08-13T21:35:49Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-05-21T20:53:28Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=921bb5fad11c0e8ec5f7625547552b252281f4de'/>
<id>urn:sha1:921bb5fad11c0e8ec5f7625547552b252281f4de</id>
<content type='text'>
Currently, rcu_pending() invokes rcu_segcblist_is_offloaded() even
in CONFIG_RCU_NOCB_CPU=n kernels, where no CPU can possibly be offloaded.
Given that rcu_pending() is on a fastpath, it makes sense to check for
CONFIG_RCU_NOCB_CPU=y before invoking rcu_segcblist_is_offloaded().
This commit therefore makes this change.
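
The pattern is the usual IS_ENABLED() short-circuit, roughly:

  /* In CONFIG_RCU_NOCB_CPU=n kernels the IS_ENABLED() term
   * constant-folds to false, so rcu_segcblist_is_offloaded() is never
   * called and the check vanishes from the fastpath.  The early exit
   * shown here is illustrative; context is simplified. */
  if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &amp;&amp;
      rcu_segcblist_is_offloaded(&amp;rdp-&gt;cblist))
          return 0;   /* offloaded no-CBs CPU */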

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
<entry>
<title>rcu/nocb: Use build-time no-CBs check in rcu_core()</title>
<updated>2019-08-13T21:35:49Z</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.ibm.com</email>
</author>
<published>2019-05-21T20:39:15Z</published>
<link rel='alternate' type='text/html' href='https://git.stealer.net/cgit.cgi/user/sven/linux.git/commit/?id=c1ab99d66ebcebedd9d416a840c488eaf079f3e9'/>
<id>urn:sha1:c1ab99d66ebcebedd9d416a840c488eaf079f3e9</id>
<content type='text'>
Currently, rcu_core() invokes rcu_segcblist_is_offloaded() each time it
needs to know whether the current CPU is a no-CBs CPU.  Given that it is
not possible to change the no-CBs status of a CPU after boot, and given
that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n
kernels, this repeated runtime invocation wastes CPU.  This commit
therefore creates a const on-stack variable to allow this check to be
done only once per rcu_core() invocation.
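
That is, roughly (context elided):

  /* Evaluate the boot-time-constant offload status exactly once per
   * rcu_core() invocation... */
  const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &amp;&amp;
                         rcu_segcblist_is_offloaded(&amp;rdp-&gt;cblist);

  /* ...and have all later checks in the function test the cached
   * flag instead of re-invoking rcu_segcblist_is_offloaded(). */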

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.ibm.com&gt;
</content>
</entry>
</feed>
