user/sven/linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
2006-10-04	[PATCH] RCU: CREDITS and MAINTAINERS	Josh Triplett
	Add MAINTAINERS entry for Read-Copy Update (RCU), listing Dipankar Sarma as maintainer, and giving the URL for Paul McKenney's RCU site. Add MAINTAINERS entry for rcutorture, listing myself as maintainer. Add CREDITS entries for developers of RCU, RCU variants, and rcutorture. Use Paul McKenney's preferred email address in include/linux/rcupdate.h . Signed-off-by: Josh Triplett <josh@freedesktop.org> Cc: Paul McKenney <paulmck@us.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04	[PATCH] rcu: simplify/improve batch tuning	Oleg Nesterov
	Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled, which records the fact that one of CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we gone through a quiescent state. On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30	[PATCH] rcu: Add lock annotations to RCU locking primitives	Josh Triplett
	Add __acquire annotations to rcu_read_lock and rcu_read_lock_bh, and add __release annotations to rcu_read_unlock and rcu_read_unlock_bh. This allows sparse to detect improperly paired calls to these functions. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27	[PATCH] rcutorture: add call_rcu_bh() operations	Paul E. McKenney
	Add operations for the call_rcu_bh() variant of RCU. Also add an rcu_batches_completed_bh() function, which is needed by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23	[PATCH] Make RCU API inaccessible to non-GPL Linux kernel modules	Paul E. McKenney
	Remove synchronize_kernel() (deprecated 2-APR-2005 in http://lkml.org/lkml/2005/4/3/11) and makes the RCU API inaccessible to non-GPL Linux kernel modules (as was announced more than one year ago in http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-15	[PATCH] RCU: introduce rcu_needs_cpu() interface	Heiko Carstens
	With "Paul E. McKenney" <paulmck@us.ibm.com> Introduce rcu_needs_cpu() interface. This can be used to tell if there will be a new rcu batch on a cpu soon by looking at the curlist pointer. This can be used to avoid to enter a tickless idle state where the cpu would miss that a new batch is ready when rcu_start_batch would be called on a different cpu. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23	[PATCH] kernel/rcupdate.c: make two structs static	Adrian Bunk
	This patch makes two needlessly global structs static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08	[PATCH] rcu batch tuning	Dipankar Sarma
	This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03	[PATCH] Fix comment to synchronize_sched()	Paul E. McKenney
	Fix to broken comment to synchronize_rcu() noted by Keith Owens. Also add sentence noting that synchronize_sched() and synchronize_rcu() are not necessarily identical. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Keith Owens <kaos@sgi.com> Cc: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-10	[PATCH] rcu: join rcu_ctrlblk and rcu_state	Oleg Nesterov
	This patch moves rcu_state into the rcu_ctrlblk. I think there are no reasons why we should have 2 different variables to control rcu state. Every user of rcu_state has also "rcu_ctrlblk *rcp" in the parameter list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-09	[PATCH] rcu: uninline __rcu_pending()	Oleg Nesterov
	__rcu_pending() is rather fat and called twice from rcu_pending(). rcu_pending() has multiple callers, and not that small too. This patch uninlines both of them. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08	[PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp ↵	Ravikiran G Thirumalai
	macros ____cacheline_maxaligned_in_smp is currently used to align critical structures and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find L1_CACHE_SHIFT_MAX useless. However, we have been using ____cacheline_maxaligned_in_smp to align structures on the internode cacheline size. As per Andi's suggestion, following patch kills ____cacheline_maxaligned_in_smp and introduces INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches. Arches needing L3/Internode cacheline alignment can define INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp With this patch, L1_CACHE_SHIFT_MAX can be killed Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12	[PATCH] add rcu_barrier() synchronization point	Dipankar Sarma
	This introduces a new interface - rcu_barrier() which waits until all the RCUs queued until this call have been completed. Reiser4 needs this, because we do more than just freeing memory object in our RCU callback: we also remove it from the list hanging off super-block. This means, that before freeing reiser4-specific portion of super-block (during umount) we have to wait until all pending RCU callbacks are executed. The only change of reiser4 made to the original patch, is exporting of rcu_barrier(). Cc: Hans Reiser <reiser@namesys.com> Cc: Vladimir V. Saveliev <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-30	[PATCH] RCU torture-testing kernel module	Paul E. McKenney
	This patch is a rewrite of the one submitted on October 1st, using modules (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2). This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an intense torture test of the RCU infratructure. This is needed due to the continued changes to the RCU infrastructure to accommodate dynamic ticks, CPU hotplug, realtime, and so on. Most of the code is in a separate file that is compiled only if the CONFIG variable is set. Documentation on how to run the test and interpret the output is also included. This code has been tested on i386 and ppc64, and an earlier version of the code has received extensive testing on a number of architectures as part of the PREEMPT_RT patchset. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-17	[PATCH] rcu: keep rcu callback event counter	Eric Dumazet
	This makes call_rcu() keep track of how many events there are on the RCU list, and cause a reschedule event when the list gets too long. This helps keep RCU event lists down. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09	[PATCH] files: fix rcu initializers	Dipankar Sarma
	First of a number of files_lock scaability patches. Here are the x86 numbers - tiobench on a 4(8)-way (HT) P4 system on ramdisk : (lockfree) Test 2.6.10-vanilla Stdev 2.6.10-fd Stdev ------------------------------------------------------------- Seqread 1400.8 11.52 1465.4 34.27 Randread 1594 8.86 2397.2 29.21 Seqwrite 242.72 3.47 238.46 6.53 Randwrite 445.74 9.15 446.4 9.75 The performance improvement is very significant. We are getting killed by the cacheline bouncing of the files_struct lock here. Writes on ramdisk (ext2) seems to vary just too much to get any meaningful number. Also, With Tridge's thread_perf test on a 4(8)-way (HT) P4 xeon system : 2.6.12-rc5-vanilla : Running test 'readwrite' with 8 tasks Threads 0.34 +/- 0.01 seconds Processes 0.16 +/- 0.00 seconds 2.6.12-rc5-fd : Running test 'readwrite' with 8 tasks Threads 0.17 +/- 0.02 seconds Processes 0.17 +/- 0.02 seconds I repeated the measurements on ramfs (as opposed to ext2 on ramdisk in the earlier measurement) and I got more consistent results from tiobench : 4(8) way xeon P4 ----------------- (lock-free) Test 2.6.12-rc5 Stdev 2.6.12-rc5-fd Stdev ------------------------------------------------------------- Seqread 1282 18.59 1343.6 26.37 Randread 1517 7 2415 34.27 Seqwrite 702.2 5.27 709.46 5.9 Randwrite 846.86 15.15 919.68 21.4 4-way ppc64 ------------ (lock-free) Test 2.6.12-rc5 Stdev 2.6.12-rc5-fd Stdev ------------------------------------------------------------- Seqread 1549 91.16 1569.6 47.2 Randread 1473.6 25.11 1585.4 69.99 Seqwrite 1096.8 20.03 1136 29.61 Randwrite 1189.6 4.04 1275.2 32.96 Also running Tridge's thread_perf test on ppc64 : 2.6.12-rc5-vanilla -------------------- Running test 'readwrite' with 4 tasks Threads 0.20 +/- 0.02 seconds Processes 0.16 +/- 0.01 seconds 2.6.12-rc5-fd -------------------- Running test 'readwrite' with 4 tasks Threads 0.18 +/- 0.04 seconds Processes 0.16 +/- 0.01 seconds The benefits are huge (upto ~60%) in some cases on x86 primarily due to the atomic operations during acquisition of ->file_lock and cache line bouncing in fast path. ppc64 benefits are modest due to LL/SC based locking, but still statistically significant. This patch: RCU head initilizer no longer needs the head varible name since we don't use list.h lists anymore. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01	[PATCH] Deprecate synchronize_kernel, GPL replacement	Paul E. McKenney
	The synchronize_kernel() primitive is used for quite a few different purposes: waiting for RCU readers, waiting for NMIs, waiting for interrupts, and so on. This makes RCU code harder to read, since synchronize_kernel() might or might not have matching rcu_read_lock()s. This patch creates a new synchronize_rcu() that is to be used for RCU readers and a new synchronize_sched() that is used for the rest. These two new primitives currently have the same implementation, but this is might well change with additional real-time support. Both new primitives are GPL-only, the old primitive is deprecated. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04	[PATCH] rcu: simplify quiescent state detection	Manfred Spraul
	Based on an initial patch from Oleg Nesterov <oleg@tv-sign.ru> rcu_data.last_qsctr is not needed. Actually, not even a counter is needed, just a flag that indicates that there was a quiescent state. Signed-Off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-01-04	[PATCH] rcu: eliminate rcu_ctrlblk.lock	Oleg Nesterov
	rcu_ctrlblk.lock is used to read the ->cur and ->next_pending atomically in __rcu_process_callbacks(). It can be replaced by a couple of memory barriers. rcu_start_batch: rcp->next_pending = 0; smp_wmb(); rcp->cur++; __rcu_process_callbacks: rdp->batch = rcp->cur + 1; smp_rmb(); if (!rcp->next_pending) rcu_start_batch(rcp, rsp, 1); This way, if __rcu_process_callbacks() sees incremented ->cur value, it must also see that ->next_pending == 0 (or rcu_start_batch() is already in progress on another cpu). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-10-27	[PATCH] RCU: rcu_assign_pointer() removal of memory barriers	Paul E. McKenney
	This patch adds the rcu_assign_pointer() API that helps reduce the need for explicit memory barriers in code that uses RCU. This API buries the required memory barriers in a macro that also does the assignment. This has been tested successfully on i386 and ppc64. Signed-off-by: <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] rcu: abstracted RCU dereferencing	Dipankar Sarma
	Use abstracted RCU API to dereference RCU protected data. Hides barrier details. Patch from Paul McKenney. This patch introduced an rcu_dereference() macro that replaces most uses of smp_read_barrier_depends(). The new macro has the advantage of explicitly documenting which pointers are protected by RCU -- in contrast, it is sometimes difficult to figure out which pointer is being protected by a given smp_read_barrier_depends() call. Signed-off-by: Paul McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] rcu: document RCU api	Dipankar Sarma
	Patch from Paul for additional documentation of api. Updated based on feedback, and to apply to 2.6.8-rc3. I will be adding more detailed documentation to the Documentation directory in a separate patch. Signed-off-by: Paul McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] rcu: introduce call_rcu_bh()	Dipankar Sarma
	Introduces call_rcu_bh() to be used when critical sections are mostly in softirq context. This patch introduces a new api - call_rcu_bh(). This is to be used for RCU callbacks for whom the critical sections are mostly in softirq context. These callbacks consider completion of a softirq handler to be a quiescent state. So, in order to make reader critical sections safe in process context, rcu_read_lock_bh() and rcu_read_unlock_bh() must be used. Use of softirq handler completion as a quiescent state speeds up RCU grace periods and prevents too many callbacks getting queued up in softirq-heavy workloads like network stack. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] rcu: fix spaces in rcupdate.h	Dipankar Sarma
	Somehow spaces replaced tabs in rcupdate.h and I would like to keep everything clean. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] rcu: clean up code	Dipankar Sarma
	Avoids per_cpu calculations and also prepares for call_rcu_bh(). At OLS, Rusty had suggested getting rid of many per_cpu() calculations in RCU code and making the code simpler. I had already done that for the rcu-softirq patch earlier, so I am splitting that into two patch. This first patch cleans up the macros and uses pointers to the rcu per-cpu data directly to manipulate the callback queues. This is useful for the call-rcu-bh patch (to follow) which introduces a new RCU mechanism - call_rcu_bh(). Both generic and softirq rcu can then use the same code, they work different global and percpu data. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] RCU: low latency rcu	Dipankar Sarma
	This patch makes RCU callbacks friendly to scheduler. It helps low latency by limiting the number of callbacks invoked per tasklet handler. Since we cannot schedule during a single softirq handler, this reduces size of non-preemptible section significantly, specially under heavy RCU updates. The limiting is done through a kernel parameter rcupdate.maxbatch which is the maximum number of RCU callbacks to invoke during a single tasklet handler. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-08-22	[PATCH] RCU - cpu offline fix	Dipankar Sarma
	This fixes the RCU cpu offline code which was broken by singly-linked RCU changes. Nathan pointed out the problems and submitted a patch for this. This is an optimal fix - no need to iterate through the list of callbacks, just use the tail pointers and attach the list from the dead cpu. Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23	[PATCH] rcu: avoid passing an argument to the callback function	Andrew Morton
	From: Dipankar Sarma <dipankar@in.ibm.com> This patch changes the call_rcu() API and avoids passing an argument to the callback function as suggested by Rusty. Instead, it is assumed that the user has embedded the rcu head into a structure that is useful in the callback and the rcu_head pointer is passed to the callback. The callback can use container_of() to get the pointer to its structure and work with it. Together with the rcu-singly-link patch, it reduces the rcu_head size by 50%. Considering that we use these in things like struct dentry and struct dst_entry, this is good savings in space. An example : struct my_struct { struct rcu_head rcu; int x; int y; }; void my_rcu_callback(struct rcu_head head) { struct my_struct p = container_of(head, struct my_struct, rcu); free(p); } void my_delete(struct my_struct *p) { ... call_rcu(&p->rcu, my_rcu_callback); ... } Signed-Off-By: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23	[PATCH] reduce rcu_head size - core	Andrew Morton
	From: Dipankar Sarma <dipankar@in.ibm.com> This reduces the RCU head size by using a singly linked to maintain them. The ordering of the callbacks is still maintained as before by using a tail pointer for the next list. Signed-Off-By : Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23	[PATCH] rcu lock update: Code move & cleanup	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Step three for reducing cacheline trashing within rcupdate.c: Cleanup and code move from <linux/rcupdate.h> to kernel/rcupdate.c: Remove internal details from the header file. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23	[PATCH] rcu lock update: Use a sequence lock for starting batches	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Step two for reducing cacheline trashing within rcupdate.c: rcu_process_callbacks always acquires rcu_ctrlblk.state.mutex and calls rcu_start_batch, even if the batch is already running or already scheduled to run. This can be avoided with a sequence lock: A sequence lock allows to read the current batch number and next_pending atomically. If next_pending is already set, then there is no need to acquire the global mutex. This means that for each grace period, there will be - one write access to the rcu_ctrlblk.batch cacheline - lots of read accesses to rcu_ctrlblk.batch (3-10*cpus_online()). Behavior similar to the jiffies cacheline, shouldn't be a problem. - cpus_online()+1 write accesses to rcu_ctrlblk.state, all of them starting with spin_lock(&rcu_ctrlblk.state.mutex). For large enough cpus_online() this will be a problem, but all except two of the spin_lock calls only protect the rcu_cpu_mask bitmap, thus a hierarchical bitmap would allow to split the write accesses to multiple cachelines. Tested on an 8-way with reaim. Unfortunately it probably won't help with Jack Steiner's 'ls' test since in this test only one cpu generates rcu entries. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-06-23	[PATCH] rcu lock update: Add per-cpu batch counter	Andrew Morton
	From: Manfred Spraul <manfred@colorfullife.com> Below is the one of my patches from my rcu lock update. Jack Steiner tested the first one on a 512p and it resolved the rcu cache line trashing. All were tested on osdl with STP. Step one for reducing cacheline trashing within rcupdate.c: The current code uses the rcu_cpu_mask bitmap both for keeping track of the cpus that haven't gone through a quiescent state and for checking if a cpu should look for quiescent states. The bitmap is frequently changed and the check is done by polling - together this causes cache line trashing. If it's cheaper to access a (mostly) read-only cacheline than a cacheline that is frequently dirtied, then it's possible to reduce the trashing by splitting the rcu_cpu_mask bitmap into two cachelines: The patch adds a generation counter and moves it into a separate cacheline. This allows to removes all accesses to rcu_cpumask (in the read-write cacheline) from rcu_pending and at least 50% of the accesses from rcu_check_quiescent_state. rcu_pending and all but one call per cpu to rcu_check_quiescent_state access the read-only cacheline. Probably not enough for 512p, but it's a start, just for 128 byte more memory use, without slowing down rcu grace periods. Obviously the read-only cacheline is not really read-only: it's written once per grace period to indicate that a new grace period is running. Tests on an 8-way Pentium III with reaim showed some improvement: oprofile hits: Reference: http://khack.osdl.org/stp/293075/ Hits % 23741 0.0994 rcu_pending 19057 0.0798 rcu_check_quiescent_state 6530 0.0273 rcu_check_callbacks Patched: http://khack.osdl.org/stp/293076/ 8291 0.0579 rcu_pending 5475 0.0382 rcu_check_quiescent_state 3604 0.0252 rcu_check_callbacks The total runtime differs between both runs, thus the % number must be compared: Around 50% faster. I've uninlined rcu_pending for the test. Tested with reaim and kernbench. Description: - per-cpu quiescbatch and qs_pending fields introduced: quiescbatch contains the number of the last quiescent period that the cpu has seen and qs_pending is set if the cpu has not yet reported the quiescent state for the current period. With these two fields a cpu can test if it should report a quiescent state without having to look at the frequently written rcu_cpu_mask bitmap. - curbatch split into two fields: rcu_ctrlblk.batch.completed and rcu_ctrlblk.batch.cur. This makes it possible to figure out if a grace period is running (completed != cur) without accessing the rcu_cpu_mask bitmap. - rcu_ctrlblk.maxbatch removed and replaced with a true/false next_pending flag: next_pending=1 means that another grace period should be started immediately after the end of the current period. Previously, this was achieved by maxbatch: curbatch==maxbatch means don't start, curbatch!= maxbatch means start. A flag improves the readability: The only possible values for maxbatch were curbatch and curbatch+1. - rcu_ctrlblk split into two cachelines for better performance. - common code from rcu_offline_cpu and rcu_check_quiescent_state merged into cpu_quiet. - rcu_offline_cpu: replace spin_lock_irq with spin_lock_bh, there are no accesses from irq context (and there are accesses to the spinlock with enabled interrupts from tasklet context). - rcu_restart_cpu introduced, s390 should call it after changing nohz: Theoretically the global batch counter could wrap around and end up at RCU_quiescbatch(cpu). Then the cpu would not look for a quiescent state and rcu would lock up. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2004-01-18	[PATCH] C99 change to rcupdate.h	Andrew Morton
	From: "Art Haas" <ahaas@airmail.net> Replace the GNU initializers with C99 initializers.
2003-10-01	[PATCH] misc fixes	Andrew Morton
	- mpparse printk should be in hex (john stultz <johnstul@us.ibm.com>) - fiddle with RCU copyright messages (Dipankar Sarma <dipankar@in.ibm.com>) - use print_dev_t() for sysfs dev file in videodev.c (Gerd Knorr <kraxel@bytesex.org>) - comx-hw-munich.c 64-bit warning fix (Vinay K Nallamothu <vinay.nallamothu@gsecone.com>) - random.c return val fix
2003-08-18	[PATCH] cpumask_t: allow more than BITS_PER_LONG CPUs	Andrew Morton
	From: William Lee Irwin III <wli@holomorphy.com> Contributions from: Jan Dittmer <jdittmer@sfhq.hn.org> Arnd Bergmann <arnd@arndb.de> "Bryan O'Sullivan" <bos@serpentine.com> "David S. Miller" <davem@redhat.com> Badari Pulavarty <pbadari@us.ibm.com> "Martin J. Bligh" <mbligh@aracnet.com> Zwane Mwaikambo <zwane@linuxpower.ca> It has ben tested on x86, sparc64, x86_64, ia64 (I think), ppc and ppc64. cpumask_t enables systems with NR_CPUS > BITS_PER_LONG to utilize all their cpus by creating an abstract data type dedicated to representing cpu bitmasks, similar to fd sets from userspace, and sweeping the appropriate code to update callers to the access API. The fd set-like structure is according to Linus' own suggestion; the macro calling convention to ambiguate representations with minimal code impact is my own invention. Specifically, a new set of inline functions for manipulating arbitrary-width bitmaps is introduced with a relatively simple implementation, in tandem with a new data type representing bitmaps of width NR_CPUS, cpumask_t, whose accessor functions are defined in terms of the bitmap manipulation inlines. This bitmap ADT found an additional use in i386 arch code handling sparse physical APIC ID's, which was convenient to use in this case as the accounting structure was required to be wider to accommodate the physids consumed by larger numbers of cpus. For the sake of simplicity and low code impact, these cpu bitmasks are passed primarily by value; however, an additional set of accessors along with an auxiliary data type with const call-by-reference semantics is provided to address performance concerns raised in connection with very large systems, such as SGI's larger models, where copying and call-by-value overhead would be prohibitive. Few (if any) users of the call-by-reference API are immediately introduced. Also, in order to avoid calling convention overhead on architectures where structures are required to be passed by value, NR_CPUS <= BITS_PER_LONG is special-cased so that cpumask_t falls back to an unsigned long and the accessors perform the usual bit twiddling on unsigned longs as opposed to arrays thereof. Audits were done with the structure overhead in-place, restoring this special-casing only afterward so as to ensure a more complete API conversion while undergoing the majority of its end-user exposure in -mm. More -mm's were shipped after its restoration to be sure that was tested, too. The immediate users of this functionality are Sun sparc64 systems, SGI mips64 and ia64 systems, and IBM ia32, ppc64, and s390 systems. Of these, only the ppc64 machines needing the functionality have yet to be released; all others have had systems requiring it for full functionality for at least 6 months, and in some cases, since the initial Linux port to the affected architecture.
2002-10-29	[PATCH] percpu: convert RCU	Andrew Morton
	Patch from Dipankar Sarma <dipankar@in.ibm.com> This patch convers RCU per_cpu data to use per_cpu data area and makes it safe for cpu_possible allocation by using CPU notifiers.
2002-10-15	[PATCH] Read-Copy Update infrastructure	Dipankar Sarma
	This is the RCU core patch from akpm's tree. It has been in his tree since about 2.5.37-mm1 along with dcache_rcu and so far it has worked fine. For 2.5, I am hoping that we might get the following RCU patches included - 1. rt_rcu - ipv4 routecache lookup. Davem agreed to include this patch if and when you include RCU core in your tree. 2. dcache_rcu (by Maneesh Soni) - dcache lookup avoiding dcache_lock as much as possible. This has been akpm's tree - stable and gives us good yield. I have been submitting this to Viro and I will publish some more benchmark numbers later to help decide on this. This RCU core implements RCU APIs, call_rcu() and synchronize_kernel(), by monitoring a per-CPU quiescent state (idle/user etc.) counter. call_rcu() queues a callback to be invoked after all the CPUs have gone through a quiescent state. Queuing is per-CPU and each per-CPU batch gets a batch number. As batches get their turn, a global cpu mask is used to keep track of CPUs pending quiescent state. Checking for quiescent cycle is done by saving the per-CPU counter at the beginning of the batch and then monitoring it for change through the local timer interrupt handler.