author    Andrew Morton <akpm@digeo.com>             2002-09-15 08:50:19 -0700
committer Christoph Hellwig <hch@hera.kernel.org>    2002-09-15 08:50:19 -0700
commit    e572ef2ea320724ba32094c4b4817dfde4a4bef3
tree      2728ff1f5305cd6242a8be3991b05edebc2a8b1c /include/linux
parent    697f3abeacfbab361efe0191b47a2d366e04949a
[PATCH] low-latency zap_page_range
zap_page_range and truncate are the two main latency problems in the VM/VFS.  The radix-tree-based truncate grinds that into the dust, but no algorithmic fixes for pagetable takedown have presented themselves...

Patch from Robert Love.

Attached patch implements a low-latency version of zap_page_range().

Calls with even moderately large page ranges result in very long lock hold times and consequently very long periods of non-preemptibility.  This function is in my list of the top 3 worst offenders.  It is gross.

This new version reimplements zap_page_range() as a loop over ZAP_BLOCK_SIZE chunks.  After each iteration, if a reschedule is pending, we drop page_table_lock and automagically preempt.  Note we cannot blindly drop the locks and reschedule (e.g. for the non-preempt case) since there is a possibility of entering this codepath while holding other locks. ... I am sure you are familiar with all this; it's the same deal as your low-latency work.

This patch implements the "cond_resched_lock()" as we discussed sometime back.  I think this solution should be acceptable to you and Linus.

There are other misc. cleanups, too.

This new zap_page_range() yields latency too low to benchmark: <<1ms.
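For reference, the reworked loop described above looks roughly like the sketch below.  The mm/memory.c half of the patch is not part of this diff (the diffstat is limited to include/linux), so the helper names and signatures used here (ZAP_BLOCK_SIZE, flush_cache_range(), tlb_gather_mmu(), unmap_page_range(), tlb_finish_mmu()) are an approximation of the 2.5-era code, not the verbatim change:

/*
 * Sketch only: approximates the chunked zap_page_range() described in
 * the changelog above, not the actual mm/memory.c patch.
 */
#define ZAP_BLOCK_SIZE	(256 * PAGE_SIZE)

void zap_page_range(struct mm_struct *mm, unsigned long address,
		    unsigned long size)
{
	mmu_gather_t *tlb;
	unsigned long end, block;

	spin_lock(&mm->page_table_lock);

	while (size) {
		block = (size > ZAP_BLOCK_SIZE) ? ZAP_BLOCK_SIZE : size;
		end = address + block;

		flush_cache_range(mm, address, end);
		tlb = tlb_gather_mmu(mm, 0);
		unmap_page_range(tlb, mm, address, end);
		tlb_finish_mmu(tlb, address, end);

		/* Preemption point: drops page_table_lock if a
		 * reschedule is pending, then reacquires it. */
		cond_resched_lock(&mm->page_table_lock);

		address = end;
		size -= block;
	}

	spin_unlock(&mm->page_table_lock);
}

Because address and size track progress explicitly, briefly dropping page_table_lock between blocks does not invalidate any iteration state, which is what makes the preemption point safe here.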
Diffstat (limited to 'include/linux')
-rw-r--r--   include/linux/sched.h | 28
1 file changed, 28 insertions, 0 deletions
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e6c24f2bfadd..b132cc3952ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -956,6 +956,34 @@ static inline void cond_resched(void)
__cond_resched();
}
+#ifdef CONFIG_PREEMPT
+
+/*
+ * cond_resched_lock() - if a reschedule is pending, drop the given lock,
+ * call schedule, and on return reacquire the lock.
+ *
+ * Note: this does not assume the given lock is the _only_ lock held.
+ * The kernel preemption counter gives us "free" checking that we are
+ * atomic -- let's use it.
+ */
+static inline void cond_resched_lock(spinlock_t * lock)
+{
+ if (need_resched() && preempt_count() == 1) {
+ _raw_spin_unlock(lock);
+ preempt_enable_no_resched();
+ __cond_resched();
+ spin_lock(lock);
+ }
+}
+
+#else
+
+static inline void cond_resched_lock(spinlock_t * lock)
+{
+}
+
+#endif
+
/* Reevaluate whether the task has signals pending delivery.
This is required every time the blocked sigset_t changes.
All callers should have t->sigmask_lock. */
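As a usage note: the intended pattern is to call cond_resched_lock() from a loop that runs under exactly one spinlock; the preempt_count() == 1 test is what verifies that this lock is the only preemption-disabling context held, so dropping it to schedule is safe.  A hypothetical caller (illustrative only, not from the patch) might look like:

/* Hypothetical illustration, not part of the patch: clear a large
 * table in chunks under a single spinlock, offering a preemption
 * point between chunks.  Progress is tracked by index, so nothing is
 * invalidated while the lock is briefly dropped. */
struct big_table {
	spinlock_t lock;
	unsigned long nr_entries;
	unsigned long *entries;
};

static void clear_big_table(struct big_table *t)
{
	unsigned long i;

	spin_lock(&t->lock);
	for (i = 0; i < t->nr_entries; i++) {
		t->entries[i] = 0;
		/* every 256 entries, offer to reschedule */
		if ((i & 255) == 255)
			cond_resched_lock(&t->lock);
	}
	spin_unlock(&t->lock);
}

In a CONFIG_PREEMPT=n build the call compiles away entirely (the empty stub above), so callers do not need to special-case it.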