Diffstat (limited to 'src/backend/storage/buffer/README')
-rw-r--r--  src/backend/storage/buffer/README  44
1 file changed, 14 insertions, 30 deletions
diff --git a/src/backend/storage/buffer/README b/src/backend/storage/buffer/README
index a182fcd660c..119f31b5d65 100644
--- a/src/backend/storage/buffer/README
+++ b/src/backend/storage/buffer/README
@@ -128,11 +128,11 @@ independently. If it is necessary to lock more than one partition at a time,
they must be locked in partition-number order to avoid risk of deadlock.
* A separate system-wide spinlock, buffer_strategy_lock, provides mutual
-exclusion for operations that access the buffer free list or select
-buffers for replacement. A spinlock is used here rather than a lightweight
-lock for efficiency; no other locks of any sort should be acquired while
-buffer_strategy_lock is held. This is essential to allow buffer replacement
-to happen in multiple backends with reasonable concurrency.
+exclusion for operations that select buffers for replacement. A spinlock is
+used here rather than a lightweight lock for efficiency; no other locks of any
+sort should be acquired while buffer_strategy_lock is held. This is essential
+to allow buffer replacement to happen in multiple backends with reasonable
+concurrency.
* Each buffer header contains a spinlock that must be taken when examining
or changing fields of that buffer header. This allows operations such as
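
(For illustration, a minimal C sketch of the spinlock discipline described in
the hunk above. A pthread mutex stands in for a real spinlock, and BufHdr,
pin_buffer, and MAX_USAGE_COUNT are invented names, not PostgreSQL
definitions. The point is that the critical section touches only the one
header's fields and acquires no other lock, which keeps hold times tiny.)

#include <pthread.h>

#define MAX_USAGE_COUNT 5

/* Invented, simplified stand-in for a buffer header. */
typedef struct
{
    pthread_mutex_t header_lock;   /* stands in for the header spinlock */
    int             pin_count;     /* nonzero while some backend holds a pin */
    int             usage_count;   /* recency counter used by the clock sweep */
} BufHdr;

/* Pin a buffer: bump the pin and usage counts under the header spinlock. */
static void
pin_buffer(BufHdr *buf)
{
    pthread_mutex_lock(&buf->header_lock);
    buf->pin_count++;                       /* no other lock taken in here */
    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
    pthread_mutex_unlock(&buf->header_lock);
}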
@@ -158,18 +158,8 @@ unset by sleeping on the buffer's condition variable.
Normal Buffer Replacement Strategy
----------------------------------
-There is a "free list" of buffers that are prime candidates for replacement.
-In particular, buffers that are completely free (contain no valid page) are
-always in this list. We could also throw buffers into this list if we
-consider their pages unlikely to be needed soon; however, the current
-algorithm never does that. The list is singly-linked using fields in the
-buffer headers; we maintain head and tail pointers in global variables.
-(Note: although the list links are in the buffer headers, they are
-considered to be protected by the buffer_strategy_lock, not the buffer-header
-spinlocks.) To choose a victim buffer to recycle when there are no free
-buffers available, we use a simple clock-sweep algorithm, which avoids the
-need to take system-wide locks during common operations. It works like
-this:
+To choose a victim buffer to recycle, we use a simple clock-sweep algorithm.
+It works like this:
Each buffer header contains a usage counter, which is incremented (up to a
small limit value) whenever the buffer is pinned. (This requires only the
@@ -184,20 +174,14 @@ The algorithm for a process that needs to obtain a victim buffer is:
1. Obtain buffer_strategy_lock.
-2. If buffer free list is nonempty, remove its head buffer. Release
-buffer_strategy_lock. If the buffer is pinned or has a nonzero usage count,
-it cannot be used; ignore it and go back to step 1. Otherwise, pin the buffer,
-and return it.
+2. Select the buffer pointed to by nextVictimBuffer, and circularly advance
+nextVictimBuffer for next time. Release buffer_strategy_lock.
-3. Otherwise, the buffer free list is empty. Select the buffer pointed to by
-nextVictimBuffer, and circularly advance nextVictimBuffer for next time.
-Release buffer_strategy_lock.
-
-4. If the selected buffer is pinned or has a nonzero usage count, it cannot
-be used. Decrement its usage count (if nonzero), reacquire
-buffer_strategy_lock, and return to step 3 to examine the next buffer.
+3. If the selected buffer is pinned or has a nonzero usage count, it cannot
+be used. Decrement its usage count (if nonzero), reacquire
+buffer_strategy_lock, and return to step 2 to examine the next buffer.
-5. Pin the selected buffer, and return.
+4. Pin the selected buffer, and return.
(Note that if the selected buffer is dirty, we will have to write it out
before we can recycle it; if someone else pins the buffer meanwhile we will
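
(For illustration, a minimal, self-contained C sketch of the four-step victim
search above. Every name in it is a simplified stand-in, with a pthread mutex
playing the part of buffer_strategy_lock; the real implementation is
StrategyGetBuffer() in src/backend/storage/buffer/freelist.c.)

#include <stdio.h>
#include <pthread.h>

#define NBUFFERS        8
#define MAX_USAGE_COUNT 5

/* Invented, simplified stand-in for a buffer header. */
typedef struct
{
    int pin_count;      /* nonzero while some backend has the page pinned */
    int usage_count;    /* bumped on each pin, capped at MAX_USAGE_COUNT */
} BufferDesc;

static BufferDesc buffers[NBUFFERS];
static int        nextVictimBuffer;

/* Stand-in for buffer_strategy_lock (a true spinlock in PostgreSQL). */
static pthread_mutex_t strategy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run the four steps above until an unpinned, unused buffer turns up. */
static int
get_victim_buffer(void)
{
    for (;;)
    {
        BufferDesc *buf;

        /* Step 1: obtain buffer_strategy_lock. */
        pthread_mutex_lock(&strategy_lock);

        /* Step 2: take nextVictimBuffer, advance it circularly, unlock. */
        buf = &buffers[nextVictimBuffer];
        nextVictimBuffer = (nextVictimBuffer + 1) % NBUFFERS;
        pthread_mutex_unlock(&strategy_lock);

        /*
         * Step 3: pinned or recently used buffers cannot be evicted.
         * (The real code inspects these fields under the buffer header
         * spinlock; that locking is omitted here for brevity.)
         */
        if (buf->pin_count > 0 || buf->usage_count > 0)
        {
            if (buf->usage_count > 0)
                buf->usage_count--;     /* decays once per full sweep */
            continue;                   /* reacquire the lock, try the next */
        }

        /* Step 4: pin the selected buffer and hand it back. */
        buf->pin_count++;
        return (int) (buf - buffers);
    }
}

int
main(void)
{
    buffers[0].usage_count = 2;     /* recently used: skipped, decremented */
    buffers[1].pin_count = 1;       /* pinned: skipped outright */

    printf("victim: buffer %d\n", get_victim_buffer());
    return 0;
}

(Compiled alone, the sketch picks buffer 2: buffer 0 is recently used and
buffer 1 is pinned, so the sweep passes over both.)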
@@ -211,9 +195,9 @@ Buffer Ring Replacement Strategy
When running a query that needs to access a large number of pages just once,
such as VACUUM or a large sequential scan, a different strategy is used.
A page that has been touched only by such a scan is unlikely to be needed
-again soon, so instead of running the normal clock sweep algorithm and
+again soon, so instead of running the normal clock-sweep algorithm and
blowing out the entire buffer cache, a small ring of buffers is allocated
-using the normal clock sweep algorithm and those buffers are reused for the
+using the normal clock-sweep algorithm and those buffers are reused for the
whole scan. This also implies that much of the write traffic caused by such
a statement will be done by the backend itself and not pushed off onto other
processes.
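
(For illustration: from backend code, opting into a buffer ring means asking
for a BufferAccessStrategy and passing it on every read. GetAccessStrategy(),
ReadBufferExtended(), ReleaseBuffer(), and FreeAccessStrategy() are the real
bufmgr entry points, but the sketch below is simplified, compiles only inside
the server tree, and omits page processing and error handling.)

#include "postgres.h"

#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: read every block of "rel" through a BAS_BULKREAD ring so the
 * scan recycles its own small set of buffers instead of evicting large
 * parts of shared_buffers.
 */
static void
scan_with_ring(Relation rel)
{
    BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
    BlockNumber          nblocks = RelationGetNumberOfBlocks(rel);

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                        RBM_NORMAL, strategy);

        /* ... examine the page here ... */

        ReleaseBuffer(buf);
    }

    FreeAccessStrategy(strategy);
}

(A BAS_VACUUM strategy selects the VACUUM ring discussed in the next hunk.)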
@@ -234,7 +218,7 @@ the ring strategy effectively degrades to the normal strategy.
VACUUM uses a ring like sequential scans do; however, the size of this ring is
controlled by the vacuum_buffer_usage_limit GUC. Dirty pages are not removed
-from the ring. Instead, WAL is flushed if needed to allow reuse of the
+from the ring. Instead, the WAL is flushed if needed to allow reuse of the
buffers. Before the buffer ring strategy was introduced in 8.3, VACUUM's
buffers were sent to the freelist, which was effectively a buffer ring of
1 buffer, resulting in excessive WAL flushing.
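
(For illustration, a self-contained sketch of the reuse rule above: before a
dirty page can leave its ring slot, the WAL covering that page's last change
must be durable. Every name here is invented; flush_wal_through() merely
models what XLogFlush() does in the real code.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL position */

/* Invented ring-slot type for illustration; not the real BufferDesc. */
typedef struct
{
    bool       is_dirty;
    XLogRecPtr page_lsn;        /* WAL position of the page's last change */
} RingSlot;

static XLogRecPtr wal_flushed_to;   /* how far the WAL is already durable */

/* Stands in for XLogFlush(): make WAL durable up through lsn. */
static void
flush_wal_through(XLogRecPtr lsn)
{
    if (lsn > wal_flushed_to)
    {
        wal_flushed_to = lsn;
        printf("flushed WAL through %llu\n", (unsigned long long) lsn);
    }
}

/*
 * Recycle a ring slot for the next block of the scan.  WAL-before-data:
 * the WAL covering the page's last change must be durable before the
 * dirty page itself may be written out and the slot reused.
 */
static void
reuse_ring_slot(RingSlot *slot)
{
    if (slot->is_dirty)
    {
        flush_wal_through(slot->page_lsn);
        /* ... write the page back here ... */
        slot->is_dirty = false;
    }
}

int
main(void)
{
    RingSlot slot = { .is_dirty = true, .page_lsn = 42 };

    reuse_ring_slot(&slot);     /* forces a WAL flush through LSN 42 */
    return 0;
}

(With a ring of a single buffer, the pre-8.3 behavior described above, such a
flush happens for nearly every dirty page, which is the excessive WAL flushing
the larger ring avoids.)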