From 867e2c91a0c341111b7a5257dc4c9a2659a022dc Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 28 Jun 2007 00:02:40 +0000
Subject: Implement "distributed" checkpoints in which the checkpoint I/O is
 spread over a fairly long period of time, rather than being spat out in a
 burst. This happens only for background checkpoints carried out by the
 bgwriter; other cases, such as a shutdown checkpoint, are still done at full
 speed.

Remove the "all buffers" scan in the bgwriter, and associated stats
infrastructure, since this seems no longer very useful when the checkpoint
itself is properly throttled.

Original patch by Itagaki Takahiro, reworked by Heikki Linnakangas,
and some minor API editorialization by me.
---
 doc/src/sgml/config.sgml     | 82 +++++++++++++-------------------------------
 doc/src/sgml/monitoring.sgml | 37 +++++---------------
 doc/src/sgml/wal.sgml        | 34 +++++++++++++++---
 3 files changed, 62 insertions(+), 91 deletions(-)
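The pacing rule added to wal.sgml works like this. The write rate is adjusted so that the checkpoint finishes once checkpoint_completion_target of either the checkpoint_segments budget or the checkpoint_timeout budget has been used, whichever comes first. It can be expressed as a schedule check made after each buffer write.

This is a minimal Python sketch under stated assumptions, not the bgwriter's actual code. The function names, their arguments, the "nap when ahead" loop, and the default values (3 segments, 300 seconds, 0.5) are all illustrative.

```python
def checkpoint_on_schedule(progress, segments_used, seconds_elapsed,
                           checkpoint_segments=3, checkpoint_timeout=300.0,
                           completion_target=0.5):
    """Return True if the checkpoint's buffer writes are at or ahead of pace.

    Hypothetical helper; argument names and defaults are illustrative.
    progress        -- fraction (0..1) of the dirty buffers written so far
    segments_used   -- WAL segments consumed since the checkpoint started
    seconds_elapsed -- seconds since the checkpoint started
    """
    # Stretch the write phase: writing everything (progress == 1.0) is
    # meant to coincide with completion_target of the interval.
    scaled_progress = progress * completion_target

    # How much of the checkpoint interval is used up, measured two ways.
    elapsed_wal = segments_used / checkpoint_segments
    elapsed_time = seconds_elapsed / checkpoint_timeout

    # Whichever budget is being consumed faster sets the deadline.
    return scaled_progress >= max(elapsed_wal, elapsed_time)


def write_checkpoint(dirty_buffers, clock, write, nap):
    """Write buffers, napping whenever the checkpoint is ahead of schedule.

    Hypothetical driver: clock() returns (segments_used, seconds_elapsed);
    write() and nap() are caller-supplied stand-ins for real I/O and sleep.
    """
    total = len(dirty_buffers)
    for i, buf in enumerate(dirty_buffers, start=1):
        write(buf)
        segments_used, seconds_elapsed = clock()
        if checkpoint_on_schedule(i / total, segments_used, seconds_elapsed):
            nap()  # ahead of pace: leave I/O bandwidth for other work
        # otherwise keep writing at full speed to catch up
```

Consider the illustrative defaults with half the buffers written. At 60 seconds with no WAL consumed, the check gives 0.25 >= 0.2, so the writer may nap. At the same progress after 2 of 3 segments, it gives 0.25 < 0.67, so the writer keeps going at full speed.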

(limited to 'doc/src')
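The WAL file-count bounds in the wal.sgml hunk are easy to evaluate for a given configuration. This is a minimal sketch, assuming the default 16 MB segment size mentioned in the text. The function names and the example value checkpoint_segments = 3 are illustrative only.

Because checkpoint_completion_target is at most 1.0, the normal bound never exceeds the 3 * checkpoint_segments + 1 limit. Above that limit, surplus segments are deleted rather than recycled.

```python
import math

SEGMENT_MB = 16  # default WAL segment size; can be changed at build time


def normal_max_wal_files(checkpoint_segments, completion_target=0.5):
    """Usual upper bound: (2 + checkpoint_completion_target) * segments + 1."""
    # Round down: a bound of 8.5 means at most 8 whole files.
    return math.floor((2 + completion_target) * checkpoint_segments + 1)


def recycle_limit(checkpoint_segments):
    """Above 3 * checkpoint_segments + 1 files, surplus segments are deleted."""
    return 3 * checkpoint_segments + 1


segments = 3  # example value only
files = normal_max_wal_files(segments)
print(files, "files,", files * SEGMENT_MB, "MB")  # 8 files, 128 MB
print("recycle limit:", recycle_limit(segments))  # recycle limit: 10
```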
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a38b02fd211..c7d2d395c7a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.128 2007/06/22 16:15:23 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.129 2007/06/28 00:02:37 tgl Exp $ -->
 
 <chapter Id="runtime-config">
   <title>Server Configuration</title>
@@ -1168,21 +1168,17 @@ SET ENABLE_SEQSCAN TO OFF;
 
      <para>
       Beginning in <productname>PostgreSQL</> 8.0, there is a separate server
-      process called the <firstterm>background writer</>, whose sole function
+      process called the <firstterm>background writer</>, whose function
       is to issue writes of <quote>dirty</> shared buffers.  The intent is
       that server processes handling user queries should seldom or never have
       to wait for a write to occur, because the background writer will do it.
-      This arrangement also reduces the performance penalty associated with
-      checkpoints.  The background writer will continuously trickle out dirty
-      pages to disk, so that only a few pages will need to be forced out when
-      checkpoint time arrives, instead of the storm of dirty-buffer writes that
-      formerly occurred at each checkpoint.  However there is a net overall
+      However, there is a net overall
       increase in I/O load, because where a repeatedly-dirtied page might
       before have been written only once per checkpoint interval, the
       background writer might write it several times in the same interval.
       In most situations a continuous low load is preferable to periodic
-      spikes, but the parameters discussed in this subsection can be used to tune
-      the behavior for local needs.
+      spikes, but the parameters discussed in this subsection can be used to
+      tune the behavior for local needs.
      </para>
 
      <variablelist>
@@ -1242,62 +1238,14 @@ SET ENABLE_SEQSCAN TO OFF;
         </para>
        </listitem>
       </varlistentry>
-
-      <varlistentry id="guc-bgwriter-all-percent" xreflabel="bgwriter_all_percent">
-       <term><varname>bgwriter_all_percent</varname> (<type>floating point</type>)</term>
-       <indexterm>
-        <primary><varname>bgwriter_all_percent</> configuration parameter</primary>
-       </indexterm>
-       <listitem>
-        <para>
-         To reduce the amount of work that will be needed at checkpoint time,
-         the background writer also does a circular scan through the entire
-         buffer pool, writing buffers that are found to be dirty.
-         In each round, it examines up to
-         <varname>bgwriter_all_percent</> of the buffers for this purpose.
-         The default value is 0.333 (0.333% of the total number
-         of shared buffers).  With the default <varname>bgwriter_delay</>
-         setting, this will allow the entire shared buffer pool to be scanned
-         about once per minute.
-         This parameter can only be set in the <filename>postgresql.conf</>
-         file or on the server command line.
-        </para>
-       </listitem>
-      </varlistentry>
-
-      <varlistentry id="guc-bgwriter-all-maxpages" xreflabel="bgwriter_all_maxpages">
-       <term><varname>bgwriter_all_maxpages</varname> (<type>integer</type>)</term>
-       <indexterm>
-        <primary><varname>bgwriter_all_maxpages</> configuration parameter</primary>
-       </indexterm>
-       <listitem>
-        <para>
-         In each round, no more than this many buffers will be written
-         as a result of the scan of the entire buffer pool.  (If this
-         limit is reached, the scan stops, and resumes at the next buffer
-         during the next round.)
-         The default value is five buffers.
-         This parameter can only be set in the <filename>postgresql.conf</>
-         file or on the server command line.
-        </para>
-       </listitem>
-      </varlistentry>
      </variablelist>
 
      <para>
-      Smaller values of <varname>bgwriter_all_percent</varname> and
-      <varname>bgwriter_all_maxpages</varname> reduce the extra I/O load
-      caused by the background writer, but leave more work to be done
-      at checkpoint time.  To reduce load spikes at checkpoints,
-      increase these two values.
-      Similarly, smaller values of <varname>bgwriter_lru_percent</varname> and
+      Smaller values of <varname>bgwriter_lru_percent</varname> and
       <varname>bgwriter_lru_maxpages</varname> reduce the extra I/O load
       caused by the background writer, but make it more likely that server
       processes will have to issue writes for themselves, delaying interactive
       queries.
-      To disable background writing entirely,
-      set both <varname>maxpages</varname> values and/or both
-      <varname>percent</varname> values to zero.
      </para>
     </sect2>
    </sect1>
@@ -1307,7 +1255,7 @@ SET ENABLE_SEQSCAN TO OFF;
 
    <para>
     See also <xref linkend="wal-configuration"> for details on WAL
-    tuning.
+    and checkpoint tuning.
    </para>
 
     <sect2 id="runtime-config-wal-settings">
@@ -1565,6 +1513,22 @@ SET ENABLE_SEQSCAN TO OFF;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-checkpoint-completion-target" xreflabel="checkpoint_completion_target">
+      <term><varname>checkpoint_completion_target</varname> (<type>floating point</type>)</term>
+      <indexterm>
+       <primary><varname>checkpoint_completion_target</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the target length of checkpoints, as a fraction of
+        the checkpoint interval. The default is 0.5.
+
+        This parameter can only be set in the <filename>postgresql.conf</>
+        file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-checkpoint-warning" xreflabel="checkpoint_warning">
       <term><varname>checkpoint_warning</varname> (<type>integer</type>)</term>
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3db50198771..42816bd5d2d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/monitoring.sgml,v 1.50 2007/04/27 20:08:43 neilc Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/monitoring.sgml,v 1.51 2007/06/28 00:02:37 tgl Exp $ -->
 
 <chapter id="monitoring">
  <title>Monitoring Database Activity</title>
@@ -251,9 +251,9 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
       <entry><structname>pg_stat_bgwriter</></entry>
       <entry>One row only, showing cluster-wide statistics from the
       background writer: number of scheduled checkpoints, requested
-      checkpoints, buffers written by checkpoints, lru-scans and all-scans,
-      and the number of times the bgwriter aborted a round because it had
-      written too many buffers during lru-scans and all-scans.
+      checkpoints, buffers written by checkpoints and cleaning scans,
+      and the number of times the bgwriter stopped a cleaning scan
+      because it had written too many buffers.
      </entry>
      </row>
 
@@ -777,43 +777,24 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
      </row>
 
      <row>
-      <entry><literal><function>pg_stat_get_bgwriter_buf_written_lru</function>()</literal></entry>
+      <entry><literal><function>pg_stat_get_bgwriter_buf_written_clean</function>()</literal></entry>
       <entry><type>bigint</type></entry>
       <entry>
-       The number of buffers written by the bgwriter when performing a
-       LRU scan of the buffer cache
+       The number of buffers written by the bgwriter for routine cleaning of
+       dirty pages
       </entry>
      </row>
 
      <row>
-      <entry><literal><function>pg_stat_get_bgwriter_buf_written_all</function>()</literal></entry>
+      <entry><literal><function>pg_stat_get_bgwriter_maxwritten_clean</function>()</literal></entry>
       <entry><type>bigint</type></entry>
       <entry>
-       The number of buffers written by the bgwriter when performing a
-       scan of all the buffer cache
-      </entry>
-     </row>
-
-     <row>
-      <entry><literal><function>pg_stat_get_bgwriter_maxwritten_lru</function>()</literal></entry>
-      <entry><type>bigint</type></entry>
-      <entry>
-       The number of times the bgwriter has stopped its LRU round because
+       The number of times the bgwriter has stopped its cleaning scan because
        it has written more buffers than specified in the
        <varname>bgwriter_lru_maxpages</varname> parameter
       </entry>
      </row>
 
-     <row>
-      <entry><literal><function>pg_stat_get_bgwriter_maxwritten_all</function>()</literal></entry>
-      <entry><type>bigint</type></entry>
-      <entry>
-       The number of times the bgwriter has stopped its all-buffer round
-       because it has written more buffers than specified in the
-       <varname>bgwriter_all_maxpages</varname> parameter
-      </entry>
-     </row>
-
      <row>
       <entry><literal><function>pg_stat_clear_snapshot</function>()</literal></entry>
       <entry><type>void</type></entry>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index cf0c3d2e912..aaf1d0c71ef 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.43 2007/01/31 20:56:19 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/wal.sgml,v 1.44 2007/06/28 00:02:37 tgl Exp $ -->
 
 <chapter id="wal">
  <title>Reliability and the Write-Ahead Log</title>
@@ -217,15 +217,41 @@
   </para>
 
   <para>
-   There will be at least one WAL segment file, and will normally
-   not be more than 2 * <varname>checkpoint_segments</varname> + 1
+   To avoid flooding the I/O system with a burst of page writes,
+   writing dirty buffers during a checkpoint is spread over a period of time.
+   That period is controlled by
+   <xref linkend="guc-checkpoint-completion-target">, which is
+   given as a fraction of the checkpoint interval.
+   The I/O rate is adjusted so that the checkpoint finishes when the
+   given fraction of <varname>checkpoint_segments</varname> WAL segments
+   have been consumed since checkpoint start, or the given fraction of
+   <varname>checkpoint_timeout</varname> seconds have elapsed,
+   whichever is sooner.  With the default value of 0.5,
+   <productname>PostgreSQL</> can be expected to complete each checkpoint
+   in about half the time before the next checkpoint starts.  On a system
+   that's very close to maximum I/O throughput during normal operation,
+   you might want to increase <varname>checkpoint_completion_target</varname>
+   to reduce the I/O load from checkpoints.  The disadvantage of this is that
+   prolonging checkpoints affects recovery time, because more WAL segments
+   will need to be kept around for possible use in recovery.  Although
+   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
+   it is best to keep it less than that (perhaps 0.9 at most) since
+   checkpoints include some other activities besides writing dirty buffers.
+   A setting of 1.0 is quite likely to result in checkpoints not being
+   completed on time, which would result in performance loss due to
+   unexpected variation in the number of WAL segments needed.
+  </para>
+
+  <para>
+   There will always be at least one WAL segment file, and there will normally
+   not be more than (2 + <varname>checkpoint_completion_target</varname>) * <varname>checkpoint_segments</varname> + 1
    files.  Each segment file is normally 16 MB (though this size can be
    altered when building the server).  You can use this to estimate space
    requirements for <acronym>WAL</acronym>.
    Ordinarily, when old log segment files are no longer needed, they
    are recycled (renamed to become the next segments in the numbered
    sequence). If, due to a short-term peak of log output rate, there
-   are more than 2 * <varname>checkpoint_segments</varname> + 1
+   are more than 3 * <varname>checkpoint_segments</varname> + 1
    segment files, the unneeded segment files will be deleted instead
    of recycled until the system gets back under this limit.
   </para>