diff options
author | Robert Haas <rhaas@postgresql.org> | 2018-02-02 13:25:55 -0500 |
---|---|---|
committer | Robert Haas <rhaas@postgresql.org> | 2018-02-02 13:32:44 -0500 |
commit | 9da0cc35284bdbe8d442d732963303ff0e0a40bc (patch) | |
tree | 7e2d9e8cf9e1d186cc7274b4daffd78325a81b66 /doc/src | |
parent | 9aef173163ae68c6b241e4c9bbb375c6baa71c60 (diff) |
Support parallel btree index builds.
To make this work, tuplesort.c and logtape.c must also support
parallelism, so this patch adds that infrastructure and then applies
it to the particular case of parallel btree index builds. Testing
to date shows that this can often be 2-3x faster than a serial
index build.
The model for deciding how many workers to use is fairly primitive
at present, but it's better than not having the feature. We can
refine it as we get more experience.
Peter Geoghegan with some help from Rushabh Lathia. While Heikki
Linnakangas is not an author of this patch, he wrote other patches
without which this feature would not have been possible, and
therefore the release notes should possibly credit him as an author
of this feature. Reviewed by Claudio Freire, Heikki Linnakangas,
Thomas Munro, Tels, Amit Kapila, me.
Discussion: http://postgr.es/m/CAM3SWZQKM=Pzc=CAHzRixKjp2eO5Q0Jg1SoFQqeXFQ647JiwqQ@mail.gmail.com
Discussion: http://postgr.es/m/CAH2-Wz=AxWqDoVvGU7dq856S4r6sJAj6DBn7VMtigkB33N5eyg@mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/config.sgml | 44 | ||||
-rw-r--r-- | doc/src/sgml/monitoring.sgml | 12 | ||||
-rw-r--r-- | doc/src/sgml/ref/create_index.sgml | 58 | ||||
-rw-r--r-- | doc/src/sgml/ref/create_table.sgml | 4 |
4 files changed, 111 insertions, 7 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index f951ddb41e7..c45979dee48 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2022,7 +2022,8 @@ include_dir 'conf.d' <para> When changing this value, consider also adjusting - <xref linkend="guc-max-parallel-workers"/> and + <xref linkend="guc-max-parallel-workers"/>, + <xref linkend="guc-max-parallel-workers-maintenance"/>, and <xref linkend="guc-max-parallel-workers-per-gather"/>. </para> </listitem> @@ -2070,6 +2071,44 @@ include_dir 'conf.d' </listitem> </varlistentry> + <varlistentry id="guc-max-parallel-workers-maintenance" xreflabel="max_parallel_maintenance_workers"> + <term><varname>max_parallel_maintenance_workers</varname> (<type>integer</type>) + <indexterm> + <primary><varname>max_parallel_maintenance_workers</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Sets the maximum number of parallel workers that can be + started by a single utility command. Currently, the only + parallel utility command that supports the use of parallel + workers is <command>CREATE INDEX</command>, and only when + building a B-tree index. Parallel workers are taken from the + pool of processes established by <xref + linkend="guc-max-worker-processes"/>, limited by <xref + linkend="guc-max-parallel-workers"/>. Note that the requested + number of workers may not actually be available at runtime. + If this occurs, the utility operation will run with fewer + workers than expected. The default value is 2. Setting this + value to 0 disables the use of parallel workers by utility + commands. + </para> + + <para> + Note that parallel utility commands should not consume + substantially more memory than equivalent non-parallel + operations. This strategy differs from that of parallel + query, where resource limits generally apply per worker + process. Parallel utility commands treat the resource limit + <varname>maintenance_work_mem</varname> as a limit to be applied to + the entire utility command, regardless of the number of + parallel worker processes. However, parallel utility + commands may still consume substantially more CPU resources + and I/O bandwidth. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-max-parallel-workers" xreflabel="max_parallel_workers"> <term><varname>max_parallel_workers</varname> (<type>integer</type>) <indexterm> @@ -2079,8 +2118,9 @@ include_dir 'conf.d' <listitem> <para> Sets the maximum number of workers that the system can support for - parallel queries. The default value is 8. When increasing or + parallel operations. The default value is 8. When increasing or decreasing this value, consider also adjusting + <xref linkend="guc-max-parallel-workers-maintenance"/> and <xref linkend="guc-max-parallel-workers-per-gather"/>. Also, note that a setting for this value which is higher than <xref linkend="guc-max-worker-processes"/> will have no effect, diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 8a9793644fa..e138d1ef076 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -1263,7 +1263,7 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser <entry>Waiting in an extension.</entry> </row> <row> - <entry morerows="32"><literal>IPC</literal></entry> + <entry morerows="33"><literal>IPC</literal></entry> <entry><literal>BgWorkerShutdown</literal></entry> <entry>Waiting for background worker to shut down.</entry> </row> @@ -1372,6 +1372,10 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser <entry>Waiting for parallel bitmap scan to become initialized.</entry> </row> <row> + <entry><literal>ParallelCreateIndexScan</literal></entry> + <entry>Waiting for parallel <command>CREATE INDEX</command> workers to finish heap scan.</entry> + </row> + <row> <entry><literal>ProcArrayGroupUpdate</literal></entry> <entry>Waiting for group leader to clear transaction id at transaction end.</entry> </row> @@ -3900,13 +3904,15 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, </row> <row> <entry><literal>sort-start</literal></entry> - <entry><literal>(int, bool, int, int, bool)</literal></entry> + <entry><literal>(int, bool, int, int, bool, int)</literal></entry> <entry>Probe that fires when a sort operation is started. arg0 indicates heap, index or datum sort. arg1 is true for unique-value enforcement. arg2 is the number of key columns. arg3 is the number of kilobytes of work memory allowed. - arg4 is true if random access to the sort result is required.</entry> + arg4 is true if random access to the sort result is required. + arg5 indicates serial when <literal>0</literal>, parallel worker when + <literal>1</literal>, or parallel leader when <literal>2</literal>.</entry> </row> <row> <entry><literal>sort-done</literal></entry> diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 5137fe63832..f464557de81 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -600,6 +600,64 @@ Indexes: </para> <para> + <productname>PostgreSQL</productname> can build indexes while + leveraging multiple CPUs in order to process the table rows faster. + This feature is known as <firstterm>parallel index + build</firstterm>. For index methods that support building indexes + in parallel (currently, only B-tree), + <varname>maintenance_work_mem</varname> specifies the maximum + amount of memory that can be used by each index build operation as + a whole, regardless of how many worker processes were started. + Generally, a cost model automatically determines how many worker + processes should be requested, if any. + </para> + + <para> + Parallel index builds may benefit from increasing + <varname>maintenance_work_mem</varname> where an equivalent serial + index build will see little or no benefit. Note that + <varname>maintenance_work_mem</varname> may influence the number of + worker processes requested, since parallel workers must have at + least a <literal>32MB</literal> share of the total + <varname>maintenance_work_mem</varname> budget. There must also be + a remaining <literal>32MB</literal> share for the leader process. + Increasing <xref linkend="guc-max-parallel-workers-maintenance"/> + may allow more workers to be used, which will reduce the time + needed for index creation, so long as the index build is not + already I/O bound. Of course, there should also be sufficient + CPU capacity that would otherwise lie idle. + </para> + + <para> + Setting a value for <literal>parallel_workers</literal> via <xref + linkend="sql-altertable"/> directly controls how many parallel + worker processes will be requested by a <command>CREATE + INDEX</command> against the table. This bypasses the cost model + completely, and prevents <varname>maintenance_work_mem</varname> + from affecting how many parallel workers are requested. Setting + <literal>parallel_workers</literal> to 0 via <command>ALTER + TABLE</command> will disable parallel index builds on the table in + all cases. + </para> + + <tip> + <para> + You might want to reset <literal>parallel_workers</literal> after + setting it as part of tuning an index build. This avoids + inadvertent changes to query plans, since + <literal>parallel_workers</literal> affects + <emphasis>all</emphasis> parallel table scans. + </para> + </tip> + + <para> + While <command>CREATE INDEX</command> with the + <literal>CONCURRENTLY</literal> option supports parallel builds + without special restrictions, only the first table scan is actually + performed in parallel. + </para> + + <para> Use <xref linkend="sql-dropindex"/> to remove an index. </para> diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index a0c9a6d2571..d2df40d5431 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1228,8 +1228,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM This sets the number of workers that should be used to assist a parallel scan of this table. If not set, the system will determine a value based on the relation size. The actual number of workers chosen by the planner - may be less, for example due to - the setting of <xref linkend="guc-max-worker-processes"/>. + or by utility statements that use parallel scans may be less, for example + due to the setting of <xref linkend="guc-max-worker-processes"/>. </para> </listitem> </varlistentry> |