summaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2012-04-04 18:27:56 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2012-04-04 18:27:56 -0400
commit92785dac2ee7026948962cd61c4cd84a2d052772 (patch)
treedeb7a2c120978b9f3b85410317271a91b76ad66d /doc/src
parentcb917e1544612c187c74fed1a990e26820514c8a (diff)
Add a "row processor" API to libpq for better handling of large results.
Traditionally libpq has collected an entire query result before passing it back to the application. That provides a simple and transactional API, but it's pretty inefficient for large result sets. This patch allows the application to process each row on-the-fly instead of accumulating the rows into the PGresult. Error recovery becomes a bit more complex, but often that tradeoff is well worth making. Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/libpq.sgml268
1 files changed, 268 insertions, 0 deletions
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 96064bbb0de..0ec501e5bda 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5581,6 +5581,274 @@ defaultNoticeProcessor(void *arg, const char *message)
</sect1>
+ <sect1 id="libpq-row-processor">
+ <title>Custom Row Processing</title>
+
+ <indexterm zone="libpq-row-processor">
+ <primary>PQrowProcessor</primary>
+ </indexterm>
+
+ <indexterm zone="libpq-row-processor">
+ <primary>row processor</primary>
+ <secondary>in libpq</secondary>
+ </indexterm>
+
+ <para>
+ Ordinarily, when receiving a query result from the server,
+ <application>libpq</> adds each row value to the current
+ <type>PGresult</type> until the entire result set is received; then
+ the <type>PGresult</type> is returned to the application as a unit.
+ This approach is simple to work with, but becomes inefficient for large
+ result sets. To improve performance, an application can register a
+ custom <firstterm>row processor</> function that processes each row
+ as the data is received from the network. The custom row processor could
+ process the data fully, or store it into some application-specific data
+ structure for later processing.
+ </para>
+
+ <caution>
+ <para>
+ The row processor function sees the rows before it is known whether the
+ query will succeed overall, since the server might return some rows before
+ encountering an error. For proper transactional behavior, it must be
+ possible to discard or undo whatever the row processor has done, if the
+ query ultimately fails.
+ </para>
+ </caution>
+
+ <para>
+ When using a custom row processor, row data is not accumulated into the
+ <type>PGresult</type>, so the <type>PGresult</type> ultimately delivered to
+ the application will contain no rows (<function>PQntuples</> =
+ <literal>0</>). However, it still has <function>PQresultStatus</> =
+ <literal>PGRES_TUPLES_OK</>, and it contains correct information about the
+ set of columns in the query result. On the other hand, if the query fails
+ partway through, the returned <type>PGresult</type> has
+ <function>PQresultStatus</> = <literal>PGRES_FATAL_ERROR</>. The
+ application must be prepared to undo any actions of the row processor
+ whenever it gets a <literal>PGRES_FATAL_ERROR</> result.
+ </para>
+
+ <para>
+ A custom row processor is registered for a particular connection by
+ calling <function>PQsetRowProcessor</function>, described below.
+ This row processor will be used for all subsequent query results on that
+ connection until changed again. A row processor function must have a
+ signature matching
+
+<synopsis>
+typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
+ const char **errmsgp, void *param);
+</synopsis>
+ where <type>PGdataValue</> is described by
+<synopsis>
+typedef struct pgDataValue
+{
+ int len; /* data length in bytes, or <0 if NULL */
+ const char *value; /* data value, without zero-termination */
+} PGdataValue;
+</synopsis>
+ </para>
+
+ <para>
+ The <parameter>res</> parameter is the <literal>PGRES_TUPLES_OK</>
+ <type>PGresult</type> that will eventually be delivered to the calling
+ application (if no error intervenes). It contains information about
+ the set of columns in the query result, but no row data. In particular the
+ row processor must fetch <literal>PQnfields(res)</> to know the number of
+ data columns.
+ </para>
+
+ <para>
+ Immediately after <application>libpq</> has determined the result set's
+ column information, it will make a call to the row processor with
+ <parameter>columns</parameter> set to NULL, but the other parameters as
+ usual. The row processor can use this call to initialize for a new result
+ set; if it has nothing to do, it can just return <literal>1</>. In
+ subsequent calls, one per received row, <parameter>columns</parameter>
+ is non-NULL and points to an array of <type>PGdataValue</> structs, one per
+ data column.
+ </para>
+
+ <para>
+ <parameter>errmsgp</parameter> is an output parameter used only for error
+ reporting. If the row processor needs to report an error, it can set
+ <literal>*</><parameter>errmsgp</parameter> to point to a suitable message
+ string (and then return <literal>-1</>). As a special case, returning
+ <literal>-1</> without changing <literal>*</><parameter>errmsgp</parameter>
+ from its initial value of NULL is taken to mean <quote>out of memory</>.
+ </para>
+
+ <para>
+ The last parameter, <parameter>param</parameter>, is just a void pointer
+ passed through from <function>PQsetRowProcessor</function>. This can be
+ used for communication between the row processor function and the
+ surrounding application.
+ </para>
+
+ <para>
+ In the <type>PGdataValue</> array passed to a row processor, data values
+ cannot be assumed to be zero-terminated, whether the data format is text
+ or binary. A SQL NULL value is indicated by a negative length field.
+ </para>
+
+ <para>
+ The row processor <emphasis>must</> process the row data values
+ immediately, or else copy them into application-controlled storage.
+ The value pointers passed to the row processor point into
+ <application>libpq</>'s internal data input buffer, which will be
+ overwritten by the next packet fetch.
+ </para>
+
+ <para>
+ The row processor function must return either <literal>1</> or
+ <literal>-1</>.
+ <literal>1</> is the normal, successful result value; <application>libpq</>
+ will continue with receiving row values from the server and passing them to
+ the row processor. <literal>-1</> indicates that the row processor has
+ encountered an error. In that case,
+ <application>libpq</> will discard all remaining rows in the result set
+ and then return a <literal>PGRES_FATAL_ERROR</> <type>PGresult</type> to
+ the application (containing the specified error message, or <quote>out of
+ memory for query result</> if <literal>*</><parameter>errmsgp</parameter>
+ was left as NULL).
+ </para>
+
+ <para>
+ Another option for exiting a row processor is to throw an exception using
+ C's <function>longjmp()</> or C++'s <literal>throw</>. If this is done,
+ processing of the incoming data can be resumed later by calling
+ <function>PQgetResult</>; the row processor will be invoked as normal for
+ any remaining rows in the current result.
+ As with any usage of <function>PQgetResult</>, the application
+ should continue calling <function>PQgetResult</> until it gets a NULL
+ result before issuing any new query.
+ </para>
+
+ <para>
+ In some cases, an exception may mean that the remainder of the
+ query result is not interesting. In such cases the application can discard
+ the remaining rows with <function>PQskipResult</>, described below.
+ Another possible recovery option is to close the connection altogether with
+ <function>PQfinish</>.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry id="libpq-pqsetrowprocessor">
+ <term>
+ <function>PQsetRowProcessor</function>
+ <indexterm>
+ <primary>PQsetRowProcessor</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Sets a callback function to process each row.
+
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+ </para>
+
+ <para>
+ The specified row processor function <parameter>func</> is installed as
+ the active row processor for the given connection <parameter>conn</>.
+ Also, <parameter>param</> is installed as the passthrough pointer to
+ pass to it. Alternatively, if <parameter>func</> is NULL, the standard
+ row processor is reinstalled on the given connection (and
+ <parameter>param</> is ignored).
+ </para>
+
+ <para>
+ Although the row processor can be changed at any time in the life of a
+ connection, it's generally unwise to do so while a query is active.
+ In particular, when using asynchronous mode, be aware that both
+ <function>PQisBusy</> and <function>PQgetResult</> can call the current
+ row processor.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="libpq-pqgetrowprocessor">
+ <term>
+ <function>PQgetRowProcessor</function>
+ <indexterm>
+ <primary>PQgetRowProcessor</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Fetches the current row processor for the specified connection.
+
+<synopsis>
+PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param);
+</synopsis>
+ </para>
+
+ <para>
+ In addition to returning the row processor function pointer, the
+ current passthrough pointer will be returned at
+ <literal>*</><parameter>param</>, if <parameter>param</> is not NULL.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="libpq-pqskipresult">
+ <term>
+ <function>PQskipResult</function>
+ <indexterm>
+ <primary>PQskipResult</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Discard all the remaining rows in the incoming result set.
+
+<synopsis>
+PGresult *PQskipResult(PGconn *conn);
+</synopsis>
+ </para>
+
+ <para>
+ This is a simple convenience function to discard incoming data after a
+ row processor has failed or it's determined that the rest of the result
+ set is not interesting. <function>PQskipResult</> is exactly
+ equivalent to <function>PQgetResult</> except that it transiently
+ installs a dummy row processor function that just discards data.
+ The returned <type>PGresult</> can be discarded without further ado
+ if it has status <literal>PGRES_TUPLES_OK</>; but other status values
+ should be handled normally. (In particular,
+ <literal>PGRES_FATAL_ERROR</> indicates a server-reported error that
+ will still need to be dealt with.)
+ As when using <function>PQgetResult</>, one should usually repeat the
+ call until NULL is returned to ensure the connection has reached an
+ idle state. Another possible usage is to call
+ <function>PQskipResult</> just once, and then resume using
+ <function>PQgetResult</> to process subsequent result sets normally.
+ </para>
+
+ <para>
+ Because <function>PQskipResult</> will wait for server input, it is not
+ very useful in asynchronous applications. In particular you should not
+ code a loop of <function>PQisBusy</> and <function>PQskipResult</>,
+ because that will result in the installed row processor being called
+ within <function>PQisBusy</>. To get the proper behavior in an
+ asynchronous application, you'll need to install a dummy row processor
+ (or set a flag to make your normal row processor do nothing) and leave
+ it that way until you have discarded all incoming data via your normal
+ <function>PQisBusy</> and <function>PQgetResult</> loop.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ </sect1>
+
<sect1 id="libpq-events">
<title>Event System</title>