diff options
| author | Tom Lane <tgl@sss.pgh.pa.us> | 2012-04-04 18:27:56 -0400 |
|---|---|---|
| committer | Tom Lane <tgl@sss.pgh.pa.us> | 2012-04-04 18:27:56 -0400 |
| commit | 92785dac2ee7026948962cd61c4cd84a2d052772 (patch) | |
| tree | deb7a2c120978b9f3b85410317271a91b76ad66d /doc/src | |
| parent | cb917e1544612c187c74fed1a990e26820514c8a (diff) | |
Add a "row processor" API to libpq for better handling of large results.
Traditionally libpq has collected an entire query result before passing
it back to the application. That provides a simple and transactional API,
but it's pretty inefficient for large result sets. This patch allows the
application to process each row on-the-fly instead of accumulating the
rows into the PGresult. Error recovery becomes a bit more complex, but
often that tradeoff is well worth making.
Kyotaro Horiguchi, reviewed by Marko Kreen and Tom Lane
Diffstat (limited to 'doc/src')
| -rw-r--r-- | doc/src/sgml/libpq.sgml | 268 |
1 files changed, 268 insertions, 0 deletions
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 96064bbb0de..0ec501e5bda 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -5581,6 +5581,274 @@ defaultNoticeProcessor(void *arg, const char *message) </sect1> + <sect1 id="libpq-row-processor"> + <title>Custom Row Processing</title> + + <indexterm zone="libpq-row-processor"> + <primary>PQrowProcessor</primary> + </indexterm> + + <indexterm zone="libpq-row-processor"> + <primary>row processor</primary> + <secondary>in libpq</secondary> + </indexterm> + + <para> + Ordinarily, when receiving a query result from the server, + <application>libpq</> adds each row value to the current + <type>PGresult</type> until the entire result set is received; then + the <type>PGresult</type> is returned to the application as a unit. + This approach is simple to work with, but becomes inefficient for large + result sets. To improve performance, an application can register a + custom <firstterm>row processor</> function that processes each row + as the data is received from the network. The custom row processor could + process the data fully, or store it into some application-specific data + structure for later processing. + </para> + + <caution> + <para> + The row processor function sees the rows before it is known whether the + query will succeed overall, since the server might return some rows before + encountering an error. For proper transactional behavior, it must be + possible to discard or undo whatever the row processor has done, if the + query ultimately fails. + </para> + </caution> + + <para> + When using a custom row processor, row data is not accumulated into the + <type>PGresult</type>, so the <type>PGresult</type> ultimately delivered to + the application will contain no rows (<function>PQntuples</> = + <literal>0</>). However, it still has <function>PQresultStatus</> = + <literal>PGRES_TUPLES_OK</>, and it contains correct information about the + set of columns in the query result. On the other hand, if the query fails + partway through, the returned <type>PGresult</type> has + <function>PQresultStatus</> = <literal>PGRES_FATAL_ERROR</>. The + application must be prepared to undo any actions of the row processor + whenever it gets a <literal>PGRES_FATAL_ERROR</> result. + </para> + + <para> + A custom row processor is registered for a particular connection by + calling <function>PQsetRowProcessor</function>, described below. + This row processor will be used for all subsequent query results on that + connection until changed again. A row processor function must have a + signature matching + +<synopsis> +typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns, + const char **errmsgp, void *param); +</synopsis> + where <type>PGdataValue</> is described by +<synopsis> +typedef struct pgDataValue +{ + int len; /* data length in bytes, or <0 if NULL */ + const char *value; /* data value, without zero-termination */ +} PGdataValue; +</synopsis> + </para> + + <para> + The <parameter>res</> parameter is the <literal>PGRES_TUPLES_OK</> + <type>PGresult</type> that will eventually be delivered to the calling + application (if no error intervenes). It contains information about + the set of columns in the query result, but no row data. In particular the + row processor must fetch <literal>PQnfields(res)</> to know the number of + data columns. + </para> + + <para> + Immediately after <application>libpq</> has determined the result set's + column information, it will make a call to the row processor with + <parameter>columns</parameter> set to NULL, but the other parameters as + usual. The row processor can use this call to initialize for a new result + set; if it has nothing to do, it can just return <literal>1</>. In + subsequent calls, one per received row, <parameter>columns</parameter> + is non-NULL and points to an array of <type>PGdataValue</> structs, one per + data column. + </para> + + <para> + <parameter>errmsgp</parameter> is an output parameter used only for error + reporting. If the row processor needs to report an error, it can set + <literal>*</><parameter>errmsgp</parameter> to point to a suitable message + string (and then return <literal>-1</>). As a special case, returning + <literal>-1</> without changing <literal>*</><parameter>errmsgp</parameter> + from its initial value of NULL is taken to mean <quote>out of memory</>. + </para> + + <para> + The last parameter, <parameter>param</parameter>, is just a void pointer + passed through from <function>PQsetRowProcessor</function>. This can be + used for communication between the row processor function and the + surrounding application. + </para> + + <para> + In the <type>PGdataValue</> array passed to a row processor, data values + cannot be assumed to be zero-terminated, whether the data format is text + or binary. A SQL NULL value is indicated by a negative length field. + </para> + + <para> + The row processor <emphasis>must</> process the row data values + immediately, or else copy them into application-controlled storage. + The value pointers passed to the row processor point into + <application>libpq</>'s internal data input buffer, which will be + overwritten by the next packet fetch. + </para> + + <para> + The row processor function must return either <literal>1</> or + <literal>-1</>. + <literal>1</> is the normal, successful result value; <application>libpq</> + will continue with receiving row values from the server and passing them to + the row processor. <literal>-1</> indicates that the row processor has + encountered an error. In that case, + <application>libpq</> will discard all remaining rows in the result set + and then return a <literal>PGRES_FATAL_ERROR</> <type>PGresult</type> to + the application (containing the specified error message, or <quote>out of + memory for query result</> if <literal>*</><parameter>errmsgp</parameter> + was left as NULL). + </para> + + <para> + Another option for exiting a row processor is to throw an exception using + C's <function>longjmp()</> or C++'s <literal>throw</>. If this is done, + processing of the incoming data can be resumed later by calling + <function>PQgetResult</>; the row processor will be invoked as normal for + any remaining rows in the current result. + As with any usage of <function>PQgetResult</>, the application + should continue calling <function>PQgetResult</> until it gets a NULL + result before issuing any new query. + </para> + + <para> + In some cases, an exception may mean that the remainder of the + query result is not interesting. In such cases the application can discard + the remaining rows with <function>PQskipResult</>, described below. + Another possible recovery option is to close the connection altogether with + <function>PQfinish</>. + </para> + + <para> + <variablelist> + <varlistentry id="libpq-pqsetrowprocessor"> + <term> + <function>PQsetRowProcessor</function> + <indexterm> + <primary>PQsetRowProcessor</primary> + </indexterm> + </term> + + <listitem> + <para> + Sets a callback function to process each row. + +<synopsis> +void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param); +</synopsis> + </para> + + <para> + The specified row processor function <parameter>func</> is installed as + the active row processor for the given connection <parameter>conn</>. + Also, <parameter>param</> is installed as the passthrough pointer to + pass to it. Alternatively, if <parameter>func</> is NULL, the standard + row processor is reinstalled on the given connection (and + <parameter>param</> is ignored). + </para> + + <para> + Although the row processor can be changed at any time in the life of a + connection, it's generally unwise to do so while a query is active. + In particular, when using asynchronous mode, be aware that both + <function>PQisBusy</> and <function>PQgetResult</> can call the current + row processor. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqgetrowprocessor"> + <term> + <function>PQgetRowProcessor</function> + <indexterm> + <primary>PQgetRowProcessor</primary> + </indexterm> + </term> + + <listitem> + <para> + Fetches the current row processor for the specified connection. + +<synopsis> +PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param); +</synopsis> + </para> + + <para> + In addition to returning the row processor function pointer, the + current passthrough pointer will be returned at + <literal>*</><parameter>param</>, if <parameter>param</> is not NULL. + </para> + </listitem> + </varlistentry> + + <varlistentry id="libpq-pqskipresult"> + <term> + <function>PQskipResult</function> + <indexterm> + <primary>PQskipResult</primary> + </indexterm> + </term> + + <listitem> + <para> + Discard all the remaining rows in the incoming result set. + +<synopsis> +PGresult *PQskipResult(PGconn *conn); +</synopsis> + </para> + + <para> + This is a simple convenience function to discard incoming data after a + row processor has failed or it's determined that the rest of the result + set is not interesting. <function>PQskipResult</> is exactly + equivalent to <function>PQgetResult</> except that it transiently + installs a dummy row processor function that just discards data. + The returned <type>PGresult</> can be discarded without further ado + if it has status <literal>PGRES_TUPLES_OK</>; but other status values + should be handled normally. (In particular, + <literal>PGRES_FATAL_ERROR</> indicates a server-reported error that + will still need to be dealt with.) + As when using <function>PQgetResult</>, one should usually repeat the + call until NULL is returned to ensure the connection has reached an + idle state. Another possible usage is to call + <function>PQskipResult</> just once, and then resume using + <function>PQgetResult</> to process subsequent result sets normally. + </para> + + <para> + Because <function>PQskipResult</> will wait for server input, it is not + very useful in asynchronous applications. In particular you should not + code a loop of <function>PQisBusy</> and <function>PQskipResult</>, + because that will result in the installed row processor being called + within <function>PQisBusy</>. To get the proper behavior in an + asynchronous application, you'll need to install a dummy row processor + (or set a flag to make your normal row processor do nothing) and leave + it that way until you have discarded all incoming data via your normal + <function>PQisBusy</> and <function>PQgetResult</> loop. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + + </sect1> + <sect1 id="libpq-events"> <title>Event System</title> |
