31.12. Custom Row Processing

Ordinarily, when receiving a query result from the server, libpq adds each row value to the current PGresult until the entire result set is received; then the PGresult is returned to the application as a unit. This approach is simple to work with, but it becomes inefficient for large result sets, because the whole result must be held in client memory before the application can see any of it. To improve performance, an application can register a custom row processor function that processes each row as the data is received from the network. The custom row processor could process the data fully, or store it in some application-specific data structure for later processing.

Caution

The row processor function sees the rows before it is known whether the query will succeed overall, since the server might return some rows before encountering an error. For proper transactional behavior, it must be possible to discard or undo whatever the row processor has done if the query ultimately fails.

When using a custom row processor, row data is not accumulated into the PGresult, so the PGresult ultimately delivered to the application will contain no rows (PQntuples = 0). However, it still has PQresultStatus = PGRES_TUPLES_OK, and it contains correct information about the set of columns in the query result. On the other hand, if the query fails partway through, the returned PGresult has PQresultStatus = PGRES_FATAL_ERROR. The application must be prepared to undo any actions of the row processor whenever it gets a PGRES_FATAL_ERROR result.
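One way to make the row processor's work undoable is to stage rows in application memory and publish them only after the final PGresult reports PGRES_TUPLES_OK. The sketch below shows a minimal staging buffer; RowStage, stage_append and stage_discard are hypothetical names, and a real row processor would call stage_append with bytes copied from its PGdataValue arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical staging area: rows wait here until the query outcome is known. */
typedef struct
{
    char  **rows;
    size_t  count;
    size_t  cap;
} RowStage;

/* Append a private, NUL-terminated copy of len bytes. Returns 0, or -1 on OOM. */
static int
stage_append(RowStage *st, const char *text, size_t len)
{
    if (st->count == st->cap)
    {
        size_t newcap = st->cap ? st->cap * 2 : 8;
        char **p = realloc(st->rows, newcap * sizeof *p);

        if (p == NULL)
            return -1;
        st->rows = p;
        st->cap = newcap;
    }

    char *copy = malloc(len + 1);

    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    copy[len] = '\0';
    st->rows[st->count++] = copy;
    return 0;
}

/* Undo everything; call this when the final result is PGRES_FATAL_ERROR. */
static void
stage_discard(RowStage *st)
{
    for (size_t i = 0; i < st->count; i++)
        free(st->rows[i]);
    free(st->rows);
    st->rows = NULL;
    st->count = st->cap = 0;
}
```

On PGRES_TUPLES_OK the application would instead take ownership of st->rows; the point of the design is that nothing outside the stage is touched until the outcome is known.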

A custom row processor is registered for a particular connection by calling PQsetRowProcessor, described below. This row processor will be used for all subsequent query results on that connection until changed again. A row processor function must have a signature matching

typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
                               const char **errmsgp, void *param);
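A minimal function matching this signature might look like the sketch below. So that it compiles without libpq headers, it declares PGresult as an opaque struct and repeats the PGdataValue and PQrowProcessor typedefs from this section; noop_row_processor is a hypothetical name.

```c
#include <stddef.h>

/* Opaque stand-in for libpq's PGresult, so the sketch compiles on its own. */
typedef struct pg_result PGresult;

/* Copy of the documented PGdataValue typedef. */
typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Copy of the documented callback type. */
typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
                               const char **errmsgp, void *param);

/* Hypothetical processor that accepts and ignores every row. */
static int
noop_row_processor(PGresult *res, const PGdataValue *columns,
                   const char **errmsgp, void *param)
{
    (void) res;
    (void) columns;     /* NULL on the initial column-information call */
    (void) errmsgp;
    (void) param;
    return 1;           /* success: keep receiving rows */
}

/* The function is assignment-compatible with PQrowProcessor. */
static const PQrowProcessor example_processor = noop_row_processor;
```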

where PGdataValue is described by

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

The res parameter is the PGRES_TUPLES_OK PGresult that will eventually be delivered to the calling application (if no error intervenes). It contains information about the set of columns in the query result, but no row data. In particular, the row processor must call PQnfields(res) to learn the number of data columns.

Immediately after libpq has determined the result set's column information, it will make a call to the row processor with columns set to NULL, but the other parameters as usual. The row processor can use this call to initialize itself for a new result set; if it has nothing to do, it can just return 1. In each subsequent call, one per received row, columns is non-NULL and points to an array of PGdataValue structs, one per data column.
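The sketch below shows a processor that uses the initial columns == NULL call to reset its state and capture the column count, then inspects each row. To run without libpq it gives PGresult a mock body and uses a hypothetical mock_nfields() in place of PQnfields(res); CountState and counting_row_processor are likewise hypothetical, with the state reached through the param pointer.

```c
#include <stddef.h>

/* Mock result holding only what this sketch needs; real code uses libpq's opaque PGresult. */
typedef struct pg_result { int nfields; } PGresult;

/* Stand-in for PQnfields(res), so the sketch runs without libpq. */
static int mock_nfields(const PGresult *res) { return res->nfields; }

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Hypothetical per-query state, passed in via param. */
typedef struct
{
    int  ncols;     /* column count captured at the initial call */
    long nrows;     /* rows seen so far */
    long nnulls;    /* SQL NULLs seen so far */
} CountState;

static int
counting_row_processor(PGresult *res, const PGdataValue *columns,
                       const char **errmsgp, void *param)
{
    CountState *st = param;

    (void) errmsgp;

    if (columns == NULL)
    {
        /* Initial call for a new result set: (re)initialize state. */
        st->ncols = mock_nfields(res);  /* PQnfields(res) in real code */
        st->nrows = 0;
        st->nnulls = 0;
        return 1;
    }

    /* One call per row; columns has st->ncols entries. */
    for (int i = 0; i < st->ncols; i++)
        if (columns[i].len < 0)
            st->nnulls++;
    st->nrows++;
    return 1;
}
```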

errmsgp is an output parameter used only for error reporting. If the row processor needs to report an error, it can set *errmsgp to point to a suitable message string (and then return -1). As a special case, returning -1 without changing *errmsgp from its initial value of NULL is taken to mean "out of memory".
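A sketch of reporting an error through errmsgp follows; strict_row_processor and its message are hypothetical, and it assumes the query returns at least one column. The message lives in a static array because this section does not say how long libpq keeps using the pointer, so static storage is the cautious choice.

```c
#include <stddef.h>

typedef struct pg_result PGresult;     /* opaque stand-in for libpq's type */

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Static storage: the pointer remains valid after the processor returns. */
static const char strict_null_msg[] = "unexpected NULL in first column";

/* Hypothetical processor that rejects rows whose first column is SQL NULL. */
static int
strict_row_processor(PGresult *res, const PGdataValue *columns,
                     const char **errmsgp, void *param)
{
    (void) res;
    (void) param;

    if (columns == NULL)
        return 1;                       /* nothing to initialize */

    /* Assumes the result has at least one column. */
    if (columns[0].len < 0)
    {
        *errmsgp = strict_null_msg;     /* message for the PGRES_FATAL_ERROR result */
        return -1;
    }

    /*
     * Returning -1 while leaving *errmsgp NULL would instead be reported
     * as "out of memory".
     */
    return 1;
}
```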

The last parameter, param, is just a void pointer passed through from PQsetRowProcessor. This can be used for communication between the row processor function and the surrounding application.

In the PGdataValue array passed to a row processor, data values cannot be assumed to be zero-terminated, whether the data format is text or binary. A SQL NULL value is indicated by a negative length field.
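Because values are not zero-terminated, code must always honor len rather than scanning for a terminator. The hypothetical helper below renders one value into a caller-supplied buffer using printf's "%.*s" precision, which reads at most len bytes; it is meant for text-format values, since binary data containing zero bytes would be cut short at the first zero.

```c
#include <stdio.h>

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/*
 * Write one column value into buf as a NUL-terminated string.
 * SQL NULL (negative len) is rendered as "NULL". Returns what snprintf returns.
 */
static int
format_value(const PGdataValue *v, char *buf, size_t bufsz)
{
    if (v->len < 0)
        return snprintf(buf, bufsz, "NULL");
    /* The precision limits the read to v->len bytes; no terminator needed. */
    return snprintf(buf, bufsz, "%.*s", v->len, v->value);
}
```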

The row processor must process the row data values immediately, or else copy them into application-controlled storage. The value pointers passed to the row processor point into libpq's internal data input buffer, which will be overwritten by the next packet fetch.
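When the data must outlive the call, the processor has to copy it. The sketch below deep-copies one row into application-owned memory; OwnedValue and copy_row are hypothetical names, and the copy adds a trailing zero byte purely for the application's convenience.

```c
#include <stdlib.h>
#include <string.h>

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Hypothetical owned copy of one column value. */
typedef struct
{
    int   len;      /* <0 for SQL NULL, as in PGdataValue */
    char *data;     /* malloc'd copy plus trailing NUL; NULL for SQL NULL */
} OwnedValue;

/*
 * Copy ncols values out of libpq's input buffer before it is overwritten.
 * Returns 0 on success, or -1 on allocation failure (earlier copies are freed).
 */
static int
copy_row(const PGdataValue *columns, int ncols, OwnedValue *out)
{
    for (int i = 0; i < ncols; i++)
    {
        out[i].len = columns[i].len;
        out[i].data = NULL;
        if (columns[i].len < 0)
            continue;                           /* SQL NULL: nothing to copy */

        out[i].data = malloc((size_t) columns[i].len + 1);
        if (out[i].data == NULL)
        {
            while (i-- > 0)
                free(out[i].data);              /* free(NULL) is harmless */
            return -1;
        }
        if (columns[i].len > 0)
            memcpy(out[i].data, columns[i].value, (size_t) columns[i].len);
        out[i].data[columns[i].len] = '\0';
    }
    return 0;
}
```

A -1 from copy_row maps naturally onto the row processor returning -1 with *errmsgp left NULL, which libpq reports as out of memory.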

The row processor function must return either 1 or -1. A return value of 1 is the normal, successful result; libpq will continue receiving row values from the server and passing them to the row processor. A return value of -1 indicates that the row processor has encountered an error. In that case, libpq will discard all remaining rows in the result set and then return a PGRES_FATAL_ERROR PGresult to the application (containing the specified error message, or "out of memory for query result" if *errmsgp was left as NULL).
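To make this contract concrete from the caller's side, the sketch below simulates how the return value is interpreted. simulate_delivery is a hypothetical model, not libpq's code: it passes NULL for res (so it only suits processors that do not inspect res), makes the initial columns == NULL call, then stops at the first -1 and yields the message the application would see. The two processors are also hypothetical.

```c
#include <stddef.h>

typedef struct pg_result PGresult;     /* opaque stand-in for libpq's type */

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
                               const char **errmsgp, void *param);

/* Simulation only: NULL if every call returned 1, else the reported message. */
static const char *
simulate_delivery(PQrowProcessor proc, void *param,
                  const PGdataValue *rows, int nrows, int ncols)
{
    const char *errmsg = NULL;

    if (proc(NULL, NULL, &errmsg, param) != 1)      /* column-info call */
        return errmsg ? errmsg : "out of memory for query result";

    for (int r = 0; r < nrows; r++)
    {
        errmsg = NULL;
        if (proc(NULL, rows + (size_t) r * ncols, &errmsg, param) != 1)
            /* Remaining rows are discarded; the result is PGRES_FATAL_ERROR. */
            return errmsg ? errmsg : "out of memory for query result";
    }
    return NULL;
}

/* Hypothetical: accepts up to *(int *) param rows, then fails with a message. */
static int
limited_row_processor(PGresult *res, const PGdataValue *columns,
                      const char **errmsgp, void *param)
{
    int *remaining = param;

    (void) res;
    if (columns == NULL)
        return 1;
    if ((*remaining)-- <= 0)
    {
        *errmsgp = "row limit exceeded";
        return -1;
    }
    return 1;
}

/* Hypothetical: fails on every row without setting *errmsgp. */
static int
always_oom_processor(PGresult *res, const PGdataValue *columns,
                     const char **errmsgp, void *param)
{
    (void) res; (void) errmsgp; (void) param;
    return columns == NULL ? 1 : -1;
}
```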

Another option for exiting a row processor is to throw an exception using C's longjmp() or C++'s throw. If this is done, processing of the incoming data can be resumed later by calling PQgetResult; the row processor will be invoked as normal for any remaining rows in the current result. As with any usage of PQgetResult, the application should continue calling PQgetResult until it gets a NULL result before issuing any new query.
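The longjmp() route can be sketched as follows, using C only (the C++ throw variant is analogous). fake_deliver is a hypothetical stand-in for libpq invoking the processor while receiving data; in real code the setjmp() would surround the libpq call, and afterwards the application would resume with PQgetResult or drop the rest with PQskipResult. EscapeState and the function names are hypothetical, and any resources the processor holds when it jumps out are the application's responsibility.

```c
#include <setjmp.h>
#include <stddef.h>

typedef struct pg_result PGresult;     /* opaque stand-in for libpq's type */

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Hypothetical state shared between the application and its processor. */
typedef struct
{
    jmp_buf escape;     /* where to land when the application wants out */
    long    seen;       /* rows processed so far */
    long    stop_at;    /* escape once this many rows have been seen */
} EscapeState;

static int
escaping_row_processor(PGresult *res, const PGdataValue *columns,
                       const char **errmsgp, void *param)
{
    EscapeState *st = param;

    (void) res;
    (void) errmsgp;
    if (columns == NULL)
        return 1;
    if (++st->seen >= st->stop_at)
        longjmp(st->escape, 1);         /* leave without returning to the caller */
    return 1;
}

/* Stand-in for libpq calling the processor (one column per row here). */
static void
fake_deliver(EscapeState *st, const PGdataValue *rows, int nrows)
{
    const char *errmsg = NULL;

    escaping_row_processor(NULL, NULL, &errmsg, st);
    for (int i = 0; i < nrows; i++)
        escaping_row_processor(NULL, &rows[i], &errmsg, st);
}

/* Application side: 1 if delivery finished, 0 if the processor escaped. */
static int
fetch_with_escape(EscapeState *st, const PGdataValue *rows, int nrows)
{
    if (setjmp(st->escape) != 0)
        return 0;   /* real code: later call PQgetResult (or PQskipResult) */
    fake_deliver(st, rows, nrows);
    return 1;
}
```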

In some cases, an exception may mean that the remainder of the query result is not interesting. In such cases the application can discard the remaining rows with PQskipResult, described below. Another possible recovery option is to close the connection altogether with PQfinish.

PQsetRowProcessor

Sets a callback function to process each row.

void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);

The specified row processor function func is installed as the active row processor for the given connection conn. Also, param is installed as the passthrough pointer to pass to it. Alternatively, if func is NULL, the standard row processor is reinstalled on the given connection (and param is ignored).

Although the row processor can be changed at any time in the life of a connection, it's generally unwise to do so while a query is active. In particular, when using asynchronous mode, be aware that both PQisBusy and PQgetResult can call the current row processor.

PQgetRowProcessor

Fetches the current row processor for the specified connection.

PQrowProcessor PQgetRowProcessor(const PGconn *conn, void **param);

In addition to returning the row processor function pointer, PQgetRowProcessor stores the current passthrough pointer into *param, if param is not NULL.
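Together, PQgetRowProcessor and PQsetRowProcessor allow a save, install, restore pattern, the same transient installation that PQskipResult is described as doing internally. To keep the sketch runnable without libpq, MockConn, mock_set_row_processor and mock_get_row_processor are hypothetical stand-ins that mimic the documented behavior; in particular, representing the standard processor as NULL is an assumption of this mock, not a statement about what the real PQgetRowProcessor returns.

```c
#include <stddef.h>

typedef struct pg_result PGresult;     /* opaque stand-in for libpq's type */

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

typedef int (*PQrowProcessor) (PGresult *res, const PGdataValue *columns,
                               const char **errmsgp, void *param);

/* Hypothetical mock connection: just the processor slot and its param. */
typedef struct
{
    PQrowProcessor func;    /* NULL stands for "standard processor" here */
    void          *param;
} MockConn;

/* Mimics PQsetRowProcessor: a NULL func reinstalls the standard one, param ignored. */
static void
mock_set_row_processor(MockConn *conn, PQrowProcessor func, void *param)
{
    conn->func = func;
    conn->param = (func != NULL) ? param : NULL;
}

/* Mimics PQgetRowProcessor: fills *param only when param is not NULL. */
static PQrowProcessor
mock_get_row_processor(const MockConn *conn, void **param)
{
    if (param != NULL)
        *param = conn->param;
    return conn->func;
}

/* Hypothetical application processor: counts rows into *(long *) param. */
static int
app_processor(PGresult *res, const PGdataValue *columns,
              const char **errmsgp, void *param)
{
    (void) res; (void) errmsgp;
    if (columns != NULL)
        ++*(long *) param;
    return 1;
}

/* Hypothetical dummy processor that ignores all data. */
static int
discard_rows(PGresult *res, const PGdataValue *columns,
             const char **errmsgp, void *param)
{
    (void) res; (void) columns; (void) errmsgp; (void) param;
    return 1;
}

/* Records which processor was active during the step, for demonstration. */
static PQrowProcessor active_during_step;

static void
record_step(MockConn *conn)
{
    active_during_step = mock_get_row_processor(conn, NULL);
}

/* Install discard_rows for one step, then restore the previous func and param. */
static void
with_discard_processor(MockConn *conn, void (*step)(MockConn *))
{
    void          *saved_param = NULL;
    PQrowProcessor saved_func = mock_get_row_processor(conn, &saved_param);

    mock_set_row_processor(conn, discard_rows, NULL);
    step(conn);
    mock_set_row_processor(conn, saved_func, saved_param);
}
```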

PQskipResult

Discards all the remaining rows in the incoming result set.

PGresult *PQskipResult(PGconn *conn);

This is a simple convenience function for discarding incoming data after a row processor has failed, or once the application has determined that the rest of the result set is not interesting. PQskipResult is exactly equivalent to PQgetResult except that it transiently installs a dummy row processor function that just discards data. The returned PGresult can be discarded without further ado if it has status PGRES_TUPLES_OK; but other status values should be handled normally. (In particular, PGRES_FATAL_ERROR indicates a server-reported error that will still need to be dealt with.) As when using PQgetResult, one should usually repeat the call until NULL is returned to ensure the connection has reached an idle state. Another possible usage is to call PQskipResult just once, and then resume using PQgetResult to process subsequent result sets normally.

Because PQskipResult will wait for server input, it is not very useful in asynchronous applications. In particular, you should not code a loop of PQisBusy and PQskipResult, because that will result in the installed row processor being called within PQisBusy. To get the proper behavior in an asynchronous application, you'll need to install a dummy row processor (or set a flag to make your normal row processor do nothing) and leave it that way until you have discarded all incoming data via your normal PQisBusy and PQgetResult loop.
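The flag-based approach for asynchronous applications might be sketched like this. AsyncState and async_row_processor are hypothetical names; the application would set discarding, keep driving its usual PQisBusy/PQgetResult loop until PQgetResult returns NULL, and only then clear the flag.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct pg_result PGresult;     /* opaque stand-in for libpq's type */

typedef struct pgDataValue
{
    int         len;            /* data length in bytes, or <0 if NULL */
    const char *value;          /* data value, without zero-termination */
} PGdataValue;

/* Hypothetical application state, passed in via param. */
typedef struct
{
    bool discarding;    /* set when the rest of the incoming data is unwanted */
    long kept;          /* rows actually processed */
    long dropped;       /* rows ignored while discarding */
} AsyncState;

static int
async_row_processor(PGresult *res, const PGdataValue *columns,
                    const char **errmsgp, void *param)
{
    AsyncState *st = param;

    (void) res;
    (void) errmsgp;

    if (columns == NULL)
        return 1;           /* new result set; the flag is left as the application set it */

    if (st->discarding)
    {
        st->dropped++;      /* ignore the data but keep libpq reading */
        return 1;
    }

    st->kept++;             /* normal row processing would go here */
    return 1;
}
```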