From bb742407947ad1cbf19355d24282380d576e7654 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 20 Feb 2011 00:17:18 -0500 Subject: Implement an API to let foreign-data wrappers actually be functional. This commit provides the core code and documentation needed. A contrib module test case will follow shortly. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas --- doc/src/sgml/ddl.sgml | 47 +++++ doc/src/sgml/fdwhandler.sgml | 212 ++++++++++++++++++++++ doc/src/sgml/filelist.sgml | 1 + doc/src/sgml/postgres.sgml | 1 + doc/src/sgml/ref/create_foreign_data_wrapper.sgml | 13 +- doc/src/sgml/ref/create_foreign_table.sgml | 4 +- 6 files changed, 267 insertions(+), 11 deletions(-) create mode 100644 doc/src/sgml/fdwhandler.sgml (limited to 'doc/src') diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index a65b4bcd338..12f7c3706e8 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -2986,6 +2986,53 @@ ANALYZE measurement; + + Foreign Data + + + foreign data + + + foreign table + + + + PostgreSQL implements portions of the SQL/MED + specification, allowing you to access data that resides outside + PostgreSQL using regular SQL queries. Such data is referred to as + foreign data. (Note that this usage is not to be confused + with foreign keys, which are a type of constraint within the database.) + + + + Foreign data is accessed with help from a + foreign data wrapper. A foreign data wrapper is a + library that can communicate with an external data source, hiding the + details of connecting to the data source and fetching data from it. There + are several foreign data wrappers available, which can for example read + plain data files residing on the server, or connect to another PostgreSQL + instance. If none of the existing foreign data wrappers suit your needs, + you can write your own; see . + + + + To access foreign data, you need to create a foreign server + object, which defines how to connect to a particular external data source, + according to the set of options used by a particular foreign data + wrapper. Then you need to create one or more foreign + tables, which define the structure of the remote data. A + foreign table can be used in queries just like a normal table, but a + foreign table has no storage in the PostgreSQL server. Whenever it is + used, PostgreSQL asks the foreign data wrapper to fetch the data from the + external source. + + + + Currently, foreign tables are read-only. This limitation may be fixed + in a future release. + + + Other Database Objects diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml new file mode 100644 index 00000000000..fc07f129b79 --- /dev/null +++ b/doc/src/sgml/fdwhandler.sgml @@ -0,0 +1,212 @@ + + + + Writing A Foreign Data Wrapper + + + foreign data wrapper + handler for + + + + All operations on a foreign table are handled through its foreign data + wrapper, which consists of a set of functions that the planner and + executor call. The foreign data wrapper is responsible for fetching + data from the remote data source and returning it to the + PostgreSQL executor. This chapter outlines how + to write a new foreign data wrapper. + + + + The FDW author needs to implement a handler function, and optionally + a validator function. Both functions must be written in a compiled + language such as C, using the version-1 interface. + For details on C language calling conventions and dynamic loading, + see . + + + + The handler function simply returns a struct of function pointers to + callback functions that will be called by the planner and executor. + Most of the effort in writing an FDW is in implementing these callback + functions. + The handler function must be registered with + PostgreSQL as taking no arguments and returning + the special pseudo-type fdw_handler. + The callback functions are plain C functions and are not visible or + callable at the SQL level. + + + + The validator function is responsible for validating options given in the + CREATE FOREIGN DATA WRAPPER, CREATE + SERVER and CREATE FOREIGN TABLE commands. + The validator function must be registered as taking two arguments, a text + array containing the options to be validated, and an OID representing the + type of object the options are associated with (in the form of the OID + of the system catalog the object would be stored in). If no validator + function is supplied, the options are not checked at object creation time. + + + + The foreign data wrappers included in the standard distribution are good + references when trying to write your own. Look into the + contrib/file_fdw subdirectory of the source tree. + The reference page also has + some useful details. + + + + + The SQL standard specifies an interface for writing foreign data wrappers. + However, PostgreSQL does not implement that API, because the effort to + accommodate it into PostgreSQL would be large, and the standard API hasn't + gained wide adoption anyway. + + + + + Foreign Data Wrapper Callback Routines + + + The FDW handler function returns a palloc'd FdwRoutine + struct containing pointers to the following callback functions: + + + + +FdwPlan * +PlanForeignScan (Oid foreigntableid, + PlannerInfo *root, + RelOptInfo *baserel); + + + Plan a scan on a foreign table. This is called when a query is planned. + foreigntableid is the pg_class OID of the + foreign table. root is the planner's global information + about the query, and baserel is the planner's information + about this table. + The function must return a palloc'd struct that contains cost estimates + plus any FDW-private information that is needed to execute the foreign + scan at a later time. (Note that the private information must be + represented in a form that copyObject knows how to copy.) + + + + The information in root and baserel can be used + to reduce the amount of information that has to be fetched from the + foreign table (and therefore reduce the cost estimate). + baserel->baserestrictinfo is particularly interesting, as + it contains restriction quals (WHERE clauses) that can be + used to filter the rows to be fetched. (The FDW is not required to + enforce these quals, as the finished plan will recheck them anyway.) + baserel->reltargetlist can be used to determine which + columns need to be fetched. + + + + In addition to returning cost estimates, the function should update + baserel->rows to be the expected number of rows returned + by the scan, after accounting for the filtering done by the restriction + quals. The initial value of baserel->rows is just a + constant default estimate, which should be replaced if at all possible. + The function may also choose to update baserel->width if + it can compute a better estimate of the average result row width. + + + + +void +ExplainForeignScan (ForeignScanState *node, + ExplainState *es); + + + Print additional EXPLAIN output for a foreign table scan. + This can just return if there is no need to print anything. + Otherwise, it should call ExplainPropertyText and + related functions to add fields to the EXPLAIN output. + The flag fields in es can be used to determine what to + print, and the state of the ForeignScanState node + can be inspected to provide runtime statistics in the EXPLAIN + ANALYZE case. + + + + +void +BeginForeignScan (ForeignScanState *node, + int eflags); + + + Begin executing a foreign scan. This is called during executor startup. + It should perform any initialization needed before the scan can start. + The ForeignScanState node has already been created, but + its fdw_state field is still NULL. Information about + the table to scan is accessible through the + ForeignScanState node (in particular, from the underlying + ForeignScan plan node, which contains a pointer to the + FdwPlan structure returned by + PlanForeignScan). + + + + Note that when (eflags & EXEC_FLAG_EXPLAIN_ONLY) is + true, this function should not perform any externally-visible actions; + it should only do the minimum required to make the node state valid + for ExplainForeignScan and EndForeignScan. + + + + +TupleTableSlot * +IterateForeignScan (ForeignScanState *node); + + + Fetch one row from the foreign source, returning it in a tuple table slot + (the node's ScanTupleSlot should be used for this + purpose). Return NULL if no more rows are available. The tuple table + slot infrastructure allows either a physical or virtual tuple to be + returned; in most cases the latter choice is preferable from a + performance standpoint. Note that this is called in a short-lived memory + context that will be reset between invocations. Create a memory context + in BeginForeignScan if you need longer-lived storage, or use + the es_query_cxt of the node's EState. + + + + The rows returned must match the column signature of the foreign table + being scanned. If you choose to optimize away fetching columns that + are not needed, you should insert nulls in those column positions. + + + + +void +ReScanForeignScan (ForeignScanState *node); + + + Restart the scan from the beginning. Note that any parameters the + scan depends on may have changed value, so the new scan does not + necessarily return exactly the same rows. + + + + +void +EndForeignScan (ForeignScanState *node); + + + End the scan and release resources. It is normally not important + to release palloc'd memory, but for example open files and connections + to remote servers should be cleaned up. + + + + The FdwRoutine and FdwPlan struct types + are declared in src/include/foreign/fdwapi.h, which see + for additional details. + + + + + diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index b9d4ea59b1a..659bcba7c78 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -86,6 +86,7 @@ + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 4d32f7db259..98d19a5c733 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -238,6 +238,7 @@ &sources; &nls; &plhandler; + &fdwhandler; &geqo; &indexam; &gist; diff --git a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml index 711f32b118b..3093ebcb4ac 100644 --- a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml +++ b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml @@ -119,18 +119,13 @@ CREATE FOREIGN DATA WRAPPER name Notes - At the moment, the foreign-data wrapper functionality is very - rudimentary. The purpose of foreign-data wrappers, foreign - servers, and user mappings is to store this information in a - standard way so that it can be queried by interested applications. - One such application is dblink; - see . The functionality to actually query - external data through a foreign-data wrapper library does not exist - yet. + At the moment, the foreign-data wrapper functionality is rudimentary. + There is no support for updating a foreign table, and optimization of + queries is primitive (and mostly left to the wrapper, too). - There is currently one foreign-data wrapper validator function + There is one built-in foreign-data wrapper validator function provided: postgresql_fdw_validator, which accepts options corresponding to libpq connection diff --git a/doc/src/sgml/ref/create_foreign_table.sgml b/doc/src/sgml/ref/create_foreign_table.sgml index ac2e1393e38..77c62140f28 100644 --- a/doc/src/sgml/ref/create_foreign_table.sgml +++ b/doc/src/sgml/ref/create_foreign_table.sgml @@ -131,8 +131,8 @@ CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name Options to be associated with the new foreign table. The allowed option names and values are specific to each foreign - data wrapper and are validated using the foreign-data wrapper - library. Option names must be unique. + data wrapper and are validated using the foreign-data wrapper's + validator function. Option names must be unique. -- cgit v1.2.3