From 372728b0d49552641f0ea83d9d2e08817de038fa Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 8 Apr 2018 13:16:50 -0400 Subject: Replace our traditional initial-catalog-data format with a better design. Historically, the initial catalog data to be installed during bootstrap has been written in DATA() lines in the catalog header files. This had lots of disadvantages: the format was badly underdocumented, it was very difficult to edit the data in any mechanized way, and due to the lack of any abstraction the data was verbose, hard to read/understand, and easy to get wrong. Hence, move this data into separate ".dat" files and represent it in a way that can easily be read and rewritten by Perl scripts. The new format is essentially "key => value" for each column; while it's a bit repetitive, explicit labeling of each value makes the data far more readable and less error-prone. Provide a way to abbreviate entries by omitting field values that match a specified default value for their column. This allows removal of a large amount of repetitive boilerplate and also lowers the barrier to adding new columns. Also teach genbki.pl how to translate symbolic OID references into numeric OIDs for more cases than just "regproc"-like pg_proc references. It can now do that for regprocedure-like references (thus solving the problem that regproc is ambiguous for overloaded functions), operators, types, opfamilies, opclasses, and access methods. Use this to turn nearly all OID cross-references in the initial data into symbolic form. This represents a very large step forward in readability and error resistance of the initial catalog data. It should also reduce the difficulty of renumbering OID assignments in uncommitted patches. Also, solve the longstanding problem that frontend code that would like to use OID macros and other information from the catalog headers often had difficulty with backend-only code in the headers. To do this, arrange for all generated macros, plus such other declarations as we deem fit, to be placed in "derived" header files that are safe for frontend inclusion. (Once clients migrate to using these pg_*_d.h headers, it will be possible to get rid of the pg_*_fn.h headers, which only exist to quarantine code away from clients. That is left for follow-on patches, however.) The now-automatically-generated macros include the Anum_xxx and Natts_xxx constants that we used to have to update by hand when adding or removing catalog columns. Replace the former manual method of generating OID macros for pg_type entries with an automatic method, ensuring that all built-in types have OID macros. (But note that this patch does not change the way that OID macros for pg_proc entries are built and used. It's not clear that making that match the other catalogs would be worth extra code churn.) Add SGML documentation explaining what the new data format is and how to work with it. Despite being a very large change in the catalog headers, there is no catversion bump here, because postgres.bki and related output files haven't changed at all. John Naylor, based on ideas from various people; review and minor additional coding by me; previous review by Alvaro Herrera Discussion: https://postgr.es/m/CAJVSVGWO48JbbwXkJz_yBFyGYW-M9YWxnPdxJBUosDC9ou_F0Q@mail.gmail.com --- doc/src/sgml/bki.sgml | 661 ++++++++++++++++++++++++++++++++++++++++++++++-- doc/src/sgml/libpq.sgml | 2 +- 2 files changed, 636 insertions(+), 27 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml index 33378b46eaa..c4ffc61b84b 100644 --- a/doc/src/sgml/bki.sgml +++ b/doc/src/sgml/bki.sgml @@ -1,38 +1,647 @@ - <acronym>BKI</acronym> Backend Interface + System Catalog Declarations and Initial Contents - Backend Interface (BKI) files are scripts in a - special language that is understood by the - PostgreSQL backend when running in the - bootstrap mode. The bootstrap mode allows system catalogs - to be created and filled from scratch, whereas ordinary SQL commands - require the catalogs to exist already. - BKI files can therefore be used to create the - database system in the first place. (And they are probably not - useful for anything else.) + PostgreSQL uses many different system catalogs + to keep track of the existence and properties of database objects, such as + tables and functions. Physically there is no difference between a system + catalog and a plain user table, but the backend C code knows the structure + and properties of each catalog, and can manipulate it directly at a low + level. Thus, for example, it is inadvisable to attempt to alter the + structure of a catalog on-the-fly; that would break assumptions built into + the C code about how rows of the catalog are laid out. But the structure + of the catalogs can change between major versions. - initdb uses a BKI file - to do part of its job when creating a new database cluster. The - input file used by initdb is created as - part of building and installing PostgreSQL - by a program named genbki.pl, which reads some - specially formatted C header files in the src/include/catalog/ - directory of the source tree. The created BKI file - is called postgres.bki and is - normally installed in the - share subdirectory of the installation tree. + The structures of the catalogs are declared in specially formatted C + header files in the src/include/catalog/ directory of + the source tree. In particular, for each catalog there is a header file + named after the catalog (e.g., pg_class.h + for pg_class), which defines the set of columns + the catalog has, as well as some other basic properties such as its OID. + Other critical files defining the catalog structure + include indexing.h, which defines the indexes present + on all the system catalogs, and toasting.h, which + defines TOAST tables for catalogs that need one. - Related information can be found in the documentation for - initdb. + Many of the catalogs have initial data that must be loaded into them + during the bootstrap phase + of initdb, to bring the system up to a point + where it is capable of executing SQL commands. (For + example, pg_class.h must contain an entry for itself, + as well as one for each other system catalog and index.) This + initial data is kept in editable form in data files that are also stored + in the src/include/catalog/ directory. For example, + pg_proc.dat describes all the initial rows that must + be inserted into the pg_proc catalog. + + To create the catalog files and load this initial data into them, a + backend running in bootstrap mode reads a BKI + (Backend Interface) file containing commands and initial data. + The postgres.bki file used in this mode is prepared + from the aforementioned header and data files, while building + a PostgreSQL distribution, by a Perl script + named genbki.pl. + Although it's specific to a particular PostgreSQL + release, postgres.bki is platform-independent and is + installed in the share subdirectory of the + installation tree. + + + + genbki.pl also produces a derived header file for + each catalog, for example pg_class_d.h for + the pg_class catalog. This file contains + automatically-generated macro definitions, and may contain other macros, + enum declarations, and so on that can be useful for client C code that + reads a particular catalog. + + + + Most Postgres developers don't need to be directly concerned with + the BKI file, but almost any nontrivial feature + addition in the backend will require modifying the catalog header files + and/or initial data files. The rest of this chapter gives some + information about that, and for completeness describes + the BKI file format. + + + + System Catalog Declaration Rules + + + The key part of a catalog header file is a C structure definition + describing the layout of each row of the catalog. This begins with + a CATALOG macro, which so far as the C compiler is + concerned is just shorthand for typedef struct + FormData_catalogname. + Each field in the struct gives rise to a catalog column. + Fields can be annotated using the BKI property macros described + in genbki.h, for example to define a default value + for a field or mark it as nullable or not nullable. + The CATALOG line can also be annotated, with some + other BKI property macros described in genbki.h, to + define other properties of the catalog as a whole, such as whether + it has OIDs (by default, it does). + + + + The system catalog cache code (and most catalog-munging code in general) + assumes that the fixed-length portions of all system catalog tuples are + in fact present, because it maps this C struct declaration onto them. + Thus, all variable-length fields and nullable fields must be placed at + the end, and they cannot be accessed as struct fields. + For example, if you tried to + set pg_type.typrelid + to be NULL, it would fail when some piece of code tried to reference + typetup->typrelid (or worse, + typetup->typelem, because that follows + typrelid). This would result in + random errors or even segmentation violations. + + + + As a partial guard against this type of error, variable-length or + nullable fields should not be made directly visible to the C compiler. + This is accomplished by wrapping them in #ifdef + CATALOG_VARLEN ... #endif (where + CATALOG_VARLEN is a symbol that is never defined). + This prevents C code from carelessly trying to access fields that might + not be there or might be at some other offset. + As an independent guard against creating incorrect rows, we + require all columns that should be non-nullable to be marked so + in pg_attribute. The bootstrap code will + automatically mark catalog columns as NOT NULL + if they are fixed-width and are not preceded by any nullable column. + Where this rule is inadequate, you can force correct marking by using + BKI_FORCE_NOT_NULL + and BKI_FORCE_NULL annotations as needed. But note + that NOT NULL constraints are only enforced in the + executor, not against tuples that are generated by random C code, + so care is still needed when manually creating or updating catalog rows. + + + + Frontend code should not include any pg_xxx.h + catalog header file, as these files may contain C code that won't compile + outside the backend. (Typically, that happens because these files also + contain declarations for functions + in src/backend/catalog/ files.) + Instead, frontend code may include the corresponding + generated pg_xxx_d.h header, which will contain + OID #defines and any other data that might be of use + on the client side. If you want macros or other code in a catalog header + to be visible to frontend code, write #ifdef + EXPOSE_TO_CLIENT_CODE ... #endif around that + section to instruct genbki.pl to copy that section + to the pg_xxx_d.h header. + + + + A few of the catalogs are so fundamental that they can't even be created + by the BKI create command that's + used for most catalogs, because that command needs to write information + into these catalogs to describe the new catalog. These are + called bootstrap catalogs, and defining one takes + a lot of extra work: you have to manually prepare appropriate entries for + them in the pre-loaded contents of pg_class + and pg_type, and those entries will need to be + updated for subsequent changes to the catalog's structure. + (Bootstrap catalogs also need pre-loaded entries + in pg_attribute, but + fortunately genbki.pl handles that chore nowadays.) + Avoid making new catalogs be bootstrap catalogs if at all possible. + + + + + System Catalog Initial Data + + + Each catalog that has any manually-created initial data (some do not) + has a corresponding .dat file that contains its + initial data in an editable format. + + + + Data File Format + + + Each .dat file contains Perl data structure literals + that are simply eval'd to produce an in-memory data structure consisting + of an array of hash references, one per catalog row. + A slightly modified excerpt from pg_database.dat + will demonstrate the key features: + + + +[ + +# LC_COLLATE and LC_CTYPE will be replaced at initdb time with user choices +# that might contain non-word characters, so we must double-quote them. + +{ oid => '1', oid_symbol => 'TemplateDbOid', + descr => 'database\'s default template', + datname => 'template1', datdba => 'PGUID', encoding => 'ENCODING', + datcollate => '"LC_COLLATE"', datctype => '"LC_CTYPE"', datistemplate => 't', + datallowconn => 't', datconnlimit => '-1', datlastsysoid => '0', + datfrozenxid => '0', datminmxid => '1', dattablespace => '1663', + datacl => '_null_' }, + +] + + + + Points to note: + + + + + + + The overall file layout is: open square bracket, one or more sets of + curly braces each of which represents a catalog row, close square + bracket. Write a comma after each closing curly brace. + + + + + + Within each catalog row, write comma-separated + key => + value pairs. The + allowed keys are the names of the catalog's + columns, plus the metadata keys oid, + oid_symbol, and descr. + (The use of oid and oid_symbol + is described in + below. descr supplies a description string for + the object, which will be inserted + into pg_description + or pg_shdescription as appropriate.) + While the metadata keys are optional, the catalog's defined columns + must all be provided, except when the catalog's .h + file specifies a default value for the column. + + + + + + All values must be single-quoted. Escape single quotes used within + a value with a backslash. (Backslashes meant as data need not be + doubled, however; this follows Perl's rules for simple quoted + literals.) + + + + + + Null values are represented by _null_. + + + + + + If a value is a macro to be expanded + by initdb, it should also contain double + quotes as shown above, unless we know that no special characters can + appear within the string that will be substituted. + + + + + + Comments are preceded by #, and must be on their + own lines. + + + + + + To aid readability, field values that are OIDs of other catalog + entries can be represented by names rather than numeric OIDs. + This is described in + below. + + + + + + Since hashes are unordered data structures, field order and line + layout aren't semantically significant. However, to maintain a + consistent appearance, we set a few rules that are applied by the + formatting script reformat_dat_file.pl: + + + + + + Within each pair of curly braces, the metadata + fields oid, oid_symbol, + and descr (if present) come first, in that + order, then the catalog's own fields appear in their defined order. + + + + + + Newlines are inserted between fields as needed to limit line length + to 80 characters, if possible. A newline is also inserted between + the metadata fields and the regular fields. + + + + + + If the catalog's .h file specifies a default + value for a column, and a data entry has that same + value, reformat_dat_file.pl will omit it from + the data file. This keeps the data representation compact. + + + + + + reformat_dat_file.pl preserves blank lines + and comment lines as-is. + + + + + + It's recommended to run reformat_dat_file.pl + before submitting catalog data patches. For convenience, you can + simply change to src/include/catalog/ and + run make reformat-dat-files. + + + + + + If you want to add a new method of making the data representation + smaller, you must implement it + in reformat_dat_file.pl and also + teach Catalog::ParseData() how to expand the + data back into the full representation. + + + + + + + + OID Assignment + + + A catalog row appearing in the initial data can be given a + manually-assigned OID by writing an oid + => nnnn metadata field. + Furthermore, if an OID is assigned, a C macro for that OID can be + created by writing an oid_symbol + => name metadata field. + + + + Pre-loaded catalog rows must have preassigned OIDs if there are OID + references to them in other pre-loaded rows. A preassigned OID is + also needed if the row's OID must be referenced from C code. + If neither case applies, the oid metadata field can + be omitted, in which case the bootstrap code assigns an OID + automatically, or leaves it zero in a catalog that has no OIDs. + In practice we usually preassign OIDs for all or none of the pre-loaded + rows in a given catalog, even if only some of them are actually + cross-referenced. + + + + Writing the actual numeric value of any OID in C code is considered + very bad form; always use a macro, instead. Direct references + to pg_proc OIDs are common enough that there's + a special mechanism to create the necessary macros automatically; + see src/backend/utils/Gen_fmgrtab.pl. Similarly + — but, for historical reasons, not done the same way — + there's an automatic method for creating macros + for pg_type + OIDs. oid_symbol entries are therefore not + necessary in those two catalogs. Likewise, macros for + the pg_class OIDs of system catalogs and + indexes are set up automatically. For all other system catalogs, you + have to manually specify any macros you need + via oid_symbol entries. + + + + To find an available OID for a new pre-loaded row, run the + script src/include/catalog/unused_oids. + It prints inclusive ranges of unused OIDs (e.g., the output + line 45-900 means OIDs 45 through 900 have not been + allocated yet). Currently, OIDs 1-9999 are reserved for manual + assignment; the unused_oids script simply looks + through the catalog headers and .dat files + to see which ones do not appear. You can also use + the duplicate_oids script to check for mistakes. + (That script is run automatically at compile time, and will stop the + build if a duplicate is found.) + + + + The OID counter starts at 10000 at the beginning of a bootstrap run. + If a catalog row is in a table that requires OIDs, but no OID was + preassigned by an oid field, then it will + receive an OID of 10000 or above. + + + + + OID Reference Lookup + + + Cross-references from one initial catalog row to another can be written + by just writing the preassigned OID of the referenced row. But + that's error-prone and hard to understand, so for frequently-referenced + catalogs, genbki.pl provides mechanisms to write + symbolic references instead. Currently this is possible for references + to access methods, functions, operators, opclasses, opfamilies, and + types. The rules are as follows: + + + + + + + Use of symbolic references is enabled in a particular catalog column + by attaching BKI_LOOKUP(lookuprule) + to the column's definition, where lookuprule + is pg_am, pg_proc, + pg_operator, + pg_opclass, + pg_opfamily, + or pg_type. + BKI_LOOKUP can be attached to columns of + type Oid, regproc, oidvector, + or Oid[]; in the latter two cases it implies performing a + lookup on each element of the array. + + + + + + In such a column, all entries must use the symbolic format except + when writing 0 for InvalidOid. (If the column is + declared regproc, you can optionally + write - instead of 0.) + genbki.pl will warn about unrecognized names. + + + + + + Access methods are just represented by their names, as are types. + Type names must match the referenced pg_type + entry's typname; you do not get to use any + aliases such as integer + for int4. + + + + + + A function can be represented by + its proname, if that is unique among + the pg_proc.dat entries (this works like regproc + input). Otherwise, write it + as proname(argtypename,argtypename,...), + like regprocedure. The argument type names must be spelled exactly as + they are in the pg_proc.dat entry's + proargtypes field. Do not insert any + spaces. + + + + + + Operators are represented + by oprname(lefttype,righttype), + writing the type names exactly as they appear in + the pg_operator.dat + entry's oprleft + and oprright fields. + (Write 0 for the omitted operand of a unary + operator.) + + + + + + The names of opclasses and opfamilies are only unique within an + access method, so they are represented + by access_method_name/object_name. + + + + + + In none of these cases is there any provision for + schema-qualification; all objects created during bootstrap are + expected to be in the pg_catalog schema. + + + + + + genbki.pl resolves all symbolic references while it + runs, and puts simple numeric OIDs into the emitted BKI file. There is + therefore no need for the bootstrap backend to deal with symbolic + references. + + + + + Recipes for Editing Data Files + + + Here are some suggestions about the easiest ways to perform common tasks + when updating catalog data files. + + + + Add a new column with a default to a catalog: + + Add the column to the header file with + a BKI_DEFAULT(value) + annotation. The data file need only be adjusted by adding the field + in existing rows where a non-default value is needed. + + + + + Add a default value to an existing column that doesn't have + one: + + Add a BKI_DEFAULT annotation to the header file, + then run make reformat-dat-files to remove + now-redundant field entries. + + + + + Remove a column, whether it has a default or not: + + Remove the column from the header, then run make + reformat-dat-files to remove now-useless field entries. + + + + + Change or remove an existing default value: + + You cannot simply change the header file, since that will cause the + current data to be interpreted incorrectly. First run make + expand-dat-files to rewrite the data files with all + default values inserted explicitly, then change or remove + the BKI_DEFAULT annotation, then run make + reformat-dat-files to remove superfluous fields again. + + + + + Ad-hoc bulk editing: + + reformat_dat_file.pl can be adapted to perform + many kinds of bulk changes. Look for its block comments showing where + one-off code can be inserted. In the following example, we are going + to consolidate two boolean fields in pg_proc + into a char field: + + + + + Add the new column, with a default, + to pg_proc.h: + ++ /* see PROKIND_ categories below */ ++ char prokind BKI_DEFAULT(f); + + + + + + + Create a new script based on reformat_dat_file.pl + to insert appropriate values on-the-fly: + +- # At this point we have the full row in memory as a hash +- # and can do any operations we want. As written, it only +- # removes default values, but this script can be adapted to +- # do one-off bulk-editing. ++ # One-off change to migrate to prokind ++ # Default has already been filled in by now, so change to other ++ # values as appropriate ++ if ($values{proisagg} eq 't') ++ { ++ $values{prokind} = 'a'; ++ } ++ elsif ($values{proiswindow} eq 't') ++ { ++ $values{prokind} = 'w'; ++ } + + + + + + + Run the new script: + +$ cd src/include/catalog +$ perl -I ../../backend/catalog rewrite_dat_with_prokind.pl pg_proc.dat + + At this point pg_proc.dat has all three + columns, prokind, + proisagg, + and proiswindow, though they will appear + only in rows where they have non-default values. + + + + + + Remove the old columns from pg_proc.h: + +- /* is it an aggregate? */ +- bool proisagg BKI_DEFAULT(f); +- +- /* is it a window function? */ +- bool proiswindow BKI_DEFAULT(f); + + + + + + + Finally, run make reformat-dat-files to remove + the useless old entries from pg_proc.dat. + + + + + For further examples of scripts used for bulk editing, see + convert_oid2name.pl + and remove_pg_type_oid_symbols.pl attached to this + message: + + + + + + <acronym>BKI</acronym> File Format @@ -76,10 +685,10 @@ rowtype_oid oid (name1 = type1 - FORCE NOT NULL | FORCE NULL , + FORCE NOT NULL | FORCE NULL , name2 = type2 - FORCE NOT NULL | FORCE NULL , + FORCE NOT NULL | FORCE NULL , ...) @@ -106,7 +715,7 @@ tables containing columns of other types, this cannot be done until after pg_type has been created and filled with appropriate entries. (That effectively means that only these - column types can be used in bootstrapped tables, but non-bootstrap + column types can be used in bootstrap catalogs, but non-bootstrap catalogs can contain any built-in type.) @@ -340,7 +949,7 @@ - Example + BKI Example The following sequence of commands will create the diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 8729ccd5c5a..1626999a701 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3566,7 +3566,7 @@ Oid PQftype(const PGresult *res, You can query the system table pg_type to obtain the names and properties of the various data types. The OIDs of the built-in data types are defined - in the file src/include/catalog/pg_type.h + in the file src/include/catalog/pg_type_d.h in the source tree. -- cgit v1.2.3