author Tom Lane <tgl@sss.pgh.pa.us> 2022-08-02 10:29:35 -0400
committer Tom Lane <tgl@sss.pgh.pa.us> 2022-08-02 10:29:35 -0400
commit ec62ce55a813db5c925d89a53b5b22baa509abb6
tree 382d4b8dd8c1e20245ba0210b803a5a5e99b4ba1 /doc/src
parent 1349d2790bf48a4de072931c722f39337e72055e
Change type "char"'s I/O format for non-ASCII characters.
Previously, a byte with the high bit set was just transmitted as-is by charin() and charout(). This is problematic if the database encoding is multibyte, because the result of charout() won't be validly encoded, which breaks various stuff that expects all text strings to be validly encoded. We've previously decided to enforce encoding validity rather than try to individually harden each place that might have a problem with such strings, so it's time to do something about "char".

To fix, represent high-bit-set characters as \ooo (backslash and three octal digits), following the ancient "escape" format for bytea. charin() will continue to accept the old way as well, though that is only reachable in single-byte encodings.

Add some test cases just so there is coverage for this code. We'll otherwise leave this question undocumented as it was before, because we don't really want to encourage end-user use of "char".

For the moment, back-patch into v15 so that this change appears in 15beta3. If there's not great pushback we should consider absorbing this change into the older branches.

Discussion: https://postgr.es/m/2318797.1638558730@sss.pgh.pa.us
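The \ooo output rule and the lenient input rule described above can be sketched in standalone C. This is not the actual charin()/charout() source (which goes through PostgreSQL's function-manager interface); the helper names char_encode() and char_decode() are hypothetical. Accepting only 0-3 as the first octal digit, so that the decoded value always fits in one byte, is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: format one "char" byte for output.
 * Bytes with the high bit set become backslash + three octal digits;
 * all other bytes are emitted as-is. buf must hold at least 5 bytes. */
static void char_encode(unsigned char c, char buf[5])
{
    if (c & 0x80)
    {
        buf[0] = '\\';
        buf[1] = (char) ('0' + (c >> 6));         /* top 2 bits */
        buf[2] = (char) ('0' + ((c >> 3) & 07));  /* middle 3 bits */
        buf[3] = (char) ('0' + (c & 07));         /* low 3 bits */
        buf[4] = '\0';
    }
    else
    {
        buf[0] = (char) c;  /* a 0x00 byte yields the empty string */
        buf[1] = '\0';
    }
}

static int is_octal(char ch)
{
    return ch >= '0' && ch <= '7';
}

/* Hypothetical helper: parse input text back into one byte.
 * Accepts the four-character \ooo form; any other string is taken
 * byte-for-byte (the pre-change behavior), so "" yields 0. */
static unsigned char char_decode(const char *s)
{
    assert(s != NULL);
    if (strlen(s) == 4 && s[0] == '\\' &&
        s[1] >= '0' && s[1] <= '3' &&  /* assumption: keep value <= \377 */
        is_octal(s[2]) && is_octal(s[3]))
        return (unsigned char) (((s[1] - '0') << 6) |
                                ((s[2] - '0') << 3) |
                                (s[3] - '0'));
    return (unsigned char) s[0];
}
```

For example, char_encode(0xE9, buf) leaves the four characters \351 in buf, and char_decode() on that string returns 0xE9 again, while an ASCII byte such as 'a' passes through unchanged in both directions. Because the output for a high-bit byte is pure ASCII, it is validly encoded in any server encoding, which is the point of the change.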
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/datatype.sgml 10
1 files changed, 6 insertions, 4 deletions
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 8e30b82273c..4cc9e592708 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1338,9 +1338,10 @@ SELECT b, char_length(b) FROM test2;
<para>
There are two other fixed-length character types in
<productname>PostgreSQL</productname>, shown in <xref
- linkend="datatype-character-special-table"/>. The <type>name</type>
- type exists <emphasis>only</emphasis> for the storage of identifiers
- in the internal system catalogs and is not intended for use by the general user. Its
+ linkend="datatype-character-special-table"/>.
+ These are not intended for general-purpose use, only for use
+ in the internal system catalogs.
+ The <type>name</type> type is used to store identifiers. Its
length is currently defined as 64 bytes (63 usable characters plus
terminator) but should be referenced using the constant
<symbol>NAMEDATALEN</symbol> in <literal>C</literal> source code.
@@ -1348,7 +1349,8 @@ SELECT b, char_length(b) FROM test2;
is therefore adjustable for special uses); the default maximum
length might change in a future release. The type <type>"char"</type>
(note the quotes) is different from <type>char(1)</type> in that it
- only uses one byte of storage. It is internally used in the system
+ only uses one byte of storage, and therefore can store only a single
+ ASCII character. It is used in the system
catalogs as a simplistic enumeration type.
</para>