author | Tom Lane <tgl@sss.pgh.pa.us> | 2015-09-16 14:50:12 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2015-09-16 14:50:45 -0400 |
commit | 11103c6d95e43bf720b2293cb4c4b2f6efc4947a (patch) | |
tree | 80a1c72fcb040cbba3ef458e0a88c01a0fd8a085 | |
parent | 49232d4191149fd2955e8739a457d70228526dba (diff) |
Fix documentation of regular expression character-entry escapes.
The docs claimed that \uhhhh would be interpreted as a Unicode value
regardless of the database encoding, but it's never been implemented
that way: \uhhhh and \xhhhh actually mean exactly the same thing, namely
the character that pg_mb2wchar translates to 0xhhhh. Moreover we were
falsely dismissive of the usefulness of Unicode code points above FFFF.
Fix that.
It's been like this for ages, so back-patch to all supported branches.
-rw-r--r-- | doc/src/sgml/func.sgml | 21 |
1 files changed, 17 insertions, 4 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4e0715aab13..1a60adfce84 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4422,7 +4422,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <entry> <literal>\e</> </entry>
 <entry> the character whose collating-sequence name
 is <literal>ESC</>,
- or failing that, the character with octal value 033 </entry>
+ or failing that, the character with octal value <literal>033</> </entry>
 </row>
 
 <row>
@@ -4448,15 +4448,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 <row>
 <entry> <literal>\u</><replaceable>wxyz</> </entry>
 <entry> (where <replaceable>wxyz</> is exactly four hexadecimal digits)
- the UTF16 (Unicode, 16-bit) character <literal>U+</><replaceable>wxyz</>
- in the local byte ordering </entry>
+ the character whose hexadecimal value is
+ <literal>0x</><replaceable>wxyz</>
+ </entry>
 </row>
 
 <row>
 <entry> <literal>\U</><replaceable>stuvwxyz</> </entry>
 <entry> (where <replaceable>stuvwxyz</> is exactly
 eight hexadecimal digits)
- reserved for a hypothetical Unicode extension to 32 bits
+ the character whose hexadecimal value is
+ <literal>0x</><replaceable>stuvwxyz</>
 </entry>
 </row>
 
@@ -4506,6 +4508,17 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
 </para>
 
 <para>
+ Numeric character-entry escapes specifying values outside the ASCII range
+ (0-127) have meanings dependent on the database encoding. When the
+ encoding is UTF-8, escape values are equivalent to Unicode code points,
+ for example <literal>\u1234</> means the character <literal>U+1234</>.
+ For other multibyte encodings, character-entry escapes usually just
+ specify the concatenation of the byte values for the character. If the
+ escape value does not correspond to any legal character in the database
+ encoding, no error will be raised, but it will never match any data.
+ </para>
+
+ <para>
 The character-entry escapes are always taken as ordinary characters.
 For example, <literal>\135</> is <literal>]</> in ASCII, but
 <literal>\135</> does not terminate a bracket expression.
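
The rule stated in the new paragraph (Unicode code point under UTF-8, concatenated byte values under other multibyte encodings) can be illustrated with a short Python sketch. This only models the documented behavior; it does not call PostgreSQL or `pg_mb2wchar`. The helper names `escape_value` and `regex_escape` are hypothetical, the encoding names are Python codec names rather than PostgreSQL encoding names, and since the docs say "usually", the byte-concatenation branch should be read as an approximation.

```python
import codecs


def escape_value(ch: str, db_encoding: str) -> int:
    """Approximate the numeric value a character-entry escape must carry.

    Assumption: follows the rule in the revised docs, not the server code.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    # Normalize spellings such as "UTF8", "utf_8", "utf-8" to one codec name.
    if codecs.lookup(db_encoding).name == "utf-8":
        # UTF-8 database: the escape value is the Unicode code point.
        return ord(ch)
    # Other encodings: concatenate the character's encoded bytes.
    return int.from_bytes(ch.encode(db_encoding), "big")


def regex_escape(value: int) -> str:
    """Format a value as \\u (four hex digits) or \\U (eight hex digits)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value <= 0xFFFF:
        return "\\u%04X" % value
    return "\\U%08X" % value


if __name__ == "__main__":
    hiragana_a = "\u3042"
    print(regex_escape(escape_value(hiragana_a, "utf-8")))   # \u3042
    print(regex_escape(escape_value(hiragana_a, "euc_jp")))  # \uA4A2
```

Under this model the same character needs a different escape in a UTF-8 database than in an EUC_JP one, which is the encoding dependence the commit documents. An escape value copied from a Unicode chart into a non-UTF-8 database may therefore silently match nothing.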