From 5e09280057a4c3f5db297348ea3e044c9c5f4ef8 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Fri, 14 Dec 2018 12:52:49 -0500
Subject: Make pg_statistic and related code account more honestly for collations.

When we first put in collations support, we basically punted on
teaching pg_statistic, ANALYZE, and the planner selectivity functions
about that. They've just used DEFAULT_COLLATION_OID independently of
the actual collation of the data. It's time to improve that, so:

* Add columns to pg_statistic that record the specific collation
associated with each statistics slot.

* Teach ANALYZE to use the column's actual collation when comparing
values for statistical purposes, and record this in the appropriate
slot. (Note that type-specific typanalyze functions are now expected
to fill stats->stacoll with the appropriate collation, too.)

* Teach assorted selectivity functions to use the actual collation of
the stats they are looking at, instead of just assuming it's
DEFAULT_COLLATION_OID.

This should give noticeably better results in selectivity estimates
for columns with nondefault collations, at least for query clauses
that use that same collation (which would be the default behavior in
most cases). It's still true that comparisons with explicit COLLATE
clauses different from the stored data's collation won't be
well-estimated, but that's no worse than before. Also, this patch does
take the first step towards doing better with that, which is that it's
now theoretically possible to collect stats for a collation other than
the column's own collation.

Patch by me; thanks to Peter Eisentraut for review.

Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
---
 doc/src/sgml/catalogs.sgml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'doc/src')

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 18c38e42de6..8d0cab5da69 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6394,6 +6394,18 @@ SCRAM-SHA-256$<iteration count>:&l
+
+ stacollN
+ oid
+ pg_collation.oid
+
+ The collation used to derive the statistics stored in the
+ Nth slot. For example, a
+ histogram slot for a collatable column would show the collation that
+ defines the sort order of the data. Zero for noncollatable data.
+
+
+
 stanumbersN
 float4[]
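The commit's central idea is that each statistics slot now remembers the collation that ordered its values (stacollN), so selectivity code should compare a query constant against those values using that same collation rather than assuming DEFAULT_COLLATION_OID. The C sketch below illustrates why, under loudly stated assumptions: HistSlot, nocase_cmp, coll_cmp, hist_fraction_below, estimate_lt_sel and the SKETCH_* constants are all hypothetical names invented here; Oid and InvalidOid only mirror PostgreSQL's names; the two "collations" are toy comparators; and the estimator simply counts histogram bounds, which is not the planner's actual histogram algorithm.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PostgreSQL's Oid; stacollN is zero for noncollatable data. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical collation OIDs, invented for this sketch only. */
#define SKETCH_BYTE_COLL   ((Oid) 1001)   /* byte-wise order, like strcmp() */
#define SKETCH_NOCASE_COLL ((Oid) 1002)   /* ignores ASCII letter case */

/* Hypothetical fallback selectivity for "col < const" when stats don't apply. */
#define SKETCH_DEFAULT_LT_SEL (1.0 / 3.0)

/* Hypothetical, much simplified histogram slot; coll plays the role of stacollN. */
typedef struct HistSlot
{
    Oid          coll;      /* collation the bounds were sorted with */
    const char **bounds;    /* histogram bounds, sorted under coll */
    size_t       nbounds;
} HistSlot;

/* Case-insensitive comparison for ASCII letters. */
static int
nocase_cmp(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Compare two strings under one of the sketch's toy collations. */
static int
coll_cmp(Oid coll, const char *a, const char *b)
{
    if (coll == SKETCH_NOCASE_COLL)
        return nocase_cmp(a, b);
    return strcmp(a, b);        /* byte order for everything else */
}

/*
 * Crude fraction of histogram bounds that sort below 'constant'.  The
 * comparison uses the collation recorded in the slot, because that is
 * the order in which the bounds were sorted when the slot was built.
 */
static double
hist_fraction_below(const HistSlot *slot, const char *constant)
{
    size_t below = 0;

    assert(slot != NULL && slot->nbounds > 0);
    for (size_t i = 0; i < slot->nbounds; i++)
        if (coll_cmp(slot->coll, slot->bounds[i], constant) < 0)
            below++;
    return (double) below / (double) slot->nbounds;
}

/*
 * Estimate "col < constant" for a clause using query_coll.  If the slot was
 * built under a different collation, its ordering says little about this
 * clause, so this sketch falls back to a fixed guess (one possible policy).
 */
static double
estimate_lt_sel(const HistSlot *slot, Oid query_coll, const char *constant)
{
    if (slot->coll != InvalidOid && slot->coll != query_coll)
        return SKETCH_DEFAULT_LT_SEL;
    return hist_fraction_below(slot, constant);
}
```

For example, with the bounds {"apple", "Banana", "cherry", "Date", "elder"} recorded under the case-insensitive collation, hist_fraction_below with the constant "c" counts 2 of 5 bounds (0.4). Comparing the same bounds in byte order instead, as if the recorded collation were ignored, counts 3 of 5 (0.6), because uppercase "Banana" and "Date" sort before "c". The fixed fallback for a mismatched query collation is only this sketch's choice; the commit itself says only that such COLLATE mismatches still won't be well-estimated.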