From 2b3a0630b54ff9970a7cd2c78a686015f9a53c0c Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Thu, 17 Feb 2011 19:01:01 -0500
Subject: Fix tsmatchsel() to account properly for null rows.

ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea. But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents.

Back-patch to 8.4 where this stuff was introduced.

Jesper Krogh
---
 src/include/catalog/pg_statistic.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'src/include')

diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 0e831ef2982..edccf254b1b 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -244,6 +244,8 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  * type with identifiable elements (for instance, tsvector). staop contains
  * the equality operator appropriate to the element type. stavalues contains
  * the most common element values, and stanumbers their frequencies. Unlike
+ * MCV slots, frequencies are measured as the fraction of non-null rows the
+ * element value appears in, not the frequency of all rows. Also unlike
  * MCV slots, the values are sorted into order (to support binary search
  * for a particular value). Since this puts the minimum and maximum
  * frequencies at unpredictable spots in stanumbers, there are two extra
--
cgit v1.2.3
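
The adjustment the commit message asks of ts_selfuncs.c can be sketched as
follows. This is a minimal illustration, not the code from the commit: the
helper name mce_freq_to_all_rows and its parameters are hypothetical, and it
assumes the caller already has the column's null fraction (stanullfrac in
pg_statistic) at hand.

```c
#include <assert.h>

/*
 * Hypothetical helper, not PostgreSQL source.
 *
 * mce_freq is a most-common-element frequency as stored in stanumbers,
 * i.e. a fraction of the non-null rows.  null_frac is the fraction of
 * rows in which the column is null.  The result is the fraction of all
 * rows that contain the element.
 */
double
mce_freq_to_all_rows(double mce_freq, double null_frac)
{
    /* Both inputs are fractions, so they must lie in [0, 1]. */
    assert(mce_freq >= 0.0 && mce_freq <= 1.0);
    assert(null_frac >= 0.0 && null_frac <= 1.0);

    /* Null rows can never match, so scale by the non-null fraction. */
    return mce_freq * (1.0 - null_frac);
}
```

For instance, an element frequency of 0.5 in a column where half the rows are
null corresponds to 0.25 of all rows; using 0.5 unscaled would double the
estimate, which is the kind of overestimate the commit message describes.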