| author | Tom Lane <tgl@sss.pgh.pa.us> | 2011-02-17 19:00:54 -0500 |
|---|---|---|
| committer | Tom Lane <tgl@sss.pgh.pa.us> | 2011-02-17 19:00:54 -0500 |
| commit | 848cd3289e4d08f9a3c78f654ceb6e3f754e1dd3 (patch) | |
| tree | f2d2a2420fde8b6d8a4f8259b0fc7f7df749a680 /src/backend | |
| parent | 7422e0081d04ee4373a822392c729eb892a9d25e (diff) | |
Fix tsmatchsel() to account properly for null rows.

ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea. But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents. Back-patch to 8.4 where this stuff was
introduced.

Jesper Krogh

Diffstat (limited to 'src/backend')
| -rw-r--r-- | src/backend/tsearch/ts_selfuncs.c | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
index e7194ce66e2..b679b7544a3 100644
--- a/src/backend/tsearch/ts_selfuncs.c
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -189,11 +189,17 @@ tsquerysel(VariableStatData *vardata, Datum constval)
 			/* No most-common-elements info, so do without */
 			selec = tsquery_opr_selec_no_stats(query);
 		}
+
+		/*
+		 * MCE stats count only non-null rows, so adjust for null rows.
+		 */
+		selec *= (1.0 - stats->stanullfrac);
 	}
 	else
 	{
 		/* No stats at all, so do without */
 		selec = tsquery_opr_selec_no_stats(query);
+		/* we assume no nulls here, so no stanullfrac correction */
 	}
 
 	return selec;
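
The arithmetic behind the added `selec *= (1.0 - stats->stanullfrac);` line can be sketched in isolation. This is only an illustration, not code from the commit: the helper name `scale_by_nonnull_fraction` and its parameter names are hypothetical, and it assumes the incoming selectivity is expressed as a fraction of non-null rows, as the commit message says the MCE statistics are.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of PostgreSQL): convert a selectivity
 * measured over non-null rows into a selectivity over all rows.
 *
 * selec_nonnull: fraction of non-null rows expected to match
 * stanullfrac:   fraction of all rows that are null, in [0, 1]
 */
static double
scale_by_nonnull_fraction(double selec_nonnull, double stanullfrac)
{
	/* A null fraction outside [0, 1] would indicate bad input. */
	assert(stanullfrac >= 0.0 && stanullfrac <= 1.0);

	/* Null rows cannot match, so only the non-null share contributes. */
	return selec_nonnull * (1.0 - stanullfrac);
}
```

For example, if a query term matches half of the non-null documents and 40% of the rows are null, the corrected estimate is 0.5 * (1.0 - 0.4) = 0.3 of all rows, where the uncorrected estimate would have been 0.5.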
