author: Tom Lane <tgl@sss.pgh.pa.us>, 2023-01-19 16:21:44 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us>, 2023-01-19 16:21:44 -0500
commit: 5a617d75d3b31414f378dd764a11db1a08fa79bb
tree: 0c366969873581f99ca2c99b50c6d21eab11c6bd (/src/test)
parent: 44e9e34266efd42901bf7b12552f2033972d70b7
Fix ts_headline() to handle ORs and phrase queries more honestly.

This patch largely reverts what I did in commits c9b0c678d and 78e73e875. The maximum cover length limit that I added in 78e73e875 (to band-aid over c9b0c678d's performance issues) creates too many user-visible behavior discrepancies, as complained of for example in bug #17691. The real problem with hlCover() is not what I thought at the time, but more that it seems to have been designed with only AND tsquery semantics in mind. It doesn't work quite right for OR, and even less so for NOT or phrase queries. However, we can improve that situation by building a variant of TS_execute() that returns a list of match locations. We already get an ExecPhraseData struct representing match locations for the primitive case of a simple match, as well as one for a phrase match; we just need to add some logic to combine these for AND and OR operators. The result is a list of ExecPhraseDatas, which hlCover can regard as having simple AND semantics, so that its old algorithm works correctly.

There's still a lot not to like about ts_headline's behavior, but I think the remaining issues have to do with the heuristics used in mark_hl_words and mark_hl_fragments (which, likewise, were not revisited when phrase search was added). Improving those is a task for another day.

Patch by me; thanks to Alvaro Herrera for review.

Discussion: https://postgr.es/m/840.1669405935@sss.pgh.pa.us
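
The combining step described above can be illustrated with a small Python sketch. This is not the PostgreSQL C code: the names match_locations and shortest_cover, the tuple encoding of queries, and the lexeme-to-positions dict are all hypothetical, and phrase and NOT operators are left out. It assumes one plausible way to keep AND semantics across an OR, namely the distributive law (A & B) | C = (A | C) & (B | C), so that the cover search only ever sees a list of position sets that must all be hit.

```python
def match_locations(query, positions):
    """Return a list of position sets with AND semantics, or None if no match.

    query is a lexeme string or a tuple (op, left, right), op in {"and", "or"}.
    positions maps each lexeme to the word positions where it occurs.
    """
    if isinstance(query, str):
        hits = positions.get(query)
        return [frozenset(hits)] if hits else None
    op, left, right = query
    lhs = match_locations(left, positions)
    rhs = match_locations(right, positions)
    if op == "and":
        # Both sides must match; every set from both sides must be covered.
        if lhs is None or rhs is None:
            return None
        return lhs + rhs
    if op == "or":
        # If only one side matched, its requirements stand alone.
        if lhs is None:
            return rhs
        if rhs is None:
            return lhs
        # Distribute OR over AND: pair every left set with every right set.
        return [a | b for a in lhs for b in rhs]
    raise ValueError(f"unsupported operator: {op!r}")


def shortest_cover(groups):
    """Smallest (start, end) span holding at least one position from each set."""
    best = None
    for start in sorted(set().union(*groups)):
        end = start
        for group in groups:
            later = [p for p in group if p >= start]
            if not later:
                break  # this set cannot be satisfied from here on
            end = max(end, min(later))
        else:
            if best is None or end - start < best[1] - best[0]:
                best = (start, end)
    return best


# Word positions for the text '1 2 3 1 3' used in the regression tests.
positions = {"1": [1, 4], "2": [2], "3": [3, 5]}
and_groups = match_locations(("and", "1", "3"), positions)
or_groups = match_locations(("or", "1", "2"), positions)
print(shortest_cover(and_groups))  # (3, 4)
print(shortest_cover(or_groups))   # (1, 1)
```

With this shape, an AND query needs a span touching both lexemes, while an OR query is satisfied by a single occurrence of either one, which is the distinction the commit message says the old cover search did not draw.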
Diffstat (limited to 'src/test')
-rw-r--r-- src/test/regress/expected/tsearch.out | 145
-rw-r--r-- src/test/regress/sql/tsearch.sql | 77
2 files changed, 213 insertions, 9 deletions
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index dc03f154996..0e682457430 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1814,12 +1814,111 @@ Water, water, every where
Water, water, every where,
Nor any drop to drink.
S. T. Coleridge (1772-1834)
-', phraseto_tsquery('english', 'painted Ocean'));
- ts_headline
----------------------------------------
- <b>painted</b> Ship +
- Upon a <b>painted</b> <b>Ocean</b>.+
- Water, water, every where +
+', to_tsquery('english', 'day & drink'));
+ ts_headline
+------------------------------------
+ <b>day</b>, +
+ We stuck, nor breath nor motion,+
+ As idle as a painted Ship +
+ Upon a painted Ocean. +
+ Water, water, every where +
+ And all the boards did shrink; +
+ Water, water, every where, +
+ Nor any drop
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'day | drink'));
+ ts_headline
+-----------------------------------------------------------
+ <b>Day</b> after <b>day</b>, <b>day</b> after <b>day</b>,+
+ We stuck, nor breath nor motion, +
+ As idle as a painted
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'day | !drink'));
+ ts_headline
+-----------------------------------------------------------
+ <b>Day</b> after <b>day</b>, <b>day</b> after <b>day</b>,+
+ We stuck, nor breath nor motion, +
+ As idle as a painted
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship & drink'));
+ ts_headline
+----------------------------------
+ <b>painted</b> <b>Ship</b> +
+ Upon a <b>painted</b> Ocean. +
+ Water, water, every where +
+ And all the boards did shrink;+
+ Water, water, every where, +
+ Nor any drop to <b>drink</b>
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship | drink'));
+ ts_headline
+---------------------------------
+ <b>painted</b> <b>Ship</b> +
+ Upon a <b>painted</b> Ocean. +
+ Water, water, every where +
+ And all the boards did shrink
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship | !drink'));
+ ts_headline
+---------------------------------
+ <b>painted</b> <b>Ship</b> +
+ Upon a <b>painted</b> Ocean. +
+ Water, water, every where +
And all the boards did shrink
(1 row)
@@ -1833,6 +1932,25 @@ Water, water, every where
Water, water, every where,
Nor any drop to drink.
S. T. Coleridge (1772-1834)
+', phraseto_tsquery('english', 'painted Ocean'));
+ ts_headline
+----------------------------------
+ <b>painted</b> <b>Ocean</b>. +
+ Water, water, every where +
+ And all the boards did shrink;+
+ Water, water, every
+(1 row)
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
', phraseto_tsquery('english', 'idle as a painted Ship'));
ts_headline
---------------------------------------------
@@ -1851,6 +1969,15 @@ to_tsquery('english','Lorem') && phraseto_tsquery('english','ullamcorper urna'),
<b>Lorem</b> ipsum <b>urna</b>. Nullam nullam <b>ullamcorper</b> <b>urna</b>
(1 row)
+SELECT ts_headline('english',
+'Lorem ipsum urna. Nullam nullam ullamcorper urna.',
+phraseto_tsquery('english','ullamcorper urna'),
+'MaxWords=100, MinWords=5');
+ ts_headline
+-------------------------------------------------------------
+ <b>urna</b>. Nullam nullam <b>ullamcorper</b> <b>urna</b>.
+(1 row)
+
SELECT ts_headline('english', '
<html>
<!-- some comment -->
@@ -1893,9 +2020,9 @@ SELECT ts_headline('simple', '1 2 3 1 3'::text, '1 & 3', 'MaxWords=4, MinWords=1
(1 row)
SELECT ts_headline('simple', '1 2 3 1 3'::text, '1 <-> 3', 'MaxWords=4, MinWords=1');
- ts_headline
-----------------------------
- <b>3</b> <b>1</b> <b>3</b>
+ ts_headline
+-------------------
+ <b>1</b> <b>3</b>
(1 row)
--Check if headline fragments work
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index 0fa8ac46821..b56477a8139 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -468,6 +468,78 @@ Water, water, every where
Water, water, every where,
Nor any drop to drink.
S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'day & drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'day | drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'day | !drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship & drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship | drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
+', to_tsquery('english', 'painted <-> Ship | !drink'));
+
+SELECT ts_headline('english', '
+Day after day, day after day,
+ We stuck, nor breath nor motion,
+As idle as a painted Ship
+ Upon a painted Ocean.
+Water, water, every where
+ And all the boards did shrink;
+Water, water, every where,
+ Nor any drop to drink.
+S. T. Coleridge (1772-1834)
', phraseto_tsquery('english', 'painted Ocean'));
SELECT ts_headline('english', '
@@ -487,6 +559,11 @@ SELECT ts_headline('english',
to_tsquery('english','Lorem') && phraseto_tsquery('english','ullamcorper urna'),
'MaxWords=100, MinWords=1');
+SELECT ts_headline('english',
+'Lorem ipsum urna. Nullam nullam ullamcorper urna.',
+phraseto_tsquery('english','ullamcorper urna'),
+'MaxWords=100, MinWords=5');
+
SELECT ts_headline('english', '
<html>
<!-- some comment -->