| field | value | date |
|---|---|---|
| author | Tom Lane <tgl@sss.pgh.pa.us> | 2007-11-14 18:36:37 +0000 |
| committer | Tom Lane <tgl@sss.pgh.pa.us> | 2007-11-14 18:36:37 +0000 |
| commit | ca450a07eeee7b5a52336796edddce31c5f87ccd | |
| tree | 7abbed0a93382fa95595dcb9b693e49e80640214 /doc/src/sgml | |
| parent | a44c81d1b7936605afe9c15521499fa106a1aecc | |
Add an Accept parameter to "simple" dictionaries. The default of true
gives the old behavior; selecting false allows the dictionary to be used
as a filter ahead of other dictionaries, because it will pass on rather
than accept words that aren't in its stopword list.
Jan Urbanski
Diffstat (limited to 'doc/src/sgml')
| -rw-r--r-- | doc/src/sgml/textsearch.sgml | 37 |
1 files changed, 34 insertions, 3 deletions
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 0ba401c2a43..31753791cda 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.32 2007/11/14 03:26:24 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/textsearch.sgml,v 1.33 2007/11/14 18:36:37 tgl Exp $ -->
 
 <chapter id="textsearch">
 <title id="textsearch-title">Full Text Search</title>
@@ -2093,9 +2093,11 @@ SELECT ts_rank_cd (to_tsvector('english','list stop words'), to_tsquery('list &a
 <para>
 The <literal>simple</> dictionary template operates by converting the
 input token to lower case and checking it against a file of stop words.
- If it is found in the file then <literal>NULL</> is returned, causing
+ If it is found in the file then an empty array is returned, causing
 the token to be discarded. If not, the lower-cased form of the word
- is returned as the normalized lexeme.
+ is returned as the normalized lexeme. Alternatively, the dictionary
+ can be configured to report non-stop-words as unrecognized, allowing
+ them to be passed on to the next dictionary in the list.
 </para>
 
 <para>
@@ -2138,6 +2140,35 @@ SELECT ts_lexize('public.simple_dict','The');
 </programlisting>
 </para>
 
+ <para>
+ We can also choose to return <literal>NULL</>, instead of the lower-cased
+ word, if it is not found in the stop words file. This behavior is
+ selected by setting the dictionary's <literal>Accept</> parameter to
+ <literal>false</>. Continuing the example:
+
+<programlisting>
+ALTER TEXT SEARCH DICTIONARY public.simple_dict ( Accept = false );
+
+SELECT ts_lexize('public.simple_dict','YeS');
+ ts_lexize
+-----------
+
+
+SELECT ts_lexize('public.simple_dict','The');
+ ts_lexize
+-----------
+ {}
+</programlisting>
+ </para>
+
+ <para>
+ With the default setting of <literal>Accept</> = <literal>true</>,
+ it is only useful to place a <literal>simple</> dictionary at the end
+ of a list of dictionaries, since it will never pass on any token to
+ a following dictionary. Conversely, <literal>Accept</> = <literal>false</>
+ is only useful when there is at least one following dictionary.
+ </para>
+
 <caution>
 <para>
 Most types of dictionaries rely on configuration files, such as files of
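
Note: the three possible results that the patch documents (an empty array for a stop word, NULL for an unrecognized word, a lexeme otherwise) and their effect on a dictionary list can be modelled with a short Python sketch. This is a toy model for illustration only, not PostgreSQL's implementation; `make_simple_dict`, `lexize_with_chain`, and `toy_stemmer` are hypothetical names, and the sample stop words are assumed. An empty list stands in for `{}` and `None` stands in for `NULL`.

```python
from typing import Callable, List, Optional, Set

# A dictionary maps a token to a list of lexemes (an empty list means
# "stop word, discard") or to None ("unrecognized, try the next dictionary").
Dictionary = Callable[[str], Optional[List[str]]]


def make_simple_dict(stopwords: Set[str], accept: bool = True) -> Dictionary:
    """Toy model of a "simple" dictionary with an Accept-style switch."""
    def lexize(token: str) -> Optional[List[str]]:
        word = token.lower()
        if word in stopwords:
            return []        # stop word: recognized and discarded
        if accept:
            return [word]    # default: accept every non-stop-word
        return None          # accept=False: report as unrecognized
    return lexize


def lexize_with_chain(token: str, chain: List[Dictionary]) -> List[str]:
    """Consult the dictionaries in order until one recognizes the token."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []                # no dictionary recognized the token


def toy_stemmer(token: str) -> Optional[List[str]]:
    """Hypothetical stand-in for a following dictionary: strips a final 's'."""
    word = token.lower()
    return [word[:-1]] if word.endswith("s") and len(word) > 1 else [word]


stopwords = {"the", "a"}     # assumed sample stop word list
filter_dict = make_simple_dict(stopwords, accept=False)
accepting_dict = make_simple_dict(stopwords, accept=True)

print(filter_dict("YeS"))                                         # None
print(filter_dict("The"))                                         # []
print(lexize_with_chain("Words", [filter_dict, toy_stemmer]))     # ['word']
print(lexize_with_chain("Words", [accepting_dict, toy_stemmer]))  # ['words']
```

In the last call the accepting dictionary answers first, so the following dictionary is never consulted. This is the reason the new documentation text recommends placing a `simple` dictionary with the default setting only at the end of a list.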
