From 3ccae48f44d993351e1f881761bd6c556ebd6638 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Tue, 9 Apr 2013 01:05:55 -0400 Subject: Support indexing of regular-expression searches in contrib/pg_trgm. This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane --- doc/src/sgml/pgtrgm.sgml | 43 ++++++++++++++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 5 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml index 30e535557c3..4572750f4d7 100644 --- a/doc/src/sgml/pgtrgm.sgml +++ b/doc/src/sgml/pgtrgm.sgml @@ -9,7 +9,7 @@ The pg_trgm module provides functions and operators - for determining the similarity of ASCII + for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings. @@ -28,7 +28,9 @@ - A string is considered to have two spaces + pg_trgm ignores non-word characters + (non-alphanumerics) when extracting trigrams from a string. + Each word is considered to have two spaces prefixed and one space suffixed when determining the set of trigrams contained in the string. For example, the set of trigrams in the string @@ -37,6 +39,16 @@ ca, cat, and at . + The set of trigrams in the string + foo|bar is + f, + fo, + foo, + oo , + b, + ba, + bar, and + ar . @@ -145,9 +157,10 @@ operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support - trigram-based index searches for LIKE and ILIKE - queries. (These indexes do not support equality nor simple comparison - operators, so you may need a regular B-tree index too.) + trigram-based index searches for LIKE, ILIKE, + ~ and ~* queries. (These indexes do not + support equality nor simple comparison operators, so you may need a + regular B-tree index too.) @@ -202,6 +215,26 @@ SELECT * FROM test_trgm WHERE t LIKE '%foo%bar'; searches, the search string need not be left-anchored. + + Beginning in PostgreSQL 9.3, pg_trgm + GIN indexes also support index searches for regular-expression matches + (~ and ~* operators), for example + +SELECT * FROM test_trgm WHERE t ~ '(foo|bar)'; + + The index search works by extracting trigrams from the regular expression + and then looking these up in the index. The more trigrams that can be + extracted from the regular expression, the more effective the index search + is. Unlike B-tree based searches, the search string need not be + left-anchored. + + + + For both LIKE and regular-expression searches, keep in mind + that a pattern with no extractable trigrams will degenerate to a full-index + scan. + + The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere. -- cgit v1.2.3