From 70dc4c509b330fdd965d795e8d7f41f09d56c9ae Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Tue, 31 Mar 2020 11:14:30 -0400
Subject: Fix lquery's NOT handling, and add ability to quantify non-'*' items.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The existing implementation of the ltree ~ lquery match operator is
sufficiently complex and undocumented that it's hard to tell exactly
what it does. But one thing it clearly gets wrong is the combination
of NOT symbols (!) and '*' symbols. A pattern such as '*.!foo.*'
should, by any ordinary understanding of regular expression behavior,
match any ltree that has at least one label that's not "foo". As best
we can tell by experimentation, what it's actually matching is any
ltree in which *no* label is "foo". That's surprising, and not at all
what the documentation says.

Now, that's arguably a useful behavior, so if we rewrite to fix the
bug we should provide some other way to get it. To do so, add the
ability to attach lquery quantifiers to non-'*' items as well as '*'s.
Then the pattern '!foo{,}' expresses "any ltree in which no label is
foo". For backwards compatibility, the default quantifier for non-'*'
items has to be "{1}", although the default for '*' items is '{,}'.
I wouldn't have done it like that in a green field, but it's not
totally horrible.

Armed with that, rewrite checkCond() from scratch. Treating '*' and
non-'*' items alike makes it simpler, not more complicated, so that
the function actually gets a lot shorter than it was.

Filip Rembiałkowski, Tom Lane, Nikita Glukhov, per a very ancient bug report from M.
Palm

Discussion: https://postgr.es/m/CAP_rww=waX2Oo6q+MbMSiZ9ktdj6eaJj0cQzNu=Ry2cCDij5fw@mail.gmail.com
---
 doc/src/sgml/ltree.sgml | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

(limited to 'doc/src')

diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
index ae4b33ec85e..d7dd55540a8 100644
--- a/doc/src/sgml/ltree.sgml
+++ b/doc/src/sgml/ltree.sgml
@@ -60,7 +60,8 @@
 lquery represents a regular-expression-like pattern
 for matching ltree values. A simple word matches that
 label within a path. A star symbol (*) matches zero
- or more labels. For example:
+ or more labels. These can be joined with dots to form a pattern that
+ must match the whole label path. For example:
 foo Match the exact label path foo
 *.foo.* Match any label path containing the label foo
@@ -69,19 +70,25 @@ foo Match the exact label path foo
- Star symbols can also be quantified to restrict how many labels
- they can match:
+ Both star symbols and simple words can be quantified to restrict how many
+ labels they can match:
 *{n} Match exactly n labels
 *{n,} Match at least n labels
 *{n,m} Match at least n but not more than m labels
-*{,m} Match at most m labels — same as *{0,m}
+*{,m} Match at most m labels — same as *{0,m}
+foo{n,m} Match at least n but not more than m occurrences of foo
+foo{,} Match any number of occurrences of foo, including zero
+ In the absence of any explicit quantifier, the default for a star symbol
+ is to match any number of labels (that is, {,}) while
+ the default for a non-star item is to match exactly once (that
+ is, {1}).
 There are several modifiers that can be put at the end of a non-star
- label in lquery to make it match more than just the exact match:
+ lquery item to make it match more than just the exact match:
 @ Match case-insensitively, for example a@ matches A
 * Match any label with this prefix, for example foo* matches foobar
@@ -97,17 +104,20 @@ foo Match the exact label path foo
- Also, you can write several possibly-modified labels separated with
- | (OR) to match any of those labels, and you can put
- ! (NOT) at the start to match any label that doesn't
- match any of the alternatives.
+ Also, you can write several possibly-modified non-star items separated with
+ | (OR) to match any of those items, and you can put
+ ! (NOT) at the start of a non-star group to match any
+ label that doesn't match any of the alternatives. A quantifier, if any,
+ goes at the end of the group; it means some number of matches for the
+ group as a whole (that is, some number of labels matching or not matching
+ any of the alternatives).
 Here's an annotated example of lquery:
-Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
-a.  b.     c.      d.               e.
+Top.*{0,2}.sport*@.!football|tennis{1,}.Russ*|Spain
+a.  b.     c.      d.                   e.
 This query will match any label path that:
@@ -129,8 +139,8 @@ a. b. c. d. e.
- then a label not matching football nor
- tennis
+ then has one or more labels, none of which
+ match football nor tennis
@@ -632,7 +642,7 @@ ltreetest=> SELECT path FROM test WHERE path ~ '*.Astronomy.*';
 Top.Collections.Pictures.Astronomy.Astronauts
 (7 rows)
-ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.Astronomy.*';
+ltreetest=> SELECT path FROM test WHERE path ~ '*.!pictures@.Astronomy.*';
 path
 ------------------------------------
 Top.Science.Astronomy
--
cgit v1.2.3
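
Reader's note: the semantics described in the commit message (a quantifier on every item, default {1} for non-'*' items and {,} for '*', and a pattern that must consume the whole label path) can be sketched as a small backtracking matcher. This is an illustrative Python sketch, not the C code of checkCond(). The names Item, parse_item and lquery_match are hypothetical, the modifiers @, * and % are deliberately rejected rather than interpreted, and rejecting '!*' is a simplification made here, not a statement about lquery.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Item:
    """One dot-separated lquery item (hypothetical structure for this sketch)."""
    star: bool
    negate: bool
    labels: FrozenSet[str]
    low: int
    high: Optional[int]  # None means no upper bound


# Optional '!', a body without braces, then an optional quantifier:
# {n}, {n,}, {,m}, {n,m} or {,}.
_ITEM_RE = re.compile(r"(!?)([^{}!]+)(?:\{(\d*)(,?)(\d*)\})?")


def parse_item(text: str) -> Item:
    m = _ITEM_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported lquery item: {text!r}")
    bang, body, lo, comma, hi = m.groups()
    star = body == "*"
    if star and bang:
        raise ValueError("this sketch does not accept '!*'")
    if lo is None:
        # No explicit quantifier: '*' defaults to {,}, anything else to {1}.
        low, high = (0, None) if star else (1, 1)
    elif comma:
        low = int(lo) if lo else 0
        high = int(hi) if hi else None
    elif lo:
        low = high = int(lo)
    else:
        raise ValueError(f"empty quantifier in {text!r}")
    labels: FrozenSet[str] = frozenset() if star else frozenset(body.split("|"))
    for alt in labels:
        # Modifiers (@, *, %) are outside the scope of this sketch.
        if re.fullmatch(r"\w+", alt) is None:
            raise ValueError(f"unsupported alternative: {alt!r}")
    return Item(star, bool(bang), labels, low, high)


def lquery_match(path: str, pattern: str) -> bool:
    """Return True if the whole label path matches the pattern."""
    labels: List[str] = path.split(".") if path else []
    items = [parse_item(t) for t in pattern.split(".")]

    def label_ok(item: Item, label: str) -> bool:
        return item.star or ((label in item.labels) != item.negate)

    def match_from(i: int, j: int) -> bool:
        # Can items[i:] consume exactly labels[j:]?
        if i == len(items):
            return j == len(labels)
        item = items[i]
        count = 0
        while True:
            if count >= item.low and match_from(i + 1, j + count):
                return True
            if item.high is not None and count >= item.high:
                return False
            if j + count >= len(labels) or not label_ok(item, labels[j + count]):
                return False
            count += 1

    return match_from(0, 0)


if __name__ == "__main__":
    print(lquery_match("foo.bar", "*.!foo.*"))  # True: one label is not foo
    print(lquery_match("foo.foo", "*.!foo.*"))  # False: every label is foo
    print(lquery_match("a.foo", "!foo{,}"))     # False: some label is foo
```

Because '*' and non-'*' items share one code path that differs only in label_ok and the default quantifier, the sketch reflects the commit message's point that treating both kinds of item alike keeps the matcher short. The plain recursion can take exponential time on adversarial patterns; that is a property of this sketch only.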