Implement SQL-compliant treatment of row comparisons for < <= > >= cases

(previously we only did = and <> correctly). Also, allow row comparisons with any operators that are in btree opclasses, not only those with these specific names. This gets rid of a whole lot of indefensible assumptions about the behavior of particular operators based on their names ... though it's still true that IN and NOT IN expand to "= ANY". The patch adds a RowCompareExpr expression node type, and makes some changes in the representation of ANY/ALL/ROWCOMPARE SubLinks so that they can share code with RowCompareExpr. I have not yet done anything about making RowCompareExpr an indexable operator, but will look at that soon. initdb forced due to changes in stored rules.
author: Tom Lane <tgl@sss.pgh.pa.us> 2005-12-28 01:30:02 +0000
committer: Tom Lane <tgl@sss.pgh.pa.us> 2005-12-28 01:30:02 +0000
commit: 6e07709760a29d8dbfb93b9846c905bd40689082 (patch)
tree: 9bf0084587d7e313ba087ce53c24bc748c63a456 /doc/src
parent: a37422e042a6114ab0e513f50dac4a47fab22313 (diff)
1 files changed, 93 insertions, 65 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 86e01ff113d..d90bc15d411 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.300 2005/12/21 23:22:55 momjian Exp $
+$PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.301 2005/12/28 01:29:58 tgl Exp $
 PostgreSQL documentation
 -->
 
@@ -350,18 +350,18 @@ PostgreSQL documentation
     </indexterm>
     The ordinary comparison operators yield null (signifying <quote>unknown</>)
     when either input is null.  Another way to do comparisons is with the
-    <literal>IS DISTINCT FROM</literal> construct:
+    <literal>IS <optional> NOT </> DISTINCT FROM</literal> construct:
 <synopsis>
 <replaceable>expression</replaceable> IS DISTINCT FROM <replaceable>expression</replaceable>
 <replaceable>expression</replaceable> IS NOT DISTINCT FROM <replaceable>expression</replaceable>
 </synopsis>
-    For non-null inputs, <literal>IS DISTINCT FROM</literal> this is
+    For non-null inputs, <literal>IS DISTINCT FROM</literal> is
     the same as the <literal>&lt;&gt;</> operator.  However, when both
     inputs are null it will return false, and when just one input is
     null it will return true.  Similarly, <literal>IS NOT DISTINCT
     FROM</literal> is identical to <literal>=</literal> for non-null
-    inputs, returns true when both inputs are null, and false
-    otherwise. Thus, these constructs effectively act as though null
+    inputs, but it returns true when both inputs are null, and false when only
+    one input is null. Thus, these constructs effectively act as though null
     were a normal data value, rather than <quote>unknown</>.
    </para>
 
@@ -7999,8 +7999,8 @@ SELECT col1 FROM tab1
    equal if all their corresponding members are non-null and equal; the rows
    are unequal if any corresponding members are non-null and unequal;
    otherwise the result of that row comparison is unknown (null).
-   If all the row results are either unequal or null, with at least one null,
-   then the result of <token>IN</token> is null.
+   If all the per-row results are either unequal or null, with at least one
+   null, then the result of <token>IN</token> is null.
   </para>
   </sect2>
 
@@ -8055,8 +8055,8 @@ SELECT col1 FROM tab1
    equal if all their corresponding members are non-null and equal; the rows
    are unequal if any corresponding members are non-null and unequal;
    otherwise the result of that row comparison is unknown (null).
-   If all the row results are either unequal or null, with at least one null,
-   then the result of <token>NOT IN</token> is null.
+   If all the per-row results are either unequal or null, with at least one
+   null, then the result of <token>NOT IN</token> is null.
   </para>
   </sect2>
 
@@ -8109,23 +8109,19 @@ SELECT col1 FROM tab1
    subquery, which must return exactly as many columns as there are
    expressions in the left-hand row.  The left-hand expressions are
    evaluated and compared row-wise to each row of the subquery result,
-   using the given <replaceable>operator</replaceable>.  Presently,
-   only <literal>=</literal> and <literal>&lt;&gt;</literal> operators are allowed
-   in row-wise <token>ANY</token> constructs.
-   The result of <token>ANY</token> is <quote>true</> if any equal or unequal row is
-   found, respectively.
-   The result is <quote>false</> if no such row is found (including the special
-   case where the subquery returns no rows).
+   using the given <replaceable>operator</replaceable>.
+   The result of <token>ANY</token> is <quote>true</> if the comparison
+   returns true for any subquery row.
+   The result is <quote>false</> if the comparison returns false for every
+   subquery row (including the special case where the subquery returns no
+   rows).
+   The result is NULL if the comparison does not return true for any row,
+   and it returns NULL for at least one row.
   </para>
 
   <para>
-   As usual, null values in the rows are combined per
-   the normal rules of SQL Boolean expressions.  Two rows are considered
-   equal if all their corresponding members are non-null and equal; the rows
-   are unequal if any corresponding members are non-null and unequal;
-   otherwise the result of that row comparison is unknown (null).
-   If there is at least one null row result, then the result of <token>ANY</token>
-   cannot be false; it will be true or null. 
+   See <xref linkend="row-wise-comparison"> for details about the meaning
+   of a row-wise comparison.
   </para>
   </sect2>
 
@@ -8145,6 +8141,8 @@ SELECT col1 FROM tab1
    The result of <token>ALL</token> is <quote>true</> if all rows yield true
    (including the special case where the subquery returns no rows).
    The result is <quote>false</> if any false result is found.
+   The result is NULL if the comparison does not return false for any row,
+   and it returns NULL for at least one row.
   </para>
 
   <para>
@@ -8152,14 +8150,6 @@ SELECT col1 FROM tab1
   </para>
 
   <para>
-   Note that if there are no failures but at least one right-hand row yields
-   null for the operator's result, the result of the <token>ALL</token> construct
-   will be null, not true.
-   This is in accordance with SQL's normal rules for Boolean combinations
-   of null values.
-  </para>
-
-  <para>
    As with <token>EXISTS</token>, it's unwise to assume that the subquery will
    be evaluated completely.
   </para>
@@ -8175,24 +8165,19 @@ SELECT col1 FROM tab1
    subquery, which must return exactly as many columns as there are
    expressions in the left-hand row.  The left-hand expressions are
    evaluated and compared row-wise to each row of the subquery result,
-   using the given <replaceable>operator</replaceable>.  Presently,
-   only <literal>=</literal> and <literal>&lt;&gt;</literal> operators are allowed
-   in row-wise <token>ALL</token> queries.
-   The result of <token>ALL</token> is <quote>true</> if all subquery rows are equal
-   or unequal, respectively (including the special
+   using the given <replaceable>operator</replaceable>.
+   The result of <token>ALL</token> is <quote>true</> if the comparison
+   returns true for all subquery rows (including the special
    case where the subquery returns no rows).
-   The result is <quote>false</> if any row is found to be unequal or equal,
-   respectively.
+   The result is <quote>false</> if the comparison returns false for any
+   subquery row.
+   The result is NULL if the comparison does not return false for any
+   subquery row, and it returns NULL for at least one row.
   </para>
 
   <para>
-   As usual, null values in the rows are combined per
-   the normal rules of SQL Boolean expressions.  Two rows are considered
-   equal if all their corresponding members are non-null and equal; the rows
-   are unequal if any corresponding members are non-null and unequal;
-   otherwise the result of that row comparison is unknown (null).
-   If there is at least one null row result, then the result of <token>ALL</token>
-   cannot be true; it will be false or null. 
+   See <xref linkend="row-wise-comparison"> for details about the meaning
+   of a row-wise comparison.
   </para>
   </sect2>
 
@@ -8216,17 +8201,11 @@ SELECT col1 FROM tab1
    the subquery cannot return more than one row.  (If it returns zero rows,
    the result is taken to be null.)  The left-hand side is evaluated and
    compared row-wise to the single subquery result row.
-   Presently, only <literal>=</literal> and <literal>&lt;&gt;</literal> operators are allowed
-   in row-wise comparisons.
-   The result is <quote>true</> if the two rows are equal or unequal, respectively.
   </para>
 
   <para>
-   As usual, null values in the rows are combined per
-   the normal rules of SQL Boolean expressions.  Two rows are considered
-   equal if all their corresponding members are non-null and equal; the rows
-   are unequal if any corresponding members are non-null and unequal;
-   otherwise the result of the row comparison is unknown (null).
+   See <xref linkend="row-wise-comparison"> for details about the meaning
+   of a row-wise comparison.
   </para>
   </sect2>
  </sect1>
@@ -8256,6 +8235,10 @@ SELECT col1 FROM tab1
   </indexterm>
 
   <indexterm>
+   <primary>row-wise comparison</primary>
+  </indexterm>
+
+  <indexterm>
    <primary>comparison</primary>
    <secondary>row-wise</secondary>
   </indexterm>
@@ -8265,6 +8248,10 @@ SELECT col1 FROM tab1
   </indexterm>
 
   <indexterm>
+   <primary>IS NOT DISTINCT FROM</primary>
+  </indexterm>
+
+  <indexterm>
    <primary>IS NULL</primary>
   </indexterm>
 
@@ -8288,7 +8275,7 @@ SELECT col1 FROM tab1
    <title><literal>IN</literal></title>
 
 <synopsis>
-<replaceable>expression</replaceable> IN (<replaceable>value</replaceable><optional>, ...</optional>)
+<replaceable>expression</replaceable> IN (<replaceable>value</replaceable> <optional>, ...</optional>)
 </synopsis>
 
   <para>
@@ -8319,7 +8306,7 @@ OR
    <title><literal>NOT IN</literal></title>
 
 <synopsis>
-<replaceable>expression</replaceable> NOT IN (<replaceable>value</replaceable><optional>, ...</optional>)
+<replaceable>expression</replaceable> NOT IN (<replaceable>value</replaceable> <optional>, ...</optional>)
 </synopsis>
 
   <para>
@@ -8425,7 +8412,7 @@ AND
   </para>
   </sect2>
 
-  <sect2>
+  <sect2 id="row-wise-comparison">
    <title>Row-wise Comparison</title>
 
 <synopsis>
@@ -8436,23 +8423,52 @@ AND
    Each side is a row constructor,
    as described in <xref linkend="sql-syntax-row-constructors">.
    The two row values must have the same number of fields.
-   Each side is evaluated and they are compared row-wise.
-   Presently, only <literal>=</literal> and <literal>&lt;&gt;</literal> operators are allowed
-   in row-wise comparisons.
-   The result is <quote>true</> if the two rows are equal or unequal, respectively.
+   Each side is evaluated and they are compared row-wise.  Row comparisons
+   are allowed when the <replaceable>operator</replaceable> is
+   <literal>=</>,
+   <literal>&lt;&gt;</>,
+   <literal>&lt;</>,
+   <literal>&lt;=</>,
+   <literal>&gt;</> or
+   <literal>&gt;=</>,
+   or has semantics similar to one of these.  (To be specific, an operator
+   can be a row comparison operator if it is a member of a btree operator
+   class, or is the negator of the <literal>=</> member of a btree operator
+   class.)
   </para>
 
   <para>
-   As usual, null values in the rows are combined per
-   the normal rules of SQL Boolean expressions.  Two rows are considered
+   The <literal>=</> and <literal>&lt;&gt;</> cases work slightly differently
+   from the others.  Two rows are considered
    equal if all their corresponding members are non-null and equal; the rows
    are unequal if any corresponding members are non-null and unequal;
    otherwise the result of the row comparison is unknown (null).
   </para>
 
-  <indexterm>
-   <primary>IS DISTINCT FROM</primary>
-  </indexterm>
+  <para>
+   For the <literal>&lt;</>, <literal>&lt;=</>, <literal>&gt;</> and
+   <literal>&gt;=</> cases, the row elements are compared left-to-right,
+   stopping as soon as an unequal or null pair of elements is found.
+   If either of this pair of elements is null, the result of the
+   row comparison is unknown (null); otherwise comparison of this pair
+   of elements determines the result.  For example,
+   <literal>ROW(1,2,NULL) &lt; ROW(1,3,0)</>
+   yields true, not null, because the third pair of elements are not
+   considered.
+  </para>
+
+  <note>
+   <para>
+    Prior to <productname>PostgreSQL</productname> 8.2, the
+    <literal>&lt;</>, <literal>&lt;=</>, <literal>&gt;</> and <literal>&gt;=</>
+    cases were not handled per SQL specification.  A comparison like
+    <literal>ROW(a,b) &lt; ROW(c,d)</>
+    was implemented as
+    <literal>a &lt; c AND b &lt; d</>
+    whereas the correct behavior is equivalent to
+    <literal>a &lt; c OR (a = c AND b &lt; d)</>.
+   </para>
+  </note>
 
 <synopsis>
 <replaceable>row_constructor</replaceable> IS DISTINCT FROM <replaceable>row_constructor</replaceable>
@@ -8467,6 +8483,18 @@ AND
   </para>
 
 <synopsis>
+<replaceable>row_constructor</replaceable> IS NOT DISTINCT FROM <replaceable>row_constructor</replaceable>
+</synopsis>
+
+  <para>
+   This construct is similar to a <literal>=</literal> row comparison,
+   but it does not yield null for null inputs.  Instead, any null value is
+   considered unequal to (distinct from) any non-null value, and any two
+   nulls are considered equal (not distinct).  Thus the result will always
+   be either true or false, never null.
+  </para>
+
+<synopsis>
 <replaceable>row_constructor</replaceable> IS NULL
 <replaceable>row_constructor</replaceable> IS NOT NULL
 </synopsis>
author	Tom Lane <tgl@sss.pgh.pa.us>	2005-12-28 01:30:02 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2005-12-28 01:30:02 +0000
commit	6e07709760a29d8dbfb93b9846c905bd40689082 (patch)
tree	9bf0084587d7e313ba087ce53c24bc748c63a456 /doc/src
parent	a37422e042a6114ab0e513f50dac4a47fab22313 (diff)