diff options
| author | Tom Lane <tgl@sss.pgh.pa.us> | 1999-12-13 17:39:38 +0000 | 
|---|---|---|
| committer | Tom Lane <tgl@sss.pgh.pa.us> | 1999-12-13 17:39:38 +0000 | 
| commit | ff6fe1502d2d8553fa3a305483344cb90a6411d4 (patch) | |
| tree | 4e33fcae7c921cf375d2d29f7c3d0c63ac4350c6 /doc | |
| parent | 662371cc5d56d76cc2530c10b13d535f24ce5457 (diff) | |
Update documentation to reflect availability of aggregate(DISTINCT).
Try to provide a more lucid discussion in 'Using Aggregate Functions'
tutorial section.
Diffstat (limited to 'doc')
| -rw-r--r-- | doc/src/sgml/query.sgml | 73 | ||||
| -rw-r--r-- | doc/src/sgml/ref/select.sgml | 21 | ||||
| -rw-r--r-- | doc/src/sgml/syntax.sgml | 53 | 
3 files changed, 111 insertions, 36 deletions
| diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml index a1830eb0297..961bc0d9dc7 100644 --- a/doc/src/sgml/query.sgml +++ b/doc/src/sgml/query.sgml @@ -361,39 +361,90 @@ DELETE FROM classname;      Like  most  other  query  languages,       <ProductName>PostgreSQL</ProductName> supports      aggregate functions. -    The current  implementation  of -    <ProductName>Postgres</ProductName> aggregate functions have some limitations. -    Specifically, while there  are  aggregates  to  compute -    such  functions as the <Function>count</Function>, <Function>sum</Function>, +    An aggregate function computes a single result from multiple input rows. +    For example, there are aggregates to compute the +    <Function>count</Function>, <Function>sum</Function>,      <Function>avg</Function> (average), <Function>max</Function> (maximum) and -    <Function>min</Function> (minimum) over a set of instances,  aggregates  can  only -    appear  in  the  target  list of a query and not directly in the -    qualification (the where clause). As an example, +    <Function>min</Function> (minimum) over a set of instances. +   </para> + +   <Para> +    It is important to understand the interaction between aggregates and +    SQL's <Command>where</Command> and <Command>having</Command> clauses. +    The fundamental difference between <Command>where</Command> and +    <Command>having</Command> is this: <Command>where</Command> selects +    input rows before groups and aggregates are computed (thus, it controls +    which rows go into the aggregate computation), whereas +    <Command>having</Command> selects group rows after groups and +    aggregates are computed.  Thus, the +    <Command>where</Command> clause may not contain aggregate functions; +    it makes no sense to try to use an aggregate to determine which rows +    will be inputs to the aggregates.  On the other hand, +    <Command>having</Command> clauses always contain aggregate functions. +    (Strictly speaking, you are allowed to write a <Command>having</Command> +    clause that doesn't use aggregates, but it's wasteful; the same condition +    could be used more efficiently at the <Command>where</Command> stage.) +   </para> + +   <Para> +    As an example, we can find the highest low-temperature reading anywhere +    with      <ProgramListing>  SELECT max(temp_lo) FROM weather;      </ProgramListing> -    is allowed, while +    If we want to know which city (or cities) that reading occurred in, +    we might try      <ProgramListing>  SELECT city FROM weather WHERE temp_lo = max(temp_lo);      </ProgramListing> -    is not. However, as is often the case the query can be restated to accomplish  -    the intended result; here by using a <FirstTerm>subselect</FirstTerm>: +    but this will not work since the aggregate max() can't be used in +    <Command>where</Command>. However, as is often the case the query can be +    restated to accomplish the intended result; here by using a +    <FirstTerm>subselect</FirstTerm>:      <ProgramListing>  SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);      </ProgramListing> +    This is OK because the sub-select is an independent computation that +    computes its own aggregate separately from what's happening in the outer +    select.     </Para>     <Para> -    Aggregates may also have <FirstTerm>group by</FirstTerm> clauses: +    Aggregates are also very useful in combination with +    <FirstTerm>group by</FirstTerm> clauses.  For example, we can get the +    maximum low temperature observed in each city with      <ProgramListing>  SELECT city, max(temp_lo)      FROM weather      GROUP BY city;      </ProgramListing> +    which gives us one output row per city.  We can filter these grouped +    rows using <Command>having</Command>: +    <ProgramListing> +SELECT city, max(temp_lo) +    FROM weather +    GROUP BY city +    HAVING min(temp_lo) < 0; +    </ProgramListing> +    which gives us the same results for only the cities that have some +    below-zero readings.  Finally, if we only care about cities whose +    names begin with 'P', we might do +    <ProgramListing> +SELECT city, max(temp_lo) +    FROM weather +    WHERE city like 'P%' +    GROUP BY city +    HAVING min(temp_lo) < 0; +    </ProgramListing> +    Note that we can apply the city-name restriction in +    <Command>where</Command>, since it needs no aggregate.  This is +    more efficient than adding the restriction to <Command>having</Command>, +    because we avoid doing the grouping and aggregate calculations +    for all rows that fail the <Command>where</Command> check.     </Para>    </sect1>   </Chapter> diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index f2afa38b7c9..070f8b43d0f 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1,5 +1,5 @@  <!-- -$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.22 1999/08/06 13:50:31 thomas Exp $ +$Header: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v 1.23 1999/12/13 17:39:38 tgl Exp $  Postgres documentation  --> @@ -202,10 +202,10 @@ SELECT [ ALL | DISTINCT [ ON <replaceable class="PARAMETER">column</replaceable>    <para>     <command>DISTINCT</command> will eliminate all duplicate rows from the -   selection. +   result.     <command>DISTINCT ON <replaceable class="PARAMETER">column</replaceable></command>     will eliminate all duplicates in the specified column; this is -   equivalent to using +   similar to using     <command>GROUP BY <replaceable class="PARAMETER">column</replaceable></command>.     <command>ALL</command> will return all candidate rows,     including duplicates. @@ -320,11 +320,13 @@ GROUP BY <replaceable class="PARAMETER">column</replaceable> [, ...]     <para>      GROUP BY will condense into a single row all rows that share the -    same values for the -    grouped columns; aggregates return values derived from all rows -    that make up the group.  The value returned for an ungrouped -    and unaggregated column is dependent on the order in which rows -    happen to be read from the database. +    same values for the grouped columns.  Aggregate functions, if any, +    are computed across all rows making up each group, producing a +    separate value for each group (whereas without GROUP BY, an +    aggregate produces a single value computed across all the selected +    rows).  When GROUP BY is present, it is not valid to refer to +    ungrouped columns except within aggregate functions, since there +    would be more than one possible value to return for an ungrouped column.     </para>    </refsect2> @@ -354,7 +356,8 @@ HAVING <replaceable class="PARAMETER">cond_expr</replaceable>     <para>      Each column referenced in       <replaceable class="PARAMETER">cond_expr</replaceable> shall unambiguously -    reference a grouping column. +    reference a grouping column, unless the reference appears within an +    aggregate function.     </para>    </refsect2> diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index a74bd08be11..beacf8fbdf0 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -642,15 +642,16 @@ CAST '<replaceable>string</replaceable>' AS <replaceable>type</replaceable>       <member><replaceable>a_expr</replaceable> <replaceable>right_unary_operator</replaceable></member>       <member><replaceable>left_unary_operator</replaceable> <replaceable>a_expr</replaceable></member>       <member>parameter</member> -     <member>functional expressions</member> -     <member>aggregate expressions</member> +     <member>functional expression</member> +     <member>aggregate expression</member>      </simplelist>     </para>     <para> -    We have already discussed constants and attributes.  The two kinds of -    operator expressions indicate respectively binary and left_unary -    expressions.  The following sections discuss the remaining options. +    We have already discussed constants and attributes.  The three kinds of +    operator expressions indicate respectively binary (infix), right-unary +    (suffix) and left-unary (prefix) operators.  The following sections +    discuss the remaining options.     </para>     <sect2> @@ -690,7 +691,7 @@ CREATE FUNCTION dept (name)       enclosed in parentheses:       <synopsis> -<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ) +<replaceable>function</replaceable> (<replaceable>a_expr</replaceable> [, <replaceable>a_expr</replaceable> ... ] )       </synopsis>      </para> @@ -705,20 +706,40 @@ sqrt(emp.salary)     </sect2>     <sect2> -    <title>Aggregate Expression</title> +    <title>Aggregate Expressions</title>      <para> -     An <firstterm>aggregate expression</firstterm> -     represents a simple aggregate (i.e., one that computes a single value) -     or an aggregate function (i.e., one that computes a set of values). -     The syntax is the following: +     An <firstterm>aggregate expression</firstterm> represents the application +     of an aggregate function across the rows selected by a query. +     An aggregate function reduces multiple inputs to a single output value, +     such as the sum or average of the inputs. +     The syntax of an aggregate expression is one of the following: -     <synopsis> -<replaceable>aggregate_name</replaceable> (<replaceable>attribute</replaceable>) -     </synopsis> +    <simplelist> +     <member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member> +     <member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member> +     <member><replaceable>aggregate_name</replaceable> ( * )</member> +    </simplelist> + +     where <replaceable>aggregate_name</replaceable> is a previously defined +     aggregate, and <replaceable>expression</replaceable> is any expression +     that doesn't itself contain an aggregate expression. +    </para> -     where <replaceable>aggregate_name</replaceable> -     must be a previously defined aggregate. +    <para> +     The first form of aggregate expression invokes the aggregate across all +     input rows for which the given expression yields a non-null value. +     The second form invokes the aggregate for all distinct non-null values +     of the expression found in the input rows.  The last form invokes the +     aggregate once for each input row regardless of null or non-null values; +     since no particular input value is specified, it is generally only useful +     for the count() aggregate. +    </para> + +    <para> +      For example, count(*) yields the total number of input rows; +      count(f1) yields the number of input rows in which f1 is non-null; +      count(distinct f1) yields the number of distinct non-null values of f1.      </para>     </sect2> | 
