From 0436f6bde8848b7135f19dd7f8548b8c2ae89a34 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Tue, 13 Jun 2017 23:46:39 -0400 Subject: Disallow set-returning functions inside CASE or COALESCE. When we reimplemented SRFs in commit 69f4b9c85, our initial choice was to allow the behavior to vary from historical practice in cases where a SRF call appeared within a conditional-execution construct (currently, only CASE or COALESCE). But that was controversial to begin with, and subsequent discussion has resulted in a consensus that it's better to throw an error instead of executing the query differently from before, so long as we can provide a reasonably clear error message and a way to rewrite the query. Hence, add a parser mechanism to allow detection of such cases during parse analysis. The mechanism just requires storing, in the ParseState, a pointer to the set-returning FuncExpr or OpExpr most recently emitted by parse analysis. Then the parsing functions for CASE and COALESCE can detect the presence of a SRF in their arguments by noting whether this pointer changes while analyzing their arguments. Furthermore, if it does, it provides a suitable error cursor location for the complaint. (This means that if there's more than one SRF in the arguments, the error will point at the last one to be analyzed not the first. While connoisseurs of parsing behavior might find that odd, it's unlikely the average user would ever notice.) While at it, we can also provide more specific error messages than before about some pre-existing restrictions, such as no-SRFs-within-aggregates. Also, reject at parse time cases where a NULLIF or IS DISTINCT FROM construct would need to return a set. We've never supported that, but the restriction is depended on in more subtle ways now, so it seems wise to detect it at the start. Also, provide some documentation about how to rewrite a SRF-within-CASE query using a custom wrapper SRF. It turns out that the information_schema.user_mapping_options view contained an instance of exactly the behavior we're now forbidding; but rewriting it makes it more clear and safer too. initdb forced because of user_mapping_options change. Patch by me, with error message suggestions from Alvaro Herrera and Andres Freund, pursuant to a complaint from Regina Obe. Discussion: https://postgr.es/m/000001d2d5de$d8d66170$8a832450$@pcorp.us --- doc/src/sgml/xfunc.sgml | 83 ++++++++++++++++++++++++++++++++++--------------- 1 file changed, 58 insertions(+), 25 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 05f4312bf3e..1a6c3b9bc2f 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -997,6 +997,29 @@ SELECT name, listchildren(name) FROM nodes; the LATERAL syntax. + + PostgreSQL's behavior for a set-returning function in a + query's select list is almost exactly the same as if the set-returning + function had been written in a LATERAL FROM-clause item + instead. For example, + +SELECT x, generate_series(1,5) AS g FROM tab; + + is almost equivalent to + +SELECT x, g FROM tab, LATERAL generate_series(1,5) AS g; + + It would be exactly the same, except that in this specific example, + the planner could choose to put g on the outside of the + nestloop join, since g has no actual lateral dependency + on tab. That would result in a different output row + order. Set-returning functions in the select list are always evaluated + as though they are on the inside of a nestloop join with the rest of + the FROM clause, so that the function(s) are run to + completion before the next row from the FROM clause is + considered. + + If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions @@ -1028,32 +1051,19 @@ SELECT srf1(srf2(x), srf3(y)), srf4(srf5(z)) FROM tab; - This behavior also means that set-returning functions will be evaluated - even when it might appear that they should be skipped because of a - conditional-evaluation construct, such as CASE - or COALESCE. For example, consider + Set-returning functions cannot be used within conditional-evaluation + constructs, such as CASE or COALESCE. For + example, consider SELECT x, CASE WHEN x > 0 THEN generate_series(1, 5) ELSE 0 END FROM tab; - It might seem that this should produce five repetitions of input - rows that have x > 0, and a single repetition of those - that do not; but actually it will produce five repetitions of every - input row. This is because generate_series() is run first, - and then the CASE expression is applied to its result rows. - The behavior is thus comparable to - -SELECT x, CASE WHEN x > 0 THEN g ELSE 0 END - FROM tab, LATERAL generate_series(1,5) AS g; - - It would be exactly the same, except that in this specific example, - the planner could choose to put g on the outside of the - nestloop join, since g has no actual lateral dependency - on tab. That would result in a different output row - order. Set-returning functions in the select list are always evaluated - as though they are on the inside of a nestloop join with the rest of - the FROM clause, so that the function(s) are run to - completion before the next row from the FROM clause is - considered. + It might seem that this should produce five repetitions of input rows + that have x > 0, and a single repetition of those that do + not; but actually, because generate_series(1, 5) would be + run in an implicit LATERAL FROM item before + the CASE expression is ever evaluated, it would produce five + repetitions of every input row. To reduce confusion, such cases produce + a parse-time error instead. @@ -1078,11 +1088,34 @@ SELECT x, CASE WHEN x > 0 THEN g ELSE 0 END functions. Also, nested set-returning functions did not work as described above; instead, a set-returning function could have at most one set-returning argument, and each nest of set-returning functions - was run independently. The behavior for conditional execution - (set-returning functions inside CASE etc) was different too. + was run independently. Also, conditional execution (set-returning + functions inside CASE etc) was previously allowed, + complicating things even more. Use of the LATERAL syntax is recommended when writing queries that need to work in older PostgreSQL versions, because that will give consistent results across different versions. + If you have a query that is relying on conditional execution of a + set-returning function, you may be able to fix it by moving the + conditional test into a custom set-returning function. For example, + +SELECT x, CASE WHEN y > 0 THEN generate_series(1, z) ELSE 5 END FROM tab; + + could become + +CREATE FUNCTION case_generate_series(cond bool, start int, fin int, els int) + RETURNS SETOF int AS $$ +BEGIN + IF cond THEN + RETURN QUERY SELECT generate_series(start, fin); + ELSE + RETURN QUERY SELECT els; + END IF; +END$$ LANGUAGE plpgsql; + +SELECT x, case_generate_series(y > 0, 1, z, 5) FROM tab; + + This formulation will work the same in all versions + of PostgreSQL. -- cgit v1.2.3