summaryrefslogtreecommitdiff
path: root/src/include/executor/nodeAgg.h
AgeCommit message (Collapse)Author
2024-01-03Update copyright for 2024Bruce Momjian
Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
2023-01-02Update copyright for 2023Bruce Momjian
Backpatch-through: 11
2022-08-02Improve performance of ORDER BY / DISTINCT aggregatesDavid Rowley
ORDER BY / DISTINCT aggreagtes have, since implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived. Here we change the planner so it requests a path with a sort order which supports the most amount of ORDER BY / DISTINCT aggregate functions and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order. Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example: SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ... would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but; SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ... would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist). SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ... would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}. Author: David Rowley Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
2022-01-07Update copyright for 2022Bruce Momjian
Backpatch-through: 10
2021-01-02Update copyright for 2021Bruce Momjian
Backpatch-through: 9.5
2020-07-28HashAgg: use better cardinality estimate for recursive spilling.Jeff Davis
Use HyperLogLog to estimate the group cardinality in a spilled partition. This estimate is used to choose the number of partitions if we recurse. The previous behavior was to use the number of tuples in a spilled partition as the estimate for the number of groups, which lead to overpartitioning. That could cause the number of batches to be much higher than expected (with each batch being very small), which made it harder to interpret EXPLAIN ANALYZE results. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com Backpatch-through: 13
2020-06-19Fix EXPLAIN ANALYZE for parallel HashAgg plansDavid Rowley
Since 1f39bce02, HashAgg nodes have had the ability to spill to disk when memory consumption exceeds work_mem. That commit added new properties to EXPLAIN ANALYZE to show the maximum memory usage and disk usage, however, it didn't quite go as far as showing that information for parallel workers. Since workers may have experienced something very different from the main process, we should show this information per worker, as is done in Sort. Reviewed-by: Justin Pryzby Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/CAApHDvpEKbfZa18mM1TD7qV6PG+w97pwCWq5tVD0dX7e11gRJw@mail.gmail.com Backpatch-through: 13, where the hashagg spilling code was added.
2020-05-14Initial pgindent and pgperltidy run for v13.Tom Lane
Includes some manual cleanup of places that pgindent messed up, most of which weren't per project style anyway. Notably, it seems some people didn't absorb the style rules of commit c9d297751, because there were a bunch of new occurrences of function calls with a newline just after the left paren, all with faulty expectations about how the rest of the call would get indented.
2020-04-03Include chunk overhead in hash table entry size estimate.Jeff Davis
Don't try to be precise about it, just use a constant 16 bytes of chunk overhead. Being smarter would require knowing the memory context where the chunk will be allocated, which is not known by all callers. Discussion: https://postgr.es/m/20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
2020-03-18Disk-based Hash Aggregation.Jeff Davis
While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode". In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later. The tuples are spilled in a partitioned fashion. When all tuples from the outer plan are processed (either by advancing the group or spilling the tuple), finalize and emit the groups from the hash table. Then, create new batches of work from the spilled partitions, and select one of the saved batches and process it (possibly spilling recursively). Author: Jeff Davis Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
2020-02-06Refactor hash_agg_entry_size().Jeff Davis
Consolidate the calculations for hash table size estimation. This will help with upcoming Hash Aggregation work that will add additional call sites.
2020-01-01Update copyrights for 2020Bruce Momjian
Backpatch-through: update all files in master, backpatch legal files through 9.4
2019-08-16Remove redundant prototypes for SQL callable functions.Andres Freund
These aren't needed after 352a24a1f9d6. The remaining prototypes are not defined on the SQL level. Author: Andres Freund Discussion: https://postgr.es/m/20190803193733.g3l3x3o42uv4qj7l@alap3.anarazel.de
2019-07-22Fix inconsistencies and typos in the treeMichael Paquier
This is numbered take 7, and addresses a set of issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/dff75442-2468-f74f-568c-6006e141062f@gmail.com
2019-01-26Change function call information to be variable length.Andres Freund
Before this change FunctionCallInfoData, the struct arguments etc for V1 function calls are stored in, always had space for FUNC_MAX_ARGS/100 arguments, storing datums and their nullness in two arrays. For nearly every function call 100 arguments is far more than needed, therefore wasting memory. Arg and argnull being two separate arrays also guarantees that to access a single argument, two cachelines have to be touched. Change the layout so there's a single variable-length array with pairs of value / isnull. That drastically reduces memory consumption for most function calls (on x86-64 a two argument function now uses 64bytes, previously 936 bytes), and makes it very likely that argument value and its nullness are on the same cacheline. Arguments are stored in a new NullableDatum struct, which, due to padding, needs more memory per argument than before. But as usually far fewer arguments are stored, and individual arguments are cheaper to access, that's still a clear win. It's likely that there's other places where conversion to NullableDatum arrays would make sense, e.g. TupleTableSlots, but that's for another commit. Because the function call information is now variable-length allocations have to take the number of arguments into account. For heap allocations that can be done with SizeForFunctionCallInfoData(), for on-stack allocations there's a new LOCAL_FCINFO(name, nargs) macro that helps to allocate an appropriately sized and aligned variable. Some places with stack allocation function call information don't know the number of arguments at compile time, and currently variably sized stack allocations aren't allowed in postgres. Therefore allow for FUNC_MAX_ARGS space in these cases. They're not that common, so for now that seems acceptable. Because of the need to allocate FunctionCallInfo of the appropriate size, older extensions may need to update their code. To avoid subtle breakages, the FunctionCallInfoData struct has been renamed to FunctionCallInfoBaseData. Most code only references FunctionCallInfo, so that shouldn't cause much collateral damage. This change is also a prerequisite for more efficient expression JIT compilation (by allocating the function call information on the stack, allowing LLVM to optimize it away); previously the size of the call information caused problems inside LLVM's optimizer. Author: Andres Freund Reviewed-By: Tom Lane Discussion: https://postgr.es/m/20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de
2019-01-02Update copyright for 2019Bruce Momjian
Backpatch-through: certain files through 9.4
2018-05-21Improve spelling of new FINALFUNC_MODIFY aggregate attribute.Tom Lane
I'd used SHARABLE as a value originally, but Peter Eisentraut points out that dictionaries agree that SHAREABLE is the preferred spelling. Run around and change that before it's too late. Discussion: https://postgr.es/m/d2e1afd4-659c-50d6-1b20-7cfd3675e909@2ndquadrant.com
2018-04-01Fix a boatload of typos in C comments.Tom Lane
Justin Pryzby Discussion: https://postgr.es/m/20180331105640.GK28454@telsasoft.com
2018-03-22Add FIELDNO_* macro designating offset into structs required for JIT.Andres Freund
For any interesting JIT target, fields inside structs need to be accessed. b96d550e contains infrastructure for syncing the definition of types between postgres C code and runtime code generation with LLVM. But that doesn't sync the number or names of fields inside structs, just the types (including padding etc). One option would be to hardcode the offset numbers in the JIT code, but that'd be hard to keep in sync. Instead add macros indicating the field offset to the fields that need to be accessed. Not pretty, but manageable. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-02-16Do execGrouping.c via expression eval machinery, take two.Andres Freund
This has a performance benefit on own, although not hugely so. The primary benefit is that it will allow for to JIT tuple deforming and comparator invocations. Large parts of this were previously committed (773aec7aa), but the commit contained an omission around cross-type comparisons and was thus reverted. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-02-15Revert "Do execGrouping.c via expression eval machinery."Andres Freund
This reverts commit 773aec7aa98abd38d6d9435913bb8e14e392c274. There's an unresolved issue in the reverted commit: It only creates one comparator function, but in for the nodeSubplan.c case we need more (c.f. FindTupleHashEntry vs LookupTupleHashEntry calls in nodeSubplan.c). This isn't too difficult to fix, but it's not entirely trivial either. The fact that the issue only causes breakage on 32bit systems shows that the current test coverage isn't that great. To avoid turning half the buildfarm red till those two issues are addressed, revert.
2018-02-15Do execGrouping.c via expression eval machinery.Andres Freund
This has a performance benefit on own, although not hugely so. The primary benefit is that it will allow for to JIT tuple deforming and comparator invocations. Author: Andres Freund Discussion: https://postgr.es/m/20171129080934.amqqkke2zjtekd4t@alap3.anarazel.de
2018-01-09Expression evaluation based aggregate transition invocation.Andres Freund
Previously aggregate transition and combination functions were invoked by special case code in nodeAgg.c, evaluating input and filters separately using the expression evaluation machinery. That turns out to not be great for performance for several reasons: - repeated expression evaluations have some cost - the transition functions invocations are poorly predicted, as commonly there are multiple aggregates in a query, resulting in the same call-stack invoking different functions. - filter and input computation had to be done separately - the special case code made it hard to implement JITing of the whole transition function invocation Address this by building one large expression that computes input, evaluates filters, and invokes transition functions. This leads to moderate speedups in queries bottlenecked by aggregate computations, and enables large speedups for similar cases once JITing is done. There's potential for further improvement: - It'd be nice if we could simplify the somewhat expensive aggstate->all_pergroups lookups. - right now there's still an advance_transition_function invocation in nodeAgg.c, leading to some code duplication. Author: Andres Freund Discussion: https://postgr.es/m/20170901064131.tazjxwus3k2w3ybh@alap3.anarazel.de
2018-01-02Update copyright for 2018Bruce Momjian
Backpatch-through: certain files through 9.3
2017-07-30Move ExecProcNode from dispatch to function pointer based model.Andres Freund
This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
2017-06-21Phase 2 of pgindent updates.Tom Lane
Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-01-03Update copyright via script for 2017Bruce Momjian
2016-01-02Update copyright for 2016Bruce Momjian
Backpatch certain files through 9.1
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-07-12Make NestLoop plan nodes pass outer-relation variables into their innerTom Lane
relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-09-27Remove no-longer-needed ExecCountSlots infrastructure.Tom Lane
2009-01-01Update copyright for 2009.Bruce Momjian
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-01-05Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian
back-stamped for this.
2006-07-13Allow include files to compile own their own.Bruce Momjian
Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.
2006-03-05Update copyright for 2006. Update scripts.Bruce Momjian
2006-02-28Extend the ExecInitNode API so that plan nodes receive a set of flagTom Lane
bits indicating which optional capabilities can actually be exercised at runtime. This will allow Sort and Material nodes, and perhaps later other nodes, to avoid unnecessary overhead in common cases. This commit just adds the infrastructure and arranges to pass the correct flag values down to plan nodes; none of the actual optimizations are here yet. I'm committing this separately in case anyone wants to measure the added overhead. (It should be negligible.) Simon Riggs and Tom Lane
2005-01-28Improve planner's estimation of the space needed for HashAgg plans:Tom Lane
look at the actual aggregate transition datatypes and the actual overhead needed by nodeAgg.c, instead of using pessimistic round numbers. Per a discussion with Michael Tiemann.
2004-12-31Tag appropriate files for rc3PostgreSQL Daemon
Also performed an initial run through of upgrading our Copyright date to extend to 2005 ... first run here was very simple ... change everything where: grep 1996-2004 && the word 'Copyright' ... scanned through the generated list with 'less' first, and after, to make sure that I only picked up the right entries ...
2004-08-29Update copyright to 2004.Bruce Momjian
2003-11-29make sure the $Id tags are converted to $PostgreSQL as well ...PostgreSQL Daemon
2003-08-04Update copyrights to 2003.Bruce Momjian
2003-01-10Create a new file executor/execGrouping.c to centralize utility routinesTom Lane
shared by nodeGroup, nodeAgg, and soon nodeSubplan.
2002-12-05Phase 1 of read-only-plans project: cause executor state nodes to pointTom Lane
to plan nodes, not vice-versa. All executor state nodes now inherit from struct PlanState. Copying of plan trees has been simplified by not storing a list of SubPlans in Plan nodes (eliminating duplicate links). The executor still needs such a list, but it can build it during ExecutorStart since it has to scan the plan tree anyway. No initdb forced since no stored-on-disk structures changed, but you will need a full recompile because of node-numbering changes.