summaryrefslogtreecommitdiff
path: root/src/backend/executor/nodeHash.c
AgeCommit message (Collapse)Author
2015-05-24Manual cleanup of pgindent results.Tom Lane
Fix some places where pgindent did silly stuff, often because project style wasn't followed to begin with. (I've not touched the atomics headers, though.)
2015-05-23pgindent run for 9.5Bruce Momjian
2015-02-21Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].Tom Lane
This requires changing quite a few places that were depending on sizeof(HeapTupleHeaderData), but it seems for the best. Michael Paquier, some adjustments by me
2015-01-06Update copyright for 2015Bruce Momjian
Backpatch certain files through 9.0
2014-10-13Increase number of hash join buckets for underestimate.Kevin Grittner
If we expect batching at the very beginning, we size nbuckets for "full work_mem" (see how many tuples we can get into work_mem, while not breaking NTUP_PER_BUCKET threshold). If we expect to be fine without batching, we start with the 'right' nbuckets and track the optimal nbuckets as we go (without actually resizing the hash table). Once we hit work_mem (considering the optimal nbuckets value), we keep the value. At the end of the first batch, we check whether (nbuckets != nbuckets_optimal) and resize the hash table if needed. Also, we keep this value for all batches (it's OK because it assumes full work_mem, and it makes the batchno evaluation trivial). So the resize happens only once. There could be cases where it would improve performance to allow the NTUP_PER_BUCKET threshold to be exceeded to keep everything in one batch rather than spilling to a second batch, but attempts to generate such a case have so far been unsuccessful; that issue may be addressed with a follow-on patch after further investigation. Tomas Vondra with minor format and comment cleanup by me Reviewed by Robert Haas, Heikki Linnakangas, and Kevin Grittner
2014-09-14Fix pointer type in size passed to memset.Heikki Linnakangas
Pointers are all the same size, so it makes no practical difference, but let's be tidy. Found by Coverity, noted off-list by Tom Lane.
2014-09-12Change NTUP_PER_BUCKET to 1 to improve hash join lookup speed.Robert Haas
Since this makes the bucket headers use ~10x as much memory, properly account for that memory when we figure out whether everything fits in work_mem. This might result in some cases that previously used only a single batch getting split into multiple batches, but it's unclear as yet whether we need defenses against that case, and if so, what the shape of those defenses should be. It's worth noting that even in these edge cases, users should still be no worse off than they would have been last week, because commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a saved a big pile of memory on exactly the same workloads. Tomas Vondra, reviewed and somewhat revised by me.
2014-09-10Pack tuples in a hash join batch densely, to save memory.Heikki Linnakangas
Instead of palloc'ing each HashJoinTuple individually, allocate 32kB chunks and pack the tuples densely in the chunks. This avoids the AllocChunk header overhead, and the space wasted by standard allocator's habit of rounding sizes up to the nearest power of two. This doesn't contain any planner changes, because the planner's estimate of memory usage ignores the palloc overhead. Now that the overhead is smaller, the planner's estimates are in fact more accurate. Tomas Vondra, reviewed by Robert Haas.
2014-05-06pgindent run for 9.4Bruce Momjian
This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
2014-01-07Update copyright for 2014Bruce Momjian
Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
2013-01-01Update copyrights for 2013Bruce Momjian
Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
2012-12-11Add defenses against integer overflow in dynahash numbuckets calculations.Tom Lane
The dynahash code requires the number of buckets in a hash table to fit in an int; but since we calculate the desired hash table size dynamically, there are various scenarios where we might calculate too large a value. The resulting overflow can lead to infinite loops, division-by-zero crashes, etc. I (tgl) had previously installed some defenses against that in commit 299d1716525c659f0e02840e31fbe4dea3, but that covered only one call path. Moreover it worked by limiting the request size to work_mem, but in a 64-bit machine it's possible to set work_mem high enough that the problem appears anyway. So let's fix the problem at the root by installing limits in the dynahash.c functions themselves. Trouble report and patch by Jeff Davis.
2012-08-30Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera
This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
2012-01-01Update copyright notices for year 2012.Bruce Momjian
2011-10-11Rearrange the implementation of index-only scans.Tom Lane
This commit changes index-only scans so that data is read directly from the index tuple without first generating a faux heap tuple. The only immediate benefit is that indexes on system columns (such as OID) can be used in index-only scans, but this is necessary infrastructure if we are ever to support index-only scans on expression indexes. The executor is now ready for that, though the planner still needs substantial work to recognize the possibility. To do this, Vars in index-only plan nodes have to refer to index columns not heap columns. I introduced a new special varno, INDEX_VAR, to mark such Vars to avoid confusion. (In passing, this commit renames the two existing special varnos to OUTER_VAR and INNER_VAR.) This allows ruleutils.c to handle them with logic similar to what we use for subplan reference Vars. Since index-only scans are now fundamentally different from regular indexscans so far as their expression subtrees are concerned, I also chose to change them to have their own plan node type (and hence, their own executor source file).
2011-09-22Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.Tom Lane
This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
2011-09-01Remove unnecessary #include references, per pgrminclude script.Bruce Momjian
2011-04-10pgindent run before PG 9.1 beta 1.Bruce Momjian
2011-01-01Stamp copyrights for year 2011.Bruce Momjian
2010-12-30Support RIGHT and FULL OUTER JOIN in hash joins.Tom Lane
This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.
2010-09-20Remove cvs keywords from all files.Magnus Hagander
2010-07-12Make NestLoop plan nodes pass outer-relation variables into their innerTom Lane
relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.
2010-02-26pgindent run for 9.0Bruce Momjian
2010-02-14Wrap calls to SearchSysCache and related functions using macros.Robert Haas
The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
2010-02-01Augment EXPLAIN output with more details on Hash nodes.Robert Haas
We show the number of buckets, the number of batches (and also the original number if it has changed), and the peak space used by the hash table. Minor executor changes to track peak space used.
2010-01-04When estimating the selectivity of an inequality "column > constant" orTom Lane
"column < constant", and the comparison value is in the first or last histogram bin or outside the histogram entirely, try to fetch the actual column min or max value using an index scan (if there is an index on the column). If successful, replace the lower or upper histogram bound with that value before carrying on with the estimate. This limits the estimation error caused by moving min/max values when the comparison value is close to the min or max. Per a complaint from Josh Berkus. It is tempting to consider using this mechanism for mergejoinscansel as well, but that would inject index fetches into main-line join estimation not just endpoint cases. I'm refraining from that until we can get a better handle on the costs of doing this type of lookup.
2010-01-02Update copyright for the year 2010.Bruce Momjian
2009-12-29Add the ability to store inheritance-tree statistics in pg_statistic,Tom Lane
and teach ANALYZE to compute such stats for tables that have subclasses. Per my proposal of yesterday. autovacuum still needs to be taught about running ANALYZE on parent tables when their subclasses change, but the feature is useful even without that.
2009-10-30Make the overflow guards in ExecChooseHashTableSize be more protective.Tom Lane
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX, which while not insane on its own terms did nothing to protect subsequent code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size estimates might well be planner error rather than reality, it seems best to constrain the initial sizes to be not more than work_mem/sizeof(pointer), thus ensuring the allocated arrays don't exceed work_mem. We will allow nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches calls, but we should still guard against integer overflow in those palloc requests. Per bug #5145 from Bernt Marius Johnsen. Although the given test case only seems to fail back to 8.2, previous releases have variants of this issue, so patch all supported branches.
2009-09-27Remove no-longer-needed ExecCountSlots infrastructure.Tom Lane
2009-06-118.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian
provided by Andrew.
2009-04-02Revert DTrace patch from Robert LorBruce Momjian
2009-04-02Add support for additional DTrace probes.Bruce Momjian
Robert Lor
2009-03-21Optimize multi-batch hash joins when the outer relation has a nonuniformTom Lane
distribution, by creating a special fast path for the (first few) most common values of the outer relation. Tuples having hashvalues matching the MCVs are effectively forced to be in the first batch, so that we never write them out to the batch temp files. Bryce Cutt and Ramon Lawrence, with some editorialization by me.
2009-01-01Update copyright for 2009.Bruce Momjian
2008-01-01Update copyrights in source tree to 2008.Bruce Momjian
2007-11-15pgindent run for 8.3.Bruce Momjian
2007-06-07Rework temp_tablespaces patch so that temp tablespaces are assigned separatelyTom Lane
for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.
2007-06-03Create a GUC parameter temp_tablespaces that allows selection of theTom Lane
tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
2007-06-01Buy back some of the cycles spent in more-expensive hash functions byTom Lane
selecting power-of-2, rather than prime, numbers of buckets in hash joins. If the hash functions are doing their jobs properly by making all hash bits equally random, this is good enough, and it saves expensive integer division and modulus operations.
2007-02-22Fix bug I introduced in recent patch to make hash joins discard null tuplesTom Lane
immediately: ExecHashGetHashValue failed to restore the caller's memory context before taking the failure exit.
2007-01-30Add support for cross-type hashing in hash index searches and hash joins.Tom Lane
Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.
2007-01-28Improve hash join to discard input tuples immediately if they can'tTom Lane
match because they contain a null join key (and the join operator is known strict). Improves performance significantly when the inner relation contains a lot of nulls, as per bug #2930.
2007-01-05Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian
back-stamped for this.
2006-07-14Add additional includes needed on some platforms.Bruce Momjian
2006-07-13Move math.h after postgresql.hBruce Momjian
2006-07-13Allow include files to compile own their own.Bruce Momjian
Strip unused include files out unused include files, and add needed includes to C files. The next step is to remove unused include files in C files.
2006-06-27Convert hash join code to use MinimalTuple format in tuple hash tableTom Lane
and batch files. Should reduce memory and I/O demands for such joins.
2006-05-30Make EXPLAIN sampling smarter, to avoid excessive sampling delay.Bruce Momjian
Martijn van Oosterhout
2006-05-23Remove CXT_printf/CXT1_printf macros. If anyone had found them to be ofTom Lane
any use in the past many years, we'd have made some effort to include them in all executor node types; but in fact they were only in nodeAppend.c and nodeIndexscan.c, up until I copied nodeIndexscan.c's occurrence into the new bitmap node types. Remove some other unused macros in execdebug.h, too. Some day the whole header probably ought to go away in favor of better-designed facilities.