author      Tom Lane <tgl@sss.pgh.pa.us>    2025-11-02 16:57:26 -0500
committer   Tom Lane <tgl@sss.pgh.pa.us>    2025-11-02 16:57:26 -0500
commit      1ea5bdb00bfbc6f8034859cd19769346bf31dc53 (patch)
tree        e9c45200a27b001f80e196148c8c71b209c77c72 /src/backend/executor
parent      b8f1c62807a58dc97e9262a17e7d0cadb305322b (diff)
Improve planner's estimates of tuple hash table sizes.
For several types of plan nodes that use TupleHashTables, the
planner estimated the expected size of the table as basically
numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)).
This is pretty far off, especially for small data widths, because
it doesn't account for the overhead of the simplehash.h hash table
nor for any per-tuple "additional space" the plan node may request.
Jeff Janes noted a case where the estimate was off by about a factor
of three, even though the obvious hazards such as inaccurate estimates
of numEntries or dataWidth didn't apply.
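(For scale, here is a back-of-the-envelope comparison. This sketch is illustrative only, not code from the patch; every constant in it is an assumption for a typical 64-bit build, not a value taken from PostgreSQL headers.)

```c
#include <stddef.h>
#include <stdio.h>

#define MAXALIGN(x)  (((x) + 7) & ~((size_t) 7))	/* assumes 8-byte MAXALIGN */

int
main(void)
{
	/* all of these numbers are assumptions, not values from the patch */
	size_t		dataWidth = 4;		/* e.g. a single int4 column */
	size_t		heapHeader = 24;	/* stand-in for MAXALIGN(SizeofHeapTupleHeader) */

	/* the old planner formula */
	size_t		old_est = MAXALIGN(dataWidth) + heapHeader;

	/*
	 * Closer to what is actually allocated: a simplehash bucket at an
	 * assumed ~90% fill factor, a minimal-tuple header, the data itself,
	 * and any per-entry "additional space" the plan node requests.
	 */
	size_t		bucket = 24;		/* assumed simplehash entry size */
	size_t		minHeader = 16;		/* stand-in for MAXALIGN(SizeofMinimalTupleHeader) */
	size_t		additional = 16;	/* assumed per-entry extra state */
	double		real_est = bucket / 0.9 + minHeader +
		MAXALIGN(dataWidth) + MAXALIGN(additional);

	printf("old estimate: %zu bytes/entry, more realistic: %.0f bytes/entry\n",
		   old_est, real_est);
	return 0;
}
```

Even with these made-up but plausible constants, the old formula understates the per-entry cost by about a factor of two; with a larger "additional space" request the gap widens toward the factor of three observed in the report.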
To improve matters, create functions provided by the relevant executor
modules that can estimate the required sizes with reasonable accuracy.
(We're still not accounting for effects like allocator padding, but
this at least gets the first-order effects correct.)
I added functions that can estimate the tuple table sizes for
nodeSetOp and nodeSubplan; these rely on an estimator for
TupleHashTables in general, and that in turn relies on one for
simplehash.h hash tables. That feels like kind of a lot of mechanism,
but if we take any short-cuts we're violating modularity boundaries.
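In outline, the layering looks like this (signatures condensed from the diff below; the simplehash-level estimator comes from companion support in simplehash.h):

```c
/* simplehash.h layer: space for the bucket array of a tuplehash table */
Size		tuplehash_estimate_space(double nentries);

/* execGrouping.c layer: buckets plus hashed tuples plus per-entry extras */
Size		EstimateTupleHashTableSpace(double nentries, Size tupleWidth,
										Size additionalsize);

/* per-node wrappers supplying each node's "additional space" and table count */
Size		EstimateSetOpHashTableSpace(double nentries, Size tupleWidth);
Size		EstimateSubplanHashTableSpace(double nentries, Size tupleWidth,
										  bool unknownEqFalse);
```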
The other places that use TupleHashTables are nodeAgg, which took
pains to get its numbers right already, and nodeRecursiveunion.
I did not try to improve the situation for nodeRecursiveunion because
there's nothing to improve: we are not making an estimate of the hash
table size, and it wouldn't help us to do so because we have no
non-hashed alternative implementation. On top of that, our estimate
of the number of entries to be hashed in that module is so suspect
that we'd likely often choose the wrong implementation if we did have
two ways to do it.
Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
Diffstat (limited to 'src/backend/executor')
-rw-r--r--   src/backend/executor/execGrouping.c   59
-rw-r--r--   src/backend/executor/nodeSetOp.c       9
-rw-r--r--   src/backend/executor/nodeSubplan.c    53
3 files changed, 119 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/execGrouping.c b/src/backend/executor/execGrouping.c
index b4bdaa3c305..e1a3a813dd9 100644
--- a/src/backend/executor/execGrouping.c
+++ b/src/backend/executor/execGrouping.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/parallel.h"
 #include "common/hashfn.h"
 #include "executor/executor.h"
@@ -303,6 +304,64 @@ ResetTupleHashTable(TupleHashTable hashtable)
 }
 
 /*
+ * Estimate the amount of space needed for a TupleHashTable with nentries
+ * entries, if the tuples have average data width tupleWidth and the caller
+ * requires additionalsize extra space per entry.
+ *
+ * Return SIZE_MAX if it'd overflow size_t.
+ *
+ * nentries is "double" because this is meant for use by the planner,
+ * which typically works with double rowcount estimates.  So we'd need to
+ * clamp to integer somewhere and that might as well be here.  We do expect
+ * the value not to be NaN or negative, else the result will be garbage.
+ */
+Size
+EstimateTupleHashTableSpace(double nentries,
+                            Size tupleWidth,
+                            Size additionalsize)
+{
+    Size        sh_space;
+    double      tuples_space;
+
+    /* First estimate the space needed for the simplehash table */
+    sh_space = tuplehash_estimate_space(nentries);
+
+    /* Give up if that's already too big */
+    if (sh_space >= SIZE_MAX)
+        return sh_space;
+
+    /*
+     * Compute space needed for hashed tuples with additional data.  nentries
+     * must be somewhat sane, so it should be safe to compute this product.
+     *
+     * We assume that the hashed tuples will be kept in a BumpContext so that
+     * there is no additional per-tuple overhead.
+     *
+     * (Note that this is only accurate if MEMORY_CONTEXT_CHECKING is off,
+     * else bump.c will add a MemoryChunk header to each tuple.  However, it
+     * seems undesirable for debug builds to make different planning choices
+     * than production builds, so we assume the production behavior always.)
+     */
+    tuples_space = nentries * (MAXALIGN(SizeofMinimalTupleHeader) +
+                               MAXALIGN(tupleWidth) +
+                               MAXALIGN(additionalsize));
+
+    /*
+     * Check for size_t overflow.  This coding is trickier than it may
+     * appear, because on 64-bit machines SIZE_MAX cannot be represented
+     * exactly as a double.  We must cast it explicitly to suppress compiler
+     * warnings about an inexact conversion, and we must trust that any
+     * double value that compares strictly less than "(double) SIZE_MAX"
+     * will cast to a representable size_t value.
+     */
+    if (sh_space + tuples_space >= (double) SIZE_MAX)
+        return SIZE_MAX;
+
+    /* We don't bother estimating size of the miscellaneous overhead data */
+    return (Size) (sh_space + tuples_space);
+}
+
+/*
  * Find or create a hashtable entry for the tuple group containing the
  * given tuple.  The tuple must be the same type as the hashtable entries.
  *
diff --git a/src/backend/executor/nodeSetOp.c b/src/backend/executor/nodeSetOp.c
index 7b223a7ca3a..5aabed18a09 100644
--- a/src/backend/executor/nodeSetOp.c
+++ b/src/backend/executor/nodeSetOp.c
@@ -111,6 +111,15 @@ build_hash_table(SetOpState *setopstate)
                            false);
 }
 
+/* Planner support routine to estimate space needed for hash table */
+Size
+EstimateSetOpHashTableSpace(double nentries, Size tupleWidth)
+{
+    return EstimateTupleHashTableSpace(nentries,
+                                       tupleWidth,
+                                       sizeof(SetOpStatePerGroupData));
+}
+
 /*
  * We've completed processing a tuple group.  Decide how many copies (if any)
  * of its representative row to emit, and store the count into numOutput.
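The size_t-overflow guard in EstimateTupleHashTableSpace is subtle enough to be worth showing in isolation. The following standalone program is mine, not part of the patch; it demonstrates the same clamping idiom with invented input values:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * On 64-bit machines SIZE_MAX (2^64 - 1) is not exactly representable as
 * a double; the nearest double is 2^64.  Comparing against
 * "(double) SIZE_MAX" is still safe, because any double strictly below
 * 2^64 converts to a representable size_t value.
 */
static size_t
clamp_to_size_t(double v)
{
	if (v >= (double) SIZE_MAX)
		return SIZE_MAX;
	return (size_t) v;
}

int
main(void)
{
	printf("%zu\n", clamp_to_size_t(1e18));	/* fits as-is */
	printf("%zu\n", clamp_to_size_t(1e20));	/* clamped to SIZE_MAX */
	return 0;
}
```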
diff --git a/src/backend/executor/nodeSubplan.c b/src/backend/executor/nodeSubplan.c
index 9f6e45bcb0b..1cd0988bb49 100644
--- a/src/backend/executor/nodeSubplan.c
+++ b/src/backend/executor/nodeSubplan.c
@@ -525,7 +525,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
                                       node->tab_hash_funcs,
                                       node->tab_collations,
                                       nbuckets,
-                                      0,
+                                      0,    /* no additional data */
                                       node->planstate->state->es_query_cxt,
                                       node->tuplesContext,
                                       innerecontext->ecxt_per_tuple_memory,
@@ -554,7 +554,7 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
                                           node->tab_hash_funcs,
                                           node->tab_collations,
                                           nbuckets,
-                                          0,
+                                          0,    /* no additional data */
                                           node->planstate->state->es_query_cxt,
                                           node->tuplesContext,
                                           innerecontext->ecxt_per_tuple_memory,
@@ -636,6 +636,55 @@ buildSubPlanHash(SubPlanState *node, ExprContext *econtext)
     MemoryContextSwitchTo(oldcontext);
 }
 
+/* Planner support routine to estimate space needed for hash table(s) */
+Size
+EstimateSubplanHashTableSpace(double nentries,
+                              Size tupleWidth,
+                              bool unknownEqFalse)
+{
+    Size        tab1space,
+                tab2space;
+
+    /* Estimate size of main hashtable */
+    tab1space = EstimateTupleHashTableSpace(nentries,
+                                            tupleWidth,
+                                            0 /* no additional data */ );
+
+    /* Give up if that's already too big */
+    if (tab1space >= SIZE_MAX)
+        return tab1space;
+
+    /* Done if we don't need a hashnulls table */
+    if (unknownEqFalse)
+        return tab1space;
+
+    /*
+     * Adjust the rowcount estimate in the same way that buildSubPlanHash
+     * will, except that we don't bother with the special case for a single
+     * hash column.  (We skip that detail because it'd be notationally
+     * painful for our caller to provide the column count, and this table
+     * has relatively little impact on the total estimate anyway.)
+     */
+    nentries /= 16;
+    if (nentries < 1)
+        nentries = 1;
+
+    /*
+     * It might be sane to also reduce the tupleWidth, but on the other hand
+     * we are not accounting for the space taken by the tuples' null
+     * bitmaps.  Leave it alone for now.
+     */
+    tab2space = EstimateTupleHashTableSpace(nentries,
+                                            tupleWidth,
+                                            0 /* no additional data */ );
+
+    /* Guard against overflow */
+    if (tab2space >= SIZE_MAX - tab1space)
+        return SIZE_MAX;
+
+    return tab1space + tab2space;
+}
+
 /*
  * execTuplesUnequal
  *    Return true if two tuples are definitely unequal in the indicated
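The planner-side callers of these estimators fall outside this diffstat's 'src/backend/executor' filter. As a rough sketch of the intended call pattern: the wrapper function below and the header locations of the new declarations are assumptions on my part, while get_hash_memory_limit() is the existing executor helper for the hash memory budget.

```c
#include "postgres.h"

#include "executor/nodeHash.h"		/* get_hash_memory_limit() */
#include "executor/nodeSetOp.h"		/* assumed home of EstimateSetOpHashTableSpace() */

/*
 * Hypothetical helper: would a hashed SetOp's table fit in hash memory?
 * nentries and tupleWidth would come from the planner's rowcount and
 * width estimates for the node's input.
 */
static bool
setop_hash_would_fit(double nentries, Size tupleWidth)
{
	Size		est = EstimateSetOpHashTableSpace(nentries, tupleWidth);

	/* an estimate of SIZE_MAX (overflow) also fails this test */
	return est <= get_hash_memory_limit();
}
```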
