Improve planner's estimates of tuple hash table sizes.

For several types of plan nodes that use TupleHashTables, the planner estimated the expected size of the table as basically numEntries * (MAXALIGN(dataWidth) + MAXALIGN(SizeofHeapTupleHeader)). This is pretty far off, especially for small data widths, because it doesn't account for the overhead of the simplehash.h hash table or for any per-tuple "additional space" the plan node may request. Jeff Janes noted a case where the estimate was off by about a factor of three, even though the obvious hazards such as inaccurate estimates of numEntries or dataWidth didn't apply.

To improve matters, create functions provided by the relevant executor modules that can estimate the required sizes with reasonable accuracy. (We're still not accounting for effects like allocator padding, but this at least gets the first-order effects correct.)

I added functions that can estimate the tuple table sizes for nodeSetOp and nodeSubplan; these rely on an estimator for TupleHashTables in general, and that in turn relies on one for simplehash.h hash tables. That feels like kind of a lot of mechanism, but if we take any short-cuts we're violating modularity boundaries.

The other places that use TupleHashTables are nodeAgg, which took pains to get its numbers right already, and nodeRecursiveunion. I did not try to improve the situation for nodeRecursiveunion because there's nothing to improve: we are not making an estimate of the hash table size, and it wouldn't help us to do so because we have no non-hashed alternative implementation. On top of that, our estimate of the number of entries to be hashed in that module is so suspect that we'd likely often choose the wrong implementation if we did have two ways to do it.

Reported-by: Jeff Janes <jeff.janes@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAMkU=1zia0JfW_QR8L5xA2vpa0oqVuiapm78h=WpNsHH13_9uw@mail.gmail.com
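For reference, the pre-patch estimate described above can be written out as a tiny function. This is a sketch under stated assumptions: it takes MAXALIGN to be 8 bytes and SizeofHeapTupleHeader to be 23 bytes (typical of 64-bit builds), and the `DEMO_*` macros and `old_style_hashtable_estimate` are hypothetical stand-ins, not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed constants for illustration only: 8-byte maximum alignment and a
 * 23-byte heap tuple header.  Not read from PostgreSQL headers.
 */
#define DEMO_MAXIMUM_ALIGNOF 8
#define DEMO_MAXALIGN(len) \
	(((size_t) (len) + (DEMO_MAXIMUM_ALIGNOF - 1)) & \
	 ~((size_t) (DEMO_MAXIMUM_ALIGNOF - 1)))
#define DEMO_SIZEOF_HEAP_TUPLE_HEADER 23

/*
 * Pre-patch style estimate (hypothetical helper): each entry costs its
 * aligned data width plus an aligned tuple header.  Nothing is charged for
 * the simplehash.h bucket array or for per-entry "additional space".
 */
static double
old_style_hashtable_estimate(double num_entries, int data_width)
{
	assert(num_entries >= 0 && data_width >= 0);
	return num_entries *
		(double) (DEMO_MAXALIGN(data_width) +
				  DEMO_MAXALIGN(DEMO_SIZEOF_HEAP_TUPLE_HEADER));
}
```

With an 8-byte data width this charges 32 bytes per entry (8 for data plus 24 for the aligned header), and the hash table's own bucket array is not counted at all, which is the first-order gap the new estimators address.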
author: Tom Lane <tgl@sss.pgh.pa.us> 2025-11-02 16:57:26 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2025-11-02 16:57:26 -0500
commit: 1ea5bdb00bfbc6f8034859cd19769346bf31dc53
tree: e9c45200a27b001f80e196148c8c71b209c77c72 /src/include/lib/simplehash.h
parent: b8f1c62807a58dc97e9262a17e7d0cadb305322b
1 files changed, 48 insertions, 2 deletions
diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h
index 9622131ede6..031a377da84 100644
--- a/src/include/lib/simplehash.h
+++ b/src/include/lib/simplehash.h
@@ -125,6 +125,7 @@
 #define SH_ITERATE SH_MAKE_NAME(iterate)
 #define SH_ALLOCATE SH_MAKE_NAME(allocate)
 #define SH_FREE SH_MAKE_NAME(free)
+#define SH_ESTIMATE_SPACE SH_MAKE_NAME(estimate_space)
 #define SH_STAT SH_MAKE_NAME(stat)
 
 /* internal helper functions (no externally visible prototypes) */
@@ -242,7 +243,10 @@ SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at);
 /* <element> *<prefix>_iterate(<prefix>_hash *tb, <prefix>_iterator *iter) */
 SH_SCOPE	SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter);
 
-/* void <prefix>_stat(<prefix>_hash *tb */
+/* size_t <prefix>_estimate_space(double nentries) */
+SH_SCOPE size_t SH_ESTIMATE_SPACE(double nentries);
+
+/* void <prefix>_stat(<prefix>_hash *tb) */
 SH_SCOPE void SH_STAT(SH_TYPE * tb);
 
 #endif							/* SH_DECLARE */
@@ -305,7 +309,7 @@ SH_SCOPE void SH_STAT(SH_TYPE * tb);
 
 /*
  * Compute allocation size for hashtable. Result can be passed to
- * SH_UPDATE_PARAMETERS.
+ * SH_UPDATE_PARAMETERS.  (Keep SH_ESTIMATE_SPACE in sync with this!)
  */
 static inline uint64
 SH_COMPUTE_SIZE(uint64 newsize)
@@ -1069,6 +1073,47 @@ SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter)
 }
 
 /*
+ * Estimate the amount of space needed for a hashtable with nentries entries.
+ * Return SIZE_MAX if that's too many entries.
+ *
+ * nentries is "double" because this is meant for use by the planner,
+ * which typically works with double rowcount estimates.  So we'd need to
+ * clamp to integer somewhere and that might as well be here.  We do expect
+ * the value not to be NaN or negative, else the result will be garbage.
+ */
+SH_SCOPE size_t
+SH_ESTIMATE_SPACE(double nentries)
+{
+	uint64		size;
+	uint64		space;
+
+	/* scale request by SH_FILLFACTOR, as SH_CREATE does */
+	nentries = nentries / SH_FILLFACTOR;
+
+	/* fail if we'd overrun SH_MAX_SIZE entries */
+	if (nentries >= SH_MAX_SIZE)
+		return SIZE_MAX;
+
+	/* should be safe to convert to uint64 */
+	size = (uint64) nentries;
+
+	/* supporting zero sized hashes would complicate matters */
+	size = Max(size, 2);
+
+	/* round up size to the next power of 2, that's how bucketing works */
+	size = pg_nextpower2_64(size);
+
+	/* calculate space needed for ->data */
+	space = ((uint64) sizeof(SH_ELEMENT_TYPE)) * size;
+
+	/* verify that allocation of ->data is possible on this platform */
+	if (space >= SIZE_MAX / 2)
+		return SIZE_MAX;
+
+	return (size_t) space + sizeof(SH_TYPE);
+}
+
+/*
  * Report some statistics about the state of the hashtable. For
  * debugging/profiling purposes only.
  */
@@ -1195,6 +1240,7 @@ SH_STAT(SH_TYPE * tb)
 #undef SH_ITERATE
 #undef SH_ALLOCATE
 #undef SH_FREE
+#undef SH_ESTIMATE_SPACE
 #undef SH_STAT
 
 /* internal function names */
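The arithmetic in SH_ESTIMATE_SPACE can be exercised outside PostgreSQL with a small standalone reproduction. This is a sketch, not simplehash.h itself: the `Demo*`/`demo_*` names, the entry and header structs, and `demo_nextpower2_64` (standing in for `pg_nextpower2_64`) are hypothetical, and `SH_FILLFACTOR` = 0.9 and `SH_MAX_SIZE` = 2^32 are assumed to match simplehash.h's defaults. It also shows how the new `SH_ESTIMATE_SPACE` macro turns into `<prefix>_estimate_space` via `SH_MAKE_NAME`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Name-building macros in the style of simplehash.h (local stand-ins). */
#define DemoConcat(x, y) x##y
#define SH_MAKE_PREFIX(a) DemoConcat(a, _)
#define SH_MAKE_NAME_(a, b) DemoConcat(a, b)
#define SH_MAKE_NAME(name) SH_MAKE_NAME_(SH_MAKE_PREFIX(SH_PREFIX), name)

/* Hypothetical instantiation; values assumed to match simplehash.h defaults. */
#define SH_PREFIX demo
#define SH_FILLFACTOR (0.9)
#define SH_MAX_SIZE (((uint64_t) UINT32_MAX) + 1)
#define SH_ESTIMATE_SPACE SH_MAKE_NAME(estimate_space)

/* Illustrative element and table-header types (not PostgreSQL's layouts). */
typedef struct demo_entry
{
	uint32_t	hash;
	char		status;
	void	   *tuple;
} demo_entry;

typedef struct demo_hash
{
	uint64_t	size;
	uint32_t	members;
	uint32_t	sizemask;
	uint32_t	grow_threshold;
	demo_entry *data;
	void	   *private_data;
} demo_hash;

/* Smallest power of 2 >= num (stand-in for pg_nextpower2_64; num >= 1). */
static uint64_t
demo_nextpower2_64(uint64_t num)
{
	uint64_t	result = 1;

	while (result < num)
		result <<= 1;
	return result;
}

/* Expands to: static size_t demo_estimate_space(double nentries) */
static size_t
SH_ESTIMATE_SPACE(double nentries)
{
	uint64_t	size;
	uint64_t	space;

	/* caller contract: not NaN and not negative (NaN also fails this) */
	assert(nentries >= 0);

	/* scale by the fill factor, so the table is not expected to be full */
	nentries = nentries / SH_FILLFACTOR;

	/* more buckets than this hash table design allows */
	if (nentries >= SH_MAX_SIZE)
		return SIZE_MAX;

	size = (uint64_t) nentries;
	if (size < 2)
		size = 2;
	size = demo_nextpower2_64(size);

	/* the bucket array ("->data") holds one element per bucket */
	space = ((uint64_t) sizeof(demo_entry)) * size;
	if (space >= SIZE_MAX / 2)
		return SIZE_MAX;

	/* plus the fixed-size table header */
	return (size_t) space + sizeof(demo_hash);
}
```

For 1000 entries, the fill factor raises the request to 1111 slots, which rounds up to 2048 buckets, so the estimate is 2048 elements plus one table header. Requests beyond 2^32 buckets report SIZE_MAX, which lets a planner-side caller treat the hashed plan as infeasible rather than overflow.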