Trim TIDs during parallel GIN builds more eagerly

The parallel GIN builds perform "freezing" of TID lists when merging chunks built earlier. This means determining what part of the list can no longer change, depending on the last received chunk. The frozen part can be evicted from memory and written out. The code attempted to freeze items right before merging the old and new TID list, after already attempting to trim the current buffer. That means part of the data may get frozen based on the new TID list, but will be trimmed later (on next loop). This increases memory usage. This inverts the order, so that we freeze data first (before trimming). The benefits are likely relatively small, but it's also virtually free with no other downsides. Discussion: https://postgr.es/m/CAHLJuCWDwn-PE2BMZE4Kux7x5wWt_6RoWtA0mUQffEDLeZ6sfA@mail.gmail.com
author: Tomas Vondra <tomas.vondra@postgresql.org> 2025-11-04 19:30:17 +0100
committer: Tomas Vondra <tomas.vondra@postgresql.org> 2025-11-04 20:06:01 +0100
commit: 1213cb475391640508d2495b2b560329897d152c (patch)
tree: 701ac891cd93a8db7eb937f8de4d1ac52f2ddba3
parent: 6d2ff1de4d0b66eb0288e21021c3741b9df1df0d (diff)
1 files changed, 36 insertions, 42 deletions
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 61869d9d931..c2b879b2bf6 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -1401,10 +1401,46 @@ GinBufferKeyEquals(GinBuffer *buffer, GinTuple *tup)
  * enough TIDs to trim (with values less than "first" TID from the new tuple),
  * we do the trim. By enough we mean at least 128 TIDs (mostly an arbitrary
  * number).
+ *
+ * We try freezing TIDs at the beginning of the list first, before attempting
+ * to trim the buffer. This may allow trimming the data earlier, reducing the
+ * memory usage and excluding it from the mergesort.
  */
 static bool
 GinBufferShouldTrim(GinBuffer *buffer, GinTuple *tup)
 {
+	/*
+	 * Check if the last TID in the current list is frozen. This is the case
+	 * when merging non-overlapping lists, e.g. in each parallel worker.
+	 */
+	if ((buffer->nitems > 0) &&
+		(ItemPointerCompare(&buffer->items[buffer->nitems - 1],
+							GinTupleGetFirst(tup)) == 0))
+		buffer->nfrozen = buffer->nitems;
+
+	/*
+	 * Now find the last TID we know to be frozen, i.e. the last TID right
+	 * before the new GIN tuple.
+	 *
+	 * Start with the first not-yet-frozen tuple, and walk until we find the
+	 * first TID that's higher. If we already know the whole list is frozen
+	 * (i.e. nfrozen == nitems), this does nothing.
+	 *
+	 * XXX This might do a binary search for sufficiently long lists, but it
+	 * does not seem worth the complexity. Overlapping lists should be rare
+	 * common, TID comparisons are cheap, and we should quickly freeze most of
+	 * the list.
+	 */
+	for (int i = buffer->nfrozen; i < buffer->nitems; i++)
+	{
+		/* Is the TID after the first TID of the new tuple? Can't freeze. */
+		if (ItemPointerCompare(&buffer->items[i],
+							   GinTupleGetFirst(tup)) > 0)
+			break;
+
+		buffer->nfrozen++;
+	}
+
 	/* not enough TIDs to trim (1024 is somewhat arbitrary number) */
 	if (buffer->nfrozen < 1024)
 		return false;
@@ -1470,48 +1506,6 @@ GinBufferStoreTuple(GinBuffer *buffer, GinTuple *tup)
 			buffer->key = (Datum) 0;
 	}
 
-	/*
-	 * Try freeze TIDs at the beginning of the list, i.e. exclude them from
-	 * the mergesort. We can do that with TIDs before the first TID in the new
-	 * tuple we're about to add into the buffer.
-	 *
-	 * We do this incrementally when adding data into the in-memory buffer,
-	 * and not later (e.g. when hitting a memory limit), because it allows us
-	 * to skip the frozen data during the mergesort, making it cheaper.
-	 */
-
-	/*
-	 * Check if the last TID in the current list is frozen. This is the case
-	 * when merging non-overlapping lists, e.g. in each parallel worker.
-	 */
-	if ((buffer->nitems > 0) &&
-		(ItemPointerCompare(&buffer->items[buffer->nitems - 1],
-							GinTupleGetFirst(tup)) == 0))
-		buffer->nfrozen = buffer->nitems;
-
-	/*
-	 * Now find the last TID we know to be frozen, i.e. the last TID right
-	 * before the new GIN tuple.
-	 *
-	 * Start with the first not-yet-frozen tuple, and walk until we find the
-	 * first TID that's higher. If we already know the whole list is frozen
-	 * (i.e. nfrozen == nitems), this does nothing.
-	 *
-	 * XXX This might do a binary search for sufficiently long lists, but it
-	 * does not seem worth the complexity. Overlapping lists should be rare
-	 * common, TID comparisons are cheap, and we should quickly freeze most of
-	 * the list.
-	 */
-	for (int i = buffer->nfrozen; i < buffer->nitems; i++)
-	{
-		/* Is the TID after the first TID of the new tuple? Can't freeze. */
-		if (ItemPointerCompare(&buffer->items[i],
-							   GinTupleGetFirst(tup)) > 0)
-			break;
-
-		buffer->nfrozen++;
-	}
-
 	/* add the new TIDs into the buffer, combine using merge-sort */
 	{
 		int			nnew;
author	Tomas Vondra <tomas.vondra@postgresql.org>	2025-11-04 19:30:17 +0100
committer	Tomas Vondra <tomas.vondra@postgresql.org>	2025-11-04 20:06:01 +0100
commit	1213cb475391640508d2495b2b560329897d152c (patch)
tree	701ac891cd93a8db7eb937f8de4d1ac52f2ddba3
parent	6d2ff1de4d0b66eb0288e21021c3741b9df1df0d (diff)