author    Peter Geoghegan <pg@bowt.ie>    2019-04-23 10:33:57 -0700
committer Peter Geoghegan <pg@bowt.ie>    2019-04-23 10:33:57 -0700
commit    9b10926263d831fac5758f1493c929a49b55669b (patch)
tree      a9a4d23e46c576b541d36d75d99a87d6f42b9232 /src/include
parent    f4a3fdfbdcd3763c42111318d004c2e90d072021 (diff)
Prevent O(N^2) unique index insertion edge case.
Commit dd299df8 made nbtree treat heap TID as a tiebreaker column, establishing the principle that there is only one correct location (page and page offset number) for every index tuple, no matter what. Insertions of tuples into non-unique indexes proceed as if heap TID (scan key's scantid) is just another user-attribute value, but insertions into unique indexes are more delicate. The TID value in scantid must initially be omitted to ensure that the unique index insertion visits every leaf page that duplicates could be on. The scantid is set once again after unique checking finishes successfully, which can force _bt_findinsertloc() to step right one or more times, to locate the leaf page that the new tuple must be inserted on.

Stepping right within _bt_findinsertloc() was assumed to occur no more frequently than stepping right within _bt_check_unique(), but there was one important case where that assumption was incorrect: inserting a "duplicate" with NULL values. Since _bt_check_unique() didn't do any real work in this case, it wasn't appropriate for _bt_findinsertloc() to behave as if it was finishing off a conventional unique insertion, where any existing physical duplicate must be dead or recently dead. _bt_findinsertloc() might have to grovel through a substantial portion of all of the leaf pages in the index to insert a single tuple, even when there were no dead tuples.

To fix, treat insertions of tuples with NULLs into a unique index as if they were insertions into a non-unique index: never unset scantid before calling _bt_search() to descend the tree, and bypass _bt_check_unique() entirely. _bt_check_unique() is no longer responsible for incoming tuples with NULL values.

Discussion: https://postgr.es/m/CAH2-Wzm08nr+JPx4jMOa9CGqxWYDQ-_D4wtPBiKghXAUiUy-nQ@mail.gmail.com
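The fix reduces to one predicate. Below is a minimal, self-contained C sketch of the gating this commit adds to _bt_doinsert() in nbtinsert.c; the enum, struct, and main() scaffolding are invented for illustration and are not the real nbtree declarations, but the predicate mirrors the commit's rule: unique checking, and the temporary unsetting of scantid that goes with it, is skipped whenever any key attribute is NULL.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for the nbtree types; only the gating decision is modeled */
    typedef enum { UNIQUE_CHECK_NO, UNIQUE_CHECK_YES } IndexUniqueCheck;
    typedef struct { bool anynullkeys; } BTScanInsertData;

    /*
     * A NULL key value can never be equal to anything, so an insertion with
     * anynullkeys set proceeds as if the index were non-unique: scantid
     * stays set for the initial _bt_search() descent, and
     * _bt_check_unique() is bypassed entirely.
     */
    static bool
    checkingunique(IndexUniqueCheck checkUnique, const BTScanInsertData *key)
    {
        return checkUnique != UNIQUE_CHECK_NO && !key->anynullkeys;
    }

    int
    main(void)
    {
        BTScanInsertData key = {.anynullkeys = true};

        printf("NULL keys:     check unique? %d\n",
               checkingunique(UNIQUE_CHECK_YES, &key));    /* 0: bypassed */
        key.anynullkeys = false;
        printf("non-NULL keys: check unique? %d\n",
               checkingunique(UNIQUE_CHECK_YES, &key));    /* 1: checked */
        return 0;
    }

Because the predicate is false for NULL-bearing tuples, scantid is never unset for them, so _bt_search() descends directly to the single correct leaf page and the O(N^2) rightward groveling in _bt_findinsertloc() cannot arise.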
Diffstat (limited to 'src/include')
-rw-r--r--  src/include/access/nbtree.h | 10
1 file changed, 10 insertions, 0 deletions
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index fbc8134cfdb..6c1acd4855f 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -435,6 +435,15 @@ typedef BTStackData *BTStack;
* indexes whose version is >= version 4. It's convenient to keep this close
* by, rather than accessing the metapage repeatedly.
*
+ * anynullkeys indicates whether any key was NULL when the scan key was
+ * built from the index tuple (already-truncated tuple key attributes set
+ * NULL as a placeholder, which also affects anynullkeys). This is a
+ * convenience for unique index non-pivot tuple insertion, which usually
+ * unsets scantid temporarily, but must leave it set when, and only when,
+ * anynullkeys is true. Generally matches the non-pivot tuple's HasNulls
+ * bit, but may not when inserting into an INCLUDE index (the tuple header
+ * reflects the NULL-ness of both key and non-key attributes).
+ *
* When nextkey is false (the usual case), _bt_search and _bt_binsrch will
* locate the first item >= scankey. When nextkey is true, they will locate
* the first item > scan key.
@@ -461,6 +470,7 @@ typedef BTStackData *BTStack;
typedef struct BTScanInsertData
{
	bool		heapkeyspace;
+	bool		anynullkeys;
	bool		nextkey;
	bool		pivotsearch;
	ItemPointer	scantid;		/* tiebreaker for scankeys */
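For context on how the new field gets its value: anynullkeys is derived while the insertion scan key is built (in PostgreSQL this happens in _bt_mkscankey()). The following is a hypothetical, self-contained model of that loop, not the verbatim code; the type, function names, and the attisnull[] array standing in for index_getattr()'s isnull output are all assumptions made for illustration. Per the comment above, already-truncated attributes read back as NULL placeholders, so they set the flag as well.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the real BTScanInsertData; only anynullkeys is modeled */
    typedef struct
    {
        bool    anynullkeys;
    } BTScanInsertData;

    static void
    build_scankey(BTScanInsertData *key, const bool *attisnull, int natts)
    {
        key->anynullkeys = false;       /* initial assumption */
        for (int i = 0; i < natts; i++)
        {
            /* any NULL key attribute, truncated or not, flips the flag */
            if (attisnull[i])
                key->anynullkeys = true;
            /* ... fill in the ScanKey entry for attribute i + 1 ... */
        }
    }

    int
    main(void)
    {
        BTScanInsertData key;
        bool    nulls[] = {false, true, false};     /* second key is NULL */

        build_scankey(&key, nulls, 3);
        printf("anynullkeys = %d\n", key.anynullkeys);  /* prints 1 */
        return 0;
    }

Computing the flag once at scan key construction is what lets the insertion path make the unique-vs-non-unique decision up front, without re-inspecting the tuple's attributes.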