author | David Rowley <drowley@postgresql.org> | 2024-08-20 13:38:22 +1200 |
---|---|---|
committer | David Rowley <drowley@postgresql.org> | 2024-08-20 13:38:22 +1200 |
commit | adf97c1562380e02acd60dc859c289ed3a8352ee (patch) | |
tree | bbc199a61078c00d997903c4d5ce0c2fdccc7224 /src/include/nodes/execnodes.h | |
parent | 9380e5f129d2a160ecc2444f61bb7cb97fd51fbb (diff) |
Speed up Hash Join by making ExprStates support hashing

Here we add ExprState support for obtaining a 32-bit hash value from a
list of expressions. This allows both faster hashing and also JIT
compilation of these expressions. This is especially useful when hash
joins have multiple join keys. The previous code called ExecEvalExpr on
each hash join key individually, which was inefficient because tuple
deformation only took one key into account at a time and could end up
walking the tuple once for each join key. With the new
code, we'll determine the maximum attribute required and deform the tuple
to that point only once.
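
The difference between hashing key by key and deforming once can be sketched with a toy model. Everything here is hypothetical and is not PostgreSQL code: `ToyTuple`, `toy_deform`, `toy_hash_attr`, `hash_keys_per_key` and `hash_keys_once` are illustrative names. The model assumes that reaching attribute N costs a walk over attributes 0..N and that the per-key variant restarts that walk for every key, which is a deliberate simplification of how real tuple slots behave. The rotate-then-XOR combining step is one common way to fold per-key hashes into a 32-bit value and is an assumption of this sketch, not something this page confirms.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_ATTS 16

/* Toy tuple (hypothetical): reaching attribute N means walking 0..N. */
typedef struct ToyTuple
{
    const uint32_t *attrs;      /* attribute values in column order */
    int         natts;          /* number of attributes */
    long        steps;          /* attribute visits performed so far */
} ToyTuple;

/* Rotate a 32-bit word left by n bits (valid for 0 < n < 32). */
static uint32_t
rotl32(uint32_t w, int n)
{
    return (w << n) | (w >> (32 - n));
}

/* Stand-in for a data type's hash function; an arbitrary integer mixer. */
static uint32_t
toy_hash_attr(uint32_t v)
{
    v ^= v >> 16;
    v *= 0x45d9f3bU;
    v ^= v >> 16;
    return v;
}

/* "Deform" attributes 0..upto into values[], counting every visit. */
static void
toy_deform(ToyTuple *tup, int upto, uint32_t *values)
{
    assert(upto >= 0 && upto < tup->natts && tup->natts <= TOY_MAX_ATTS);
    for (int i = 0; i <= upto; i++)
    {
        values[i] = tup->attrs[i];
        tup->steps++;
    }
}

/* Per-key variant: each key triggers its own walk over the tuple. */
static uint32_t
hash_keys_per_key(ToyTuple *tup, const int *keys, int nkeys)
{
    uint32_t    values[TOY_MAX_ATTS];
    uint32_t    hash = 0;

    for (int k = 0; k < nkeys; k++)
    {
        toy_deform(tup, keys[k], values);
        hash = rotl32(hash, 1) ^ toy_hash_attr(values[keys[k]]);
    }
    return hash;
}

/* Single-pass variant: find the highest key attribute, deform once. */
static uint32_t
hash_keys_once(ToyTuple *tup, const int *keys, int nkeys)
{
    uint32_t    values[TOY_MAX_ATTS];
    uint32_t    hash = 0;
    int         maxatt = 0;

    assert(nkeys > 0);
    for (int k = 0; k < nkeys; k++)
    {
        if (keys[k] > maxatt)
            maxatt = keys[k];
    }

    toy_deform(tup, maxatt, values);    /* the only walk over the tuple */

    for (int k = 0; k < nkeys; k++)
        hash = rotl32(hash, 1) ^ toy_hash_attr(values[keys[k]]);
    return hash;
}
```

In this model, a five-attribute tuple with keys at positions 1, 3 and 4 costs 2 + 4 + 5 = 11 attribute visits in the per-key variant and 5 in the single-pass variant, and both return the same hash value.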

Some performance tests done with this change have shown up to a 20%
performance increase of a query containing a Hash Join without JIT
compilation and up to a 26% performance increase when JIT is enabled and
optimization and inlining were performed by the JIT compiler. The
performance increase with 1 join column was smaller, at 14% both with
and without JIT. This test was done using a fairly small hash
table and a large number of hash probes. The increase will likely be
less with large tables, especially ones larger than L3 cache as memory
pressure is more likely to be the limiting factor there.

This commit only addresses Hash Joins, but lays expression evaluation
and JIT compilation infrastructure for other hashing needs such as Hash
Aggregate.

Author: David Rowley
Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net>
Reviewed-by: Tels <nospam-pg-abuse@bloodgate.com>
Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com
Diffstat (limited to 'src/include/nodes/execnodes.h')
-rw-r--r-- | src/include/nodes/execnodes.h | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 87f1519ec65..af7d8fd1e72 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2184,8 +2184,7 @@ typedef struct MergeJoinState
  *	 HashJoinState information
  *
  *		hashclauses				original form of the hashjoin condition
- *		hj_OuterHashKeys		the outer hash keys in the hashjoin condition
- *		hj_HashOperators		the join operators in the hashjoin condition
+ *		hj_OuterHash			ExprState for hashing outer keys
  *		hj_HashTable			hash table for the hashjoin
  *								(NULL if table not built yet)
  *		hj_CurHashValue			hash value for current outer tuple
@@ -2215,9 +2214,7 @@ typedef struct HashJoinState
 {
 	JoinState	js;				/* its first field is NodeTag */
 	ExprState  *hashclauses;
-	List	   *hj_OuterHashKeys;	/* list of ExprState nodes */
-	List	   *hj_HashOperators;	/* list of operator OIDs */
-	List	   *hj_Collations;
+	ExprState  *hj_OuterHash;
 	HashJoinTable hj_HashTable;
 	uint32		hj_CurHashValue;
 	int			hj_CurBucketNo;
@@ -2770,7 +2767,10 @@ typedef struct HashState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	HashJoinTable hashtable;	/* hash table for the hashjoin */
-	List	   *hashkeys;		/* list of ExprState nodes */
+	ExprState  *hash_expr;		/* ExprState to get hash value */
+
+	FmgrInfo   *skew_hashfunction;	/* lookup data for skew hash function */
+	Oid			skew_collation; /* collation to call skew_hashfunction with */
 
 	/*
 	 * In a parallelized hash join, the leader retains a pointer to the