Introduction
PostgreSQL does not natively offer "Adaptive Hash Joins" as a built-in feature for query execution.
Instead, the query planner selects join methods, such as nested loop, hash join, or merge join, during plan generation.
Additionally, it bases these selections on statistics and cost estimates available at the query planning stage.
Although PostgreSQL's planner efficiently selects join methods, it does not change join strategies adaptively during query execution. Consequently, it cannot adjust to real-time data distribution or size changes, as "Adaptive Hash Joins" might imply.
For more advanced join behaviours or to mimic adaptive join strategies, users often resort to custom solutions, like dynamic SQL in PL/pgSQL functions or application-side logic, to estimate data characteristics at runtime and choose the most appropriate join strategy.
Mimicking Adaptive Joins in PostgreSQL 16
To mimic adaptive joins in PostgreSQL 16, you can create a PL/pgSQL function that dynamically selects the join strategy based on the data size. Here’s a simplified example:
Suppose you have two tables, orders
and customers
, and you want to join them using either a nested loop or a hash join based on the number of rows.
1. Create a Function
CREATE OR REPLACE FUNCTION adaptive_join() RETURNS SETOF RECORD AS $ DECLARE row_count_orders INTEGER; row_count_customers INTEGER; query TEXT; BEGIN -- Estimate row counts SELECT count(*) INTO row_count_orders FROM orders; SELECT count(*) INTO row_count_customers FROM customers; -- Choose join strategy based on row count IF row_count_orders > 1000 OR row_count_customers > 1000 THEN query := 'SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id'; ELSE query := 'SELECT * FROM orders, customers WHERE orders.customer_id = customers.id'; END IF; -- Execute the chosen query RETURN QUERY EXECUTE query; END; $ LANGUAGE plpgsql;
2. Use the Function
SELECT * FROM adaptive_join() AS t(column_definitions);
Conclusion
In this example, the function adaptive_join
first estimates the number of rows in each table. If either table has more than 1000 rows, it uses a regular JOIN (which might use a hash join depending on the planner's decision); otherwise, it uses a comma-separated JOIN syntax, which tends to favour nested loop joins. This approach is a rudimentary form of an adaptive join, where the decision is based on row count thresholds.