Adaptive Joins in PostgreSQL 16 (mimicked feature)

Introduction

PostgreSQL does not natively offer "Adaptive Hash Joins" as a built-in feature for query execution.
Instead, the query planner selects join methods, such as nested loop, hash join, or merge join, during plan generation.
Additionally, it bases these selections on statistics and cost estimates available at the query planning stage.
Although PostgreSQL's planner efficiently selects join methods, it does not change join strategies adaptively during query execution. Consequently, it cannot adjust to real-time data distribution or size changes, as "Adaptive Hash Joins" might imply.

For more advanced join behaviours or to mimic adaptive join strategies, users often resort to custom solutions, like dynamic SQL in PL/pgSQL functions or application-side logic, to estimate data characteristics at runtime and choose the most appropriate join strategy.

Mimicking Adaptive Joins in PostgreSQL 16

To mimic adaptive joins in PostgreSQL 16, you can create a PL/pgSQL function that dynamically selects the join strategy based on the data size. Here’s a simplified example:

Suppose you have two tables, orders and customers, and you want to join them using either a nested loop or a hash join based on the number of rows.

1. Create a Function

CREATE OR REPLACE FUNCTION adaptive_join() RETURNS SETOF RECORD AS $
DECLARE
    row_count_orders INTEGER;
    row_count_customers INTEGER;
    query TEXT;
BEGIN
    -- Estimate row counts
    SELECT count(*) INTO row_count_orders FROM orders;
    SELECT count(*) INTO row_count_customers FROM customers;

    -- Choose join strategy based on row count
    IF row_count_orders > 1000 OR row_count_customers > 1000 THEN
        query := 'SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id';
    ELSE
        query := 'SELECT * FROM orders, customers WHERE orders.customer_id = customers.id';
    END IF;

    -- Execute the chosen query
    RETURN QUERY EXECUTE query;
END;
$ LANGUAGE plpgsql;

2. Use the Function

SELECT * FROM adaptive_join() AS t(column_definitions);

Conclusion

In this example, the function adaptive_join first estimates the number of rows in each table. If either table has more than 1000 rows, it uses a regular JOIN (which might use a hash join depending on the planner's decision); otherwise, it uses a comma-separated JOIN syntax, which tends to favour nested loop joins. This approach is a rudimentary form of an adaptive join, where the decision is based on row count thresholds.

About Shiv Iyer 497 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.