
Are you curious why PostgreSQL lacks query hints and how you can still influence execution plans? Discover alternative techniques to force join orders in PostgreSQL and optimize query performance.
Why PostgreSQL avoids query hints?
Dive into the PostgreSQL TODO list and explore the discussion behind optimizer hints, listed under “Features We Do Not Want.” While not universally agreed upon, PostgreSQL’s extensibility provides a solution for query hints through the pg_hint_plan extension. However, if you prefer not to install third-party software or lack access to the operating system, read on for alternative methods to shape join orders.
Real-life example: Controlling the join order
Consider three tables—’a’, ‘b’, and ‘c’—and the goal of calculating their natural join. PostgreSQL’s optimizer initially selects the following plan:
SELECT b.b_id, a.value
FROM a
JOIN b USING (a_id)
JOIN c USING (a_id)
WHERE c.c_id < 300;
Explore the optimizer barriers
To enforce your preferred join order, you must understand PostgreSQL’s query planning process. PostgreSQL rearranges queries extensively to find the most efficient execution plan. However, we seek optimizer barriers that prevent this reordering. Discover two similar techniques that leverage subqueries to thwart optimization.
- Using OFFSET 0: By including an OFFSET clause (e.g., OFFSET 0) within a subquery in the FROM clause, you prevent PostgreSQL from flattening the query and reshuffling the join order.
- Using a common table expression (CTE): CTEs offer an alternative approach. Specify the desired join order within a CTE and reference it in the main query. Prior to PostgreSQL v12, CTEs were automatically optimizer barriers. From v12 onwards, you can use the MATERIALIZED keyword to maintain CTEs as barriers.
WITH subq AS MATERIALIZED (
SELECT a_id, b.b_id, c.c_id
FROM b
JOIN c USING (a_id)
WHERE c.c_id < 300
)
SELECT subq.b_id, a.value
FROM a
JOIN subq USING (a_id);
Managing join order and join_collapse_limit
PostgreSQL optimizes join orders based on the query structure. For queries with multiple joins, PostgreSQL explores various join orders to find the best execution plan. However, to ensure planning efficiency, PostgreSQL imposes limits on the number of potential combinations considered. These limits can be adjusted using the join_collapse_limit and from_collapse_limit parameters.
However, when the number of tables exceeds the threshold (typically 12), PostgreSQL employs a genetic query optimizer, generating multiple plans and evolving the most promising ones. This genetic optimizer can produce non-deterministic query plans.
Deliberately influencing the optimizer
To exert complete control over join orders, set join_collapse_limit to 1. This forces the optimizer to adhere strictly to the order specified in the SQL statement. Be cautious with this setting, as it can negatively impact other queries. To change the parameter for a single query, consider using explicit READ ONLY transactions or running the query within a database function.
Unlock PostgreSQL’s optimization potential
While PostgreSQL avoids query hints, you can still influence query execution plans. By understanding optimizer barriers and adjusting relevant parameters, you can shape join orders and optimize query performance. Explore these techniques further and dive into topics like UNION ALL performance and different join strategies. Additionally, learn about the power of EXPLAIN (ANALYZE) for query analysis.