Influencing PostgreSQL Query Planner: Understanding Query Hints and Optimizer Barriers

Are you curious why PostgreSQL lacks query hints and how you can still influence execution plans? Discover alternative techniques to force join orders in PostgreSQL and optimize query performance.

Why PostgreSQL avoids query hints?

Dive into the PostgreSQL TODO list and explore the discussion behind optimizer hints, listed under “Features We Do Not Want.” While not universally agreed upon, PostgreSQL’s extensibility provides a solution for query hints through the pg_hint_plan extension. However, if you prefer not to install third-party software or lack access to the operating system, read on for alternative methods to shape join orders.

Real-life example: Controlling the join order

Consider three tables—’a’, ‘b’, and ‘c’—and the goal of calculating their natural join. PostgreSQL’s optimizer initially selects the following plan:

SELECT b.b_id, a.value
JOIN b USING (a_id)
JOIN c USING (a_id)
WHERE c.c_id < 300;

Explore the optimizer barriers

To enforce your preferred join order, you must understand PostgreSQL’s query planning process. PostgreSQL rearranges queries extensively to find the most efficient execution plan. However, we seek optimizer barriers that prevent this reordering. Discover two similar techniques that leverage subqueries to thwart optimization.

  1. Using OFFSET 0: By including an OFFSET clause (e.g., OFFSET 0) within a subquery in the FROM clause, you prevent PostgreSQL from flattening the query and reshuffling the join order.
  2. Using a common table expression (CTE): CTEs offer an alternative approach. Specify the desired join order within a CTE and reference it in the main query. Prior to PostgreSQL v12, CTEs were automatically optimizer barriers. From v12 onwards, you can use the MATERIALIZED keyword to maintain CTEs as barriers.

SELECT a_id, b.b_id, c.c_id
JOIN c USING (a_id)
WHERE c.c_id < 300
SELECT subq.b_id, a.value
JOIN subq USING (a_id);

Managing join order and join_collapse_limit

PostgreSQL optimizes join orders based on the query structure. For queries with multiple joins, PostgreSQL explores various join orders to find the best execution plan. However, to ensure planning efficiency, PostgreSQL imposes limits on the number of potential combinations considered. These limits can be adjusted using the join_collapse_limit and from_collapse_limit parameters.

However, when the number of tables exceeds the threshold (typically 12), PostgreSQL employs a genetic query optimizer, generating multiple plans and evolving the most promising ones. This genetic optimizer can produce non-deterministic query plans.

Deliberately influencing the optimizer

To exert complete control over join orders, set join_collapse_limit to 1. This forces the optimizer to adhere strictly to the order specified in the SQL statement. Be cautious with this setting, as it can negatively impact other queries. To change the parameter for a single query, consider using explicit READ ONLY transactions or running the query within a database function.

Unlock PostgreSQL’s optimization potential

While PostgreSQL avoids query hints, you can still influence query execution plans. By understanding optimizer barriers and adjusting relevant parameters, you can shape join orders and optimize query performance. Explore these techniques further and dive into topics like UNION ALL performance and different join strategies. Additionally, learn about the power of EXPLAIN (ANALYZE) for query analysis.

About Shiv Iyer 330 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.