Looking under the hood of query transformation by the PostgreSQL cost-based optimizer (CBO), with simple real-life examples

Query transformation is a key step in PostgreSQL’s cost-based optimizer (CBO), the query planner. Before it compares the estimated costs of alternative plans, the planner rewrites the parsed query into a semantically equivalent form that makes cheaper plans available. Typical transformations include flattening simple subqueries into the outer query, converting IN and EXISTS subqueries into semi-joins, and removing joins that cannot affect the result.

A common example involves subqueries in the FROM clause. Consider the following query, which computes the total revenue for products sold in the first quarter of 2022:

SELECT SUM(price * quantity) AS revenue
FROM sales
JOIN products ON sales.product_id = products.id
WHERE sales.date BETWEEN '2022-01-01' AND '2022-03-31';

The same query can also be written with a subquery in the FROM clause:

SELECT SUM(revenue) AS revenue
FROM (
    SELECT price * quantity AS revenue
    FROM sales
    JOIN products ON sales.product_id = products.id
    WHERE sales.date BETWEEN '2022-01-01' AND '2022-03-31'
) AS subquery;

The two queries are semantically equivalent, but PostgreSQL’s transformation runs in the opposite direction: the planner pulls up (flattens) a simple subquery like this one into the outer query, so the second form is planned as if it had been written as the first. Flattening lets the planner choose the join order and join method for sales and products as part of the whole query instead of optimizing the subquery in isolation. As a result, both forms normally get the same plan, which you can confirm by comparing their EXPLAIN output.
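The two queries above are interchangeable only because they always return the same result. The sketch below checks that on toy data. It uses Python’s built-in sqlite3 module with an in-memory SQLite database, not PostgreSQL, so it says nothing about the plans PostgreSQL chooses; the schema and rows are hypothetical and model only the columns the queries use.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration (SQLite, not PostgreSQL).
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE sales (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    customer_id INTEGER,
    date TEXT,        -- ISO-8601 strings compare correctly as text
    quantity INTEGER
);
INSERT INTO products VALUES (1, 'Product X', 10.0), (2, 'Product Y', 2.5);
INSERT INTO sales VALUES
    (1, 1, 100, '2022-01-15', 3),
    (2, 2, 101, '2022-02-20', 4),
    (3, 1, 101, '2022-03-31', 1),
    (4, 2, 100, '2022-04-01', 8);  -- outside the date range
"""

# The flat form from the article.
FLAT = """
SELECT SUM(price * quantity) AS revenue
FROM sales
JOIN products ON sales.product_id = products.id
WHERE sales.date BETWEEN '2022-01-01' AND '2022-03-31'
"""

# The subquery form from the article.
NESTED = """
SELECT SUM(revenue) AS revenue
FROM (
    SELECT price * quantity AS revenue
    FROM sales
    JOIN products ON sales.product_id = products.id
    WHERE sales.date BETWEEN '2022-01-01' AND '2022-03-31'
) AS subquery
"""

def revenue(conn, sql):
    """Run a query that returns a single aggregate value."""
    return conn.execute(sql).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
flat_result = revenue(conn, FLAT)
nested_result = revenue(conn, NESTED)
print(flat_result, nested_result)
```

Both forms sum the same three in-range sales, which is exactly the property that lets a planner treat one as a rewrite of the other.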

Another example is reducing the number of joins a query needs. Consider the following query, which lists the customers who have purchased a particular product:

SELECT customers.name
FROM customers
JOIN sales ON customers.id = sales.customer_id
JOIN products ON sales.product_id = products.id
WHERE products.name = 'Product X';

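If products.name is unique, the join with products can be replaced by a lookup of the product’s id, folded into the join condition between customers and sales:

SELECT customers.name
FROM customers
JOIN sales ON customers.id = sales.customer_id AND sales.product_id = (
    SELECT id FROM products WHERE name = 'Product X'
);

This rewrite removes one join: the uncorrelated subquery is evaluated once, and its result can then filter sales directly, for example through an index on sales.product_id if one exists. The uniqueness assumption is essential. If more than one product is named 'Product X', PostgreSQL raises an error ("more than one row returned by a subquery used as an expression"), whereas the original join query would have returned the buyers of every matching product. Note also that PostgreSQL’s planner does not perform this particular rewrite on its own. Its built-in join removal is narrower: it drops a LEFT JOIN when the inner table is joined on a unique key and none of its columns are used elsewhere in the query, because such a join can neither add nor remove rows.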
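The lookup rewrite is only equivalent to the original join when at most one product carries the name 'Product X'. The sketch below checks this on toy data, first with a unique name and then after a second, hypothetical product with the same name is added. It uses Python’s sqlite3 module with an in-memory SQLite database, not PostgreSQL. Unlike PostgreSQL, SQLite does not raise an error when a scalar subquery returns several rows; it silently uses the first row, so the broken assumption shows up as a different result set instead of an error.

```python
import sqlite3

# Hypothetical tables and rows for illustration only (SQLite, not PostgreSQL).
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO products VALUES (10, 'Product X'), (20, 'Product Y');
INSERT INTO sales VALUES (1, 1, 10), (2, 2, 20), (3, 3, 10);
"""

# Original form: join through products and filter on its name.
JOIN_FORM = """
SELECT customers.name
FROM customers
JOIN sales ON customers.id = sales.customer_id
JOIN products ON sales.product_id = products.id
WHERE products.name = 'Product X'
"""

# Rewritten form: look up the product id with a scalar subquery.
LOOKUP_FORM = """
SELECT customers.name
FROM customers
JOIN sales ON customers.id = sales.customer_id AND sales.product_id = (
    SELECT id FROM products WHERE name = 'Product X'
)
"""

def names(conn, sql):
    # Sort so the comparison ignores row order, which SQL does not guarantee.
    return sorted(row[0] for row in conn.execute(sql))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
unique_join = names(conn, JOIN_FORM)
unique_lookup = names(conn, LOOKUP_FORM)
print(unique_join, unique_lookup)

# Break the assumption: a second product is also named 'Product X', and Bob buys it.
conn.execute("INSERT INTO products VALUES (30, 'Product X')")
conn.execute("INSERT INTO sales VALUES (4, 2, 30)")
dup_join = names(conn, JOIN_FORM)
dup_lookup = names(conn, LOOKUP_FORM)
print(dup_join, dup_lookup)
```

With a unique name both forms return the same customers; with a duplicate name the join form returns the buyers of both products while the lookup form sees only one of them.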

In conclusion, query transformation lets PostgreSQL’s planner rewrite a query into an equivalent form before costing it, for example by flattening simple subqueries, turning IN and EXISTS subqueries into semi-joins, and removing joins that cannot change the result. These rewrites widen the set of plans the optimizer can consider, which often leads to faster execution without any change to the SQL you write. Rewrites the planner does not perform itself, such as the product lookup above, remain the responsibility of whoever writes the query, along with checking that they preserve its meaning.

About Shiv Iyer 460 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.