How to Implement Parallel Queries in PostgreSQL 15 for Improved Performance

PostgreSQL 15 supports partitionwise joins, a planner feature that has been available since PostgreSQL 11 and is disabled by default. Combined with parallel query, it can improve the performance of joins between large partitioned tables. Here’s how to use the two together:

Ensure that your query joins two or more large partitioned tables. Partitionwise joins apply only when the joined tables are partitioned on the same key with matching partition bounds and the join condition includes that key.
Partition your tables using PostgreSQL’s built-in declarative partitioning. Partitioning divides a large table into smaller partitions based on a partition key, which can improve query performance by reducing the amount of data that needs to be scanned.
Set the max_parallel_workers_per_gather configuration parameter to cap the number of parallel workers that a single Gather or Gather Merge node can use. The default is 2, and setting it to 0 disables parallel query.
Check that the enable_parallel_hash configuration parameter is on (the default) so that the planner can choose parallel hash joins. In a parallel hash join the workers build one shared hash table, which can be faster than a non-parallel hash join on large tables.
Set the enable_partitionwise_join configuration parameter to on, since it is off by default. Partitionwise joins allow each pair of matching partitions to be joined independently, which can improve parallel query performance.
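The partitioning step can be sketched as follows. This is a minimal illustration rather than a recommended schema: it assumes two hypothetical tables, orders and lineitems, each hash-partitioned on orderid into four partitions. The column lists and partition names are assumptions made for the example.

```sql
-- Hypothetical schema: both tables are hash-partitioned on orderid with
-- identical modulus/remainder bounds, so their partitions pair up one-to-one.
CREATE TABLE orders (
    orderid   bigint NOT NULL,
    orderdate date   NOT NULL
) PARTITION BY HASH (orderid);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

CREATE TABLE lineitems (
    orderid  bigint  NOT NULL,
    line_no  integer NOT NULL,
    quantity integer NOT NULL
) PARTITION BY HASH (orderid);

CREATE TABLE lineitems_p0 PARTITION OF lineitems FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE lineitems_p1 PARTITION OF lineitems FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE lineitems_p2 PARTITION OF lineitems FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE lineitems_p3 PARTITION OF lineitems FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

Using the same partition key type and the same bounds on both tables is what lets the planner match orders_p0 with lineitems_p0, and so on, when it considers a partitionwise join.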

Here’s an example query that uses partitionwise joins:

SELECT *
FROM orders o
JOIN lineitems l ON o.orderid = l.orderid
WHERE o.orderdate BETWEEN '2022-01-01' AND '2022-12-31'

Assuming that the orders and lineitems tables are both partitioned by orderid with matching partition bounds, you can allow the planner to use a partitionwise join by running the following statement in the session before the query:

SET enable_partitionwise_join = on;

Parallel hash joins are enabled by default. If they have been turned off in your configuration, you can re-enable them with the following statement:

SET enable_parallel_hash = on;

Finally, you can set the maximum number of parallel workers per gather operation by adding the following statement:

SET max_parallel_workers_per_gather=4;

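This statement allows up to 4 parallel workers per Gather node, up from the default of 2. More workers are not always faster, because each one adds startup and coordination overhead. The number of workers actually launched is also capped by max_parallel_workers and max_worker_processes.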
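Because the per-gather setting draws from a shared pool of worker processes, it can help to look at the related limits together. A minimal sketch using the standard pg_settings view:

```sql
-- Show the three settings that together bound parallel worker usage.
-- max_worker_processes caps all background workers on the server,
-- max_parallel_workers caps how many of those may serve parallel queries,
-- and max_parallel_workers_per_gather caps workers for a single Gather node.
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather')
ORDER BY name;
```

If max_parallel_workers_per_gather is higher than max_parallel_workers, the lower value effectively wins, so raising only the per-gather setting may have no visible effect.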

With these configuration settings in place, PostgreSQL can use parallel query execution for the query whenever the planner estimates that a parallel plan is cheaper. The settings permit parallelism but do not force it. Keep in mind that parallel query execution can increase the load on your system, so it’s important to monitor your system resources and adjust the configuration settings as necessary.
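One way to confirm what the planner actually chose is to inspect the execution plan. The sketch below reuses the example query from this article and assumes the orders and lineitems tables exist and are partitioned compatibly. Note that EXPLAIN with the ANALYZE option really executes the query.

```sql
-- Session-level settings discussed above.
SET enable_partitionwise_join = on;
SET max_parallel_workers_per_gather = 4;

-- ANALYZE runs the query and reports actual behavior;
-- COSTS OFF keeps the plan output shorter.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT *
FROM orders o
JOIN lineitems l ON o.orderid = l.orderid
WHERE o.orderdate BETWEEN '2022-01-01' AND '2022-12-31';
```

In the output, a Gather node reports "Workers Planned" and "Workers Launched". A partitionwise join typically appears as an Append node with one join per pair of partitions beneath it, and a parallel hash join appears as a "Parallel Hash Join" node. If none of these show up, the planner judged a different plan to be cheaper, which is common on small tables.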

Monitoring Parallel Query Operations in PostgreSQL 15

You can use the pg_stat_activity system view in PostgreSQL to monitor parallel query operations. The view has one row per server process, and parallel workers appear with a backend_type of “parallel worker”. The leader_pid column, available since PostgreSQL 13, links each worker to the backend that launched it. The view does not report how many workers were planned or the configured maximum; for those, use EXPLAIN (ANALYZE) and the max_parallel_workers settings. Here’s an example SQL query that you can use to monitor parallel query operations:

SELECT
  pid,
  datname,
  usename,
  query,
  state,
  wait_event_type,
  wait_event,
  backend_type,
  leader_pid
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';

This query selects data from the pg_stat_activity view and filters for rows where the backend_type column equals “parallel worker”, so it returns one row for each parallel worker process that is currently running. The columns in the SELECT statement provide the following information:

pid: The process ID of the parallel worker process
datname: The name of the database being queried
usename: The name of the user running the query
query: The SQL query being executed
state: The current state of the parallel worker process
wait_event_type: The type of event that the worker process is waiting for, if any
wait_event: The name of the event that the worker process is waiting for, if any
backend_type: The type of PostgreSQL backend process (should be “parallel worker”)
leader_pid: The process ID of the parallel query leader process that launched this worker

By running this query periodically, you can see how many parallel workers are active, what each worker is doing or waiting on, and which leader it belongs to. This can help you identify bottlenecks in parallel query execution and decide whether the worker limits need adjusting.
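To see how many workers each parallel query is currently using, the worker rows can be grouped by their leader. This is a sketch that assumes PostgreSQL 13 or later, where pg_stat_activity has the leader_pid column; the column aliases are chosen for the example.

```sql
-- Count active parallel workers per leader backend and show the
-- leader's query text by joining the view to itself on leader_pid.
SELECT
  w.leader_pid,
  count(*) AS active_workers,
  l.query  AS leader_query
FROM pg_stat_activity AS w
JOIN pg_stat_activity AS l ON l.pid = w.leader_pid
WHERE w.backend_type = 'parallel worker'
GROUP BY w.leader_pid, l.query
ORDER BY active_workers DESC;
```

The result is a point-in-time snapshot, so short-lived parallel queries may not appear unless the query is sampled while they are running.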
