How Do Direct Path Reads (DPR) Affect PostgreSQL Performance?

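PostgreSQL has no feature called Direct Path Read. The term comes from Oracle, where a server process reads blocks from disk straight into its private memory and bypasses the shared buffer cache. PostgreSQL reads table data through shared buffers, but it applies a comparable optimization to large full table scans: when a sequential scan targets a table larger than one quarter of shared_buffers, the scan cycles through a small ring of buffers (256 KB) instead of filling the whole cache. The server makes this choice automatically when the scan starts, based on table size rather than on the buffer cache hit rate. This keeps a single large scan from evicting hot pages that concurrent queries depend on, which matters most for large data sets and high-concurrency workloads. The rest of this article uses "direct path read" loosely for bulk reads that bypass or deliberately limit their use of the buffer cache.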
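
PostgreSQL switches a sequential scan to a small ring of buffers once the table is larger than one quarter of shared_buffers. The query below is a rough sketch for checking whether a table crosses that size. The table name orders is a placeholder and not something taken from this article, and pg_size_bytes assumes PostgreSQL 9.6 or later.

```sql
-- Compare a table's on-disk size with one quarter of shared_buffers.
-- 'orders' is a hypothetical table name; replace it with your own.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
       current_setting('shared_buffers')       AS shared_buffers,
       pg_relation_size(c.oid)
         > pg_size_bytes(current_setting('shared_buffers')) / 4
         AS exceeds_quarter_of_shared_buffers
FROM pg_class AS c
WHERE c.relname = 'orders'
  AND c.relkind = 'r';
```

If the same table name exists in more than one schema, the query returns one row per match. The comparison is an approximation in bytes; the server itself compares block counts.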

Direct path reads can affect the performance of PostgreSQL in several ways:

  1. Improved Query Performance: Large scans that skip or limit their use of the buffer cache avoid much of the buffer-management overhead and do not evict hot pages. This can shorten execution times for large-scale data access and for the queries running alongside it.
  2. Increased Memory Usage: When data is read into process-private memory rather than the shared cache, each session holds its own copy. Many concurrent scans can therefore raise total memory usage and may cause the system to swap if available memory is insufficient.
  3. Increased Disk I/O: Because the scanned pages are not retained in the buffer cache, repeated scans of the same table have to read them again each time. This can put additional pressure on the disk subsystem and may increase latency for other disk-bound operations.
  4. Increased CPU Utilization: Pages that are not cached must be fetched and processed again on every scan, so repeated scans can spend CPU on work that a cache hit would have avoided. This may impact the performance of other CPU-bound operations.

Overall, direct path reads can be a useful tool for improving query performance, but it is important to monitor their impact on the system and to use them appropriately. If direct path reads are causing performance issues, you may need to adjust your workload or modify your system configuration to address the problem.
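
One way to observe the impact of a single large scan is EXPLAIN with the ANALYZE and BUFFERS options, which reports how many blocks the query found in shared buffers and how many it had to read. This is a minimal sketch; the table name orders is hypothetical.

```sql
-- Run a full scan and report buffer activity for this one query.
-- 'orders' is a hypothetical table name; replace it with your own.
-- Note: ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders;
```

In the plan output, look at the "Buffers: shared hit=... read=..." line under the Seq Scan node. A high read count on repeated runs suggests the table's pages are not staying in shared buffers.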

Monitoring Direct Path Reads in PostgreSQL

There are several metrics that you can use to monitor direct path reads in PostgreSQL, including:

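  1. heap_blks_read: The number of disk blocks read for the table because they were not found in shared buffers. Some of these reads may be served by the operating system page cache rather than the physical disk.
  2. heap_blks_hit: The number of block requests for the table that were satisfied from shared buffers and did not need a read.
  3. Total blocks accessed: This is not a stored column (there is no heap_blks_total); compute it as heap_blks_read + heap_blks_hit.
  4. Read time: This is not reported per table. With track_io_timing enabled, the blk_read_time column of pg_stat_database reports the total time spent reading data file blocks, in milliseconds, for each database.
  5. Hit time: PostgreSQL does not record the time spent on buffer hits. The hit ratio, heap_blks_hit / (heap_blks_read + heap_blks_hit), is a practical substitute.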

The block counters come from the pg_statio_user_tables statistics view. The related pg_stat_user_tables view (a view, not a function) reports scan activity such as seq_scan and seq_tup_read rather than block counts.
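
Read timing is not available per table, but it can be inspected at the database level. The sketch below assumes track_io_timing can be enabled on your server; it is off by default, and blk_read_time stays at zero while it is off.

```sql
-- Check whether I/O timing is being collected (off by default).
SHOW track_io_timing;

-- Database-wide block counters and cumulative read time in milliseconds.
-- blk_read_time remains zero unless track_io_timing is on.
SELECT datname,
       blks_read,
       blks_hit,
       blk_read_time
FROM pg_stat_database
WHERE datname = current_database();
```

These values are cumulative since the last statistics reset, so comparing two samples taken some time apart is more informative than a single reading.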

For a specific table, filter pg_statio_user_tables by relname and compare heap_blks_read with heap_blks_hit.
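
A minimal sketch of such a query against pg_statio_user_tables follows. The table name orders is hypothetical, and heap_blks_total and hit_pct are computed aliases rather than columns of the view.

```sql
-- Block-level read statistics for one table.
-- 'orders' is a hypothetical table name; replace it with your own.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       heap_blks_read + heap_blks_hit AS heap_blks_total,
       round(100.0 * heap_blks_hit
             / NULLIF(heap_blks_read + heap_blks_hit, 0), 2) AS hit_pct
FROM pg_statio_user_tables
WHERE relname = 'orders';
```

NULLIF guards against division by zero for tables that have not been read since the last statistics reset; in that case hit_pct is NULL.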

By monitoring these metrics, you can gain insight into the performance of your direct path reads and identify potential issues or areas for improvement.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv is currently the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker at open source software conferences globally.