How to gather PostgreSQL statistics only when they are stale?

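In PostgreSQL, optimizer statistics belong to tables, and the ANALYZE command refreshes them. You can refresh them only when they become stale by using the pg_stat_statements extension to see which queries your workload runs, the track_activity_query_size configuration parameter to keep query text readable, and the per-table counters in pg_stat_user_tables to decide which tables need ANALYZE. Follow these steps to achieve this: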

  1. Enable the pg_stat_statements Extension:

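  • Ensure that the pg_stat_statements module is loaded in your PostgreSQL server. It must be listed in shared_preload_libraries in postgresql.conf, and changing that setting requires a server restart.
  • If it’s not already enabled in your database, you can do so by executing the following SQL command as a superuser: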

CREATE EXTENSION pg_stat_statements;
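
The preload requirement can be handled from SQL as well. The following is a minimal sketch, assuming no other libraries are currently preloaded (ALTER SYSTEM replaces the whole list, so append to the existing value if there is one) and that you can restart the server afterwards.

```sql
-- Check whether the module is already preloaded.
SHOW shared_preload_libraries;

-- Assumption: the list is currently empty. This overwrites the setting,
-- so include any libraries that are already listed.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- Restart the PostgreSQL server here; the setting is read only at startup.

-- After the restart, create the extension in each database where you
-- want to query the view, then confirm that it is readable.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT count(*) AS tracked_statements FROM pg_stat_statements;
```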

  2. Configure the track_activity_query_size Parameter:

  • Next, set the track_activity_query_size parameter in your PostgreSQL configuration file (postgresql.conf).
  • This parameter sets the number of bytes reserved for the text of each session's current query, as shown in pg_stat_activity.query. Longer query text is truncated, which makes it harder to match running sessions against the entries in pg_stat_statements.
  • The default is 1024 bytes (1KB). Increase the value if your queries are longer than that, for example to 4096.
  • After that, save the changes to the configuration file and restart the PostgreSQL server for the new configuration to take effect.
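
As an alternative to editing postgresql.conf by hand, the same change can be sketched in SQL. The value 4096 is only an illustrative choice, not a recommendation; pick a size that fits your longest queries.

```sql
-- Show the current limit (the default is 1024 bytes).
SHOW track_activity_query_size;

-- Illustrative value; written to postgresql.auto.conf by ALTER SYSTEM.
ALTER SYSTEM SET track_activity_query_size = 4096;

-- Restart the PostgreSQL server here; this parameter cannot be changed
-- with a reload.

-- After the restart, check how long the captured query texts are.
SELECT pid, state, length(query) AS query_text_length
FROM pg_stat_activity
WHERE query IS NOT NULL
ORDER BY query_text_length DESC;
```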

  3. Query the pg_stat_statements View:

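  • Execute queries against the pg_stat_statements view to retrieve statistical information about executed queries and to see which tables your workload depends on.
  • The view contains columns such as queryid, query, calls, rows, and total_exec_time (named total_time before PostgreSQL 13). It does not record when a query was last executed.
  • Staleness is tracked per table instead. By examining the last_analyze, last_autoanalyze, and n_mod_since_analyze columns of the pg_stat_user_tables view, you can determine when each table's statistics were last refreshed and how many rows have changed since then.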
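
The two lookups described in this step might look like the sketch below. It uses only columns that exist across current PostgreSQL versions; the row limit and the 80-character preview are arbitrary choices.

```sql
-- Most frequently executed statements, with a short preview of the text.
SELECT queryid,
       calls,
       rows,
       left(query, 80) AS query_start
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;

-- When each user table was last analyzed, and how many rows have been
-- modified since then.
SELECT schemaname,
       relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC;
```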

  4. Determine Staleness Criteria:

  • Define your criteria for determining when a table's statistics are considered stale.
  • For example, you can set a threshold such as “if a table has been modified and has not been analyzed for a certain duration (e.g., 24 hours), consider its statistics stale.”
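
Because ANALYZE works on tables, one way to express such a criterion is as a filter on pg_stat_user_tables. This is a sketch of a 24-hour rule, assuming that any modification since the last analyze counts as a change worth refreshing; adjust both conditions to your workload.

```sql
-- Tables that have changed since their statistics were last gathered
-- and whose last manual or automatic ANALYZE is older than 24 hours.
-- GREATEST ignores NULLs, so it is NULL only for never-analyzed tables;
-- COALESCE then treats those tables as infinitely old.
SELECT schemaname,
       relname,
       GREATEST(last_analyze, last_autoanalyze) AS last_stats_refresh,
       n_mod_since_analyze
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > 0
  AND COALESCE(GREATEST(last_analyze, last_autoanalyze),
               '-infinity'::timestamptz) < now() - interval '24 hours'
ORDER BY n_mod_since_analyze DESC;
```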

  5. Schedule Statistics Collection:

  • Implement a scheduled task or script that regularly checks the statistics views. Use your defined criteria to identify tables with stale statistics.
  • When the script detects stale statistics, execute the ANALYZE command specifically for the affected table to refresh its statistics. ANALYZE operates on tables and columns, not on individual queries.
  • You can use dynamic SQL along with the EXECUTE command in PL/pgSQL to run the ANALYZE command for each identified table.
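
The scheduled check can be sketched as an anonymous PL/pgSQL block that a scheduler runs through psql. The 24-hour threshold is the same illustrative rule as above, and the role running it is assumed to have permission to analyze the listed tables.

```sql
DO $$
DECLARE
    t record;
BEGIN
    FOR t IN
        -- Same staleness rule: modified since the last analyze, and not
        -- analyzed (manually or automatically) in the last 24 hours.
        SELECT schemaname, relname
        FROM pg_stat_user_tables
        WHERE n_mod_since_analyze > 0
          AND COALESCE(GREATEST(last_analyze, last_autoanalyze),
                       '-infinity'::timestamptz) < now() - interval '24 hours'
    LOOP
        RAISE NOTICE 'Analyzing %.%', t.schemaname, t.relname;
        -- %I quotes each identifier safely when building the command.
        EXECUTE format('ANALYZE %I.%I', t.schemaname, t.relname);
    END LOOP;
END
$$;
```

Quoting identifiers with format('%I') keeps the dynamic statement safe for table names that contain uppercase letters or special characters.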

By using the pg_stat_statements extension to understand your workload and periodically checking the per-table statistics counters for staleness, you can selectively gather statistics for tables that have changed and have not been analyzed for a specified duration. This approach allows you to maintain up-to-date statistics while minimizing the overhead of analyzing every table.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.