How to gather PostgreSQL statistics only when they are stale?

In PostgreSQL, you can gather statistics only when they are stale by using the pg_stat_statements extension together with the track_activity_query_size configuration parameter. Here’s how you can achieve this:

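  1. Enable the pg_stat_statements Extension:
  • Ensure that pg_stat_statements is listed in shared_preload_libraries in postgresql.conf. The module collects data only when it is loaded at server start, so adding it requires a restart.
  • If the extension has not yet been created in your database, create it by executing the following SQL command as a superuser: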

CREATE EXTENSION pg_stat_statements;
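
Because pg_stat_statements only collects data when the module is preloaded, it can help to confirm both the preload setting and the extension before going further. A minimal check, assuming a psql session connected to the database where the extension was created:

```sql
-- The list should include pg_stat_statements; if it does not,
-- add it in postgresql.conf and restart the server.
SHOW shared_preload_libraries;

-- Returns one row once CREATE EXTENSION has run in the current database.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';
```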

  2. Configure the track_activity_query_size Parameter:
  • Set the track_activity_query_size parameter in your PostgreSQL configuration file (postgresql.conf).
  • This parameter sets how many bytes are reserved for the text of each session’s currently executing query, as reported in pg_stat_activity.query; longer query text is truncated.
  • The default is 1024 bytes. If your queries are longer than that, raise the value (for example, to 4096) so that the complete query text is captured.
  • Save the changes to the configuration file and restart the PostgreSQL server; this parameter can only be set at server start.
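
As an alternative to editing postgresql.conf by hand, the same setting can be written from a SQL session. This is a sketch only: the value 4096 is an illustrative assumption, and the statement must be run by a superuser.

```sql
-- Persist the setting in postgresql.auto.conf; 4096 bytes is an example value.
ALTER SYSTEM SET track_activity_query_size = 4096;

-- The parameter is read at server start, so run this after the restart
-- to confirm the value that is actually in effect.
SHOW track_activity_query_size;
```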

  3. Query the pg_stat_statements View:
  • Execute queries against the pg_stat_statements view to retrieve statistical information about executed queries.
  • The view contains columns such as queryid, query, calls, and total_exec_time (named total_time before PostgreSQL 13).
  • The standard view does not record when a query last ran. To tell whether a query is still being executed, save the calls counter periodically and compare snapshots: a counter that has not moved means the query did not run during that interval.
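
A query along these lines gives a first look at what the view holds. It is a sketch that assumes PostgreSQL 13 or later, where the timing column is called total_exec_time; on older releases substitute total_time.

```sql
-- Most frequently executed statements in the current database.
SELECT s.queryid,
       s.calls,
       round(s.total_exec_time::numeric, 2) AS total_exec_ms,  -- milliseconds
       left(s.query, 80)                    AS query_start     -- first 80 characters
FROM pg_stat_statements AS s
JOIN pg_database AS d ON d.oid = s.dbid
WHERE d.datname = current_database()
ORDER BY s.calls DESC
LIMIT 20;
```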
  4. Determine Staleness Criteria:
  • Define your criteria for determining when statistics are considered stale.
  • For example, you can set a threshold such as “if a query has not been executed for a certain duration (e.g., 24 hours), consider its statistics stale.”
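
One way to make such criteria concrete is to look at the tables involved, since planner statistics are stored per table. The sketch below relies on the standard pg_stat_user_tables view; the two thresholds (1000 modified rows and 24 hours) are illustrative assumptions, not values from this article.

```sql
-- Tables whose statistics look stale: many rows changed since the last
-- ANALYZE, and no manual or automatic ANALYZE within the last 24 hours.
SELECT schemaname,
       relname,
       n_mod_since_analyze,
       GREATEST(last_analyze, last_autoanalyze) AS last_stats_refresh
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > 1000                 -- assumed row-change threshold
  AND COALESCE(GREATEST(last_analyze, last_autoanalyze),
               '-infinity'::timestamptz)         -- never analyzed counts as stale
      < now() - interval '24 hours'              -- assumed age threshold
ORDER BY n_mod_since_analyze DESC;
```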
  5. Schedule Statistics Collection:
  • Implement a periodic task or script that checks the pg_stat_statements view to identify queries that meet your staleness criteria.
  • ANALYZE operates on tables, not on individual queries, so when a query is flagged, run ANALYZE on the tables it references to update their statistics.
  • In a PL/pgSQL function or DO block, you can build each ANALYZE command with dynamic SQL and run it with EXECUTE.
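
The dynamic SQL step might look like the following. ANALYZE takes tables rather than queries as its target, so this sketch selects stale tables from pg_stat_user_tables and analyzes each one. The thresholds are the same illustrative assumptions (1000 modified rows, 24 hours), and the block would need to be run on a schedule by an external tool such as cron calling psql.

```sql
DO $$
DECLARE
    t record;
BEGIN
    FOR t IN
        SELECT schemaname, relname
        FROM pg_stat_user_tables
        WHERE n_mod_since_analyze > 1000          -- assumed row-change threshold
          AND COALESCE(GREATEST(last_analyze, last_autoanalyze),
                       '-infinity'::timestamptz)
              < now() - interval '24 hours'       -- assumed age threshold
    LOOP
        -- %I quotes each identifier safely when building the command.
        EXECUTE format('ANALYZE %I.%I', t.schemaname, t.relname);
        RAISE NOTICE 'Analyzed %.%', t.schemaname, t.relname;
    END LOOP;
END
$$;
```

Tables that do not cross both thresholds are skipped, which is what keeps the cost lower than a database-wide ANALYZE.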

By using the pg_stat_statements extension and periodically checking the view against your staleness criteria, you can refresh statistics selectively, only for the tables behind the queries you have flagged. This approach keeps statistics up to date while avoiding the overhead of analyzing every table in the database.

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv is currently the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker at open source software conferences globally.