PostgreSQL Database Bloat Analysis: Identifying Large and Bloated Tables

Introduction:

This blog post focuses on analyzing table size and bloat using the provided SQL query. Database bloat can significantly impact PostgreSQL performance, leading to increased storage requirements and slower query execution. We will explain the query in detail and discuss how it helps in identifying large and bloated tables in PostgreSQL.

SELECT relname, relkind, pg_size_pretty(pg_relation_size(oid)) AS size,
pg_size_pretty(pg_total_relation_size(oid) – pg_relation_size(oid)) AS bloat
FROM pg_class
WHERE relkind IN (‘r’, ‘i’) AND relname NOT LIKE ‘pg_%’ AND relname NOT LIKE ‘sql_%’
ORDER BY pg_total_relation_size(oid) – pg_relation_size(oid) DESC
LIMIT 10;

Query Explanation:

The SQL query retrieves information about the largest and most bloated tables in a PostgreSQL database. Let’s break down the query and explore each component:

  1. pg_class: This is a system catalog table in PostgreSQL that stores information about database objects such as tables, indexes, and sequences.
  2. relname: This column represents the name of the table or index.
  3. relkind: This column indicates the type of object (table, index, or other).
  4. pg_relation_size(oid): This function calculates the size of the table or index in bytes.
  5. pg_size_pretty(): This function formats the size in a human-readable format (e.g., bytes, kilobytes, megabytes).
  6. pg_total_relation_size(oid) – pg_relation_size(oid): This expression calculates the amount of bloat in the table or index by subtracting the actual size from the total size.
  7. relname NOT LIKE ‘pg_%’ AND relname NOT LIKE ‘sql_%’: These conditions exclude system tables and tables with names starting with ‘pg_’ or ‘sql_’.
  8. ORDER BY pg_total_relation_size(oid) – pg_relation_size(oid) DESC: This sorts the results in descending order based on the amount of bloat.
  9. LIMIT 10: This limits the output to the top 10 tables with the highest bloat.

Analysis and Interpretation:

Running the provided query helps identify tables with the largest sizes and the amount of bloat present in each table. Here’s how to interpret the output:

  1. Table Name (relname): The name of the table or index.
  2. Object Type (relkind): Indicates whether the object is a table (‘r’) or an index (‘i’).
  3. Size (size): The actual size of the table or index. It represents the disk space used by the object.
  4. Bloat (bloat): The amount of bloat in the table or index. Bloat refers to the additional space consumed due to fragmentation or unused space within the object.

By analyzing the output of this query, database administrators can identify tables that contribute the most to the overall database size and have significant bloat. These tables might require optimization actions such as vacuuming, reindexing, or data archiving to reclaim disk space and improve query performance.

Conclusion:

Regularly analyzing table sizes and identifying bloat is essential for maintaining a healthy and performant PostgreSQL database. By using the provided SQL query and examining the output, administrators can identify large and bloated tables that impact storage usage and query performance. Taking appropriate optimization actions for these tables helps ensure efficient space utilization, faster queries, and a more responsive database system.

About Shiv Iyer 466 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.