Performing Vacuum-like Operations in PostgreSQL Using pg_repack
The pg_repack extension is a powerful tool in PostgreSQL that allows you to perform vacuum-like operations, such as reclaiming storage and rebuilding tables or indexes, without locking the table. It is especially useful for large tables where downtime caused by traditional VACUUM FULL is not acceptable.
Key Features of pg_repack
- Non-blocking operation: pg_repack works by creating a new table or index and incrementally copying data into it while maintaining access for read and write operations.
- Reclaims disk space: It performs a complete cleanup of bloated tables or indexes caused by frequent updates or deletes.
- Rebuilds indexes: Optimizes indexes by recreating them.
- Online operation: Unlike VACUUM FULL, which locks the table, pg_repack minimizes downtime.
How pg_repack Works
1. Temporary Shadow Table
- pg_repack creates a temporary shadow table to store a copy of the data.
- The shadow table is kept in sync with the main table using triggers.
2. Data Copy
- Data is copied from the original table to the shadow table in small chunks.
- Triggers capture ongoing write operations during the process to ensure consistency.
3. Atomic Swap
- After the copy is complete, the shadow table replaces the original table atomically. This step requires a brief lock but is nearly instantaneous.
Installing pg_repack
- Install the Extension:
On most systems, you can install pg_repack using your package manager:
1 |
sudo apt install postgresql-version -repack |
- Create the Extension in the Database:
1 |
CREATE EXTENSION pg_repack; |
- Verify Installation:
1 |
SELECT * FROM pg_available_extensions WHERE name = 'pg_repack'; |
Performing Vacuum-like Operations with pg_repack
Repacking a Table
To reclaim storage from a bloated table:
1 |
pg_repack -d my_database -t my_table |
- -d: Specifies the database name.
- -t: Specifies the table name to be repacked.
Repacking an Entire Database
To optimize all tables and indexes in the database:
1 |
pg_repack -d my_database |
Repacking an Index
To rebuild a specific index:
1 |
pg_repack -d my_database -i my_index |
Best Practices for Using pg_repack
- Run During Low Traffic: Although pg_repack is online, schedule it during off-peak hours.
- Monitor Resources: Use pg_stat_activity and logs to monitor the process.
- Test in Non-Production: Always test pg_repack on a non-production environment first.
- Ensure Compatibility: Verify your PostgreSQL version is compatible with pg_repack.
Limitations
- Additional Disk Space Required: Needs temporary storage equal to the size of the table and indexes.
- System Tables: Cannot repack system tables.
- Performance Impact: Triggers used for syncing may slightly impact performance during operation.
Conclusion
pg_repack provides an efficient way to perform vacuum-like operations in PostgreSQL with minimal impact on live operations. It is particularly valuable for large, frequently updated tables that experience bloat over time. By following best practices and monitoring the process, you can maintain optimal database performance while keeping downtime to a minimum.