How to implement online schema change in PostgreSQL with Python?

Online schema changes in PostgreSQL allow you to modify a table’s schema while keeping it available for reads and writes. There are several third-party tools available for implementing online schema changes, such as pg_repack and pglogical. These tools provide easy-to-use interfaces for performing online schema changes, but may require some additional setup and configuration.

If you want to implement online schema changes in PostgreSQL with Python, you can use the psycopg2 library to execute SQL commands and the pg8000 library for handling database connections.

Here are the general steps for implementing online schema changes in PostgreSQL with Python:

  1. Create a new table with the desired schema using the CREATE TABLE command.
  2. Use the INSERT INTO command to copy data from the old table into the new table.
  3. Use the ALTER TABLE command to rename the old table to a backup name, and then rename the new table to the original table name.
  4. Use the DROP TABLE command to drop the backup table.

Here is some sample Python code that implements these steps:

Note that this code is just a simple example, and may need to be modified depending on your specific schema change requirements. It is also important to test any schema changes on a non-production database before applying them to a production system.

How to handle the delta (changes or updates) between the old and new tables?

The above code copies data from one table to another using PostgreSQL’s INSERT INTO SELECT statement. However, it does not handle the delta (changes or updates) between the old and new table. If you want to handle the delta and only copy the new or modified rows, you can modify the code as follows:

  1. Add a timestamp column to track the last modification time in both the old and new tables:

with conn.cursor() as cur:
ALTER TABLE my_table ADD COLUMN last_modified TIMESTAMP DEFAULT current_timestamp
ALTER TABLE my_table_new ADD COLUMN last_modified TIMESTAMP DEFAULT current_timestamp

2. Update the INSERT statement to include a WHERE clause that filters the rows based on their modification time:

with conn.cursor() as cur:
INSERT INTO my_table_new (id, name, age)
SELECT id, name, age
FROM my_table
WHERE last_modified > (SELECT MAX(last_modified) FROM my_table_backup)

3. Replace the table renaming and dropping steps with the following:

with conn.cursor() as cur:
cur.execute(“DROP TABLE IF EXISTS my_table_backup”)
cur.execute(“ALTER TABLE my_table RENAME TO my_table_backup”)
cur.execute(“ALTER TABLE my_table_new RENAME TO my_table”)

By introducing the last_modified column and modifying the INSERT statement to only copy rows with a modification time greater than the latest modification time in the backup table, you can handle the delta and copy only the changed or new rows to the new table.

Please note that this code assumes that the my_table_backup table already exists as a backup of the original table. Adjust the code accordingly if you need to create the backup table first.

About Shiv Iyer 452 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.