Best Practices for Efficient Large-Scale PostgreSQL Database Migrations


Migrating multi-terabyte PostgreSQL databases requires careful planning to balance speed, data integrity, and system availability. As organizations grow toward petabyte-scale datasets, traditional migration methods often fall short, leading to prolonged downtime and performance degradation. Adopting proven strategies therefore becomes essential.

Below is an expanded guide outlining effective techniques used by cloud providers and database engineers, incorporating technical optimizations and real-world implementation insights to ensure a smooth, efficient migration process.


1. Parallel Processing & Advanced Tooling

Modern migration tools leverage distributed architectures to overcome single-thread limitations:

Cloud Database Migration Service (DMS) MAXIMUM Mode

Google Cloud’s optimized migration mode demonstrates remarkable efficiency for 5TB+ datasets:

  • Partitioned pglogical subscriptions: Divides tables into replication sets based on size thresholds, so that, for example, tables larger than 50GB get dedicated workers (see the sketch below)
  • Zero-copy streaming: Maximizes network utilization, achieving 95% bandwidth efficiency compared to 60-70% in traditional ETL pipelines
  • Benchmarked results: 8-12x faster than standard logical replication, completing a 5TB financial data migration in 2.7 hours
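
Outside of Cloud DMS, which manages this partitioning automatically, the same idea can be approximated with plain pglogical replication sets. The sketch below is illustrative only; the set and table names are assumptions:

-- Give large tables their own replication set so they receive dedicated workers
SELECT pglogical.create_replication_set('large_tables');
SELECT pglogical.replication_set_add_table('large_tables', 'public.orders');    -- assumed >50GB table
SELECT pglogical.create_replication_set('small_tables');
SELECT pglogical.replication_set_add_table('small_tables', 'public.settings');  -- assumed small lookup table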

pg_dump/pg_restore Parallelization

For physical server migrations, combine multi-threaded dumping/restoring with storage optimizations:

# NVMe-optimized dump (40 vCPU environment)
pg_dump -Fd -j 16 -Z 6 -f /nvme_backup mydb

# Parallel restore with connection pooling
pg_restore -j 24 -d newdb --no-privileges --no-owner /nvme_backup

  • Throughput metrics: 2.8M rows/sec on AMD EPYC 7B13 with RAID-0 NVMe
  • Memory tuning: maintenance_work_mem = 1GB per restore job prevents OOM errors
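
One way to apply that memory setting on the target for the duration of the restore is sketched below; revert it once the restore finishes so normal workloads are not affected:

-- Raise maintenance_work_mem for the restore window
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();                 -- applies to new sessions, including pg_restore workers

-- After the restore completes, return to the previous value
ALTER SYSTEM RESET maintenance_work_mem;
SELECT pg_reload_conf();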

Hybrid Migration Architecture

Tool             Parallelism     Throughput       Use Case
PeerDB Streams   16 workers      1.2M rows/sec    Time-series data
pglogical        8 workers       650K rows/sec    Transactional tables
pg_dump          Single-thread   200K rows/sec    Configuration tables

2. Database Configuration Optimization

Source Database Tuning

Example Configuration (64GB RAM Server):

-- For 64GB RAM server
ALTER SYSTEM SET shared_buffers = '38GB';        -- 60% of total RAM
ALTER SYSTEM SET max_wal_senders = 20;          -- Allow concurrent replication streams
ALTER SYSTEM SET wal_level = 'logical';         -- Enable CDC replication
-- Note: these settings (and shared_buffers above) require a server restart, not just pg_reload_conf()

Implementation Tip: Increase max_replication_slots to the parallel worker count plus a 25% buffer
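
For example, with 16 parallel workers (an assumed worker count) the tip above works out to roughly 20 slots:

-- 16 workers + 25% buffer ≈ 20 replication slots
ALTER SYSTEM SET max_replication_slots = 20;    -- keep max_wal_senders (set above) at or above this value
-- Requires a server restart to take effect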

Target Database Preparation

-- Pre-migration setup
ALTER SYSTEM SET max_parallel_workers = 32;
ALTER SYSTEM SET max_worker_processes = 40;
CREATE TABLESPACE ssd_vol1 LOCATION '/mnt/ebs_vol1';
ALTER DATABASE prod SET TABLESPACE ssd_vol1;

Critical Adjustment: Disable autovacuum during initial load (autovacuum = off) to prevent I/O contention
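
A minimal sketch of that adjustment, including re-enabling autovacuum once the load completes:

-- Disable autovacuum for the bulk load
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();

-- After the initial load finishes, restore the default and refresh planner statistics
ALTER SYSTEM RESET autovacuum;
SELECT pg_reload_conf();
VACUUM ANALYZE;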

3. Schema & Storage Optimization

Partitioning Strategy

For large tables exceeding 100M rows, partitioning improves both query performance and migration parallelism.

For example:

-- Time-based partitioning
CREATE TABLE sensor_data (
    id BIGSERIAL,
    recorded_at TIMESTAMPTZ NOT NULL,
    payload JSONB
) PARTITION BY RANGE (recorded_at);

-- Monthly partitions
CREATE TABLE sensor_2024_06 PARTITION OF sensor_data
  FOR VALUES FROM ('2024-06-01') TO ('2024-07-01')
  TABLESPACE fast_ssd;

Migration Benefit: Enables parallel COPY operations per partition
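
A rough illustration of that benefit, assuming data has been staged per partition (the file paths and the second partition are hypothetical); each COPY can run from its own connection in parallel:

-- Load partitions concurrently from separate sessions/workers
COPY sensor_2024_06 FROM '/staging/sensor_2024_06.csv' WITH (FORMAT csv);
COPY sensor_2024_07 FROM '/staging/sensor_2024_07.csv' WITH (FORMAT csv);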

Index Optimization

For example:

-- BRIN for time-series (75% smaller than B-tree)
CREATE INDEX readings_brin_idx ON sensor_data
  USING brin (recorded_at) WITH (pages_per_range=64);

-- Concurrent index build
CREATE INDEX CONCURRENTLY users_email_idx ON users(email);

4. Execution Strategy & Monitoring

Phased Migration Plan

  • Schema Sync (0 Downtime):
pg_dump --schema-only -Fp source_db | psql target_db
  • Initial Data Load (48h for 10TB):
parallel -j 16 "pg_dump -t {} source_db | psql target_db" ::: $(psql -d source_db -Atqc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
  • CDC Replication (15min Cutover), driven by logical replication (see the sketch after the monitoring query below):
    • Monitor using:
SELECT pid, client_addr, state,
       pg_wal_lsn_diff(sent_lsn, write_lsn) AS send_lag,
       pg_wal_lsn_diff(write_lsn, flush_lsn) AS flush_lag
FROM pg_stat_replication;
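
For teams not using a managed service, the CDC phase can also be driven by PostgreSQL's native logical replication. The sketch below is minimal; the publication, subscription, and connection values are placeholders, and it relies on wal_level = 'logical' configured on the source in Section 2:

-- On the source database
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- On the target database (connection parameters are placeholders)
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=source-host port=5432 dbname=prod user=replicator'
    PUBLICATION migration_pub;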

Network Optimization

For cross-cloud migrations, tuning kernel network buffers and the TCP MSS improves sustained throughput over high-latency links.

For example (AWS Direct Connect tuning):

# AWS Direct Connect tuning
sysctl -w net.core.rmem_max=2147483647
sysctl -w net.core.wmem_max=2147483647
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1440

5. Post-Migration Validation

Automated Consistency Checks

# Schema checksum
pg_dump --schema-only -Fp source | sha256sum > source.sha
pg_dump --schema-only -Fp target | sha256sum > target.sha
diff source.sha target.sha

# Sample data verification: two independent random samples will never match,
# so compare the same deterministic slice on both sides (assumes an integer primary key "id")
psql -d source_db -Atc "SELECT * FROM transactions WHERE id % 10000 = 0 ORDER BY id" > source_sample.csv
psql -d target_db -Atc "SELECT * FROM transactions WHERE id % 10000 = 0 ORDER BY id" > target_sample.csv
cmp source_sample.csv target_sample.csv
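
In addition to sampling, a crude order-independent fingerprint can be computed in SQL on both sides and compared. This is only a sketch: hashtext() is an internal, non-cryptographic hash, so a mismatch proves divergence while a match is only probabilistic:

-- Run against source and target, then compare the two result rows
SELECT count(*)               AS row_count,
       sum(hashtext(t::text)) AS fingerprint
FROM transactions AS t;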

Performance Benchmarking

After migration, assessing performance helps detect potential bottlenecks.

For example, query execution analysis:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT customer_id, SUM(amount)
FROM partitioned_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-06-30'
GROUP BY ROLLUP (customer_id);

Implementation Example

For instance, Azure’s Flexible Server migration service achieved 7.5-hour migrations for 2TB financial databases using:

  • 32 vCPU instances with 256GB RAM
  • 8 parallel pglogical streams
  • ZFS snapshots for rollback safety

Conclusion

Successful large-scale PostgreSQL migrations require a multi-faceted approach combining:

  1. Tool Parallelization: Leverage Cloud DMS or pglogical for partitioned streams
  2. Infrastructure Tuning: Optimize WAL, memory, and storage configurations
  3. Schema Design: Implement partitioning and space-efficient indexing
  4. Validation Automation: Automate checksums and statistical sampling

As the Azure example above illustrates, applying these techniques together optimizes performance, minimizes downtime, and keeps the migration process seamless.

For mission-critical systems, conduct staged migrations using this workflow:

Development → Stress Testing (JMeter/pgbench) → Staging → Production

Additionally, always maintain fallback options using LVM snapshots or ZFS clones, particularly when migrating datasets exceeding 1TB.

