PostgreSQL Database Migration
Migrating multi-terabyte PostgreSQL databases requires careful planning to balance speed, data integrity, and system availability. As datasets grow toward petabyte scale, traditional single-threaded migration methods often fall short, leading to prolonged downtime and degraded performance, so adopting proven strategies is essential.
Below is an expanded guide to techniques used by cloud providers and database engineers, covering technical optimizations and real-world implementation insights for a smooth, efficient migration.
1. Parallel Processing & Advanced Tooling
Modern migration tools leverage distributed architectures to overcome single-thread limitations:
Cloud Database Migration Service (DMS) MAXIMUM Mode
Google Cloud’s optimized migration mode demonstrates remarkable efficiency for 5TB+ datasets:
- Partitioned pglogical subscriptions: divides tables into replication sets based on size thresholds (for example, tables larger than 50GB get dedicated workers); see the sketch after this list
- Zero-copy streaming: maximizes network utilization, achieving roughly 95% bandwidth efficiency versus 60-70% in traditional ETL pipelines
- Benchmarked results: in tests, 8-12x faster than standard logical replication, completing a 5TB financial data migration in 2.7 hours
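As a rough illustration of the partitioned-subscription idea, the sketch below uses pglogical to route large tables into their own replication set so each set can stream through a dedicated apply worker. The set and subscription names, the connection string, and the 50GB cutoff are illustrative assumptions, not Cloud DMS internals.

```sql
-- Hypothetical sketch: names and the 50GB threshold are assumptions.
-- On the provider: a dedicated replication set for large tables
SELECT pglogical.create_replication_set('big_tables');

SELECT pglogical.replication_set_add_table('big_tables', c.oid::regclass)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
  AND pg_total_relation_size(c.oid) > pg_size_bytes('50 GB');

-- On the subscriber: one subscription per set, so each set gets its own apply worker
SELECT pglogical.create_subscription(
    subscription_name := 'sub_big_tables',
    provider_dsn      := 'host=source-host dbname=prod',
    replication_sets  := ARRAY['big_tables']
);
```

Smaller tables would go into one or more additional sets with their own subscriptions, which is what allows several workers to replicate in parallel.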
pg_dump/pg_restore Parallelization
For physical server migrations, combine multi-threaded dumping/restoring with storage optimizations:
```bash
# NVMe-optimized dump (40 vCPU environment)
pg_dump -Fd -j 16 --compress=6 -f /nvme_backup mydb

# Parallel restore with 24 worker connections
pg_restore -j 24 -d newdb --no-privileges --no-owner /nvme_backup
```
- Throughput metrics: 2.8M rows/sec on AMD EPYC 7B13 with RAID-0 NVMe
- Memory tuning: maintenance_work_mem = 1GB per restore job prevents OOM errors (see the sketch below)
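Because each pg_restore job runs in its own session, the effective memory ceiling is roughly jobs × maintenance_work_mem (about 24GB for the 24-job restore above). A minimal way to apply the setting on the target, assuming that budget fits alongside shared_buffers:

```sql
-- 24 restore jobs x 1GB maintenance_work_mem ~= 24GB peak during index builds
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();
```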
Hybrid Migration Architecture
| Tool | PeerDB Streams | pglogical | pg_dump |
| --- | --- | --- | --- |
| Parallelism | 16 workers | 8 workers | Single-thread |
| Throughput | 1.2M rows/sec | 650K rows/sec | 200K rows/sec |
| Use Case | Time-series data | Transactional tables | Configuration tables |
2. Database Configuration Optimization
Source Database Tuning
Example Configuration (64GB RAM Server):
```sql
-- For 64GB RAM server
ALTER SYSTEM SET shared_buffers = '38GB';   -- 60% of total RAM
ALTER SYSTEM SET max_wal_senders = 20;      -- Allow concurrent backups
ALTER SYSTEM SET wal_level = 'logical';     -- Enable CDC replication
-- NOTE: these three parameters take effect only after a server restart;
-- pg_reload_conf() alone is not enough
SELECT pg_reload_conf();
```
Implementation Tip: Increase max_replication_slots to match parallel worker count + 25% buffer
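For example, with the 16 parallel workers used above, the +25% rule of thumb gives 20 slots (this value is an assumption following that rule, and the parameter requires a restart):

```sql
-- 16 workers + ~25% headroom = 20 replication slots (takes effect after restart)
ALTER SYSTEM SET max_replication_slots = 20;
```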
Target Database Preparation
```sql
-- Pre-migration setup
ALTER SYSTEM SET max_parallel_workers = 32;
ALTER SYSTEM SET max_worker_processes = 40;
CREATE TABLESPACE ssd_vol1 LOCATION '/mnt/ebs_vol1';
ALTER DATABASE prod SET TABLESPACE ssd_vol1;
```
Critical Adjustment: Disable autovacuum during initial load (autovacuum = off) to prevent I/O contention
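A minimal way to toggle this around the load window (re-enabling autovacuum and refreshing statistics afterwards is assumed to be part of the cutover checklist):

```sql
-- Before the initial load: stop background vacuum/analyze I/O
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();

-- After the load completes: turn it back on
ALTER SYSTEM SET autovacuum = on;
SELECT pg_reload_conf();
```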
3. Schema & Storage Optimization
Partitioning Strategy
For large tables exceeding 100M rows, partitioning improves both migration throughput and query performance.
For example:
```sql
-- Time-based partitioning
CREATE TABLE sensor_data (
    id BIGSERIAL,
    recorded_at TIMESTAMPTZ NOT NULL,
    payload JSONB
) PARTITION BY RANGE (recorded_at);

-- Monthly partitions
CREATE TABLE sensor_2024_06 PARTITION OF sensor_data
    FOR VALUES FROM ('2024-06-01') TO ('2024-07-01')
    TABLESPACE fast_ssd;
```
Migration Benefit: Enables parallel COPY operations per partition
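A rough sketch of that pattern using GNU parallel, streaming one partition at a time over server-side COPY; the connection strings and partition names are placeholders, and the concurrency of 4 is illustrative:

```bash
# Assumes SOURCE_DB and TARGET_DB are exported libpq connection strings
export SOURCE_DB="host=src-host dbname=prod" TARGET_DB="host=dst-host dbname=prod"

# One COPY stream per monthly partition, 4 partitions at a time
parallel -j 4 '
  psql "$SOURCE_DB" -c "COPY {} TO STDOUT" |
  psql "$TARGET_DB" -c "COPY {} FROM STDIN"
' ::: sensor_2024_05 sensor_2024_06 sensor_2024_07
```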
Index Optimization
For example:
```sql
-- BRIN for time-series (75% smaller than B-tree)
CREATE INDEX readings_brin_idx ON sensor_data
    USING brin (recorded_at) WITH (pages_per_range = 64);

-- Concurrent index build (avoids blocking writes)
CREATE INDEX CONCURRENTLY users_email_idx ON users (email);
```
4. Execution Strategy & Monitoring
Phased Migration Plan
- Schema Sync (zero downtime):

```bash
pg_dump --schema-only -Fp source_db | psql target_db
```

- Initial Data Load (48h for 10TB):

```bash
# Data only (schema was synced in the previous step); quote the pipeline
# so GNU parallel runs it once per table
parallel -j 16 'pg_dump --data-only -t {} source_db | psql target_db' \
  ::: $(psql source_db -Atqc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'")
```

- CDC Replication (15min cutover): stream changes made during the bulk load until the target catches up, then cut over (see the logical-replication sketch after this list). Monitor replication lag with:

```sql
SELECT pid, client_addr, state,
       pg_wal_lsn_diff(sent_lsn, write_lsn)  AS send_lag,
       pg_wal_lsn_diff(write_lsn, flush_lsn) AS write_lag
FROM pg_stat_replication;
```
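A minimal sketch of the CDC leg using PostgreSQL's native logical replication, which stands in here for pglogical or Cloud DMS; the publication, subscription, and slot names plus the connection string are assumptions:

```sql
-- On the source, before the bulk load: publish changes and create the slot
-- that will buffer them while the load runs
CREATE PUBLICATION migration_pub FOR ALL TABLES;
SELECT pg_create_logical_replication_slot('migration_slot', 'pgoutput');

-- On the target, after the bulk load: attach to the pre-created slot and
-- skip the initial copy, since the data is already loaded
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=source-host dbname=source_db user=replicator'
    PUBLICATION migration_pub
    WITH (copy_data = false, create_slot = false, slot_name = 'migration_slot');
```

Creating the slot before the bulk load is what guarantees that writes made during the load are retained and replayed before cutover.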
Network Optimization
For cross-cloud migrations, tuning network parameters enhances performance.
For example (AWS Direct Connect tuning):
```bash
# AWS Direct Connect tuning
sysctl -w net.core.rmem_max=2147483647
sysctl -w net.core.wmem_max=2147483647
iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1440
```
5. Post-Migration Validation
Automated Consistency Checks
```bash
# Schema checksum
pg_dump --schema-only -Fp source_db | sha256sum > source.sha
pg_dump --schema-only -Fp target_db | sha256sum > target.sha
diff source.sha target.sha

# Sample data verification -- use a deterministic, ordered sample so both
# sides return identical rows (a random TABLESAMPLE differs per server);
# assumes id is the primary key of transactions
SAMPLE_SQL="SELECT * FROM transactions WHERE id % 10000 = 0 ORDER BY id"
psql source_db -Atc "$SAMPLE_SQL" > source_sample.csv
psql target_db -Atc "$SAMPLE_SQL" > target_sample.csv
cmp source_sample.csv target_sample.csv
```
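A per-table row-count comparison is another cheap cross-check. The sketch below flags any table whose counts diverge; the public-schema table list and database names follow the examples above and are assumptions:

```bash
# Compare row counts table-by-table between source and target
for t in $(psql source_db -Atqc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
  src=$(psql source_db -Atqc "SELECT count(*) FROM $t")
  dst=$(psql target_db -Atqc "SELECT count(*) FROM $t")
  [ "$src" != "$dst" ] && echo "MISMATCH $t: source=$src target=$dst"
done
```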
Performance Benchmarking
After migration, assessing performance helps detect potential bottlenecks.
For example, query execution analysis:
```sql
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT customer_id, SUM(amount)
FROM partitioned_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-06-30'
GROUP BY ROLLUP (customer_id);
```
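Because autovacuum was disabled during the load, planner statistics will be missing or stale until they are rebuilt, so running a manual vacuum/analyze pass before benchmarking is assumed here:

```sql
-- Rebuild planner statistics (and the visibility map) after the bulk load,
-- since autovacuum was off during the migration
VACUUM (ANALYZE, VERBOSE);
```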
Implementation Example
For instance, Azure’s Flexible Server migration service achieved 7.5-hour migrations for 2TB financial databases using:
- 32 vCPU instances with 256GB RAM
- 8 parallel pglogical streams
- ZFS snapshots for rollback safety
Conclusion
Successful large-scale PostgreSQL migrations require a multi-faceted approach combining:
- Tool Parallelization: Leverage Cloud DMS or pglogical for partitioned streams
- Infrastructure Tuning: Optimize WAL, memory, and storage configurations
- Schema Design: Implement partitioning and space-efficient indexing
- Validation Automation: Verify with checksums and statistical sampling
Applied together, these techniques optimize performance, minimize downtime, and keep the migration process seamless.
For mission-critical systems, conduct staged migrations using this workflow:
Development → Stress Testing (JMeter/pgBench) → Staging → Production
Additionally, always maintain fallback options using LVM snapshots or ZFS clones, particularly when migrating datasets exceeding 1TB.
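As a rollback sketch, a ZFS snapshot taken against a cleanly stopped source keeps a point-in-time copy that can be cloned if the cutover has to be reversed; the pool/dataset names and the service name are assumptions for your environment:

```bash
# Take a consistent snapshot of the source data directory before cutover
systemctl stop postgresql          # service name varies by distribution
zfs snapshot tank/pgdata@pre_migration
systemctl start postgresql

# If rollback is needed, clone the snapshot into a writable dataset
zfs clone tank/pgdata@pre_migration tank/pgdata_rollback
```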