How to Safely Upgrade PostgreSQL Extensions Without Breaking Production

  1. Discovery and assessment: inventory schemas, traffic shape, dependencies and existing automation.
  2. Set up logical replication: the publisher creates the publication; the subscriber starts the initial copy plus continuous CDC.
  3. Validate row content, not just counts: run column-level checksums on representative tables and resolve mismatches.
  4. Cutover with rollback ready: read-only window, DNS or proxy switch, old system kept warm for at least one week.
  5. Stabilise and decommission: monitor on the new platform; decommission only after a month of clean operation.

Figure: How to Safely Upgrade PostgreSQL Extensions Without Breaking Production — MinervaDB Engineering.

This is, more or less verbatim, the diagnostic flow MinervaDB engineers follow when a mobility-sector PostgreSQL customer reports an incident while upgrading extensions or versions in production. Database migrations are the projects everyone underestimates. The data movement is the easy part; the verification, the cutover and the rollback plan are where the project lives or dies. The teams that migrate cleanly are the teams that designed the rollback first.

Cutover is the moment of maximum risk in any migration. Every minute of unavailable database is a minute of unavailable application; every minute of dual-write is a minute of potential divergence. Designing the cutover to minimise both, simultaneously, is most of the engineering of a migration project.
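Where the application cannot provide its own read-only switch, the database can approximate one. A minimal sketch of enforcing a short read-only window at cutover; note that default_transaction_read_only is only a default, so a session that explicitly sets transaction_read_only = off can still write, which makes this a guard rail rather than a hard lock:

-- Make new transactions read-only by default, then reload the config:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- Confirm from a fresh session:
SHOW default_transaction_read_only;

Reverting after cutover is the symmetric ALTER SYSTEM ... = off plus another reload.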

Symptom triage

When customers call us about safely upgrading PostgreSQL extensions, the symptom usually presents in one of these patterns:

  • Migrations that depend on the application being read-only for hours. The business never agrees to the maintenance window when the time comes; the migration ships anyway, half-finished.
  • Validation that compares row counts but not row content. Row counts match while data is silently mismatched; the gap surfaces months later through an unhappy customer.
  • Cutting over before observability is wired up on the new system. The dashboard you do not have is the dashboard you cannot use during the cutover incident.
  • Trusting that the source schema is documented. It rarely is; the migration is the moment you learn what is actually in the production database.

Cutover rehearsals against staging-equivalent data are the fastest learning loop in a migration project. The first rehearsal will reveal three things you missed; the third will be uneventful; the production cutover should look like the third rehearsal. Anything less is hoping the migration survives its first contact with production traffic.

Step 1: capture the current state

Before anything moves, establish what "correct" looks like. Capture a content-level fingerprint of the source tables now, so that after the initial copy the target can be compared against a recorded baseline rather than against memory. The same queries become the validation harness for step 3 of the flow above:

-- Validation: column-level checksum across versions
-- (run on both publisher and subscriber, compare results)
SELECT 'orders' AS table_name,
       count(*) AS row_count,
       md5(string_agg(md5(t::text), ',' ORDER BY id)) AS content_hash
FROM orders t;

-- Or, for huge tables, a sample-based sanity check. The sample depends on
-- physical row layout, which differs between servers, so across publisher
-- and subscriber this is a smoke test, not a proof (the seed is illustrative):
SELECT count(*) AS sampled,
       sum(hashtext(t::text)) AS content_hash
FROM orders AS t TABLESAMPLE BERNOULLI (1) REPEATABLE (42);
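To turn those checksums into a pass/fail gate, run the identical query on both sides and diff the output. A sketch, reusing the old.internal and new.internal hostnames from the replication example below and a hypothetical checksum_orders.sql file holding the first query above:

# Compare source and target during a quiet period, or after the subscriber
# has fully caught up; a moving target never matches.
psql -h old.internal -d prod -At -f checksum_orders.sql > /tmp/orders.src
psql -h new.internal -d prod -At -f checksum_orders.sql > /tmp/orders.dst
diff /tmp/orders.src /tmp/orders.dst && echo "orders: MATCH" || echo "orders: MISMATCH"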

Step 2: identify the contributing factor

Change Data Capture (CDC) generalises this pattern across engine types. Debezium, Maxwell, AWS DMS and similar tools tail the WAL or binlog and emit a stream of row-level changes that downstream systems consume. The migration becomes ‘replay this stream into the new database’, which is a simpler problem than ‘extract a consistent snapshot at the same time as the running application’.

# PostgreSQL major version upgrade with logical replication (online)
# Prerequisite: logical replication does not carry DDL, so create the schema
# on the subscriber first (e.g. pg_dump --schema-only, restored on the new host).
# 1. On the publisher (old version):
psql -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"

# 2. On the subscriber (new version):
psql -c "
CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=old.internal port=5432 user=replicator dbname=prod'
  PUBLICATION upgrade_pub
  WITH (copy_data = true, create_slot = true, slot_name = 'upgrade_slot');
"

# 3. Track lag during catch-up:
psql -c "SELECT subname, latest_end_lsn, last_msg_send_time, last_msg_receipt_time
        FROM pg_stat_subscription;"
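pg_stat_subscription shows progress from the subscriber's side. The publisher-side replication slot is worth watching too, both for byte lag and because an inactive slot retains WAL indefinitely and can fill the old server's disk mid-migration. A complementary check, run on the old server:

# 4. On the publisher, watch slot activity and retained-WAL lag:
psql -c "SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name = 'upgrade_slot';"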

Step 3: apply the targeted change

Logical replication is the standard mechanism for online major-version upgrades in PostgreSQL: the new version subscribes to the old, the new version catches up, the application cuts over, and the old version is decommissioned. The mechanics are well understood; the failure modes are around DDL (which logical replication does not cover) and large objects (which require special handling). Schema migration tools (Liquibase, Flyway, Alembic, sqitch) belong in version control, with every change reviewed and reversible. Schema drift between environments is one of the most expensive operational issues we encounter; tools cannot fix it if the team treats them as optional, but they make discipline cheap when the team treats them as mandatory.
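A sketch of the cutover sequence itself, under the same hostnames as above. One gotcha worth encoding: logical replication does not carry sequence values, so they must be advanced on the target by hand (the orders_id_seq name is illustrative):

# 1. Put the application (or the old primary) into read-only mode.
# 2. Note the publisher's current WAL position:
psql -h old.internal -c "SELECT pg_current_wal_lsn();"
# 3. Wait until the subscriber's latest_end_lsn reaches that position:
psql -h new.internal -c "SELECT latest_end_lsn FROM pg_stat_subscription;"
# 4. Sequences are not replicated; advance them past the source values:
psql -h new.internal -d prod -c "SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));"
# 5. Detach the new primary from the old one:
psql -h new.internal -c "DROP SUBSCRIPTION upgrade_sub;"
# 6. Repoint DNS or the proxy; keep the old system warm for rollback.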

Verification before you stand down

Before closing the incident or moving on, confirm the following are true:

  1. Design rollback before launch. The rollback you have not planned is the one you will need.
  2. Validate content, not counts. Row counts are necessary; they are not sufficient.
  3. Rehearse cutover at least three times against representative data.
  4. Keep the old system warm for a week after cutover. The unforeseen issue is the one you cannot foresee.
  5. Treat schema migrations as code. Reviewed, version-controlled, reversible.
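Item 4 in that list can be made concrete: after cutover, reverse the replication direction so the old cluster follows the new one, which keeps a data-complete rollback target warm. A sketch, assuming the same replicator role exists on both sides:

# On the NEW primary, publish everything:
psql -h new.internal -c "CREATE PUBLICATION rollback_pub FOR ALL TABLES;"

# On the OLD cluster, subscribe without re-copying (the data is already there):
psql -h old.internal -c "
CREATE SUBSCRIPTION rollback_sub
  CONNECTION 'host=new.internal port=5432 user=replicator dbname=prod'
  PUBLICATION rollback_pub
  WITH (copy_data = false);
"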

When to escalate

When MinervaDB takes over a PostgreSQL estate as part of an enterprise support engagement, the first thirty days almost always include a structured review of extension and version upgrade practice, because the gains here are usually larger and faster than any other intervention available in the first month.

MinervaDB engineers maintain a library of internal runbooks for PostgreSQL that are updated whenever a customer engagement reveals a new pattern; if you would like a copy of the relevant runbook for safely upgrading PostgreSQL extensions, contact our team and we will share the sanitised version that we use during incident response.

It is worth emphasising that safely upgrading PostgreSQL extensions is not a static topic. The engine, the cloud platforms it runs on, the storage technologies it uses and the workloads pushed through it all evolve, which means any configuration you ship today should be considered a snapshot rather than a permanent answer.

Finally, remember that documentation is a force multiplier. Every diagnostic command, every tuning decision, every runbook step that lives in a shared system rather than in someone’s head is a step closer to a PostgreSQL estate that does not depend on a single hero engineer being awake.

Where possible, treat extension and version upgrades as a code review concern: a peer should challenge configuration changes the same way they would challenge an application code change, with explicit acceptance criteria and a documented rollback plan. This single cultural shift removes more outages than any individual parameter tweak.

A retail customer attempted a major version upgrade with pg_upgrade --link, succeeded, and discovered three hours later that the rollback was no longer possible because the data files had been hard-linked between versions. The recovery was a full restore from the previous night’s backup. The lesson, written in the runbook in capital letters: never --link without a verified backup taken minutes before the upgrade.
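The discipline that anecdote implies can be scripted. A sketch with illustrative paths and versions; the point is that --check and a verified backup both run before --link is ever typed:

# 0. Verified physical backup, minutes before the upgrade:
pg_basebackup -D /backup/pre_upgrade -X stream -P

# 1. Dry run: compatibility checks only, no changes made:
pg_upgrade --check \
  -b /usr/lib/postgresql/15/bin -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/15/main -D /var/lib/postgresql/16/main

# 2. Only after --check passes and the backup is verified:
pg_upgrade --link \
  -b /usr/lib/postgresql/15/bin -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/15/main -D /var/lib/postgresql/16/main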

If your team can confidently answer the questions in this article without looking anything up, you are ahead of most of the PostgreSQL estates we walk into.

Frequently asked questions

What is your typical engagement model for a one-off review?

A typical engagement starts with a short discovery call, a focused review (architecture, performance, security, cost, or topic-specific), and a written assessment with prioritised recommendations. We can then either hand it back to your team to execute, or stay engaged to implement.

Do you support both self-managed and cloud-managed deployments?

Yes. We work across PostgreSQL, MySQL/MariaDB, MongoDB, SQL Server, ClickHouse, Cassandra, Redis/Valkey, Milvus, Trino and SAP HANA, on bare-metal, virtualised infrastructure, Kubernetes, and managed cloud services (Aurora, RDS, Azure SQL, Cloud SQL).

How quickly can MinervaDB engineers respond to a production incident on this topic?

MinervaDB runs a 24×7 support practice with documented SLAs that vary by contract; for SEV-1 incidents on supported clusters the first engineer response is measured in minutes, not hours.

Do you publish runbooks and documentation we can keep after the engagement?

Yes. Documentation and runbooks are deliverables, not afterthoughts. Everything we produce is yours to keep, with no proprietary tooling lock-in.

A short note on how we work with PostgreSQL customers

MinervaDB engineers spend their days inside production PostgreSQL environments — tuning, troubleshooting, migrating, and on-call. The articles on this site reflect what we have actually seen, in real customer engagements, not what reads well in a slide deck.

How we typically help:

  • 24×7 Enterprise-Class Support with strict SLAs for incident response, root-cause analysis and recovery.
  • Performance Engineering and Tuning for high-throughput, low-latency, mixed OLTP and analytical workloads.
  • High Availability and Disaster Recovery Architecture across regions, clouds and hybrid topologies.
  • Database Reliability Engineering (DBRE) with observability, runbooks, capacity planning and incident review.
  • Cost Optimisation for self-managed and cloud database platforms, with hardware-right-sizing and licensing reviews.
  • Data Security, Audit and Compliance readiness for regulated workloads (PCI-DSS, HIPAA, SOC 2, RBI, GDPR).
  • Database Migrations and Upgrades with zero-downtime cutover playbooks.

If you would like a deeper review: drop us a note at contact@minervadb.com or use minervadb.com/contact. Reference this piece on safely upgrading PostgreSQL extensions for a faster start.

MinervaDB — The WebScale Database Infrastructure Operations Experts.

About MinervaDB Corporation
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support (24×7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, SAP HANA, Databricks, Amazon Redshift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL, with core expertise in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.