Citus for horizontal scaling of PostgreSQL

Two months ago, three different MinervaDB customers asked us to look at the same thing: their Citus configuration for horizontally scaling PostgreSQL, each on an 8-node fleet. The shard key is the most consequential decision in any sharded system, and the one most likely to be made under time pressure during a migration project. A bad shard key is rarely fixed; the system is rebuilt around it. Spend the time up-front to validate the access pattern against representative production traffic, and resist the temptation to optimise for the first quarter at the expense of the third year.
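
One inexpensive way to start that validation, assuming the pg_stat_statements extension is enabled, is to measure how much of the live workload actually filters on the candidate key. The query below is a rough heuristic rather than a substitute for replaying traffic, and tenant_id stands in for whatever key you are evaluating:

-- Rough heuristic: what share of statement executions reference the
-- candidate shard key at all? (assumes pg_stat_statements is installed)
SELECT
  sum(calls) FILTER (WHERE query ILIKE '%tenant_id%') AS calls_with_key,
  sum(calls)                                          AS calls_total,
  round(100.0 * sum(calls) FILTER (WHERE query ILIKE '%tenant_id%')
        / sum(calls), 1)                              AS pct_with_key
FROM pg_stat_statements;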

Resharding online is the most operationally expensive task we are routinely asked to perform. It involves moving data while it is changing, keeping reads consistent, keeping writes consistent, and finishing before the storage on the old shard runs out. Done well, it is invisible to the application; done poorly, it is the longest week of an engineer's career. Distributed transactions across shards are technically possible and operationally fragile. Two-phase commit ties the success of every transaction to the availability of every shard; the math says your effective availability falls below any single shard's. The architectural answer is to design the application around per-shard transactions and asynchronous reconciliation — the same lesson microservices teams learn the second time.
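
To make that math concrete: on the 8-node fleet above, if each shard independently offers 99.9% availability, a two-phase commit that touches every shard succeeds only when all eight are up. A back-of-envelope check, runnable in any psql session:

-- Effective availability of a transaction that must reach all 8 shards,
-- each at 99.9%: 0.999^8
SELECT round(power(0.999, 8)::numeric, 4) AS effective_availability;
-- ≈ 0.9920, i.e. roughly eight times the downtime of a single shard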

The two approaches you will encounter

The two approaches are hash-based and range-based sharding. Hash-based sharding distributes load evenly but makes range queries expensive: the data for any range is spread across every shard, so the query becomes a scatter-gather. Range-based sharding is the opposite: range queries are local, but uneven access patterns produce hot shards. The right answer depends on the workload, not on the documentation.
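
The trade-off is easiest to see with native PostgreSQL declarative partitioning, which makes the same choice at the single-node level that Citus makes across a cluster. This is an illustrative sketch; the table and column names are invented:

-- Hash: writes spread evenly, but a date-range scan touches every partition
CREATE TABLE events_hash (tenant_id bigint, created_at timestamptz)
  PARTITION BY HASH (tenant_id);
CREATE TABLE events_hash_0 PARTITION OF events_hash
  FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE events_hash_1 PARTITION OF events_hash
  FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Range: a date-range scan is pruned to one partition, but a busy month
-- becomes a hot partition
CREATE TABLE events_range (tenant_id bigint, created_at timestamptz)
  PARTITION BY RANGE (created_at);
CREATE TABLE events_range_2025_01 PARTITION OF events_range
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');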

A representative implementation of the first approach:

// MongoDB: shard a collection on a compound key for even distribution
sh.enableSharding("production");

sh.shardCollection(
  "production.events",
  { tenant_id: "hashed", created_at: 1 }
);

// Inspect chunk distribution
db.adminCommand({ balancerStatus: 1 });
sh.status();
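
The design choice in this key is deliberate: hashing tenant_id spreads tenants evenly across shards, while the created_at suffix keeps a single tenant's recent events adjacent within chunks. Note that compound shard keys containing a hashed field require MongoDB 4.4 or later.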

The trade-off matrix

Each approach trades cleanly against the other, and the cost of getting the choice wrong shows up as a small number of recurring anti-patterns that operators learn to recognise:

  • Choosing a monotonically increasing shard key. Every new write lands on the most recent shard; the older shards are read-only and the newest one is permanently saturated.
  • Cross-shard transactions in the application's hot path. They look like normal transactions, run two orders of magnitude slower, and lock the cluster on the shard with the slowest disk.
  • Schema changes that are applied to one shard and forgotten on the others. Subtle correctness bugs follow that take months to detect; a periodic drift check (see the sketch after this list) catches the divergence early.
  • Backup strategies that snapshot each shard independently. Without a consistent point-in-time across shards, recovery produces a database that has never logically existed.
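
A cheap guard against the schema-drift item, assuming a Citus 11+ cluster where table metadata is synced to all workers, is to fingerprint each worker's view of the schema and compare. This is a sketch built on Citus's run_command_on_workers helper; treat it as a starting point, not a complete drift detector:

-- Compare a fingerprint of the events table's columns on every worker;
-- any row whose result differs from the others indicates drift
SELECT nodename, nodeport, result AS schema_fingerprint
FROM run_command_on_workers($cmd$
  SELECT md5(string_agg(column_name || ':' || data_type, ','
                        ORDER BY ordinal_position))
  FROM information_schema.columns
  WHERE table_name = 'events'
$cmd$);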

When the first approach is the right answer

Hot shards are the operational signature of a poorly chosen shard key. The dashboards show one shard pegged at 90% CPU while the others idle; the workaround is to add capacity that the cluster does not need. The right fix is almost always a reshard, but the cost is significant enough that teams often live with the imbalance for years. This is where hash-based sharding earns its place: when the workload has no natural range locality and the priority is spreading writes evenly, a hashed key removes the hot shard by construction.
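
Diagnosing the imbalance does not require guesswork. Citus 10 and later expose a citus_shards view on the coordinator; grouping it by node gives a quick read on skew (a sketch, assuming the view is available):

-- Shard count and total shard size per worker node; a node carrying far
-- more bytes than its peers is the hot-shard candidate
SELECT nodename,
       count(*)                         AS shard_count,
       pg_size_pretty(sum(shard_size))  AS total_size
FROM citus_shards
GROUP BY nodename
ORDER BY sum(shard_size) DESC;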

When the second approach earns its keep

Re-sharding is what separates production-grade sharding implementations from prototypes. The mechanics involve copying data while it is changing, switching writes atomically, and removing the old copies once consistency is verified. MongoDB's reshardCollection, Citus's rebalance_table_shards() and Vitess's reshard workflows all implement this with subtly different trade-offs.

For comparison, MongoDB's sharded cluster has three components: mongos routers, the config server replica set and the shard replica sets. The router decides where each query goes; the config server stores the shard map. Most operational issues we see are router-side: routers that are CPU-bound, routers running outdated configurations, routers that have not been horizontally scaled with the cluster.

-- Citus: distribute a large table by tenant_id
SELECT create_distributed_table('events', 'tenant_id');

-- Reference table for joins
SELECT create_reference_table('countries');

-- Verify shard placement
SELECT logicalrelid::regclass, partmethod, partkey,
       (SELECT count(*) FROM pg_dist_shard
        WHERE logicalrelid = pg_dist_partition.logicalrelid) AS shard_count
FROM pg_dist_partition;
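
When the verification above reveals skewed placement, the rebalancer mentioned earlier is the tool. A sketch assuming Citus 11.1 or later, where the rebalance runs in the background and moves shards via logical replication without blocking writes:

-- Start a background rebalance that evens out disk usage across workers
SELECT citus_rebalance_start(rebalance_strategy => 'by_disk_size');

-- Watch it progress
SELECT * FROM citus_rebalance_status();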

Our standing recommendation

A SaaS platform sharded customer data by tenant ID using a hash function. Within eighteen months, two tenants accounted for 60% of the traffic and one shard was permanently saturated. The reshard moved those tenants to dedicated shards and used the original hash function for the long tail; throughput tripled overnight, with no application change.
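
Citus has a first-class primitive for exactly that reshard: tenant isolation splits the shard containing one tenant so the tenant ends up on a shard of its own, which can then be moved to dedicated hardware. A sketch with an invented tenant value; 'CASCADE' applies the split to all co-located tables:

-- Give an outsized tenant its own shard (the new shard can then be moved
-- to a dedicated worker with citus_move_shard_placement)
SELECT isolate_tenant_to_new_shard('events', 42, 'CASCADE');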

Finally, remember that documentation is a force multiplier. Every diagnostic command, every tuning decision, every runbook step that lives in a shared system rather than in someone's head is a step closer to a PostgreSQL estate that does not depend on a single hero engineer being awake.

MinervaDB engineers maintain a library of internal runbooks for PostgreSQL that are updated whenever a customer engagement reveals a new pattern; if you would like a copy of the relevant runbook for scaling PostgreSQL horizontally with Citus, contact our team and we will share the sanitised version that we use during incident response.

Where possible, treat Citus configuration changes as a code-review concern: a peer should challenge them the same way they would challenge an application code change, with explicit acceptance criteria and a documented rollback plan. This single cultural shift removes more outages than any individual parameter tweak.

It is worth emphasising that scaling PostgreSQL horizontally with Citus is not a static topic. The engine, the cloud platforms it runs on, the storage technologies it uses and the workloads pushed through it all evolve, which means any configuration you ship today should be considered a snapshot rather than a permanent answer.

When MinervaDB takes over a PostgreSQL estate as part of an enterprise support engagement, the first thirty days almost always include a structured review of the Citus sharding setup, because the gains here are usually larger and faster than any other intervention available in the first month.

  • Exhaust vertical scaling, replicas and archival before you commit to sharding. Sharding is permanent; the alternatives are reversible.
  • Validate the shard key against six months of representative traffic, not against a theoretical model.
  • Plan for resharding from day one: tools, tests and a recent measurement of how long it takes.
  • Keep schema changes coordinated across shards. Drift is the silent killer of multi-shard correctness.
  • Backup shards as a logical unit, not as independent databases.

If your team can confidently answer the questions in this article without looking anything up, you are ahead of most of the PostgreSQL estates we walk into.

Frequently asked questions

Can your team take over on-call for our database tier?

Yes — our 24x7 enterprise support practice is designed exactly for this. We can take pager ownership at L1/L2 with documented escalation paths into your engineering team for application-side issues.

What is your typical engagement model for a one-off review?

A typical engagement starts with a short discovery call, a focused review (architecture, performance, security, cost, or topic-specific), and a written assessment with prioritised recommendations. We can then either hand it back to your team to execute, or stay engaged to implement.

How quickly can MinervaDB engineers respond to a production incident on this topic?

MinervaDB runs a 24x7 support practice with documented SLAs that vary by contract; for SEV-1 incidents on supported clusters the first engineer response is measured in minutes, not hours.

Do you publish runbooks and documentation we can keep after the engagement?

Yes. Documentation and runbooks are deliverables, not afterthoughts. Everything we produce is yours to keep, with no proprietary tooling lock-in.

When to bring MinervaDB into the conversation

MinervaDB engineers spend their days inside production PostgreSQL environments — tuning, troubleshooting, migrating, and on-call. The articles on this site reflect what we have actually seen, in real customer engagements, not what reads well in a slide deck.

Our consulting and 24x7 support engineers cover the full operational surface: incident response with strict SLAs, performance engineering for high-throughput OLTP and analytical workloads, high-availability and disaster-recovery architecture across regions and clouds, database reliability engineering practices, cost optimisation for self-managed and cloud platforms, data-security and compliance readiness, and zero-downtime migrations and upgrades.

To start a conversation: email contact@minervadb.com or visit minervadb.com/contact. Mention this article on scaling PostgreSQL horizontally with Citus and we will come prepared with a tailored review of your environment.

MinervaDB — The WebScale Database Infrastructure Operations Experts.

About MinervaDB Corporation
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support (24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, SAP HANA, Databricks, Amazon Redshift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL, with core expertise in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.