PostgreSQL Execution Plan Analysis

PostgreSQL Execution Plan

PostgreSQL execution plan analysis is critical for diagnosing query performance issues. Here’s how to interpret optimiser decisions and identify bottlenecks:

Generating Execution Plans

  • First, Generate basic plan estimation:
EXPLAIN SELECT * FROM employees WHERE salary > 50000;

Shows estimated costs without executing the query

  • Next, obtain actual runtime metrics:
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 50000;

Executes the query and provides real-timing data

  • Additionally, for transaction-safe analysis:
BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;

For write operations to prevent data changes

Key Plan Components to Analyze

1. Cost Metrics

Seq Scan on employees (cost=0.00..16.50 rows=173 width=118)
  • Startup cost: Resources to return first row (0.00)
  • Total cost: Resources for full execution (16.50)
  • Rows/Width: Estimated rows and row size

2. Node Types

Node Type Indicates Common Issues
Seq Scan Full table scan Missing indexes on filtered columns
Index Scan Index utilization Non-selective indexes
Nested Loop Small table joins Cartesian products without conditions
Hash Join Large dataset joins Insufficient work_mem for hash tables

3. Actual vs Estimated

(actual time=0.018..0.018 rows=0 loops=1)
  • Large discrepancies between estimated and actual rows indicate outdated statistics
  • Similarly, High actual time relative to cost estimates suggests planner miscalculations

Identifying Bottlenecks

  • Sequential Scans on Large Tables
    • Look for Seq Scan on tables with >1,000 rows
    • Solution: Add appropriate indexes
  • Sort/Merge Operations
    • Check work_mem settings if spills to disk occur
  • Lock Contention
    • Use pg_stat_activity to identify blocking transactions
  • Suboptimal Join Strategies
    • Nested Loop with large inner tables
    • Hash Join exceeding work_mem

Optimizer Decision Factors

  • Statistics from pg_statistic (updated via ANALYZE)
  • Cost constants in postgresql.conf (cpu_tuple_cost, random_page_cost)
  • Data distribution (histogram bounds for value distribution)
  • Index availability (only visible, enabled indexes are considered)

Advanced Analysis Tools

  • Visual Explain (pgAdmin, DBeaver, other PostgreSQL GUI tools)
    • Graphical plan representation
    • Color-coded costly nodes
  • Plan stabilization
ALTER SYSTEM SET plan_cache_mode = 'force_custom_plan';

Helps control plan generation behavior

  • Wait Event Analysis
SELECT wait_event_type, COUNT(*) FROM pg_stat_activity GROUP BY 1;

Identify system resource contention

Optimization Workflow

  1. Start by -Run EXPLAIN ANALYZE on slow query
  2. Identify highest-cost node in plan tree
  3. Check for:
    • Sequential scans where indexes exist
    • Sort operations exceeding work_mem
    • Incorrect row estimates
  4. Test solutions:
    • Add missing indexes
    • Refresh statistics with ANALYZE
    • Adjust cost parameters
  5. Finally, Re-run analysis to verify improvements

By systematically analyzing these elements, you can pinpoint why PostgreSQL’s optimizer chooses specific execution paths and address underlying performance issues.



Strategic Indexing: Making Informed Column Indexing Decisions in PostgreSQL

 

 

How to compute cost of slow query in PostgreSQL?

 

Tips and tricks to troubleshoot indexes with high maintenance costs in PostgreSQL

About MinervaDB Corporation 273 Articles
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support(24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, SAP HANA, Databricks, Amazon Resdhift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL with core expertize in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.

PostgreSQL Execution Plan Analysis

PostgreSQL Execution Plan Analysis

PostgreSQL Execution Plan Analysis

Application Workload
OLTP, analytics, batch jobs, APIs and microservices
Database Engine
Query planner, executor, transaction manager, locking
Storage Engine
Buffer pool, WAL/redo log, page management, compression
Persistent Storage
Data files, log files, backups on durable media
Operations & Observability
Monitoring, alerting, backups, runbooks, SLOs

Source: MinervaDB Database Infrastructure Operations

Figure: PostgreSQL Execution Plan Analysis — MinervaDB Engineering.

On a recent engagement we walked into a SaaS PostgreSQL environment where PostgreSQL Execution Plan had been silently broken for eleven months — nobody noticed until the audit. There is an industrial folklore around 'the right value' for any given database parameter. Most of it is wrong. The right value depends on the workload, the hardware, the storage class, and the variance budget the business has agreed to. Anyone who tells you the right value of work_mem without first asking what your queries do is selling you something.

The fastest query is the query that does not run. The second fastest is the query that runs against an index, in cache, with no parsing overhead. Most performance work is moving queries up that ladder. Once a query is at the top, the marginal returns of further tuning collapse and the engineering time is better spent on the next bottleneck. Performance tuning is mostly the discipline of asking better questions. The query is slow — why? It does a sequential scan — why? Statistics are stale — why? Autovacuum has not run — why? The chain of why is what separates engineers who fix problems from engineers who reach for parameter changes.

How it works under the hood

work_mem is the per-operation memory budget for sorts and hash joins. Set it too low and the planner spills to disk; set it too high and a hundred concurrent queries each take that allocation and your machine OOMs. The right approach is a low default with per-session overrides for the queries that legitimately need more. Statistics drive the planner; out-of-date statistics produce bad plans. ANALYZE samples each table and updates pg_statistic; the default sample size is 30,000 rows times default_statistics_target (default 100). On highly variable data, raising the per-column statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS is a precise lever.

-- The single most useful diagnostic in PostgreSQL
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, FORMAT TEXT)
SELECT customer_id, sum(amount_cents)
FROM orders
WHERE created_at > now() - interval '30 days'
  AND status = 'paid'
GROUP BY customer_id
ORDER BY 2 DESC
LIMIT 100;

-- And what is the buffer cache hit ratio for the recent workload?
SELECT round(100.0 *
             sum(blks_hit) / NULLIF(sum(blks_hit) + sum(blks_read), 0), 2)
       AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();

The data structures involved

Connection management is performance tuning by another name. Postgres allocates roughly 10 MB per connection for backend memory, more for active queries; 500 idle connections is a quarter of a server. pgBouncer in transaction mode is the standard answer, multiplexing thousands of client connections across dozens of backend ones. Query plan stability under load is what most production-tuning conversations actually care about. A query that picks an index in dev and a sequential scan in production is the canonical pain point. pg_hint_plan, query rewriting, and statistics tuning all attack this; the hardest cases require workload shape changes upstream.

The cost model nobody documents clearly

Query plan stability under load is what most production-tuning conversations actually care about. A query that picks an index in dev and a sequential scan in production is the canonical pain point. pg_hint_plan, query rewriting, and statistics tuning all attack this; the hardest cases require workload shape changes upstream. Statistics drive the planner; out-of-date statistics produce bad plans. ANALYZE samples each table and updates pg_statistic; the default sample size is 30,000 rows times default_statistics_target (default 100). On highly variable data, raising the per-column statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS is a precise lever.

-- MySQL slow log digest, top by total time
-- Run pt-query-digest on the slow log; the SQL view of the result:
SELECT digest_text, count_star, round(sum_timer_wait/1e9, 0) AS total_ms,
       round(avg_timer_wait/1e9, 1) AS avg_ms,
       round(max_timer_wait/1e9, 1) AS max_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 20;

Failure modes we have actually seen

A SaaS billing platform was missing its 99.9th percentile SLO during month-end runs. The diagnosis was not slow queries; it was a brief flurry of VACUUM (FREEZE) across the partitioned billing tables, holding share locks long enough to delay the application. Moving the freeze to a quieter time and per-table tuned scale factors solved the SLO violation without touching any query.

  • Increasing max_connections instead of introducing a connection pooler. Each connection costs memory whether it is busy or idle.
  • Aggressive parameter sweeps in production. Change one thing at a time, measure, repeat; everything else is folklore.
  • Tuning a single query in isolation. The query that looks like the bottleneck might be a victim of a heavier neighbour holding locks; fix the neighbour first.
  • Tuning shared_buffers based on a number from a 2014 blog post. The right value depends on workload mix and OS cache behaviour, not on a fixed percentage.

What we actually tune in production

Where possible, treat PostgreSQL Execution Plan as a code review concern: a peer should challenge configuration changes the same way they would challenge an application code change, with explicit acceptance criteria and a documented rollback plan. This single cultural shift removes more outages than any individual parameter tweak.

MinervaDB engineers maintain a library of internal runbooks for PostgreSQL that are updated whenever a customer engagement reveals a new pattern; if you would like a copy of the relevant runbook for PostgreSQL Execution Plan, contact our team and we will share the sanitised version that we use during incident response.

When MinervaDB takes over a PostgreSQL estate as part of an enterprise support engagement, the first thirty days almost always include a structured review of PostgreSQL Execution Plan, because the gains here are usually larger and faster than any other intervention available in the first month.

It is worth emphasising that PostgreSQL Execution Plan in PostgreSQL is not a static topic. The engine, the cloud platforms it runs on, the storage technologies it uses and the workloads pushed through it all evolve, which means any configuration you ship today should be considered a snapshot rather than a permanent answer.

Finally, remember that documentation is a force multiplier. Every diagnostic command, every tuning decision, every runbook step that lives in a shared system rather than in someone's head is a step closer to a PostgreSQL estate that does not depend on a single hero engineer being awake.

Where possible, treat PostgreSQL Execution Plan as a code review concern: a peer should challenge configuration changes the same way they would challenge an application code change, with explicit acceptance criteria and a documented rollback plan. This single cultural shift removes more outages than any individual parameter tweak.

MinervaDB engineers maintain a library of internal runbooks for PostgreSQL that are updated whenever a customer engagement reveals a new pattern; if you would like a copy of the relevant runbook for PostgreSQL Execution Plan, contact our team and we will share the sanitised version that we use during incident response.

When MinervaDB takes over a PostgreSQL estate as part of an enterprise support engagement, the first thirty days almost always include a structured review of PostgreSQL Execution Plan, because the gains here are usually larger and faster than any other intervention available in the first month.

Query plan stability under load is what most production-tuning conversations actually care about. A query that picks an index in dev and a sequential scan in production is the canonical pain point. pg_hint_plan, query rewriting, and statistics tuning all attack this; the hardest cases require workload shape changes upstream. EXPLAIN (ANALYZE, BUFFERS) is the single most useful diagnostic command in PostgreSQL. ANALYZE runs the query and reports actual times; BUFFERS reports cache hits and misses. The combination tells you whether the query is fast in isolation and whether it is paying I/O cost or running from memory.

If your team can confidently answer the questions in this article without looking anything up, you are ahead of most of the PostgreSQL estates we walk into.

Frequently asked questions

Do you support both self-managed and cloud-managed deployments?

Yes. We work across PostgreSQL, MySQL/MariaDB, MongoDB, SQL Server, ClickHouse, Cassandra, Redis/Valkey, Milvus, Trino and SAP HANA, on bare-metal, virtualised infrastructure, Kubernetes, and managed cloud services (Aurora, RDS, Azure SQL, Cloud SQL).

Do you work with regulated industries with strict change-control requirements?

Yes. Several MinervaDB customers operate under PCI-DSS, HIPAA, SOC 2, RBI, GDPR or local equivalents. We work inside change-control processes, document every change, and provide audit-ready evidence on request.

What if our team is smaller and we just want a quarterly health-check?

That is one of the most common engagements we run. A quarterly health-check covers performance trends, capacity, observability gaps, security posture, and a written report with prioritised actions.

How quickly can MinervaDB engineers respond to a production incident on this topic?

MinervaDB runs a 24x7 support practice with documented SLAs that vary by contract; for SEV-1 incidents on supported clusters the first engineer response is measured in minutes, not hours.

Where to read more

Vendor and community documentation

MinervaDB resources


When to bring MinervaDB into the conversation

MinervaDB engineers spend their days inside production PostgreSQL environments — tuning, troubleshooting, migrating, and on-call. The articles on this site reflect what we have actually seen, in real customer engagements, not what reads well in a slide deck.

How we typically help:

  • Round-the-clock production support with documented SLAs and escalation paths.
  • Performance engineering and capacity planning for serious OLTP and analytical workloads.
  • High-availability and disaster-recovery architecture, across regions and clouds.
  • Cost optimisation for cloud database bills and self-managed platforms.
  • Migration and upgrade engineering with rehearsed cutover and rollback plans.
  • Security, audit, and compliance readiness for regulated industries.

If this resonates: reach the team at contact@minervadb.com or via minervadb.com/contact and we can schedule a no-obligation technical discovery focused on PostgreSQL Execution Plan.

MinervaDB — The WebScale Database Infrastructure Operations Experts.

About MinervaDB Corporation 273 Articles
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support(24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, SAP HANA, Databricks, Amazon Resdhift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL with core expertize in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.