Epic PostgreSQL Indexing Myths That Boost Performance

Table of Contents

PostgreSQL Indexing Myths – Just because you’re getting an index scan in PostgreSQL, doesn’t mean you can’t do better!

Understanding Index Scans: The Starting Point, Not the Destination

When you see that your PostgreSQL query is using an index scan, it’s natural to feel a sense of accomplishment. After all, you’ve moved beyond the dreaded sequential scan that examines every row in a table. An index scan represents progress in query optimization—it means PostgreSQL is using an index structure to locate the data you need, rather than scanning the entire table .

However, experienced database professionals know that an index scan is merely the beginning of the optimization journey, not its conclusion. Just because PostgreSQL is using an index doesn’t mean it’s using the index optimally, or that you’ve selected the most appropriate index strategy for your workload.

The Spectrum of Index Operations

PostgreSQL offers several types of index access methods, each with different performance characteristics:

Sequential Scan: Reads every row in the table
Index Scan: Uses an index to find rows, but may still require heap (table) lookups
Index-Only Scan: Retrieves all needed data directly from the index, avoiding heap lookups entirely
Bitmap Index Scan: Combines results from multiple indexes using bitmap operations

The presence of an index scan in your query plan indicates that PostgreSQL has identified an index that can help narrow down the search space, but it may still be performing additional work that could be eliminated with better index design.

Why Index Scans Aren’t Always Optimal

The Hidden Cost of Heap Lookups

The primary limitation of a standard index scan is that it typically requires heap lookups. When PostgreSQL uses an index to find qualifying rows, it first retrieves the tuple identifiers (TIDs) from the index, then uses these TIDs to fetch the actual row data from the table (heap). This two-step process introduces additional I/O overhead.

Consider a query like:

SELECT name, email, phone FROM customers WHERE status = 'active';

If you have an index on status alone, PostgreSQL will use an index scan to find all active customers, but then must perform a heap lookup for each matching row to retrieve the name, email, and phone columns. If thousands of customers are active, this results in thousands of random I/O operations to fetch the row data.

Index-Only Scans: Eliminating the Heap Lookup Penalty

PostgreSQL 9.2 introduced index-only scans, which can dramatically improve performance by eliminating heap lookups entirely. For an index-only scan to be possible, the index must contain all the columns referenced in the query (either in the SELECT list or WHERE clause), and the table must be sufficiently vacuumed so that the visibility map is accurate.

To convert our previous example to use an index-only scan, we could create a covering index:

CREATE INDEX idx_customers_active_contact ON customers (status) INCLUDE (name, email, phone);

Or, if these are frequently accessed together:

CREATE INDEX idx_customers_active_covering ON customers (status, name, email, phone);

Now PostgreSQL can satisfy the entire query from the index alone, transforming our index scan into an index-only scan and eliminating the costly heap lookups.

Advanced Indexing Strategies for Better Performance

Partial Indexes: Indexing Only What You Need

Many queries filter on specific subsets of data. Instead of indexing entire columns, partial indexes allow you to index only the rows that matter for particular queries.

For example, if most of your queries focus on active customers, creating a partial index can be much more efficient:

CREATE INDEX idx_customers_active_only ON customers (email) WHERE status = 'active';

This index is smaller, faster to maintain, and more likely to stay in memory. It also improves write performance since only inserts/updates of active customers need to update this index.

Expression Indexes: Indexing Computed Values

Sometimes your queries filter on expressions rather than raw column values. Expression indexes allow you to index the result of a function or expression.

-- For case-insensitive searches
CREATE INDEX idx_customers_email_lower ON customers (LOWER(email));

-- For date truncation
CREATE INDEX idx_orders_month ON orders (date_trunc('month', order_date));

With the first index, a query like SELECT * FROM customers WHERE LOWER(email) = ‘user@example.com’ can use an index scan instead of a sequential scan with function evaluation on every row.

Multicolumn Indexes: Order Matters

When creating indexes on multiple columns, the order of columns significantly impacts performance. The general rule is to place the most selective column first, but this isn’t always optimal.

Consider a query:

SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped' AND order_date > '2024-01-01';

A naive approach might create: CREATE INDEX idx_orders_naive ON orders (customer_id, status, order_date);

However, if you have customers with orders in multiple statuses, but most queries are for recent shipped orders, you might benefit more from: CREATE INDEX idx_orders_optimized ON orders (customer_id, order_date, status);

This allows PostgreSQL to quickly narrow down to a customer’s recent orders, then filter by status among a much smaller set of rows.

Index Bloat and Maintenance

Identifying Index Bloat

Over time, indexes can become bloated due to UPDATE and DELETE operations, leading to inefficient use of space and poorer performance. You can check for index bloat using queries like:

SELECT 
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(schemaname || '.' || indexname)) as index_size,
    bloat_ratio
FROM (
    SELECT 
        schemaname,
        tablename,
        indexname,
        (100 * (current_index_bytes - ideal_index_bytes) / current_index_bytes) as bloat_ratio
    FROM (
        -- Bloat calculation logic
    ) bloat
    WHERE current_index_bytes > 10000000 -- Only consider indexes > 10MB
) bloat_summary
WHERE bloat_ratio > 30 -- More than 30% bloat
ORDER BY bloat_ratio DESC;

Rebuilding and Reorganizing Indexes

For severely bloated indexes, rebuilding can restore performance:

-- Rebuild index (holds exclusive lock)
REINDEX INDEX idx_name;

-- Concurrent rebuild (available in PostgreSQL 9.2+)
REINDEX INDEX CONCURRENTLY idx_name;

The concurrent option allows the index to remain available for reads during the rebuild, though it takes longer and uses more resources.

Query Plan Analysis: Seeing Beyond the Index Scan

Reading EXPLAIN Output Effectively

When analyzing query plans, look beyond the mere presence of an index scan. Key metrics to examine include:

Rows removed by filter: Indicates how selective your index is
Heap fetches: Shows the number of heap lookups performed
Actual loops: For nested loop joins, indicates how many times the index scan was executed
Buffers: Shows I/O activity (shared hit, read, etc.)

EXPLAIN (ANALYZE, BUFFERS) SELECT name, email FROM customers WHERE status = 'active';

Using pg_stat_user_indexes for Real-World Insights

PostgreSQL’s statistics views provide valuable information about index usage:

SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes 
ORDER BY idx_scan ASC; -- Find unused indexes

Indexes with zero or very low scan counts might be candidates for removal, reducing maintenance overhead.

Advanced Optimization Techniques

Index-Only Scan Optimization

To maximize index-only scan opportunities:

VACUUM frequently: The visibility map must be up-to-date
Consider fillfactor: For tables with frequent updates, a lower fillfactor (e.g., 80) leaves space for updates without row movement
Use INCLUDE clause: Add frequently accessed columns to indexes without affecting the index’s sorting

-- Add frequently accessed columns without changing sort order
CREATE INDEX idx_transactions_user_covering 
ON transactions (user_id) INCLUDE (amount, transaction_date, description);

Multi-Index Strategies and Bitmap Scans

For queries with multiple filter conditions, PostgreSQL can combine multiple indexes using bitmap operations:

-- Individual indexes
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_date ON orders (order_date);

For a query filtering on all three columns, PostgreSQL might use a bitmap index scan that combines the results from all three indexes, which can be more efficient than a single multicolumn index, especially when the individual filters are highly selective.

Partitioning and Indexing

For very large tables, partitioning can dramatically improve query performance. Each partition can have its own indexes, and queries that can be limited to specific partitions avoid scanning irrelevant data entirely.

-- Partitioned table
CREATE TABLE sales (
    sale_id BIGSERIAL,
    sale_date DATE NOT NULL,
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (sale_date);

-- Create partition
CREATE TABLE sales_2024 PARTITION OF sales 
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Index on partition
CREATE INDEX idx_sales_2024_region ON sales_2024 (region);

Monitoring and Continuous Improvement

Setting Up Index Performance Monitoring

Create regular monitoring queries to identify optimization opportunities:

-- Long-running queries with index scans
SELECT 
    query,
    calls,
    mean_time,
    rows 
FROM pg_stat_statements 
WHERE query LIKE '%INDEX SCAN%' 
ORDER BY mean_time DESC 
LIMIT 10;

The Index Optimization Cycle

Effective index management follows a continuous cycle:

Monitor: Track query performance and index usage
Analyze: Identify slow queries and inefficient index usage
Optimize: Create, modify, or remove indexes
Test: Validate improvements in a staging environment
Deploy: Implement changes with appropriate change management
Repeat: Continuously monitor and improve

Conclusion: Beyond the Index Scan

An index scan in PostgreSQL is a positive sign that your query optimization efforts are moving in the right direction. However, it should be viewed as a starting point rather than a destination. True query optimization involves understanding the nuances of different index access methods, recognizing the hidden costs of heap lookups, and applying advanced indexing strategies tailored to your specific workload.

By focusing on index-only scans, partial indexes, expression indexes, and proper index maintenance, you can transform merely “indexed” queries into truly optimized operations that deliver maximum performance with minimal resource consumption. Remember that index optimization is an ongoing process—not a one-time task—and requires continuous monitoring and refinement to keep your PostgreSQL databases running at peak efficiency.

The next time you see an index scan in your query plan, don’t celebrate too soon. Ask yourself: Can I do better? The answer is almost always yes.

The Data Transformation Company

Data Architecture, Engineering and Operations for SQL, NoSQL, NewSQL, Cloud Native Data Platforms, Analytics and AI

PostgreSQL Indexing Myths

PostgreSQL Indexing Myths – Just because you’re getting an index scan in PostgreSQL, doesn’t mean you can’t do better!

Understanding Index Scans: The Starting Point, Not the Destination

The Spectrum of Index Operations

Why Index Scans Aren’t Always Optimal

Advanced Indexing Strategies for Better Performance

Multicolumn Indexes: Order Matters

Index Bloat and Maintenance

Query Plan Analysis: Seeing Beyond the Index Scan

Advanced Optimization Techniques

Index-Only Scan Optimization

Monitoring and Continuous Improvement

The Index Optimization Cycle

Conclusion: Beyond the Index Scan

Further Reading

PostgreSQL Indexing Myths – Just because you’re getting an index scan in PostgreSQL, doesn’t mean you can’t do better!

Understanding Index Scans: The Starting Point, Not the Destination

The Spectrum of Index Operations

Why Index Scans Aren’t Always Optimal

Advanced Indexing Strategies for Better Performance

Multicolumn Indexes: Order Matters

Index Bloat and Maintenance

Query Plan Analysis: Seeing Beyond the Index Scan

Advanced Optimization Techniques

Index-Only Scan Optimization

Monitoring and Continuous Improvement

The Index Optimization Cycle

Conclusion: Beyond the Index Scan

Further Reading

Related Articles

Understanding the Unnest Function in PostgreSQL: Transforming Arrays into Rows

Implementing a Tamper-Proof Ledger in PostgreSQL

Optimizing PostgreSQL Performance: A Comprehensive Guide to Rowstore Index Implementation and Tuning