Why Your PostgreSQL Query Reads Millions of Blocks to Return Zero Rows (And How to Fix It)
A deep dive into index bloat, dead tuples, and MVCC visibility in high-throughput PostgreSQL tables

If you’ve ever stared at a PostgreSQL query plan in disbelief — watching a query scan 3.5 million blocks only to return zero rows — you’re not alone. And if that same query magically completes in 23 milliseconds five seconds later, only to crawl back to 60 seconds fifteen minutes after that, something deeper is going on beneath the surface.
This post breaks down exactly what’s happening, why it happens, and — most importantly — how to fix it for good.
The Setup: A High-Throughput Staging Table
Imagine a staging table — let’s call it poslog_publisher_rms_stage — running on Aurora PostgreSQL 17.4. It’s a busy table. Rows arrive with a status of READY, get picked up almost immediately by a processing job that flips them to PROCESSED, and every two hours a separate job scans for anything stuck in UNPROCESSED or FAILED with fewer than three retry attempts.
The table has a composite index on (status, error_retry_count, modified_date) — exactly what that retry query needs. And it does use it. So why is performance so catastrophically bad?
The Core Problem: Index Bloat from Dead Tuples
This is where PostgreSQL’s MVCC (Multi-Version Concurrency Control) model becomes both a strength and a hidden trap.
When a row is updated in PostgreSQL, the database doesn’t modify the row in place. Instead, it writes a brand new version of the row and marks the old one as dead. This is fundamental to how PostgreSQL handles concurrent reads and writes without locking — but it has a cost.
Every single row that gets flipped from READY to PROCESSED leaves behind a dead tuple. And those dead tuples don’t disappear immediately. They linger in the heap until VACUUM comes along to clean them up.
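You can watch this accumulation happen in the statistics views. A quick check, using the example table name from this post (substitute your own):

```sql
-- Dead vs. live tuple counts and the last time (auto)vacuum ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2)
         AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'poslog_publisher_rms_stage';
```

A steadily climbing n_dead_tup with a stale last_autovacuum timestamp is the signature of the problem described here.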
Here’s where it gets painful: the composite index still contains entries pointing to those dead tuples. So when the retry query runs and walks the index looking for rows where status IN ('UNPROCESSED', 'FAILED'), it encounters index entries for all the old READY and PROCESSED versions of rows too.
For each of those entries, PostgreSQL has to go fetch the actual heap page to determine whether that version of the row is still visible to the current transaction. This is called a heap fetch for visibility check — and with 8.4 million dead tuples spread across a 96 million row table, the query ends up touching millions of heap pages just to confirm, over and over again: “Nope, not visible. Skip it.”
That’s where the 3.5 million block reads come from. Most of them aren’t finding matching rows — they’re doing visibility bookkeeping.
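You can see this bookkeeping for yourself with EXPLAIN (ANALYZE, BUFFERS). A sketch of the retry query follows; its exact predicates are an assumption based on the description above:

```sql
-- "Buffers: shared hit=... read=..." on the scan node shows the block
-- counts; on an Index Only Scan, "Heap Fetches" counts the visibility
-- checks that had to touch the heap.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM poslog_publisher_rms_stage
WHERE status IN ('UNPROCESSED', 'FAILED')
  AND error_retry_count < 3;
```

When the buffer counts dwarf the number of rows returned, you're looking at visibility overhead, not useful work.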
Why the Second Run Drops to 18k Block Reads
Five seconds later, the same query runs in 23 milliseconds. What changed?
Nothing changed in the data. What changed is the buffer cache.
The first run pulled roughly 2 million of those blocks cold from disk into PostgreSQL’s shared buffer pool — the in-memory cache. When the query runs again five seconds later, all those pages are still warm in memory. Nothing has evicted them yet. So instead of going to disk, the second run reads from cache, and the difference between cold disk I/O and warm in-memory reads is the entire explanation for that 60-second-to-23-millisecond drop.
It’s worth noting: even 18k logical reads to find zero rows is higher than you’d expect from a clean index. That’s still the dead tuples at work — even with everything in cache, the query still has to touch those pages to confirm visibility. It’s just fast because it’s all in memory.
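If the pg_buffercache extension is available, you can count how many of the table's pages are currently resident in shared buffers. A sketch (the extension must be installed first, and this ignores the database/tablespace columns for brevity):

```sql
-- Requires: CREATE EXTENSION pg_buffercache;
-- Counts shared-buffer pages currently holding this table's data.
SELECT c.relname,
       count(*) AS buffered_pages
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE c.relname = 'poslog_publisher_rms_stage'
GROUP BY c.relname;
```

Run it right after the fast second execution and again fifteen minutes later, and you can watch the cache coverage erode.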
Why It Blows Up Again 15 Minutes Later
In those 15 minutes, more than 10,000 new rows were inserted and processed. Each insert and update:
- Added new pages to the heap
- Added new entries to the index
- Created new dead tuples
The buffer cache — which has a finite size — has been partially evicted and replaced with newer data. The index has grown with fresh entries pointing to newly dead rows. So when the query runs again, it’s back to scanning a bloated index and fetching heap pages that are no longer in cache. Another ~3 million block reads. Another ~60 seconds of execution time.
The cycle repeats.
The Visibility Map: The Missing Piece
There’s one more concept worth understanding here: the visibility map.
PostgreSQL maintains a visibility map for each table that tracks which heap pages contain only live, visible tuples. When a page is marked “all-visible,” an index scan can skip the heap fetch entirely — it knows without checking that any row on that page is safe to return.
But when a page has dead tuples on it, it’s not marked all-visible. PostgreSQL must fetch the heap page to check visibility for every single index entry it encounters.
With 8.4 million dead tuples spread across the table, a huge portion of heap pages are not all-visible. The index scan degenerates into what is essentially a full heap scan disguised as an index scan — all the overhead of a sequential scan, with none of the efficiency of a true index lookup.
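The planner's own statistics expose how much of the table is all-visible. A rough check via pg_class (relallvisible is refreshed by VACUUM and ANALYZE, so treat it as an estimate):

```sql
-- Fraction of heap pages marked all-visible in the visibility map.
-- A low percentage means index scans must do per-row heap checks.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 2)
         AS pct_all_visible
FROM pg_class
WHERE relname = 'poslog_publisher_rms_stage';
```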
How to Fix It
Here are the most impactful solutions, roughly in order of effectiveness:
1. Create a Partial Index
This is the single most impactful change you can make right now.
Since the retry query only ever cares about UNPROCESSED and FAILED rows — and the vast majority of rows are PROCESSED — you can create a partial index that only includes the rows you actually need:
```sql
CREATE INDEX idx_retry_candidates
ON poslog_publisher_rms_stage (error_retry_count, modified_date)
WHERE status IN ('UNPROCESSED', 'FAILED');
```
This index will be tiny. It will stay clean. It will never contain entries for dead READY or PROCESSED rows. The retry query will fly through it in microseconds regardless of how large the main table grows. On a live table, build it with CREATE INDEX CONCURRENTLY so the build doesn't block concurrent writes.
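One caveat: the planner will only use a partial index when the query's WHERE clause provably implies the index predicate. A retry query shaped like the one below (its exact form is an assumption based on the description earlier) qualifies:

```sql
-- The status predicate matches the partial index's WHERE clause
-- exactly, so the planner can choose idx_retry_candidates.
SELECT *
FROM poslog_publisher_rms_stage
WHERE status IN ('UNPROCESSED', 'FAILED')
  AND error_retry_count < 3
ORDER BY modified_date;
```

A query filtering on, say, status = 'READY' would not match the predicate and would fall back to other plans.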
2. Tune Autovacuum Aggressively for This Table
A dead tuple count of 8.4 million on a 96 million row table is a red flag. The default autovacuum settings are designed for general-purpose workloads — they’re not aggressive enough for a high-throughput staging table like this one.
You can set table-level storage parameters to make autovacuum kick in far more frequently:
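For example — the specific values below are illustrative starting points, not universal recommendations; tune them against your own write rate:

```sql
-- Default autovacuum triggers at ~20% dead tuples; on a 96M-row table
-- that's roughly 19 million dead tuples before a vacuum even starts.
-- Lowering the scale factor makes it fire after ~100k instead.
ALTER TABLE poslog_publisher_rms_stage SET (
  autovacuum_vacuum_scale_factor  = 0.001,  -- 0.1% of the table
  autovacuum_vacuum_threshold     = 1000,
  autovacuum_vacuum_cost_delay    = 0,      -- let vacuum run at full speed
  autovacuum_analyze_scale_factor = 0.005
);
```

These are real PostgreSQL storage parameters; the trigger point is threshold + scale_factor × row count, so 1000 + 0.001 × 96M ≈ 97,000 dead tuples with the settings above.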
Summary
| Observation | Root Cause |
|---|---|
| 3.5M block reads for 0 rows | Index bloat + dead tuples forcing heap visibility checks |
| 2nd run drops to 18k reads | Buffer cache — pages already in memory from first run |
| 15 min later, back to 3M reads | New dead tuples + cache eviction from ongoing inserts/updates |
| 18k reads still seems high | Remaining dead tuples still require visibility checks even in cache |
The Bottom Line
PostgreSQL’s MVCC model is one of its greatest strengths — it enables high concurrency without the locking overhead that plagues other databases. But on high-throughput tables where rows are constantly being inserted and updated, it creates a hidden cost: dead tuples accumulate faster than autovacuum can clean them up, index bloat follows, and queries that should be lightning-fast end up doing the work of a full table scan.
The good news is that this is a well-understood problem with well-understood solutions. A partial index on just the rows your query actually cares about is often all it takes to go from a 60-second nightmare to a sub-millisecond lookup — permanently, regardless of how many dead tuples accumulate in the rest of the table.
If your PostgreSQL queries are behaving erratically — fast sometimes, slow others, with no obvious pattern — dead tuples and index bloat should be near the top of your diagnostic checklist.