Reading and Writing Blocks in InnoDB

InnoDB, the default storage engine in MySQL, uses a buffer pool to manage the reading and writing of data blocks. The buffer pool is a cache that holds frequently accessed data pages in memory, reducing disk I/O and improving performance. Let’s explore how InnoDB handles reading and writing blocks:

  1. Reading Blocks: When a query needs to read data from a table, InnoDB follows a process known as logical I/O. Here’s an overview of how reading blocks works:a. Check the Buffer Pool: InnoDB first checks if the required data block is already in the buffer pool. If it’s present, it’s considered a cache hit, and the data is read directly from memory, avoiding disk I/O.b. Buffer Pool Miss: If the required data block is not found in the buffer pool, it’s considered a cache miss. InnoDB proceeds with the following steps:i. Fetch from Disk: InnoDB reads the data block from disk into the buffer pool. This operation involves physical I/O, which can be a time-consuming process compared to reading from memory.ii. LRU Algorithm: If the buffer pool is full, InnoDB uses the Least Recently Used (LRU) algorithm to determine which data blocks to evict from the buffer pool to make room for the newly read block. The goal is to keep the most frequently accessed data blocks in the buffer pool.iii. Page Structure: Once the data block is in the buffer pool, InnoDB can access the specific row or page requested by the query.
  2. Writing Blocks: InnoDB follows a process known as write-ahead logging (WAL) to ensure durability and recoverability of data. Here’s an overview of how writing blocks works:a. Update in Memory: When a data modification (e.g., INSERT, UPDATE, DELETE) occurs, InnoDB first writes the changes to the appropriate data pages in the buffer pool. This is known as the “write to memory” phase.b. Write to Redo Log: InnoDB then writes the changes to the redo log, which is a transaction log that captures all changes made to the database. The redo log is a circular buffer stored on disk, optimized for sequential writes.c. Flush to Disk: Periodically or during specific events (e.g., COMMIT statement), InnoDB flushes the modified data pages from the buffer pool to the data files on disk. This process is called a “flush” or “page write.” InnoDB uses a background thread called the flush list to manage the flushing process efficiently.d. Write to Data Files: The modified data pages are written to the appropriate data files on disk. This ensures that the changes are durable and can be recovered in case of a crash or system failure.e. Checkpoint: After the data pages are flushed to disk, InnoDB updates a checkpoint in the redo log to mark the progress of the flushing process. Checkpoints help during crash recovery to determine which redo log entries need to be applied or rolled back.It’s worth noting that InnoDB employs various optimizations to minimize disk I/O and maximize performance, such as grouping multiple changes into a single disk write (known as write batching) and using double write buffers to reduce the impact of partial page writes.

Overall, InnoDB’s approach to reading and writing blocks focuses on efficient memory utilization, minimizing disk I/O, and ensuring data durability and recoverability. The buffer pool caching mechanism and the write-ahead logging mechanism play crucial roles in providing high-performance and reliable data operations in InnoDB.

About Shiv Iyer 444 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.