Reading and Writing Blocks in InnoDB

InnoDB, the default storage engine in MySQL, uses a buffer pool to manage the reading and writing of data blocks. The buffer pool is a cache that holds frequently accessed data pages in memory, reducing disk I/O and improving performance. Let’s explore how InnoDB handles reading and writing blocks:

  1. Reading Blocks:

    When a query needs to read data from a table, InnoDB follows a process known as logical I/O.
    Here’s an overview of how reading blocks works:
    a. Check the Buffer Pool: InnoDB first checks whether the required data block is already in the buffer pool. If it’s present, this is considered a cache hit. In this case, InnoDB directly reads the data from memory, avoiding any disk I/O.
    b. Buffer Pool Miss: If the required data block is not found in the buffer pool, InnoDB treats it as a cache miss. In this case, InnoDB performs the following steps:
      i. Fetch from Disk: InnoDB reads the data block from disk into the buffer pool. This operation involves physical I/O, which can be a time-consuming process compared to reading from memory. 
      ii. LRU Algorithm: If the buffer pool is full, InnoDB uses the Least Recently Used (LRU) algorithm. This algorithm helps decide which data blocks to evict in order to make space for the newly read block. The goal is to keep frequently accessed data in the buffer pool.
      iii. Page Structure: After InnoDB loads the data block into the buffer pool, it immediately accesses the specific row or page requested by the query.
  2. Writing Blocks:

    InnoDB follows a process known as write-ahead logging (WAL) to ensure durability and recoverability of data. Here’s an overview of how writing blocks works:
    a. Update in Memory: When a data modification (e.g., INSERT, UPDATE, DELETE) occurs. InnoDB first writes the changes to the appropriate data pages in the buffer pool. This step is called the “write to memory” phase.
    b. Write to Redo Log: InnoDB then writes the changes to the redo log, which is a transaction log that captures all changes made to the database. The redo log is a circular buffer stored on disk, optimized for sequential writes.
    c. Flush to Disk: Periodically or during specific events (e.g., COMMIT statement), InnoDB flushes the modified data pages from the buffer pool to the data files on disk. This process is called a “flush” or “page write.” InnoDB uses a background thread called the flush list to manage the flushing process efficiently.
    d. Write to Data Files: The modified data pages are written to the appropriate data files on disk. This ensures that the changes are durable and can be recovered in case of a crash or system failure.
    e.Checkpoint: Once the data pages are flushed to disk, InnoDB updates a checkpoint in the redo log. This marks the progress of the flush operation. During crash recovery, InnoDB uses this checkpoint to determine which redo log entries need to be applied or rolled back.
    It’s worth noting that InnoDB employs various optimizations to minimize disk I/O and maximize performance, such as grouping multiple changes into a single disk write (known as write batching) and using double write buffers to reduce the impact of partial page writes.

Conclusion:

Overall, InnoDB’s approach to reading and writing blocks focuses on efficient memory utilization, minimizing disk I/O, and ensuring data durability and recoverability. The buffer pool caching and the write-ahead logging mechanism are both critical to providing high performance and reliable data operations in InnoDB.

About Shiv Iyer 500 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.