When MySQL ignores histograms?

MySQL can ignore histograms in the query optimizer when it determines that using the histograms would not provide any benefit to the query execution plan. This can occur in several cases, including:

  1. Small Tables: If a table is small, the optimizer may determine that a full table scan would be faster than using an index. In this case, the optimizer may ignore the histograms and choose to use a full table scan instead.
  2. Non-SELECT Queries: Histograms are only used by the query optimizer for SELECT statements. Queries such as INSERT, UPDATE, and DELETE statements do not use histograms.
  3. No Histograms: If histograms are not available for a column, the optimizer will not use them. Histograms must be explicitly created using the ANALYZE TABLE or OPTIMIZE TABLE commands.
  4. Different Data Types: If a column contains a mix of different data types, the optimizer may not be able to use the histograms effectively. This can result in the optimizer ignoring the histograms and choosing an alternative execution plan.
  5. Query Complexity: If a query is complex and involves multiple conditions, the optimizer may ignore histograms and choose a different execution plan that is easier to optimize.
  6. Up-to-date Statistics: If the statistics for a table are not up-to-date, the optimizer may ignore the histograms and choose a different execution plan. It is important to regularly analyze and optimize tables to ensure that the statistics are up-to-date.

Why MySQL ignores Histograms in distributed queries?

MySQL may ignore histograms in the query optimizer for distributed queries because histograms are used to provide information about the distribution of data in a single table. When a query involves multiple tables, the distribution of data across the tables becomes more complex, and histograms may not provide as much benefit in optimizing the query execution plan.

In a distributed query, data is stored across multiple nodes, and the query optimizer must determine the best way to retrieve the data from each node. The optimizer may choose to use other information, such as the distribution of data across the nodes, to determine the best execution plan.

Additionally, in a distributed query, the query optimizer must also consider the cost of transmitting data between nodes. The optimizer may choose to use other information, such as the network topology, to determine the best execution plan.

In general, the query optimizer will use histograms when it determines that they will improve the query execution plan. However, in some cases, the optimizer may determine that the histograms are not useful and will choose a different execution plan. In distributed queries, MySQL will consider a variety of factors during optimization stage and histograms may or may not play a role in the optimization process depending on the specific query and the data being queried.

About Shiv Iyer 452 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.