Monday, April 11, 2011

MySQL 5.6: Index Condition Pushdown

Index Condition Pushdown (ICP) is one of the new optimizer features in the MySQL 5.6.2 milestone release. The goal with Index Condition Pushdown is to move as much as possible of the processing of conditions (mainly WHERE clauses) from the server to the storage engine. Instead of fetching entire rows into the server and then evaluate the conditions in the server, the optimizer "pushes" the parts of condition that can be evaluated using the index down to the storage engine. This gives the storage engine the possibility to filter out non-relevant rows using the index and only return relevant rows to the server. The result of using ICP should be less IO accesses to the base table and fewer accesses from the MySQL server to the storage engine.

To explain how Index Condition Pushdown works let us first look at how retrieval of records based on using an index is done without use of Index Condition Pushdown. The main operations that the storage engine and server perform look approximately like this:
  1. The storage engine reads the next row by first reading the index tuple and then using the index tuple to locate and read the full row from the base table.
  2. The server will then evaluate the WHERE condition on the row and based on this either use or ignore this row.
When Index Condition Pushdown is used, the server will push down the parts of the WHERE condition that can be evaluated using the index to the storage engine. The reading of records from a table will then consist of the following steps:

  1. The storage engine reads the next index tuple from the index.
  2. The storage engine evaluates the pushed index condition using the index tuple. If this condition is not satisfied then the storage engine will proceed to the next entry in the index (go back to step 1). Only when it has found an index entry that satisfies the pushed index condition it will continue to read data from the base table.
  3. The storage engine uses the index entry to locate and read the full row from the table.
  4. The server will evaluate the part of the WHERE condition that has not been pushed down the storage engine and based on this either use or ignore this row.
The savings during execution of the query comes from that the storage engine will be able to filter out and drop reading of full rows from the table every time the pushed index condition evaluates to false by just using data from the index.

The optimizer can use Index Condition Pushdown for queries that need to access full table rows using the range, ref, eq_ref, and ref_or_null access methods. In MySQL 5.6.2 Index Condition Pushdown is implemented for InnoDB and MyISAM.

How to enable Index Condition Pushdown?

There is no need to do anything extra in order to use Index Condition Pushdown in MySQL 5.6.2 since it is enabled by default. If you for some reason should want to disable ICP, for instance to be able to compare the execution time with and without ICP, you can use the optimizer_switch system variable to control the use of it. For example to disable ICP you can set:
  optimizer_switch='index_condition_pushdown=off';
This setting can be changed at run-time.

How to know if Index Condition Pushdown is used?


When Index Condition is used, the Extra column in the EXPLAIN of the query will show "Using index condition".

Example using Index Condition Pushdown

To illustrate the effect of using Index Condition Pushdown a small example is needed. Assume you have the following table with information about the five million people living in Norway:

   CREATE TABLE person (
      personid INTEGER PRIMARY KEY,
      firstname CHAR(20),
      lastname CHAR(20),
      postalcode INTEGER,
      age INTEGER,
      address CHAR(50),
      KEY k1 (postalcode,age)
   ) ENGINE=InnoDB;


and you want to find all persons who live close to the city of Bergen (with postal code in the range between 5000 and 5500) and are either 21 or 22 years old. This can be done using the following query:

   SELECT lastname, firstname
   FROM person
   WHERE postalcode BETWEEN 5000 AND 5500 AND age BETWEEN 21 AND 22;


The MySQL optimizer will execute this as a range query using the k1 index. The range query will request the storage engine to read all records where "postalcode BETWEEN 5000 AND 5500".

If this query was run against MySQL 5.5 (which does not have ICP) or if ICP has been disabled then the storage engine will need to fetch all rows that has postal code between 5000 and 5500. With randomly distributed data in the table this would correspond to 250.000 rows that the storage engine has to read and return to the server for evaluation. The server would then evaluate the complete WHERE clause and filter out the records that does not satisfy the "age BETWEEN 21 AND 22". This would result in 5.000 records that would be returned to the user.

With ICP enabled the optimizer will evaluate the WHERE clause and find the parts of it that can be evaluated by using fields from the index. In this case the index contains both the postalcode and the age. Thus, the optimizer will push the entire WHERE clause down to the storage engine. The storage engine can now evaluate the pushed index condition by using the index. So before reading a full row from the base table, the storage engine evaluates the pushed index condition using the index entry. Only those rows that actually qualifies need to be read from the base table. So in this example, the storage engine will evaluate 250.000 index entries but only need to read 5.000 rows from the base table and return those to the server (compared to the 250.000 rows in the case where ICP was not used).

Performance
To get some performance numbers I filled the above table with five million entries with random data (uniformly distributed). The resulting size of the database was about 1 GB on disk. First I disabled the use of ICP and measured the time it took to run the query. The default size of the InnoDB buffer pool is too small to fit this table in memory. So with this setting it took about 15 seconds for this query to complete due to a lot of disk accesses. When I increased the size of the InnoDB buffer pool to 1.5 GB the query takes about 1.4 seconds to complete.

With Index Condition Pushdown enabled it only takes about 90 ms to run the same query both with the default size of the InnoDB buffer pool and the 1.5 GB buffer pool.

In this example the speedup is 15 times when using Index Condition Pushdown (or 160 times in the case of default InnoDB buffer pool size). This speedup is mainly caused by having to read a much lower number of records from the base table (reduced number of IO operations) and a much lower number of calls from the server to InnoDB.