Monday, April 11, 2011

MySQL 5.6: Index Condition Pushdown

Index Condition Pushdown (ICP) is one of the new optimizer features in the MySQL 5.6.2 milestone release. The goal with Index Condition Pushdown is to move as much as possible of the processing of conditions (mainly WHERE clauses) from the server to the storage engine. Instead of fetching entire rows into the server and then evaluate the conditions in the server, the optimizer "pushes" the parts of condition that can be evaluated using the index down to the storage engine. This gives the storage engine the possibility to filter out non-relevant rows using the index and only return relevant rows to the server. The result of using ICP should be less IO accesses to the base table and fewer accesses from the MySQL server to the storage engine.

To explain how Index Condition Pushdown works let us first look at how retrieval of records based on using an index is done without use of Index Condition Pushdown. The main operations that the storage engine and server perform look approximately like this:
  1. The storage engine reads the next row by first reading the index tuple and then using the index tuple to locate and read the full row from the base table.
  2. The server will then evaluate the WHERE condition on the row and based on this either use or ignore this row.
When Index Condition Pushdown is used, the server will push down the parts of the WHERE condition that can be evaluated using the index to the storage engine. The reading of records from a table will then consist of the following steps:

  1. The storage engine reads the next index tuple from the index.
  2. The storage engine evaluates the pushed index condition using the index tuple. If this condition is not satisfied then the storage engine will proceed to the next entry in the index (go back to step 1). Only when it has found an index entry that satisfies the pushed index condition it will continue to read data from the base table.
  3. The storage engine uses the index entry to locate and read the full row from the table.
  4. The server will evaluate the part of the WHERE condition that has not been pushed down the storage engine and based on this either use or ignore this row.
The savings during execution of the query comes from that the storage engine will be able to filter out and drop reading of full rows from the table every time the pushed index condition evaluates to false by just using data from the index.

The optimizer can use Index Condition Pushdown for queries that need to access full table rows using the range, ref, eq_ref, and ref_or_null access methods. In MySQL 5.6.2 Index Condition Pushdown is implemented for InnoDB and MyISAM.

How to enable Index Condition Pushdown?

There is no need to do anything extra in order to use Index Condition Pushdown in MySQL 5.6.2 since it is enabled by default. If you for some reason should want to disable ICP, for instance to be able to compare the execution time with and without ICP, you can use the optimizer_switch system variable to control the use of it. For example to disable ICP you can set:
This setting can be changed at run-time.

How to know if Index Condition Pushdown is used?

When Index Condition is used, the Extra column in the EXPLAIN of the query will show "Using index condition".

Example using Index Condition Pushdown

To illustrate the effect of using Index Condition Pushdown a small example is needed. Assume you have the following table with information about the five million people living in Norway:

   CREATE TABLE person (
      personid INTEGER PRIMARY KEY,
      firstname CHAR(20),
      lastname CHAR(20),
      postalcode INTEGER,
      age INTEGER,
      address CHAR(50),
      KEY k1 (postalcode,age)
   ) ENGINE=InnoDB;

and you want to find all persons who live close to the city of Bergen (with postal code in the range between 5000 and 5500) and are either 21 or 22 years old. This can be done using the following query:

   SELECT lastname, firstname
   FROM person
   WHERE postalcode BETWEEN 5000 AND 5500 AND age BETWEEN 21 AND 22;

The MySQL optimizer will execute this as a range query using the k1 index. The range query will request the storage engine to read all records where "postalcode BETWEEN 5000 AND 5500".

If this query was run against MySQL 5.5 (which does not have ICP) or if ICP has been disabled then the storage engine will need to fetch all rows that has postal code between 5000 and 5500. With randomly distributed data in the table this would correspond to 250.000 rows that the storage engine has to read and return to the server for evaluation. The server would then evaluate the complete WHERE clause and filter out the records that does not satisfy the "age BETWEEN 21 AND 22". This would result in 5.000 records that would be returned to the user.

With ICP enabled the optimizer will evaluate the WHERE clause and find the parts of it that can be evaluated by using fields from the index. In this case the index contains both the postalcode and the age. Thus, the optimizer will push the entire WHERE clause down to the storage engine. The storage engine can now evaluate the pushed index condition by using the index. So before reading a full row from the base table, the storage engine evaluates the pushed index condition using the index entry. Only those rows that actually qualifies need to be read from the base table. So in this example, the storage engine will evaluate 250.000 index entries but only need to read 5.000 rows from the base table and return those to the server (compared to the 250.000 rows in the case where ICP was not used).

To get some performance numbers I filled the above table with five million entries with random data (uniformly distributed). The resulting size of the database was about 1 GB on disk. First I disabled the use of ICP and measured the time it took to run the query. The default size of the InnoDB buffer pool is too small to fit this table in memory. So with this setting it took about 15 seconds for this query to complete due to a lot of disk accesses. When I increased the size of the InnoDB buffer pool to 1.5 GB the query takes about 1.4 seconds to complete.

With Index Condition Pushdown enabled it only takes about 90 ms to run the same query both with the default size of the InnoDB buffer pool and the 1.5 GB buffer pool.

In this example the speedup is 15 times when using Index Condition Pushdown (or 160 times in the case of default InnoDB buffer pool size). This speedup is mainly caused by having to read a much lower number of records from the base table (reduced number of IO operations) and a much lower number of calls from the server to InnoDB.


Nars said...

This is really good improvement.
But it looks like only queries having indexed columns in WHERE clause will get benefited using this improvement.


Olav Sandstå said...

Nars: Thanks for the comment. You are right about that in order to push down parts of the condition to the storage engine it must be possible to evaluate the condition using the content of an index entry. The reason for this is that we get the greatest performance benefit when the storage engine can avoid reading the corresponding record from the table.

If the server pushed down conditions that referred fields not stored in the index entry then the storage engine would have to read the record first before evaluating the condition. In this case the savings by having the storage engine evaluate the condition compared to having the server evaluate the condition would be much less.

Mark Callaghan said...

I welcome anything that eliminates disk reads. Nice job.

Ovais Tariq said...

In the example that you have given there is a range condition on the postalcode column which happens to be the first column of the index, and because this is a range condition, therefore MySQL < 5.6 is not able to use the next part of the index (age column) to filter rows. So what you are implying is that this limitation is removed by ICP. Am I correct in thinking that?

Nextly, for a non-range condition on the first column of the index, say postcalcode=5000 AND age BETWEEN 21 and 22, ICP is not going to make any difference, because both the columns of the index (postalcode, age) would already be used by the storage engine to filter the rows before passing them to the Server. Am I correct?


Jørgen Løland said...

Ovais: Yes, that is correct. If the first keypart had been a non-range condition the range optimizer would read only rows qualifying both keyparts in the first place. In this regard, ICP is the second best thing.

A more tightly setup range is better whenever possible because that removes the need to read index entries while ICP only removes the need to read rows from the table.