I was recently tasked with optimizing a table for a rather tricky query. A typical query went something like this:
SELECT `ID` FROM `ExistingUsers`
WHERE `EntryType` = 'NEW'
AND `Version` = 'long.version.string'
AND
(
(
`IDTypeOne` = '4e46552376ef6ab7'
AND `IDTypeOne` IS NOT NULL
)
OR
(
`IDTypeTwo` = '860525021806844'
AND `IDTypeTwo` IS NOT NULL
)
OR
(
`IDTypeThree` = 'g2r376ef6ab7'
AND `IDTypeThree` IS NOT NULL
)
);
The query’s purpose is to look into the `ExistingUsers` table to find if we already have a matching entry for this user. A qualifying matching entry would satisfy both the `EntryType` and `Version` columns, and at least one of the three possible `IDTypeXXX` columns.
For this particular problem, optimizing the query was not an option, and cleaning the data in the table wasn’t either. So I was left with optimizing the table itself.
Here’s what the CREATE TABLE showed:
CREATE TABLE `ExistingUsers` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`EntryType` enum('NEW','EXISTING') DEFAULT NULL,
`Version` varchar(64) DEFAULT NULL,
`IDTypeOne` varchar(128) DEFAULT NULL,
`IDTypeTwo` varchar(64) DEFAULT NULL,
`IDTypeThree` varchar(64) DEFAULT NULL,
`Time` datetime DEFAULT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `Uniq_idx` (`EntryType`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`,`Version`),
KEY `Time` (`Time`),
) ENGINE=InnoDB AUTO_INCREMENT=3816978 DEFAULT CHARSET=utf8
A relatively small table of less than 4 million rows (actually fewer since not all the auto increments resulted in a new row). Each time this query ran it took between 2.5 and 5 seconds.
Here’s what EXPLAIN EXTENDED showed:
id select_type table type possible_keys key key_len ref
1 SIMPLE ExistingUsers ref Uniq_idx Uniq_idx 2 const
rows filtered Extra
1894887 100 Using where; Using index
The important number here is rows
. MySQL is expecting to examine about half of all rows in the table. That’s telling me the index is not doing enough of the heavy lifting.
The first thing I noticed right away is that our query has two direct columns it is always going to match against and then three more columns that are all “OR”s. The order of the unique index might be better suited to have those two hard matches first, followed by the three OR columns. From this:
`Uniq_idx` (`EntryType`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`,`Version`)
to this:
`Uniq_idx` (`EntryType`,`Version`,`IDTypeOne`,`IDTypeTwo`,`IDTypeThree`)
Here is EXPLAIN EXTENDED after this change:
id select_type table type possible_keys key key_len ref
1 SIMPLE ExistingUsers ref Uniq_idx Uniq_idx 197 const,const
rows filtered Extra
78550 100.00 Using where; Using index
This change actually helped the performance quite a bit. We can see the number of rows expecting to be examined is about 5% of what it was before. Also note the ref
column shows const,const telling us there are two constraints being used in the index.
After this change, if there was a matching row to return, the query took around 0.03 secs. Definitely acceptable. However, if there was no match to return, there was no improvement in run time at all. I would expect that roughly 10% of the time this query runs, it would not have a row to match against.
So that leaves one final optimization to try. Can we somehow create one or more indices that work for the other three columns that are examined in the “OR” clauses of the query? The answer here lies in a MySQL optimization called Index Merge. What we need to do to take advantage of this optimization is to index each of the columns to be examined by the “OR” clauses separately.
This command should do it:
ALTER TABLE `ExistingUsers`
ADD KEY `IDTypeOne_idx` (`IDTypeOne`),
ADD KEY `IDTypeOne_idx` (`IDTypeTwo`),
ADD KEY `IDTypeThree_idx` (`IDTypeThree`);
Now let’s look at an explain of our query:
id select_type table type possible_keys
1 SIMPLE ExistingUsers index_merge Uniq_idx,IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx
key key_len ref
IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx 387,195,195 null
rows filtered Extra
3 100.00 Using union(IDTypeOne_idx,IDTypeOne_idx,IDTypeThree_idx); Using where
EXPLAIN is telling us it expects to only examine three rows. That’s looking much better. We also see type
showing index_merge.
MySQL documentation states:
The Index Merge method is used to retrieve rows with several range scans and to merge their results into one.
In possible_keys
we see the index that was being used before, along with the three new indexes. In key
we see that the optimizer is planning on using the three new indexes together and not even using the previous index.
Also note the Extra
column is telling us it is planning on using the union flavor of the index merge optimization.
Again from MySQL documentation:
The [Union] algorithm can be employed when the table’s WHERE clause was converted to several range conditions on different keys combined with OR
There are some special cases and restrictions on MySQL’s use of the index merge optimization. I won’t go into any more details here, but if you’re having a hard time getting indexes to work on a query containing multiple or nested AND/OR clauses, the index Merge optimization might be what you’re looking for.