[Solved] Query Large Table With Large Number Of Matches In Query

EverSQL Database Performance Knowledge Base

Query Large Table With Large Number Of Matches In Query

Database type:

I have a table of emails with about 800 million entries, and I need to match it against a list of about 100,000 emails. The email entry in the database is indexed.

What would be the best query to do it? I've tried creating a temporary table with the entries I want to match against and then doing a SELECT against that and the main table, but it seems like it should be faster than it is - multiple queries seem to slow down. Any suggestions?

The server is running MySQL 5.5.27 on CentOS 6.3.

EDIT: Here are the creation statements:

CREATE TEMPORARY TABLE temptable (Email varchar(50)) 
CREATE INDEX tempindex ON temptable (Email)

And here's my query:

SELECT temptable.Email FROM temptable, biglist WHERE temptable.Email = biglist.Email

And here's the EXPLAIN result:

+----+-------------+-----------+-------+---------------+-----------+---------+-----------------------+------+--------------------------+
| id | select_type | table     | type  | possible_keys | key       | key_len | ref                   | rows | Extra                    |
+----+-------------+-----------+-------+---------------+-----------+---------+-----------------------+------+--------------------------+
|  1 | SIMPLE      | temptable | index | tempindex     | tempindex | 53      | NULL                  |   21 | Using index              |
|  1 | SIMPLE      | biglist   | ref   | idx_Email     | idx_Email | 202     | datab.temptable.email |    1 | Using where; Using index |
+----+-------------+-----------+-------+---------------+-----------+---------+-----------------------+------+--------------------------+

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `biglist` ADD INDEX `biglist_idx_email` (`Email`);
ALTER TABLE `temptable` ADD INDEX `temptable_idx_email` (`Email`);
The optimized query:
SELECT
        temptable.Email 
    FROM
        temptable,
        biglist 
    WHERE
        temptable.Email = biglist.Email

Related Articles



* original question posted on StackOverflow here.