[Solved] Improving speed of mySQL Query on a larger database
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Improving speed of mySQL Query on a larger database

Database type:

I am trying to optimize an SQL query to hopefully increase the speed of its execution.

I have the following two tables:

CREATE TABLE IF NOT EXISTS `data` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `entry` varchar(255) NOT NULL,
  `numDB` int(11) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `entry` (`entry`),
  UNIQUE KEY `entry_numDB` (`entry`,`numDB`),
  UNIQUE KEY `entry_numDB_id` (`id`,`entry`,`numDB`),
  KEY `numDB` (`numDB`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `details` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `dataID` bigint(20) NOT NULL,
  `dbID` int(11) NOT NULL,
  <removed - unimportant>
  PRIMARY KEY (`id`),
  KEY `dataID` (`dataID`),
  KEY `dbID` (`dbID`),
  KEY `dataID_dbID` (`dataID`,`dbID`),
  <removed - unimportant>
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

An entry (e.g., 'abc') is stored in table data as "id = 1; entry = abc, numDB = 2" and has (at least) two entries in details "id = 1, dataID = 1, dbID = 4" and "id = 2, dataID = 1, dbID = 17", however the same dataID and dbID can occur multiple times, e.g., "id = 3, dataID = 1, dbID = 17", "id = 4, dataID = 1, dbID = 17".

Overall number of entries in table data: 45.245.438
Overall number of entries in table details: 126.608.661

Now I would like to get the first 50 entries from table data which do not have the condition dbID = 4, sorted by data.numDB. The resulting query is:

SELECT DISTINCT(data.entry), data.numDB FROM blacklists.data data INNER JOIN blacklists.details details ON details.dbID NOT IN (4) AND data.id = details.dataID ORDER BY data.numDB DESC LIMIT 50

which (at least) takes 10 minutes of processing time (I stopped it after 10 minutes).

This is what EXPLAIN outputs:

EXPLAIN SELECT DISTINCT(data.entry), data.numDB FROM blacklists.data data INNER JOIN blacklists.details details ON details.dbID NOT IN (4) AND data.id = details.dataID ORDER BY data.numDB DESC LIMIT 50

id  select_type  table    type   possible_keys            key         key_len  ref                rows      Extra
1   SIMPLE       data     index  PRIMARY,entry_numDB_id   entry_numDB 261      NULL               45166874  Using index; Using temporary; Using filesort
1   SIMPLE       details  ref    dataID,dbID,dataID_dbID  dataID      8        blacklists.data.id  1        Using where; Distinct

Not using DISTINCT (or GROUP BY) results in entries being repeated multiple times.

Is there a way to improve this query? I already read many help pages and other questions and their answers but were unable to find a solution for these tables.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Replace Join With Exists To Avoid Redundant Grouping (modified query below): When a joined table isn’t used anywhere other than in the WHERE clause, it's equivalent to an EXISTS subquery, which often performs better. In cases where the DISTINCT or GROUP BY clause contains only columns from the Primary key, they can be removed to further improve performance, as after this transformation, they are redundant.
Optimal indexes for this query:
ALTER TABLE `data` ADD INDEX `data_idx_numdb` (`numDB`);
ALTER TABLE `details` ADD INDEX `details_idx_dataid_dbid` (`dataID`,`dbID`);
The optimized query:
SELECT
        DISTINCT (data.entry),
        data.numDB 
    FROM
        blacklists.data data 
    WHERE
        (
            EXISTS (
                SELECT
                    1 
                FROM
                    blacklists.details details 
                WHERE
                    details.dbID NOT IN (
                        4
                    ) 
                    AND data.id = details.dataID
            )
        ) 
    ORDER BY
        data.numDB DESC LIMIT 50

Related Articles



* original question posted on StackOverflow here.