[Solved] MySQL Query Optimization - Indexes & Joins 12 million+ records

EverSQL Database Performance Knowledge Base

MySQL Query Optimization - Indexes & Joins 12 million+ records

Database type:

I'm running the following query:

SELECT
    `track_title`.`title`,
    `track`.`id`,
    `artist`.`name` artist,
    `album`.`title` album,
    `st_tag`.`tag`,
    `track`.`year`
FROM 
    `track`
JOIN `artist` ON `track`.`artist_id` = `artist`.`id`
JOIN `album` ON `track`.`album_id` = `album`.`id`
JOIN `track_title` ON `track`.`id` = `track_title`.`id`
JOIN `track_tags` ON `track`.`id` = `track_tags`.`track_id`
JOIN `st_tag` ON `track_tags`.`tag_id` = `st_tag`.`id`
WHERE
    `track_title`.`title_sch` LIKE "% love %"
ORDER BY
    `track_title`.`title`
LIMIT
    0,500;

Right now it takes at least 15 mins to execute. Both track and track_title have about 12 million records. Album and artist have at least a million each. Can this be sped up with better indexes? I've been playing with a combination of indexes in order to speed up the queries but i'm still having trouble getting it right.

Table structure:

CREATE TABLE `album` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) default NULL,
  `title_sch` varchar(1000) default NULL,
  `flags` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `title_sch` (`title_sch`(255)),
  KEY `album_title` USING BTREE (`title`(255)),
  KEY `album_cluster` USING BTREE (`id`,`title`(255),`title_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `artist` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(1000) default NULL,
  `name_sch` varchar(1000) default NULL,
  `flags` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `name_sch` (`name_sch`(255)),
  KEY `name` USING BTREE (`name`(255)),
  KEY `artist_name` USING BTREE (`name`(255)),
  KEY `artist_cluster` USING BTREE (`id`,`name`(255),`name_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `st_tag` (
  `id` smallint(5) unsigned NOT NULL auto_increment,
  `tag` varchar(50) NOT NULL,
  `genre` smallint(5) unsigned NOT NULL,
  `parent` smallint(5) unsigned default NULL,
  `depth` tinyint(3) unsigned NOT NULL default '0',
  `display` tinyint(4) NOT NULL default '-1',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=352 DEFAULT CHARSET=utf8;

CREATE TABLE `track` (
  `id` int(10) unsigned NOT NULL,
  `artist_id` int(10) unsigned NOT NULL,
  `album_id` int(10) unsigned NOT NULL,
  `status` smallint(5) unsigned NOT NULL default '0',
  `flags` smallint(5) unsigned NOT NULL default '0',
  `year` smallint(5) unsigned default NULL,
  `duration` smallint(5) unsigned NOT NULL,
  `track_no` smallint(5) unsigned default NULL,
  `disc_no` tinyint(3) unsigned default NULL,
  `explicit` tinyint(3) unsigned NOT NULL default '0',
  `popularity` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `album_id` (`album_id`),
  KEY `artist_id` (`artist_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `track_tags` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `track_id` int(10) unsigned NOT NULL,
  `tag_id` smallint(5) unsigned NOT NULL,
  `status` tinyint(3) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `track_id` (`track_id`),
  KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=26661380 DEFAULT CHARSET=utf8;

CREATE TABLE `track_title` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(1000) default NULL,
  `title_sch` varchar(1000) default NULL,
  `version` varchar(100) default NULL,
  PRIMARY KEY  (`id`),
  KEY `title` (`title`(255)),
  KEY `title_sch` (`title_sch`(255)),
  KEY `title_cluster` USING BTREE (`id`,`title`(255),`title_sch`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

EDIT

The server i'm running on has about 4gbs of ram.

Querying without the first % definitely speeds up the query. It runs in 4 secs approximately. However, that doesn't give me the results i need because I need all titles with the word "love" in it not just ones that start with love. What do you guys think about this? I have a title_sch field in the track_title table. That field is a normalized version of the title where any punctuation has been removed, title has been changed to lowercase etc. Basically a field where the title has been cured. I was thinking what if i move the regular title back to the tracks table and make to track_title table a one to many relationship from track to track_title. But i would change track_title to have one word entry per word pertaining to each track. I know this will increase the table size tremendously but i then i can index track_title.title_sch and instead of using like I can use "=" right? Or when the user searches "love hurts" i can do use "IN" instead. What do you guys think? I'm open to other suggestions.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Sort and Limit Before Joining (modified query below): In cases where the joins aren't filtering any rows, it's possible to sort and limit the amount of rows using a subquery in the FROM clause, before applying the joins to all other tables.
Optimal indexes for this query:
ALTER TABLE `album` ADD INDEX `album_idx_id` (`id`);
ALTER TABLE `st_tag` ADD INDEX `st_tag_idx_id` (`id`);
ALTER TABLE `track` ADD INDEX `track_idx_id` (`id`);
ALTER TABLE `track_tags` ADD INDEX `track_tags_idx_track_id` (`track_id`);
ALTER TABLE `track_title` ADD INDEX `track_title_idx_title` (`title`);
The optimized query:
SELECT
        track_title_title,
        track_id,
        `artist`.`name` artist,
        `album`.`title` album,
        `st_tag`.`tag`,
        track_year 
    FROM
        (SELECT
            `track`.`id` AS track_id,
            `track`.`year` AS track_year,
            `track`.`artist_id` AS track_artist_id,
            `track`.`album_id` AS track_album_id,
            `track`.`title` AS track_title_title 
        FROM
            `track` 
        JOIN
            `track_title` 
                ON `track`.`id` = `track_title`.`id` 
        WHERE
            `track_title`.`title_sch` LIKE "% love %" 
        ORDER BY
            `track_title`.`title` LIMIT 500) AS `track` 
    JOIN
        `artist` 
            ON `track`.track_artist_id = `artist`.`id` 
    JOIN
        `album` 
            ON `track`.track_album_id = `album`.`id` 
    JOIN
        `track_tags` 
            ON `track`.track_id = `track_tags`.`track_id` 
    JOIN
        `st_tag` 
            ON `track_tags`.`tag_id` = `st_tag`.`id` 
    WHERE
        1 = 1 LIMIT 500

Related Articles



* original question posted on StackOverflow here.