[Solved] Need MySQL optimization for complex search on EAV structured data

EverSQL Database Performance Knowledge Base

Need MySQL optimization for complex search on EAV structured data

Database type:

I have a large database with EAV structured data that has to be searchable and pageable. I tried every trick in my book to get it fast enough, but under certain circumstances, it still fails to complete in a reasonable time.

This is my table structure (relevant parts only, ask away if you need more):

CREATE TABLE IF NOT EXISTS `object` (
  `object_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `oid` varchar(32) CHARACTER SET utf8 NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  `created` datetime NOT NULL,
  `updated` datetime NOT NULL,
  PRIMARY KEY (`object_id`),
  UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `version` (
  `version_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `type_id` bigint(20) NOT NULL,
  `object_id` bigint(20) NOT NULL,
  `created` datetime NOT NULL,
  `status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`version_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `value` (
  `value_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `object_id` int(11) NOT NULL,
  `attribute_id` int(11) NOT NULL,
  `version_id` bigint(20) NOT NULL,
  `type_id` bigint(20) NOT NULL,
  `value` text NOT NULL,
  PRIMARY KEY (`value_id`),
  KEY `field_id` (`attribute_id`),
  KEY `action_id` (`version_id`),
  KEY `form_id` (`type_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8;

This is a sample object. I have around 1 million of those in my database. each object may have different number of attributes with different attribute_id

INSERT INTO `owner` (`owner_id`, `uid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `object` (`object_id`, `type_id`, `owner_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');

Now on to my current mechanism. My first try was the typical approach to Mysql. Do one huge SQL with loads of joins on anything I require. Complete desaster! Took way to long to load and even crashed the PHP and MySQL servers due to exhausted RAM.

So I split my queries up into several steps:

1 Determine all needed attribute_ids.

I can look them up in another table that references the type_id of an object. The result is a list of attribute_ids. (this table is not very relevant to the performance, so it's not included in my sample.)

:type_id contains all type_ids from any objects I want to include in my search. I already got this information in my application. So this is inexpensive.

SELECT * FROM attribute WHERE form_id IN (:type_id)

Result is an array of type_id integers.

2 Search for matching objects A big SQL query is compiled that adds one INNER JOIN for each and every condition I want. This sounds horrible, but in the end, it was the fastest method :(

A typical generated query might look like this. The LIMIT sadly is necessary or I will potentially get so many IDs that the resulting array makes PHP explode or break the IN statement in the next Query:

SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1 
        ON `version`.version_id = condition1.version_id 
       AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2 
        ON `version`.version_id = condition2.version_id
       AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
       AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3 
        ON `version`.version_id = condition3.version_id
       AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
       AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
       AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000

The result will contain all object_ids from any object I might need. I am selecting object_ids and not version_ids as I need to have all versions of matching objects, regardless of which version matched.

3 Sort and page results Next I will create a query that sorts the objects by a certain attribute and then pages the resulting array.

SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset

The result is a sorted and paged list of object ids from former search

4 Get our complete objects, versions and attributes In the last step, I will select all values for any objects and versions the former queries found.

SELECT `value`.*, `object`.*, `version`.*, `type`.*
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`,
FROM version
INNER JOIN `type` ON `version`.form_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version.type_id IN (:typeIds)
ORDER BY version.created DESC

The result will then be compiled via PHP into nice object->version->value array structures.


Now the question:

If all else fails, maybe switch database technology? See my other question: Database optimized for searching in large number of objects with different attributes


Real Life samples

Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows

SELECT * FROM attribute WHERE attribute_id IN (30)

SELECT DISTINCT `version`.object_id
FROM version  
INNER JOIN value AS condition_d4e328e33813 
     ON version.version_id = condition_d4e328e33813.version_id
    AND condition_d4e328e33813.type_id IN (30)
    AND condition_d4e328e33813.attribute_id IN (377) 
    AND condition_d4e328e33813.value LIKE '%e%'  
INNER JOIN value AS condition_2c870b0a429f 
     ON version.version_id = condition_2c870b0a429f.version_id
    AND condition_2c870b0a429f.type_id IN (30)
    AND condition_2c870b0a429f.attribute_id IN (376) 
    AND condition_2c870b0a429f.value LIKE '%s%' 
WHERE version.type_id IN (30) 
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!

Explain:

id  select_type  table                   type      possible_keys                key         key_len ref                               rows      Extra   
1   SIMPLE       condition_2c870b0a429f  ref       field_id,action_id,form_id   field_id    4       const                             178639    Using where; Using temporary; Using filesort
1   SIMPLE       action                  eq_ref    PRIMARY                      PRIMARY     8       condition_2c870b0a429f.action_id  1         Using where
1   SIMPLE       condition_d4e328e33813  ref       field_id,action_id,form_id   action_id   8       action.action_id                  11        Using where; Distinct

objects search completed (Peak RAM: 5.91MB, Time: 4.64s)

SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0                                                  

objects sort completed (Peak RAM: 6.68MB, Time: 0.352s)

SELECT `value`.*, object.*, version.*, type.*,
    object.status AS `object.status`,
    object.flag AS `object.flag`,
    version.created AS `version.created`,
    version.status AS `version.status`,
    version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC 

objects load query completed (Peak RAM: 6.68MB, Time: 0.083s)
objects compilation into arrays completed (Peak RAM: 6.68MB, Time: 0.007s)

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `attribute` ADD INDEX `attribute_idx_form_id` (`form_id`);
The optimized query:
SELECT
        * 
    FROM
        attribute 
    WHERE
        attribute.form_id IN (
            :type_id
        )

Related Articles



* original question posted on StackOverflow here.