I have a large database with EAV structured data that has to be searchable and pageable. I tried every trick in my book to get it fast enough, but under certain circumstances, it still fails to complete in a reasonable time.
This is my table structure (relevant parts only, ask away if you need more):
CREATE TABLE IF NOT EXISTS `object` (
`object_id` bigint(20) NOT NULL AUTO_INCREMENT,
`oid` varchar(32) CHARACTER SET utf8 NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`object_id`),
UNIQUE KEY `oid` (`oid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `version` (
`version_id` bigint(20) NOT NULL AUTO_INCREMENT,
`type_id` bigint(20) NOT NULL,
`object_id` bigint(20) NOT NULL,
`created` datetime NOT NULL,
`status` varchar(100) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`version_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `value` (
`value_id` bigint(20) NOT NULL AUTO_INCREMENT,
`object_id` int(11) NOT NULL,
`attribute_id` int(11) NOT NULL,
`version_id` bigint(20) NOT NULL,
`type_id` bigint(20) NOT NULL,
`value` text NOT NULL,
PRIMARY KEY (`value_id`),
KEY `field_id` (`attribute_id`),
KEY `action_id` (`version_id`),
KEY `form_id` (`type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This is a sample object. I have around 1 million of those in my database. each object may have different number of attributes with different attribute_id
INSERT INTO `owner` (`owner_id`, `uid`, `status`, `created`, `updated`) VALUES (1, 'cwnzrdxs4dzxns47xs4tx', 'Green', NOW(), NOW());
INSERT INTO `object` (`object_id`, `type_id`, `owner_id`, `created`, `status`) VALUES (1, 1, 1, NOW(), NOW());
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (1, 1, 1, 1, 1, 'Munich');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (2, 1, 2, 1, 1, 'Germany');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (3, 1, 3, 1, 1, '123');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (4, 1, 4, 1, 1, '2012-01-13');
INSERT INTO `value` (`value_id`, `owner_id`, `attribute_id`, `object_id`, `type_id`, `value`) VALUES (5, 1, 5, 1, 1, 'A cake!');
Now on to my current mechanism. My first try was the typical approach to Mysql. Do one huge SQL with loads of joins on anything I require. Complete desaster! Took way to long to load and even crashed the PHP and MySQL servers due to exhausted RAM.
So I split my queries up into several steps:
1 Determine all needed attribute_ids.
I can look them up in another table that references the type_id of an object. The result is a list of attribute_ids. (this table is not very relevant to the performance, so it's not included in my sample.)
:type_id contains all type_ids from any objects I want to include in my search. I already got this information in my application. So this is inexpensive.
SELECT * FROM attribute WHERE form_id IN (:type_id)
Result is an array of type_id integers.
2 Search for matching objects A big SQL query is compiled that adds one INNER JOIN for each and every condition I want. This sounds horrible, but in the end, it was the fastest method :(
A typical generated query might look like this. The LIMIT sadly is necessary or I will potentially get so many IDs that the resulting array makes PHP explode or break the IN statement in the next Query:
SELECT DISTINCT `version`.object_id FROM `version`
INNER JOIN `version` AS condition1
ON `version`.version_id = condition1.version_id
AND condition1.created = '2012-03-04' -- Filter by version date
INNER JOIN `value` AS condition2
ON `version`.version_id = condition2.version_id
AND condition2.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition2.attribute_id = :field_id2 -- searching for a value in a specific attribute
AND condition2.value = 'Munich' -- searching for the value 'Munich'
INNER JOIN `value` AS condition3
ON `version`.version_id = condition3.version_id
AND condition3.type_id IN (:type_id) -- try to limit joins to object types we need
AND condition3.attribute_id = :field_id3 -- searching for a value in a specific attribute
AND condition3.value = 'Green' -- searching for the value 'Green'
WHERE `version`.type_id IN (:type_id) ORDER BY `version`.version_id DESC LIMIT 10000
The result will contain all object_ids from any object I might need. I am selecting object_ids and not version_ids as I need to have all versions of matching objects, regardless of which version matched.
3 Sort and page results Next I will create a query that sorts the objects by a certain attribute and then pages the resulting array.
SELECT DISTINCT object_id
FROM value
WHERE object_id IN (:foundObjects)
AND attribute_id = :attribute_id_to_sort
AND value > ''
ORDER BY value ASC LIMIT :limit OFFSET :offset
The result is a sorted and paged list of object ids from former search
4 Get our complete objects, versions and attributes In the last step, I will select all values for any objects and versions the former queries found.
SELECT `value`.*, `object`.*, `version`.*, `type`.*
`object`.status AS `object.status`,
`object`.flag AS `object.flag`,
`version`.created AS `version.created`,
`version`.status AS `version.status`,
FROM version
INNER JOIN `type` ON `version`.form_id = `type`.type_id
INNER JOIN `object` ON `version`.object_id = `object`.object_id
LEFT JOIN value ON `version`.version_id = `value`.version_id
WHERE version.object_id IN (:sortedObjectIds) AND `version.type_id IN (:typeIds)
ORDER BY version.created DESC
The result will then be compiled via PHP into nice object->version->value array structures.
Now the question:
If all else fails, maybe switch database technology? See my other question: Database optimized for searching in large number of objects with different attributes
Real Life samples
Table sizes: object - 193801 rows, version - 193841 rows, value - 1053928 rows
SELECT * FROM attribute WHERE attribute_id IN (30)
SELECT DISTINCT `version`.object_id
FROM version
INNER JOIN value AS condition_d4e328e33813
ON version.version_id = condition_d4e328e33813.version_id
AND condition_d4e328e33813.type_id IN (30)
AND condition_d4e328e33813.attribute_id IN (377)
AND condition_d4e328e33813.value LIKE '%e%'
INNER JOIN value AS condition_2c870b0a429f
ON version.version_id = condition_2c870b0a429f.version_id
AND condition_2c870b0a429f.type_id IN (30)
AND condition_2c870b0a429f.attribute_id IN (376)
AND condition_2c870b0a429f.value LIKE '%s%'
WHERE version.type_id IN (30)
ORDER BY version.version_id DESC LIMIT 10000 -- limit to 10000 or it breaks!
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE condition_2c870b0a429f ref field_id,action_id,form_id field_id 4 const 178639 Using where; Using temporary; Using filesort
1 SIMPLE action eq_ref PRIMARY PRIMARY 8 condition_2c870b0a429f.action_id 1 Using where
1 SIMPLE condition_d4e328e33813 ref field_id,action_id,form_id action_id 8 action.action_id 11 Using where; Distinct
objects search completed (Peak RAM: 5.91MB, Time: 4.64s)
SELECT DISTINCT object_id
FROM version
WHERE object_id IN (193793,193789, ... ,135326,135324) -- 10000 ids in here!
ORDER BY created ASC
LIMIT 50 OFFSET 0
objects sort completed (Peak RAM: 6.68MB, Time: 0.352s)
SELECT `value`.*, object.*, version.*, type.*,
object.status AS `object.status`,
object.flag AS `object.flag`,
version.created AS `version.created`,
version.status AS `version.status`,
version.flag AS `version.flag`
FROM version
INNER JOIN type ON version.type_id = type.type_id
INNER JOIN object ON version.object_id = object.object_id
LEFT JOIN value ON version.version_id = `value`.version_id
WHERE version.object_id IN (135324,135326,...,135658,135661) AND version.type_id IN (30)
ORDER BY quality DESC, version.created DESC
objects load query completed (Peak RAM: 6.68MB, Time: 0.083s)
objects compilation into arrays completed (Peak RAM: 6.68MB, Time: 0.007s)
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `attribute` ADD INDEX `attribute_idx_form_id` (`form_id`);
SELECT
*
FROM
attribute
WHERE
attribute.form_id IN (
:type_id
)