We have a table for logging actions. The volume added up pretty fast. After about 3-4 months, we got 600M rows, we need to store the data for 1 year at least, so it is expected to host 2-3 billions rows.
The structure is not very complicated:
CREATE TABLE `site_statistic` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`entityType` varchar(50) NOT NULL,
`entityId` int(10) unsigned NOT NULL,
`entityCount` int(10) unsigned NOT NULL DEFAULT '1',
`timeStamp` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `entity` (`entityType`,`timeStamp`,`entityId`)
) ENGINE=MyISAM CHARSET=utf8
Sample data:
Currently, data insert is not a problem (yet). But reading the data is very slow. It took 90+ seconds to query product activities for a given period of time:
SELECT
`entityType`,
DATE_FORMAT(FROM_UNIXTIME(`timeStamp`), "%e") AS `category`,
SUM(`entityCount`) as `count`
FROM
`site_statistic`
WHERE
`entityType` IN ('product.displayed','product.clicked','product.add')
AND
`timeStamp` >= 1527267600
AND
`timeStamp` <= 1527872399 AND entityId = 12807
GROUP BY
`entityType`,
`category`;
Mysql crashed (or rather not responding) from time to time when many statistic reports are required at the same time.
Deleting the data (related to a product for example) is also very slow. We need to delete to save space...
What can we do to improve it, apart from disable real time report and send later by email?
P.s.: Our client can be considered small (few guys hosting a website with modest profit) not some crazy enterprise solution.
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `site_statistic` ADD INDEX `site_statistic_idx_entityid_entitytype_timestamp` (`entityId`,`entityType`,`timeStamp`);
SELECT
`site_statistic`.`entityType`,
DATE_FORMAT(FROM_UNIXTIME(`site_statistic`.`timeStamp`),
"%e") AS `category`,
SUM(`site_statistic`.`entityCount`) AS `count`
FROM
`site_statistic`
WHERE
`site_statistic`.`entityType` IN (
'product.displayed', 'product.clicked', 'product.add'
)
AND `site_statistic`.`timeStamp` >= 1527267600
AND `site_statistic`.`timeStamp` <= 1527872399
AND `site_statistic`.entityId = 12807
GROUP BY
`site_statistic`.`entityType`,
`category`
ORDER BY
NULL