[Solved] Most efficient way to store data for a graph

EverSQL Database Performance Knowledge Base

Most efficient way to store data for a graph

I have come up with a total of three different, equally viable methods for saving data for a graph.

The graph in question is "player's score in various categories over time". Categories include "buildings", "items", "quest completion", "achievements" and so on.

Method 1:

CREATE TABLE `graphdata` (
    `userid` INT UNSIGNED NOT NULL,
    `date` DATE NOT NULL,
    `category` ENUM('buildings','items',...) NOT NULL,
    `score` FLOAT UNSIGNED NOT NULL,
    PRIMARY KEY (`userid`, `date`, `category`),
    INDEX `userid` (`userid`),
    INDEX `date` (`date`)
) ENGINE=InnoDB

This table contains one row for each user/date/category combination. To show a user's data, select by userid. Old entries are cleared out by:

DELETE FROM `graphdata` WHERE `date` < DATE_ADD(NOW(),INTERVAL -1 WEEK)

Method 2:

CREATE TABLE `graphdata` (
    `userid` INT UNSIGNED NOT NULL,
    `buildings-1day` FLOAT UNSIGNED NOT NULL,
    `buildings-2day` FLOAT UNSIGNED NOT NULL,
    ... (and so on for each category up to `-7day`
    PRIMARY KEY (`userid`)
)

Selecting by user id is faster due to being a primary key. Every day scores are shifted down the fields, as in:

... SET `buildings-3day`=`buildings-2day`, `buildings-2day`=`buildings-1day`...

Entries are not deleted (unless a user deletes their account). Rows can be added/updated with an INSERT...ON DUPLICATE KEY UPDATE query.

Method 3:

Use one file for each user, containing a JSON-encoded array of their score data. Since the data is being fetched by an AJAX JSON call anyway, this means the file can be fetched statically (and even cached until the following midnight) without any stress on the server. Every day the server runs through each file, shift()s the oldest score off each array and push()es the new one on the end.


Personally I think Method 3 is by far the best, however I've heard bad things about using files instead of databases - for instance if I wanted to be able to rank users by their scores in different categories, this solution would be very bad.

Out of the two database solutions, I've implemented Method 2 on one of my older projects, and that seems to work quite well. Method 1 seems "better" in that it makes better use of relational databases and all that stuff, but I'm a little concerned in that it will contain (number of users) * (number of categories) * 7 rows, which could turn out to be a big number.

Is there anything I'm missing that could help me make a final decision on which method to use? 1, 2, 3 or none of the above?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `graphdata` ADD INDEX `graphdata_idx_date` (`date`);
The optimized query:
SELECT
        1 
    FROM
        `graphdata` 
    WHERE
        `graphdata`.`date` < DATE_ADD(NOW(), INTERVAL -1 WEEK)

Related Articles



* original question posted on StackOverflow here.