I have come up with a total of three different, equally viable methods for saving data for a graph.
The graph in question is "player's score in various categories over time". Categories include "buildings", "items", "quest completion", "achievements" and so on.
CREATE TABLE `graphdata` (
`userid` INT UNSIGNED NOT NULL,
`date` DATE NOT NULL,
`category` ENUM('buildings','items',...) NOT NULL,
`score` FLOAT UNSIGNED NOT NULL,
PRIMARY KEY (`userid`, `date`, `category`),
INDEX `userid` (`userid`),
INDEX `date` (`date`)
) ENGINE=InnoDB
This table contains one row for each user/date/category combination. To show a user's data, select by userid
. Old entries are cleared out by:
DELETE FROM `graphdata` WHERE `date` < DATE_ADD(NOW(),INTERVAL -1 WEEK)
CREATE TABLE `graphdata` (
`userid` INT UNSIGNED NOT NULL,
`buildings-1day` FLOAT UNSIGNED NOT NULL,
`buildings-2day` FLOAT UNSIGNED NOT NULL,
... (and so on for each category up to `-7day`
PRIMARY KEY (`userid`)
)
Selecting by user id is faster due to being a primary key. Every day scores are shifted down the fields, as in:
... SET `buildings-3day`=`buildings-2day`, `buildings-2day`=`buildings-1day`...
Entries are not deleted (unless a user deletes their account). Rows can be added/updated with an INSERT...ON DUPLICATE KEY UPDATE
query.
Use one file for each user, containing a JSON-encoded array of their score data. Since the data is being fetched by an AJAX JSON call anyway, this means the file can be fetched statically (and even cached until the following midnight) without any stress on the server. Every day the server runs through each file, shift()
s the oldest score off each array and push()
es the new one on the end.
Personally I think Method 3 is by far the best, however I've heard bad things about using files instead of databases - for instance if I wanted to be able to rank users by their scores in different categories, this solution would be very bad.
Out of the two database solutions, I've implemented Method 2 on one of my older projects, and that seems to work quite well. Method 1 seems "better" in that it makes better use of relational databases and all that stuff, but I'm a little concerned in that it will contain (number of users) * (number of categories) * 7
rows, which could turn out to be a big number.
Is there anything I'm missing that could help me make a final decision on which method to use? 1, 2, 3 or none of the above?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `graphdata` ADD INDEX `graphdata_idx_date` (`date`);
SELECT
1
FROM
`graphdata`
WHERE
`graphdata`.`date` < DATE_ADD(NOW(), INTERVAL -1 WEEK)