Consider a structure where you have a many-to-one (or one-to-many) relationship with a condition (where, order by, etc.) on both tables. For example:
CREATE TABLE tableTwo ( id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, eventTime DATETIME NOT NULL, INDEX (eventTime) ) ENGINE=InnoDB; CREATE TABLE tableOne ( id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, tableTwoId INT UNSIGNED NOT NULL, objectId INT UNSIGNED NOT NULL, INDEX (objectID), FOREIGN KEY (tableTwoId) REFERENCES tableTwo (id) ) ENGINE=InnoDB;
and for an example query:
select * from tableOne t1 inner join tableTwo t2 on t1.tableTwoId = t2.id where objectId = '..' order by eventTime;
Let's say you index
tableTwo.eventTime. If you then explain on the above query, it will show "Using filesort". Essentially, it first applies the
tableOne.objectId index, but it can't apply the
tableTwo.eventTime index because that index is for the entirety of tableTwo (not the limited result set), and thus it must do a manual sort.
Thus, is there a way to do a cross-table index so it wouldn't have to filesort each time results are retrieved? Something like:
create index ind_t1oi_t2et on tableOne t1 inner join tableTwo t2 on t1.tableTwoId = t2.id (t1.objectId, t2.eventTime);
Also, I've looked into creating a view and indexing that, but indexing is not supported for views.
The solution I've been leaning towards if cross-table indexing isn't possible is replicating the conditional data in one table. In this case that means
eventTime would be replicated in
tableOne and a multi-column index would be set up on
tableOne.eventTime (essentially manually creating the index). However, I thought I'd seek out other people's experience first to see if that was the best way.
Here are some procedures for loading test data and comparing results:
drop procedure if exists populate_table_two; delimiter # create procedure populate_table_two(IN numRows int) begin declare v_counter int unsigned default 0; while v_counter < numRows do insert into tableTwo (eventTime) values (CURRENT_TIMESTAMP - interval 0 + floor(0 + rand()*1000) minute); set v_counter=v_counter+1; end while; end # delimiter ; drop procedure if exists populate_table_one; delimiter # create procedure populate_table_one (IN numRows int, IN maxTableTwoId int, IN maxObjectId int) begin declare v_counter int unsigned default 0; while v_counter < numRows do insert into tableOne (tableTwoId, objectId) values (floor(1 +(rand() * maxTableTwoId)), floor(1 +(rand() * maxObjectId))); set v_counter=v_counter+1; end while; end # delimiter ;
You can use these as follows to populate 10,000 rows in
tableTwo and 20,000 rows in
tableOne (with random references to
tableOne and random
objectIds between 1 and 5), which took 26.2 and 70.77 seconds respectively to run for me:
call populate_table_two(10000); call populate_table_one(20000, 10000, 5);
Update 2 (Tested Triggering SQL):
Below is the tried and tested SQL based on daniHp's triggering method. This keeps the
dateTime in sync on
tableOne is added or
tableTwo is updated. Also, this method should also work for many-to-many relationships if the condition columns are copied to the joining table. In my testing of 300,000 rows in
tableOne and 200,000 rows in
tableTwo, the speed of the old query with similar limits was 0.12 sec and the speed of the new query still shows as 0.00 seconds. Thus, there is a clear improvement, and this method should perform well into the millions of rows and farther.
alter table tableOne add column tableTwo_eventTime datetime; create index ind_t1_oid_t2et on tableOne (objectId, tableTwo_eventTime); drop TRIGGER if exists t1_copy_t2_eventTime; delimiter # CREATE TRIGGER t1_copy_t2_eventTime BEFORE INSERT ON tableOne for each row begin set NEW.tableTwo_eventTime = (select eventTime from tableTwo t2 where t2.id = NEW.tableTwoId); end # delimiter ; drop TRIGGER if exists upd_t1_copy_t2_eventTime; delimiter # CREATE TRIGGER upd_t1_copy_t2_eventTime BEFORE UPDATE ON tableTwo for each row begin update tableOne set tableTwo_eventTime = NEW.eventTime where tableTwoId = NEW.id; end # delimiter ;
And the updated query:
select * from tableOne t1 inner join tableTwo t2 on t1.tableTwoId = t2.id where t1.objectId = 1 order by t1.tableTwo_eventTime desc limit 0,10;
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
SELECT * FROM tableOne t1 INNER JOIN tableTwo t2 ON t1.tableTwoId = t2.id WHERE objectId = '..' ORDER BY eventTime