This was an interview question and I have been upfront and said I'm not sure how to optimize the query - I work in the .Net stack and have been enjoying the Entity Framework for quite some time. So my skills in this area have suffered a little. This is also MySql.
So, it's driving me mad - trying to figure how I would optimize this statement. Especially given no DB to play with.
The statement looks much like this:
SELECT a.eid, a.cid, b.e_date
FROM table_A a
JOIN table_B b ON a.eid = b.eid
WHERE a.cid IN
(SELECT cid FROM table_A_trans WHERE `someDateColumn` <= '2014-06-01 00:00:00')
AND a.eid NOT IN(123,124,200)
I'm thinking it's perhaps to do with the fact that a subquery is ran, which could bring back hundreds of results, only to then strip them out in the final NOT IN()
clause.
So - how would you do this?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `table_A` ADD INDEX `table_a_idx_eid` (`eid`);
ALTER TABLE `table_A_trans` ADD INDEX `table_trans_idx_cid_somedatecolumn` (`cid`,`someDateColumn`);
ALTER TABLE `table_B` ADD INDEX `table_b_idx_eid` (`eid`);
SELECT
a.eid,
a.cid,
b.e_date
FROM
table_A a
JOIN
table_B b
ON a.eid = b.eid
WHERE
EXISTS (
SELECT
1
FROM
table_A_trans
WHERE
(
table_A_trans.`someDateColumn` <= '2014-06-01 00:00:00'
)
AND (
a.cid = table_A_trans.cid
)
)
AND a.eid NOT IN (
123, 124, 200
)