[Solved] Optimizing a MySql statement
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Optimizing a MySql statement

Database type:

This was an interview question and I have been upfront and said I'm not sure how to optimize the query - I work in the .Net stack and have been enjoying the Entity Framework for quite some time. So my skills in this area have suffered a little. This is also MySql.

So, it's driving me mad - trying to figure how I would optimize this statement. Especially given no DB to play with.

The statement looks much like this:

SELECT a.eid, a.cid, b.e_date
FROM table_A a
JOIN table_B b ON a.eid = b.eid
WHERE a.cid IN 
    (SELECT cid FROM table_A_trans WHERE `someDateColumn` <= '2014-06-01 00:00:00') 
AND a.eid NOT IN(123,124,200)

I'm thinking it's perhaps to do with the fact that a subquery is ran, which could bring back hundreds of results, only to then strip them out in the final NOT IN() clause.

So - how would you do this?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Replace In Subquery With Correlated Exists (modified query below): In many cases, an EXISTS subquery with a correlated condition will perform better than a non correlated IN subquery.
Optimal indexes for this query:
ALTER TABLE `table_A` ADD INDEX `table_a_idx_eid` (`eid`);
ALTER TABLE `table_A_trans` ADD INDEX `table_trans_idx_cid_somedatecolumn` (`cid`,`someDateColumn`);
ALTER TABLE `table_B` ADD INDEX `table_b_idx_eid` (`eid`);
The optimized query:
SELECT
        a.eid,
        a.cid,
        b.e_date 
    FROM
        table_A a 
    JOIN
        table_B b 
            ON a.eid = b.eid 
    WHERE
        EXISTS (
            SELECT
                1 
            FROM
                table_A_trans 
            WHERE
                (
                    table_A_trans.`someDateColumn` <= '2014-06-01 00:00:00'
                ) 
                AND (
                    a.cid = table_A_trans.cid
                )
        ) 
        AND a.eid NOT IN (
            123, 124, 200
        )

Related Articles



* original question posted on StackOverflow here.