[Solved] Slowness in SQL query with subquery
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Slowness in SQL query with subquery

My SQL query becomes too slow when a subquery is added in WHERE clause even though the individual run times of the queries is less than 1 minute. The query has the following skeleton

SELECT COL1, COL2, COL3, COL4, COL5, COL6, sum(COL7) FROM TABLE1 
WHERE Col1 = 'something' AND COl2 = date AND Col3 = (SELECT MAX(COLUMN1) FROM TABLE2)
GROUP BY COL1, COL2, COL3, COL4, COL5, COL6 

This query is running on SYBASE IQ.

Data for table 1 is 60M+ rows and post application of filter conditions is just 60 rows that usually takes 50 sec to run if subquery is replaced with hardcoded value.

Data for table 2 is 200 rows and post application of filter condition is just one integer value that individually takes 1 sec to run.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 15): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
Optimal indexes for this query:
ALTER TABLE `TABLE1` ADD INDEX `table1_idx_col1_col2_date_col3_col4_col5` (`Col1`,`COl2`,`date`,`COL3`,`COL4`,`COL5`);
ALTER TABLE `TABLE2` ADD INDEX `table2_idx_column1` (`COLUMN1`);
The optimized query:
SELECT
        TABLE1.COL1,
        TABLE1.COL2,
        TABLE1.COL3,
        TABLE1.COL4,
        TABLE1.COL5,
        TABLE1.COL6,
        sum(TABLE1.COL7) 
    FROM
        TABLE1 
    WHERE
        TABLE1.Col1 = 'something' 
        AND TABLE1.COl2 = TABLE1.date 
        AND TABLE1.Col3 = (
            SELECT
                MAX(TABLE2.COLUMN1) 
            FROM
                TABLE2
        ) 
    GROUP BY
        TABLE1.COL1,
        TABLE1.COL2,
        TABLE1.COL3,
        TABLE1.COL4,
        TABLE1.COL5,
        TABLE1.COL6 
    ORDER BY
        NULL

Related Articles



* original question posted on StackOverflow here.