[Solved] Altering Tables in From clause and altering filters affects SQL performance?

EverSQL Database Performance Knowledge Base

Altering Tables in From clause and altering filters affects SQL performance?

I recently asked a question on SO here:

SQL Joins Taking forever on table with 60 matches

I have dropped the performance of the query from infinity to about 8 minutes by altering the FROM clause and some of the filters.

The sql changed from:

SELECT 
ad.acct_lv2 AS IFRS_lvl2,
ad.acct_lv3 AS IFRS_lvl3,
ad.acct_lv4 AS IFRS_lvl4,
ad.acct_lv6 AS IFRS_lvl6,
--(CASE WHEN tb.pl_date = cal.curr_me_date THEN cal.date END) as Reporting_Cycle_This,
--(CASE WHEN tb.pl_date = cal.prev_me_date THEN cal.date END) as Reporting_Cycle_Last,
--(CASE WHEN tb.pl_date = cal.prev_me_date THEN cal.date END) as Reporting_Cycle_Previous,
null as Total,
null as Balance_default_dim,
null as Balance_proxy_dim,
0 as percentage_bal_proxy_applied,
'1 Table Data' as rowType 
FROM TrialBal tb 
INNER JOIN Partition p ON p.partition_code = tb.partition_code
    AND p.entity = tb.entity
LEFT OUTER JOIN ActDetail ah ON tb.actNum = ad.actNum
    AND tb.cpny = ad.cpny
    AND tb.pl_date = ad.pl_date
    AND ad.source = 'Ground0'
--LEFT OUTER JOIN CalendarEngine cal on tb.pl_date = cal.curr_me_date OR tb.pl_date = cal.prev_me_date
    --AND tb.entity = cal.entity
WHERE tb.pl_date = '2014-04-30'
AND tb.entity = 'My_entity'
GROUP BY ad.acct_lv2, ad.acct_lv3, ad.acct_lv4, ad.acct_lv6, --tb.actNum, tb.cpny, 
    CASE WHEN tb.pl_date = cal.curr_me_date THEN cal.date END,
    CASE WHEN tb.pl_date = cal.prev_me_date THEN cal.date END,
    CASE WHEN tb.pl_date = cal.prev_me_date THEN cal.date END

to

SELECT 
ad.acct_lv2 AS IFRS_lvl2,
ad.acct_lv3 AS IFRS_lvl3,
ad.acct_lv4 AS IFRS_lvl4,
ad.acct_lv6 AS IFRS_lvl6,
(CASE WHEN tb.pl_date = cal_1.curr_me_date THEN cal_1.date END) as Reporting_Cycle_This,
(CASE WHEN tb.pl_date = cal_2.prev_me_date THEN cal_2.date END) as Reporting_Cycle_Last,
(CASE WHEN tb.pl_date = cal_2.prev_me_date THEN cal_2.date END) as Reporting_Cycle_Previous,
null as Total,
null as Balance_default_dim,
null as Balance_proxy_dim,
0 as percentage_bal_proxy_applied,
'1 Table Data' as rowType 
FROM Partition p  
LEFT OUTER JOIN TrialBal tb 
    ON tb.pl_date = '2014-04-30'
    AND tb.entity = 'My_entity'
    AND p.partition_code = tb.partition_code
    AND p.entity = tb.entity
LEFT OUTER JOIN ActDetail ah ON tb.actNum = ad.actNum
    AND tb.cpny = ad.cpny
    AND tb.pl_date = ad.pl_date
    AND ad.source = 'Ground0'
LEFT OUTER JOIN CalendarEngine cal_1 on tb.pl_date = cal_1.curr_me_date 
    AND tb.entity = cal_1.entity
LEFT OUTER JOIN CalendarEngine cal_2 on tb.pl_date = cal_2.curr_me_date 
    AND tb.entity = cal_2.entity
GROUP BY ad.acct_lv2, ad.acct_lv3, ad.acct_lv4, ad.acct_lv6, cal_1.date, cal_2.date

This was a change recommended by a DBA, but why the drastic improvement? TrialBal had 8 million+ rows, how does moving this table into the OUTER JOIN and away from the FROM clause make such a difference? Wouldn't the Sybase optimiser analyse the most efficient execution path? And is it a good rule going forward to have the FROM clause reference the smallest table when building the query?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
  3. Remove Redundant Left Joins (modified query below): Redundant LEFT JOINs (e.g. `ActDetail`) were detected in the query. Removing them will result in a performance improvement. In some cases, JOINs become redundant after an optimization is applied, such as when converting OR conditions to a UNION clause.
Optimal indexes for this query:
ALTER TABLE `CalendarEngine` ADD INDEX `calendarengine_idx_curr_date_entity` (`curr_me_date`,`entity`);
ALTER TABLE `TrialBal` ADD INDEX `trialbal_idx_pl_date_entity_partition` (`pl_date`,`entity`,`partition_code`);
The optimized query:
SELECT
        ad.acct_lv2 AS IFRS_lvl2,
        ad.acct_lv3 AS IFRS_lvl3,
        ad.acct_lv4 AS IFRS_lvl4,
        ad.acct_lv6 AS IFRS_lvl6,
        (CASE 
            WHEN tb.pl_date = cal_1.curr_me_date THEN cal_1.date END) AS Reporting_Cycle_This,
(CASE 
    WHEN tb.pl_date = cal_2.prev_me_date THEN cal_2.date END) AS Reporting_Cycle_Last,
(CASE 
    WHEN tb.pl_date = cal_2.prev_me_date THEN cal_2.date END) AS Reporting_Cycle_Previous,
NULL AS Total,
NULL AS Balance_default_dim,
NULL AS Balance_proxy_dim,
0 AS percentage_bal_proxy_applied,
'1 Table Data' AS rowType 
FROM
Partition p 
LEFT OUTER JOIN
TrialBal tb 
    ON tb.pl_date = '2014-04-30' 
    AND tb.entity = 'My_entity' 
    AND p.partition_code = tb.partition_code 
    AND p.entity = tb.entity 
LEFT OUTER JOIN
CalendarEngine cal_1 
    ON tb.pl_date = cal_1.curr_me_date 
    AND tb.entity = cal_1.entity 
LEFT OUTER JOIN
CalendarEngine cal_2 
    ON tb.pl_date = cal_2.curr_me_date 
    AND tb.entity = cal_2.entity 
GROUP BY
ad.acct_lv2,
ad.acct_lv3,
ad.acct_lv4,
ad.acct_lv6,
cal_1.date,
cal_2.date 
ORDER BY
NULL

Related Articles



* original question posted on StackOverflow here.