[Solved] Query including subquery and group by slower than expected

EverSQL Database Performance Knowledge Base

Query including subquery and group by slower than expected

The whole query below runs incredibly slowly.

The subquery query [alias Stage_1] takes only 1.37 minutes returning 9514 records, however the whole query takes over 20 minutes, returning 2606 records.

I could use a #temp table to hold the subquery to improve the performance however I would prefer not to.

An overview of the query is that table WeeklySpace inner joins to Spaceblock_Name_to_PG table on SpaceblockName_SID, this cuts down the results in WeeklySpace and includes PG_Code with the results in WeeklySpace. WeeklySpace is then Full Outer Joined to Sales_PG_Wk across 3 fields. The where clause focuses the results, and may be changed. The results from the subquery are then sum'd. You cannot do the final sum'ing in the subquery due to the group by and sum over used.

I believe the issue is due to the subquery re calculation repeatedly during the group by in the final sum'ing. The field SpaceblockName_SID also appears to be involved in causing the issue as without it the run time with a group by in the subquery isn't affected.

I have read though loads of suggestion, trying them all to resolve the issue.

These include;

The execution plan (cut in two parts, shown below the code) for both the subquery and the whole query appear similar. The cost is around the Full Outer Join (Hash Match), which I expected.

The query is running on T-SQL 2005.

Any help greatly appreciated!

select 
    Cost_centre
    , Fin_week
    , SpaceblockName_SID
    , sum(Propor_rep_SRV) as Total_SpaceblockName_SID_SRV
from
(
        select  
            coalesce(space_side.fin_week , sales_side.fin_week) as Fin_week 
           ,coalesce(space_side.cost_centre , sales_side.cost_Centre) as Cost_centre 
           ,space_side.SpaceblockName_SID
           ,case
                when  space_side.SpaceblockName_SID is null
                    then sales_side.SalesExVAT
                else  sum(space_side.TLM)
                                /nullif(sum (sum(space_side.TLM) ) over (partition by coalesce(space_side.fin_week , sales_side.fin_week)
                                                                                            , coalesce(space_side.cost_centre , sales_side.cost_Centre)  
                                                                                            , coalesce( Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)) ,0)*sales_side.SalesExVAT 
            end as Propor_rep_SRV
        from
            WeeklySpace as space_side
        INNER JOIN
            Spaceblock_Name_to_PG 
                ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID 
                    and Spaceblock_Name_to_PG.PG_Code < 10000
        full outer join
            sales_pg_wk as sales_side
                on  space_side.fin_week = sales_side.fin_week 
                    and space_side.Cost_Centre = sales_side.Cost_Centre 
                    and Spaceblock_Name_to_PG.PG_code = sales_side.pg_code
        where 
            coalesce(space_side.fin_week, sales_side.fin_week) between 201538 and 201550
                and
            coalesce(space_side.cost_centre, sales_side.cost_Centre) in (3, 2800)
        group by  
            coalesce(space_side.fin_week, sales_side.fin_week)  
           ,coalesce(space_side.cost_centre, sales_side.cost_Centre)  
           ,coalesce( Spaceblock_Name_to_PG.PG_Code, sales_side.PG_Code)  
           ,sales_side.SalesExVAT
           ,space_side.SpaceblockName_SID
) as stage_1
group by 
    Cost_centre
    , Fin_week
    , SpaceblockName_SID

Execution plan left hand side Execution plan left hand side


Execution plan right hand side Execution plan left hand side

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 36): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `fin_week` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Avoid Calling Functions With Indexed Columns (query line: 37): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `cost_centre` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  3. Avoid Calling Functions With Indexed Columns (query line: 37): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `cost_Centre` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  4. Avoid Calling Functions With Indexed Columns (query line: 41): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `fin_week` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  5. Avoid Calling Functions With Indexed Columns (query line: 43): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `cost_centre` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  6. Avoid Calling Functions With Indexed Columns (query line: 43): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `cost_Centre` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  7. Avoid Calling Functions With Indexed Columns (query line: 45): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `PG_Code` is indexed, the index won’t be used as it’s wrapped with the function `coalesce`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  8. Avoid Subqueries (query line: 7): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  9. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  10. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
Optimal indexes for this query:
ALTER TABLE `Spaceblock_Name_to_PG` ADD INDEX `spaceblock_to_idx_spaceblockname_pg_code` (`SpaceblockName_SID`,`PG_Code`);
ALTER TABLE `sales_pg_wk` ADD INDEX `sales_wk_idx_pg_code` (`pg_code`);
The optimized query:
SELECT
        Cost_centre,
        Fin_week,
        stage_1.SpaceblockName_SID,
        sum(Propor_rep_SRV) AS Total_SpaceblockName_SID_SRV 
    FROM
        (SELECT
            coalesce(space_side.fin_week,
            sales_side.fin_week) AS Fin_week,
            coalesce(space_side.cost_centre,
            sales_side.cost_Centre) AS Cost_centre,
            space_side.SpaceblockName_SID,
            CASE 
                WHEN space_side.SpaceblockName_SID IS NULL THEN sales_side.SalesExVAT 
                ELSE sum(space_side.TLM) / nullif(sum(sum(space_side.TLM)) OVER (PARTITION 
            BY
                coalesce(space_side.fin_week,
                sales_side.fin_week),
                coalesce(space_side.cost_centre,
                sales_side.cost_Centre),
                coalesce(Spaceblock_Name_to_PG.PG_Code,
                sales_side.PG_Code) ),
                0) * sales_side.SalesExVAT END AS Propor_rep_SRV 
FROM
WeeklySpace AS space_side 
INNER JOIN
Spaceblock_Name_to_PG 
    ON space_side.SpaceblockName_SID = Spaceblock_Name_to_PG.SpaceblockName_SID 
    AND Spaceblock_Name_to_PG.PG_Code < 10000 FULL 
OUTER JOIN
sales_pg_wk AS sales_side 
    ON space_side.fin_week = sales_side.fin_week 
    AND space_side.Cost_Centre = sales_side.Cost_Centre 
    AND Spaceblock_Name_to_PG.PG_code = sales_side.pg_code 
WHERE
coalesce(space_side.fin_week, sales_side.fin_week) BETWEEN 201538 AND 201550 
AND coalesce(space_side.cost_centre, sales_side.cost_Centre) IN (
    3, 2800
) 
GROUP BY
coalesce(space_side.fin_week,
sales_side.fin_week),
coalesce(space_side.cost_centre,
sales_side.cost_Centre),
coalesce(Spaceblock_Name_to_PG.PG_Code,
sales_side.PG_Code),
sales_side.SalesExVAT,
space_side.SpaceblockName_SID 
ORDER BY
NULL) AS stage_1 
GROUP BY
Cost_centre,
Fin_week,
stage_1.SpaceblockName_SID 
ORDER BY
NULL

Related Articles



* original question posted on StackOverflow here.