I have a pretty simple recursive CTE running on a single sourcetable (REP.INVENTMOVEMENTS
) containing around 4mln records. The table is pretty heavily indexed.
with
dataset as (
select imv.sourceBatch,
imv.targetBatch,
imv.sourceDataArea,
imv.targetDataArea,
sum(Weight) as Weight
from REP.INVENTMOVEMENTS imv
where imv.sourceBatch <> ''
Group By imv.sourceBatch,
imv.targetBatch,
imv.sourceDataArea,
imv.targetDataArea
),
result as (
select targetBatch as Batch,
targetDataArea as DataArea,
sourceBatch,
targetBatch,
sourceDataArea,
targetDataArea,
1 as level,
Weight
from dataset
where sourceBatch <> targetBatch
union all
select result.Batch,
result.DataArea,
dataset.sourceBatch,
dataset.targetBatch,
dataset.sourceDataArea,
dataset.targetDataArea,
result.level + 1 as level,
dataset.Weight
from dataset inner join result on dataset.targetBatch = result.sourceBatch
and dataset.targetDataArea = result.sourceDataArea
and dataset.targetBatch <> dataset.sourceBatch
)
select * from result
union all
select targetBatch as Batch,
targetDataArea as DataArea,
sourceBatch,
targetBatch,
sourceDataArea,
targetDataArea,
0 as level,
Weight
from dataset
where sourceBatch = targetBatch
;
running the initial query without selections takes the database 122 seconds returning 517.947 records.
Running that same query returning a single batch takes the database less then a second returning 5 records.
But if I run the CTE with a selection on 1 batch it takes the database 28 seconds to complete 2 recursions and return 7 records.
I need to fill a table with the result from this view for 150k batches. so if all of them take half a minute to complete it would take 52 days to finish that task.
This is the execution plan
Just to clarify my objective. Batches can merge into new batches so 2 or more source batches can create an new batch. Two batches created in such a merge can be used to create a new batch... and so forth.
I want to be able to select a batch and find all the batches used to create this new batch.
Please take into account that one batch can be used in multiple other batches.
I hope you can help me out here.
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `INVENTMOVEMENTS` ADD INDEX `inventmovements_idx_sourceb_targetb_sourced_targetd` (`sourceBatch`,`targetBatch`,`sourceDataArea`,`targetDataArea`);
WITH dataset AS (SELECT
imv.sourceBatch,
imv.targetBatch,
imv.sourceDataArea,
imv.targetDataArea,
sum(Weight) AS Weight
FROM
REP.INVENTMOVEMENTS imv
WHERE
imv.sourceBatch <> ''
GROUP BY
imv.sourceBatch,
imv.targetBatch,
imv.sourceDataArea,
imv.targetDataArea), result AS (SELECT
dataset.targetBatch AS Batch,
dataset.targetDataArea AS DataArea,
dataset.sourceBatch,
dataset.targetBatch,
dataset.sourceDataArea,
dataset.targetDataArea,
1 AS level,
dataset.Weight
FROM
dataset
WHERE
dataset.sourceBatch <> dataset.targetBatch
UNION
ALL SELECT
result.Batch,
result.DataArea,
dataset.sourceBatch,
dataset.targetBatch,
dataset.sourceDataArea,
dataset.targetDataArea,
result.level + 1 AS level,
dataset.Weight
FROM
dataset
INNER JOIN
result
ON dataset.targetBatch = result.sourceBatch
AND dataset.targetDataArea = result.sourceDataArea
AND dataset.targetBatch <> dataset.sourceBatch) SELECT
*
FROM
result
UNION
ALL SELECT
dataset.targetBatch AS Batch,
dataset.targetDataArea AS DataArea,
dataset.sourceBatch,
dataset.targetBatch,
dataset.sourceDataArea,
dataset.targetDataArea,
0 AS level,
dataset.Weight
FROM
dataset
WHERE
dataset.sourceBatch = dataset.targetBatch