I've experienced a T-SQL query taking a significant amount of time. If I split it up into 2 queries, it's faster, but I don't know why.
The query is meant to fetch data from tables C, G, Y and insert it into X - but only if it doesn't already exist in X.
Here's the slow query. It slows down significant when the X-table grows:
INSERT INTO X (COLUMN_A, COLUMN_B, COLUMN_C)
(SELECT a1.COLUMN_A, a1.COLUMN_B, 'QWE' as COLUMN_C
FROM
(SELECT
CAST(G.ID AS VARCHAR(900)) AS COLUMN_A,
MAX(Y.B) AS COLUMN_B
FROM
C
JOIN
G ON (G.ID = C.G_UID)
JOIN
Y ON Y.B COLLATE Danish_Norwegian_BIN = C.UID COLLATE Danish_Norwegian_BIN
GROUP BY
CAST(G.ID AS VARCHAR(900))) AS a1
WHERE
a1.COLUMN_A NOT IN (SELECT COLUMN_A FROM X))
BUT! It's faster if I split the query up (in code) into 2 sets:
Set 1:
SELECT
a1.COLUMN_A, a1.COLUMN_B, 'QWE' as COLUMN_C
FROM
(SELECT
CAST(G.ID AS VARCHAR(900)) AS COLUMN_A, MAX(Y.B) AS COLUMN_B
FROM
C
JOIN
G ON (G.ID = C.G_UID)
JOIN
Y ON Y.B COLLATE Danish_Norwegian_BIN = C.UID COLLATE Danish_Norwegian_BIN
GROUP BY
CAST(G.ID AS VARCHAR(900))) AS a1
Set 2:
SELECT COLUMN_A FROM X
And then manually subtract the sets from each other, and inserting the result into the X-table.
Why is that? Can my slow query be written better?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `G` ADD INDEX `g_idx_id` (`ID`);
SELECT
a1.COLUMN_A,
a1.COLUMN_B,
'QWE' AS COLUMN_C
FROM
(SELECT
CAST(G.ID AS VARCHAR (900)) AS COLUMN_A,
MAX(Y.B) AS COLUMN_B
FROM
C
JOIN
G
ON (
G.ID = C.G_UID
)
JOIN
Y
ON Y.B COLLATE Danish_Norwegian_BIN = C.UID COLLATE Danish_Norwegian_BIN
GROUP BY
CAST(G.ID AS VARCHAR (900))
ORDER BY
NULL) AS a1