I've been dealing with a slow running query, similar to the following
select
count(*)
from
a
join b
on a.akey = b.akey
join c
on b.bkey = c.bkey
left join d
on c.ykey = d.ykey
and b.xkey = d.xkey
where
a.idkey = 'someid'
This query takes 130s to run for 'someid'
If I remove either condition of the left join, the query runs in <1s.
I've determined the issue for this particular record (someid). There are a huge number of matching d.xkey
values (~5 000 000). I've done some tests and modifying the relevant d.xkey
values for this record to more unique values improves run time to <1s.
This is the fix I'm currently using.
select
count(*)
from
a
join b
on a.akey = b.akey
join c
on b.bkey = c.bkey
left join d
on c.ykey = d.ykey
where
a.idkey = 'someid'
and (
b.xkey = d.xkey
OR b.xkey is null
OR not exists (
select
dd.xkey
from
d dd
where
dd.xkey = b.xkey
and dd.ykey = c.ykey
)
)
This query runs in less than 1s.
My question is, why is this so much faster than the left join?
Is my new query equivalent to the old one in terms of results?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
CREATE INDEX a_idx_idkey ON a (idkey);
CREATE INDEX b_idx_akey ON b (akey);
CREATE INDEX c_idx_bkey ON c (bkey);
CREATE INDEX d_idx_ykey ON d (ykey);
SELECT
count(*)
FROM
a
JOIN
b
ON a.akey = b.akey
JOIN
c
ON b.bkey = c.bkey
LEFT JOIN
d
ON c.ykey = d.ykey
AND b.xkey = d.xkey
WHERE
a.idkey = 'someid'