I use MS Access a lot as an ad-hoc data processing tool. One thing I've noticed is that using sub-queries in certain ways tends to have very bad performance with large tables.
For example, this query performs poorly:
SELECT TOP 500 EmployeeID FROM Employee
WHERE EmployeeID NOT IN
(SELECT EmployeeID FROM LoadHistory
WHERE YearWeek = '2015-26');
This version of the same query performs well:
SELECT TOP 500 EmployeeID FROM Employee
WHERE NOT EXISTS
(SELECT 1 FROM LoadHistory
WHERE YearWeek = '2015-26' AND
EmployeeID = Employee.EmployeeID);
And this other form of the same query also performs well:
SELECT TOP 500 Employee.EmployeeID
FROM Employee
LEFT JOIN
(SELECT EmployeeID FROM LoadHistory
WHERE YearWeek = '2015-26') q
ON Employee.EmployeeID = q.EmployeeID
WHERE q.EmployeeID IS NULL;
For style reasons, I prefer the first form. I can't really understand why the optimizer doesn't generate the same plan for the first and second queries. Is there any logic to how the ACE optimizer is behaving here? Are there any other ways to slightly rewrite the first query so the optimizer can do a better job?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `LoadHistory` ADD INDEX `loadhistory_idx_yearweek_employeeid` (`YearWeek`,`EmployeeID`);
SELECT
TOP 500 Employee.EmployeeID
FROM
Employee
WHERE
NOT EXISTS (
SELECT
1
FROM
LoadHistory
WHERE
(
LoadHistory.YearWeek = '2015-26'
)
AND (
Employee.EmployeeID = LoadHistory.EmployeeID
)
)