I use MS Access a lot as an ad-hoc data processing tool. One thing I've noticed is that using sub-queries in certain ways tends to have very bad performance with large tables.
For example, this query performs poorly:
SELECT TOP 500 EmployeeID FROM Employee WHERE EmployeeID NOT IN (SELECT EmployeeID FROM LoadHistory WHERE YearWeek = '2015-26');
This version of the same query performs well:
SELECT TOP 500 EmployeeID FROM Employee WHERE NOT EXISTS (SELECT 1 FROM LoadHistory WHERE YearWeek = '2015-26' AND EmployeeID = Employee.EmployeeID);
And this other form of the same query also performs well:
SELECT TOP 500 Employee.EmployeeID FROM Employee LEFT JOIN (SELECT EmployeeID FROM LoadHistory WHERE YearWeek = '2015-26') q ON Employee.EmployeeID = q.EmployeeID WHERE q.EmployeeID IS NULL;
For style reasons, I prefer the first form. I can't really understand why the optimizer doesn't generate the same plan for the first and second queries. Is there any logic to how the ACE optimizer is behaving here? Are there any other ways to slightly rewrite the first query so the optimizer can do a better job?
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
ALTER TABLE `LoadHistory` ADD INDEX `loadhistory_idx_yearweek_employeeid` (`YearWeek`,`EmployeeID`);
SELECT TOP 500 Employee.EmployeeID FROM Employee WHERE NOT EXISTS ( SELECT 1 FROM LoadHistory WHERE ( LoadHistory.YearWeek = '2015-26' ) AND ( Employee.EmployeeID = LoadHistory.EmployeeID ) )