[Solved] MS Access subquery performance
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

MS Access subquery performance

I use MS Access a lot as an ad-hoc data processing tool. One thing I've noticed is that using sub-queries in certain ways tends to have very bad performance with large tables.

For example, this query performs poorly:

SELECT TOP 500 EmployeeID FROM Employee
WHERE EmployeeID NOT IN
    (SELECT EmployeeID FROM LoadHistory
    WHERE YearWeek = '2015-26');

This version of the same query performs well:

SELECT TOP 500 EmployeeID FROM Employee
WHERE NOT EXISTS
    (SELECT 1 FROM LoadHistory
    WHERE YearWeek = '2015-26' AND
        EmployeeID = Employee.EmployeeID);

And this other form of the same query also performs well:

SELECT TOP 500 Employee.EmployeeID
FROM Employee
LEFT JOIN
    (SELECT EmployeeID FROM LoadHistory
    WHERE YearWeek = '2015-26') q
ON Employee.EmployeeID = q.EmployeeID
WHERE q.EmployeeID IS NULL;

For style reasons, I prefer the first form. I can't really understand why the optimizer doesn't generate the same plan for the first and second queries. Is there any logic to how the ACE optimizer is behaving here? Are there any other ways to slightly rewrite the first query so the optimizer can do a better job?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Replace In Subquery With Correlated Exists (modified query below): In many cases, an EXISTS subquery with a correlated condition will perform better than a non correlated IN subquery.
Optimal indexes for this query:
ALTER TABLE `LoadHistory` ADD INDEX `loadhistory_idx_yearweek_employeeid` (`YearWeek`,`EmployeeID`);
The optimized query:
SELECT
        TOP 500 Employee.EmployeeID 
    FROM
        Employee 
    WHERE
        NOT EXISTS (
            SELECT
                1 
            FROM
                LoadHistory 
            WHERE
                (
                    LoadHistory.YearWeek = '2015-26'
                ) 
                AND (
                    Employee.EmployeeID = LoadHistory.EmployeeID
                )
        )

Related Articles



* original question posted on StackOverflow here.