[Solved] Is having an 'OR' in an INNER JOIN condition a bad idea?

EverSQL Database Performance Knowledge Base

In trying to improve the speed of an immensely slow query (several minutes on two tables with only ~50,000 rows each, on SQL Server 2008 if it matters), I narrowed down the problem to an OR in my inner join, as in:

SELECT mt.ID, mt.ParentID, ot.MasterID
  FROM dbo.MainTable AS mt
  INNER JOIN dbo.OtherTable AS ot ON ot.ParentID = mt.ID
                                  OR ot.ID = mt.ParentID

I changed this to (what I hope is) an equivalent pair of left joins, shown here:

SELECT mt.ID, mt.ParentID,
   CASE WHEN ot1.MasterID IS NOT NULL THEN
      ot1.MasterID ELSE
      ot2.MasterID END AS MasterID
  FROM dbo.MainTable AS mt
  LEFT JOIN dbo.OtherTable AS ot1 ON ot1.ParentID = mt.ID
  LEFT JOIN dbo.OtherTable AS ot2 ON ot2.ID = mt.ParentID
  WHERE ot1.MasterID IS NOT NULL OR ot2.MasterID IS NOT NULL

.. and the query now runs in about a second!

Is it generally a bad idea to put an OR in a join condition? Or am I just unlucky somehow in the layout of my tables?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid OR Conditions By Using UNION (modified query below): In most cases, the database cannot use indexes efficiently to evaluate a condition that contains the OR operator. A better-performing alternative is to split the query into two parts combined with a UNION clause, where each part holds one side of the original OR condition.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
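  3. Use UNION ALL instead of UNION (query line: 17): Use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause. In this query, a pair of rows that satisfies both join conditions is returned by both parts, so UNION ALL would list it twice; switch only if that cannot happen in your data or the duplicates are acceptable.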
Optimal indexes for this query:
CREATE INDEX othertable_idx_id ON dbo.OtherTable (ID);
CREATE INDEX othertable_idx_parentid ON dbo.OtherTable (ParentID);
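Because both join branches also read ot.MasterID, a covering variant of these indexes may let SQL Server answer the query from the index alone. The following is only a sketch: the index names are hypothetical, and it assumes ID is not already the clustered primary key of dbo.OtherTable (the question does not show the table definitions).

```sql
-- Hypothetical covering indexes: the key column serves the join, and
-- MasterID is stored at the leaf level so the base table need not be read.
-- If ID is already the clustered primary key, the first index is redundant.
CREATE NONCLUSTERED INDEX othertable_idx_id_cover
    ON dbo.OtherTable (ID) INCLUDE (MasterID);
CREATE NONCLUSTERED INDEX othertable_idx_parentid_cover
    ON dbo.OtherTable (ParentID) INCLUDE (MasterID);
```

Whether these help in practice depends on the existing indexes and the execution plan, so compare plans before and after creating them.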
The optimized query:
SELECT
        mt_id,
        mt_parentid,
        ot_masterid 
    FROM
        ((SELECT
            mt.ID AS mt_id,
            mt.ParentID AS mt_parentid,
            ot.MasterID AS ot_masterid 
        FROM
            dbo.MainTable AS mt 
        INNER JOIN
            dbo.OtherTable AS ot 
                ON ot.ID = mt.ParentID) 
    UNION
    (
        SELECT
            mt.ID AS mt_id,
            mt.ParentID AS mt_parentid,
            ot.MasterID AS ot_masterid 
        FROM
            dbo.MainTable AS mt 
        INNER JOIN
            dbo.OtherTable AS ot 
                ON ot.ParentID = mt.ID
        )
    ) AS union1
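
Note that the UNION form is not guaranteed to return exactly the same rows as the original OR join: UNION removes duplicate result rows, so two different OtherTable rows that share a MasterID and match the same MainTable row collapse into one. A minimal sketch of a UNION ALL variant that skips the distinct step and still returns each matching pair once, assuming ot.ID is a non-nullable key (an assumption, since the table definitions are not shown):

```sql
-- Branch 1: pairs matched through mt.ParentID.
SELECT mt.ID, mt.ParentID, ot.MasterID
  FROM dbo.MainTable AS mt
  INNER JOIN dbo.OtherTable AS ot ON ot.ID = mt.ParentID

UNION ALL

-- Branch 2: pairs matched through ot.ParentID, excluding any pair that
-- branch 1 has already returned, so no pair appears twice.
SELECT mt.ID, mt.ParentID, ot.MasterID
  FROM dbo.MainTable AS mt
  INNER JOIN dbo.OtherTable AS ot ON ot.ParentID = mt.ID
 WHERE mt.ParentID IS NULL       -- branch 1 can never match a NULL ParentID
    OR ot.ID <> mt.ParentID;
```

Comparing the row count of this query with that of the original OR join on a copy of the data is a quick way to check that the assumption holds.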

* Original question posted on Stack Overflow.