[Solved] How can I optimize this incredibly slow left outer join sqlite query?

EverSQL Database Performance Knowledge Base

I've run into an issue with a SQL query that is so slow it takes 17+ minutes to run. I'm pretty sure this is simply because the outer joins and the sheer volume of data make the query expensive. Unfortunately, I'm not seeing a good way to rewrite it to get what I want.

I've got the following tables (omitting some columns for brevity):

Events
ID (AUTOINCREMENT INTEGER PRIMARY KEY) | Guid (16 Byte BLOB) | Time (FLOAT)


Relationships
ID (AUTOINCREMENT INTEGER PRIMARY KEY) | Parent (INTEGER) | Child (INTEGER) | ParentTable (INTEGER) | ChildTable (INTEGER)

The Events table has about 25k rows (this will likely quadruple with real data). The Relationships table has about 212k rows (again, this will likely quadruple).

Essentially, an Event can have nested Events. The resulting tree has no depth limit (though it's not terribly deep at the moment).

When selecting Event records, my goal is to return rows of data that give me the following data:

ID | Guid | Time | ParentIndex | ParentGuid

I also fully expect that root-level Events will have null ParentIndex and ParentGuid columns (which was one of the reasons I took the outer join approach).

My query (without constraints) looks like this:

SELECT E.*, R.Parent as 'ParentIndex', PE.Guid AS 'ParentGuid' FROM Events AS E
LEFT OUTER JOIN Relationships AS R ON R.Child = E.ID AND R.ChildTable = 0
LEFT OUTER JOIN Events AS PE ON R.Parent = PE.ID ORDER BY E.Time;
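To make the expected result shape concrete, here is a minimal sketch that runs the query above against a tiny in-memory SQLite database through Python's standard sqlite3 module. The two sample events, their Guid values, and the single relationship row are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# Miniature version of the schema described above; the real tables
# have more columns and far more rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Guid BLOB,
    Time FLOAT
);
CREATE TABLE Relationships (
    ID          INTEGER PRIMARY KEY AUTOINCREMENT,
    Parent      INTEGER,
    Child       INTEGER,
    ParentTable INTEGER,
    ChildTable  INTEGER
);
""")

# Invented sample data: event 1 is a root, event 2 is nested under it.
conn.executemany(
    "INSERT INTO Events (ID, Guid, Time) VALUES (?, ?, ?)",
    [(1, b"\x01" * 16, 10.0), (2, b"\x02" * 16, 20.0)],
)
conn.execute(
    "INSERT INTO Relationships (Parent, Child, ParentTable, ChildTable) "
    "VALUES (1, 2, 0, 0)"
)

rows = conn.execute("""
    SELECT E.*, R.Parent AS ParentIndex, PE.Guid AS ParentGuid
    FROM Events AS E
    LEFT OUTER JOIN Relationships AS R ON R.Child = E.ID AND R.ChildTable = 0
    LEFT OUTER JOIN Events AS PE ON R.Parent = PE.ID
    ORDER BY E.Time
""").fetchall()

# Each row is (ID, Guid, Time, ParentIndex, ParentGuid).
for row in rows:
    print(row)
```

The root event comes back with None in both parent columns, which is the behavior the outer joins are there to provide; the nested event carries its parent's ID and Guid.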

If I constrain this query with a WHERE clause that filters most of the Events returned, I get a row of data that is exactly what I want. However, without a tight constraint the execution time is crippling.

I assume there is a better way to write this query to get the same sort of result row, but my SQL-fu has failed me.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
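As an illustration of the first recommendation, the sketch below replaces the '*' wildcard with the five columns the question lists as the desired output. It uses Python's standard sqlite3 module as a harness; the Payload column and the sample row are hypothetical stand-ins for the columns and data the question omits, and whether the narrower column list matters in practice depends on how wide the real Events table is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (
    ID      INTEGER PRIMARY KEY AUTOINCREMENT,
    Guid    BLOB,
    Time    FLOAT,
    Payload TEXT   -- hypothetical stand-in for the omitted columns
);
CREATE TABLE Relationships (
    ID          INTEGER PRIMARY KEY AUTOINCREMENT,
    Parent      INTEGER,
    Child       INTEGER,
    ParentTable INTEGER,
    ChildTable  INTEGER
);
INSERT INTO Events (Guid, Time, Payload) VALUES (x'00', 1.0, 'unused');
""")

# Name only the columns that the result row actually needs.
cursor = conn.execute("""
    SELECT E.ID, E.Guid, E.Time,
           R.Parent AS ParentIndex,
           PE.Guid  AS ParentGuid
    FROM Events AS E
    LEFT OUTER JOIN Relationships AS R ON R.Child = E.ID AND R.ChildTable = 0
    LEFT OUTER JOIN Events AS PE ON R.Parent = PE.ID
    ORDER BY E.Time
""")

# cursor.description holds one entry per result column; index 0 is its name.
column_names = [d[0] for d in cursor.description]
print(column_names)
```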
Optimal indexes for this query:
CREATE INDEX events_idx_id_time ON Events (ID, Time);
CREATE INDEX relationships_idx_childtable_child ON Relationships (ChildTable, Child);
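One way to check whether an index is actually used is SQLite's EXPLAIN QUERY PLAN. The sketch below, run through Python's standard sqlite3 module on empty tables, captures the plan before and after creating the Relationships index with SQLite's CREATE INDEX statement (SQLite creates indexes with CREATE INDEX rather than through ALTER TABLE). The plan() helper is a hypothetical name introduced here, and the exact wording of the plan lines varies between SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Events (
    ID   INTEGER PRIMARY KEY AUTOINCREMENT,
    Guid BLOB,
    Time FLOAT
);
CREATE TABLE Relationships (
    ID          INTEGER PRIMARY KEY AUTOINCREMENT,
    Parent      INTEGER,
    Child       INTEGER,
    ParentTable INTEGER,
    ChildTable  INTEGER
);
""")

QUERY = """
    SELECT E.*, R.Parent AS ParentIndex, PE.Guid AS ParentGuid
    FROM Events AS E
    LEFT OUTER JOIN Relationships AS R ON R.Child = E.ID AND R.ChildTable = 0
    LEFT OUTER JOIN Events AS PE ON R.Parent = PE.ID
    ORDER BY E.Time
"""

INDEX_NAME = "relationships_idx_childtable_child"


def plan(connection):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    # The detail text is the last column of every plan row.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]


before = plan(conn)
conn.execute(f"CREATE INDEX {INDEX_NAME} ON Relationships (ChildTable, Child)")
after = plan(conn)

print("before:", *before, sep="\n  ")
print("after:", *after, sep="\n  ")
print("index used:", any(INDEX_NAME in line for line in after))
```

Because Events.ID is an INTEGER PRIMARY KEY, SQLite already resolves the PE.ID lookup through the rowid, so the plan line for PE should show a primary-key search even without an additional index on Events. Running the same check against the real database, with real row counts, is the more reliable test.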
The optimized query:
SELECT
        E.*,
        R.Parent AS 'ParentIndex',
        PE.Guid AS 'ParentGuid' 
    FROM
        Events AS E 
    LEFT OUTER JOIN
        Relationships AS R 
            ON R.Child = E.ID 
            AND R.ChildTable = 0 
    LEFT OUTER JOIN
        Events AS PE 
            ON R.Parent = PE.ID 
    ORDER BY
        E.Time

* original question posted on StackOverflow here.