[Solved] Need Help Speeding up an Aggregate SQLite Query
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Need Help Speeding up an Aggregate SQLite Query

I have a table defined like the following...

CREATE table actions (
  id INTEGER PRIMARY KEY AUTO_INCREMENT,
  end BOOLEAN,
  type VARCHAR(15) NOT NULL,
  subtype_a VARCHAR(15),
  subtype_b VARCHAR(15),
);

I'm trying to query for the last end action of some type to happen on each unique (subtype_a, subtype_b) pair, similar to a group by (except SQLite doesn't say what row is guaranteed to be returned by a group by).

On an SQLite database of about 1MB, the query I have now can take upwards of two seconds, but I need to speed it up to take under a second (since this will be called frequently).

example query:

SELECT * FROM actions a_out 
WHERE id = 
  (SELECT MAX(a_in.id) FROM actions a_in 
   WHERE a_out.subtype_a = a_in.subtype_a 
     AND a_out.subtype_b = a_in.subtype_b 
     AND a_in.status IS NOT NULL 
     AND a_in.type = "some_type");

If it helps, I know all the unique possibilities for a (subtype_a,subtype_b)

eg:

(a,1)
(a,2)
(b,3)
(b,4)
(b,5)
(b,6)

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Avoid Subselect When Selecting MAX/MIN Per Group (query line: 7): Constant subquery results are usually not cached by the database, especially in non-recent database versions. Therefore, a constant subquery in a WHERE clause will be fully evaluated for every row the WHERE clause will examine, which can significantly impact query performance. Use the method mentioned in the example instead.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `actions` ADD INDEX `actions_idx_id_subtype_subtype_status` (`id`,`subtype_a`,`subtype_b`,`status`);
The optimized query:
SELECT
        * 
    FROM
        actions AS actions1 
    LEFT JOIN
        actions AS actions2 
            ON (
                actions1.subtype_a = actions2.subtype_a 
                AND actions1.subtype_b = actions2.subtype_b 
                AND actions2.status IS NOT NULL 
                AND actions2.type = actions2."some_type"
            ) 
            AND (
                actions1.id < actions2.id
            ) 
    WHERE
        (
            1 = 1
        ) 
        AND (
            actions2.id IS NULL
        )

Related Articles



* original question posted on StackOverflow here.