[Solved] SQLite sqlite3_step() hangs with big database
EverSQL Database Performance Knowledge Base

I'm writing a small Objective-C library that works with an embedded SQLite database.

The SQLite version I'm using is 3.7.13 (checked with SELECT sqlite_version())

My query is:

SELECT ROUND(AVG(difference), 5) as distance 
FROM (
  SELECT (
    SELECT A.timestamp - B.timestamp 
    FROM ExampleTable as B 
    WHERE B.timestamp = (
      SELECT MAX(timestamp) 
      FROM ExampleTable as C 
      WHERE C.timestamp < A.timestamp
    )
  ) as difference 
  FROM ExampleTable as A 
  ORDER BY timestamp)

Basically, it outputs the average timestamp difference between consecutive rows when ordered by timestamp.
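To make the intent concrete, here is a minimal sketch using Python's built-in sqlite3 module. The single-column schema and the four sample timestamps are invented for illustration; only the table name, column name and query come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question does not show the real table definition.
conn.execute("CREATE TABLE ExampleTable (timestamp INTEGER)")
conn.executemany(
    "INSERT INTO ExampleTable VALUES (?)", [(10,), (12,), (17,), (20,)]
)

# Inner part of the query: gap between each row and the closest earlier row.
inner_sql = """
SELECT (
  SELECT A.timestamp - B.timestamp
  FROM ExampleTable AS B
  WHERE B.timestamp = (
    SELECT MAX(timestamp) FROM ExampleTable AS C
    WHERE C.timestamp < A.timestamp
  )
) AS difference
FROM ExampleTable AS A
ORDER BY timestamp
"""
differences = [row[0] for row in conn.execute(inner_sql)]

# Outer part: AVG() ignores the NULL produced for the earliest row.
average = conn.execute(
    f"SELECT ROUND(AVG(difference), 5) FROM ({inner_sql})"
).fetchone()[0]

print(differences, average)
```

The earliest row has no predecessor, so its difference is NULL and does not count towards the average.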

I tried the query on a sample database with 35k rows and it runs in around 100ms. So far so good.

I then tried the query on another sample database with 100k rows and it hangs at sqlite3_step() taking up 100% of CPU usage.

Since I cannot step into sqlite3_step() with the debugger, is there another way to find out where the function is hanging, or to get a debug log that shows what the issue is?

I also tried running other queries from my library on the 100k-row database and they work without issue, but those are simple queries with no subqueries. Could the subqueries be the problem?
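One way to tell a slow statement from a genuinely stuck one is SQLite's progress handler, which is called periodically while a statement runs and can abort it. The sketch below uses Python's sqlite3 wrapper (Connection.set_progress_handler); in a C or Objective-C library the corresponding calls would be sqlite3_progress_handler() and sqlite3_interrupt(). The helper name run_with_budget, the tick limit and the deliberately slow demo query are assumptions made for illustration.

```python
import sqlite3

def run_with_budget(conn, sql, max_ticks, every_n=1000):
    """Run sql, aborting once the progress handler has fired max_ticks times.

    Hypothetical helper: returns (rows, ticks), with rows set to None
    if the statement was aborted.
    """
    ticks = 0

    def on_progress():
        nonlocal ticks
        ticks += 1
        # A non-zero return value tells SQLite to abort the statement.
        return 1 if ticks >= max_ticks else 0

    # Ask SQLite to call on_progress roughly every every_n VM instructions.
    conn.set_progress_handler(on_progress, every_n)
    try:
        return conn.execute(sql).fetchall(), ticks
    except sqlite3.OperationalError:
        # An aborted statement surfaces here; note that other operational
        # errors (for example a syntax error) would be caught as well.
        return None, ticks
    finally:
        conn.set_progress_handler(None, every_n)  # remove the handler

conn = sqlite3.connect(":memory:")

# Stand-in for an expensive query: counts to 100 million row by row.
slow_sql = """
WITH RECURSIVE c(x) AS (
  SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 100000000
)
SELECT COUNT(*) FROM c
"""
slow_rows, slow_ticks = run_with_budget(conn, slow_sql, max_ticks=50)
fast_rows, fast_ticks = run_with_budget(conn, "SELECT 1", max_ticks=50)
print(slow_rows, slow_ticks, fast_rows, fast_ticks)
```

If the handler keeps firing, the statement is still doing work rather than being deadlocked, which points towards an expensive query plan instead of a bug in sqlite3_step().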

Thanks

UPDATE

This is the output of EXPLAIN QUERY PLAN as requested:

"1","0","0","SCAN TABLE ExampleTable AS A"
"1","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 2"
"2","0","0","SCAN TABLE ExampleTable AS B"
"2","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 3"
"3","0","0","SEARCH TABLE ExampleTable AS C"
"1","0","0","USE TEMP B-TREE FOR ORDER BY"
"0","0","0","SCAN SUBQUERY 1"

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Correlated Subqueries (query line: 12): A correlated subquery is a subquery that contains a reference (column: timestamp) to a table that also appears in the outer query. Usually correlated queries can be rewritten with a join clause, which is the best practice. The database optimizer handles joins much better than correlated subqueries. Therefore, rephrasing the query with a join will allow the optimizer to use the most efficient execution plan for the query.
  2. Avoid Subqueries (query line: 5): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX exampletable_idx_timestamp ON ExampleTable (timestamp);
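Whether the index is actually picked up can be checked by comparing EXPLAIN QUERY PLAN output before and after creating it. The following is a rough sketch with Python's sqlite3 module against an empty in-memory table; the single-column schema is an assumption, and the exact wording of the plan steps differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

QUERY = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
  SELECT (
    SELECT A.timestamp - B.timestamp
    FROM ExampleTable AS B
    WHERE B.timestamp = (
      SELECT MAX(timestamp) FROM ExampleTable AS C
      WHERE C.timestamp < A.timestamp
    )
  ) AS difference
  FROM ExampleTable AS A
  ORDER BY timestamp)
"""

def plan(conn):
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

conn = sqlite3.connect(":memory:")
# Assumed schema: the question does not show the real table definition.
conn.execute("CREATE TABLE ExampleTable (timestamp INTEGER)")

before = plan(conn)
conn.execute(
    "CREATE INDEX exampletable_idx_timestamp ON ExampleTable (timestamp)"
)
after = plan(conn)

for label, steps in (("before index", before), ("after index", after)):
    print(label)
    for step in steps:
        print("   ", step)
```

The plan steps printed after the index is created should mention exampletable_idx_timestamp, which indicates the lookups no longer need a full table scan.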
The optimized query (logically the same as the original, with column references qualified; any speed-up comes from the index above):
SELECT
    ROUND(AVG(difference), 5) AS distance
FROM (
    SELECT (
        SELECT
            A.timestamp - B.timestamp
        FROM
            ExampleTable AS B
        WHERE
            B.timestamp = (
                SELECT
                    MAX(C.timestamp)
                FROM
                    ExampleTable AS C
                WHERE
                    C.timestamp < A.timestamp
            )
    ) AS difference
    FROM
        ExampleTable AS A
    ORDER BY
        A.timestamp)
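The query can also be restructured by hand. The sketch below is not output of the optimizer above; it compares the original with two hypothetical rewrites on invented sample data. The first drops the lookup on B by subtracting the previous timestamp directly. The second relies on the gaps between consecutive rows summing to MAX minus MIN, which only matches the original if all timestamps are distinct (an assumption that would need checking against the real data).

```python
import sqlite3

ORIGINAL = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
  SELECT (
    SELECT A.timestamp - B.timestamp
    FROM ExampleTable AS B
    WHERE B.timestamp = (
      SELECT MAX(timestamp) FROM ExampleTable AS C
      WHERE C.timestamp < A.timestamp
    )
  ) AS difference
  FROM ExampleTable AS A
  ORDER BY timestamp)
"""

# Rewrite 1: one correlated subquery instead of two nested ones.
SINGLE_SUBQUERY = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
  SELECT A.timestamp - (
    SELECT MAX(C.timestamp) FROM ExampleTable AS C
    WHERE C.timestamp < A.timestamp
  ) AS difference
  FROM ExampleTable AS A)
"""

# Rewrite 2: consecutive gaps sum to MAX - MIN (distinct timestamps only).
# The "* 1.0" forces floating-point division on integer timestamps.
TELESCOPED = """
SELECT ROUND((MAX(timestamp) - MIN(timestamp)) * 1.0 / (COUNT(*) - 1), 5)
       AS distance
FROM ExampleTable
"""

conn = sqlite3.connect(":memory:")
# Assumed schema and invented sample data, for illustration only.
conn.execute("CREATE TABLE ExampleTable (timestamp INTEGER)")
conn.executemany(
    "INSERT INTO ExampleTable VALUES (?)",
    [(10,), (12,), (17,), (20,), (31,)],
)

results = {
    name: conn.execute(sql).fetchone()[0]
    for name, sql in (
        ("original", ORIGINAL),
        ("single_subquery", SINGLE_SUBQUERY),
        ("telescoped", TELESCOPED),
    )
}
print(results)
```

On this sample all three variants return the same average. The last form reads the table once with no subquery at all, so it is worth considering whenever duplicate timestamps can be ruled out.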

* original question posted on StackOverflow here.