[Solved] Worse query plan with a JOIN after ANALYZE

EverSQL Database Performance Knowledge Base

Worse query plan with a JOIN after ANALYZE

I see that running ANALYZE results in significantly poor performance on a particular JOIN I'm making between two tables.

Suppose the following schema:

CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );

CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name

Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).

Before ANALYZE

Now when I run the following query:

SELECT * FROM ab;

The query plan looks as follows:

1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)

This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.

After ANALYZE

When I execute the same query again, the query plan results in two table scans:

1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b

This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.

Should I adapt the index in table b to make it work after ANALYZE? Anything else to change to the indexing/schema?

I just try to understand the problem here. I worked around it using a CROSS JOIN, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
The optimized query:
SELECT
        * 
    FROM
        ab

Related Articles



* original question posted on StackOverflow here.