I see that running ANALYZE
results in significantly poor performance on a particular JOIN
I'm making between two tables.
Suppose the following schema:
CREATE TABLE a ( id INTEGER PRIMARY KEY, name TEXT );
CREATE TABLE b ( a NOT NULL REFERENCES a, value INTEGER, PRIMARY KEY(a, b) );
CREATE VIEW ab AS SELECT a.name, b.text, MAX(b.value)
FROM a
JOIN b ON b.a = a.id;
GROUP BY a.id
ORDER BY a.name
Table a is approximately 10K rows, table b is approximately 48K rows (~5 rows per row in table a).
Before ANALYZE
Now when I run the following query:
SELECT * FROM ab;
The query plan looks as follows:
1|0|0|SCAN TABLE b
1|1|1|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?)
This is a good plan, b is larger and I want it to be in the outer loop, making use of the index in table a. It finishes well within a second.
After ANALYZE
When I execute the same query again, the query plan results in two table scans:
1|0|1|SCAN TABLE a
1|1|0|SCAN TABLE b
This is far for optimal. For some reason the query planner thinks that an outer loop of 10K rows and an inner loop of 48K rows is a better fit. This takes about 1.5 minute to complete.
Should I adapt the index in table b to make it work after ANALYZE
? Anything else to change to the indexing/schema?
I just try to understand the problem here. I worked around it using a CROSS JOIN
, but that feels dirty and I don't really understand why the planner would go with a plan that is orders of magnitude slower than the un-analyzed plan. It seems to be related to GROUP BY
, since the query planner puts table b in the outer loop without it (but that renders the query useless for what I want).
The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:
SELECT
*
FROM
ab