[Solved] Postgres query plan different on local and prod. Not using index and bad row estimation

EverSQL Database Performance Knowledge Base

Postgres query plan different on local and prod. Not using index and bad row estimation

Database type:

Confusing index-use question, I have a database that looks like this:

+-----------------+------------------+-----------+----------+--------------+-------------+
| Column          | Type             | Modifiers | Storage  | Stats target | Description |
|-----------------+------------------+-----------+----------+--------------+-------------|
| line            | text             |  not null | extended | <null>       | <null>      |
| elo_range       | text             |  not null | extended | <null>       | <null>      |
| white_wins      | integer          |  not null | plain    | <null>       | <null>      |
| black_wins      | integer          |  not null | plain    | <null>       | <null>      |
| draws           | integer          |  not null | plain    | <null>       | <null>      |
| games           | integer          |  not null | plain    | <null>       | <null>      |
| white_incidence | double precision |  not null | plain    | <null>       | <null>      |
| black_incidence | double precision |  not null | plain    | <null>       | <null>      |
+-----------------+------------------+-----------+----------+--------------+-------------+
Indexes:
    "pk_opening_lines_v3_meta" PRIMARY KEY, btree (line, elo_range)
    "idx_opening_lines_v3_meta_line" hash (line)
Foreign-key constraints:
    "fk_opening_line_v3" FOREIGN KEY (line) REFERENCES opening_lines_v3(line)
Has OIDs: no

And I have a very straight-forward query that just searches for specific lines:

SELECT *
FROM "opening_lines_v3_meta" 
WHERE line IN 
  (
    '1.e4 e5 2.c3 d5 3.b4', 
    '1.e4 e5 2.c3 d5 3.Qh5', 
    '1.e4 e5 2.c3 d5 3.c4', 
    '1.e4 e5 2.c3 d5 3.d3'
    -- ...
  )

Locally, I get a reasonable plan, which uses the idx_opening_lines_v3_meta_line index, the one I'd expect: https://explain.dalibo.com/plan/ARG#plan/node/3

But on prod, I get a different plan, where for some reason the compound primary index is used, and postgres drastically over-estimates the # of rows (estimated 1m, real 30k): https://explain.dalibo.com/plan/D2HD#plan/node/1

Local and prod database are the same. Same indexes, same # of rows, etc.

I've tried a VACUUM ANALYZE, not sure what else to try here. VACUUM FULL? Would rather not lock the db if I can help it, esp. since the docs seem to suggest this is only helpful when there's a bunch of unused space, which doesn't seem to be the case here.

Here's the plan in text format:

QUERY PLAN
Index Scan using pk_opening_lines_v3_meta on public.opening_lines_v3_meta  (cost=0.70..114347.06 rows=118076 width=96) (actual time=143.255..4409.818 rows=30274 loops=1)
  Output: line, elo_range, white_wins, black_wins, draws, games, white_incidence, black_incidence
  Index Cond: (opening_lines_v3_meta.line = ANY ('{"OMITTED"}'::text[]))
  Buffers: shared hit=35081 read=8486
Query Identifier: 5159034391756592230
Planning:
  Buffers: shared hit=206 read=12
Planning Time: 14.126 ms
JIT:
  Functions: 2
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.189 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.189 ms
Execution Time: 4465.422 ms

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX opening_v3_idx_line ON "opening_lines_v3_meta" ("line");
The optimized query:
SELECT
        * 
    FROM
        "opening_lines_v3_meta" 
    WHERE
        "opening_lines_v3_meta".line IN (
            '1.e4 e5 2.c3 d5 3.b4', '1.e4 e5 2.c3 d5 3.Qh5', '1.e4 e5 2.c3 d5 3.c4', '1.e4 e5 2.c3 d5 3.d3'
        )

Related Articles



* original question posted on StackOverflow here.