[Solved] Slow PostgreSQL query with (incorrect?) indexes
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Slow PostgreSQL query with (incorrect?) indexes

Database type:

I have an Events table with 30 million rows. The following query returns in 25 seconds

SELECT DISTINCT "events"."id", "calendars"."user_id" 
FROM "events" 
LEFT JOIN "calendars" ON "events"."calendar_id" = "calendars"."id" 
WHERE "events"."deleted_at" is null 
AND tstzrange('2016-04-21T12:12:36-07:00', '2016-04-21T12:22:36-07:00') @> lower(time_range) 
AND ("status" is null or (status->>'pre_processed') IS NULL) 

status is a jsonb column with an index on status->>'pre_processed'. Here are the other indexes that were created on the events table. time_range is of type TSTZRANGE.

CREATE INDEX events_time_range_idx ON events USING gist (time_range);
CREATE INDEX events_lower_time_range_index on events(lower(time_range));
CREATE INDEX events_upper_time_range_index on events(upper(time_range));
CREATE INDEX events_calendar_id_index on events (calendar_id)

I'm definitely out of my comfort zone on this and am trying to reduce the query time. Here's the output of explain analyze

  HashAggregate  (cost=7486635.89..7486650.53 rows=1464 width=48) (actual time=26989.272..26989.306 rows=98 loops=1)
  Group Key: events.id, calendars.user_id
  ->  Nested Loop Left Join  (cost=0.42..7486628.57 rows=1464 width=48) (actual time=316.110..26988.941 rows=98 loops=1)
    ->  Seq Scan on events  (cost=0.00..7475629.43 rows=1464 width=50) (actual time=316.049..26985.344 rows=98 loops=1)
          Filter: ((deleted_at IS NULL) AND ((status IS NULL) OR ((status ->> 'pre_processed'::text) IS NULL)) AND ('["2016-04-21 19:12:36+00","2016-04-21 19:22:36+00")'::tstzrange @> lower(time_range)))
          Rows Removed by Filter: 31592898
    ->  Index Scan using calendars_pkey on calendars  (cost=0.42..7.50 rows=1 width=48) (actual time=0.030..0.031 rows=1 loops=98)
          Index Cond: (events.calendar_id = (id)::text)
Planning time: 1.468 ms
Execution time: 26989.370 ms

And here is the explain analyze with the events.deleted_at part of the query removed

HashAggregate  (cost=7487382.57..7487398.33 rows=1576 width=48) (actual time=23880.466..23880.503 rows=115 loops=1)
  Group Key: events.id, calendars.user_id
  ->  Nested Loop Left Join  (cost=0.42..7487374.69 rows=1576 width=48) (actual time=16.612..23880.114 rows=115 loops=1)
    ->  Seq Scan on events  (cost=0.00..7475629.43 rows=1576 width=50) (actual time=16.576..23876.844 rows=115 loops=1)
          Filter: (((status IS NULL) OR ((status ->> 'pre_processed'::text) IS NULL)) AND ('["2016-04-21 19:12:36+00","2016-04-21 19:22:36+00")'::tstzrange @> lower(time_range)))
          Rows Removed by Filter: 31592881
    ->  Index Scan using calendars_pkey on calendars  (cost=0.42..7.44 rows=1 width=48) (actual time=0.022..0.023 rows=1 loops=115)
          Index Cond: (events.calendar_id = (id)::text)

Planning time: 0.372 ms Execution time: 23880.571 ms

I added the index on the status column. Everything else what already there and I'm unsure how to proceed going forward. Any suggestions on how to get the query time down to a more manageable number?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 11): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `time_range` is indexed, the index won’t be used as it’s wrapped with the function `lower`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
The optimized query:
SELECT
        DISTINCT "events"."id",
        "calendars"."user_id" 
    FROM
        "events" 
    LEFT JOIN
        "calendars" 
            ON "events"."calendar_id" = "calendars"."id" 
    WHERE
        "events"."deleted_at" IS NULL 
        AND tstzrange('2016-04-21T12:12:36-07:00', '2016-04-21T12:22:36-07:00') @> lower(time_range) 
        AND (
            "status" IS NULL 
            OR (
                status->>'pre_processed'
            ) IS NULL
        )

Related Articles



* original question posted on StackOverflow here.