[Solved] Postgres optimizing join of 2 large tables

EverSQL Database Performance Knowledge Base


Database type: PostgreSQL

I have 2 tables, one with around 1M entries and the other with around 4M. I want to join these tables in a materialized view, but creating the materialized view takes too long (~4 hours).

I don't see the indexes being used, which is expected as per this answer: when a join has to read the whole of both tables, indexes don't help much.

So what is the workaround? My tables contain time-series data. Would the TimescaleDB extension be of any help? Any other suggestions to improve the performance?
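For reference, TimescaleDB speeds up time-series workloads by partitioning a table into chunks (a "hypertable"). A minimal sketch of converting the prices table, assuming the extension is installed; the 7-day chunk interval is an illustrative assumption, not a recommendation:

```sql
-- Hedged sketch: assumes the TimescaleDB extension is available.
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Partition fco_prices by its "time" column into chunks.
-- chunk_time_interval is an assumption; tune it to your data density.
SELECT create_hypertable('fco_prices', 'time',
                         chunk_time_interval => INTERVAL '7 days',
                         migrate_data => true);
```

Whether this helps depends on the query: a full join over all rows still reads everything, but chunk exclusion pays off as soon as the view or its refresh can be restricted to a time range.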

EDIT

CREATE TABLE public.fco_assets
(
  id serial,
  symbol character varying(64) NOT NULL,
  currency integer NOT NULL,
  CONSTRAINT fco_assets_pkey1 PRIMARY KEY (id),
  CONSTRAINT fco_assets_symbol_key1 UNIQUE (symbol)
)

CREATE TABLE public.fco_prices
(
  aid integer NOT NULL,
  cid integer NOT NULL,
  "time" timestamp without time zone NOT NULL,
  close double precision,
  CONSTRAINT fco_prices_pkey PRIMARY KEY (aid, cid, "time")
)   

CREATE TABLE public.fco_chr_res
(
  pid integer NOT NULL,
  aid integer NOT NULL,
  cid integer NOT NULL,
  "time" timestamp without time zone NOT NULL,
  res double precision[] NOT NULL,
  CONSTRAINT fco_chr_res_pkey PRIMARY KEY (pid, aid, cid, "time")
)

There are around 8M rows in fco_chr_res. I have created various indexes on the prices and res tables besides the primary keys, which themselves act as indexes.

 SELECT count(*)
 FROM fco_prices pr
 JOIN fco_assets a ON pr.aid = a.id AND pr.cid = a.currency
 LEFT JOIN fco_chr_res ch ON pr.aid = ch.aid AND pr.cid = ch.cid AND pr."time" = ch."time"

Here is the EXPLAIN ANALYZE output for the query above:

  Aggregate (rows=1 loops=1)
    -> Gather (rows=1 loops=1)
         -> Aggregate (rows=1 loops=1)
              -> Merge Left Join (rows=45636664 loops=1)
                   -> Merge Inner Join (rows=21023282 loops=1)
                        -> Index Only Scan using fco_prices_pkey on fco_prices as pr (rows=21023282 loops=1)
                        -> Sort (rows=3428 loops=1)
                             -> Seq Scan on fco_assets as a (rows=3428 loops=1)
                   -> Index Only Scan using aid_cid_time on fco_chr_res as ch (rows=34245723 loops=1)

Is there any scope for improvement?
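The plan above is dominated by merge joins over tens of millions of rows, so per-session memory and parallelism settings often matter more than additional indexes. A sketch of settings worth experimenting with; the values are illustrative assumptions to tune for your hardware, and all are standard PostgreSQL GUCs:

```sql
-- Illustrative session settings; values are assumptions, not recommendations.
SET work_mem = '256MB';                   -- more memory per sort/hash node
SET max_parallel_workers_per_gather = 4;  -- allow a wider parallel plan

-- Re-check the plan with buffer statistics included.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM fco_prices pr
JOIN fco_assets a ON pr.aid = a.id AND pr.cid = a.currency
LEFT JOIN fco_chr_res ch
  ON pr.aid = ch.aid AND pr.cid = ch.cid AND pr."time" = ch."time";
```

The BUFFERS output shows whether the sorts spill to disk; if they do, raising work_mem (within the limits of available RAM) is usually the cheapest win.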

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX fco_assets_idx_id_currency ON "fco_assets" ("id","currency");
CREATE INDEX fco_res_idx_aid_cid_time ON "fco_chr_res" ("aid","cid","time");
The optimized query:
SELECT
        count(*) 
    FROM
        fco_prices pr 
    JOIN
        fco_assets a 
            ON pr.aid = a.id 
            AND pr.cid = a.currency 
    LEFT JOIN
        fco_chr_res ch 
            ON pr.aid = ch.aid 
            AND pr.cid = ch.cid 
            AND pr."time" = ch."time"
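Once the underlying query is tuned, the materialized view itself can be built once and then refreshed without locking readers. A hedged sketch: the view name, column list, and index name below are assumptions, and REFRESH ... CONCURRENTLY requires a unique index on the view:

```sql
-- Hypothetical view name and column list; adapt to the columns you need.
CREATE MATERIALIZED VIEW fco_prices_mv AS
SELECT pr.aid, pr.cid, pr."time", pr.close, ch.pid, ch.res
FROM fco_prices pr
JOIN fco_assets a ON pr.aid = a.id AND pr.cid = a.currency
LEFT JOIN fco_chr_res ch
  ON pr.aid = ch.aid AND pr.cid = ch.cid AND pr."time" = ch."time";

-- CONCURRENTLY needs a unique index; pid is included because one price row
-- can match several fco_chr_res rows. NULL pids (unmatched prices) do not
-- conflict, since NULLs are treated as distinct in PostgreSQL unique indexes.
CREATE UNIQUE INDEX fco_prices_mv_key ON fco_prices_mv (aid, cid, "time", pid);

-- Later refreshes can run without blocking concurrent SELECTs on the view.
REFRESH MATERIALIZED VIEW CONCURRENTLY fco_prices_mv;
```

The first CREATE still pays the full join cost; CONCURRENTLY only helps subsequent refreshes stay non-blocking.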

* Original question posted on StackOverflow.