[Solved] Making Efficient Lateral Joins (or alternatives)
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Making Efficient Lateral Joins (or alternatives)

Database type:

context

I'm playing around with PostgreSQL's lateral join, specifically to perform a join on a group by/limit.

The query works really well when I'm looking up individual records, but the performance very quickly degrades as we query multiple records. This makes sense given that we have multiple subqueries running individual gather, filter aggregate, sorts.the question is, what Postgres strategy should we look at or how do we refactor the below query in order to make it performant at scale?

Query

We have three main tables with a junction table between two of them:

|Managers| >- |Stores| >- |Store_Products| -< Products

We have all of the historical managers for a given store record, and we have a whole catalogue of products for stores (products may be carried by multiple stores).

Goal:given a Store ID, query the most recent Manager and the most recent Product sold.

It's an inner join from Store to Manager and to Product. Manager & Product must be sorted by date desc and limited to 1 (at least I believe that is the way to get the most recent).

SELECT 
    store.id as store_id,
    manager.id as manager_id,
    *
FROM 
    Stores as store,
    LATERAL (
        SELECT 
            * 
        FROM 
            Products as product 
        INNER JOIN Stores_Products store_product on store_product.product_id = product.id
        WHERE 
            store_product.store_id = store.id
        ORDER BY 
            store.date desc
        LIMIT 1
    ) p,
    LATERAL (
        SELECT 
            * 
        FROM 
            Managers as manager
        WHERE 
            manager.store_id = store.id 
        ORDER BY
            manager.date desc
        LIMIT 1
        ) m
WHERE store.name = 'ABC retail'

This works perfectly when you query a single store. However, if you try to batch query (e.g. WHERE store.name in [...]), the query gets very slow and memory consuming very quickly.

Question

Is there a better way to query the data that would scale well?

Thank you!

Note: the example given with stores/products is simply a device to communicate the problem. The actual schema is different - so I'd ask not to not put too much thought into whether this is the best way to normalize the schema! Thank you !

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 4): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Avoid Selecting Unnecessary Columns (query line: 8): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  3. Avoid Selecting Unnecessary Columns (query line: 19): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  4. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX managers_idx_store_id_date ON "Managers" ("store_id","date" desc);
CREATE INDEX managers_idx_date ON "Managers" ("date" desc);
CREATE INDEX products_idx_id ON "Products" ("id");
CREATE INDEX stores_idx_name ON "Stores" ("name");
CREATE INDEX stores_products_idx_store_id ON "Stores_Products" ("store_id");
The optimized query:
SELECT
        store.id AS store_id,
        manager.id AS manager_id,
        * 
    FROM
        Stores AS store,
        LATERAL(SELECT
            * 
        FROM
            Products AS product 
        INNER JOIN
            Stores_Products store_product 
                ON store_product.product_id = product.id 
        WHERE
            store_product.store_id = store.id 
        ORDER BY
            store.date DESC LIMIT 1) p,
        LATERAL(SELECT
            * 
        FROM
            Managers AS manager 
        WHERE
            manager.store_id = store.id 
        ORDER BY
            manager.date DESC LIMIT 1) m 
    WHERE
        store.name = 'ABC retail'

Related Articles



* original question posted on StackOverflow here.