[Solved] SQL Count on large table using multiple joins - mysql speed tip?
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

SQL Count on large table using multiple joins - mysql speed tip?

Database type:

Is there anyway I can improve on this query to have better performance?

SELECT count(stories.id), count(distinct table_1.table_2_id), count(distinct table_1.id)
FROM stories
    INNER JOIN table_1 ON (table_1.id = stories.table_1_id)
    INNER JOIN table_2 ON (table_2.id = table_1.table_2_id)
    INNER JOIN table_3 ON (table_3.id = table_2.table_3_id)
    INNER JOIN table_4 ON (table_4.id = table_3.table_4_id)
    INNER JOIN table_5 ON (table_5.id = table_4.table_5_id)
WHERE stories.id in (select s2.id
                    from stories s2
                    where s2.published_at between '2015-01-01' and '2020-02-21'
                    and s2.deleted = false
                    )

I currently need this info, (the counts) for a dashboard total.

All tables involved are Inno DB, I have indexes on all ids being joined. My stories table currently has over 15M records, and growing.

What is the best approach, in terms of scalability, to deal with a situation like this?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Replace In Subquery With Correlated Exists (modified query below): In many cases, an EXISTS subquery with a correlated condition will perform better than a non correlated IN subquery.
Optimal indexes for this query:
ALTER TABLE `stories` ADD INDEX `stories_idx_id_deleted_published` (`id`,`deleted`,`published_at`);
ALTER TABLE `table_1` ADD INDEX `table_1_idx_id` (`id`);
ALTER TABLE `table_2` ADD INDEX `table_2_idx_id` (`id`);
ALTER TABLE `table_3` ADD INDEX `table_3_idx_id` (`id`);
ALTER TABLE `table_4` ADD INDEX `table_4_idx_id` (`id`);
ALTER TABLE `table_5` ADD INDEX `table_5_idx_id` (`id`);
The optimized query:
SELECT
        count(stories.id),
        count(DISTINCT table_1.table_2_id),
        count(DISTINCT table_1.id) 
    FROM
        stories 
    INNER JOIN
        table_1 
            ON (
                table_1.id = stories.table_1_id
            ) 
    INNER JOIN
        table_2 
            ON (
                table_2.id = table_1.table_2_id
            ) 
    INNER JOIN
        table_3 
            ON (
                table_3.id = table_2.table_3_id
            ) 
    INNER JOIN
        table_4 
            ON (
                table_4.id = table_3.table_4_id
            ) 
    INNER JOIN
        table_5 
            ON (
                table_5.id = table_4.table_5_id
            ) 
    WHERE
        EXISTS (
            SELECT
                1 
            FROM
                stories AS s21 
            WHERE
                (
                    s21.published_at BETWEEN '2015-01-01' AND '2020-02-21' 
                    AND s21.deleted = false
                ) 
                AND (
                    stories.id = s21.id
                )
        )

Related Articles



* original question posted on StackOverflow here.