[Solved] Select rows from joined tables with more than n occurrence

How to optimize this SQL query?

In case you have your own slow SQL query, you can optimize it automatically here.

For the query above, the following recommendations will be helpful as part of the SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Correlated Subqueries (query line: 16): A correlated subquery is a subquery that contains a reference (column: user_id) to a table that also appears in the outer query. Usually correlated queries can be rewritten with a join clause, which is the best practice. The database optimizer handles joins much better than correlated subqueries. Therefore, rephrasing the query with a join will allow the optimizer to use the most efficient execution plan for the query.
  2. Avoid Correlated Subqueries (query line: 57): A correlated subquery is a subquery that contains a reference (column: user_id) to a table that also appears in the outer query. Usually correlated queries can be rewritten with a join clause, which is the best practice. The database optimizer handles joins much better than correlated subqueries. Therefore, rephrasing the query with a join will allow the optimizer to use the most efficient execution plan for the query.
  3. Avoid OR Conditions By Using UNION (modified query below): In mosts cases, filtering using the OR operator cannot be applied using indexes. A more optimized alternative will be to split the query to two parts combined with a UNION clause, while each query holds one part of the original OR condition.
  4. Avoid Subqueries (query line: 22): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  5. Avoid Subqueries (query line: 62): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  6. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  7. Use UNION ALL instead of UNION (query line: 49): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
Optimal indexes for this query:
ALTER TABLE `care_connect` ADD INDEX `care_connect_idx_clinic_id_role_status` (`clinic_id`,`role`,`status`);
ALTER TABLE `diary` ADD INDEX `diary_idx_state_meal_type_recorded` (`state`,`meal_type`,`recorded_at`);
ALTER TABLE `diary` ADD INDEX `diary_idx_state_recorded_at` (`state`,`recorded_at`);
The optimized query:
SELECT
        d_user_id,
        d_recorded_at,
        d_glucose_value,
        d_unit 
    FROM
        ((SELECT
            d.user_id AS d_user_id,
            d.recorded_at AS d_recorded_at,
            d.glucose_value AS d_glucose_value,
            d.unit AS d_unit 
        FROM
            diary AS d 
        JOIN
            (
                SELECT
                    u.id 
                FROM
                    health_user AS u 
                JOIN
                    (
                        SELECT
                            DISTINCT care_connect.user_id 
                        FROM
                            care_connect 
                        WHERE
                            care_connect.clinic_id = 217 
                            AND care_connect.role = 'user' 
                            AND care_connect.status = 'active'
                    ) AS c 
                        ON u.id = c.user_id 
                WHERE
                    u.is_tester IS FALSE
                ) AS cu 
                    ON d.user_id = cu.id 
            WHERE
                d.created_at >= d.recorded_at 
                AND d.recorded_at < current_date 
                AND d.recorded_at >= current_date - INTERVAL '30 days' 
                AND d.glucose_value > 0 
                AND (
                    (
                        d.state = 'before_meal' 
                        AND d.meal_type = 'breakfast'
                    )
                )
        ) 
    UNION
    DISTINCT (SELECT
        d.user_id AS d_user_id,
        d.recorded_at AS d_recorded_at,
        d.glucose_value AS d_glucose_value,
        d.unit AS d_unit 
    FROM
        diary AS d 
    JOIN
        (SELECT
            u.id 
        FROM
            health_user AS u 
        JOIN
            (SELECT
                DISTINCT care_connect.user_id 
            FROM
                care_connect 
            WHERE
                care_connect.clinic_id = 217 
                AND care_connect.role = 'user' 
                AND care_connect.status = 'active') AS c 
                ON u.id = c.user_id 
        WHERE
            u.is_tester IS FALSE) AS cu 
                ON d.user_id = cu.id 
        WHERE
            d.created_at >= d.recorded_at 
            AND d.recorded_at < current_date 
            AND d.recorded_at >= current_date - INTERVAL '30 days' 
            AND d.glucose_value > 0 
            AND (d.state = 'wakeup'))
    ) AS union1

Related Articles



* original question posted on StackOverflow here.