[Solved] Optimizing Related Table Query

EverSQL Database Performance Knowledge Base

I have a users table with user info, and a relation table (users_rel) that records which users are related to each other.

To get the users related to user id '25', my query looks like this:

SELECT
  id
FROM users u
  INNER JOIN (SELECT
                primary_id,
                secondary_id
              FROM users_rel
              WHERE primary_id = '25'
                   OR secondary_id = '25') temp
    ON (u.id = temp.primary_id
         OR u.id = temp.secondary_id)
WHERE u.id != '25'

The issue here is that in the users_rel table a user id can appear either on the primary side or on the secondary side. Please don't tell me to change that: it is already in place for 6 million records, so I cannot change it. This query takes 2 to 5 minutes to execute with 4000 records in the users_rel table and 629241 in the users table.

    users_rel TABLE
    .--------------+-------------.
    | id           | VARCHAR(36) |
    | primary_id   | VARCHAR(36) |
    | secondary_id | VARCHAR(36) |
    | del          | TINYINT(1)  |
    '--------------+-------------'

There is one index, defined on the combination of primary_id and secondary_id.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid OR Conditions By Using UNION (modified query below): In most cases, a filter that uses the OR operator cannot be resolved efficiently with indexes. A better alternative is to split the query into two parts combined with a UNION clause, where each part holds one side of the original OR condition.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Prefer Direct Join Over Joined Subquery (query line: 7): Joined subqueries are often not optimized well by the optimizer, so we recommend replacing them with direct JOIN clauses.
  4. Use Numeric Column Types For Numeric Values (query lines: 15, 18, 29, 30, 40, 41, 51, 52): Referencing a numeric value (e.g. 25) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done. The users_rel columns shown in the question are VARCHAR(36), so for that table the quoted constant matches the column type.
  5. Use UNION ALL instead of UNION (query line: 43): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
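As a rough illustration of the "Avoid OR Conditions By Using UNION" and "Use UNION ALL instead of UNION" recommendations, the OR filter on users_rel can be split into two branches that each filter on a single column. This is only a sketch, not the tool's output: it assumes MySQL, and it assumes primary_id and secondary_id are declared NOT NULL, which the question does not state.

```sql
-- Original filter: one OR across two different columns.
SELECT primary_id, secondary_id
FROM users_rel
WHERE primary_id = '25'
   OR secondary_id = '25';

-- Sketch of the split form: each branch filters on one column only.
-- Assumption: both columns are NOT NULL, so the <> test in the second
-- branch only excludes rows the first branch has already returned.
-- That makes the branches disjoint, so UNION ALL is safe here.
SELECT primary_id, secondary_id
FROM users_rel
WHERE primary_id = '25'
UNION ALL
SELECT primary_id, secondary_id
FROM users_rel
WHERE secondary_id = '25'
  AND primary_id <> '25';
```

If the columns can contain NULL, plain UNION (without the extra <> test) is the safer choice, at the cost of the distinct step.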
Optimal indexes for this query:
ALTER TABLE `users` ADD INDEX `users_idx_id` (`id`);
ALTER TABLE `users_rel` ADD INDEX `users_rel_idx_secondary_id` (`secondary_id`);
ALTER TABLE `users_rel` ADD INDEX `users_rel_idx_primary_id` (`primary_id`);
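The separate index on secondary_id matters because a composite index on (primary_id, secondary_id), like the one described in the question, can only be used for lookups that start from its leftmost column. The statements below are a hedged sketch, assuming MySQL, of how one might check which indexes exist and whether each side of the OR can use one. The exact plan depends on the server version and table statistics.

```sql
-- List the indexes currently defined on the relation table.
SHOW INDEX FROM users_rel;

-- Inspect the plan for each side of the original OR separately.
-- In the EXPLAIN output, compare the "key" and "type" columns:
-- an index lookup on the filtered column is the hoped-for result,
-- while a scan over the whole table or index means no suitable
-- index was available for that predicate.
EXPLAIN SELECT primary_id, secondary_id
FROM users_rel
WHERE primary_id = '25';

EXPLAIN SELECT primary_id, secondary_id
FROM users_rel
WHERE secondary_id = '25';
```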
The optimized query:
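SELECT
    u_id
FROM
    (
        -- Branch 1: u.id = secondary_id = '25' contradicts u.id != '25',
        -- so this branch never returns rows.
        (SELECT
            u.id AS u_id
        FROM
            users u
        INNER JOIN
            users_rel temp
                ON (u.id = temp.secondary_id)
        WHERE
            (u.id != '25')
            AND (temp.secondary_id = '25'))
        UNION DISTINCT
        -- Branch 2: users on the secondary side of rows whose primary_id is '25'.
        (SELECT
            u.id AS u_id
        FROM
            users u
        INNER JOIN
            users_rel temp
                ON (u.id = temp.secondary_id)
        WHERE
            (u.id != '25')
            AND (temp.primary_id = '25'))
        UNION DISTINCT
        -- Branch 3: users on the primary side of rows whose secondary_id is '25'.
        (SELECT
            u.id AS u_id
        FROM
            users u
        INNER JOIN
            users_rel temp
                ON (u.id = temp.primary_id)
        WHERE
            (u.id != '25')
            AND (temp.secondary_id = '25'))
        UNION DISTINCT
        -- Branch 4: u.id = primary_id = '25' contradicts u.id != '25',
        -- so this branch never returns rows.
        (SELECT
            u.id AS u_id
        FROM
            users u
        INNER JOIN
            users_rel temp
                ON (u.id = temp.primary_id)
        WHERE
            (u.id != '25')
            AND (temp.primary_id = '25'))
    ) AS union1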
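Two of the four branches in the generated query require u.id to equal '25' and differ from '25' at the same time, so they cannot return rows. A shorter form that keeps only the two productive branches might look like the sketch below. It is a manual simplification rather than the tool's output, the alias r is introduced here for illustration, and it assumes MySQL. Like the generated query, it returns each related id once, whereas the original query could return the same id several times.

```sql
-- Users on the secondary side of relations where '25' is the primary.
SELECT u.id
FROM users_rel r
INNER JOIN users u
    ON u.id = r.secondary_id
WHERE r.primary_id = '25'
  AND r.secondary_id <> '25'   -- same effect as u.id != '25' in this branch
UNION
-- Users on the primary side of relations where '25' is the secondary.
SELECT u.id
FROM users_rel r
INNER JOIN users u
    ON u.id = r.primary_id
WHERE r.secondary_id = '25'
  AND r.primary_id <> '25';    -- same effect as u.id != '25' in this branch
```

UNION is kept instead of UNION ALL because the same user can be related to '25' in both directions, and the deduplication then removes the repeated id.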

* Original question posted on StackOverflow.