[Solved] need to make this query scalable/optimized for larger db in the future (remove full table reads)

EverSQL Database Performance Knowledge Base

need to make this query scalable/optimized for larger db in the future (remove full table reads)

Been working at this for a while now and cannot seem to get it optimized. Although it does work, each LEFT JOINed logs* table is read in full, regardless of whether its rows belong to the set of user_ids it is joined to. While it returns correct results as is, this will be a problem as the user base and the database as a whole grow.

Some quick background: a given account_id can have any number of computers, and each of those computers can have any number of users linked to it. These user_ids then appear in the logs tables. Each of these relationship columns (account_id, computer_id, user_id) is indexed on the relevant tables.
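For reference, this is roughly what the relevant schema looks like (column types and the exact shape of the logs tables are approximations; only the columns used in the query below are certain):

```sql
CREATE TABLE computers (
  computer_id   INT UNSIGNED PRIMARY KEY,
  account_id    INT UNSIGNED NOT NULL,
  status        TINYINT NOT NULL,
  computer_name VARCHAR(64) NOT NULL,
  KEY (account_id)
);

CREATE TABLE users (
  user_id     INT UNSIGNED PRIMARY KEY,
  computer_id INT UNSIGNED NOT NULL,
  username    VARCHAR(64) NOT NULL,
  KEY (computer_id)
);

-- logs1 .. logs6 all share this shape; each has an index on user_id.
CREATE TABLE logs1 (
  log_id  BIGINT UNSIGNED PRIMARY KEY,
  user_id INT UNSIGNED NOT NULL,
  KEY (user_id)
);
```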

I have put the LEFT JOINs in subqueries to prevent a Cartesian product (a previous issue that the subqueries solved).

Query :

SELECT
  users.username as username,
  computers.computer_name as computer_name,
  l1.cnt as cnt1,
  l2.cnt as cnt2,
  l3.cnt as cnt3,
  l4.cnt as cnt4,
  l5.cnt as cnt5,
  l6.cnt as cnt6

FROM computers

INNER JOIN users
    on users.computer_id = computers.computer_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs1
    group by user_id
  ) AS l1
  on l1.user_id = users.user_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs2
    group by user_id
  ) AS l2
  on l2.user_id = users.user_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs3
    group by user_id
  ) AS l3
  on l3.user_id = users.user_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs4
    group by user_id
  ) AS l4
  on l4.user_id = users.user_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs5
    group by user_id
  ) AS l5
  on l5.user_id = users.user_id

LEFT JOIN
  (SELECT
      user_id,
      count(*) as cnt
    from logs6
    group by user_id
  ) AS l6
  on l6.user_id = users.user_id 

WHERE computers.account_id = :cw_account_id AND computers.status = :cw_status
GROUP BY users.user_id

Plan :

table	id	select_type	type	possible_keys	key	key_len	ref	rows	Extra

computers   1   PRIMARY ref PRIMARY,unique_filter,status    unique_filter   4   const   5   Using where; Using temporary; Using filesort
users   1   PRIMARY ref PRIMARY,unique_filter   unique_filter   4   stephen_spcplus_inno.computers.computer_id  1   Using index
<derived2>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  3   
logs1   2   DERIVED index   user_id user_id 8       33  Using index
<derived3>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  10  
logs2   3   DERIVED index   user_id user_id 8       101 Using index
<derived4>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  4   
logs3   4   DERIVED index   user_id user_id 8       41  Using index
<derived5>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  2   
logs4   5   DERIVED index   user_id user_id 8       28  Using index
<derived6>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  2   
logs5   6   DERIVED index   user_id user_id 8       28  Using index
<derived7>  1   PRIMARY ref <auto_key0> <auto_key0> 4   stephen_spcplus_inno.users.user_id  275 
logs6   7   DERIVED index   user_id user_id 775     27516   Using index

example results :

username    computer_name   cnt1    cnt2    cnt3    cnt4    cnt5    cnt6
testuser    COMPUTER_1  1   2   1   (null)  (null)  3
testuser2   COMPUTER_1  (null)  (null)  (null)  (null)  (null)  (null)
someuser    COMPUTER_2  32  83  26  15  28  1157

As an example, for logs6 the plan shows every row in the table being read (27,516), yet only 1,160 of those rows actually belong to user_ids that should have been joined.

I have tried lots of different things, but cannot get this to operate in an optimized manner. As it stands, the reason all the rows from each logs table are being read is the COUNT(*) ... GROUP BY inside each join's subquery: the derived table has to be materialized over the entire logs table before the join happens. If I remove the aggregation, only the needed rows are joined, as I want; however, I then don't know how to get the counts back into the same grouped result.
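One rewrite worth considering (a sketch, not tested against the original schema) is to move each count into a correlated scalar subquery in the SELECT list. Instead of aggregating the whole logs table and joining to the result, each subquery probes the logs*(user_id) index once per output row:

```sql
SELECT
  users.username AS username,
  computers.computer_name AS computer_name,
  (SELECT COUNT(*) FROM logs1 WHERE logs1.user_id = users.user_id) AS cnt1,
  (SELECT COUNT(*) FROM logs2 WHERE logs2.user_id = users.user_id) AS cnt2,
  (SELECT COUNT(*) FROM logs3 WHERE logs3.user_id = users.user_id) AS cnt3,
  (SELECT COUNT(*) FROM logs4 WHERE logs4.user_id = users.user_id) AS cnt4,
  (SELECT COUNT(*) FROM logs5 WHERE logs5.user_id = users.user_id) AS cnt5,
  (SELECT COUNT(*) FROM logs6 WHERE logs6.user_id = users.user_id) AS cnt6
FROM computers
INNER JOIN users ON users.computer_id = computers.computer_id
WHERE computers.account_id = :cw_account_id
  AND computers.status = :cw_status;
```

No GROUP BY is needed, and rows for other accounts' users are never touched. One behavioral difference: these subqueries return 0 rather than NULL for users with no log rows, unlike the LEFT JOIN version.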

Help from any gurus would be great! Yes, I know I do not have many rows in the database yet, but I can see the results are correct, and I can also see that the full table scans are going to become a problem.

EDIT (partial solution) :

I have found a partial solution to this problem, but it requires an additional query to fetch the list of user_ids first. By adding WHERE user_id IN (17,22,23), where these are the user_ids that should be joined, to each logs-table subquery, I get the correct results and the full tables are no longer scanned.

If anyone knows of a way to make this work without this additional query and extra WHERE clause, please let me know.
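For what it's worth, that extra query can be folded into the statement itself by deriving the user_id list from users and computers inside each subquery (a sketch under the same assumptions; whether the optimizer actually pushes the predicate down and avoids the scan depends on the MySQL version):

```sql
LEFT JOIN (
  SELECT user_id, COUNT(*) AS cnt
  FROM logs1
  WHERE user_id IN (
    SELECT users.user_id
    FROM users
    INNER JOIN computers ON computers.computer_id = users.computer_id
    WHERE computers.account_id = :cw_account_id
      AND computers.status = :cw_status
  )
  GROUP BY user_id
) AS l1 ON l1.user_id = users.user_id
```

The same WHERE clause would be repeated in each of the six subqueries.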

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries In From Clause (modified query below): The database cannot always optimize subqueries in the FROM clause well. We therefore recommend extracting the subqueries into temporary tables, indexing them, and joining to them in the outer query.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Explicitly ORDER BY After GROUP BY (modified query below): In MySQL 5.7 and earlier, the database sorts every 'GROUP BY col1, col2, ...' query as if you had also specified 'ORDER BY col1, col2, ...'. If a query includes a GROUP BY clause but you do not need the result sorted, you can suppress the sort by specifying 'ORDER BY NULL'. (MySQL 8.0 removed this implicit sorting.)
Optimal indexes for this query:
ALTER TABLE `computers` ADD INDEX `computers_idx_account_id_status` (`account_id`,`status`);
ALTER TABLE `es_temp1` ADD INDEX `es_temp1_idx_user_id` (`user_id`);
ALTER TABLE `es_temp2` ADD INDEX `es_temp2_idx_user_id` (`user_id`);
ALTER TABLE `es_temp3` ADD INDEX `es_temp3_idx_user_id` (`user_id`);
ALTER TABLE `es_temp4` ADD INDEX `es_temp4_idx_user_id` (`user_id`);
ALTER TABLE `es_temp5` ADD INDEX `es_temp5_idx_user_id` (`user_id`);
ALTER TABLE `es_temp6` ADD INDEX `es_temp6_idx_user_id` (`user_id`);
ALTER TABLE `users` ADD INDEX `users_idx_computer_id_user_id` (`computer_id`,`user_id`);
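The recommendations reference es_temp1 through es_temp6 but do not show how they are built; presumably each one materializes the corresponding derived table from the original query, roughly like this (es_temp2 .. es_temp6 follow the same pattern against logs2 .. logs6):

```sql
CREATE TEMPORARY TABLE es_temp1 AS
  SELECT user_id, COUNT(*) AS cnt
  FROM logs1
  GROUP BY user_id;
```

Note that building each temp table still scans its logs table once; the gain is that the outer query then performs indexed lookups against small pre-aggregated tables instead of re-materializing the aggregates on every execution.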
The optimized query:
SELECT
        users.username AS username,
        computers.computer_name AS computer_name,
        l1.cnt AS cnt1,
        l2.cnt AS cnt2,
        l3.cnt AS cnt3,
        l4.cnt AS cnt4,
        l5.cnt AS cnt5,
        l6.cnt AS cnt6 
    FROM
        computers 
    INNER JOIN
        users 
            ON users.computer_id = computers.computer_id 
    LEFT JOIN
        es_temp1 AS l1 
            ON l1.user_id = users.user_id 
    LEFT JOIN
        es_temp2 AS l2 
            ON l2.user_id = users.user_id 
    LEFT JOIN
        es_temp3 AS l3 
            ON l3.user_id = users.user_id 
    LEFT JOIN
        es_temp4 AS l4 
            ON l4.user_id = users.user_id 
    LEFT JOIN
        es_temp5 AS l5 
            ON l5.user_id = users.user_id 
    LEFT JOIN
        es_temp6 AS l6 
            ON l6.user_id = users.user_id 
    WHERE
        computers.account_id = :cw_account_id 
        AND computers.status = :cw_status 
    GROUP BY
        users.user_id 
    ORDER BY
        NULL

* original question posted on StackOverflow here.