[Solved] Return rows with the highest value on one column corresponding to a given value in another

EverSQL Database Performance Knowledge Base

Return rows with the highest value on one column corresponding to a given value in another

Database type:

There's a MySQL table named raw_contacts with the following structure:

ID (primary auto-increment, int)
PHONE (composite unique with NAME, varchar)
NAME (composite unique with PHONE, varchar)
FREQUENCY (int)
Composite unique key is named PHONENUM

I am trying to write a query to return the row with the highest corresponding value in the FREQUENCY column for any given value for PHONE. I did find a related post but the accepted answer doesn't work for my case since I have a composite key of two columns to run the check against.

Just to illustrate, consider the following sample:

1 1234 John 6
2 1234 Dave 2
3 2199 Bill 9
4 1878 Dani 3
5 1234 Cory 7
6 1234 Gore 5
7 3319 Cory 1

Run against 1234 on the above sample, the query should return the 5th row since Cory has the highest count in the FREQUENCY column for that number.

Here's what I've come up with and it works great:

select RC.ID, RC.PHONE, RC.FREQUENCY, RC.NAME
from `raw_contacts` RC
inner join(
    select PHONE, max(FREQUENCY) FREQUENCY
    from `raw_contacts`
    where PHONE="11111"
    group by PHONE
) RC2 on RC.PHONE = RC2.PHONE and RC.FREQUENCY = RC2.FREQUENCY
limit 1

Is there any more resource-friendly query for this task since it needs to run at a very high frequency on a table with millions of records? I'm looking to optimize it to the bone!

P.S. In case of more than one rows qualifying, I need only one of them, which one doesn't matter.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 7): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  2. Avoid Subqueries (query line: 16): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  4. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
  5. Sort and Limit Before Joining (modified query below): In cases where the joins aren't filtering any rows, it's possible to sort and limit the amount of rows using a subquery in the FROM clause, before applying the joins to all other tables.
  6. Use Numeric Column Types For Numeric Values (query line: 22): Referencing a numeric value (e.g. 11111) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done.
Optimal indexes for this query:
ALTER TABLE `raw_contacts` ADD INDEX `raw_contacts_idx_phone` (`PHONE`);
The optimized query:
SELECT
        rc_id,
        rc_phone,
        rc_frequency,
        rc_name 
    FROM
        (SELECT
            RC.ID AS rc_id,
            RC.PHONE AS rc_phone,
            RC.FREQUENCY AS rc_frequency,
            RC.NAME AS rc_name 
        FROM
            `raw_contacts` RC LIMIT 1) RC 
    INNER JOIN
        (
            SELECT
                `raw_contacts`.PHONE,
                max(FREQUENCY) FREQUENCY 
            FROM
                `raw_contacts` 
            WHERE
                `raw_contacts`.PHONE = '11111' 
            GROUP BY
                `raw_contacts`.PHONE 
            ORDER BY
                NULL
        ) RC2 
            ON RC.rc_phone = RC2.PHONE 
            AND RC.rc_frequency = RC2.FREQUENCY LIMIT 1

Related Articles



* original question posted on StackOverflow here.