[Solved] Optimization of a MySQL query with millions of records

EverSQL Database Performance Knowledge Base


Database type: MySQL

OBJECTIVE: I need a query that counts all distinct leads belonging to other companies that do not exist in the current company. The query must handle millions of records across multiple tables (lead_details, domains, company).

EXAMPLE:
  company 1 -> domain 1 -> lead 1 (lead_details records exist)
  company 2 -> domain 2 -> lead 1 (lead_details records exist)
  company 2 -> domain 2 -> lead 2 (lead_details records exist)
  company 3 -> domain 3 -> lead 2 (lead_details records exist)
  company 3 -> domain 3 -> lead 3 (lead_details records exist)

RESULT: If I run the query on the data above for company 1, the result should be a count of 2, because lead 2 and lead 3 exist in other companies and do not exist in company 1. Lead 2 is counted once even though it appears in both company 2 and company 3.

Rows returned for company 1 by the query below (lead_count is the number of lead_details rows for each lead):

    domain_id  domain_name  company_id  company_name  lead_id  lead_count
    2          D2           2           C2            2        2
    3          D3           3           C3            3        1

Here is my query. Please let me know if anyone has a better suggestion.

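SELECT al.*
FROM (
    -- Leads attached to domains of every company except the current one (id 1),
    -- one row per lead. ANY_VALUE() (MySQL 5.7+) keeps the query valid under
    -- ONLY_FULL_GROUP_BY; it returns an arbitrary value from each group.
    SELECT
        ANY_VALUE(d.id)   AS domain_id,
        ANY_VALUE(d.name) AS domain_name,
        ANY_VALUE(c.id)   AS company_id,
        ANY_VALUE(c.name) AS company_name,
        ld.lead_id,
        COUNT(ld.lead_id) AS lead_count
    FROM domains d
    INNER JOIN company c
        ON (c.id = d.company_id AND c.id != 1)
    INNER JOIN lead_details ld
        ON (ld.domain_id = d.id)
    GROUP BY ld.lead_id
) al
LEFT JOIN (
    -- Leads that already exist in the current company (id 1).
    SELECT ld.lead_id
    FROM domains d
    INNER JOIN company c
        ON (c.id = d.company_id AND c.id = 1)
    INNER JOIN lead_details ld
        ON (ld.domain_id = d.id)
) ccl
    ON al.lead_id = ccl.lead_id
-- Anti-join: keep only leads with no match in the current company.
WHERE ccl.lead_id IS NULL;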

I have almost a million rows, so I need a better solution.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Avoid Subqueries In From Clause (modified query below): A subquery in the FROM clause that uses GROUP BY (such as al) cannot be merged into the outer query, so MySQL materializes it into an internal temporary table before joining. We recommend extracting the subqueries into temporary tables yourself, indexing them, and joining to them in the outer query.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
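The rewritten query reads from es_temp1 and es_temp2 but does not show how they are built. Judging by their aliases (al and ccl), they hold the results of the two FROM-clause subqueries. Recommendation 2 can be sketched as follows under that assumption, for MySQL 5.7 or later (because of ANY_VALUE()); it runs against the question's existing tables. The DISTINCT in es_temp2 is an addition, not part of the original subquery, that keeps one row per lead:

```sql
-- Assumption: es_temp1 holds the former "al" subquery (one row per lead outside company 1).
CREATE TEMPORARY TABLE es_temp1 AS
SELECT
    ANY_VALUE(d.id)   AS domain_id,
    ANY_VALUE(d.name) AS domain_name,
    ANY_VALUE(c.id)   AS company_id,
    ANY_VALUE(c.name) AS company_name,
    ld.lead_id,
    COUNT(ld.lead_id) AS lead_count
FROM domains d
INNER JOIN company c ON c.id = d.company_id AND c.id != 1
INNER JOIN lead_details ld ON ld.domain_id = d.id
GROUP BY ld.lead_id;

-- Assumption: es_temp2 holds the former "ccl" subquery (leads already in company 1).
-- DISTINCT is an addition that drops duplicate lead_ids before indexing.
CREATE TEMPORARY TABLE es_temp2 AS
SELECT DISTINCT ld.lead_id
FROM domains d
INNER JOIN company c ON c.id = d.company_id AND c.id = 1
INNER JOIN lead_details ld ON ld.domain_id = d.id;
```

MySQL temporary tables are visible only to the session that created them and are dropped when it ends, so these statements, the index on es_temp2.lead_id, and the rewritten query must all run on the same connection.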
Optimal indexes for this query:
ALTER TABLE `es_temp2` ADD INDEX `es_temp2_idx_lead_id` (`lead_id`);
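The index above covers only the temporary table. The joins against the base tables can also use indexes on their join columns. A hedged suggestion, assuming domains.id and company.id are already primary keys and that no equivalent indexes exist yet (the index names are hypothetical):

```sql
-- Lets MySQL find the domains of a given company without scanning the whole table.
ALTER TABLE domains ADD INDEX domains_idx_company_id (company_id);

-- Composite index: the join on lead_details.domain_id can be resolved, and lead_id
-- read, from the index alone, without touching the full lead_details rows.
ALTER TABLE lead_details ADD INDEX lead_details_idx_domain_lead (domain_id, lead_id);
```

With millions of rows, building these indexes can take a while, so it is worth creating them outside peak hours and checking the new plan with EXPLAIN.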
The optimized query:
-- es_temp1 holds the result of the "al" subquery, es_temp2 the result of the "ccl" subquery.
SELECT
    al.*
FROM
    es_temp1 al
LEFT JOIN
    es_temp2 ccl
        ON al.lead_id = ccl.lead_id
WHERE
    ccl.lead_id IS NULL;
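Because the objective is a single number, the anti-join can also be written as a count with NOT EXISTS and COUNT(DISTINCT ...), without temporary tables. This is a substitute technique, not EverSQL's rewrite. The sketch below recreates the question's example with a hypothetical minimal schema (only the columns the original query uses, plus an assumed id key on lead_details) and skips the company table, assuming every domains.company_id refers to an existing company:

```sql
-- Hypothetical minimal schema: only the columns the question's query relies on.
CREATE TABLE domains (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    company_id INT NOT NULL
);

CREATE TABLE lead_details (
    id        INT PRIMARY KEY,
    domain_id INT NOT NULL,
    lead_id   INT NOT NULL
);

-- Example data from the question: one domain per company.
INSERT INTO domains (id, name, company_id) VALUES (1, 'D1', 1), (2, 'D2', 2), (3, 'D3', 3);
INSERT INTO lead_details (id, domain_id, lead_id) VALUES
    (1, 1, 1),  -- company 1 -> domain 1 -> lead 1
    (2, 2, 1),  -- company 2 -> domain 2 -> lead 1
    (3, 2, 2),  -- company 2 -> domain 2 -> lead 2
    (4, 3, 2),  -- company 3 -> domain 3 -> lead 2
    (5, 3, 3);  -- company 3 -> domain 3 -> lead 3

-- Count each lead outside company 1 once, skipping leads that company 1 already has.
SELECT COUNT(DISTINCT ld.lead_id) AS lead_count
FROM lead_details ld
INNER JOIN domains d ON d.id = ld.domain_id
WHERE d.company_id <> 1
  AND NOT EXISTS (
      SELECT 1
      FROM lead_details ld1
      INNER JOIN domains d1 ON d1.id = ld1.domain_id
      WHERE d1.company_id = 1
        AND ld1.lead_id = ld.lead_id
  );
```

Against the example data this returns 2: lead 1 is excluded because company 1 has it, and lead 2 is counted once even though it has lead_details rows under two domains. COUNT(DISTINCT ...) is what makes the count match the "distinct leads" requirement.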

* original question posted on StackOverflow here.