[Solved] How to Efficiently Check 4M Rows of Data Between MySQL Tables in Different Databases

EverSQL Database Performance Knowledge Base


Database type: MySQL

I'm trying to check whether each value in a column (username2) of one table (db2.table2) also appears in a column (username1) of another table (db1.table1). If it doesn't, I want to set the column usernametaken in db2.table2 to 'No'.

This is what I've tried:

UPDATE db2.table2 SET usernametaken = 'No' WHERE username2 NOT IN (SELECT username1 FROM db1.table1);

In an initial test (with LIMIT 2 added), the two rows that received a 'No' were correct, and the query took 467.1423 seconds.

Then I ran it on the full data: 4M+ rows in table2 and 100M rows in table1. It ran for 3 days, and I had to force-terminate it by stopping MySQL. When I checked table2 afterwards, no 'No' values had been written to the usernametaken column.

Clearly something isn't right, and even if it had produced results, this query is surely not the best way to do this. Any help improving it would be appreciated.

I then tried this:

ALTER TABLE db2.table2 ADD INDEX covering_index (username2, usernametaken);
UPDATE db2.table2 SET usernametaken = 'No' WHERE username2 NOT IN (SELECT username1 FROM db1.table1) LIMIT 10;

The result: 8 rows affected (query took 1126.1817 seconds).

So the correct rows do get updated when I add a LIMIT. However, it still takes far too long: 1126 s / 8 rows × 4M rows ≈ 563M seconds ≈ 6,516 days.
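Apart from speed, NOT IN has a correctness pitfall worth ruling out: if db1.table1.username1 contains even one NULL, `username2 NOT IN (SELECT username1 ...)` evaluates to UNKNOWN rather than TRUE for every unmatched name, so no row is updated at all. NOT EXISTS does not have this problem. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so it runs anywhere; the NULL behaviour of NOT IN is standard SQL and applies to MySQL too). It attaches two in-memory databases named db1 and db2 to mimic the question's layout; the sample usernames are invented.

```python
import sqlite3

# SQLite stands in for MySQL; NOT IN / NOT EXISTS follow the same
# three-valued SQL logic in both engines.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.execute("CREATE TABLE db1.table1 (username1 TEXT)")
conn.execute("CREATE TABLE db2.table2 (username2 TEXT, usernametaken TEXT)")

# One NULL in the lookup column is enough to trigger the pitfall.
conn.executemany("INSERT INTO db1.table1 VALUES (?)",
                 [("alice",), ("bob",), (None,)])
conn.executemany("INSERT INTO db2.table2 (username2) VALUES (?)",
                 [("alice",), ("carol",), ("dave",)])

# NOT IN: 'carol' NOT IN ('alice', 'bob', NULL) is UNKNOWN, not TRUE.
not_in_rows = conn.execute(
    "SELECT username2 FROM db2.table2 "
    "WHERE username2 NOT IN (SELECT username1 FROM db1.table1) "
    "ORDER BY username2"
).fetchall()

# NOT EXISTS: only asks whether an equal row exists; NULL never equals anything.
not_exists_rows = conn.execute(
    "SELECT t2.username2 FROM db2.table2 AS t2 "
    "WHERE NOT EXISTS (SELECT 1 FROM db1.table1 AS t1 "
    "WHERE t1.username1 = t2.username2) "
    "ORDER BY t2.username2"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('carol',), ('dave',)]
```

If username1 is declared NOT NULL, the two forms return the same rows, and the choice comes down to which one the optimizer handles better.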

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. A description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically rewritten query you can copy and run in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before timing the optimized query.
Optimal indexes for this query:
ALTER TABLE `db2`.`table2` ADD INDEX `table2_idx_username2` (`username2`);
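The recommended index, like the question's covering_index, sits on db2.table2, the table being scanned. For the per-row membership test, the index that matters most is on the lookup column db1.table1.username1: without it, each probe may have to read through the 100M-row table. The minimal sketch below again uses SQLite as a stand-in for MySQL and a hypothetical index name, table1_idx_username1. It creates that index and prints the query plan so you can confirm the subquery searches the index instead of scanning the table. In MySQL the equivalent index would be `ALTER TABLE db1.table1 ADD INDEX table1_idx_username1 (username1);`, and EXPLAIN (which accepts UPDATE statements from MySQL 5.6.3 on) shows whether it is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.execute("CREATE TABLE db1.table1 (username1 TEXT)")
conn.execute("CREATE TABLE db2.table2 (username2 TEXT, usernametaken TEXT)")

# Index on the *lookup* side (hypothetical name). SQLite syntax puts the
# schema on the index name; the table must live in that same schema.
conn.execute("CREATE INDEX db1.table1_idx_username1 ON table1 (username1)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "UPDATE db2.table2 SET usernametaken = 'No' "
    "WHERE NOT EXISTS (SELECT 1 FROM db1.table1 AS t1 "
    "WHERE t1.username1 = table2.username2)"
).fetchall()

# Each plan row has four columns; the last one is the human-readable detail.
for row in plan:
    print(row[3])
```

A detail line mentioning table1_idx_username1 means each probe is an index lookup rather than a table scan.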
The optimized query (expressed as a SELECT that returns the rows the UPDATE would change):
SELECT
    db2.table2.usernametaken
FROM
    db2.table2
WHERE
    db2.table2.username2 NOT IN (
        SELECT
            db1.table1.username1
        FROM
            db1.table1
    );
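The rewritten query is a SELECT that still uses NOT IN, so it does not update anything by itself. One way to turn it into the update, sketched here under stated assumptions, is to use NOT EXISTS and process table2 in primary-key ranges, committing after each batch. The sketch assumes table2 has an integer primary key named id (hypothetical; the question does not show its schema), and the function name mark_untaken is likewise illustrative. SQLite again stands in for MySQL. Small batches keep each transaction short: with a transactional engine such as InnoDB, a single long-running UPDATE that is killed is rolled back as a whole, whereas here every finished batch stays committed.

```python
import sqlite3


def mark_untaken(conn, batch_size):
    """Set usernametaken = 'No' for db2.table2 rows whose username2 is absent
    from db1.table1, one primary-key range per transaction.
    Assumes a hypothetical INTEGER PRIMARY KEY column named id on table2."""
    min_id, max_id = conn.execute(
        "SELECT MIN(id), MAX(id) FROM db2.table2"
    ).fetchone()
    if min_id is None:  # empty table: nothing to do
        return 0
    updated = 0
    for start in range(min_id, max_id + 1, batch_size):
        cur = conn.execute(
            "UPDATE db2.table2 SET usernametaken = 'No' "
            "WHERE id BETWEEN ? AND ? "
            "AND NOT EXISTS (SELECT 1 FROM db1.table1 AS t1 "
            "WHERE t1.username1 = table2.username2)",
            (start, start + batch_size - 1),
        )
        updated += cur.rowcount
        conn.commit()  # finished batches survive an interrupted run
    return updated


conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db1")
conn.execute("ATTACH DATABASE ':memory:' AS db2")
conn.execute("CREATE TABLE db1.table1 (username1 TEXT)")
conn.execute("CREATE INDEX db1.table1_idx_username1 ON table1 (username1)")
conn.execute(
    "CREATE TABLE db2.table2 "
    "(id INTEGER PRIMARY KEY, username2 TEXT, usernametaken TEXT)"
)
conn.executemany("INSERT INTO db1.table1 VALUES (?)", [("alice",), ("bob",)])
conn.executemany(
    "INSERT INTO db2.table2 (username2) VALUES (?)",
    [("alice",), ("carol",), ("bob",), ("dave",), ("erin",)],
)
conn.commit()

# Tiny batch size so the loop runs several times on the sample data;
# a real run would use thousands of rows per batch.
updated = mark_untaken(conn, batch_size=2)
print(updated)  # 3
print(conn.execute(
    "SELECT username2 FROM db2.table2 WHERE usernametaken = 'No' ORDER BY id"
).fetchall())  # [('carol',), ('dave',), ('erin',)]
```

In MySQL the per-batch statement can also be written as a multi-table UPDATE with an anti-join, for example `UPDATE db2.table2 AS t2 LEFT JOIN db1.table1 AS t1 ON t1.username1 = t2.username2 SET t2.usernametaken = 'No' WHERE t1.username1 IS NULL AND t2.id BETWEEN 1 AND 10000;`. MySQL does not allow LIMIT in a multi-table UPDATE, so key ranges are the natural way to batch it. Either form relies on the index on db1.table1.username1 to be fast.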


* Original question posted on Stack Overflow.