I'm trying to check whether the values in a column (username2) of one table (db2.table2) also appear in a column (username1) of another table (db1.table1). If a value doesn't appear there, I want to write 'No' into the usernametaken column of db2.table2.
This is what I've tried:
UPDATE table2 SET usernametaken = "No" WHERE db2.table2.username2 NOT IN (SELECT username1 FROM db1.table1)
In an initial test (with LIMIT 2 added), the 2 cells that had 'No' written to them were correct, but it took 467.1423 seconds.
Then I ran it in full on 4 million+ rows (table2) against 100 million rows (table1). It ran for 3 days and I had to force-terminate it by stopping MySQL. When I reviewed table2, no 'No' values had been added to the usernametaken column.
Clearly something's not right, and even if there had been some results, this query is surely not the best way to get this done. It would be great if anyone could lend a hand in improving the query.
I just tried this:
ALTER TABLE db2.table2 ADD INDEX covering_index (username2, usernametaken);
UPDATE table2 SET usernametaken = "No" WHERE db2.table2.username2 NOT IN (SELECT username1 FROM db1.table1) LIMIT 10
... and just got the result: 8 rows affected. (Query took 1126.1817 seconds.)
So the required rows do seem to get updated when I put a LIMIT in place. However, it still takes far too long: 1126 secs / 8 rows * 4 million rows = 563 million seconds = 6516 days.
The following recommendations will help you in your SQL tuning process.
Below are the suggested index and the query it applies to:
ALTER TABLE `table2` ADD INDEX `table2_idx_username2` (`username2`);
SELECT
    table2.usernametaken
FROM
    table2
WHERE
    db2.table2.username2 NOT IN (
        SELECT
            db1.table1.username1
        FROM
            db1.table1
    )
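As a possible alternative, here is a minimal sketch of the same update written as an anti-join. It has not been tested on your data and rests on two assumptions: that db1.table1.username1 has no index yet (the index name below is made up for illustration), and that username1 and username2 share the same character set and collation, since a mismatch can stop MySQL from using the index for the comparison.

```sql
-- Assumption: username1 is not indexed yet. The index name is hypothetical.
-- Building an index on a table of roughly 100 million rows can itself take a while.
ALTER TABLE db1.table1 ADD INDEX table1_idx_username1 (username1);

-- Anti-join: keep only the table2 rows that find no partner in table1,
-- and mark those rows with 'No'.
UPDATE db2.table2 AS t2
LEFT JOIN db1.table1 AS t1
       ON t1.username1 = t2.username2
SET t2.usernametaken = 'No'
WHERE t1.username1 IS NULL;
```

Two caveats. First, this is not strictly identical to the NOT IN version when NULLs are involved: NOT IN matches nothing if the subquery returns any NULL, and it skips rows where username2 is NULL, whereas the LEFT JOIN form would update those rows. Second, MySQL's multiple-table UPDATE syntax does not accept LIMIT, so to try it on a small sample you would need to restrict the rows in the WHERE clause instead (for example by a key range). Prefixing the statement with EXPLAIN before running it should show whether the new index on username1 is actually being used.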