[Solved] Speeding up SQL query with more than 6 billion operations
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Speeding up SQL query with more than 6 billion operations

We have a Django project with the following problem:

User objects are stored in the auth_user table

These have a One-to-One relation to the Profile object, this object is stored in the yuza_profile table. Its relation to the user is maintained through the user_id field

However, years ago thousands of Profile objects were deleted without removing the linked User objects.

These users without profiles are causing problems in our system and we would like to remove them.

I created the following SQL query to perform this task

DELETE FROM auth_user WHERE id NOT IN (SELECT user_id FROM yuza_profile); 

Since the Profile objects have been deleted I cannot scan for null values on *_id fields - which would be my first move here.

Instead I am looping over two tables that both contain more than 60.000 rows which leads to a slow and inefficient query with more than 6 billion operations (and server timeouts)

Is there a way to speed up this query? I am aware that my query is very inefficient but I do not know of a way to improve it and any help would be greatly appreciated.

EDIT: As requested I've included the schemas below:

user_profile

id - integer
user_id - integer
gender - varchar(2)
birth_date - date
address - varchar(255)
city - varchar(255)
phone_number - varchar(10)
avatar - varchar(255)
---------------------
user_profile_pkey - (id)
user_profile_user_id_key - (user_id)
user_profile_user_id_fkey - (user_id) -> auth_user(id)
user_profile_pkey - (id) UNIQUE
user_profile_user_id_key - (user_id) UNIQUE
auth_user

id - integer
username - varchar(150)
first_name - varchar(30)
last_name - varchar(30)
email - varchar(75)
password - varchar(128)
is_staff - boolean
is_active - boolean
is_superuser - boolean
last_login - timestamp with time zone
date_joined - timestamp with time zone
---------------
auth_user_pkey - (id)
auth_user_username_key - (username)
auth_user_pkey - (id) UNIQUE
auth_user_username_key - (username) UNIQUE

I should note that I've started to wonder if the IDE (PycharmPro) I am using might be a factor - when testing my queries with the SELECT statement the IDE displays queryresults as pages with 500 results per page. The initial query indeed took less than a second - but pressing the 'go to last page' button took more than 2 minutes (for just 16000 results)

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `auth_user` ADD INDEX `auth_user_idx_id` (`id`);
The optimized query:
SELECT
        1 
    FROM
        auth_user 
    WHERE
        auth_user.id NOT IN (
            SELECT
                yuza_profile.user_id 
            FROM
                yuza_profile
        )

Related Articles



* original question posted on StackOverflow here.