[Solved] MySQL (id >= N AND col2 IS NULL) query unexpectedly slow for large N

EverSQL Database Performance Knowledge Base

We are using MySQL 5.5.42.

We have a table publications containing about 150 million rows (about 140 GB on an SSD).

The table has many columns, of which two are of particular interest: id (the primary key, as the EXPLAIN output below indicates) and cluster_id (a nullable column).

Both columns have their own (separate) index.

We run queries of the form:

SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;

Here is the problem: the larger the id value (14032924480302800156 in the example above), the slower the query. Queries with low id values are fast (< 0.1 s), while queries with high id values can take minutes.

Everything is fine if we use another (indexed) column in the WHERE clause. For instance:

SELECT * FROM publications
WHERE inserted_at >= '2014-06-20 19:30:25' AND cluster_id IS NULL
ORDER BY inserted_at
LIMIT 0, 200;

where inserted_at is of type timestamp.

Edit:

Output of EXPLAIN when using id >= 14032924480302800156:

id | select_type | table        | type | possible_keys      | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+--------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | PRIMARY,cluster_id | cluster_id | 9       | const | 71647796 | Using where

Output of EXPLAIN when using inserted_at >= '2014-06-20 19:30:25':

id | select_type | table        | type | possible_keys          | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+------------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | inserted_at,cluster_id | cluster_id | 9       | const | 71647796 | Using where

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid OFFSET In LIMIT Clause (query line: 9): OFFSET clauses can be very slow when used with high offsets (e.g. with high page numbers when implementing paging). Instead, use the seek method (http://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/), which provides better and more stable response times.
  2. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
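
The first two recommendations can be sketched together as follows. This is only an illustration under stated assumptions: the column list uses the three column names that appear in the question (the columns the application really needs are unknown), and the id literal in the second query is a placeholder for the last id returned by the previous page.

```sql
-- Page 1: name only the columns the application needs instead of '*'.
-- (id, cluster_id and inserted_at are the columns mentioned in the question;
-- the real column list is an assumption.)
SELECT id, cluster_id, inserted_at
FROM publications
WHERE id >= 14032924480302800156
  AND cluster_id IS NULL
ORDER BY id
LIMIT 200;

-- Page 2 (seek method): continue strictly after the last id of page 1
-- rather than raising the OFFSET. The literal below is a placeholder
-- for that last id.
SELECT id, cluster_id, inserted_at
FROM publications
WHERE id > 14032924480302800999
  AND cluster_id IS NULL
ORDER BY id
LIMIT 200;
```

The query in the question already uses an offset of 0 with id as the lower bound, so it is effectively a seek-style query. The seek approach relies on the ordering column being unique, which holds here if id is the primary key.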
Optimal indexes for this query:
ALTER TABLE `publications` ADD INDEX `publications_idx_cluster_id_id` (`cluster_id`,`id`);
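-- The question states that id already has its own index (EXPLAIN lists PRIMARY
-- as a possible key), so the following index is likely redundant and can be skipped.
ALTER TABLE `publications` ADD INDEX `publications_idx_id` (`id`);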
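
Once the composite index exists, re-running EXPLAIN is a simple way to check whether the optimizer actually picks it. The statement below is a sketch that reuses the query from the question unchanged. What to look for is an assumption about the likely outcome, not a guaranteed result: the key column would ideally show publications_idx_cluster_id_id, with a rows estimate far below the 71647796 reported earlier.

```sql
-- Check which index the optimizer chooses after the new index is created.
EXPLAIN
SELECT *
FROM publications
WHERE id >= 14032924480302800156
  AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
```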
The optimized query:
SELECT
        * 
    FROM
        publications 
    WHERE
        publications.id >= 14032924480302800156 
        AND publications.cluster_id IS NULL 
    ORDER BY
        publications.id
    LIMIT 0, 200
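
If EXPLAIN shows the optimizer still choosing the single-column cluster_id index, an index hint is one way to test whether the composite index helps. The following is a diagnostic sketch, not part of the generated recommendation, and it assumes the index publications_idx_cluster_id_id from the "Optimal indexes" section has been created under that name.

```sql
-- Diagnostic variant: force the composite (cluster_id, id) index so its
-- effect can be measured independently of the optimizer's own choice.
SELECT *
FROM publications FORCE INDEX (publications_idx_cluster_id_id)
WHERE publications.id >= 14032924480302800156
  AND publications.cluster_id IS NULL
ORDER BY publications.id
LIMIT 0, 200;
```

Index hints are best treated as a measurement tool. If the hinted query is consistently faster for high id values, that suggests the slowdown comes from the index choice rather than from the data volume itself.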


* original question posted on StackOverflow here.