Queries with LIMITs and OFFSETs are common in application that require pagination and in some cases might work well for a while.
In many cases though, they become slow and painful once the OFFSET has a high value.
Why OFFSET is so slow?
Well, in most cases, low offset queries are not slow. The problem starts with high OFFSET values.
If your query is using the following limit clause: "LIMIT 50000, 20", it's actually requesting the database to go through 50,020 rows and throw away the first 50,000. This action can have a high cost an impact response time.
You may ask yourself “who the heck is going to skip to page 50,000 in my application?”.
Let’s list few possible use cases:
- Your favorite search engine (Google / Bing / Yahoo / DuckDuckGo / whatever) is about to index your ecommerce website. You have about 100,000 pages in that website. How will your application react when the search bot will try to fetch those last 50,000 pages to index them? How frequently will that happen?
- In most web applications, we allow the user to skip to the last page, and not only the next page. What will happen when the user will try to skip to page 50,000 after visiting page 2?
- What happens if a user landed in page 20,000 from a Google search result, liked something there and posted it on facebook for another 1000 friends to read?
We tested the following OFFSET values with the following query, to present the performance deterioration as the OFFSET grows.
The query was executed on a table that holds user performed events (an analytics table) with 150,000 records. The data is real user information and not auto generated.
SELECT * FROM events WHERE date > '2010-01-01T00:00:00-00:00' AND event = 'editstart' ORDER BY date LIMIT 50;
|Offset||Query Duration (ms)|
How to optimize slow OFFSET queries?
To optimize slow OFFSET queries, you can either limit the amount of permitted pages in a pagination view, or simply just not use OFFSET.
A good alternative for using OFFSET will be the Seek Method, which is also highly recommended by both Lukas Eder and Markus Winand in their blogs.
In simple words, the seek method is all about finding a unique column or set of columns that identifies each row. Then, instead of using the OFFSET clause, we can just use that unique value as a bookmark that presents the position of the last row we’ve fetched and query the next set of rows by starting from this position in the WHERE clause.
For example, looking at the queries we executed before, assuming the last event id in offset 999,999 was ‘111866’, the query will be:
SELECT * FROM events WHERE (date,id) > ('2010-07-12T10:29:47-07:00',111866) AND event = 'editstart' ORDER BY date, id LIMIT 10
Another way to write the query is:
SELECT * FROM events WHERE date>='2010-07-12T10:29:47-07:00' and not (date='2010-07-12T10:29:47-07:00' and id < 111866) AND event = 'editstart' ORDER BY date, id LIMIT 10
Please note that you need to make sure to order by the unique columns, so that the order is always kept the same between pages, otherwise you might get an unexpected behavior.
This is a comparison of the performance between both methods. The interesting observation here is not only that the performance of the Seek method is better, but that it’s also more stable no matter how far you paginate into the table.
Possible pitfalls / challenges
- You’ll need to change some code in your application to get this method working, by saving the last fetched row (instead of adjusting the relevant offset value).
- There should be an index on the unique seek column / set of columns.
- Each column in the unique set of columns must have the NOT NULL constraint, otherwise you might get some unexpected behaviour.
- When a user skips pages, you first need to fetch the relevant position of that page. Say we want to skip to page 40,000:
SELECT date, id FROM events ORDER BY date, id LIMIT 1 OFFSET 39999;
This query should be extremely quick, because it's using a covering index.
We do not recommend using the OFFSET capability in MySQL to implement paging capabilities. When data grows, you’ll probably start noticing performance issues. Instead, consider using the Seek Method described above.