[Solved] Aurora MySQL, Innodb and Indexes: Creating index required more than ‘innodb_online_alter_log_max_size’ bytes of modification log. Please try again.

The challenge

If you have a large MySQL table and you're trying to add a new column and index it, you may have ran into this error:
Creating index 'Name' required more than 'innodb_online_alter_log_max_size' bytes 
of modification log. Please try again.

So how can you address it?

The solution

By default, recent MySQL versions will execute the ALTER statement with the INPLACE flag (unless it's eligible for INSTANT alter, though that's not always the case). this means that the database is using a temporary log (size of innodb_online_alter_log_max_size) which is by default 128MB in most recent versions, to keep track of DML changes happening during the ALTER command. So if the database is executing an UPDATE/INSERT/DELETE during the ALTER, it will keep track of those changes in the temp file, and then after the ALTER is done, it will apply those changes to the table as well.
So you have several options we can suggest:

Increase innodb_online_alter_log_max_size

  • You can increase innodb_online_alter_log_max_size, (also possible on RDS/Aurora instances, as it's a modifiable parameter). The pro is that it can help avoid the failure, but the con is that you'll end up collecting a large backlog of DML operations that has to be applied after the ALTER is done, and that's where the lock happens, so as large as that temp file, the lock will be longer and can cause more downtime.

Postpone DML operations

  • You can schedule the ALTER to a day/time that has a lot less DML operations in your database (for example a weekend), and if creating the column/index is rather quick (matter of hours lets say), you can get it done within that period of less-activity in the database.
  • You can also try to "dry out" the database from activity as much as possible. For example, if you have scheduled jobs that run those DELETE/UPDATE operations, you can pause them for the duration of the ALTER, and then resume them.
  • You might wonder "what exactly is the benefit of pausing the delete / update calls during altering the table? if I add index, and right after comes 100 update calls, what exactly is the difference in "snoozing" and resuming them only after alter was completed? is the fact the 100 update calls triggers while an ALTER is happening effecting, or is just they are stock "waiting" for the alter to finish?"
    The answer is that in many cases, MySQL will try to add the column as an online operation, which is not an instant operation. So we would recommend to see if you can get MySQL to use instant operation (at least for adding the column, as instant operations are not supported for index creation) rather than online operations, as instant operations are a lot faster, but has a few limitations/scoping you need to keep in mind. To better understand the relation between DML operations (update/insert/delete...) to DDL (the ALTER you're executing), you can imagine the ALTER process MySQL is executing this way:

    1. You have a table with 1000 records you want to add a new column to (and then index).
    2. You now execute an ALTER to add a new column, which is an operation that can take a while, even with online operations, so let's say it takes 1 hour starting now.
    3. MySQL starts with creating a temp copy of the table and adds the new column in the new temp table (prod table doesn't include the new column yet at this point).
    4. In the meantime, as it takes nearly 1 hour to create the new temp table with the new column, MySQL needs to perform 2 actions for with each DML (insert/update/delete) that arrives to the database: (a) it first applies the DML operation to the prod table to make sure the new data is available to your users, and (2) it also saves the DML to a temp log file (used in next steps).
    5. So as an example, let's say we had an UDPATE that updates record #101 in the table while the ALTER is still executing, the row will be updated in the prod table, and the update will be logged to a temp log file, but it won't be reflected in the new temp table with the new column just yet, as it's still being created with the original data + new column.
    6. So fast forward 50 minutes later, the temp table was created with the new column, and it's time to apply the incremental changes from the log, which were logged in the last 50 minutes.
    7. Now MySQL reads the online alter log file, and applies the DML changes that happened in the last 50 minutes. THIS IS WHERE LOCK STARTS on your table as mentioned in the MySQL docs: "A large innodb_online_alter_log_max_size setting permits more DML during an online DDL operation, but it also extends the period of time at the end of the DDL operation when the table is locked to apply logged DML."
    8. Once that's done, it does a switch (rename) of the prod table with the new table that contains the new column and all the updated data.
    9. All done.

So to summarize, the amount of DML (insert/update/delete) operations has a direct impact on the ALTER's duration and lock duration in that process, and therefore it's limited (in most MySQL versions to a 128MB log file using the innodb_online_alter_log_max_size configuration).Therefore, if you minimize the amount of DML operations (update/insert/delete) during the ALTER statements, you will:

    • Get a faster ALTER
    • Reduce the potential downtime as a result of the lock for incremental DML applying
    • Prevent potential failure of the ALTER before it completes, which happens when you exceed innodb_online_alter_log_max_size, which happens there are too many DML operations during the ALTER.

This means that by pausing DMLs you can avoid the double action, avoid longer locks on the production table, avoid potential failures in the ALTER (like the one you experienced), and get a faster ALTER operation overall. Having said that, We understand that from business perspective it's not always easy to pause DMLs, as your users always work with the system, but if you can reduce the amount by postponing cron jobs or something, it can help.P.S, if these ALTERs are not urgent for now, We always recommend to start with a test, not on a production database. You can snapshot the database, create a new temp aurora instance from the snapshot, and run the ALTERs. The reason we recommend that, is because it will give you a good estimate of how long the column creation and index creation will take (as now it fails so you don't really know how long it will take when it will be successful).
This will allow you to better prepare for the duration of the "pausing" of DMLs, and overall better predict how the process will look like in production.We assume that your decisions will be different in case you'll realize the ALTERs will take 2 days, when compared to 30 minutes.

Do not append several ALTER statements

  • Check if you're running both column creation and index creation in the same command (ALTER ... ADD COLUMN ... ADD INDEX). If so, we recommend to run them individually. It can help, as each of the commands will run less time, and the online_alter_log will be filled with less DML operations, so the size will be smaller.
    Separating the commands can also help, as the MySQL docs say that "Adding a column cannot be combined in the same statement with other ALTER TABLE actions that do not support ALGORITHM=INSTANT.", which means you can't really add the column in INSTANT mode, which can also help.

Do not use AFTER when adding a new column

  • Based on the command we can see you're adding the new column with a "AFTER `name`" statement. We can see why this can be better in terms of readability and maintenance, but please keep in mind that the MySQL docs say regarding INSTANT column add that "Prior to MySQL 8.0.29, a column can only be added as the last column of the table. Adding a column to any other position among other columns is not supported. From MySQL 8.0.29, an instantly added column can be added to any position in the table.". Therefore, maybe you should consider adding it without the "AFTER" statement, which will add it as the last column, but will allow you to use the INSTANCE algorithm.

Partitioning?

Many customers looking at this potential solution, however, in order to recommend it, you need to have a much better understanding of the entire system. Partitioning can be super helpful in some cases, but can be a real pain in others.
For example, are you 99.9% of the time fetching data from only the last week (based on timestamp), or there are some processes that do that and others that look at very large data sets back in time?
Partitioning will impact the way you structure your queries and indexes, so it's something you'd need to consider in advance. For example if you frequently run a large DELETE query, assuming the column deleted is indexed in that table - This usually means that there is a rather large chunk of data deleted there, and not only a few rows. This means that the database needs to locate them, delete the data, and update all indexes. Unless all of those rows are in the same "future partition", we are not sure if till be beneficial at all, as there is still a large chunk of data to be deleted, and indexes to be updated.
Before going into Partitioning you estimate how many rows are deleted on average. Our general guideline, in case you have a large delete, will be to check whether you can delete them in smaller chunks, so the process will be much quicker and will run more times, so you'll avoid potential long locks and potential long rollbacks on failure.

In some cases, partition by date isn't an option since companies have a very wide variety of date ranges executed.

So to summarize, We believe that most performance issues can be solved without going into partitioning, as partitioning comes with its own challenges.

One thought on “[Solved] Aurora MySQL, Innodb and Indexes: Creating index required more than ‘innodb_online_alter_log_max_size’ bytes of modification log. Please try again.

  1. I have designed and implemented most of the InnoDB ALTER TABLE features, including the online index creation and online table rebuild that first appeared in MySQL 5.6. A major contributing factor to the online log size is that the log will be written by concurrent DML before any operations are committed. For example, if a concurrently running transaction executed an INSERT and was rolled back, then both an INSERT and a DELETE would be logged for every secondary index that is being created. This was recently fixed in MariaDB Server 10.6.8, by MDEV-15250.

    MariaDB Server also implemented instant ADD COLUMN before MySQL. The version in MariaDB Server 10.3 can only add columns instantly to the end of the table. Starting with MariaDB Server 10.4, columns can be instantly added at any position, and they can be instantly dropped as well.

Comments are closed.