Re-evaluating MySQL 8 Query Transformation Capabilities

I recently stumbled upon a very interesting post by Lukas Eder, in which he describes 10 query transformations that do not depend on the database’s cost model. He posted it a couple of years ago, but as I read it, I assumed parts of it might still be relevant today.

In the original post, several databases were tested to see whether their optimizers could automatically rewrite the SQL queries and optimize them. In those tests, MySQL underperformed in several of the use cases (the tested version was MySQL 8.0.2, released on 2017-07-17).

Seeing those results, and given that the previous evaluation was done almost two years ago, I thought now would be a good time to re-evaluate a few of those tests with the latest MySQL 8.0.16 (released on 2019-04-25), and to demonstrate EverSQL Query Optimizer's capabilities while we are at it.

We'll use the Sakila sample database for the demonstrations, and some of the original queries from Lukas's post.

The following three tables will be used for these demonstrations:

CREATE TABLE address (
    address_id INT NOT NULL,
    address VARCHAR(50) NOT NULL,
    CONSTRAINT pk_address PRIMARY KEY (address_id)
);

CREATE TABLE customer (
    customer_id INT NOT NULL,
    first_name VARCHAR(45) NOT NULL,
    last_name VARCHAR(45) NOT NULL,
    address_id INT NOT NULL,
    CONSTRAINT pk_customer PRIMARY KEY (customer_id),
    CONSTRAINT fk_customer_address FOREIGN KEY (address_id)
    REFERENCES address(address_id)
);

CREATE TABLE actor (
    -- actor_id definition as in the standard Sakila schema
    actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    first_name VARCHAR(45) NOT NULL,
    last_name VARCHAR(45) NOT NULL,
    PRIMARY KEY  (actor_id),
    KEY idx_actor_last_name (last_name)
);

JOIN Elimination

Looking at the query below, you'll probably notice that the table address isn't really used anywhere in the query (other than in the join's ON clause), and makes no actual contribution to the result. The columns are selected from the customer table, and there are no predicates on the address table (or any predicates at all, for that matter). Since address_id is the PRIMARY KEY of address, and the same column in customer is NOT NULL and covered by a FOREIGN KEY, every customer row matches exactly one address row. That should give the database's optimizer enough confidence to conclude that the join to the table address can be spared.

SELECT c.first_name, c.last_name
FROM   customer c
JOIN   address a ON c.address_id = a.address_id
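
For reference, the transformation we would hope for simply drops the join. The following is a hand-written sketch of the expected rewrite, not the output of any tool; it relies on customer.address_id being NOT NULL and referencing the primary key of address, as in the table definitions shown earlier:

```sql
-- Expected equivalent after join elimination (hand-written sketch).
-- Valid only because customer.address_id is NOT NULL and references
-- the primary key of address, so the join neither drops nor duplicates rows.
SELECT c.first_name, c.last_name
FROM   customer c;

-- To check whether the optimizer eliminated the join on its own,
-- inspect the plan of the original query and look for a row for `a`:
EXPLAIN
SELECT c.first_name, c.last_name
FROM   customer c
JOIN   address a ON c.address_id = a.address_id;
```

If the plan of the original query still lists both tables, the join was not eliminated.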

Yet, as you can see from the EXPLAIN, MySQL executes the join, without applying the expected query transformation.

When submitting the same query and schema structure to EverSQL, it will output the following query and recommendation:

In a similar manner, MySQL 8.0.16 will execute the join for all other examples in the Join Elimination section of the original post, so I saw no reason to repeat them here.

Unneeded Self JOIN

In the following query, the table actor is joined to itself. Because the two instances are joined on the primary key actor_id, each row of a1 matches exactly the same row in a2, so it can be proven that a1 = a2. Anything we can do with a2 can therefore be done with a1 as well. We can thus change the references to a2 in the SELECT clause to the same columns in a1, and remove the redundant join to a2.

SELECT a1.first_name, a2.last_name
FROM   actor a1
JOIN   actor a2 ON a1.actor_id = a2.actor_id;
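
Spelled out, the expected result of that reasoning would look roughly as follows. This is a hand-written sketch of the transformation, not the output of any tool:

```sql
-- Expected equivalent after removing the redundant self join (sketch).
-- actor_id is the primary key, so a1 and a2 always refer to the same row,
-- and a2.last_name can be read from a1 instead.
SELECT a1.first_name, a1.last_name
FROM   actor a1;
```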

MySQL 8.0.16 will execute the join in this case as well, without applying the expected query transformation.

When submitting the same query and schema structure to EverSQL, it will output the following query and recommendation:

The original queries and information for this section can be found in Lukas Eder's original post.

Predicate Pushdown

We should always strive to have our SQL queries process as little data as possible, especially when the filtering can be done using indexes. In the following query, we expect MySQL to push the condition from the outer query down into both parts of the UNION ALL, to make sure we filter out as much data as we can, as early as possible.

SELECT *
FROM (
    SELECT first_name, last_name, 'actor' type
    FROM   actor
    UNION ALL
    SELECT first_name, last_name, 'customer' type
    FROM   customer
) people
WHERE people.last_name = 'DAVIS';
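
To make the expectation concrete, here is a hand-written sketch of what the query would look like with the predicate pushed into both branches. It is not the output of any tool. Note one assumption: the customer definition shown earlier lists no index on last_name, so the second branch would need such an index to avoid a full scan, whereas actor already has idx_actor_last_name.

```sql
-- Expected equivalent after predicate pushdown (hand-written sketch).
-- The filter is applied inside each branch of the UNION ALL, so each
-- table can be filtered (ideally via an index) before the rows are combined.
SELECT *
FROM (
    SELECT first_name, last_name, 'actor' type
    FROM   actor
    WHERE  last_name = 'DAVIS'      -- can use idx_actor_last_name
    UNION ALL
    SELECT first_name, last_name, 'customer' type
    FROM   customer
    WHERE  last_name = 'DAVIS'      -- assumes an index on customer.last_name
) people;
```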

As you can see below, the transformation isn't applied by MySQL, and both tables, actor and customer, are scanned in full.

Submitting the query and schema structure to EverSQL will result in the following query and recommendation:

When looking at the execution plan of the optimized query, you can see that the indexes are used, and significantly less data is scanned:

Wrapping up

Query transformations can be very powerful, and it's important to understand which of them will be applied automatically by the database and which won't. In this post, we listed three examples (originally posted by Lukas Eder), in which MySQL 8.0.16 didn't apply the expected transformations, which eventually resulted in non-optimal execution plans.