[Solved] Calculating rolling percentage change by category in MySQL 5.6

EverSQL Database Performance Knowledge Base

[Image: data table]

I have a SQL table like the one shown above, and I want to calculate the rolling percentage change from one row to the next within each category, so that the result looks like this:

[Image: result table]

The SQL query I use is very slow and takes forever to run when there are thousands of categories. Do you have any idea what is going on, or how to improve it?

First create a sample data_table:

CREATE TABLE IF NOT EXISTS data_table (
    id INT AUTO_INCREMENT,
    num INT,
    category VARCHAR(10),
    price FLOAT(20,2),
    PRIMARY KEY (id)
);

-- price is a numeric column, so the values are written as numeric literals
INSERT INTO data_table (num, category, price)
VALUES (1, 'A', 10),
       (2, 'A', 20),
       (3, 'A', 30),
       (1, 'B', 20),
       (2, 'B', 30),
       (3, 'B', 40);

SQL for calculating percentage change:

SELECT 
     A.*, 
     CASE WHEN (A.price IS NULL OR B.price IS NULL OR B.price=0) THEN 0 ELSE
        (A.price - B.price)/(B.price) *100 END AS perc
FROM (SELECT
    num,
    category,
    price
  FROM data_table
  ) A LEFT JOIN (SELECT
    num,
    category,
    price
  FROM data_table
  ) B
ON (A.num = B.num+1) AND A.category=B.category;
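
One way to see what is going on is to ask MySQL for the execution plan. The sketch below simply prefixes the query above with `EXPLAIN`; it assumes the sample `data_table` has been created, and the plan you get will depend on your data and server configuration.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT
     A.*,
     CASE WHEN (A.price IS NULL OR B.price IS NULL OR B.price = 0) THEN 0 ELSE
        (A.price - B.price) / (B.price) * 100 END AS perc
FROM (SELECT num, category, price FROM data_table) A
LEFT JOIN (SELECT num, category, price FROM data_table) B
ON (A.num = B.num + 1) AND A.category = B.category;
```

On MySQL 5.6 the plan will likely contain rows for the two subqueries as derived tables (shown with names such as `<derived2>`), which indicates that they are materialized before the join.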

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.

The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Prefer Direct Join Over Joined Subquery (query line: 17): MySQL 5.6 materializes a subquery in the FROM clause (a derived table) into a temporary table before joining it, so the indexes of the underlying table cannot be used for the join. We recommend replacing such subqueries with direct JOIN clauses.
  4. Prefer Direct Join Over Joined Subquery (query line: 9): The same applies to the other joined subquery; we recommend replacing it with a direct JOIN clause as well.

Optimal indexes for this query:
ALTER TABLE `data_table` ADD INDEX `data_table_idx_category` (`category`);
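
The join also filters on `num`, so a composite index on both join columns may help further. The sketch below is an assumption rather than part of the generated recommendations: the index name `data_table_idx_category_num` is hypothetical, and the join condition is rewritten from `A.num = B.num + 1` to the equivalent `B.num = A.num - 1` so that `B.num` is compared as a bare column and both index columns can be used for the lookup.

```sql
-- Hypothetical index covering both join columns (category first, then num).
ALTER TABLE `data_table`
    ADD INDEX `data_table_idx_category_num` (`category`, `num`);

-- Same result as the original query, with explicit columns and the
-- arithmetic moved to the outer table's side of the comparison.
SELECT
    A.num,
    A.category,
    A.price,
    CASE
        WHEN (A.price IS NULL OR B.price IS NULL OR B.price = 0) THEN 0
        ELSE (A.price - B.price) / B.price * 100
    END AS perc
FROM data_table A
LEFT JOIN data_table B
    ON B.category = A.category
    AND B.num = A.num - 1;
```

Whether this is faster than the single-column index depends on how many rows each category holds, so it is worth comparing the two with `EXPLAIN` on real data.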

The optimized query:
SELECT
    A.*,
    CASE
        WHEN (A.price IS NULL
              OR B.price IS NULL
              OR B.price = 0) THEN 0
        ELSE (A.price - B.price) / (B.price) * 100
    END AS perc
FROM data_table A
LEFT JOIN data_table B
    ON (A.num = B.num + 1)
    AND A.category = B.category;

* Original question posted on StackOverflow.