[Solved] Is there a better way to optimise / index this query?

EverSQL Database Performance Knowledge Base

Is there a better way to optimise / index this query?

I have a semi-large (10,000,000+ record) credit card transaction database that I need to query regularly. I have managed to optimise most queries to be sub 0.1 seconds but I'm struggling to do the same for sub-queries.

The purpose of the following query is to obtain the number of "inactive" credit cards (credit cards that have not made a card transaction in the last x days / weeks) for both the current user's company, and all companies (so as to form a comparison).

The sub-query first obtains the last card transaction of each credit card. The parent query then removes any expired credit cards and counts the remaining cards by their associated company and by whether or not they are deemed "inactive". (The expression (UNIX_TIMESTAMP() - (14 * 86400)) is used in place of a PHP time calculation.)

SELECT
    SUM(IF(LastActivity < (UNIX_TIMESTAMP() - (14 * 86400)), 1, 0)) AS AllInactiveCards,
    SUM(IF(LastActivity >= (UNIX_TIMESTAMP() - (14 * 86400)), 1, 0)) AS AllActiveCards,
    SUM(IF(LastActivity < (UNIX_TIMESTAMP() - (14 * 86400)) AND lastCardTransactions.CompanyID = 15, 1, 0)) AS CompanyInactiveCards,
    SUM(IF(LastActivity >= (UNIX_TIMESTAMP() - (14 * 86400)) AND lastCardTransactions.CompanyID = 15, 1, 0)) AS CompanyActiveCards
FROM CardTransactions
JOIN
(
    SELECT
        CardSerialNumberID,
        MAX(CardTransactions.Timestamp) AS LastActivity,
        CardTransactions.CompanyID
    FROM CardTransactions
    GROUP BY
        CardTransactions.CardSerialNumberID, CardTransactions.CompanyID
) lastCardTransactions
ON
    CardTransactions.CardSerialNumberID = lastCardTransactions.CardSerialNumberID AND
    CardTransactions.Timestamp = lastCardTransactions.LastActivity AND
    CardTransactions.CardExpiryTimestamp > UNIX_TIMESTAMP()

The indexes in use are on CardSerialNumberID, CompanyID, Timestamp for the inner query, and CardSerialNumberID, Timestamp, CardExpiryTimestamp, CompanyID for the outer query.

The query takes around 0.4 seconds to execute when done multiple times, but the initial run can be as slow as 0.9 - 1.1 seconds, which is a big problem when loading a page with 4-5 of these types of query.

One thought I did have was to calculate the overall inactive card number in a routine separate to this, perhaps run daily. This would allow me to adjust this query to only pull records for a single company, thus reducing the dataset and bringing the query time down. However, this is only really a temporary fix, as the database will continue to grow until the same amount of data is being analysed anyway.
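The separate routine described above could be sketched roughly as follows. This is only an illustration: the table InactiveCardStats and its columns are hypothetical names, the column types are assumptions, and the job would have to be triggered by something outside this snippet (cron or a MySQL event, for example).

```sql
-- Hypothetical table holding the precomputed all-company totals.
CREATE TABLE IF NOT EXISTS InactiveCardStats (
    ComputedAt       INT UNSIGNED NOT NULL PRIMARY KEY,  -- Unix time of the run
    AllInactiveCards INT UNSIGNED NOT NULL,
    AllActiveCards   INT UNSIGNED NOT NULL
);

-- Scheduled job: compute the totals once instead of on every page load.
SET @now    := UNIX_TIMESTAMP();
SET @cutoff := @now - (14 * 86400);

INSERT INTO InactiveCardStats (ComputedAt, AllInactiveCards, AllActiveCards)
SELECT
    @now,
    -- SUM() over zero rows yields NULL, so fall back to 0
    COALESCE(SUM(IF(lastCardTransactions.LastActivity <  @cutoff, 1, 0)), 0),
    COALESCE(SUM(IF(lastCardTransactions.LastActivity >= @cutoff, 1, 0)), 0)
FROM CardTransactions
JOIN (
    SELECT CardSerialNumberID, CompanyID, MAX(`Timestamp`) AS LastActivity
    FROM CardTransactions
    GROUP BY CardSerialNumberID, CompanyID
) lastCardTransactions
    ON  CardTransactions.CardSerialNumberID = lastCardTransactions.CardSerialNumberID
    AND CardTransactions.`Timestamp` = lastCardTransactions.LastActivity
    AND CardTransactions.CardExpiryTimestamp > @now;

-- Page load: read the most recent snapshot instead of re-aggregating.
SELECT AllInactiveCards, AllActiveCards
FROM InactiveCardStats
ORDER BY ComputedAt DESC
LIMIT 1;
```

The snapshot is only as fresh as the last run, so the comparison figure may lag behind the per-company figure by up to one refresh interval.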

Note: The query above's fields have been modified to make them more generic, as the specific subject this query is used on is quite complex. As such there is no DB schema to give (and if there was, you'd need a dataset of 10,000,000+ records anyway to test the query I suppose). I'm more looking for a conceptual fix than for anyone to actually give me an adjusted query.

Any help is very much appreciated!

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 20): We advise against using subqueries in the FROM clause, as the optimizer often handles them poorly (for example, by materializing the result without a useful index). Instead, consider joining a temporary table that holds the same data and carries an index on the join columns.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Explicitly ORDER BY After GROUP BY (modified query below): In MySQL 5.7 and earlier, the database sorts all 'GROUP BY col1, col2, ...' queries as if you had also specified 'ORDER BY col1, col2, ...'. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'. MySQL 8.0 no longer sorts GROUP BY results implicitly, so the clause is unnecessary there.
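The first recommendation (joining an indexed temporary table instead of the derived table) might look something like the sketch below. The table name tmp_last_card_tx, the index name, and the column types are assumptions for illustration; they should be matched to the real CardTransactions schema.

```sql
-- Hypothetical temporary table replacing the derived table lastCardTransactions.
DROP TEMPORARY TABLE IF EXISTS tmp_last_card_tx;

CREATE TEMPORARY TABLE tmp_last_card_tx (
    CardSerialNumberID INT UNSIGNED NOT NULL,
    CompanyID          INT UNSIGNED NOT NULL,
    LastActivity       INT UNSIGNED NOT NULL,
    -- Index matching the join condition used by the outer query
    INDEX idx_card_last (CardSerialNumberID, LastActivity)
);

-- Populate it with the last transaction per card and company.
INSERT INTO tmp_last_card_tx (CardSerialNumberID, CompanyID, LastActivity)
SELECT CardSerialNumberID, CompanyID, MAX(`Timestamp`)
FROM CardTransactions
GROUP BY CardSerialNumberID, CompanyID;

-- Compute the 14-day cutoff once rather than in every SUM().
SET @cutoff := UNIX_TIMESTAMP() - (14 * 86400);

SELECT
    SUM(IF(t.LastActivity <  @cutoff, 1, 0)) AS AllInactiveCards,
    SUM(IF(t.LastActivity >= @cutoff, 1, 0)) AS AllActiveCards,
    SUM(IF(t.LastActivity <  @cutoff AND t.CompanyID = 15, 1, 0)) AS CompanyInactiveCards,
    SUM(IF(t.LastActivity >= @cutoff AND t.CompanyID = 15, 1, 0)) AS CompanyActiveCards
FROM CardTransactions AS ct
JOIN tmp_last_card_tx AS t
    ON  ct.CardSerialNumberID = t.CardSerialNumberID
    AND ct.`Timestamp` = t.LastActivity
WHERE ct.CardExpiryTimestamp > UNIX_TIMESTAMP();

DROP TEMPORARY TABLE IF EXISTS tmp_last_card_tx;
```

Whether this beats the derived-table version depends on the MySQL version and on how often the temporary table has to be rebuilt, so it is worth timing both against real data.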
Optimal indexes for this query:
ALTER TABLE `CardTransactions` ADD INDEX `cardtransactions_idx_cardexpirytimestamp` (`CardExpiryTimestamp`);
ALTER TABLE `CardTransactions` ADD INDEX `cardtransactions_idx_cardserial_companyid_timestamp` (`CardSerialNumberID`,`CompanyID`,`Timestamp`);
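After creating the indexes, one way to check whether the optimizer actually picks them up is to inspect the execution plan. The statements below are a minimal sketch using the grouped subquery from the question; the plan reported will depend on the MySQL version and on the data.

```sql
-- List the indexes that now exist on the table.
SHOW INDEX FROM CardTransactions;

-- Inspect the plan for the grouped subquery. Ideally the `key` column names
-- the composite (CardSerialNumberID, CompanyID, Timestamp) index and the
-- `Extra` column mentions "Using index", meaning the index alone is read.
EXPLAIN
SELECT CardSerialNumberID, CompanyID, MAX(`Timestamp`) AS LastActivity
FROM CardTransactions
GROUP BY CardSerialNumberID, CompanyID;
```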
The optimized query:
SELECT
    SUM(IF(LastActivity < (UNIX_TIMESTAMP() - (14 * 86400)), 1, 0)) AS AllInactiveCards,
    SUM(IF(LastActivity >= (UNIX_TIMESTAMP() - (14 * 86400)), 1, 0)) AS AllActiveCards,
    SUM(IF(LastActivity < (UNIX_TIMESTAMP() - (14 * 86400)) AND lastCardTransactions.CompanyID = 15, 1, 0)) AS CompanyInactiveCards,
    SUM(IF(LastActivity >= (UNIX_TIMESTAMP() - (14 * 86400)) AND lastCardTransactions.CompanyID = 15, 1, 0)) AS CompanyActiveCards
FROM
    CardTransactions
JOIN
    (
        SELECT
            CardTransactions.CardSerialNumberID,
            MAX(CardTransactions.Timestamp) AS LastActivity,
            CardTransactions.CompanyID
        FROM
            CardTransactions
        GROUP BY
            CardTransactions.CardSerialNumberID,
            CardTransactions.CompanyID
        ORDER BY
            NULL
    ) lastCardTransactions
        ON CardTransactions.CardSerialNumberID = lastCardTransactions.CardSerialNumberID
        AND CardTransactions.Timestamp = lastCardTransactions.LastActivity
        AND CardTransactions.CardExpiryTimestamp > UNIX_TIMESTAMP()

* original question posted on StackOverflow here.