[Solved] how can i make the following query more efficient in lookup with many conditions
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

how can i make the following query more efficient in lookup with many conditions

First of all i need to have a functionality of pagination by limiting the result in the following query lookup with couple where conditions.

SELECT SQL_CALC_FOUND_ROWS
    a.uid, b.NAME
FROM
    `profiles` AS a FORCE INDEX(profiles_country_city_gender_index)
JOIN `users` AS b
ON b.id = a.uid
AND a.country = 'INDONESIA'
AND a.gender = 0
JOIN (
    SELECT
        a.uid
    FROM
       profile_details AS a
    JOIN profile_details AS kids ON kids.uid = a.uid
    AND kids.kids_pref = 1
    JOIN profile_details AS current ON current.uid = a.uid
    AND current.current_relationship = 1
    JOIN profile_details AS smoking ON smoking.uid = a.uid
    AND smoking.smoking_pref = 1
    ) AS e ON e.uid = a.uid
AND ( TIMESTAMPDIFF( YEAR, a.birth_date, NOW()) BETWEEN 25 AND 35 )
LIMIT 33;

All tables here is one on one relation with table Users which are

Using id column as Primary key in Users, and uid in the other tables as Foreign key. At the start, i do not have problem with the above query/design until the tables grows till 300K rows in it, the query running takes OK, Time: 0.726000s to fetch the result which is too slow for my opinion.

I tried to count the rows based on the conditions above using count(*) and get some approximate same result, i need to have the faster way to get the count of rows from the lookup conditions to make the pagination system works as expected with less waiting time.

As you can see in the query, i am using:

FORCE INDEX(profiles_country_city_gender_index)

I think it's not helping much caused by the larger rows result from the scope by using:

AND a.country = 'INDONESIA' 
AND a.gender = 0

Resulting (148801 rows scope limit by country with gender is equal by 0), if i am pairing with city it's not problem query time is considerable because rows result much smaller but still gonna be a problem when there were larger rows at someday.

To anyone who might ask for the query explanation:

Explain SELECT SQL_CALC_FOUND_ROWS
        a.uid, 
        b.NAME ...

Results:

| select_type | table   | type   | possible_keys                      | key                                | key_len | ref              | rows   | filtered  | Extra                              |
+-------------+---------+--------+------------------------------------+------------------------------------+---------+------------------+--------+-----------+------------------------------------+
| SIMPLE      | a       | ref    | profiles_country_city_gender_index | profiles_country_city_gender_index | 242     | const            | 148801 | 10.00     | Using index condition; Using where |
| SIMPLE      | a       | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 100.00.00 | Using index                        |
| SIMPLE      | kids    | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | current | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | smoking | ref    | profile_details_uid_foreign        | profile_details_uid_foreign        | 3       | restfulapi.a.uid | 1      | 10.00     | Using where                        |
| SIMPLE      | b       | eq_ref | PRIMARY                            | PRIMARY                            | 3       | restfulapi.a.uid | 1      | 100.00.00 |                                    |

As you can see in explain result, no table scan or using temporary or using range, only index condition. I imagine if the tables have at least 1 million rows returns by Country scope, just multiply the time with scaling rows by 300K it sucks :(.

Below is the table definition in case it helps to analysing the problem:

CREATE TABLE `profile_details` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `uid` mediumint(8) unsigned NOT NULL,
  `intents` tinyint(4) NOT NULL DEFAULT '3',
  `height` smallint(6) DEFAULT NULL,
  `body_type` tinyint(4) NOT NULL DEFAULT '5',
  `kids_pref` tinyint(4) NOT NULL DEFAULT '1',
  `drinking_pref` tinyint(4) NOT NULL DEFAULT '2',
  `living_with` tinyint(4) NOT NULL DEFAULT '0',
  `current_relationship` tinyint(4) NOT NULL DEFAULT '1',
  `sexual_pref` tinyint(4) NOT NULL DEFAULT '1',
  `smoking_pref` tinyint(4) NOT NULL DEFAULT '0',
  `status_online` tinyint(4) NOT NULL DEFAULT '0',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `profile_details_uid_foreign` (`uid`),
  KEY `idx_multipart` (`intents`,`body_type`,`kids_pref`,`drinking_pref`,`living_with`,`current_relationship`,`sexual_pref`,`smoking_pref`),
  CONSTRAINT `profile_details_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)

CREATE TABLE `profiles` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `uid` mediumint(8) unsigned NOT NULL,
  `birth_date` date NOT NULL,
  `gender` tinyint(4) NOT NULL DEFAULT '0',
  `country` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'ID',
  `city` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT 'Makassar',
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  `latitude` double NOT NULL DEFAULT '0',
  `longitude` double NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `profiles_uid_foreign` (`uid`),
  KEY `profiles_birth_date_index` (`birth_date`),
  KEY `profiles_latitude_longitude_index` (`latitude`,`longitude`),
  KEY `profiles_country_city_gender_index` (`country`,`city`,`gender`),
  KEY `idx_country_gender_birthdate` (`country`,`gender`,`birth_date`),
  KEY `idx_country_city_gender_birthdate` (`country`,`city`,`gender`,`birth_date`),
  CONSTRAINT `profiles_uid_foreign` FOREIGN KEY (`uid`) REFERENCES `users` (`id`)
)

How can I find the solution, do i need to redesign the table to get the ideal system? perhaps it is the last option.

EDIT

I am trying what you suggested earlier, first i added an index in three column:

CREATE INDEX profiles_country_gender_birth_date_index on `profiles`(country,gender,birth_date);

and i tried to select Count(*) without JOIN with profile_detail:

SELECT
    count(*)


FROM
    `profiles` AS a 
    FORCE INDEX ( profiles_country_gender_birth_date_index )
    JOIN `users` AS b ON b.id = a.uid 
and 
a.country = 'INDONESIA' 

    AND a.gender =1 
    AND a.birth_date BETWEEN NOW()- INTERVAL 35 YEAR 
    AND NOW()- INTERVAL 25 YEAR 

result timing is not stable between 0.7sec to 0.35sec and i don't know why it could be. Below is Explain Query Plan in Json Format just in case to help to find out the culprit.

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "114747.38"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "a",
          "access_type": "range",
          "possible_keys": [
            "profiles_country_gender_birth_date_index"
          ],
          "key": "profiles_country_gender_birth_date_index",
          "used_key_parts": [
            "country",
            "gender",
            "birth_date"
          ],
          "key_length": "246",
          "rows_examined_per_scan": 94066,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "index_condition": "((`restfulapi`.`a`.`gender` = 1) and (`restfulapi`.`a`.`country` = 'INDONESIA') and (`restfulapi`.`a`.`birth_date` between <cache>((now() - interval 35 year)) and <cache>((now() - interval 25 year))))",
          "cost_info": {
            "read_cost": "15858.00",
            "eval_cost": "6592.23",
            "prefix_cost": "75194.00",
            "data_read_per_join": "16M"
          },
          "used_columns": [
            "uid",
            "birth_date",
            "gender",
            "country"
          ]
        }
      },
      {
        "table": {
          "table_name": "b",
          "access_type": "eq_ref",
          "possible_keys": [
            "PRIMARY"
          ],
          "key": "PRIMARY",
          "used_key_parts": [
            "id"
          ],
          "key_length": "3",
          "ref": [
            "restfulapi.a.uid"
          ],
          "rows_examined_per_scan": 1,
          "rows_produced_per_join": 32961,
          "filtered": "100.00",
          "using_index": true,
          "cost_info": {
            "read_cost": "32961.15",
            "eval_cost": "6592.23",
            "prefix_cost": "114747.38",
            "data_read_per_join": "89M"
          },
          "used_columns": [
            "id"
          ]
        }
      }
    ]
  }
}

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Calling Functions With Indexed Columns (query line: 38): When a function is used directly on an indexed column, the database's optimizer won’t be able to use the index. For example, if the column `a_birth_date` is indexed, the index won’t be used as it’s wrapped with the function `TIMESTAMPDIFF`. If you can’t find an alternative condition that won’t use a function call, a possible solution is to store the required value in a new indexed column.
  2. Avoid Optimizer Hints (modified query below): Using optimizer hints such as FORCE INDEX can be valuable in the short term. When important aspects such as the amount of data or the data distribution change, these hints can do more harm than good.
  3. Avoid Subqueries (query line: 5): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  4. Avoid Subqueries (query line: 19): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  5. Avoid using SQL_CALC_FOUND_ROWS (query line: 2): Including SQL_CALC_FOUND_ROWS statements tend to slow down queries significantly as it doesn't scale well. It's recommended to split this query to two: a data selection query and a counting query.
  6. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  7. Sort and Limit Before Joining (modified query below): In cases where the joins aren't filtering any rows, it's possible to sort and limit the amount of rows using a subquery in the FROM clause, before applying the joins to all other tables.
Optimal indexes for this query:
ALTER TABLE `profile_details` ADD INDEX `profile_details_idx_kids_pr_current_smoking_uid` (`kids_pref`,`current_relationship`,`smoking_pref`,`uid`);
The optimized query:
SELECT
        SQL_CALC_FOUND_ROWS a_uid,
        b.NAME 
    FROM
        (SELECT
            a.uid AS a_uid,
            a.country AS a_country,
            a.gender AS a_gender,
            a.birth_date AS a_birth_date 
        FROM
            `profiles` AS a LIMIT 33) AS a 
    JOIN
        `users` AS b 
            ON b.id = a.a_uid 
            AND a.a_country = 'INDONESIA' 
            AND a.a_gender = 0 
    JOIN
        (
            SELECT
                a.uid 
            FROM
                profile_details AS a 
            JOIN
                profile_details AS kids 
                    ON kids.uid = a.uid 
                    AND kids.kids_pref = 1 
            JOIN
                profile_details AS `current` 
                    ON `current`.uid = a.uid 
                    AND `current`.current_relationship = 1 
            JOIN
                profile_details AS smoking 
                    ON smoking.uid = a.uid 
                    AND smoking.smoking_pref = 1
            ) AS e 
                ON e.uid = a.a_uid 
                AND (
                    TIMESTAMPDIFF(YEAR,
                a.a_birth_date,
                NOW()) BETWEEN 25 AND 35) LIMIT 33

Related Articles



* original question posted on StackOverflow here.