[Solved] Small select subquery slows down the whole query about 10 times

EverSQL Database Performance Knowledge Base

Small select subquery slows down the whole query about 10 times

Database type:

I have the following MySQL table:

CREATE TABLE IF NOT EXISTS `pics` (
  `id` mediumint(8) unsigned NOT NULL auto_increment,
  `bnb_id` mediumint(7) unsigned NOT NULL,
  `img_path` varchar(128) NOT NULL,
  `img_path_gallery` varchar(128) NOT NULL,
  `img_path_thumb_small` varchar(128) NOT NULL,
  `img_path_thumb_large` varchar(128) NOT NULL,
  `img_path_thumb_grid` varchar(128) NOT NULL,
  `title` varchar(80) NOT NULL,
  `order` tinyint(2) NOT NULL,
  `upload_date` datetime NOT NULL,
  `state` enum('LOCAL','S3') NOT NULL default 'LOCAL',
  `is_cover` tinyint(1) unsigned default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `bnb_id_2` (`bnb_id`,`is_cover`),
  KEY `bnb_id` (`bnb_id`),
  KEY `is_cover` (`is_cover`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=30371 ;

is_cover is a field I created to choose only one picture for each bnb_id: it is set to 1 when the picture has been choosen as a cover and to NULL otherwise. I need to LEFT JOIN the table to another one, let's call it bnb; there may be multiple rows in the pics table for each bnb entry (there is a referential integrity bound on bnb_id), but in this case I must extract only one row from the pics table, hence the need for the is_cover coulmn and all the indexing (every other solution I tried produced queries lasting anywhere from 10 to 50 seconds).

Even in this case, though, queries are very slow and each takes anywhere from 5 to 8 seconds of execution on a data pool of about 10000 elements in the bnb table and 30000 in the pics table. Selecting from the table where is_cover = 1 is pretty fast and straightforward, but when put in a larger query everything breaks up.

SELECT subbnb.*, 
            3956 * 2 * ASIN(
                SQRT(
                    POWER(
                        SIN((_LAT - abs(lat)) * pi()/180 / 2), 
                    2) +
                    COS(_LAT * pi()/180 ) * 
                    COS(abs(lat) * pi()/180) * 
                    POWER(
                        SIN((_LNG - abs(lng)) * pi()/180 / 2), 
                    2) 
                )
            ) AS distance,
            prices.price,
            pics.img_path_thumb_grid,
            reviews.count reviewsCount,
            likes.count likesCount
        FROM 
            (SELECT
                bnb.*,
                bnbdata_a.*,
                pos.lat,
                pos.lng

                FROM bnb

                JOIN bnbdata 
                    ON (bnb.id = bnbdata.bnb_id)

                JOIN positions pos
                    ON (bnb.id = pos.bnb_id) 
            ) subbnb

            LEFT JOIN (
                    SELECT *
                    FROM pics 
                    WHERE is_cover = 1
                ) pics
                ON (subbnb.id = pics.bnb_id)


            LEFT JOIN (SELECT price, bnb_id FROM prices WHERE category = "DAILY") prices
                ON (subbnb.id = prices.bnb_id)

            LEFT JOIN (SELECT COUNT(*) AS count, bnb_id FROM reviews GROUP BY bnb_id) reviews
                ON (subbnb.id = reviews.bnb_id)

            LEFT JOIN (SELECT COUNT(*) AS count, bnb_id FROM likes GROUP BY bnb_id) likes
                ON (subbnb.id = likes.bnb_id)
        WHERE
            lng BETWEEN _LNGA AND  _LNGB
            AND lat BETWEEN _LATA AND  _LATB
        HAVING distance < 10
        ORDER BY distance
        LIMIT 0, 25

(the strings you see with _ prepended are actual numerical values)

EXPLAINing the query produces the following result:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   PRIMARY <derived5>  system  NULL    NULL    NULL    NULL    0   const row not found
1   PRIMARY <derived6>  system  NULL    NULL    NULL    NULL    0   const row not found
1   PRIMARY <derived2>  ALL NULL    NULL    NULL    NULL    10522   Using where; Using temporary; Using filesort
1   PRIMARY <derived3>  ALL NULL    NULL    NULL    NULL    7040    
1   PRIMARY <derived4>  ALL NULL    NULL    NULL    NULL    1   
6   DERIVED likes   index   NULL    PRIMARY 6   NULL    1   Using index
5   DERIVED reviews index   NULL    bnb_id  5   NULL    1   Using index
4   DERIVED prices  ALL NULL    NULL    NULL    NULL    1   Using where
3   DERIVED pics    ref is_cover    is_cover    2       11760   Using where
2   DERIVED pos ALL PRIMARY NULL    NULL    NULL    10543   
2   DERIVED bnbdata eq_ref  PRIMARY PRIMARY 3   db.pos.bnb_id   1   
2   DERIVED bnb eq_ref  PRIMARY PRIMARY 3   db.pos.bnb_id   1   

It looks like the is_cover index is being ignored by MySQL (Using where, id 4) but the same happens when I run the small select against the pics table and everything happens very quickly. I can't find the bottleneck in this query, removing the JOIN to pics makes everything way faster but the JOINed subquery itself is quite fast and so is the rest of the big query - even with the math-computing code at the beginning it never goes much past 2 seconds of execution.

Does anybody know where the bottleneck is, and how to get around that?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid OFFSET In LIMIT Clause (query line: 76): OFFSET clauses can be very slow when used with high offsets (e.g. with high page numbers when implementing paging). Instead, use the following \u003ca target\u003d"_blank" href\u003d"http://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/"\u003eseek method\u003c/a\u003e, which provides better and more stable response rates.
  2. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  3. Avoid Selecting Unnecessary Columns (query line: 12): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  4. Avoid Subqueries (query line: 11): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  5. Avoid Subqueries (query line: 57): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  6. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
  7. Prefer Direct Join Over Joined Subquery (query line: 30): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, we recommend to replace subqueries with JOIN clauses.
  8. Prefer Direct Join Over Joined Subquery (query line: 42): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, we recommend to replace subqueries with JOIN clauses.
The optimized query:
SELECT
        subbnb.*,
        3956 * 2 * ASIN(SQRT(POWER(SIN((_LAT - abs(subbnb.lat)) * pi() / 180 / 2),
        2) + COS(_LAT * pi() / 180) * COS(abs(subbnb.lat) * pi() / 180) * POWER(SIN((_LNG - abs(subbnb.lng)) * pi() / 180 / 2),
        2))) AS distance,
        prices.price,
        pics.img_path_thumb_grid,
        reviews.count reviewsCount,
        likes.count likesCount 
    FROM
        (SELECT
            bnb.*,
            bnbdata_a.*,
            pos.lat,
            pos.lng 
        FROM
            bnb 
        JOIN
            bnbdata 
                ON (
                    bnb.id = bnbdata.bnb_id
                ) 
        JOIN
            positions pos 
                ON (
                    bnb.id = pos.bnb_id
                )) subbnb 
    LEFT JOIN
        pics pics 
            ON (
                subbnb.id = pics.bnb_id
            ) 
            AND pics.is_cover = 1 
    LEFT JOIN
        prices prices 
            ON (
                subbnb.id = prices.bnb_id
            ) 
            AND prices.category = prices."DAILY" 
    LEFT JOIN
        (
            SELECT
                COUNT(*) AS count,
                reviews.bnb_id 
            FROM
                reviews 
            GROUP BY
                reviews.bnb_id 
            ORDER BY
                NULL
        ) reviews 
            ON (
                subbnb.id = reviews.bnb_id
            ) 
    LEFT JOIN
        (
            SELECT
                COUNT(*) AS count,
                likes.bnb_id 
            FROM
                likes 
            GROUP BY
                likes.bnb_id 
            ORDER BY
                NULL
        ) likes 
            ON (
                subbnb.id = likes.bnb_id
            ) 
    WHERE
        subbnb.lng BETWEEN _LNGA AND _LNGB 
        AND subbnb.lat BETWEEN _LATA AND _LATB 
    HAVING
        distance < 10 
    ORDER BY
        distance LIMIT 0,
        25

Related Articles



* original question posted on StackOverflow here.