[Solved] how to speed up query when using count and group by

EverSQL Database Performance Knowledge Base

how to speed up query when using count and group by

Database type:

I have two tables named seller and item. They are connected through a third table (seller_item) using a "n" to "m" foreign key relation.

Now I a try to answer the requirement: "I as a seller want a list of my competitors with a count of items I am selling and they are selling as well". So a list of all sellers with the count of overlapping items in relation to one specific seller. Also I want this to be sorted by count and limited. But the query is using temp table and filesort which is very slow. Explain says:

Using where; Using index; Using temporary; Using filesort

How can I speed this up ?

Here is the query:

SELECT
          COUNT(*) AS itemCount,
          s.sellerName
        FROM
          seller s,
          seller_item si
        WHERE
          si.itemId IN 
           (SELECT itemId FROM seller_item WHERE sellerId = 4711)
        AND
          si.sellerId=s.id
        GROUP BY
          sellerName
        ORDER BY
          itemCount DESC
        LIMIT 50;

the table defs:

CREATE TABLE `seller` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`sellerName` varchar(50) NOT NULL
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`sellerName`),
) ENGINE=InnoDB 

contains about 200.000 rows

--

CREATE TABLE `item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`itemName` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`itemName`),
) ENGINE=InnoDB

contains about 100.000.000 rows

--

CREATE TABLE `seller_item` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`sellerId` bigint(20) unsigned NOT NULL,
`itemId` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sellerId` (`sellerId`,`itemId`),
KEY `item_id` (`itemId`),
CONSTRAINT `fk_1` FOREIGN KEY (`sellerId`) REFERENCES `seller` (`id`)            ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_2` FOREIGN KEY (`itemId`) REFERENCES `item` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB

contains about 170.000.000 rows

Database is Mysql Percona 5.6

Output of EXPLAIN:

+----+-------------+-------------+--------+----------------------+-----    ---------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table       | type   | possible_keys        | key                | key_len | ref                 | rows | Extra                                           |
+----+-------------+-------------+--------+----------------------+--------------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | s           | index  | PRIMARY,unique_index | unique_index | 152     | NULL                |    1 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | si          | ref    | sellerId,item_id     | sellerId     | 8       | tmp.s.id            |    1 | Using index                                  |
|  1 | SIMPLE      | seller_item | eq_ref | sellerId,item_id     |  sellerId     | 16      | const,tmp.si.itemId |    1 | Using where; Using  index                     |
+----+-------------+-------------+--------+----------------------+--------------+---------+---------------------+------+----------------------------------------------+

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Replace In Subquery With Correlated Exists (modified query below): In many cases, an EXISTS subquery with a correlated condition will perform better than a non correlated IN subquery.
Optimal indexes for this query:
ALTER TABLE `seller` ADD INDEX `seller_idx_id_sellername` (`id`,`sellerName`);
ALTER TABLE `seller_item` ADD INDEX `seller_item_idx_sellerid` (`sellerId`);
ALTER TABLE `seller_item` ADD INDEX `seller_item_idx_itemid_sellerid` (`itemId`,`sellerId`);
The optimized query:
SELECT
        COUNT(*) AS itemCount,
        s.sellerName 
    FROM
        seller s,
        seller_item si 
    WHERE
        EXISTS (
            SELECT
                1 
            FROM
                seller_item AS seller_item1 
            WHERE
                (
                    seller_item1.sellerId = 4711
                ) 
                AND (
                    si.itemId = seller_item1.itemId
                )
        ) 
        AND si.sellerId = s.id 
    GROUP BY
        s.sellerName 
    ORDER BY
        itemCount DESC LIMIT 50

Related Articles



* original question posted on StackOverflow here.