[Solved] How can I quickly search posts/listings using multiple tags?

EverSQL Database Performance Knowledge Base

My directory site currently uses the "toxi" table structure for tagging listings.

My posts table has around 5 million records and the map table has around 15 million records.

I use terms to store all kinds of information, such as author, publisher, subject, and medium (audio, video, etc.). A post can have multiple terms for each taxonomy (multiple authors, multiple subjects, etc.).

Searching for posts based on a single term_id takes about four seconds to return results, which is pretty lousy, but it takes 40 seconds to return results using multiple terms.

I need a more efficient solution, but I can't figure out if it's my queries that are inefficient or my table structure.

== SINGLE TERM SEARCH QUERY ==

SELECT * FROM posts
LEFT JOIN post_taxonomy_term_map ON (posts.ID = post_taxonomy_term_map.object_id)
WHERE post_taxonomy_term_map.term_id=$term1

== MULTIPLE TERMS SEARCH QUERY ==

SELECT p.*
FROM post_taxonomy_term_map m, posts p
WHERE m.term_id IN ($term1, $term2, $term3)
AND p.ID = m.object_id
GROUP BY p.ID
HAVING COUNT( p.ID )=3

== TABLES & COLUMNS ==

posts { ID, post_title, etc... }
    PRIMARY ID

taxonomy_terms { term_id, term_label, term_slug, etc. }
    PRIMARY term_id

post_taxonomy_term_map { map_id, object_id, taxonomy, term_id }
    PRIMARY map_id
    INDEX object_id
    INDEX term_id
    INDEX taxonomy

NOTE: post_taxonomy_term_map.object_id relates to the posts.ID value

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.

The optimization process and recommendations:

  1. Avoid Selecting Unnecessary Columns (query line: 2): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  3. Prefer Inner Join Over Left Join (modified query below): One or more left-joined entities (e.g. `post_taxonomy_term_map`) are used in the 'where' clause in a way that allows the left join to be replaced with an inner join. Inner joins can be fully optimized by the database, while left joins limit the choices available to the database's optimizer.
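
As a rough illustration of points 1 and 3, the single-term search could name only the columns the page needs and use an inner join. The column list here (ID, post_title) is an assumption taken from the posts definition in the question; substitute whatever the listing page actually displays.

```sql
-- Sketch only: the column list is assumed, and $term1 is the
-- application-side placeholder from the question, not literal SQL.
SELECT p.ID, p.post_title
FROM posts AS p
INNER JOIN post_taxonomy_term_map AS m
        ON m.object_id = p.ID
WHERE m.term_id = $term1;
```

Because the WHERE clause filters on a column of the map table, rows without a match are discarded either way, so the inner join should return the same rows as the original left join.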

Optimal indexes for this query:

ALTER TABLE `post_taxonomy_term_map` ADD INDEX `post_term_idx_term_id` (`term_id`);
ALTER TABLE `posts` ADD INDEX `posts_idx_id` (`ID`);
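
Going by the table definitions in the question, posts.ID is already the primary key and term_id already has its own index, so the two statements above may add little on that schema. One alternative that may be worth testing is a composite index on the map table that covers both the filter column and the join column. The index name below is hypothetical, and whether it helps depends on the data and the MySQL version.

```sql
-- Hypothetical index name. Covers the filter column (term_id)
-- and the join column (object_id) in a single index.
ALTER TABLE `post_taxonomy_term_map`
    ADD INDEX `map_idx_term_object` (`term_id`, `object_id`);

-- Check whether the optimizer picks it up.
-- $term1 is the application-side placeholder from the question.
EXPLAIN
SELECT object_id
FROM post_taxonomy_term_map
WHERE term_id = $term1;
```

If the Extra column of the EXPLAIN output shows "Using index", the lookup is being served from the index alone without reading the map table's rows.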

The optimized query:

SELECT
    *
FROM
    posts
INNER JOIN
    post_taxonomy_term_map
        ON (
            posts.ID = post_taxonomy_term_map.object_id
        )
WHERE
    post_taxonomy_term_map.term_id = $term1
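
The rewritten query above only covers the single-term search. For the multi-term search, one possible approach (a sketch, not part of the automated recommendations) is to resolve the matching IDs from the map table first and join to posts only for that smaller set. The column list is again an assumption, and the query is likely to benefit most when an index on (term_id, object_id) exists.

```sql
-- Sketch: find the IDs that carry all three terms using only the map table,
-- then fetch the post rows for that (usually much smaller) set.
SELECT p.ID, p.post_title
FROM (
    SELECT m.object_id
    FROM post_taxonomy_term_map AS m
    WHERE m.term_id IN ($term1, $term2, $term3)
    GROUP BY m.object_id
    -- DISTINCT guards against a post being mapped to the same term twice
    HAVING COUNT(DISTINCT m.term_id) = 3
) AS matched
INNER JOIN posts AS p
        ON p.ID = matched.object_id;
```

Grouping on m.object_id instead of p.ID keeps the aggregation inside the map table, so the 5 million row posts table is only touched for IDs that already matched every term.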

* original question posted on StackOverflow here.