[Solved] Alter search query condition for dataset in Drupal 7
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Alter search query condition for dataset in Drupal 7

I am very new to Drupal and attempting to build a module that will allow admins to tag nodes with keywords to boost nodes to the top of the search results.

I have a separate DB table for the keywords and respective node IDs. This table is UNIONed with the search_index table via hook_query_alter...

function mos_search_result_forcer_query_alter(QueryAlterableInterface &$query) {
    if (get_class($query) !== 'PagerDefault') { //<< check this because this function will mod all queries elsewise
        return;
    }

    // create unioned search index result set...
    $index = db_select('search_index', 's');

    $index->addField('s', 'sid');
    $index->addField('s', 'word');
    $index->addField('s', 'score');
    $index->addField('s', 'type');

    $msrfi = db_select('mos_search_result_forcer', 'm');

    $msrfi->addField('m', 'nid', 'sid');
    $msrfi->addField('m', 'keyword', 'word');
    $msrfi->addExpression('(SELECT MAX(score) + m.id / (SELECT MAX(id) FROM {mos_search_result_forcer}) FROM {search_index})', 'score');
    $msrfi->addExpression(':type', 'type', array(':type' => 'node'));

    $index->union($msrfi);

    $tables =& $query->getTables();

    $tables['i']['table'] = $index;

    return $query;
}

Drupal then generates the almost correct query...

SELECT 
    i.type AS type, i.sid AS sid, SUM(CAST('10' AS DECIMAL) * COALESCE(( (12.048628015788 * i.score * t.count)), 0) / CAST('10' AS DECIMAL)) AS calculated_score 
FROM (
    SELECT 
        s.sid AS sid, s.word AS word, s.score AS score, s.type AS type 
    FROM 
        search_index s 
    UNION SELECT 
        m.nid AS sid, m.keyword AS word, (
            SELECT 
                MAX(score) + m.id / (SELECT MAX(id) FROM mos_search_result_forcer) 
            FROM 
                search_index
        ) AS score, 'node' AS type 
    FROM 
        mos_search_result_forcer m
) i 
INNER JOIN node n ON n.nid = i.sid 
INNER JOIN search_total t ON i.word = t.word 
INNER JOIN search_dataset d ON i.sid = d.sid AND i.type = d.type 
WHERE (n.status = '1') 
AND( (i.word = 'turtles') )
AND (i.type = 'node') 

/* this is the problem line... */
AND( (d.data LIKE '% turtles %' ESCAPE '\\') )
/* ...end problem line */

GROUP BY i.type, i.sid 
HAVING (COUNT(*) >= '1') 
ORDER BY calculated_score DESC 
LIMIT 10 OFFSET 0

...I need that "problem line" to read...

AND( (d.data LIKE '% turtles %' ESCAPE '\\') OR (d.sid IN (SELECT nid FROM mos_search_result_forcer)) )

...what hook can I use to add that OR condition?

Thanks, smart people!

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid LIKE Searches With Leading Wildcard (query line: 59): The database will not use an index when using like searches with a leading wildcard (e.g. '% turtles %'). Although it's not always a satisfactory solution, please consider using prefix-match LIKE patterns (e.g. 'TERM%').
  2. Avoid OFFSET In LIMIT Clause (query line: 70): OFFSET clauses can be very slow when used with high offsets (e.g. with high page numbers when implementing paging). Instead, use the following \u003ca target\u003d"_blank" href\u003d"http://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/"\u003eseek method\u003c/a\u003e, which provides better and more stable response rates.
  3. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  4. Prefer Sorting/Grouping By Columns From Same Table (modified query below): The database cannot use an index for the sorting/grouping operation if the ORDER BY/GROUP BY clause contains columns from different tables.
  5. Push Filtering Conditions Into Subqueries (modified query below): Parts of the WHERE clause can pushed from the outer query to a subquery / union clause. Applying those conditions as early as possible will allow the database to scan less data and run the query more efficiently.
  6. Use Numeric Column Types For Numeric Values (query line: 4): Referencing a numeric value (e.g. 10) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done.
  7. Use Numeric Column Types For Numeric Values (query line: 5): Referencing a numeric value (e.g. 10) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done.
  8. Use Numeric Column Types For Numeric Values (query line: 47): Referencing a numeric value (e.g. 1) as a string in a WHERE clause might result in poor performance. Possible impacts of storing numbers as varchars: more space will be used, you won't be able to perform arithmetic operations, the data won't be self-validated, aggregation functions like SUM won't work, the output may sort incorrectly and more. If the column is numeric, remove the quotes from the constant value, to make sure a numeric comparison is done.
  9. Use UNION ALL instead of UNION (query line: 22): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
Optimal indexes for this query:
ALTER TABLE `mos_search_result_forcer` ADD INDEX `mos_result_idx_id` (`id`);
ALTER TABLE `node` ADD INDEX `node_idx_status` (`status`);
ALTER TABLE `search_dataset` ADD INDEX `search_dataset_idx_sid_type` (`sid`,`type`);
ALTER TABLE `search_index` ADD INDEX `search_index_idx_word_type` (`word`,`type`);
ALTER TABLE `search_total` ADD INDEX `search_total_idx_word` (`word`);
The optimized query:
SELECT
        i.type AS type,
        i.sid AS sid,
        SUM(CAST('10' AS DECIMAL) * COALESCE(((12.048628015788 * i.score * t.count)),
        0) / CAST('10' AS DECIMAL)) AS calculated_score 
    FROM
        (SELECT
            s.sid AS sid,
            s.word AS word,
            s.score AS score,
            s.type AS type 
        FROM
            search_index s 
        WHERE
            (
                s.word = 'turtles'
            ) 
            AND (
                s.type = 'node'
            ) 
        UNION
        SELECT
            m.nid AS sid,
            m.keyword AS word,
            (SELECT
                MAX(score) + m.id / (SELECT
                    MAX(mos_search_result_forcer.id) 
                FROM
                    mos_search_result_forcer) 
            FROM
                search_index) AS score,
                'node' AS type 
            FROM
                mos_search_result_forcer m) i 
        INNER JOIN
            node n 
                ON n.nid = i.sid 
        INNER JOIN
            search_total t 
                ON i.word = t.word 
        INNER JOIN
            search_dataset d 
                ON i.sid = d.sid 
                AND i.type = d.type 
        WHERE
            (
                n.status = '1'
            ) 
            AND (
                (
                    1 = 1
                )
            ) 
            AND (
                1 = 1
            ) 
            AND (
                (
                    d.data LIKE '% turtles %' ESCAPE '\\'
                )
            ) 
        GROUP BY
            d.type,
            d.sid 
        HAVING
            (
                COUNT(*) >= '1'
            ) 
        ORDER BY
            calculated_score DESC LIMIT 10 OFFSET 0

Related Articles



* original question posted on StackOverflow here.