[Solved] Select all rows for the first N distinct child table rows

EverSQL Database Performance Knowledge Base

Select all rows for the first N distinct child table rows

It seems like this should be a common problem with a simple solution, but I haven't found it. I would like to compute child_order, which is the order of appearance of distinct child table rows as is shown below in the following data:

child_order   PK1  PK2       ACCESS     ACCESS_ID
1             99   Al        NULL       NULL
2             55   Charles   Accounts   1
2             55   Charles   Desktop    2
2             55   Charles   Printer    3
2             55   Charles   Servers    4
2             55   Charles   VMs        5
3             66   Charles   Desktop    2
3             66   Charles   VMs        5
4             22   Chris     Desktop    2
4             22   Chris     Printer    3
4             22   Chris     Servers    4
5             89   Evan      Desktop    2

Retrieved by a query like:

SELECT sub1.*
FROM (
    SELECT ??? as child_order, sub2.*
    FROM (
        SELECT ct.PK1, ct.PK2, pt1.ACCESS, pt1.ACCESS_ID
        FROM child_table ct
        LEFT JOIN some_linktable lt ON lt.child_id = ct.id
        LEFT JOIN parent_table1 pt1 ON lt.parent_id = pt1.id
        WHERE ct.PK2 IN ('Charles', 'Evan', 'Al', 'Chris')
        ORDER BY ct.PK2, pt1.ACCESS -- Order must be preserved
    ) sub2
) sub1
WHERE child_order < 10 AND (other_conditions)

I can use subqueries, aggregates, analytics, etc. but not really CTEs/"WITH" statements or temporary tables because of the complexity of generating SQL for them dynamically.

Specifically, I am generating pagination SQL (for several DBMSs) for search results from a query joining several tables. I am trying to figure out how to simply show the top N rows, not counting repeats due to a join (e.g. Chris counts as only one row. Access shows "Desktop, Printer, Servers").

I've tried DENSE_RANK() OVER (ORDER BY PK1, PK2), but of course I get ranking in PK1 PK2 order, which is useless for the WHERE clause. Al, for example, would get a value higher than 1.

I've tried DENSE_RANK() OVER (ORDER BY PK2, ACCESS), but it enumerates only the search terms, not the child table rows.

I've tried DENSE_RANK() OVER (PARTITION BY PK2, ACCESS ORDER BY (SELECT NULL)) (to get DENSE_RANK to use the row order it is given, which is how I want to rank values) but only "1" is returned.

I'll omit my other "try random stuff"-phase attempts.

I would like to avoid having a SELECT DISTINCT PK1, PK2 WHERE (search) ORDER BY (sortorder) subquery because there may be zero or very many primary key fields so dynamic SQL generation would be tricky and, additionally, I suspect the performance would suck with all the WHERE sub3.field1 = sub2.field1 AND sub3.field2 = sub2.field2... checks.

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 8): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
  2. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `child_table` ADD INDEX `child_table_idx_pk2` (`PK2`);
ALTER TABLE `parent_table1` ADD INDEX `parent_table1_idx_id` (`id`);
ALTER TABLE `some_linktable` ADD INDEX `some_linktable_idx_child_id` (`child_id`);
The optimized query:
SELECT
        sub1.* 
    FROM
        (SELECT
            ? ? ? AS child_order,
            sub2.* 
        FROM
            (SELECT
                ct.PK1,
                ct.PK2,
                pt1.ACCESS,
                pt1.ACCESS_ID 
            FROM
                child_table ct 
            LEFT JOIN
                some_linktable lt 
                    ON lt.child_id = ct.id 
            LEFT JOIN
                parent_table1 pt1 
                    ON lt.parent_id = pt1.id 
            WHERE
                ct.PK2 IN (
                    'Charles', 'Evan', 'Al', 'Chris'
                ) 
            ORDER BY
                ct.PK2,
                pt1.ACCESS) sub2) sub1 
        WHERE
            child_order < 10 
            AND (
                sub1.other_conditions
            )

Related Articles



* original question posted on StackOverflow here.