[Solved] Checking for missing data in SQL

EverSQL Database Performance Knowledge Base

Checking for missing data in SQL

Database type:

I am having a hard time with this not knowing if there's a solution for this.

I am trying to detect missing hourly data. Sample:

Table HRLY_DATA:

NAME          HOUR
Me            0
Me            1
Me            2
Me            3
Me            6
Me            7
You           0
You           1
You           2
You           3
You           4
You           5
You           6
You           7

As you can see, [HOUR] data of Me is missing 4 and 5. I want a query that will output:

NAME          HOUR
Me            4
Me            5

For now, here's what I've got:

SELECT d.NAME, HR FROM HRs c
LEFT OUTER JOIN
   (
   SELECT distinct a.NAME
   FROM HRLY_DATA a
   INNER JOIN
      (
      SELECT NAME FROM
         (
         SELECT NAME, count(*) as CNT
         FROM
            (
            SELECT DISTINCT NAME, HOUR
            FROM HRLY_DATA
            ) as i
         GROUP BY NAME
         ) as ii
      WHERE CNT < 8
      ) as b
   ON a.NAME=b.NAME
   ) as d
ON c.HR=d.HOUR
WHERE d.HOUR IS NULL

HRs

HR
0
1
2
3
4
5
6
7

I am getting this output:

NAME          HR
NULL          4
NULL          5

Data for HOUR will range only from 0 - 7..

BTW, I am using SQL SERVER/ MSSQL for this.

:(

Sorry if I can't explain my problem clearly. :(

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Correlated Subqueries (query line: 8): A correlated subquery is a subquery that contains a reference (column: NAME) to a table that also appears in the outer query. Usually correlated queries can be rewritten with a join clause, which is the best practice. The database optimizer handles joins much better than correlated subqueries. Therefore, rephrasing the query with a join will allow the optimizer to use the most efficient execution plan for the query.
  2. Avoid Subqueries (query line: 21): We advise against using subqueries as they are not optimized well by the optimizer. Therefore, it's recommended to join a newly created temporary table that holds the data, which also includes the relevant search index.
The optimized query:
SELECT
        d.NAME,
        c.HR 
    FROM
        HRs c 
    LEFT OUTER JOIN
        (
            SELECT
                DISTINCT a.NAME 
            FROM
                HRLY_DATA a 
            INNER JOIN
                (
                    SELECT
                        ii.NAME 
                    FROM
                        (SELECT
                            i.NAME,
                            count(*) AS CNT 
                        FROM
                            (SELECT
                                DISTINCT HRLY_DATA.NAME,
                                HOUR 
                            FROM
                                HRLY_DATA) AS i 
                        GROUP BY
                            i.NAME) AS ii 
                        WHERE
                            CNT < 8) AS b 
                            ON a.NAME = b.NAME) AS d 
                                ON c.HR = d.HOUR 
                        WHERE
                            d.HOUR IS NULL

Related Articles



* original question posted on StackOverflow here.