[Solved] Better way to do distinct rollup in redshift?
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Better way to do distinct rollup in redshift?

What is the best way to write a query that does a rollup of distinct counts over various discrete time ranges on redshift?

For example, if you have some phone numbers collected for leads in various organizations and want to find out how many distinct phone numbers were created weekly, monthly, quarterly, etc... what is the best way to do it?

This is the best I could come up with:

SELECT
  organization,
  sum(weekly) as weekly,
  sum(monthly) as monthly,
  sum(quarterly) as quarterly,
  sum(yearly) as yearly
FROM (
    SELECT 
      organization,
      COUNT(DISTINCT phoneNumber) as weekly,
      null as monthly,
      null as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '7 days'
    GROUP BY organization

      UNION ALL

    SELECT
      organization,
      null as weekly,
      COUNT(DISTINCT phoneNumber) as monthly,
      null as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '1 month'
    GROUP BY organization

        UNION ALL

    SELECT
      organization,
      null as weekly,
      null as monthly,
      COUNT(DISTINCT phoneNumber) as quarterly,
      null as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '3 months'
    GROUP BY organization

        UNION ALL

    SELECT
      organization,
      null as weekly,
      null as monthly,
      null as quarterly,
      COUNT(DISTINCT phoneNumber) as yearly
    FROM Lead
    WHERE createdAt >= current_date - interval '1 year'
    GROUP BY organization
) GROUP BY organization

Any way to make the query faster / easier to understand?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Explicitly ORDER BY After GROUP BY (modified query below): By default, the database sorts all 'GROUP BY col1, col2, ...' queries as if you specified 'ORDER BY col1, col2, ...' in the query as well. If a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'.
The optimized query:
SELECT
        organization,
        sum(weekly) AS weekly,
        sum(monthly) AS monthly,
        sum(quarterly) AS quarterly,
        sum(yearly) AS yearly 
    FROM
        (SELECT
            Lead.organization,
            COUNT(DISTINCT Lead.phoneNumber) AS weekly,
            NULL AS monthly,
            NULL AS quarterly,
            NULL AS yearly 
        FROM
            Lead 
        WHERE
            Lead.createdAt >= current_date - INTERVAL '7 days' 
        GROUP BY
            Lead.organization 
        UNION
        ALL SELECT
            Lead.organization,
            NULL AS weekly,
            COUNT(DISTINCT Lead.phoneNumber) AS monthly,
            NULL AS quarterly,
            NULL AS yearly 
        FROM
            Lead 
        WHERE
            Lead.createdAt >= current_date - INTERVAL '1 month' 
        GROUP BY
            Lead.organization 
        UNION
        ALL SELECT
            Lead.organization,
            NULL AS weekly,
            NULL AS monthly,
            COUNT(DISTINCT Lead.phoneNumber) AS quarterly,
            NULL AS yearly 
        FROM
            Lead 
        WHERE
            Lead.createdAt >= current_date - INTERVAL '3 months' 
        GROUP BY
            Lead.organization 
        UNION
        ALL SELECT
            Lead.organization,
            NULL AS weekly,
            NULL AS monthly,
            NULL AS quarterly,
            COUNT(DISTINCT Lead.phoneNumber) AS yearly 
        FROM
            Lead 
        WHERE
            Lead.createdAt >= current_date - INTERVAL '1 year' 
        GROUP BY
            Lead.organization
    ) 
GROUP BY
    organization 
ORDER BY
    NULL

Related Articles



* original question posted on StackOverflow here.