[Solved] BigQuery resources exceeded during query execution - optimization

How to optimize this SQL query?

For the query above, the following recommendations will be helpful as part of the SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically rewritten query you can copy and execute in your database.

The optimization process and recommendations:
  1. Avoid Correlated Subqueries (query line: 35): A correlated subquery is a subquery that references a column (here: country) from a table in the outer query, so it is logically re-evaluated for each outer row. Correlated subqueries can usually be rewritten with a join clause, which is generally the better practice. Query optimizers tend to plan joins more efficiently than correlated subqueries, so rephrasing the query with a join gives the optimizer more room to choose an efficient execution plan.
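  2. Explicitly ORDER BY After GROUP BY (modified query below): Some databases (notably MySQL before version 8.0) sort all 'GROUP BY col1, col2, ...' queries as if you had specified 'ORDER BY col1, col2, ...' as well. On those databases, if a query includes a GROUP BY clause but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'. BigQuery does not guarantee any result order unless the query has an explicit ORDER BY, so the 'ORDER BY NULL' clauses are not expected to reduce resource usage there.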
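
Since the original query is not reproduced on this page, recommendation 1 can only be illustrated with a minimal sketch on hypothetical data. The inline 'sessions' rows, the 'session_minutes' column, and the CTE names below are assumptions made for illustration, not part of the original query. The sketch computes a per-country average twice, once with a correlated subquery and once with an aggregate-then-join rewrite:

```sql
-- Hypothetical inline data; none of these names come from the original query.
WITH sessions AS (
  SELECT 'US' AS country, 12.0 AS session_minutes UNION ALL
  SELECT 'US', 4.0 UNION ALL
  SELECT 'DE', 8.0
),
-- Correlated form: the inner query references s.country from the outer query.
correlated AS (
  SELECT
    s.country,
    s.session_minutes,
    (SELECT AVG(i.session_minutes)
     FROM sessions AS i
     WHERE i.country = s.country) AS country_avg
  FROM sessions AS s
),
-- Join form, step 1: aggregate once per country.
per_country AS (
  SELECT country, AVG(session_minutes) AS country_avg
  FROM sessions
  GROUP BY country
),
-- Join form, step 2: attach the aggregate to each row with a join.
joined AS (
  SELECT s.country, s.session_minutes, p.country_avg
  FROM sessions AS s
  JOIN per_country AS p
    ON s.country = p.country
)
SELECT country, session_minutes, country_avg
FROM joined
ORDER BY country, session_minutes;
```

Replacing 'joined' with 'correlated' in the final SELECT should return the same rows, which is a quick way to check that a rewrite of this kind preserves the result.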

The optimized query:

SELECT
    agg.event_date,
    agg.country,
    COUNT(*) AS sessions,
    AVG(length) AS average_session_length
FROM
    (SELECT
        session.country,
        session.event_date,
        global_session_id,
        (MAX(session.event_timestamp) - MIN(session.event_timestamp)) / (60 * 1000 * 1000) AS length
    FROM
        (SELECT
            user_pseudo_id,
            event_timestamp,
            country,
            event_date,
            SUM(is_new_session) OVER (
                ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
            SUM(is_new_session) OVER (
                PARTITION BY user_pseudo_id
                ORDER BY event_timestamp) AS user_session_id
        FROM
            (SELECT
                *,
                CASE
                    WHEN last.event_timestamp - last_event >= (30 * 60 * 1000 * 1000)
                        OR last_event IS NULL THEN 1
                    ELSE 0
                END AS is_new_session
            FROM
                (SELECT
                    `xxx.events*`.user_pseudo_id,
                    `xxx.events*`.event_timestamp,
                    geo.country,
                    `xxx.events*`.event_date,
                    LAG(`xxx.events*`.event_timestamp, 1) OVER (
                        PARTITION BY user_pseudo_id
                        ORDER BY `xxx.events*`.event_timestamp) AS last_event
                FROM
                    `xxx.events*`) last) final) session
    GROUP BY
        global_session_id,
        session.country,
        session.event_date
    ORDER BY
        NULL) agg
WHERE
    length >= (10 / 60)
GROUP BY
    agg.country,
    agg.event_date
ORDER BY
    NULL
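
One commonly cited cause of "resources exceeded" errors in BigQuery is a window function that orders the whole input without a PARTITION BY clause, because all rows then have to be processed as a single partition. The 'global_session_id' expression in the query above has that shape. The following is a minimal sketch, not the tool's output, of how the same session counts might be computed using only partitioned windows. It relies on one observation: a user's first event always has 'is_new_session' = 1, so the pair (user_pseudo_id, user_session_id) identifies the same sessions as 'global_session_id'. The inline 'events' rows and the CTE names are assumptions for illustration; in practice 'events' would be replaced by a SELECT over `xxx.events*` with geo.country AS country.

```sql
-- Hypothetical inline events; timestamps are in microseconds, as in the query above.
WITH events AS (
  SELECT 'u1' AS user_pseudo_id, 0 AS event_timestamp,
         'US' AS country, '20200101' AS event_date UNION ALL
  SELECT 'u1', 60 * 1000 * 1000, 'US', '20200101' UNION ALL    -- 1 minute later
  SELECT 'u1', 3600 * 1000 * 1000, 'US', '20200101' UNION ALL  -- 1 hour later
  SELECT 'u2', 0, 'DE', '20200101'
),
-- Previous event time per user (partitioned window).
lagged AS (
  SELECT
    *,
    LAG(event_timestamp) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp) AS last_event
  FROM events
),
-- A session starts at a user's first event or after a gap of 30+ minutes.
flagged AS (
  SELECT
    *,
    IF(last_event IS NULL
       OR event_timestamp - last_event >= 30 * 60 * 1000 * 1000, 1, 0) AS is_new_session
  FROM lagged
),
-- Running count of session starts per user (partitioned window, no global ordering).
numbered AS (
  SELECT
    *,
    SUM(is_new_session) OVER (
      PARTITION BY user_pseudo_id
      ORDER BY event_timestamp) AS user_session_id
  FROM flagged
),
-- One row per session, keyed by user and per-user session number.
per_session AS (
  SELECT
    user_pseudo_id,
    user_session_id,
    country,
    event_date,
    (MAX(event_timestamp) - MIN(event_timestamp)) / (60 * 1000 * 1000) AS session_minutes
  FROM numbered
  GROUP BY user_pseudo_id, user_session_id, country, event_date
)
SELECT
  event_date,
  country,
  COUNT(*) AS sessions,
  AVG(session_minutes) AS average_session_length
FROM per_session
WHERE session_minutes >= 10 / 60  -- keep sessions of at least 10 seconds
GROUP BY event_date, country;
```

Whether this resolves the error for a particular dataset depends on data volume and skew, so the job's execution details are worth checking after any such change.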

* Original question posted on Stack Overflow.