[Solved] Google BigQuery Taking too Long, but small change

EverSQL Database Performance Knowledge Base

Google BigQuery Taking too Long, but small change

One of the annoying features of BigQuery is its unpredictable behavior.

The last such annoyance can be illustrated in this query

SELECT 
COUNT(DISTINCT large_table.event_id ) AS large_table_count,
COUNT(DISTINCT small_table.pk) AS small_table_count
FROM large_table
LEFT JOIN small_table ON
  large_table.ga_user_id = small_table.ga_user_id

Where large_table ~2.5GB, and small_table is ~250mb

When I join on a ga_user_id, the query takes ~60 seconds. More importantly, it finishes and it the size of tables can be increased without a significant increase in time. (see screenshot 1).

When I join on a date string, however, the query's time shoots up to ~140 seconds. (see screenshot #2) More importantly, the performance does not scale linearly with data size. The query just does not finish even after 3,000 seconds.

A couple observations: 1) snapshot_date has less variability than ga_user_id 2) Both ga_user_id and snapshot_date are strings

What am I missing?

enter image description here enter image description here

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
ALTER TABLE `small_table` ADD INDEX `small_table_idx_ga_id` (`ga_user_id`);
The optimized query:
SELECT
        COUNT(DISTINCT large_table.event_id) AS large_table_count,
        COUNT(DISTINCT small_table.pk) AS small_table_count 
    FROM
        large_table 
    LEFT JOIN
        small_table 
            ON large_table.ga_user_id = small_table.ga_user_id

Related Articles



* original question posted on StackOverflow here.