[Solved] postgres query optimisation to avoid hash right join

How to optimize this SQL query?

In case you have your own slow SQL query, you can optimize it automatically here.

For the query above, the following recommendations will be helpful as part of the SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Selecting Unnecessary Columns (query line: 50): Avoid selecting all columns with the '*' wildcard, unless you intend to use them all. Selecting redundant columns may result in unnecessary performance degradation.
  2. Prefer Inner Join Over Left Join (modified query below): We identified that one or more left joined entities (e.g. `nonanon_customer`) are used in the 'where' clause, in a way that allows to replace it with an optimized inner join. Inner joins can be fully optimized by the database, while Left joins apply limitations on the database's optimizer.
The optimized query:
WITH memberships AS (SELECT
        dwh.fact_membership.customer_sk,
        dwh.fact_membership.membership_sk,
        dwh.fact_membership.membership_state,
        dwh.fact_membership.membership_b2b_type,
        dwh.fact_membership.membership_sml_type,
        dwh.fact_membership.membership_start_date,
        dwh.fact_membership.membership_end_date,
        dwh.fact_membership.membership_pause_from,
        dwh.fact_membership.membership_pause_to,
        dwh.fact_membership.covid_pause_start_date,
        dwh.fact_membership.covid_pause_end_date,
        dwh.fact_membership.city_sk AS membership_city_region_sk,
        dwh.fact_membership.sport_persona_current,
        dwh.fact_membership.membership_cancellation_reason,
        dwh.fact_membership.membership_sequence_nr_reverse,
        dwh.fact_membership.company_sk,
        dwh.fact_membership.company_name 
    FROM
        dwh.fact_membership 
    WHERE
        dwh.fact_membership.membership_is_urban_sports IS TRUE), request_cancellation AS (SELECT
        staging.request_cancellation.membership_sk,
        staging.request_cancellation.requested_cancellation_last_date 
    FROM
        staging.request_cancellation), blacklisted_emails AS (SELECT
        dwh_userdata.blacklist_emails.customer_sk,
        dwh_userdata.blacklist_emails.email,
        'blacklisted' AS blacklisted 
    FROM
        dwh_userdata.blacklist_emails), nonanon_customer AS (SELECT
        dwh_userdata.customer.id,
        dwh_userdata.customer.first_name,
        dwh_userdata.customer.last_name,
        dwh_userdata.customer.email 
    FROM
        dwh_userdata.customer), nonanon_customer_address_prep AS (SELECT
        dwh_userdata.customer_address.customer_id,
        dwh_userdata.customer_address.city,
        dwh_userdata.customer_address.state,
        dwh_userdata.customer_address.country,
        dwh_userdata.customer_address.zip,
        row_number() OVER (PARTITION 
    BY
        customer_id 
    ORDER BY
        dwh_userdata.customer_address.created_at DESC) AS row_number 
    FROM
        dwh_userdata.customer_address), nonanon_customer_address AS (SELECT
        * 
    FROM
        nonanon_customer_address_prep 
    WHERE
        nonanon_customer_address_prep.row_number = 1), favorite_sport_category_prep_1 AS (SELECT
        dwh.report_venue_visitors.membership_sk,
        dwh.report_venue_visitors.service_top_category_name,
        count(DISTINCT dwh.report_venue_visitors.booking_sk) AS cnt_booking 
    FROM
        dwh.report_venue_visitors 
    WHERE
        dwh.report_venue_visitors.booking_is_valid 
    GROUP BY
        1,
        2), favorite_sport_category_prep_2 AS (SELECT
        favorite_sport_category_prep_1.membership_sk,
        favorite_sport_category_prep_1.service_top_category_name,
        favorite_sport_category_prep_1.cnt_booking,
        row_number() OVER (PARTITION 
    BY
        membership_sk 
    ORDER BY
        favorite_sport_category_prep_1.cnt_booking DESC,
        favorite_sport_category_prep_1.service_top_category_name) AS row_number 
    FROM
        favorite_sport_category_prep_1), favorite_sport_category AS (SELECT
        favorite_sport_category_prep_2.membership_sk,
        favorite_sport_category_prep_2.service_top_category_name AS favourite_sport_category,
        favorite_sport_category_prep_2.cnt_booking 
    FROM
        favorite_sport_category_prep_2 
    WHERE
        favorite_sport_category_prep_2.row_number = 1), free_trial AS (SELECT
        DISTINCT dwh.report_memberships.membership_sk,
        dwh.report_memberships.customer_sk,
        dwh.report_memberships.trial_status AS free_trial_status,
        dwh.report_memberships.trial AS free_trial_length,
        dwh.report_memberships.trial_start_date AS free_trial_start,
        dwh.report_memberships.trial_end_date AS free_trial_end 
    FROM
        dwh.report_memberships 
    WHERE
        dwh.report_memberships.trial_status IS NOT NULL 
        AND dwh.report_memberships.trial_start_date >= '2020-06-23') SELECT
        c.customer_sk AS named_user,
        CASE 
            WHEN c.gender IN ('M',
            'F') THEN c.gender 
            ELSE NULL END AS gender,
nc.first_name,
nc.last_name,
customer_language,
anss.state AS newsletter_status,
dl.city_name AS membership_city_region,
dl.country_code AS membership_country_code,
dl.country_name AS membership_country_name,
dl.admin1 AS membership_administrative_state,
m.membership_sk,
m.membership_state,
m.membership_b2b_type,
m.company_sk,
m.company_name,
m.membership_sml_type,
CASE 
    WHEN m.membership_start_date IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_start_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS membership_start_date,
CASE 
    WHEN m.membership_end_date IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_end_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS membership_end_date,
ft.free_trial_status,
ft.free_trial_length,
CASE 
    WHEN ft.free_trial_start IS NOT NULL THEN CONCAT(TO_CHAR(ft.free_trial_start,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS free_trial_start,
CASE 
    WHEN ft.free_trial_end IS NOT NULL THEN CONCAT(TO_CHAR(ft.free_trial_end,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS free_trial_end,
CASE 
    WHEN m.membership_pause_from IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_pause_from,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS membership_pause_from,
CASE 
    WHEN m.membership_pause_to IS NOT NULL THEN CONCAT(TO_CHAR(m.membership_pause_to,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS membership_pause_to,
CASE 
    WHEN m.covid_pause_start_date IS NOT NULL THEN CONCAT(TO_CHAR(m.covid_pause_start_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS covid_pause_start_date,
CASE 
    WHEN m.covid_pause_end_date IS NOT NULL THEN CONCAT(TO_CHAR(m.covid_pause_end_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS covid_pause_end_date,
CASE 
    WHEN rc.requested_cancellation_last_date IS NOT NULL THEN CONCAT(TO_CHAR(rc.requested_cancellation_last_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS requested_cancellation_last_date,
memberships.membership_cancellation_reason,
be.blacklisted AS blacklist_email,
fsc.favourite_sport_category AS fav_sports_category,
m.sport_persona_current,
ambd.membership_months_active,
ambd.membership_months_total,
ambd.is_gm1_positive,
ambd.cnt_bookings_total,
ambd.cnt_bookings_last_30_days_total,
ambd.cnt_bookings_last_30_days_onsite,
ambd.cnt_bookings_onsite,
ambd.cnt_bookings_online,
ambd.cnt_bookings_last_30_days_online,
CASE 
    WHEN ambd.latest_booking_date IS NOT NULL THEN CONCAT(TO_CHAR(ambd.latest_booking_date,
    'YYYY-MM-DD'),
    'T00:00:00') 
    ELSE NULL END AS latest_booking_date,
ambd.avg_bookings_active_month,
ambd.last_checkin_type,
ambd.fav_sports_category_onsite,
ambd.fav_sports_category_online,
ambd.fav_studio_last_30_days,
ambd.fav_studio_group_website 
FROM
dwh.dim_customer c 
INNER JOIN
nonanon_customer nc 
    ON nc.id = c.customer_sk 
LEFT JOIN
nonanon_customer_address nca 
    ON nca.customer_id = c.customer_sk 
LEFT JOIN
memberships m 
    ON c.customer_sk = m.customer_sk 
    AND memberships.membership_sequence_nr_reverse = 1 
LEFT JOIN
request_cancellation rc 
    ON m.membership_sk = rc.membership_sk 
LEFT JOIN
dwh.dim_location dl 
    ON m.membership_city_region_sk = dl.city_sk 
LEFT JOIN
blacklisted_emails be 
    ON be.email = nc.email 
LEFT JOIN
favorite_sport_category fsc 
    ON fsc.membership_sk = m.membership_sk 
LEFT JOIN
staging.airship_newsletter_subscription_status anss 
    ON anss.customer_id = c.customer_sk 
LEFT JOIN
free_trial ft 
    ON ft.customer_sk = m.customer_sk 
LEFT JOIN
staging.airship_membership_booking_details ambd 
    ON ambd.membership_sk = m.membership_sk 
    AND memberships.membership_sequence_nr_reverse = 1 
WHERE
be.blacklisted IS NULL 
AND nc.email NOT LIKE '%delete%' 
AND nc.email IS NOT NULL 
AND (
    (
        m.membership_sk IS NULL 
        AND anss.state = 'subscribed'
    ) 
    OR m.membership_state IS NOT NULL
)

Related Articles



* original question posted on StackOverflow here.