[Solved] Performance of a Postgres query

How to optimize this SQL query?


For the query above, the following recommendations will be helpful as part of the SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Avoid Subqueries (query line: 11): Subqueries are often not optimized well by the optimizer. Consider instead joining a newly created temporary table that holds the subquery's data and has the relevant index on it.
  2. Avoid Subqueries (query line: 30): Subqueries are often not optimized well by the optimizer. Consider instead joining a newly created temporary table that holds the subquery's data and has the relevant index on it.
  3. Avoid Subqueries (query line: 50): Subqueries are often not optimized well by the optimizer. Consider instead joining a newly created temporary table that holds the subquery's data and has the relevant index on it.
  4. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  5. Prefer Inner Join Over Left Join (modified query below): We identified that one or more left-joined entities (e.g. `c`) are used in the WHERE clause in a way that allows the left join to be replaced with an inner join. Inner joins can be fully optimized by the database, while left joins limit what the database's optimizer can do.
  6. Push Filtering Conditions Into Subqueries (modified query below): Parts of the WHERE clause can be pushed from the outer query into a subquery / union clause. Applying those conditions as early as possible allows the database to scan less data and run the query more efficiently.
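The temporary-table approach from recommendations 1 to 3 could look roughly like the sketch below for the "adult men per household" subquery. The names `tmp_men_per_household` and `tmp_men_per_household_idx` are hypothetical and not part of the original query, and whether this beats the inline subquery depends on the data, so treat it as something to measure rather than a guaranteed win.

```sql
-- Sketch only: materialize one aggregating subquery into a temporary table.
-- tmp_men_per_household and its index name are hypothetical.
CREATE TEMPORARY TABLE tmp_men_per_household AS
SELECT
    person.household_id,
    COUNT(*) AS number_of_men
FROM person
WHERE person.user_id = 1
  AND person.gender = 'MALE'
  AND person.age >= 18
GROUP BY person.household_id;

-- Index the join key so the later join can use it.
CREATE INDEX tmp_men_per_household_idx
    ON tmp_men_per_household (household_id);

-- Collect planner statistics explicitly; autovacuum does not
-- process temporary tables.
ANALYZE tmp_men_per_household;

-- The temporary table then takes the place of the subquery in a join.
SELECT p.household_id, m.number_of_men
FROM person p
INNER JOIN tmp_men_per_household m
        ON m.household_id = p.household_id
WHERE p.user_id = 1
LIMIT 10;
```

The temporary table is visible only to the current session and is dropped when that session ends.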
Optimal indexes for this query:
CREATE INDEX person_idx_user_id_gender_age ON "person" ("user_id","gender","age");
CREATE INDEX person_idx_user_id_gender_household ON "person" ("user_id","gender","household_id");
CREATE INDEX person_idx_household_id_age ON "person" ("household_id","age");
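One way to check whether the planner actually uses a new index is to run the relevant part of the query under `EXPLAIN`. The sketch below does this for the first subquery of the optimized query; it assumes the `person` table and the indexes above exist. Note that `EXPLAIN ANALYZE` really executes the statement, so the reported times are measured, not estimated.

```sql
-- Look for an Index Scan, Index Only Scan or Bitmap Index Scan
-- on person_idx_user_id_gender_age in the resulting plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    person.household_id,
    MIN(person.age) AS age_of_youngest_woman
FROM person
WHERE person.user_id = 1
  AND person.gender = 'FEMALE'
  AND person.age >= 18
GROUP BY person.household_id;
```

If the plan still shows a sequential scan, the planner may have judged the filter not selective enough for the index to pay off.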
The optimized query:
SELECT
        a.household_id household_id,
        age_of_youngest_woman,
        b.number_of_children,
        c.number_of_men,
        fertility_cond_prob_number_of_children.cond_prob cond_prob_number_of_children,
        fertility_cond_age.cond_prob cond_prob_age,
        fertility_cond_prob_number_of_children.cond_prob * fertility_cond_age.cond_prob total_cond_prob,
        random() <= (874. / 1703.) is_newborn_male 
    FROM
        (SELECT
            person.household_id,
            MIN(person.age) age_of_youngest_woman 
        FROM
            person 
        WHERE
            (
                person.user_id = 1
            ) 
            AND (
                person.gender = 'FEMALE'
            ) 
            AND (
                person.age >= 18
            ) 
        GROUP BY
            person.household_id) a 
    LEFT JOIN
        (
            SELECT
                person.household_id,
                COUNT(*) number_of_children 
            FROM
                person 
            WHERE
                (
                    person.user_id = 1
                ) 
                AND (
                    person.gender = 'CHILD'
                ) 
            GROUP BY
                person.household_id
        ) b 
            ON (
                a.household_id = b.household_id
            ) 
    INNER JOIN
        (
            SELECT
                person.household_id,
                COUNT(*) number_of_men 
            FROM
                person 
            WHERE
                (
                    person.user_id = 1
                ) 
                AND (
                    person.gender = 'MALE'
                ) 
                AND (
                    person.age >= 18
                ) 
            GROUP BY
                person.household_id 
            HAVING
                (
                    -- Postgres does not accept the output alias number_of_men here
                    COUNT(*) > 0
                )
        ) c 
            ON (
                a.household_id = c.household_id
            ) 
    LEFT JOIN
        fertility_cond_prob_number_of_children 
            ON (
                fertility_cond_prob_number_of_children.number_of_children = b.number_of_children
            ) 
    LEFT JOIN
        fertility_cond_age 
            ON (
                fertility_cond_age.age = age_of_youngest_woman
            ) 
    WHERE
        (
            1 = 1
        ) 
        AND (
            random() <= (
                fertility_cond_prob_number_of_children.cond_prob * fertility_cond_age.cond_prob
            )
        )
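The switch from LEFT JOIN to INNER JOIN for `c` rests on a general property: a WHERE condition on a column of the left-joined table rejects the NULL-extended rows, so both join types return the same result. The toy example below illustrates this with made-up data; it assumes the original filter was of the form `c.number_of_men > 0`, which is a guess based on the HAVING clause above.

```sql
-- Hypothetical data: households 1 to 3, only 1 and 2 have adult men.
WITH a (household_id) AS (VALUES (1), (2), (3)),
     c (household_id, number_of_men) AS (VALUES (1, 2), (2, 1))
SELECT a.household_id, c.number_of_men
FROM a
LEFT JOIN c ON a.household_id = c.household_id
WHERE c.number_of_men > 0;  -- NULL > 0 is not true, so household 3 is dropped

-- The inner join returns the same two households without the extra filter.
WITH a (household_id) AS (VALUES (1), (2), (3)),
     c (household_id, number_of_men) AS (VALUES (1, 2), (2, 1))
SELECT a.household_id, c.number_of_men
FROM a
INNER JOIN c ON a.household_id = c.household_id;
```

Both statements return households 1 and 2 only.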

* Original question posted on StackOverflow.