[Solved] PostgreSQL optimization: Select road edges where start and end vertex is in some set

EverSQL Database Performance Knowledge Base

PostgreSQL optimization: Select road edges where start and end vertex is in some set

Database type:

The pgRouting pgr_drivingDistance function returns only ids of road network vertices, but without ids of road edges in specific drive time.
To find this road edges I created the query:

SELECT all_roads.id, the_geom
    FROM
        (
            SELECT e.id, e.the_geom
            FROM tmp_ddist dd,
                tmp_edge e
            WHERE
                e.target = dd.vertex 
        UNION
            SELECT e.id, e.the_geom
            FROM tmp_ddist dd,
                tmp_edge e
            WHERE
                e.source = dd.vertex 
        ) all_roads
    GROUP BY all_roads.id, the_geom
    HAVING COUNT(all_roads.id) = 2

Basically it finds road edges which have source and target vertices in set generated by pgr_drivingDistance function and returns only those edges which have both source and target in this set.

Is there any way to optimize this query?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
  2. Push Filtering Conditions Into Subqueries (modified query below): Parts of the WHERE clause can pushed from the outer query to a subquery / union clause. Applying those conditions as early as possible will allow the database to scan less data and run the query more efficiently.
  3. Use UNION ALL instead of UNION (query line: 18): Always use UNION ALL unless you need to eliminate duplicate records. By using UNION ALL, you'll avoid the expensive distinct operation the database applies when using a UNION clause.
Optimal indexes for this query:
CREATE INDEX tmp_ddist_idx_vertex ON "tmp_ddist" ("vertex");
CREATE INDEX tmp_edge_idx_target ON "tmp_edge" ("target");
CREATE INDEX tmp_edge_idx_source ON "tmp_edge" ("source");
The optimized query:
SELECT
        all_roads.id,
        all_roads.the_geom 
    FROM
        (SELECT
            e.id,
            e.the_geom 
        FROM
            tmp_ddist dd,
            tmp_edge e 
        WHERE
            e.target = dd.vertex 
        HAVING
            (
                COUNT(e.id) = 2
            ) 
        UNION
        SELECT
            e.id,
            e.the_geom 
        FROM
            tmp_ddist dd,
            tmp_edge e 
        WHERE
            e.source = dd.vertex 
        HAVING
            (
                COUNT(e.id) = 2
            )
    ) all_roads 
GROUP BY
    all_roads.id,
    all_roads.the_geom 
HAVING
    COUNT(all_roads.id) = 2

Related Articles



* original question posted on StackOverflow here.