[Solved] Top N Results Ordered by Joined Table Column
Looking to automatically optimize YOUR SQL query? Start for free.

EverSQL Database Performance Knowledge Base

Top N Results Ordered by Joined Table Column

Database type:

I'm trying to optimize a fairly simple query that just gets a list of the top N users ordered by the number of followers they have. I am running into performance issues when I run the query a certain way, and I'm not sure I understand why the other way is better.

I have two tables, user_account, and following, with about 30,000 and 9,000 records, respectively.

I also created this materialized view follower_count_mv to hold cached follower counts for each user:

CREATE MATERIALIZED VIEW follower_count_mv AS
SELECT followed_user_id AS user_id,  COUNT(*)::int AS follower_count FROM following
GROUP BY followed_user_id;

There are indexes on user_account.user_id, follower_count_mv.user_id, and (follower_count_mv.user_id, follower_count_mv.follower_count).

Here is the first, slower query:

  SELECT ua.user_id, f.follower_count                                                                                                     
  FROM user_account ua
  LEFT OUTER JOIN follower_count_mv f ON (ua.user_id = f.user_id)
  ORDER BY f.follower_count DESC NULLS LAST
  LIMIT 3

and its explain analyze:

Limit  (cost=42756.30..42756.31 rows=3 width=54) (actual time=70.219..70.220 rows=3 loops=1)
->  Sort  (cost=42756.30..42831.88 rows=30234 width=54) (actual time=70.217..70.217 rows=3 loops=1)
     Sort Key: f.follower_count
     Sort Method: top-N heapsort  Memory: 25kB
     ->  Merge Left Join  (cost=0.69..42365.53 rows=30234 width=54) (actual time=0.010..65.185 rows=30234 loops=1)
           Merge Cond: ((ua.user_id)::text = (f.followed_user_id)::text)
           ->  Index Scan using user_account_pkey on user_account ua  (cost=0.41..41893.22 rows=30234 width=50) (actual time=0.003..17.876 rows=30234 loops=1)
           ->  Index Only Scan using follower_count_mv_user_id_follower_count_idx on follower_count_mv fr  (cost=0.28..316.52 rows=6416 width=41) (actual time=0.004..0.864 rows=6416 loops=1)
                 Heap Fetches: 0
 Planning time: 0.233 ms
 Execution time: 70.240 ms

Here is the second, faster query:

  SELECT ua.user_id, f.follower_count                                                                                                     
  FROM user_account ua
  LEFT OUTER JOIN follower_count_mv f ON (ua.user_id = f.user_id)
  WHERE ua.user_id IN (SELECT user_id FROM follower_count_mv ORDER BY follower_count DESC limit 3)

and its explain analyze:

Nested Loop Left Join  (cost=400.19..425.80 rows=3 width=54) (actual time=2.689..2.785 rows=3 loops=1)
 ->  Nested Loop  (cost=399.91..424.85 rows=3 width=50) (actual time=2.666..2.722 rows=3 loops=1)
     ->  HashAggregate  (cost=399.49..399.52 rows=3 width=37) (actual time=2.627..2.627 rows=3 loops=1)
           Group Key: ("ANY_subquery".followed_user_id)::text
           ->  Subquery Scan on "ANY_subquery"  (cost=399.45..399.49 rows=3 width=37) (actual time=2.622..2.624 rows=3 loops=1)
                 ->  Limit  (cost=399.45..399.46 rows=3 width=41) (actual time=2.620..2.620 rows=3 loops=1)
                       ->  Sort  (cost=399.45..415.49 rows=6416 width=41) (actual time=2.620..2.620 rows=3 loops=1)
                             Sort Key: follower_count_mv.follower_count
                             Sort Method: top-N heapsort  Memory: 25kB
                             ->  Index Only Scan using follower_count_mv_user_id_follower_count_idx on follower_count_mv  (cost=0.28..316.52 rows=6416 width=41) (actual time=0.008..1.312 rows=6416 loops=1)
                                   Heap Fetches: 0
     ->  Index Scan using user_account_pkey on user_account ua  (cost=0.41..8.43 rows=1 width=50) (actual time=0.030..0.030 rows=1 loops=3)
           Index Cond: ((user_id)::text = ("ANY_subquery".followed_user_id)::text)
 ->  Index Only Scan using follower_count_mv_user_id_follower_count_idx on follower_count_mv fr  (cost=0.28..0.31 rows=1 width=41) (actual time=0.019..0.019 rows=1 loops=3)
     Index Cond: (followed_user_id = (ua.user_id)::text)
     Heap Fetches: 0
 Planning time: 0.603 ms
 Execution time: 2.834 ms

Am I missing something? Is the query planner not smart enough to figure out it should query follower_count_mv for the top 3 results first, and then join on that? Is this correct Postgres behaviour? Is there a better way to optimize this query?

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX follower_mv_idx_user_id_follower_count ON "follower_count_mv" ("user_id","follower_count" desc);
CREATE INDEX follower_mv_idx_follower_count ON "follower_count_mv" ("follower_count" desc);
The optimized query:
SELECT
        ua.user_id,
        ua.follower_count 
    FROM
        user_account ua 
    LEFT OUTER JOIN
        follower_count_mv f 
            ON (
                ua.user_id = f.user_id
            ) 
    ORDER BY
        f.follower_count DESC NULLS LAST LIMIT 3

Related Articles



* original question posted on StackOverflow here.