[Solved] Postgres deadlock with read_committed isolation

EverSQL Database Performance Knowledge Base


Database type: PostgreSQL

We have noticed a rare occurrence of a deadlock on a PostgreSQL 9.2 server in the following situation:

T1 starts the batch operation:

UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING' 
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1 
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)

When T1 commits after a few hundred milliseconds or so (BB has many millions of rows), multiple threads begin new transactions (one transaction per thread) that read items from BB, do some processing, and update them in batches of 50 or so with the following queries:

For select:

SELECT * FROM (SELECT *, RANK() OVER (ORDER BY user_id) AS rno FROM BB
               WHERE status = 'PROCESSING' AND bulk_id = 1) ranked WHERE rno = $1

And Update:

UPDATE BB set datetime=$1, status='DONE', message_id=$2 WHERE bulk_id=1 AND user_id=$3

(user_id, bulk_id have a UNIQUE constraint).
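For reference, the UNIQUE constraint mentioned above would be declared roughly as follows (the constraint name is illustrative, not taken from the original schema):

ALTER TABLE BB ADD CONSTRAINT bb_user_id_bulk_id_key UNIQUE (user_id, bulk_id);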

Due to a problem external to this situation, another transaction T2 executes the same query as T1 almost immediately after T1 has committed (the initial batch operation where items are marked as 'PROCESSING'):

UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0 WHERE bb.status ='PENDING' 
AND bb.bulk_id = 1 AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1 
AND chunk_id IS NULL AND status ='PENDING' LIMIT 2000)

However, although these items are marked as 'PROCESSING', this query deadlocks with some of the updates (which, as mentioned, are done in batches) from the worker threads. To my understanding this should not happen with the READ COMMITTED isolation level (the default), which we use. I am sure that T1 has committed, because the worker threads execute only after it has done so.
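For context, deadlocks in PostgreSQL come from row-level lock ordering rather than from the isolation level: even under READ COMMITTED, an UPDATE takes a row lock on every row it modifies, and two transactions that lock overlapping rows in a different order can deadlock. A minimal two-session sketch (the user_id values are hypothetical):

-- Session A:
BEGIN;
UPDATE BB SET status = 'DONE' WHERE bulk_id = 1 AND user_id = 10;  -- locks row 10

-- Session B:
BEGIN;
UPDATE BB SET status = 'DONE' WHERE bulk_id = 1 AND user_id = 20;  -- locks row 20

-- Session A:
UPDATE BB SET status = 'DONE' WHERE bulk_id = 1 AND user_id = 20;  -- blocks, waiting for B

-- Session B:
UPDATE BB SET status = 'DONE' WHERE bulk_id = 1 AND user_id = 10;  -- blocks on A; PostgreSQL aborts one transaction with a deadlock error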

Edit: One thing I should clear up is that T2 starts after T1, but before T1 commits. However, due to a write-exclusive tuple lock that we acquire with a SELECT ... FOR UPDATE on the same row (a row that is not affected by any of the above queries), T2 waits for T1 to commit before it runs the batch update query.
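A minimal sketch of that coordination step as described above; the question does not name the table holding the locked row, so a separate bulk_control table is assumed here purely for illustration:

BEGIN;
-- T2 blocks here until T1 commits, because T1 already holds this row lock:
SELECT * FROM bulk_control WHERE bulk_id = 1 FOR UPDATE;

-- Only after T1 commits does T2 run the same batch update:
UPDATE BB bb SET status = 'PROCESSING', chunk_id = 0
WHERE bb.status = 'PENDING' AND bb.bulk_id = 1
  AND bb.user_id IN (SELECT user_id FROM BB WHERE bulk_id = 1
                     AND chunk_id IS NULL AND status = 'PENDING' LIMIT 2000);
COMMIT;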

How to optimize this SQL query?

The following recommendations will help you in your SQL tuning process.
You'll find 3 sections below:

  1. Description of the steps you can take to speed up the query.
  2. The optimal indexes for this query, which you can copy and create in your database.
  3. An automatically re-written query you can copy and execute in your database.
The optimization process and recommendations:
  1. Create Optimal Indexes (modified query below): The recommended indexes are an integral part of this optimization effort and should be created before testing the execution duration of the optimized query.
Optimal indexes for this query:
CREATE INDEX bb_idx_status_bulk_id_user_id ON "BB" ("status","bulk_id","user_id");
CREATE INDEX bb_idx_bulk_id_chunk_id_status ON "BB" ("bulk_id","chunk_id","status");
The optimized query:
SELECT
        bb.status 
    FROM
        BB bb 
    WHERE
        bb.status = 'PENDING' 
        AND bb.bulk_id = 1 
        AND bb.user_id IN (
            SELECT
                BB.user_id 
            FROM
                BB AS BB 
            WHERE
                BB.bulk_id = 1 
                AND BB.chunk_id IS NULL 
                AND BB.status = 'PENDING' LIMIT 2000
        )
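As a quick check (not part of the automatic recommendation above), the rewritten query can be run through EXPLAIN to confirm that the planner actually uses the new indexes; the resulting plan will depend on your data distribution and statistics:

EXPLAIN (ANALYZE, BUFFERS)
SELECT bb.status
FROM BB bb
WHERE bb.status = 'PENDING'
  AND bb.bulk_id = 1
  AND bb.user_id IN (SELECT user_id FROM BB
                     WHERE bulk_id = 1 AND chunk_id IS NULL
                       AND status = 'PENDING' LIMIT 2000);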

* The original question was posted on StackOverflow.