Our love story with Deadlocks (PostgreSQL Edition)

By Core Services Team

The Core Services team is responsible for infrastructure and platform development at AUTO1.


TL;DR

Always try to keep your transactions short and write your queries to deal with records in a deterministic order with respect to other transactions.

Full Story

Almost every non-trivial system needs some kind of task (job, event, whatever you call it) scheduling functionality at some point. With more than 250 micro-services, ours was no exception. At some point, it made sense for us to provide such functionality as a service for our developers. With the event-store service, one can simply schedule a job to be executed at a specific time in the future. There are already several ready-to-use tools for this specific purpose (e.g. BigBen, Quartz); however, for several reasons (which are beyond the scope of this post) we decided to implement our own from scratch.

Considering the subject of this post, which is deadlocks, we need a brief introduction to event-store's implementation.

An event on the server side is simply a record in the database. A simplified view of the events table is as follows:

id                                     | event_type         | due_time              | start_process_time | end_process_time
'5c55ea79-4774-49ae-96e2-975af7a41ce6' | 'user-cache-flush' | '2019-01-01 00:00:00' | null               | null
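
For reference, a minimal sketch of the table might look like the following (the column types are assumptions inferred from the values above, not the actual schema):

CREATE TABLE events (
    id                 UUID PRIMARY KEY,
    event_type         TEXT NOT NULL,
    due_time           TIMESTAMP NOT NULL,
    start_process_time TIMESTAMP, -- set when a worker picks the event up
    end_process_time   TIMESTAMP  -- set when processing finishes
);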

The following is a bird's-eye view of what happens on the service:

Thread #1
a. The service periodically fetches a batch of unprocessed events (those that are due and have no process time, neither start nor end) from the database and marks them as in progress by filling the start_process_time column with the current timestamp. Let's call this the fetch-unprocessed query.

b. The service processes the events and marks them as processed (by filling the end_process_time column with the current timestamp). Let's call this the save-processed query.

Thread #2
a. The service periodically checks the table and looks for stale events (those that are due, have a start_process_time, but no end_process_time). Let's call this the fetch-stale query.

b. The service marks them back as unprocessed (by setting start_process_time back to null). Let's call this the release-stale query.

The very first MVP was a simple read-modify-write (anti-)pattern (with the help of Spring, JPA, and Hibernate). It is not hard to guess which issue this implementation is vulnerable to: deadlocks.
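
Boiled down to the database level, the anti-pattern looks roughly like this (a simplified sketch, not the SQL Hibernate actually generated):

BEGIN;

-- Read: fetch due, unprocessed events into the application.
SELECT * FROM events
WHERE end_process_time IS NULL
        AND start_process_time IS NULL
        AND (due_time BETWEEN ? AND ?);

-- Modify + write: mark each fetched event, one UPDATE per row,
-- in whatever order the application happens to iterate.
UPDATE events SET start_process_time = now() WHERE id = ?;
-- ... more single-row UPDATEs ...

COMMIT;

Two such transactions can interleave between the SELECT and the UPDATEs and then update the same rows in different orders, which is exactly what the database soon reported: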

ERROR: deadlock detected
Detail: Process 5234 waits for ShareLock on transaction 3465; blocked by process 467845.
        Process 467845 waits for ShareLock on transaction 96575; blocked by process 5234.
Hint: See server log for query details.
Where: while updating tuple (14954,4) in relation "events"

ERROR: deadlock detected
Detail: Process 10438 waits for ExclusiveLock on tuple (14954,4) of relation 19118 of database 19113; blocked by process 31501.
        Process 31501 waits for ShareLock on transaction 763124271; blocked by process 28450.
        Process 28450 waits for ShareLock on transaction 763124277; blocked by process 28873.
        Process 28873 waits for ExclusiveLock on tuple (14954,4) of relation 19118 of database 19113; blocked by process 10438.
Hint: See server log for query details.
Where: while locking tuple (6984,19) in relation "events"

In the beginning this was not a big deal, because deadlock errors were infrequent and we could live with them. However, very soon, as we gained more clients, the deadlock issue (among other problems, e.g. lost updates) started hurting the quality of the service (e.g. firing events twice).

For the first attempt, we refactored the code to get rid of the read-modify-write anti-pattern:

-- fetch-unprocessed query
UPDATE events event_to_update
SET start_process_time = now()
FROM (
    SELECT id
    FROM events
    WHERE end_process_time IS NULL
            AND start_process_time IS NULL
            AND (due_time BETWEEN ? AND ?)
) matched_event_for_update
WHERE event_to_update.id = matched_event_for_update.id
RETURNING *;

-- release-stale query (including fetch-stale query)
UPDATE events event_to_release
SET start_process_time = NULL
FROM (
    SELECT id
    FROM events
    WHERE end_process_time IS NULL AND start_process_time < ?
) stuck_event
WHERE event_to_release.id = stuck_event.id
RETURNING *;

Obviously this was an improvement: it reduced the number of deadlocks, but it didn't solve the issue completely. The quickest way to eliminate deadlocks would have been to change the isolation level to Serializable. However, this was the last solution we wanted, since it would hurt both the performance and the quality of the service (think about what would happen if one service instance started a transaction, fetched the events to process, but then became stale for some reason).
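
For completeness, the rejected option would have been a one-liner per transaction; a sketch:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- fetch-unprocessed / save-processed as before ...
COMMIT;
-- Under this level, conflicting transactions abort with a serialization
-- failure (SQLSTATE 40001) and must be retried by the application.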

The better way to deal with deadlocks is to analyze transactions and queries to prevent transaction interference by:

  1. making transactions short in terms of time
  2. writing queries to deal with records in a deterministic order (see the sketch below)
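
To see why the second point matters, consider the classic interleaving below (a hypothetical sketch; 'A' and 'B' stand in for two event ids):

-- Session 1:
BEGIN;
UPDATE events SET start_process_time = now() WHERE id = 'A';

-- Session 2:
BEGIN;
UPDATE events SET start_process_time = NULL WHERE id = 'B';

-- Session 1 (blocks, waiting for session 2's lock on B):
UPDATE events SET start_process_time = now() WHERE id = 'B';

-- Session 2 (blocks on A: a deadlock; PostgreSQL aborts one of the two):
UPDATE events SET start_process_time = NULL WHERE id = 'A';

If both sessions touched A and B in the same (say, sorted) order instead, one would simply wait for the other to commit and no cycle could form.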

With the first refactoring, we had already made transactions a bit shorter, but there was still room for improvement. To apply the second best practice above, we refactored the queries as follows (essentially ordering the records and acquiring a row lock):

-- fetch-unprocessed query
UPDATE events event_to_update
SET start_process_time = now()
FROM (
    SELECT id
    FROM events      
    WHERE ... -- as before
    ORDER BY id
    FOR NO KEY UPDATE
) matched_event_for_update
WHERE ... -- as before
RETURNING *;

-- release-stale query (including fetch-stale query)
UPDATE events event_to_release
SET start_process_time = NULL
FROM (
    SELECT id
    FROM events
    WHERE ... -- as before
    ORDER BY id
    FOR NO KEY UPDATE
) stuck_event
WHERE ... -- as before
RETURNING *;
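
A note on the locking clause, based on standard PostgreSQL behavior: FOR NO KEY UPDATE takes a weaker row lock than FOR UPDATE. It still blocks concurrent updates of the same rows, but it does not block operations that merely need the key to stay put (such as foreign-key checks). Since these queries never touch id, the weaker lock is sufficient and keeps contention lower. Combined with ORDER BY id, each subquery locks its rows in a deterministic order.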

However, the deadlocks were still appearing intermittently:

ERROR: deadlock detected
Detail: Process 7534 waits for ShareLock on transaction 93897000; blocked by process 13376.
        Process 13376 waits for ShareLock on transaction 93896994; blocked by process 7534.
Hint: See server log for query details.
Where: while rechecking updated tuple (134131,3) in relation "events"

Of course! We still had one transaction/query dealing with records in a non-deterministic order: JpaRepository.save(Iterable<S> entities). In the code, after fetching the events to be processed, we flag (UPDATE) the processed events as processed, and also save (INSERT) new events spawned by them. Here's the catch: event processing is done in parallel, so the resulting list of events (both processed and newly created ones) is not ordered consistently with respect to other transactions. The third refactoring, then, was to restore the order of the processed events:

import static java.util.Comparator.comparingInt;
import static java.util.stream.Collectors.toMap;
import static java.util.stream.IntStream.range;

import java.util.List;
import java.util.Map;
import java.util.UUID;

import org.apache.commons.collections4.ListUtils; // Apache Commons Collections

/**
 * Processes events.
 * The result list will be ordered with respect to the given list.
 * All new events will be added at the end of the result list.
 * @param events events to be processed
 * @return list of processed and created events
 */
private List<Event> doProcessEvents(List<Event> events) {
    ProcessingResult result = eventProcessor.processInParallel(events);

    List<Event> successfulEvents = result.getSucceeded(); // may contain new events
    List<Event> failedEvents = result.getFailed();
    List<Event> allEvents = ListUtils.union(successfulEvents, failedEvents);

    // Remember the position of each original event by its id.
    Map<UUID, Integer> eventIndices = range(0, events.size()).boxed()
            .collect(toMap(i -> events.get(i).getId(), i -> i));

    // Sort according to the original order; new events are absent from the
    // index map, get Integer.MAX_VALUE, and therefore end up at the end.
    allEvents.sort(comparingInt(event -> eventIndices.getOrDefault(event.getId(), Integer.MAX_VALUE)));

    return allEvents;
}
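
With the list restored to a deterministic order, the subsequent JpaRepository.save(Iterable<S> entities) call issues its UPDATEs in the same relative order as every other transaction touching those rows, and the INSERTs for the newly spawned events come last, where they touch only brand-new rows.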

After this last change, the game was over for deadlocks: "Goodbye, Deadlock!"
