2 minute read

Distributed Locking with Redisson

Apart from locking in a relational database, a distributed locking system might be needed for certain environments.
In cases where multiple databases or APIs need to be included in the same transaction, a distributed locking system is required.

Redisson is one of the popular distributed locks.
It uses an algorithm called Redlock, which is implemented based on a clustering system.
Essentially, if there are N number of nodes, a lock is considered to be acquired if it is gained from (N / 2 + 1) nodes.

Martin Kleppmann states that Redlock is not safe to use for ensuring the correctness in a distributed application, mainly for two reasons:

  1. More than one client can be considered to have gained a lock due to the possibility of clock jumps.
    The scenario is as follows: for a total of 5 nodes, Client A gains the lock of nodes a, b, and c. Client B gains the lock of nodes c, d, and e. Client B could gain the lock of node c due to a clock jump on node c causing lock expiration.
  2. If a lock held by Client A expires while Client A still thinks it is holding the lock, the lock can be acquired by Client B.
    This would cause a data race, as Client B writes before Client A. This could happen due to a network issue or a GC pause of Client A.

Let’s analyze issue number 1 first. According to Martin Kleppmann, this occurs when an administrator manually adjusts the clock or when time synchronization occur with an NTP server.

Antirez suggests that the first issue can be mitigated by avoiding manual adjustments, and the second one by gradually adjusting the time over a longer period, rather than attempting to sync a significant time gap all at once or just use monotonic time API.

As long as clock jumping is not an issue, a node that lost a lock due to crashes and restarts can be resolved by the restart delay of Redlock. (If there is no delay, the node that lost the lock kicks in and can give the lock to another client.)

For number 2, a fencing token would help in avoiding race conditions, but Redlock does not generate a fencing token.

Unless the storage implements the validation of the fencing token, the validation logic should be implemented in the application or a compensatory action (e.g., rollback) should be taken.

This can be achieved since every lock is signed with a random string, and Redisson validates it when releasing the lock. It throws errors in Java like the one below.

lockExpireError

There are some options to handle this issue.

  • As mentioned earlier, compensatory action can be implemented to handle this issue.
  • Rollback, error handling, or retrying can be options.
  • A compensation event can be issued if event sourcing is implemented.

As long as clock jumps are well managed and handled properly for lock expiration, I believe Redlock could be a great option for distributed locks.

This article was created by Crocoder7. It is not to be copied without permission.

References

Leave a comment