Types (from ChatGPT):
Atomic operations: use $set, $inc —> push the update into the database, refactoring business logic to avoid reading data and then writing it back (read-modify-write races).
Message queue pattern: instead of updating the DB directly, queue update commands and apply them sequentially
Distributed locking:
Optimistic locking: store a version marker on each record (date, timestamp, …), check that it hasn't been updated by another thread, and only then perform the update.
Saga pattern: manages distributed transactions across multiple services; instead of a single transaction spanning multiple services, each service executes its own local transaction and emits an event to trigger subsequent actions in other services
Event sourcing pattern: store and process events that represent changes to the DB. Each service publishes events when it performs an operation on the DB; other services subscribe to those events to update their own data. By relying on the log of events, we can ensure updates are applied in the correct order
CQRS (Command Query Responsibility Segregation): separate read and write operations
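The optimistic-locking idea above can be sketched with a version-checked compare-and-set. This is a minimal in-memory illustration (the `Store` and `StaleVersionError` names are hypothetical, not from any library); real databases express the same check as a conditional update on a version column.

```python
import threading

class StaleVersionError(Exception):
    pass

class Store:
    def __init__(self):
        self._data = {}          # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key):
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        # Succeeds only if nobody updated the record since we read it.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise StaleVersionError(key)
            self._data[key] = (current_version + 1, value)

store = Store()
ver, _ = store.read("balance")
store.compare_and_set("balance", ver, 100)      # version 0 -> 1, accepted
try:
    store.compare_and_set("balance", ver, 200)  # stale: version moved on
except StaleVersionError:
    print("retry: record changed under us")
```

On conflict the caller re-reads and retries, rather than blocking on a lock.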
💡 Outline
Fault tolerance comes from the distributed nature of the system and replication across storage nodes
References:
Slide presentation: https://www.canva.com/design/DAFwjbmpxmk/rsjC7nztswGSepOcUOJjjw/edit?utm_content=DAFwjbmpxmk&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Redlock: https://redis.io/docs/manual/patterns/distributed-locks/#liveness-arguments
A distributed lock plays the same role as normal locking when multiple processes try to access one file (shared memory), except that the processes run on different machines
When multiple machines try to access a shared resource, we can summarize the requirements into three properties (safety, liveness, fault tolerance):
💡 "Lock instance" is a general term and could be anything (ZooKeeper, AWS S3); Redis is just one specific implementation
Safety: refers to tracking the state of the lock correctly and guaranteeing that only one process holds it at a time. The following situations could compromise it:
Network or node failure: if the lock service (Redis) fails or becomes unreachable, the lock cannot be released, potentially leaving multiple processes holding the lock, or leaving the lock unreleased after the process is done
Redis crash or restart: the lock is lost and another process can acquire it again —> leading to concurrent lock holders
Liveness:
Network congestion and latency: can dramatically increase the time to acquire and release the lock; this could lead to livelock, where processes constantly wait for the lock without making progress
Performance bottleneck: a single Redis instance may struggle to handle the load, resulting in increased latency and reduced liveness.
💡 A single instance has limitations: it is a single point of failure —> add replicas, but see why a failover-based implementation isn't enough
The obvious solution is to add replicas to our cluster, but this violates Rule 1, the safety property:
When the master goes down before syncing the lock to a replica, that replica is elected master without knowing about the lock —> another process could then also acquire the lock
All Redis nodes hold keys for approximately the right length of time (expiry timing can drift between nodes), assuming network delay is small compared to the expiry duration.
Safety violation
A lock held by one client must not be deleted by another client: if a client holds the lock longer than the lock's validity time, the lock is automatically released and acquired by another client; the old client, when it finishes processing, could then remove the other client's lock
—> The simplest solution is to use a unique value per client, e.g. a microsecond-precision timestamp concatenated with the lock key, and check it before deleting
A bare DEL is unsafe: a client whose lock has already expired could delete another client's lock
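The "DEL is unsafe" point can be sketched as check-then-delete with a per-client token. This uses a plain dict standing in for Redis (single-threaded, so the get+del pair is atomic here); real Redis needs a Lua script to make the pair atomic, as the Redlock docs describe.

```python
import uuid

locks = {}   # key -> owner token, a stand-in for a Redis instance

def acquire(key):
    token = str(uuid.uuid4())   # unique per client, like SET key token NX PX ttl
    if key not in locks:
        locks[key] = token
        return token
    return None

def release(key, token):
    # Delete only if we still own the lock; a bare DEL could remove
    # a lock another client acquired after ours expired.
    if locks.get(key) == token:
        del locks[key]
        return True
    return False

t1 = acquire("resource")
# Simulate expiry: our lock vanished and another client grabbed it.
locks["resource"] = "other-client-token"
assert release("resource", t1) is False   # safe: we don't delete theirs
```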
Problem:
Assume a client has acquired the lock on 3 out of 5 instances, but one of those 3 goes down and restarts with its lock lost; now 3 instances can grant the lock to other clients
Solution
With a standalone Redis instance: enable AOF persistence so the lock data survives a restart
With multiple Redis nodes: ensure data is replicated correctly between replicas, so that after leader election the lock data remains
Another simple but bad solution is to delay the restarted instance for longer than the TTL of any currently active lock, so it cannot rejoin a lock round it has forgotten. But delayed restart costs availability: if a majority of instances go down, the system becomes unavailable, meaning no resource at all can be locked.
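The 3-of-5 majority rule behind these scenarios can be sketched as follows. This is a toy, assuming each "instance" is just a dict (no networking, no per-node TTL clocks); `try_acquire` is a hypothetical name, not the Redlock API.

```python
import time
import uuid

def try_acquire(instances, key, ttl_ms):
    token = str(uuid.uuid4())
    start = time.monotonic()
    granted = 0
    for inst in instances:          # each dict stands in for one Redis node
        if key not in inst:         # like SET key token NX
            inst[key] = token
            granted += 1
    elapsed_ms = (time.monotonic() - start) * 1000
    quorum = len(instances) // 2 + 1
    if granted >= quorum and elapsed_ms < ttl_ms:
        return token                # held, with validity ttl_ms - elapsed_ms
    for inst in instances:          # no quorum: undo the grants we did get
        if inst.get(key) == token:
            del inst[key]
    return None

nodes = [{}, {}, {"job": "someone-else"}, {}, {}]   # 1 of 5 already locked
token = try_acquire(nodes, "job", ttl_ms=10_000)
print(token is not None)   # True: 4 of 5 granted, quorum is 3
```

A restarted node that lost its grant is exactly one of the dicts going empty again, which is how two clients can both reach quorum.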
Fault tolerance:
When the service holding the lock suddenly goes down, other machines trying to get the lock are blocked and fail
Or the process holding the lock runs too long, possibly stuck somewhere
Use a fencing token, a number that indicates the version of a record in storage. Bear in mind not to use a timestamp, because it is not reliable: in a fault-tolerant distributed system a machine can go down, restart, and come back with a different clock
Redis does not support fencing-token generation across several nodes, because a counter kept on multiple nodes will drift out of sync; we would need a strict consensus algorithm
—> Set up a periodic heartbeat to the service holding the lock; if there is no response, we can assume the service is dead and release the lock
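The heartbeat/TTL-extension idea can be sketched with a watchdog thread that keeps renewing a lease while the owner is alive, and simply stops when the owner dies, letting the lock expire. The in-memory `expiry` table is a stand-in; a real system would renew against the lock service (e.g. a PEXPIRE-style call) instead.

```python
import threading
import time

expiry = {"job": time.monotonic() + 1.0}   # lock initially expires in 1 s

def watchdog(key, interval, stop):
    # Renew the lease every `interval` seconds until told to stop;
    # if this process dies, renewals stop and the lock just expires.
    while not stop.is_set():
        expiry[key] = time.monotonic() + 1.0
        stop.wait(interval)

stop = threading.Event()
t = threading.Thread(target=watchdog, args=("job", 0.2, stop), daemon=True)
t.start()
time.sleep(1.5)                            # longer than the original 1 s TTL
still_held = expiry["job"] > time.monotonic()
stop.set()
t.join()
print(still_held)   # True: heartbeats kept extending the TTL
```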
💡 We can use heartbeats, at the cost of some complexity in our lock service, to extend the TTL of a lock.
The importance of a consensus algorithm: suppose we have 3 locking nodes; when a machine wants to grab a lock, it needs the grant from a majority of them (at least 2 of 3, matching the 3-of-5 example above) for the acquisition to succeed
💡 A consensus algorithm ensures a consistent view of the system despite potential failures, network partitions, or delays across distributed nodes. It can be used to coordinate and establish agreement among multiple nodes on a particular value or decision.
—> The important point in building a distributed system is that the lock is replicated in a fault-tolerant manner
Designing the distributed lock
Have an odd number of nodes, one of which is the leader. Every write is sent to the leader, and the leader forwards the write to all followers; when the leader receives OK responses from the followers, it commits the write locally and tells the other nodes the write is OK
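The leader write path above can be sketched as follows. A toy, assuming followers are plain lists that always acknowledge; it waits for all followers as these notes describe, whereas real Raft commits once a majority has acknowledged.

```python
class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []               # committed entries

    def write(self, entry):
        acks = 0
        for f in self.followers:
            f.append(entry)         # forward the write to the follower
            acks += 1               # follower acknowledged
        if acks == len(self.followers):
            self.log.append(entry)  # commit locally, then announce OK
            return True
        return False

followers = [[], []]
leader = Leader(followers)
leader.write("lock:job -> client-42")
```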
Leader election: a heartbeat service detects a failed leader and elects a new one
Consider scaling
Imagine that when the lock is released, a thousand processes try to grab it on a single Raft instance; that is a bad situation (a thundering herd)
A solution is to queue waiting processes in a linked list, ordered by arrival time, and hand the lock to the next waiter
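The queued hand-off can be sketched with a FIFO: instead of every waiter racing for the lock on release, the holder passes it directly to the next queued waiter (the `QueuedLock` name is hypothetical).

```python
from collections import deque

class QueuedLock:
    def __init__(self):
        self.holder = None
        self.waiters = deque()          # linked-list-backed FIFO

    def acquire(self, client):
        if self.holder is None:
            self.holder = client
            return True
        self.waiters.append(client)     # wait in arrival order
        return False

    def release(self, client):
        assert self.holder == client
        # Hand the lock to the next waiter instead of a free-for-all.
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder

lock = QueuedLock()
lock.acquire("A")            # A holds the lock
lock.acquire("B")            # B queues
lock.acquire("C")            # C queues behind B
print(lock.release("A"))     # prints B: handed directly to the next waiter
```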
Correctness: taking a lock prevents multiple processes from modifying the same piece of data, which would lead to corruption, data loss & permanent inconsistency
Distributed lock
Requirements:
Storage must actively check token validity and reject writes carrying any token that has gone backward.
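The storage-side check can be sketched as follows: the store remembers the highest fencing token seen per key and rejects any write whose token has gone backward (the `FencedStore` name is hypothetical).

```python
class FencedStore:
    def __init__(self):
        self.data = {}
        self.max_token = {}   # highest fencing token seen per key

    def write(self, key, value, token):
        if token <= self.max_token.get(key, -1):
            return False      # stale lock holder: reject the write
        self.max_token[key] = token
        self.data[key] = value
        return True

store = FencedStore()
store.write("file", "v1", token=33)         # old holder writes first
store.write("file", "v2", token=34)         # new holder, higher token
print(store.write("file", "v3", token=33))  # prints False: fenced out
```

A holder that paused (GC, network) past its lock's validity comes back with an old token, so its late write cannot clobber the new holder's data.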
Redis cannot safely generate fencing tokens: simply keeping a counter on one node is not sufficient, since that node could fail, and with multiple nodes the counters would go out of sync. If we want to use Redlock this way, we need to implement a consensus algorithm like Raft or Paxos
Raft machine