
DISTRIBUTED SYSTEMS

Principles and Paradigms


Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN

Chapter 7
Consistency And Replication

Reasons for Replication

Data are replicated to increase the reliability of a system.

Replication for performance:
Scaling in numbers
Scaling in geographical area

Caveats:
Consistency issues
Gain in performance versus the cost of the increased bandwidth needed to maintain the replicas

Data-centric Consistency Models

Figure 7-1. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Continuous Consistency (0)

Conit = CONsistency UniT

Continuous consistency ranges:

Specify consistency practically:
Order = number of uncommitted updates
Staleness = time since last update
Numeric = number of unseen updates

Specify consistency semantically:
Numeric = difference between local and committed values
Absolute (by |v - v'|)
Relative (by |v - v'| / v)
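The two numeric bounds can be made concrete with a small sketch, using the slide's notation: v is one replica's value, v' the other's.

```python
# Numeric deviation measures for a conit value.
def absolute_dev(v, v_prime):
    return abs(v - v_prime)          # |v - v'|

def relative_dev(v, v_prime):
    return abs(v - v_prime) / v      # |v - v'| / v

# With v = 10 committed and v' = 12 tentative:
assert absolute_dev(10, 12) == 2
assert relative_dev(10, 12) == 0.2
```

A replica would trigger update propagation as soon as either deviation exceeds its agreed bound.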

Continuous Consistency (2)

Figure 7-3. Choosing the appropriate granularity for a conit.
(a) Two updates lead to update propagation.

Continuous Consistency (1)

Figure 7-2. An example of keeping track of consistency deviations [adapted from (Yu and Vahdat, 2002)]. The figure shows, for each of two replicas, its committed and tentative values of x and y, together with its order and numerical deviations.

Order = number of uncommitted local updates; Numerical = (number of unseen remote updates, max deviation between remote tentative and local committed values); gray entries are committed.

Continuous Consistency (3)

Figure 7-3. Choosing the appropriate granularity for a conit.
(b) No update propagation is needed (yet).

Tradeoffs: too large a conit causes high traffic and false sharing; too small a conit leaves data too stale and makes the overhead too great.

Access Consistency
Forms of access consistency

Atomic Consistency
Sequential Consistency
Causal Consistency
Processor Consistency

Notation:
Wi(x)a = Process i wrote object x with value a
Ri(x)a = Process i read object x with value a

Sequential Consistency (1)

Figure 7-4. Behavior of two processes operating on the same data item. The horizontal axis is time. One process performs W(x)a; the other first reads NIL (the write has not yet propagated) and later reads a: R(x)NIL, then R(x)a. A sequential order satisfying the observations exists.

Sequential Consistency (2)


A data store is sequentially consistent when the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
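The definition above can be checked mechanically for tiny traces. The sketch below (my own construction, exponential in trace length) searches for one sequential order of all operations, consistent with every process's program order, under which each read returns the most recent write.

```python
from itertools import permutations

# Operations are ("W", item, value) or ("R", item, value);
# histories maps a process name to its program-ordered operation list.
def sequentially_consistent(histories):
    ops = [(p, i) for p, h in histories.items() for i in range(len(h))]
    for order in permutations(ops):
        pos = {op: k for k, op in enumerate(order)}
        # Keep each process's operations in program order.
        if any(pos[(p, i)] > pos[(p, i + 1)]
               for p, h in histories.items() for i in range(len(h) - 1)):
            continue
        mem, legal = {}, True
        for p, i in order:
            kind, x, v = histories[p][i]
            if kind == "W":
                mem[x] = v
            elif mem.get(x) != v:   # read must return the latest write
                legal = False
                break
        if legal:
            return True
    return False

# In the spirit of Fig. 7-5: two readers that disagree on the order of
# W(x)a and W(x)b cannot be explained by any single sequential order.
bad = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "a"), ("R", "x", "b")],
}
# If both readers agree (b then a), the store can be sequentially consistent.
good = dict(bad, P4=[("R", "x", "b"), ("R", "x", "a")])
```

Here `sequentially_consistent(bad)` is False and `sequentially_consistent(good)` is True.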

Sequential Consistency (3)

Forces W(x)b first

Forces W(x)a first

Figure 7-5. (a) A sequentially consistent data store. (b) A data store that is not sequentially consistent.

Sequential Consistency (4)

Figure 7-6. Three concurrently-executing processes.

Sequential Consistency (5)

Initial values are all 0. Each row below is one valid interleaving (the order in which the processes' operations execute), followed by the outputs in the order they are printed and the resulting signature:

P1 P1 P2 P2 P3 P3   prints 00 10 11   signature 001011
P1 P2 P2 P1 P3 P3   prints 10 10 11   signature 101011
P2 P3 P3 P2 P1 P1   prints 01 01 11   signature 110101
P2 P1 P3 P2 P1 P3   prints 11 11 11   signature 111111

Figure 7-7. Four valid execution sequences for the processes of Fig. 7-6. The vertical axis is time.
The signature is the output of the processes concatenated in process ID order: P1, P2, P3.
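All legal signatures can be enumerated by brute force. The sketch below assumes the usual programs of Fig. 7-6 (not reproduced in these slides): each process assigns 1 to its own variable, then prints the other two.

```python
from itertools import permutations

# ("w", var) assigns 1 to var; ("p", (a, b)) prints vars a and b.
PROGS = {
    "P1": [("w", "x"), ("p", ("y", "z"))],
    "P2": [("w", "y"), ("p", ("x", "z"))],
    "P3": [("w", "z"), ("p", ("x", "y"))],
}

def signatures():
    """Collect the signature of every interleaving respecting program order."""
    ops = [(pid, i) for pid in PROGS for i in range(2)]
    seen = set()
    for perm in permutations(ops):
        pos = {op: k for k, op in enumerate(perm)}
        # Program order: each process's first operation precedes its second.
        if any(pos[(pid, 0)] > pos[(pid, 1)] for pid in PROGS):
            continue
        mem = {"x": 0, "y": 0, "z": 0}
        out = {}
        for pid, i in perm:
            kind, arg = PROGS[pid][i]
            if kind == "w":
                mem[arg] = 1
            else:
                out[pid] = "".join(str(mem[v]) for v in arg)
        # Signature: outputs concatenated in process ID order.
        seen.add(out["P1"] + out["P2"] + out["P3"])
    return seen

sigs = signatures()
```

Signatures such as 000000 never occur: each print follows its own process's assignment, so the three prints cannot all precede the other assignments.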

Causal Consistency (1)


For a data store to be considered causally consistent, it is necessary that the store obeys the following condition: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.
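"Potentially causally related" can be decided with vector clocks. The sketch below (my own construction, not from the slides) tags each write with a clock tuple indexed by process.

```python
# Decide whether two writes are potentially causally related or concurrent.
def happens_before(a, b):
    """True if clock a precedes clock b, i.e. a potentially caused b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

# P1 writes W(x)a at (1,0,0); P2 reads it and then writes W(x)b at
# (1,1,0): causally related, so every process must see a before b.
assert happens_before((1, 0, 0), (1, 1, 0))
# A write at (0,0,1) by P3 that saw neither is concurrent with W(x)a,
# so different machines may see them in different orders.
assert concurrent((1, 0, 0), (0, 0, 1))
```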

Causal Consistency (2)

Causal relation element

Figure 7-8. This sequence is allowed with a causally-consistent store, but not with a sequentially consistent store.
The induced digraph of causal dependencies is acyclic.

Causal Consistency (3)


W(x)a causally precedes W(x)b, yet a process sees W(x)b before W(x)a.

Figure 7-9. (a) A violation of a causally-consistent store.

Causal Consistency (4)


W(x)a is not causally related to W(x)b.

Figure 7-9. (b) A correct sequence of events in a causally-consistent store.

Processor Consistency (1)

For a data store to be considered processor consistent, it is necessary that the store obeys the following condition: writes that are made by the same process must be seen by all processes in the same order; writes by different processes may be seen in a different order on different machines.

Processor Consistency (2)


W(x)a causally precedes W(x)b, and a process sees W(x)b before W(x)a. There is only one write per process, however, so trivially all processes see each process's writes in the same order as they were made on the writing process.

Figure 7-9. (a) A violation of a causally-consistent store, but one that is processor consistent.

Processor Consistency (3)

W(x)a before W(x)b on the same process:

P1: W(x)a        W(x)b
P2:       R(x)b        R(x)a

P2 sees W(x)b before W(x)a: a violation of processor consistency.

Grouping Operations (1)

Necessary criteria for correct synchronization:

1. An acquire access of a synchronization variable is not allowed to perform until all updates to the guarded shared data have been performed with respect to that process.

2. Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.

3. After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.

Grouping Operations (1.5)

Varieties of synchronization variable consistency:

Weak consistency: a sync(S) operation is not performed until all previous accesses to the variables it protects have completed with respect to the caller; future operations on the protected set are delayed (with respect to the caller) until the sync is done.

Release consistency: Acquire(S) is not performed until previous releases have completed, and accesses by the caller to any variable in the protection set are delayed until the acquire has performed. Release(S) is not performed until all access operations by the caller on variables in the protection set are done.

Entry consistency: acquire and release are on implicit, per-object locks. Access is exclusive while the lock is held, and an acquire forces all previous accesses to complete first.
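The release-consistency rules can be sketched with a toy model (my own construction, covering only the release half of the protocol): writes made inside a critical section are buffered locally and only pushed to the other replica when the lock is released.

```python
# Toy release-consistency replica: local writes stay in a buffer until
# Release(S), at which point they are propagated and committed.
class Replica:
    def __init__(self):
        self.store = {}    # committed, globally visible state
        self.buffer = {}   # writes made since the last release

    def write(self, key, value):
        self.buffer[key] = value     # visible locally, not yet propagated

    def read(self, key):
        return self.buffer.get(key, self.store.get(key))

    def release(self, other):
        # Release(S): all accesses in the protection set are pushed out
        # before the release completes.
        other.store.update(self.buffer)
        self.store.update(self.buffer)
        self.buffer.clear()

a, b = Replica(), Replica()
a.write("x", 1)
# Before the release, the other replica need not see the write.
assert b.read("x") is None
a.release(b)
assert b.read("x") == 1
```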

Grouping Operations (2)

Waiting for operation to complete

Figure 7-10. A valid event sequence for entry consistency.

Eventual Consistency

Figure 7-11. The principle of a mobile user accessing different replicas of a distributed database.

Monotonic Reads (1)


A data store is said to provide monotonic-read consistency if the following condition holds: if a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
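One common way to enforce this guarantee on the client side, sketched here with assumed version numbers (the mechanism is not prescribed by the slides): the client remembers the highest version it has read and refuses results from replicas that lag behind.

```python
# A client that enforces monotonic reads across replicas.
class MonotonicReadClient:
    def __init__(self):
        self.min_version = 0   # highest version seen so far

    def read(self, replica):
        version, value = replica["x"]
        if version < self.min_version:
            raise RuntimeError("replica too stale for monotonic reads")
        self.min_version = version
        return value

L1 = {"x": (2, "b")}   # replica L1 holds version 2
L2 = {"x": (1, "a")}   # replica L2 still holds version 1

c = MonotonicReadClient()
assert c.read(L1) == "b"   # fine: first read, version 2
# A later read at the stale replica L2 must not go backwards:
try:
    c.read(L2)
except RuntimeError:
    pass                   # the stale read is correctly rejected
```

In practice the stale replica would first be brought up to date (WS(x1) applied at L2) rather than the read rejected.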

Monotonic Reads (1.5)


Notation:
xi[t] = value of object x at store Li at time t
WS(xi[t]) = set of writes to object x at store Li at time t
WS(xi[t]; xj[t']) = the writes in WS(xi[t]) at store Li have also been performed at store Lj by time t'
(t may be omitted if clear from context)

Monotonic Reads (2)

Operations of WS(x1) performed at L2 before those of WS(x2)

Figure 7-12. The read operations performed by a single process P at two different local copies of the same data store.
(a) A monotonic-read consistent data store.

Monotonic Reads (3)

Operations of WS(x1) not performed at L2 before those of WS(x2)

Figure 7-12. The read operations performed by a single process P at two different local copies of the same data store.
(b) A data store that does not provide monotonic reads.

Monotonic Writes (1)


In a monotonic-write consistent store, the following condition holds: a write operation by a process on a data item x is completed before any successive write operation on x by the same process.

Monotonic Writes (2)

Operations of WS(x1) performed at L2 before write of x at L2, W(x2)

Figure 7-13. The write operations performed by a single process P at two different local copies of the same data store.
(a) A monotonic-write consistent data store.


Monotonic Writes (3)

Write of x at L1, W(x1), not performed at L2 before write of x at L2

Figure 7-13. The write operations performed by a single process P at two different local copies of the same data store.
(b) A data store that does not provide monotonic-write consistency.

Read Your Writes (1)


A data store is said to provide read-your-writes consistency if the following condition holds: the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Read Your Writes (2)

Operations of WS(x1) performed at L2 before those of WS(x2), and a subsequent read at L2, R(x2)

Figure 7-14. (a) A data store that provides read-your-writes consistency.

Read Your Writes (3)

Operations of WS(x1) not performed at L2

Figure 7-14. (b) A data store that does not.

Writes Follow Reads (1)


A data store is said to provide writes-follow-reads consistency if the following holds: a write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one read.

Writes Follow Reads (2)

More recent value of x

Figure 7-15. (a) A writes-follow-reads consistent data store.

Writes Follow Reads (3)

Value of x read at L2 does not reflect update at L1

Figure 7-15. (b) A data store that does not provide writes-follow-reads consistency.

Replica-Server Placement (0)

Big issues:
How many replicas
Where to locate replicas

Replica-server placement approaches:
Client-driven, incremental
AS-based cells, demand-driven
Geometric cells, cluster-driven

Replica-Server Placement
Cluster spans two cells

Two clusters in same cell

One cluster in one cell

Figure 7-16. Choosing a proper cell size for server placement.

Content Replication and Placement

Figure 7-17. The logical organization of different kinds of copies of a data store into three concentric rings.

Server-Initiated Replication (0)

Big issues:
When to replicate? Where?
When to migrate? Where?
When to delete? When NOT to delete?

Threshold approach:
Server S knows the server P nearest to client C
Server S keeps a local count of accesses to file F from clients nearest to P
If the total count for F at S is below the deletion threshold, delete F unless it is the last copy
If the total count is above the replication threshold, copy F to P
If the count for F from P exceeds (total count for F at S)/2, migrate F to P
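The threshold rules above can be sketched as a decision function; the threshold values and the order in which the rules are checked are my own assumptions.

```python
DEL_THRESHOLD = 5      # assumed deletion threshold
REP_THRESHOLD = 50     # assumed replication threshold

def decide(total_count, count_from_p, is_last_copy):
    """Action server S takes for file F, given its access counts."""
    if total_count < DEL_THRESHOLD:
        # Too few accesses: drop the copy, unless it is the last one.
        return "keep" if is_last_copy else "delete"
    if count_from_p > total_count / 2:
        # Most requests arrive via P: move the copy there.
        return "migrate to P"
    if total_count > REP_THRESHOLD:
        return "replicate to P"
    return "keep"

assert decide(3, 0, is_last_copy=False) == "delete"
assert decide(3, 0, is_last_copy=True) == "keep"
assert decide(100, 80, is_last_copy=False) == "migrate to P"
assert decide(100, 30, is_last_copy=False) == "replicate to P"
assert decide(20, 5, is_last_copy=False) == "keep"
```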

Server-Initiated Replicas
Closest server
to C1 and C2

Figure 7-18. Counting access requests from different clients.

Client-Initiated Replication
a.k.a. caching
1. Improve access time
2. Flush cache to avoid stale data, and ...
3. ... to make room for more relevant data
4. Especially useful when cache serves
multiple clients, and ...
5. Clients exhibit correlated accesses

State versus Operations


Possibilities for what is to be propagated
(applies to all replication types):
1. Propagate only a notification of an update
Cache invalidation/witness
2. Transfer data from one copy to another
Cache migration
3. Propagate the update operation to other
copies
Cache update

Pull versus Push Protocols


What information is needed? What actions? What results?

Figure 7-19. A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Remote-Write Protocols

Figure 7-20. The principle of a primary-backup protocol.
Annotations: W5, W6; W6 (telling the client the write has completed) is optional.
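A toy sketch of the remote-write idea (my own construction, omitting the message numbering of the figure): every write is forwarded to the primary, which updates all backups before acknowledging, so all replicas apply writes in a single order.

```python
# Minimal primary-backup (remote-write) protocol sketch.
class Backup:
    def __init__(self):
        self.store = {}

    def read(self, key):              # reads may go to any replica
        return self.store.get(key)

class Primary:
    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        for b in self.backups:        # blocking update of every backup
            b.store[key] = value
        return "ack"                  # acknowledge only when all are updated

b1, b2 = Backup(), Backup()
p = Primary([b1, b2])
assert p.write("x", 7) == "ack"
assert b1.read("x") == 7 and b2.read("x") == 7
```

Because the acknowledgement waits for all backups, a read at any replica after the ack sees the write; the price is write latency proportional to the slowest backup.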

Local-Write Protocols

Figure 7-21. Primary-backup protocol in which the primary migrates to the process wanting to perform an update.
Annotations: W1.5 tells the old primary; W4 is often a notification rather than the update itself, so a read becomes R1.1 update request, R1.2 update confirm.

Quorum-based Protocols (0)


N = total number of replicas; each keeps a version number
R = read quorum size (number of replicas to probe when performing a read)
W = write quorum size (number of replicas to probe when performing a write)

Must ensure that any two write sets intersect, and that any read set intersects every write set, so a read obtains the latest version number.
Hence W > N/2 and R + W > N.

Note that any W copies will do for a write, and any R copies for a read (not fixed quorums as in coterie-based distributed mutual exclusion).
May have witnesses, which keep meta-information only; a witness can alert the querier that a newer version number exists.
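The two constraints, and the read rule, can be illustrated with a short sketch (replica version numbers are illustrative):

```python
# Quorum constraint check and version-based read.
def valid_quorum(n, r, w):
    """R + W > N: every read set meets every write set.
    W > N/2: every two write sets intersect (no write-write conflicts)."""
    return r + w > n and w > n / 2

def read_latest(versions, read_set):
    """Any R replicas suffice: the read quorum is guaranteed to contain
    at least one replica holding the highest version number."""
    return max(versions[i] for i in read_set)

assert valid_quorum(12, 3, 10)
assert not valid_quorum(12, 7, 6)   # W <= N/2: write sets need not intersect
assert valid_quorum(12, 1, 12)      # ROWA: read one, write all

versions = [3, 3, 2, 3, 2]          # N = 5; latest version is 3
assert valid_quorum(5, 3, 3)
assert read_latest(versions, {0, 2, 4}) == 3
```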

Quorum-Based Protocols

Figure 7-22. Three examples of the voting algorithm. (a) A correct choice of read and write set. (b) A choice that may lead to write-write conflicts. (c) A correct choice, known as ROWA (read one, write all).
Issues: work required for reads and writes; fault tolerance. May have witnesses (no data copy, just a version number).
