
DISTRIBUTED SYSTEMS

Principles and Paradigms


Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN

Chapter 7
Consistency And Replication

Reasons for Replication

Data are replicated to increase the reliability of a system.

Replication for performance:
Scaling in numbers
Scaling in geographical area

Caveats:
Consistency issues
Gain in performance versus the cost of the increased bandwidth needed to maintain the replicas

Data-centric Consistency Models

Figure 7-1. The general organization of a logical data store, physically distributed and replicated across multiple processes.

Continuous Consistency (0)

Conit = CONsistency UniT

Continuous consistency ranges:

Specify consistency practically:
Order = number of uncommitted updates
Staleness = time since last update
Numeric = number of unseen updates

Specify consistency semantically:
Numeric = difference between local and committed values
Absolute (by |v - v'|)
Relative (by |v - v'| / v)
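The two numeric bounds can be made concrete with a small sketch, using the slide's notation: v is one replica's value, v' the other's.

```python
# Numeric deviation measures for a conit value.
def absolute_dev(v, v_prime):
    return abs(v - v_prime)          # |v - v'|

def relative_dev(v, v_prime):
    return abs(v - v_prime) / v      # |v - v'| / v

# With v = 10 committed and v' = 12 tentative:
assert absolute_dev(10, 12) == 2
assert relative_dev(10, 12) == 0.2
```

A replica would trigger update propagation as soon as either deviation exceeds its agreed bound.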

Continuous Consistency (2)

Figure 7-3. Choosing the appropriate granularity for a conit.
(a) Two updates lead to update propagation.

Continuous Consistency (1)

Figure 7-2. An example of keeping track of consistency deviations [adapted from (Yu and Vahdat, 2002)]. The figure shows, for each of two replicas, its committed and tentative values of x and y, together with its order and numerical deviations.

Order = number of uncommitted local updates; Numerical = (number of unseen remote updates, max deviation between remote tentative and local committed values); gray entries are committed.

Continuous Consistency (3)

Figure 7-3. Choosing the appropriate granularity for a conit.
(b) No update propagation is needed (yet).

Tradeoffs: too large a conit causes high traffic and false sharing; too small a conit leaves data too stale and makes the overhead too great.

Access Consistency
Forms of access consistency

Atomic Consistency
Sequential Consistency
Causal Consistency
Processor Consistency

Notation:
Wi(x)a = Process i wrote object x with value a
Ri(x)a = Process i read object x with value a

Sequential Consistency (1)

Figure 7-4. Behavior of two processes operating on the same data item. The horizontal axis is time. One process performs W(x)a; the other first reads NIL (the write has not yet propagated) and later reads a: R(x)NIL, then R(x)a. A sequential order satisfying the observations exists.

Sequential Consistency (2)


A data store is sequentially consistent when the result of any execution is the same as if the (read and write) operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
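The definition above can be checked mechanically for tiny traces. The sketch below (my own construction, exponential in trace length) searches for one sequential order of all operations, consistent with every process's program order, under which each read returns the most recent write.

```python
from itertools import permutations

# Operations are ("W", item, value) or ("R", item, value);
# histories maps a process name to its program-ordered operation list.
def sequentially_consistent(histories):
    ops = [(p, i) for p, h in histories.items() for i in range(len(h))]
    for order in permutations(ops):
        pos = {op: k for k, op in enumerate(order)}
        # Keep each process's operations in program order.
        if any(pos[(p, i)] > pos[(p, i + 1)]
               for p, h in histories.items() for i in range(len(h) - 1)):
            continue
        mem, legal = {}, True
        for p, i in order:
            kind, x, v = histories[p][i]
            if kind == "W":
                mem[x] = v
            elif mem.get(x) != v:   # read must return the latest write
                legal = False
                break
        if legal:
            return True
    return False

# In the spirit of Fig. 7-5: two readers that disagree on the order of
# W(x)a and W(x)b cannot be explained by any single sequential order.
bad = {
    "P1": [("W", "x", "a")],
    "P2": [("W", "x", "b")],
    "P3": [("R", "x", "b"), ("R", "x", "a")],
    "P4": [("R", "x", "a"), ("R", "x", "b")],
}
# If both readers agree (b then a), the store can be sequentially consistent.
good = dict(bad, P4=[("R", "x", "b"), ("R", "x", "a")])
```

Here `sequentially_consistent(bad)` is False and `sequentially_consistent(good)` is True.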

Sequential Consistency (3)

Forces W(x)b first

Forces W(x)a first

Figure 7-5. (a) A sequentially consistent data store. (b) A data store that is not sequentially consistent.

Sequential Consistency (4)

Figure 7-6. Three concurrently-executing processes.

Sequential Consistency (5)

Initial values are all 0. Each row below is one valid interleaving (the order in which the processes' operations execute), followed by the outputs in the order they are printed and the resulting signature:

P1 P1 P2 P2 P3 P3   prints 00 10 11   signature 001011
P1 P2 P2 P1 P3 P3   prints 10 10 11   signature 101011
P2 P3 P3 P2 P1 P1   prints 01 01 11   signature 110101
P2 P1 P3 P2 P1 P3   prints 11 11 11   signature 111111

Figure 7-7. Four valid execution sequences for the processes of Fig. 7-6. The vertical axis is time.
The signature is the output of the processes concatenated in process ID order: P1, P2, P3.
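All legal signatures can be enumerated by brute force. The sketch below assumes the usual programs of Fig. 7-6 (not reproduced in these slides): each process assigns 1 to its own variable, then prints the other two.

```python
from itertools import permutations

# ("w", var) assigns 1 to var; ("p", (a, b)) prints vars a and b.
PROGS = {
    "P1": [("w", "x"), ("p", ("y", "z"))],
    "P2": [("w", "y"), ("p", ("x", "z"))],
    "P3": [("w", "z"), ("p", ("x", "y"))],
}

def signatures():
    """Collect the signature of every interleaving respecting program order."""
    ops = [(pid, i) for pid in PROGS for i in range(2)]
    seen = set()
    for perm in permutations(ops):
        pos = {op: k for k, op in enumerate(perm)}
        # Program order: each process's first operation precedes its second.
        if any(pos[(pid, 0)] > pos[(pid, 1)] for pid in PROGS):
            continue
        mem = {"x": 0, "y": 0, "z": 0}
        out = {}
        for pid, i in perm:
            kind, arg = PROGS[pid][i]
            if kind == "w":
                mem[arg] = 1
            else:
                out[pid] = "".join(str(mem[v]) for v in arg)
        # Signature: outputs concatenated in process ID order.
        seen.add(out["P1"] + out["P2"] + out["P3"])
    return seen

sigs = signatures()
```

Signatures such as 000000 never occur: each print follows its own process's assignment, so the three prints cannot all precede the other assignments.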

Causal Consistency (1)


For a data store to be considered causally consistent, it is necessary that the store obeys the following condition: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.
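"Potentially causally related" can be decided with vector clocks. The sketch below (my own construction, not from the slides) tags each write with a clock tuple indexed by process.

```python
# Decide whether two writes are potentially causally related or concurrent.
def happens_before(a, b):
    """True if clock a precedes clock b, i.e. a potentially caused b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    return not happens_before(a, b) and not happens_before(b, a)

# P1 writes W(x)a at (1,0,0); P2 reads it and then writes W(x)b at
# (1,1,0): causally related, so every process must see a before b.
assert happens_before((1, 0, 0), (1, 1, 0))
# A write at (0,0,1) by P3 that saw neither is concurrent with W(x)a,
# so different machines may see them in different orders.
assert concurrent((1, 0, 0), (0, 0, 1))
```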

Causal Consistency (2)

Causal relation element

Figure 7-8. This sequence is allowed with a causally-consistent store, but not with a sequentially consistent store.
The induced digraph of causal dependencies is acyclic.

Causal Consistency (3)


W(x)a causally precedes W(x)b, yet a process sees W(x)b before W(x)a.

Figure 7-9. (a) A violation of a causally-consistent store.

Causal Consistency (4)


W(x)a is not causally related to W(x)b.

Figure 7-9. (b) A correct sequence of events in a causally-consistent store.

Processor Consistency (1)

For a data store to be considered processor consistent, it is necessary that the store obeys the following condition: writes that are made by the same process must be seen by all processes in the same order; writes by different processes may be seen in a different order on different machines.

Processor Consistency (2)


W(x)a causally precedes W(x)b, and a process sees W(x)b before W(x)a. There is only one write per process, however, so trivially all processes see each process's writes in the same order as they were made on the writing process.

Figure 7-9. (a) A violation of a causally-consistent store, but one that is processor consistent.

Processor Consistency (3)

W(x)a before W(x)b on the same process:

P1: W(x)a        W(x)b
P2:       R(x)b        R(x)a

P2 sees W(x)b before W(x)a: a violation of processor consistency.

Grouping Operations (1)

Necessary criteria for correct synchronization:

1. An acquire access of a synchronization variable is not allowed to perform until all updates to the guarded shared data have been performed with respect to that process.

2. Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.

3. After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.

Grouping Operations (1.5)

Varieties of synchronization variable consistency:

Weak consistency: a sync(S) operation is not performed until all previous accesses to the variables it protects have completed with respect to the caller; future operations on the protected set are delayed (with respect to the caller) until the sync is done.

Release consistency: Acquire(S) is not performed until previous releases have completed, and accesses by the caller to any variable in the protection set are delayed until the acquire has performed. Release(S) is not performed until all access operations by the caller on variables in the protection set are done.

Entry consistency: acquire and release are on implicit, per-object locks. Access is exclusive while the lock is held, and an acquire forces all previous accesses to complete first.
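The release-consistency rules can be sketched with a toy model (my own construction, covering only the release half of the protocol): writes made inside a critical section are buffered locally and only pushed to the other replica when the lock is released.

```python
# Toy release-consistency replica: local writes stay in a buffer until
# Release(S), at which point they are propagated and committed.
class Replica:
    def __init__(self):
        self.store = {}    # committed, globally visible state
        self.buffer = {}   # writes made since the last release

    def write(self, key, value):
        self.buffer[key] = value     # visible locally, not yet propagated

    def read(self, key):
        return self.buffer.get(key, self.store.get(key))

    def release(self, other):
        # Release(S): all accesses in the protection set are pushed out
        # before the release completes.
        other.store.update(self.buffer)
        self.store.update(self.buffer)
        self.buffer.clear()

a, b = Replica(), Replica()
a.write("x", 1)
# Before the release, the other replica need not see the write.
assert b.read("x") is None
a.release(b)
assert b.read("x") == 1
```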

Grouping Operations (2)

Waiting for operation to complete

Figure 7-10. A valid event sequence for entry consistency.

Eventual Consistency

Figure 7-11. The principle of a mobile user accessing different replicas of a distributed database.

Monotonic Reads (1)


A data store is said to provide monotonic-read consistency if the following condition holds: if a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
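One common way to enforce this guarantee on the client side, sketched here with assumed version numbers (the mechanism is not prescribed by the slides): the client remembers the highest version it has read and refuses results from replicas that lag behind.

```python
# A client that enforces monotonic reads across replicas.
class MonotonicReadClient:
    def __init__(self):
        self.min_version = 0   # highest version seen so far

    def read(self, replica):
        version, value = replica["x"]
        if version < self.min_version:
            raise RuntimeError("replica too stale for monotonic reads")
        self.min_version = version
        return value

L1 = {"x": (2, "b")}   # replica L1 holds version 2
L2 = {"x": (1, "a")}   # replica L2 still holds version 1

c = MonotonicReadClient()
assert c.read(L1) == "b"   # fine: first read, version 2
# A later read at the stale replica L2 must not go backwards:
try:
    c.read(L2)
except RuntimeError:
    pass                   # the stale read is correctly rejected
```

In practice the stale replica would first be brought up to date (WS(x1) applied at L2) rather than the read rejected.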

Monotonic Reads (1.5)


Notation:
xi[t] = value of object x at store Li at time t
WS(xi[t]) = set of writes to object x at store Li at time t
WS(xi[t]; xj[t']) = the writes in WS(xi[t]) at store Li have also been performed at store Lj by time t'
(t may be omitted if clear from context)

Monotonic Reads (2)

Operations of WS(x1) performed at L2 before those of WS(x2)

Figure 7-12. The read operations performed by a single process P at two different local copies of the same data store.
(a) A monotonic-read consistent data store.

Monotonic Reads (3)

Operations of WS(x1) not performed at L2 before those of WS(x2)

Figure 7-12. The read operations performed by a single process P at two different local copies of the same data store.
(b) A data store that does not provide monotonic reads.

Monotonic Writes (1)


In a monotonic-write consistent store, the following condition holds: a write operation by a process on a data item x is completed before any successive write operation on x by the same process.

Monotonic Writes (2)

Operations of WS(x1) performed at L2 before write of x at L2, W(x2)

Figure 7-13. The write operations performed by a single process P at two different local copies of the same data store.
(a) A monotonic-write consistent data store.


Monotonic Writes (3)

Write of x at L1, W(x1), not performed at L2 before write of x at L2

Figure 7-13. The write operations performed by a single process P at two different local copies of the same data store.
(b) A data store that does not provide monotonic-write consistency.

Read Your Writes (1)


A data store is said to provide read-your-writes consistency if the following condition holds: the effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Read Your Writes (2)

Operations of WS(x1) performed at L2 before those of WS(x2), and a subsequent read at L2, R(x2)

Figure 7-14. (a) A data store that provides read-your-writes consistency.

Read Your Writes (3)

Operations of WS(x1) not performed at L2

Figure 7-14. (b) A data store that does not.

Writes Follow Reads (1)


A data store is said to provide writes-follow-reads consistency if the following holds: a write operation by a process on a data item x, following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one read.

Writes Follow Reads (2)

More recent value of x

Figure 7-15. (a) A writes-follow-reads consistent data store.

Writes Follow Reads (3)

Value of x read at L2 does not reflect update at L1

Figure 7-15. (b) A data store that does not provide writes-follow-reads consistency.

Replica-Server Placement (0)

Big issues:
How many replicas
Where to locate replicas

Replica-server placement approaches:
Client-driven, incremental
AS-based cells, demand-driven
Geometric cells, cluster-driven

Replica-Server Placement
Cluster spans two cells

Two clusters in same cell

One cluster in one cell

Figure 7-16. Choosing a proper cell size for server placement.

Content Replication and Placement

Figure 7-17. The logical organization of different kinds of copies of a data store into three concentric rings.

Server-Initiated Replication (0)

Big issues:
When to replicate? Where?
When to migrate? Where?
When to delete? When NOT to delete?

Threshold approach:
Server S knows the server P nearest to client C
Server S keeps a local count of accesses to file F from clients nearest to P
If the total count for F at S is below the deletion threshold, delete F unless it is the last copy
If the total count is above the replication threshold, copy F to P
If the count for F from P exceeds (total count for F at S)/2, migrate F to P
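The threshold rules above can be sketched as a decision function; the threshold values and the order in which the rules are checked are my own assumptions.

```python
DEL_THRESHOLD = 5      # assumed deletion threshold
REP_THRESHOLD = 50     # assumed replication threshold

def decide(total_count, count_from_p, is_last_copy):
    """Action server S takes for file F, given its access counts."""
    if total_count < DEL_THRESHOLD:
        # Too few accesses: drop the copy, unless it is the last one.
        return "keep" if is_last_copy else "delete"
    if count_from_p > total_count / 2:
        # Most requests arrive via P: move the copy there.
        return "migrate to P"
    if total_count > REP_THRESHOLD:
        return "replicate to P"
    return "keep"

assert decide(3, 0, is_last_copy=False) == "delete"
assert decide(3, 0, is_last_copy=True) == "keep"
assert decide(100, 80, is_last_copy=False) == "migrate to P"
assert decide(100, 30, is_last_copy=False) == "replicate to P"
assert decide(20, 5, is_last_copy=False) == "keep"
```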

Server-Initiated Replicas
Closest server
to C1 and C2

Figure 7-18. Counting access requests from different clients.

Client-Initiated Replication
a.k.a. caching
1. Improve access time
2. Flush cache to avoid stale data, and ...
3. ... to make room for more relevant data
4. Especially useful when cache serves
multiple clients, and ...
5. Clients exhibit correlated accesses

State versus Operations


Possibilities for what is to be propagated
(applies to all replication types):
1. Propagate only a notification of an update
Cache invalidation/witness
2. Transfer data from one copy to another
Cache migration
3. Propagate the update operation to other
copies
Cache update

Pull versus Push Protocols


What information is needed? What actions? What results?

Figure 7-19. A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Remote-Write Protocols

Figure 7-20. The principle of a primary-backup protocol.
Annotations: W5, W6; W6 (telling the client the write has completed) is optional.
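A toy sketch of the remote-write idea (my own construction, omitting the message numbering of the figure): every write is forwarded to the primary, which updates all backups before acknowledging, so all replicas apply writes in a single order.

```python
# Minimal primary-backup (remote-write) protocol sketch.
class Backup:
    def __init__(self):
        self.store = {}

    def read(self, key):              # reads may go to any replica
        return self.store.get(key)

class Primary:
    def __init__(self, backups):
        self.store = {}
        self.backups = backups

    def write(self, key, value):
        self.store[key] = value
        for b in self.backups:        # blocking update of every backup
            b.store[key] = value
        return "ack"                  # acknowledge only when all are updated

b1, b2 = Backup(), Backup()
p = Primary([b1, b2])
assert p.write("x", 7) == "ack"
assert b1.read("x") == 7 and b2.read("x") == 7
```

Because the acknowledgement waits for all backups, a read at any replica after the ack sees the write; the price is write latency proportional to the slowest backup.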

Local-Write Protocols

Figure 7-21. Primary-backup protocol in which the primary migrates to the process wanting to perform an update.
Annotations: W1.5 tells the old primary; W4 is often a notification rather than the update itself, so a read becomes R1.1 update request, R1.2 update confirm.

Quorum-based Protocols (0)


N = total number of replicas; each keeps a version number
R = read quorum size (number of replicas to probe when performing a read)
W = write quorum size (number of replicas to probe when performing a write)

Must ensure that any two write sets intersect, and that any read set intersects every write set, so a read obtains the latest version number.
Hence W > N/2 and R + W > N.

Note that any W copies will do for a write, and any R copies for a read (not fixed quorums as in coterie-based distributed mutual exclusion).
May have witnesses, which keep meta-information only; a witness can alert the querier that a newer version number exists.
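The two constraints, and the read rule, can be illustrated with a short sketch (replica version numbers are illustrative):

```python
# Quorum constraint check and version-based read.
def valid_quorum(n, r, w):
    """R + W > N: every read set meets every write set.
    W > N/2: every two write sets intersect (no write-write conflicts)."""
    return r + w > n and w > n / 2

def read_latest(versions, read_set):
    """Any R replicas suffice: the read quorum is guaranteed to contain
    at least one replica holding the highest version number."""
    return max(versions[i] for i in read_set)

assert valid_quorum(12, 3, 10)
assert not valid_quorum(12, 7, 6)   # W <= N/2: write sets need not intersect
assert valid_quorum(12, 1, 12)      # ROWA: read one, write all

versions = [3, 3, 2, 3, 2]          # N = 5; latest version is 3
assert valid_quorum(5, 3, 3)
assert read_latest(versions, {0, 2, 4}) == 3
```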

Quorum-Based Protocols

Figure 7-22. Three examples of the voting algorithm. (a) A correct choice of read and write set. (b) A choice that may lead to write-write conflicts. (c) A correct choice, known as ROWA (read one, write all).
Issues: work required for reads and writes; fault tolerance. May have witnesses (no data copy, just a version number).
