You are on page 1of 28

Chapter 7

Consistency and Replication

1
Outlines

 Reasons for Replication


 Issues in Replication
 Replica Management
 Consistency Protocols

2
Reasons for Replication
 Replication - creating and using multiple copies of data or services.
 It could be:
 Data replication if the same data is stored on multiple storage devices.

E.g., Web site mirror


 Computation replication if the same computing task is executed many
times.
 Why replication?
 Increasing the reliability of systems
 If one replica is unavailable or crashes, use another.
 Protect against corrupted data.
 Improving performance
 Scale in numbers (size) of the distributed system (replicated web
servers).
 Scale in geographically distributed systems (web proxies).
3
Issues in Replication
 Performance and scalability
 Replication and caching are used as system scaling techniques.
 Keeping copies of data close to the processes using them can improve performance
and solve scalability problems.
 Scaling up usually results in management overheads.

 Consistency when updating replicas


 How to deal with updated data
 Updates have to be carried out on all copies.
 Problems with network performance.
 It is needed to handle concurrency.
 How are updates distributed?

 Replica management
 How many replicas?
 Where to place them (close to the server or client)?
 When to get rid of them?

 Redirection/routing: Which replica should clients use?


4
Issues in Replication: Performance and Scalability
 To keep replicas consistent, we generally need to ensure that all conflicting
operations are done in the same order everywhere.
 Two operations conflict if they operate on the same data item.
 Read–write conflict: a read operation and a write operation act concurrently.

 Write–write conflict: two concurrent write operations.

 Guaranteeing global ordering on conflicting operations may be a costly operation,


downgrading scalability.
 However, such problems can be solved by loosening consistency requirements so
that hopefully global synchronization can be avoided.
 This leads to the definition of different consistency models, as well as
protocols for implementing the models.
 Protocols are used:
 to manage replicas (creation, placement, etc)
 to manage updates (propagation)
 to assign client requests to replicas
5
Issues in Replication: Consistency Models
 Consistency models (also known as consistency semantics or constraints) are essentially
contracts between processes and the data store.
 The general organization of a logical distributed data store:

 If processes obey certain rules, data store will work correctly.


 Data store guarantees certain properties on the data items stored.
 E.g., a read on a data item would always return the value showing the most recent write.

 Broad categories of consistency models


 Data-Centric
 Client-Centric

6
Data – Centric Consistency: Overview
 Data-centric models provide a system-wide consistent view on a data store.
 Important assumptions:
 Concurrent processes may be simultaneously updating the data store.
 Consistency need to be provided in the face of such concurrency.
 Data-centric consistency (order based consistency) models to be discussed
 Sequential Consistency
 Causal Consistency

7
Sequential Consistency
 The result of any execution is the same as if
 operations by all processes on the data store were executed in some
sequential order; and
 the operations of each individual process appear in this sequence in the
order specified by its program.
 When processes run concurrently on different machines, any valid
interleaving of read and write operations is acceptable, but all processes see
the same interleaving of operations.
 Example:

8
Causal Consistency
 Writes that are potentially causally related must be seen by all processes
in the same order. Concurrent writes may be seen in a different order by
different processes.
 Represents a weakening of sequential consistency.

9
Client centric consistency - Overview
 Client centric consistency provides consistency guarantees for a single client with respect to the data
stored by that client.
 Assumption:
 The data stores are characterized by the lack of simultaneous updates.
 Most operations involve reading data.
 The data stores offer a very weak consistency model called eventual consistency.
 Four client-centric consistency models introduced:
 Monotonic-Read Consistency
 Monotonic-Write Consistency
 Read-Your-Writes Consistency
 Writes-Follow-Reads Consistency
 The goal of the models is to avoid system-wide consistency, by concentrating on what specific
clients want, instead of what should be maintained by servers.
 Notations:
 Xi[t] – the version of data item x at time t at local copy Li.
 WS(xi[t]) -- all write operations at Li since initialization.
 WS(xi[t1]; xj[t2]) -- if operations in WS(xi[t1]) have also been performed at local copy Lj at a
later time t2. indicates that WS(xi [t1]) is part of WS(xj[t2]).
10
Client centric consistency - Eventual Consistency
 In many applications most processes only read data.
 Think of most database systems: mainly read.

 In many cases concurrency appears only in restricted form.


 Think of the DNS: write-write conflicts do no occur.

 In many cases updates (writes) may not be immediately made available to readonly
processes.
 Think of WWW: as with DNS, except that heavy use of client-side caching is
present: even the return of stale pages is acceptable to most users.
 Thus, many systems may exhibit a high degree of acceptable inconsistency, with the
replicas gradually become consistent over time.
 The only requirement is that all replicas will eventually be the same.
 All updates must be guaranteed to propagate to all replicas eventually!

 However, this works well if every client always updates the same replica.
 Things are a little difficult if the clients are mobile.
 A mobile user accessing different replicas of a distributed database has problems

11with eventual consistency.


Eventual Consistency: Problems With Mobile Users

Example
 Consider a distributed database to which you have access through your
laptop. Assume your laptop acts as a front end to the database.
 At location A you access the database doing reads and updates.
 At location B you continue your work, but unless you access the same
server as the one at location A, you may detect inconsistencies:
 your updates at A may not have yet been propagated to B.
 you may be reading newer entries than the ones available at A
 your updates at B may eventually conflict with those at A.
 Note:
 The only thing you really want is that the entries you updated and/or
read at A, are in B the way you left them in A.
 In that case, the database will appear to be consistent to you.

12
… Cont’d
 Basic Architecture

13
Client-Centric Consistency Models: Monotonic Reads
 A data store is said to provide monotonic-read consistency if the following
condition holds:
 If a process reads the value of a data item x any successive read operation on x
by that process will always return that same value or a more recent value.
 Client “sees” only same or newer version of data.
 Example: Reading (not modifying) incoming e-mail while you are on the move.
 Each time you connect to a different e-mail server, that server fetches(at
least) all the updates from the server you previously visited.

The read operations performed by a single process P at two different local copies of
the same data store
(a) A monotonic-read consistent data store.
(b) A data store that does not provide monotonic reads.
14
Client-Centric Consistency Models: Monotonic Writes
 A data store is said to provide monotonic-write consistency if the following
condition holds:
 A write operation by a process on a data item x is completed before any
successive write operation on x by the same process.
 Write happens on a copy only if it’s brought up to date with preceding write
operations on same data (but possibly at different copies).
 Example: Updating a program at server S2, and ensuring that all components
on which compilation and linking depends, are also placed at S2.

The write operations performed by a single process P at two different local copies of
the same data store
(a) A monotonic-write consistent data store.
15 (b) A data store that does not provide monotonic-write consistency.
Client-Centric Consistency Models: Read your Writes
 A data store is said to provide read-your-writes consistency if the following
condition holds:
 The effect of a write operation by a process on data item x, will always be
seen by a successive read operation on x by the same process.
 All previous writes are always completed before any successive read.
 Example:
 Updating your Web page and guaranteeing that your Web browser shows
the newest version instead of its cached copy.

(a) A data store that provides read-your-writes consistency.


(b) A data store that does not provide read-your-writes consistency.
16
Client-Centric Consistency Models: Writes follow Reads
 A data store is said to provide writes-follow-reads consistency if the following
condition holds:
 A write operation by a process on a data item x following a previous read
operation on x by the same process, is guaranteed to take place on the same or
a more recent value of x that was read.
 Any successive write operation on x will be performed on a copy of x that is
same or more recent than the last read.
 Example:
 See reactions to posted articles only if you have the original posting (a read
“pulls in” the corresponding write operation).

(a) A writes-follow-reads consistent data store.


(b) A data store that does not provide writes-follow-reads consistency.
17
Replica Management: Replica Server Placement
 Regardless of which consistency model is chosen, we need to decide where, when and by
whom copies of the data-store are to be placed.
 Figure out what the best K places are out of N possible locations.
 Select best location out of N-K for which the average distance to clients is minimal.
 The first chosen location minimizes the average distance to all clients.
 Computationally expensive.
 Select the Kth largest autonomous system and place a server at the best connected host.
 Computationally expensive.
 Position nodes in a d-dimensional geometric space, where distance reflects latency.
 Identify the K regions with highest density and place a server in every one.
 Computationally cheap.
 Identify the K largest clusters, then put one server in each cluster.
 How do you find clusters?

- One way, divide the entire space into cells, pick K most populated ones.

18
Replica Management: Content Replication and Placement
 A process is capable of hosting a replica of data.
 There are three types of replica:
 Permanent replicas: Process/machine always having a replica. Tend to be
small in number. Organized as clusters of workstations or mirrored
systems.
 Server-initiated replica: Process that can dynamically host a replica on
request of another server in the data store. Used to enhance performance at
the initiation of the owner of the data store. Typically used by web hosting
companies to geographically locate replicas close to where they are needed
most.
 Client-initiated replica: Process that can dynamically host a replica on
request of a client (client cache). Created as a result of client requests –
think of browser caches. Works well assuming that the cached data does
not go stale too soon. Used to improve access time.

19
… Cont’d

20
Replica Management: Content Distribution
State versus Operations
 When a client initiates an update to a distributed data store, what gets propagated?
 There are different design issues to consider with respect to propagating updates.
 Propagate notification of the update to the other replicas.
 This is an invalidation protocol which indicates that the replica’s data is no longer
up-to-date.
 Main advantage is that notifications use little network bandwidth.
 Can work well when there’s many writes compared to read operations.
 Often used for caches.
 Transfer modified data from one replica to another.
 Works well when there’s many reads compared to write operations.
 Often used for distributed databases.
 Propagate the update to the other replicas
 This is active replication, and shifts the workload to each of the replicas.

 No single approach is the best, but depends highly on available bandwidth and read-to-
write ratio at replicas.
21
… Cont’d
Pull versus Push
 Another issue is how the update is to be propagated.
 Updates can be pulled or pushed.
 Pushing updates: Server-initiated approach, in which update is propagated
regardless whether target asked for it. Useful when a high degree of
consistency is needed. Used between permanent and server-initiated
replicas.
 Pulling updates: client-initiated approach, in which client requests to be
updated. Used by client caches (e.g., browsers). No request, no update!
 A comparison between push-based and pull-based protocols in the case of
multiple client, single server systems

22
… Cont’d
Switching Between Pull and Push
 We can dynamically switch between pull and push using leases:
 Lease: A contract in which the server promises to push updates to the
client until the lease expires.
 Make lease expiration time dependent on system's behavior (adaptive
leases):
 Age-based leases: An object that hasn't changed for a long time, will
not change in the near future, so provide a long-lasting lease.
 Renewal-frequency based leases: The more often a client requests a
specific object, the longer the expiration time for that client (for that
object) will be.
 State-based leases: The more loaded a server is, the shorter the
expiration times become.

23
Consistency Protocols: Primary-Based Protocols
 Each data item in the data store is associated with a primary replica.
 The primary is responsible for coordinating writes to the data item.
 There are two types of primary-based protocols.
 Remote-Write Protocols
 The primary is fixed at a remote server.
 This model is typically associated with traditional client/server
systems.
 Local-Write Protocols
 Write operations can be carried out locally after moving the primary
to the process.

24
Primary-Based Protocols: Remote-Write Protocols
Primary-Backup Protocol
 Primary-backup protocol is a variation in which read operations are
preformed locally, write operations are carried out at a fixed primary copy.

25
Primary-Based Protocols: Local-Write Protocols
Primary Migration
 A variant of local-write protocol with primary-backup protocol in which
the primary copy migrates between processes that wish to perform a write
operation.

26
Replicated-Write Protocols: Quorum-Based Protocols
 Write operations can be carried out at multiple replicas instead of only one.
 A commonly used approach to replicated writes is to use voting system (referred to as
quorum-based protocols).
 Clients must request and acquire permissions from multiple replicas before either
reading/writing a replicated data item.
 To read a file of which N replicas exist, a client needs to assemble a read quorum of NR
servers or more.
 To modify a file, a write quorum of at least NW servers is required.
 The values of NR and NW are subject to the following constraints:

NR + NW > N (used to prevent read-write conflicts)


NW > N/2 (used to prevent write-write conflicts)

27
?
28

You might also like