Introduction
Dr. Barsha Mitra
CSIS Dept, BITS Pilani, Hyderabad Campus
BITS Pilani
Hyderabad Campus
1
Course Details
•Course will be run in FLIPPED MODE
•Course handout is uploaded on https://elearn.bits-pilani.ac.in
No    Name                Weight   Day, Date, Session, Time
EC-1  Quiz-1              5%       August 16-30, 2022
      Quiz-2              5%       September 16-30, 2022
      Assignment          10%      October 16-30, 2022
EC-2  Mid-Semester Test   30%
EC-3  Comprehensive Exam  50%
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
Course Details
•Quizzes and Assignment will be conducted on
https://elearn.bits-pilani.ac.in
•Course slides will be uploaded on https://elearn.bits-pilani.ac.in
and will be available on Microsoft Teams
•Course recordings will be available on Microsoft Teams
•Lab capsules are available on https://bitscsis.vlabs.platifi.com/
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
Definition of Distributed Systems
•Collection of independent entities cooperating to collectively solve a task
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
Features of Distributed System
•No common physical clock
•No shared memory - Requires message passing for communication
•Geographical separation
•Processors are geographically wide apart
•Processors may reside on a WAN or LAN
•Autonomy and Heterogeneity
•Loosely coupled processors
•Different processor speeds and operating systems
•Processors are not part of a dedicated system
•Cooperate with one another
•May offer services or solve a problem jointly
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Middleware
•Middleware –
• distributed software
• drives the distributed system
• provides transparency of heterogeneity
at platform level
•Middleware standards:
•Common Object Request Broker
Architecture (CORBA)
•Remote Procedure Call (RPC) mechanism
•Distributed Component Object Model
(DCOM)
•Remote Method Invocation (RMI)
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Motivation for Distributed System
•Inherently distributed computation
•Resource sharing
•Access to remote resources
•Increased performance/cost ratio
•Reliability
•Scalability
•Modularity and Incremental Expandability
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Coupling
• Interdependency/binding and/or homogeneity among modules,
whether hardware or software (e.g., OS, middleware)
• High coupling → tightly coupled modules
• Low coupling → loosely coupled modules
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Parallel Systems
• Multiprocessor systems
• E.g., Omega, Butterfly Networks
• Multicomputer parallel systems
• E.g., NYU Ultracomputer, CM* Connection Machine, IBM Blue Gene
• Array processors (collocated, tightly coupled, common system clock,
may not have shared memory)
• E.g., DSP applications, image processing
Course ID: SS ZG526, Title: Distributed Computing 9 BITS Pilani, Hyderabad Campus
UMA Model
• Direct access to shared memory
• Access latency - waiting time to complete an
access to any memory location from any
processor
• Access latency is the same for all processors
• Processors
• Remain in close proximity
• Connected by an interconnection network
Course ID: SS ZG526, Title: Distributed Computing 10 BITS Pilani, Hyderabad Campus
UMA Model
• Interprocess communication occurs through read and write operations
on shared memory
• Processors are of same type
• Remain within same physical box/container
• Processors run the same operating system
• Hardware & software are very tightly coupled
• Interconnection network to access memory may be
• Bus
• Multistage switch
Course ID: SS ZG526, Title: Distributed Computing 11 BITS Pilani, Hyderabad Campus
Omega Network
•Multi-stage networks formed by 2 x 2
switching elements
•Data can be sent on any one of the
input wires
•n-input and n-output network uses:
• log2(n) stages
• log2(n) bits for addressing
•n processors, n memory banks
Course ID: SS ZG526, Title: Distributed Computing 12 BITS Pilani, Hyderabad Campus
Omega Network
• (n/2)log2(n) switching elements
of size 2x2
• Interconnection function:
Output i of a stage is connected
to input j of next stage:
• j = 2i, for 0 ≤ i ≤ n/2 – 1
= 2i + 1 – n, for n/2 ≤ i ≤ n – 1
•left-rotation operation on the
binary representation of i to
get j
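The rotation rule can be checked mechanically. Below is a small C sketch (not part of the slides; function names are illustrative) that computes j for every input i of an 8 x 8 omega network, both from the piecewise formula and by left-rotating the log2(n)-bit label of i.

#include <stdio.h>

/* Omega-network interconnection: output i of a stage connects to input j
 * of the next stage, where j is a 1-bit left rotation of i's log2(n)-bit label. */
static unsigned omega_next(unsigned i, unsigned n)
{
    unsigned bits = 0;
    while ((1u << bits) < n) bits++;              /* bits = log2(n) */
    unsigned msb = (i >> (bits - 1)) & 1u;        /* bit that wraps around */
    return ((i << 1) | msb) & (n - 1);            /* rotate left by one */
}

int main(void)
{
    unsigned n = 8;                               /* 8 x 8 omega network */
    for (unsigned i = 0; i < n; i++) {
        /* piecewise form from the slide: j = 2i (i < n/2), j = 2i + 1 - n otherwise */
        unsigned j = (i < n / 2) ? 2 * i : 2 * i + 1 - n;
        printf("i = %u -> j = %u (rotation gives %u)\n", i, j, omega_next(i, n));
    }
    return 0;
}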
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
References
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 1, “Distributed
Computing: Principles, Algorithms, and Systems”, Cambridge
University Press, 2008.
• https://medium.com/@rashmishehana_48965/middleware-technologies-e5def95da4e
Course ID: SS ZG526, Title: Distributed Computing 14 BITS Pilani, Hyderabad Campus
Thank You
Course ID: SS ZG526, Title: Distributed Computing 15 BITS Pilani, Hyderabad Campus
Distributed Computing
Introduction
Dr. Barsha Mitra
CSIS Dept, BITS Pilani, Hyderabad Campus
BITS Pilani
Hyderabad Campus
1
Multicomputer Parallel Systems
• multiple processors do not have direct access to shared memory
• usually do not have a common clock
• processors are in close physical proximity
• very tightly coupled (homogeneous hardware and software)
• connected by an interconnection network
• communicate either via a common address space or via message-passing
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
https://en.wikipedia.org/wiki/Parallel_computing#/media/File:IBM_Blue_Gene_P_supercomputer.jpg
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
https://www.cs.iusb.edu/~danav/teach/b424/b424_24_networks.html
NUMA Model
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
Multicomputer Parallel Systems
processor + memory
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Multicomputer Parallel Systems
• k-dimensional hypercube
has 2k processor-and-
memory units (nodes)
• each dimension is
associated with a bit
position in the label
• labels of two adjacent
nodes differ in the kth bit
position for dimension k
•length of the shortest path between any
two processors = Hamming distance
between their labels
• Hamming distance <= k
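Since the shortest-path length between two hypercube nodes is the Hamming distance between their labels, it can be computed directly from the labels; a minimal C sketch (mine, not from the slides):

#include <stdio.h>

/* Hamming distance between two hypercube node labels:
 * the number of bit positions in which the labels differ,
 * which is also the length of the shortest path between the nodes. */
static int hamming(unsigned a, unsigned b)
{
    unsigned x = a ^ b;   /* differing bits */
    int d = 0;
    while (x) { d += x & 1u; x >>= 1; }
    return d;
}

int main(void)
{
    /* 4-dimensional hypercube (k = 4, 16 nodes): e.g. nodes 0101 and 1110 */
    unsigned u = 0x5, v = 0xE;
    printf("shortest path between 0101 and 1110 = %d hops\n", hamming(u, v));
    return 0;
}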
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Multicomputer Parallel Systems
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Parallelism/Speedup
• measure of the relative speedup of a specific program on a
given machine
• depends on
• number of processors
• mapping of the code to the processors
• speedup = T(1)/T(n)
• T(1) = time with a single processor
• T(n) = time with n processors
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Parallel/Distributed Program
Concurrency
• measure of concurrency = (number of local operations, i.e., non-communication and non-shared-memory-access operations) / (total number of operations, including the communication or shared memory access operations)
Course ID: SS ZG526, Title: Distributed Computing 9 BITS Pilani, Hyderabad Campus
Distributed Communication Models
Course ID: SS ZG526, Title: Distributed Computing 10 BITS Pilani, Hyderabad Campus
Publish/ Subscribe Model
• allows any number of publishers to
communicate with any number of subscribers
asynchronously
• Used in serverless and microservices
architecture
• form of multicasting
• events are produced by publishers
• interested subscribers are notified when the
events are available
• information is provided as notifications
• subscribers continue their tasks until they
receive notifications
• Oracle, AWS, Azure Web PubSub service
Course ID: SS ZG526, Title: Distributed Computing 11 BITS Pilani, Hyderabad Campus
References
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 1, “Distributed Computing: Principles,
Algorithms, and Systems”, Cambridge University Press, 2008.
• https://medium.com/@rashmishehana_48965/middleware-technologies-e5def95da4e
• http://wiki.c2.com/?PublishSubscribeModel
• https://www.slideshare.net/ishraqabd/publish-subscribe-model-overview-13368808/5
• https://www.bmc.com/blogs/pub-sub-publish-subscribe/
• https://aws.amazon.com/pub-sub-messaging/
Course ID: SS ZG526, Title: Distributed Computing 12 BITS Pilani, Hyderabad Campus
Thank You
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
Distributed Computing
A Model of Distributed Computations
Dr. Barsha Mitra
CSIS Dept, BITS Pilani, Hyderabad Campus
BITS Pilani
Hyderabad Campus
Preliminaries
•distributed system consists of a set of processors
•connected by a communication network
•communication delay is finite but unpredictable
•processors do not share a common global memory
•asynchronous message passing
•no common physical global clock
•communication medium may deliver messages out of order
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
A Distributed Program
•Cij : channel from process pi to process pj
•mij: message sent by pi to pj
•Events –
•Internal Events
•Message Send Events
•Message Receive Events
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
Space-Time Diagram
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
A Model of Distributed Executions
Causal Precedence/Dependency or Happens Before Relation
•ei → ej : event ei causally precedes event ej
•ei → ej holds iff there is a path from ei to ej in the space-time diagram, i.e., a chain of events related by program order and message send-receive pairs
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Models of Communication Networks
• FIFO model:
• each channel acts as a first-in first-out message queue
• message ordering is preserved by channel
• Non-FIFO model
• channel acts like a set
• sender process adds messages to channel
• receiver process removes messages from it
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Models of Communication Networks
•Causal ordering model is based on Lamport’s “happens before”
relation
•A system supporting causal ordering model satisfies :
CO: for mij and mkj , if send(mij ) → send(mkj ),
then rec(mij ) → rec(mkj )
•Causally ordered delivery of messages implies FIFO message delivery
•CO ⊂ FIFO ⊂ Non-FIFO
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Reference
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 2, “Distributed Computing: Principles, Algorithms, and Systems”, Cambridge University Press, 2008.
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Course ID: SS ZG526, Title: Distributed Computing 9 BITS Pilani, Hyderabad Campus
Distributed Computing
Logical Time
Dr. Barsha Mitra
CSIS Dept, BITS Pilani, Hyderabad Campus
BITS Pilani
Hyderabad Campus
Introduction
•not possible to have global physical time
•realize only an approximation of physical time using logical time
•logical time advances in jumps
•captures monotonicity property
• monotonicity property: if event a causally affects event b, then timestamp
of a is smaller than timestamp of b
• Ways to represent logical time:
•scalar time
•vector time
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
Logical Clocks
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
Scalar Time - Definition
•Ci:
•integer variable
•denotes logical clock of pi
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
Scalar Time - Definition
Rules for updating clock:
•R1: Before executing an event (send, receive, or internal), process pi
executes Ci := Ci + d (d > 0; typically d = 1)
•R2: When pi receives a message with timestamp Cmsg, it sets Ci := max(Ci, Cmsg), then executes R1, and then delivers the message
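A minimal C sketch of rules R1 and R2 (assuming d = 1; the function names are illustrative, not from the text):

#include <stdio.h>

/* Lamport's scalar clock rules, with d = 1. */
static int clock_event(int Ci)                  /* R1: internal or send event */
{
    return Ci + 1;
}

static int clock_receive(int Ci, int Cmsg)      /* R2 followed by R1 on receipt */
{
    return ((Ci > Cmsg) ? Ci : Cmsg) + 1;
}

int main(void)
{
    int C1 = 0, C2 = 0;

    C1 = clock_event(C1);            /* p1: internal event, C1 = 1 */
    C1 = clock_event(C1);            /* p1: send m with timestamp 2 */
    int ts = C1;
    C2 = clock_event(C2);            /* p2: internal event, C2 = 1 */
    C2 = clock_receive(C2, ts);      /* p2: receive m, C2 = max(1, 2) + 1 = 3 */

    printf("C1 = %d, C2 = %d\n", C1, C2);
    return 0;
}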
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Evolution of Scalar Time
[Figure: space-time diagram of processes p1-p4 (Processes vs. Time) on which scalar time evolves]
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Evolution of Scalar Time
[Figure: space-time diagram showing the scalar timestamps assigned to the events of processes p1-p4]
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Scalar Time – Basic Properties
•Consistency
•Total Ordering
•Event Counting
•No Strong Consistency
•From Recorded Lecture
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Scalar Time – Basic Properties
• d = 1
• scalar timestamp of an event = height of the event
[Figure: the same execution, annotated with event heights]
Course ID: SS ZG526, Title: Distributed Computing 9 BITS Pilani, Hyderabad Campus
Vector Time - Definition
• time domain is represented by a set of n-dimensional non-negative integer vectors
• each process pi maintains a vector vti[1..n]
• vti[i] is pi's own local logical clock
• vti[j], j ≠ i, is pi's latest knowledge of pj's local time
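A minimal C sketch of the vector clock update rules for n = 4 processes (assuming d = 1; names are illustrative, not from the text):

#include <stdio.h>

#define N 4   /* number of processes */

/* Vector clock rules (d = 1): on a local or send event, pi increments vti[i];
 * on receipt, pi takes the component-wise max with the piggybacked vector
 * and then increments its own component. */
static void vc_event(int vt[N], int i)
{
    vt[i] += 1;
}

static void vc_receive(int vt[N], int i, const int msg[N])
{
    for (int k = 0; k < N; k++)
        if (msg[k] > vt[k]) vt[k] = msg[k];
    vt[i] += 1;
}

static void vc_print(const char *who, const int vt[N])
{
    printf("%s = [%d %d %d %d]\n", who, vt[0], vt[1], vt[2], vt[3]);
}

int main(void)
{
    int vt1[N] = {0}, vt2[N] = {0};

    vc_event(vt1, 0);                 /* p1: event, vt1 = [1 0 0 0] */
    vc_event(vt1, 0);                 /* p1: send m, vt1 = [2 0 0 0] */
    vc_event(vt2, 1);                 /* p2: event, vt2 = [0 1 0 0] */
    vc_receive(vt2, 1, vt1);          /* p2: receive m, vt2 = [2 2 0 0] */

    vc_print("vt1", vt1);
    vc_print("vt2", vt2);
    return 0;
}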
Course ID: SS ZG526, Title: Distributed Computing 10 BITS Pilani, Hyderabad Campus
Vector Time - Definition
[Figure: space-time diagram of processes p1-p4 used to illustrate vector timestamps]
Course ID: SS ZG526, Title: Distributed Computing 11 BITS Pilani, Hyderabad Campus
Vector Time - Definition
[Figure: space-time diagram with the vector timestamp of each event on processes p1-p4]
Course ID: SS ZG526, Title: Distributed Computing 12 BITS Pilani, Hyderabad Campus
Vector Time
• Comparing Vector timestamps from Recorded Lecture
• Vector Time properties from Recorded Lecture
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
Vector Time – Basic Properties
Event counting
• if d is always 1 and vti[i] is the ith component of vector clock at process pi
• then vti[i] = no. of events that have occurred at pi until that instant
Course ID: SS ZG526, Title: Distributed Computing 14 BITS Pilani, Hyderabad Campus
Vector Time – Basic Properties
Event counting
•if an event e has timestamp vh
• vh[j] = number of events executed by process pj that causally
precede e
• Σ vh[j] −1 : total no. of events that causally precede e in the
distributed computation
Course ID: SS ZG526, Title: Distributed Computing 15 BITS Pilani, Hyderabad Campus
Singhal–Kshemkalyani’s Differential
Technique
[Figure: space-time diagram of processes p1-p4 used to illustrate the differential technique]
Course ID: SS ZG526, Title: Distributed Computing 16 BITS Pilani, Hyderabad Campus
Singhal–Kshemkalyani’s Differential
Technique
[Figure: execution on p1-p4 in which each message carries only the set of (process, latest value) pairs that changed since the last message to that destination, e.g., {(1, 1)} or {(3, 3), (4, 1)}, rather than the full vector timestamp]
Course ID: SS ZG526, Title: Distributed Computing 17 BITS Pilani, Hyderabad Campus
Fowler–Zwaenepoel’s Direct-
Dependency Technique
Course ID: SS ZG526, Title: Distributed Computing 18 BITS Pilani, Hyderabad Campus
Fowler–Zwaenepoel’s Direct-
Dependency Technique
[Figure: execution in which each message carries only the sender's own latest event count (its direct dependency), e.g., {1}, {3}, {6}; the transitive dependencies are reconstructed offline]
Course ID: SS ZG526, Title: Distributed Computing 19 BITS Pilani, Hyderabad Campus
Reference
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 3, “Distributed
Computing: Principles, Algorithms, and Systems”, Cambridge
University Press, 2008.
Course ID: SS ZG526, Title: Distributed Computing 20 BITS Pilani, Hyderabad Campus
Course ID: SS ZG526, Title: Distributed Computing 21 BITS Pilani, Hyderabad Campus
Distributed Computing
Global State & Snapshot Recording
Algorithms
Dr. Barsha Mitra
BITS Pilani CSIS Dept, BITS Pilani, Hyderabad Campus
Hyderabad Campus
Introduction
•problems in recording global state
•lack of a globally shared memory
•lack of a global clock
•message transmission is asynchronous
•message transfer delays are finite but unpredictable
Course No.- SS ZG526, Course Title - Distributed Computing 2 BITS Pilani, Hyderabad Campus
A Distributed Program
•Cij : channel from process pi to process pj
•mij: message sent by pi to pj
•Global state of a distributed computation consists of
•states of the processes
•states of the communication channels
•State of a process is characterized by its local memory state
•Set of messages in transit in the channel characterizes state of channel
• occurrence of events
•changes processes' states
•changes channels' states
•causes transition in global system state
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
System Model
Course No.- SS ZG526, Course Title - Distributed Computing 4 BITS Pilani, Hyderabad Campus
A Consistent Global State
• global state GS is a consistent global state iff:
• C1: every message mij that is recorded as sent in the local state of
sender pi must be captured either in the state of Cij or in the
collected local state of the receiver pj
• C1 states the law of conservation of messages
• C2: send(mij) ∉ LSi ⇒ mij ∉ SCij ∧ rec(mij) ∉ LSj
• Condition C2 states that for every effect, its cause must be present
Course No.- SS ZG526, Course Title - Distributed Computing 5 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Marker sending rule for process pi
(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends
a marker along C before pi sends further messages along C.
Marker receiving rule for process pj
On receiving a marker along channel C:
if pj has not recorded its state then
Record the state of C as the empty set
Execute the “marker sending rule”
else
Record the state of C as the set of messages received along C after pj’s
state was recorded and before pj received the marker along C
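The two rules can be expressed compactly in code. Below is a sketch (mine, not the book's) of the marker rules as seen by one process pj with a fixed number of FIFO channels; application messages are reduced to integers and marker sending is just printed.

#include <stdio.h>

#define IN_CH  2     /* incoming channels of this process */
#define OUT_CH 2     /* outgoing channels of this process */
#define MAX    16

/* Per-process snapshot bookkeeping for the Chandy-Lamport rules. */
typedef struct {
    int recorded;                 /* has pj recorded its local state? */
    int local_state;
    int recording[IN_CH];         /* still recording incoming channel c? */
    int chan_state[IN_CH][MAX];   /* messages recorded for channel c */
    int chan_len[IN_CH];
} process;

static void send_marker_on_all_outgoing(void)
{
    for (int c = 0; c < OUT_CH; c++)
        printf("send MARKER on outgoing channel %d\n", c);
}

/* Marker sending rule: record local state, then send a marker on every
 * outgoing channel before any further application messages. */
static void record_state(process *p, int app_state)
{
    p->recorded = 1;
    p->local_state = app_state;
    send_marker_on_all_outgoing();
    for (int c = 0; c < IN_CH; c++)     /* start recording all incoming channels */
        p->recording[c] = 1;
}

/* Marker receiving rule for a marker arriving on incoming channel c. */
static void on_marker(process *p, int c, int app_state)
{
    if (!p->recorded) {
        record_state(p, app_state);     /* first marker: record state ...       */
        p->chan_len[c] = 0;             /* ... and record channel c as empty    */
    }
    p->recording[c] = 0;                /* either way, channel c is now closed  */
}

/* Application message m arriving on incoming channel c. */
static void on_message(process *p, int c, int m)
{
    if (p->recorded && p->recording[c])
        p->chan_state[c][p->chan_len[c]++] = m;   /* in-transit message */
}

int main(void)
{
    process pj = {0};
    on_marker(&pj, 0, 42);    /* marker on channel 0: record state = 42, SC0 = {} */
    on_message(&pj, 1, 7);    /* message on channel 1 while still recording it    */
    on_marker(&pj, 1, 42);    /* marker on channel 1: SC1 = {7}                   */
    printf("local state = %d, |SC1| = %d\n", pj.local_state, pj.chan_len[1]);
    return 0;
}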
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Putting together recorded snapshots - each process can send its local
snapshot to the initiator of the algorithm
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
[Figure: example run with two sites, S1 holding account A and S2 holding account B; A's recorded balance evolves 500, 450, 450, 350, 370, 370 while transfers of 100, 50, and 20 move between the sites]
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Snapshot Algorithms for Non-FIFO
Channels
• a marker cannot be used to delineate messages into those to be recorded in the global state and those not to be recorded
Course ID: SS ZG526, Title: Distributed Computing 9 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
• coloring scheme
• every process is initially white
• process turns red while taking a snapshot
• equivalent of the “marker sending rule” is executed when
a process turns red
• every message sent by a white process is colored white
• a white message is a message that was sent before the
sender of that message recorded its local snapshot
Course ID: SS ZG526, Title: Distributed Computing 10 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Course ID: SS ZG526, Title: Distributed Computing 11 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
• every white process records a history of all white messages sent or
received by it along each channel
• when a process turns red, it sends these histories along with its
snapshot to the initiator process that collects the global snapshot
• initiator process evaluates transit(LSi, LSj) to compute the state of a
channel Cij as:
SCij = {white messages sent by pi on Cij} − {white messages
received by pj on Cij}
= {mij | send(mij) ∈ LSi} − {mij | rec(mij) ∈ LSj}
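Once the initiator has the white-message histories, SCij is a plain set difference; a toy C sketch (the integer message ids and names are illustrative assumptions, not from the text):

#include <stdio.h>

#define MAX 16

/* Lai-Yang: the initiator computes the state of channel Cij as the white
 * messages recorded as sent by pi minus those recorded as received by pj. */
static int channel_state(const int sent[], int ns,
                         const int recv[], int nr,
                         int out[])
{
    int k = 0;
    for (int i = 0; i < ns; i++) {
        int found = 0;
        for (int j = 0; j < nr; j++)
            if (sent[i] == recv[j]) { found = 1; break; }
        if (!found) out[k++] = sent[i];   /* sent white, not yet received: in transit */
    }
    return k;
}

int main(void)
{
    int sent[] = {1, 2, 3};   /* white messages pi sent on Cij      */
    int recv[] = {1};         /* white messages pj received on Cij  */
    int sc[MAX];
    int n = channel_state(sent, 3, recv, 1, sc);
    printf("SCij = {");
    for (int i = 0; i < n; i++) printf(" m%d", sc[i]);
    printf(" }\n");
    return 0;
}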
Course ID: SS ZG526, Title: Distributed Computing 12 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
[Figure: example run with processes P1 (variable X) and P2 (variable Y); X's recorded value evolves 800, 780, 750, 740, 740, 740, 770, 770, 790 while white and red messages carrying 30, 10, 20, 20, and 30 are exchanged]
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
Reference
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 4,
“Distributed Computing: Principles, Algorithms, and Systems”,
Cambridge University Press, 2008.
Course No.- SS ZG526, Course Title - Distributed Computing 14 BITS Pilani, Hyderabad Campus
Course No.- SS ZG526, Course Title - Distributed Computing 15 BITS Pilani, Hyderabad Campus
Message ordering
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
Causal Order
•A system supporting causal ordering model satisfies
CO: for mij and mkj , if send(mij ) → send(mkj ),
then rec(mij ) → rec(mkj )
•2 criteria must be satisfied by a causal ordering
protocol:
•Safety
•Liveness
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
Causal Order
Safety
•a message M arriving at a process may need to be buffered until all
system-wide messages sent in the causal past of the send(M) event to the
same destination have already arrived
•distinction is made between
•arrival of a message at a process
•event at which the message is given to the application process
Liveness
A message that arrives at a process must eventually be delivered to the
process
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
Raynal–Schiper–Toueg Algorithm
• each message M should carry a log of
• all other messages
• sent causally before M’s send event, and sent to the same
destination dest(M)
• log can be examined to ensure whether it is safe to deliver a
message
• channels are FIFO
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Raynal–Schiper–Toueg Algorithm
local variables:
• array of int SENT[1..n, 1..n] (n x n array)
• SENTi[j, k] = no. of messages sent by Pj to Pk as known to Pi
• array of int DELIV[1..n]
• DELIVi[j] = no. of messages from Pj that have been delivered to Pi
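A sketch of the resulting delivery test, under my reading of the algorithm: a message from Pj carrying the sender's SENT array (ST) is delivered at Pi only when, for every Pk, Pi has already been delivered at least ST[k, i] messages from Pk. Illustrative C, 0-based indices:

#include <stdio.h>

#define N 3   /* number of processes */

/* Raynal-Schiper-Toueg delivery test at process Pi. */
static int can_deliver(const int DELIV[N], const int ST[N][N], int i)
{
    for (int k = 0; k < N; k++)
        if (DELIV[k] < ST[k][i]) return 0;   /* a causally earlier message is missing */
    return 1;
}

int main(void)
{
    /* ST[j][k] = messages sent by Pj to Pk, as known to the sender */
    int ST[N][N] = { {0, 3, 4}, {5, 0, 2}, {0, 4, 0} };
    int DELIV2[N] = {3, 0, 4};   /* what P2 (index 1) has already been delivered */

    printf("P2 may deliver: %s\n", can_deliver(DELIV2, ST, 1) ? "yes" : "buffer");
    return 0;
}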
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Raynal–Schiper–Toueg Algorithm
Course ID: SS ZG526, Title: Distributed Computing 7 BITS Pilani, Hyderabad Campus
Raynal–Schiper–Toueg Algorithm
Complexity:
•space requirement at each process: O(n2) integers
•space overhead per message: n2 integers
•time complexity at each process for each send and deliver
event: O(n2)
Course ID: SS ZG526, Title: Distributed Computing 8 BITS Pilani, Hyderabad Campus
Raynal–Schiper–Toueg Algorithm
Worked example: suppose P1 has sent 3 messages to P2 and 4 to P3, P2 has sent 5 to P1 and 2 to P3, and P3 has sent 4 to P2, so every process that knows of these sends holds
SENT = 0 3 4
       5 0 2
       0 4 0
(SENT[j, k] = number of messages sent by Pj to Pk), with delivery counts DELIV1 = [0 4 0], DELIV2 = [3 0 4], DELIV3 = [3 2 0].
The example then traces three sends: P1 sends m1 to P2 (piggybacking SENT and updating SENT1[1, 2] to 4), P3 sends m2 to P2 (updating SENT3[3, 2] to 5), and P1 sends m3 to P3 (updating SENT1[1, 3] to 5). Before delivering each message, the receiver checks its DELIV vector against the piggybacked SENT: P2 delivers m1 only if DELIV2[1] >= ST[1, 2] and DELIV2[3] >= ST[3, 2], and P3 delivers m3 only if DELIV3[1] >= ST[1, 3] and DELIV3[2] >= ST[2, 3].
Course ID: SS ZG526, Title: Distributed Computing 11 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
• Ci = vector clock of Pi
• Ci[j] = jth element of Ci
Pi sends a message m to Pj
• Pi increments Ci[i] and piggybacks the vector timestamp tm = Ci on m
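A C sketch of the delivery condition used by the protocol (assuming the standard BSS rule; the names and the three-process scenario mirror the figures that follow):

#include <stdio.h>

#define N 3   /* number of processes */

/* BSS delivery test at Pj for a broadcast m sent by Pi with timestamp tm:
 * deliver only when Pj has delivered every earlier broadcast of Pi
 * (Cj[i] = tm[i] - 1) and everything from the other processes that the
 * sender had seen (Cj[k] >= tm[k] for k != i). */
static int bss_can_deliver(const int Cj[N], const int tm[N], int i)
{
    if (Cj[i] != tm[i] - 1) return 0;
    for (int k = 0; k < N; k++)
        if (k != i && Cj[k] < tm[k]) return 0;
    return 1;
}

int main(void)
{
    int C1[N]  = {0, 0, 0};       /* P1's vector clock */
    int t_b[N] = {0, 0, 1};       /* broadcast b from P3, timestamp (0 0 1) */
    int t_c[N] = {0, 1, 1};       /* broadcast c from P2, timestamp (0 1 1) */

    /* c arrives at P1 before b: it must be buffered */
    printf("deliver c (0 1 1)? %s\n", bss_can_deliver(C1, t_c, 1) ? "yes" : "buffer");
    printf("deliver b (0 0 1)? %s\n", bss_can_deliver(C1, t_b, 2) ? "yes" : "buffer");

    C1[2] = 1;                    /* after delivering b, C1 = (0 0 1) */
    printf("deliver c now?    %s\n", bss_can_deliver(C1, t_c, 1) ? "yes" : "buffer");
    return 0;
}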
Course ID: SS ZG526, Title: Distributed Computing 12 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
[Figure: P1, P2 and P3 all start with vector clock (0 0 0); P3 broadcasts a message with timestamp (0 0 1)]
Course ID: SS ZG526, Title: Distributed Computing 14 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
[Figure: P2 delivers the broadcast b timestamped tb = (0 0 1) because C2[3] = tb[3] − 1, C2[1] >= tb[1] and C2[2] >= tb[2]; its clock becomes (0 0 1)]
Course ID: SS ZG526, Title: Distributed Computing 15 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
[Figure: P2 then broadcasts a message timestamped (0 1 1); P3 delivers it because C3[2] = tr[2] − 1, C3[1] >= tr[1] and C3[3] >= tr[3]]
Course ID: SS ZG526, Title: Distributed Computing 16 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
Course ID: SS ZG526, Title: Distributed Computing 17 BITS Pilani, Hyderabad Campus
Birman-Schiper-Stephenson Protocol
[Figure: at P1 the message timestamped (0 1 1) arrives before the one timestamped (0 0 1), so it is buffered and delivered from the buffer only after (0 0 1) has been delivered]
SES Protocol from Recorded Lecture
Course ID: SS ZG526, Title: Distributed Computing 18 BITS Pilani, Hyderabad Campus
References
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 6,
“Distributed Computing: Principles, Algorithms, and Systems”,
Cambridge University Press, 2008.
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 7,
“Distributed Computing: Principles, Algorithms, and Systems”,
Cambridge University Press, 2008.
• https://courses.csail.mit.edu/6.006/fall11/rec/rec14.pdf
Course ID: SS ZG526, Title: Distributed Computing 19 BITS Pilani, Hyderabad Campus
Course ID: SS ZG526, Title: Distributed Computing 20 BITS Pilani, Hyderabad Campus
Distributed Computing
Terminology & Basic Algorithms
Dr. Barsha Mitra
BITS Pilani CSIS Dept, BITS Pilani, Hyderabad Campus
Hyderabad Campus
Notation & Definitions
• undirected unweighted graph G = (N, L)
represents topology
• vertices are nodes
• edges are channels
• n = |N|
• l = |L|
• diameter of a graph
• minimum number of edges that need to be traversed to go from any node to any other node
• diameter = maxi, j ∈ N {length of the shortest path between i and j}
Example (nodes A-F); shortest-path lengths between node pairs:
length 1: (A, B), (A, D), (A, E), (B, C), (C, E), (C, F), (D, E), (E, F)
length 2: (A, C), (A, F), (B, D), (B, E), (B, F), (C, D), (D, F)
so the diameter of this graph is 2
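The diameter of the example graph can be computed by running a breadth-first search from every node; a small C sketch (not part of the slides) using the length-1 pairs above as the edge set:

#include <stdio.h>
#include <string.h>

#define N 6   /* nodes A..F */

/* Eccentricity of src: largest shortest-path length from src, by BFS. */
static int bfs_ecc(const int adj[N][N], int src)
{
    int dist[N], queue[N], head = 0, tail = 0, ecc = 0;
    memset(dist, -1, sizeof dist);
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < N; v++)
            if (adj[u][v] && dist[v] < 0) {
                dist[v] = dist[u] + 1;
                if (dist[v] > ecc) ecc = dist[v];
                queue[tail++] = v;
            }
    }
    return ecc;
}

int main(void)
{
    enum { A, B, C, D, E, F };
    int adj[N][N] = {0};
    int edges[][2] = {{A,B},{A,D},{A,E},{B,C},{C,E},{C,F},{D,E},{E,F}};
    for (unsigned i = 0; i < sizeof edges / sizeof edges[0]; i++) {
        adj[edges[i][0]][edges[i][1]] = 1;
        adj[edges[i][1]][edges[i][0]] = 1;
    }
    int diameter = 0;
    for (int u = 0; u < N; u++) {
        int e = bfs_ecc(adj, u);
        if (e > diameter) diameter = e;
    }
    printf("diameter = %d\n", diameter);   /* 2 for this graph */
    return 0;
}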
Course ID: SS ZG526, Title: Distributed Computing 2 BITS Pilani, Hyderabad Campus
Synchronous Single-Initiator Spanning
Tree Algorithm using Flooding
• algorithm proceeds in rounds (synchronous)
• root initiates the algorithm
• flooding of QUERY messages
• produce a spanning tree rooted at the root node
• each process Pi (Pi ≠ root) should output its own parent for the
spanning tree
Course ID: SS ZG526, Title: Distributed Computing 3 BITS Pilani, Hyderabad Campus
Synchronous Single-Initiator Spanning
Tree Algorithm using Flooding
Local Variables maintained at each Pi
• int visited
• int depth
• int parent
• set of int Neighbors
Initially at each Pi
• visited = 0
• depth = 0
• parent = NULL
• Neighbors = set of neighbors
Course ID: SS ZG526, Title: Distributed Computing 4 BITS Pilani, Hyderabad Campus
Synchronous Single-Initiator Spanning
Tree Algorithm using Flooding
Algorithm for Pi
Round r = 1
if Pi = root then
visited = 1
depth = 0
send QUERY to Neighbors
if Pi receives a QUERY message then
visited = 1
depth = r
parent = root
plan to send QUERYs to Neighbors at next round
Course ID: SS ZG526, Title: Distributed Computing 5 BITS Pilani, Hyderabad Campus
Synchronous Single-Initiator Spanning
Tree Algorithm using Flooding
Algorithm for Pi
Round r > 1 and r <= diameter
if Pi planned to send in previous round then
Pi sends QUERY to Neighbors
if visited = 0 then
if Pi receives QUERY messages then
visited = 1
depth = r
parent = any randomly selected node from which QUERY was received
plan to send QUERY to Neighbors \ {senders of QUERYs received in r}
Course ID: SS ZG526, Title: Distributed Computing 6 BITS Pilani, Hyderabad Campus
Synchronous Single-Initiator Spanning
Tree Algorithm using Flooding
[Figures: successive rounds of QUERY flooding on the six-node example graph A-F; the number in parentheses next to a node is the round in which it sets visited = 1 and chooses its parent, and the final figure shows the resulting spanning tree]
Course ID: SS ZG526, Title: Distributed Computing 13 BITS Pilani, Hyderabad Campus
Broadcast and Convergecast on a Tree
•A spanning tree is useful for distributing (via a broadcast) and collecting
(via a convergecast) information to/from all the nodes
Broadcast Algorithm:
•BC1:
•The root sends the information to
be broadcast to all its children.
•BC2:
•When a (non-root) node receives
information from its parent, it
copies it and forwards it to its
children.
Course ID: SS ZG526, Title: Distributed Computing 14 BITS Pilani, Hyderabad Campus
Broadcast and Convergecast on a Tree
Convergecast Algorithm:
• CVC1:
• Leaf node sends its report to its parent.
• CVC2:
• At a non-leaf node that is not the root:
When a report is received from all the
child nodes, the collective report is sent
to the parent.
• CVC3:
• At the root: When a report is received
from all the child nodes, the global
function is evaluated using the reports.
Course ID: SS ZG526, Title: Distributed Computing 15 BITS Pilani, Hyderabad Campus
Broadcast and Convergecast on a Tree
Complexity
Course ID: SS ZG526, Title: Distributed Computing 16 BITS Pilani, Hyderabad Campus
Reference
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 5,
“Distributed Computing: Principles, Algorithms, and Systems”,
Cambridge University Press, 2008.
Course ID: SS ZG526, Title: Distributed Computing 17 BITS Pilani, Hyderabad Campus
Course ID: SS ZG526, Title: Distributed Computing 18 BITS Pilani, Hyderabad Campus
Distributed Mutual Exclusion
• Token-based
• Assertion-based
• From Recorded Lecture
• Quorum-based
[Figure: REPLY and RELEASE message exchange in a quorum-based mutual exclusion protocol]
7 sites – S1, S2, S3, S4, S5, S6 and S7, K = 3:
R1 = {S1, S2, S4}
R2 = {S2, S3, S5}
R3 = {S1, S3, S7}
R4 = {S2, S5, S6}
R5 = {S1, S4, S5}
R6 = {S3, S4, S6}
R7 = {S2, S6, S7}
This choice of request sets is not valid (M1: any two request sets intersect; M2: Si ∈ Ri; M4: every site belongs to exactly K request sets):
• R3 ∩ R4 = ∅ and R5 ∩ R7 = ∅, violating M1
• S4 does not belong to R4, violating M2
• S2 belongs to R1, R2, R4 and R7 (four sets) while K = 3, violating M4
• S7 belongs only to R3 and R7 (two sets) while K = 3, violating M4
[Figure: two copies of a tree of nodes N1-N9 illustrating the token-based tree algorithm whose data structures follow]
Data Structures
• HOLDER
• points to self or to an immediate neighbour, indicating the direction in which the privilege (token) currently lies
• USING
• possible values true or false
• indicates if the current node is executing the critical section
• ASKED
• possible values true or false
• indicates if node has sent a request for the privilege
• prevents the sending of duplicate requests for privilege
Data Structures
• REQUEST_Q
• FIFO queue that can contain “self ” or the identities of
immediate neighbors as elements
• REQUEST_Q of a node consists of the identities of
those immediate neighbors that have requested for
privilege but have not yet been sent the privilege
• maximum size of REQUEST_Q of a node is the number
of immediate neighbors + 1 (for “self ”)
(e) For every site Sj whose i.d. is not in the token queue, it
appends its i.d. to the token queue if RNi[j] = LN[j] + 1.
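Rule (e) in code form, as a sketch (the array names follow the slide's notation; indices are 0-based and the sample values are made up):

#include <stdio.h>

#define N 5      /* number of sites */
#define QMAX 16

/* Suzuki-Kasami rule (e): after executing the CS, the token holder scans every
 * site Sj not already in the token queue and appends j if Sj has an
 * outstanding request, i.e., RN[j] = LN[j] + 1. */
static void update_token_queue(const int RN[N], const int LN[N],
                               int Q[QMAX], int *qlen)
{
    for (int j = 0; j < N; j++) {
        int in_queue = 0;
        for (int k = 0; k < *qlen; k++)
            if (Q[k] == j) { in_queue = 1; break; }
        if (!in_queue && RN[j] == LN[j] + 1)
            Q[(*qlen)++] = j;                /* Sj has a pending request */
    }
}

int main(void)
{
    int RN[N] = {2, 1, 3, 0, 1};   /* requests seen by the token holder   */
    int LN[N] = {2, 1, 2, 0, 0};   /* requests already satisfied per site */
    int Q[QMAX] = {0};
    int qlen = 0;

    update_token_queue(RN, LN, Q, &qlen);
    printf("token queue:");
    for (int k = 0; k < qlen; k++) printf(" S%d", Q[k] + 1);
    printf("\n");                  /* S3 and S5 are appended */
    return 0;
}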
Performance
• if a site holds the token, no message is required
• if a site does not hold the token, N messages are required
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Course Overview
Introduction to distributed computing
Logical time
Global snapshot
Causal ordering of messages in group communication
Peer-to-Peer computing
Cluster computing
Grid computing
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Text and References
R2 John F. Buford, Heather Yu, and Eng K. Lua, “P2P Networking and
Applications”, Morgan Kaufmann, 2009 Elsevier Inc.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Golden era in Computing
• Powerful multi-core processors
• General-purpose graphic processors
• Explosion of domain applications
• Proliferation of devices
• Superior software methodologies
• Virtualization leveraging the powerful hardware
• Wider bandwidth for communication
Image source: Cloud Futures 2011, Redmond
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
What is a Distributed System?
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Relation between software components
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Networked vs. distributed systems
Networking OS Distributed OS
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Motivation for Distributed Computing
• Resource sharing
• Access to remote resources
• Increased performance/cost ratio
• Reliability
• Availability, integrity, fault-tolerance
• Scalability
• Modularity and incremental expandability
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Few Examples …
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Continued…
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Introduction: Part 2
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Multiprocessor Vs Multi-computer systems
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Multiprocessors
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Distributed Systems: Pros and Cons
• Advantages
– Communication and resource sharing possible
– Economy : Price-performance
– Reliability & scalability
– Potential for incremental growth
• Disadvantages
– Distribution-aware OSs and applications
– High network connectivity is essential
– Security and privacy
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Design Issues: Scaling
• If possible, do asynchronous communication
– Not always possible if the client has nothing to do
• Alternatively, by moving part of the computation
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
More Design Issues
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Communication Models
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
RPC: Remote Procedure Call
• Issues:
– identifying and accessing the remote procedure
– parameters
– return value
• Sun RPC
• Microsoft’s DCOM
• OMG’s CORBA
• Java RMI
• XML/RPC
• SOAP/.NET
• AJAX (Asynchronous Javascript and XML)
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Remote Procedure Calls (RPC)
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Procedure arguments
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Procedure identification
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Program identifiers
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SUN RPC
struct square_in {
long arg1;
};
struct square_out {
long res1;
};
program SQUARE_PROG {
version SQUARE_VERS {
square_out SQUAREPROC(square_in) = 1;
} = 1;
} = 0x13451111;
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Rpcgen
rpcgen
C Source Code
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Rpcgen continued…
produces:
– square.h header
– square_svc.c server stub
– square_clnt.c client stub
– square_xdr.c XDR conversion routines
Function names derived from IDL function names and version numbers
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Square Client: Client.c
#include "square.h"
#include <stdio.h>
#include <stdlib.h>      /* atol, exit */

int main(int argc, char **argv)
{
    CLIENT *cl;
    square_in in;
    square_out *out;

    if (argc != 3) { printf("usage: client <hostname> <integer>\n"); exit(1); }
    cl = clnt_create(argv[1], SQUARE_PROG, SQUARE_VERS, "tcp");   /* bind to server */
    in.arg1 = atol(argv[2]);
    if ((out = squareproc_1(&in, cl)) == NULL) { printf("Error\n"); exit(1); }
    printf("Result %ld\n", out->res1);
    exit(0);
}
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Square server: server.c
#include "square.h"
#include <stdio.h>

square_out *squareproc_1_svc(square_in *inp, struct svc_req *rqstp)
{
    static square_out outp;              /* static: must outlive this call */
    outp.res1 = inp->arg1 * inp->arg1;   /* square the argument */
    return &outp;
}
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Exe creation
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next module
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Computation Model, and Logical Clocks
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
A distributed program
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
A model of distributed executions
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Models of Communication Networks
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Global state of a distributed system
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Consistent global state
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Distributed Mutual Exclusion
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Performance of DME Algorithms
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Continued…
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
A Centralized Algorithm
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
Lamport's DME
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Lamport’s Distributed Mutual Exclusion
Requesting the critical section.
1. When a site Si wants to enter the CS, it broadcasts a timestamped REQUEST message to all the sites in its request set and places the request on its own request queue.
2. When a site Sj receives the REQUEST from Si, it places the request on its request queue and returns a timestamped REPLY to Si.
Si enters the CS when (i) it has received a message with a timestamp larger than its own request from every other site, and (ii) its own request is at the top of its request queue.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Continued…
Releasing the critical section.
1. Site Si, upon exiting the CS, removes its request from the top of its request
queue and sends a timestamped RELEASE message to all the sites in its
request set.
2. When a site Sj receives a RELEASE message from site Si, it removes Si’s
request from its request queue.
3. When a site removes a request from its request queue, its own request may
become the top of the queue, enabling it to enter the CS. The algorithm
executes CS requests in the increasing order of timestamps.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Correctness
• Suppose that both Si and Sj were in CS at the same time (t).
• Then conditions (i) and (ii) above must hold at both sites at time t: each has received the other's request, yet each finds its own request at the top of its queue. Since the queues are ordered by timestamp, both requests cannot simultaneously be the smaller one, a contradiction.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next Session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Ricart-Agrawala DME
Requesting Site:
– A requesting site Pi sends a message request(ts,i) to all
sites.
Receiving Site:
– Upon reception of a request(ts,i) message, the receiving
site Pj will immediately send a timestamped reply(ts,j)
message if and only if:
• Pj is not requesting or executing the critical section OR
• Pj is requesting the critical section but sent a request
with a higher timestamp than the timestamp of Pi
– Otherwise, Pj will defer the reply message.
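The receiving site's decision can be written as a small predicate; a C sketch (illustrative names; ties broken by site id) assuming Pj knows whether it is requesting or executing the CS and the timestamp of its own pending request:

#include <stdio.h>

/* Ricart-Agrawala: should Pj reply immediately to request(ts, i), or defer? */
typedef struct { long ts; int id; } req;

static int earlier(req a, req b)             /* does a have priority over b? */
{
    return a.ts < b.ts || (a.ts == b.ts && a.id < b.id);
}

static int reply_immediately(req incoming, int requesting, int in_cs, req own)
{
    if (in_cs) return 0;                     /* executing the CS: defer        */
    if (!requesting) return 1;               /* idle: reply at once            */
    return earlier(incoming, own);           /* reply iff the incoming request */
                                             /* has the smaller timestamp      */
}

int main(void)
{
    req own = {10, 2};                       /* Pj = P2 is requesting with ts 10 */
    req incoming = {8, 1};                   /* request(8, 1) arrives from P1    */
    printf("%s\n", reply_immediately(incoming, 1, 0, own) ? "REPLY" : "defer");
    return 0;
}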
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Maekawa’s DME
• Request set of sites Si & Sj: Ri, Rj such that Ri and Rj will have
at least one common site (Sk). Sk mediates conflicts between
Ri and Rj.
• A site can send only one REPLY message at a time, i.e., a site
can send a REPLY message only after receiving a
RELEASE message for the previous REPLY message.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Request set
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Maekawa’s Request set
R1 = { 1, 2, 3, 4 }
R2 = { 2, 5, 8, 11 }
R3 = { 3, 6, 8, 13 }
R4 = { 4, 6, 10, 11 }
R5 = { 1, 5, 6, 7 }
R6 = { 2, 6, 9, 12 }
R7 = { 2, 7, 10, 13 } N=13
R8 = { 1, 8, 9, 10 }
R9 = { 3, 7, 9, 11 }
R10 = { 3, 5, 10, 12 }
R11 = { 1, 11, 12, 13 }
R12 = { 4, 7, 8, 12 }
R13 = { 4, 5, 9, 13 }
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next Session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Maekawa's DME Algorithm
Requesting the critical section
1. A site Si requests access to the CS by sending REQUEST(i)
messages to all the sites in its request set Ri.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example
R1 = {S1, S2, S3}
R2 = {S2, S4, S6}
R3 = {S3, S5, S6}
R4 = {S1, S4, S5}
R5 = {S2, S5, S7}
R6 = {S1, S6, S7}
R7 = {S3, S4, S7}
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Deadlocks
R1 = {S1, S2, S3}
R2 = {S2, S4, S6}
R3 = {S3, S5, S6}
R4 = {S1, S4, S5}
R5 = {S2, S5, S7}
R6 = {S1, S6, S7}
R7 = {S3, S4, S7}
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Deadlock handling
FAILED
A FAILED message from site Si to site Sj indicates that Si
cannot grant Sj's request because it has currently granted
permission to a site with a higher priority request. It tells Sj
that its request is blocked, rather than leaving Sj to assume
that the REPLY is merely delayed.
INQUIRE
An INQUIRE message from Si to Sj indicates that Si would like
to find out from Sj if it has succeeded in locking all the sites in
its request set.
YIELD
A YIELD message from site Si to Sj indicates that Si is
returning the permission to Sj (to yield to a higher priority
request at Sj ).
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next Session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Computation Model, and Logical Clocks
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Logical Clocks: Scalar time and Vector time
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Scalar Time
• Proposed by Lamport as an attempt to totally order events
in a distributed system.
• Time domain is the set of non-negative integers.
• The logical local clock of a process pi and its local view of
the global time are squashed into one integer variable Ci.
• Rules R1 and R2 to update the clocks are as follows:
• R1: Before executing an event (send, receive, or internal),
process pi executes the following:
Ci := Ci + d (d > 0)
In general, every time R1 is executed, d can have a
different value; however, typically d is kept at 1.
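Rule R2 (the receive rule) appears on the next slide; assuming its standard form, Ci := max(Ci, Cmsg) followed by R1, a compact sketch with d = 1 looks like this (class and method names are illustrative):

```python
class ScalarClock:
    """Lamport scalar clock for one process pi (sketch, with d = 1)."""

    def __init__(self):
        self.C = 0                      # Ci: local logical clock / view of global time

    def tick(self):
        """R1: executed before any send, receive, or internal event."""
        self.C += 1
        return self.C

    def send(self):
        """Apply R1 and timestamp the outgoing message with the new value."""
        return self.tick()

    def receive(self, msg_ts):
        """R2 (standard form): Ci := max(Ci, Cmsg), then apply R1 and deliver."""
        self.C = max(self.C, msg_ts)
        return self.tick()

# A send from p1 and the matching receive at p2:
p1, p2 = ScalarClock(), ScalarClock()
ts = p1.send()            # p1's send event gets timestamp 1
print(p2.receive(ts))     # p2's receive event gets timestamp max(0, 1) + 1 = 2
```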
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Scalar Time
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Scalar Time Examples
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Basic Properties
Consistency Property
• Scalar clocks satisfy the monotonicity and hence the
consistency property:
for two events ei and ej, ei → ej ⇒ C(ei) < C(ej).
Total Ordering
• Scalar clocks can be used to totally order events in a distributed
system.
• The main problem in totally ordering events is that two or more
events at different processes may have identical timestamps.
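The usual fix, used in Lamport's total-ordering scheme, is to break ties with process identifiers: order events by the pair (timestamp, process id). A tiny illustration:

```python
# Events as (scalar timestamp, process id) pairs; tuple comparison is
# lexicographic, so equal timestamps are ordered by process id.
events = [(3, 2), (3, 1), (1, 3), (2, 2)]
print(sorted(events))     # [(1, 3), (2, 2), (3, 1), (3, 2)]
```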
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Total Ordering
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Properties. . .
Event counting
• If the increment value d is always 1, scalar time has the
following interesting property: if event e has a timestamp h, then
h-1 represents the minimum logical duration, counted in units of
events, required before producing the event e; we call this the
height of the event e.
• In other words, h-1 events have been produced sequentially
before the event e, regardless of the processes that produced
these events.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Properties. . .
No Strong Consistency
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
Vector clocks
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Computation Model, and Logical Clocks
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Vector Time
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Vector Time
Process pi uses the following two rules R1 and R2 to update its
clock:
• R1: Before executing an event, process pi updates its local
logical time as follows:
vti[i] := vti[i] + d (d > 0)
• R2: Each message m is piggybacked with the vector clock vt of
the sender process at sending time. On the receipt of such a
message (m,vt), process pi executes the following sequence of
actions:
Update its global logical time as follows:
1 ≤ k ≤ n : vti[k] := max(vti[k], vt[k])
Execute R1.
Deliver the message m.
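A direct transcription of R1 and R2 with d = 1; the class name and the send/receive helpers are assumptions made for the example.

```python
class VectorClock:
    """Vector clock of process pi in an n-process system (sketch, d = 1)."""

    def __init__(self, i, n):
        self.i = i
        self.vt = [0] * n               # vti

    def tick(self):
        """R1: increment own component before any event."""
        self.vt[self.i] += 1

    def send(self):
        """Apply R1, then piggyback a copy of vt on the outgoing message."""
        self.tick()
        return list(self.vt)

    def receive(self, msg_vt):
        """R2: component-wise max with the piggybacked vector, then R1, then deliver."""
        self.vt = [max(a, b) for a, b in zip(self.vt, msg_vt)]
        self.tick()

# p0 sends a message to p1 in a 3-process system:
p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
m = p0.send()        # p0's clock becomes [1, 0, 0]
p1.receive(m)
print(p1.vt)         # [1, 1, 0]
```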
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Vector Time Example
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Vector Time
Comparing Vector Timestamps
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Vector Time
Strong Consistency
• The system of vector clocks is strongly consistent; thus, by examining the
vector timestamp of two events, we can determine if the events are causally
related.
• However, Charron-Bost showed that the dimension of vector clocks cannot be
less than n, the total number of processes in the distributed computation, for
this property to hold.
Event Counting
• If d=1 (in rule R1), then the ith component of vector clock at process pi, vti[i],
denotes the number of events that have occurred at pi until that instant.
• So, if an event e has timestamp vh, vh[j] denotes the number of events
executed by process pj that causally precede e. Clearly, Σvh[j] − 1 represents
the total number of events that causally precede e in the distributed
computation.
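Both properties are easy to state in code. The comparison slide above is shown as a figure, so the component-wise test used here is the standard one (vh → vk iff vh ≤ vk component-wise and vh ≠ vk); the function names are illustrative.

```python
def happened_before(vh, vk):
    """vh -> vk iff vh <= vk component-wise and vh != vk (strong consistency)."""
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

def concurrent(vh, vk):
    return not happened_before(vh, vk) and not happened_before(vk, vh)

vh = [2, 3, 1]                          # timestamp of event e, with d = 1
print(sum(vh) - 1)                      # 5 events causally precede e
print(happened_before([1, 2, 0], vh))   # True: [1,2,0] <= [2,3,1] and differs
print(concurrent([0, 0, 2], vh))        # True: neither vector dominates the other
```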
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Efficient Implementations of Vector Clocks
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Singhal-Kshemkalyani’s Differential Technique
• Based on the observation that between successive message sends to the same process, only
a few entries of the vector clock at the sender process are likely to change.
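A simplified sketch of the idea: the sender keeps, per destination, a copy of the vector it last sent there and ships only the entries that changed since then. Keeping a full per-destination copy is an assumption made for clarity; the actual technique uses Last Sent and Last Update vectors to keep the extra storage at O(n) per process.

```python
class DiffVectorClock:
    """Vector clock with Singhal-Kshemkalyani-style differential sends (simplified)."""

    def __init__(self, i, n):
        self.i, self.n = i, n
        self.vt = [0] * n
        self.last_sent = {}                     # dest -> copy of vt at the last send to dest

    def send(self, dest):
        self.vt[self.i] += 1                    # R1
        old = self.last_sent.get(dest, [0] * self.n)
        # Piggyback only the entries that changed since the previous send to dest.
        diff = {k: v for k, v in enumerate(self.vt) if v != old[k]}
        self.last_sent[dest] = list(self.vt)
        return diff                             # e.g. {0: 3, 2: 7} instead of the full vector

    def receive(self, diff):
        for k, v in diff.items():               # R2 applied to the sparse update
            self.vt[k] = max(self.vt[k], v)
        self.vt[self.i] += 1                    # R1
```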
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next module
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Global State Recording
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Global state recording
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Uses of Global State Recording
– termination detection
– deadlock detection
…
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Issues in recording a global state
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Difficulties due to Non-determinism
• Deterministic computation: at any point in the computation, there is at
most one event that can happen next.
• Non-deterministic computation: at any point in the computation, there can
be more than one event that can happen next.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example
Non-deterministic
Deterministic
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Consistent / Inconsistent global states
[Space-time diagram: processes p and q exchanging messages m1, m2, and m3, illustrating consistent and inconsistent cuts.]
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Properties of the recorded global state
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Global State Recording
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Snapshot algorithms for FIFO channels
Chandy-Lamport algorithm
• The Chandy-Lamport algorithm uses a control message, called
a marker, whose role in a FIFO system is to separate
messages in the channels.
• After a site has recorded its snapshot, it sends a marker along
all of its outgoing channels before sending out any more
messages.
• A marker separates the messages in a channel into those to
be included in the snapshot and those not to be recorded in
the snapshot.
• A process must record its snapshot no later than when it
receives a marker on any of its incoming channels.
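A compact sketch of the marker-sending and marker-receiving rules for one process; the channel identifiers, the send callback, and where the application state comes from are all assumptions made for the example.

```python
class CLProcess:
    """One process in the Chandy-Lamport snapshot algorithm (sketch)."""

    def __init__(self, pid, in_channels, out_channels, send):
        self.pid = pid
        self.state = None                 # application state, maintained elsewhere
        self.recorded = False
        self.in_channels = in_channels    # ids of incoming channels
        self.out_channels = out_channels  # ids of outgoing channels
        self.send = send                  # send(channel_id, message)
        self.channel_state = {}           # channel id -> in-transit messages recorded
        self.marker_seen = set()          # incoming channels on which a marker arrived

    def record_snapshot(self):
        """Marker sending rule: record the local state, then send markers on all
        outgoing channels before any further application messages."""
        self.recorded = True
        self.local_snapshot = self.state
        for c in self.in_channels:
            if c not in self.marker_seen:
                self.channel_state[c] = []   # start recording this channel
        for c in self.out_channels:
            self.send(c, "MARKER")

    def on_message(self, cid, msg):
        """Marker receiving rule plus recording of in-transit application messages."""
        if msg == "MARKER":
            self.marker_seen.add(cid)
            if not self.recorded:
                self.record_snapshot()
                self.channel_state[cid] = []   # nothing in transit on this channel
            # otherwise: recording on cid simply stops here
        else:
            if self.recorded and cid not in self.marker_seen:
                self.channel_state[cid].append(msg)   # message was in transit
            # deliver msg to the application here
```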
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Chandy-Lamport algorithm continued…
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
State Recording Example
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Global state recording for non-FIFO channels
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Lai-Yang algorithm for non-FIFO channels
The Lai-Yang algorithm uses a coloring scheme on computation
messages that works as follows:
1. Every process is initially white and turns red while taking a
snapshot. The equivalent of the "Marker Sending Rule" is
executed when a process turns red.
2. Every message sent by a white (red) process is colored white
(red).
3. Thus, a white (red) message is a message that was sent before
(after) the sender of that message recorded its local snapshot.
4. Every white process takes its snapshot at its convenience, but
no later than the instant it receives a red message.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Lai-Yang algorithm continued…
5. Every white process records a history of all white messages
sent or received by it along each channel.
6. When a process turns red, it sends these histories along with its
snapshot to the initiator process that collects the global
snapshot.
7. The initiator process evaluates transit(LSi, LSj) to compute the
state of a channel Cij as given below:
SCij = {white messages sent by Pi on Cij} − {white messages
received by Pj on Cij}
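A simplified sketch of the coloring rules and the channel-state computation above; the message-history bookkeeping is kept per process here purely for brevity, and all names are illustrative.

```python
class LYProcess:
    """Lai-Yang coloring for snapshots over non-FIFO channels (simplified sketch)."""

    def __init__(self, pid, send):
        self.pid = pid
        self.color = "white"
        self.send_raw = send              # send(dest, payload) supplied by the caller
        self.white_sent = []              # history of white messages sent (rule 5)
        self.white_received = []          # history of white messages received (rule 5)

    def send(self, dest, msg):
        # Rule 2: every message carries the sender's current colour.
        if self.color == "white":
            self.white_sent.append((self.pid, dest, msg))
        self.send_raw(dest, (self.color, self.pid, dest, msg))

    def take_snapshot(self):
        # Rule 1: turn red while recording the local snapshot.
        self.color = "red"
        self.local_snapshot = "<application state recorded here>"

    def on_message(self, colored_msg):
        color, src, dest, msg = colored_msg
        # Rule 4: a white process must take its snapshot no later than the
        # instant it receives a red message.
        if color == "red" and self.color == "white":
            self.take_snapshot()
        if color == "white":
            self.white_received.append((src, dest, msg))
        # deliver msg to the application here

# Rule 7: the initiator computes the state of channel Cij from the histories as
# the white messages sent by Pi on Cij minus those received by Pj on Cij.
def channel_state(white_sent_on_cij, white_received_on_cij):
    return [m for m in white_sent_on_cij if m not in white_received_on_cij]
```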
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next module
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Causal Ordering of Messages
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Group communication
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Causal ordering of Messages
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Birman- Schiper- Stephenson (BSS) protocol
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Causal Ordering of Messages
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Schiper-Eggli-Sandoz (SES) protocol
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Suzuki-Kasami Broadcast DME
LN[j] is the sequence no. of the request that site Sj executed most recently.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
The DME algorithm
Requesting the critical section.
1. If the requesting site Si does not have the token, then it increments its sequence number,
RNi[i], and sends a REQUEST(i, sn) message to all other sites. (sn is the updated value of
RNi[i].)
2. When a site Sj receives this message, it sets RNj[i] to max(RNj[i], sn). If Sj has the idle
token, it sends the token to Si if RNj[i] = LN[i] + 1.
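A request-side sketch of the two rules above (token release, LN updates, and the token queue management are omitted); the SKSite fields and the send callback are assumptions made for the example.

```python
class SKSite:
    """Suzuki-Kasami site, request side only (illustrative sketch)."""

    def __init__(self, i, n, send, has_token=False):
        self.i, self.n = i, n
        self.send = send                         # send(dest, message)
        self.RN = [0] * n                        # highest request number seen per site
        # The token carries (queue of waiting sites, LN array); one site holds it.
        self.token = ([], [0] * n) if has_token else None
        self.in_cs = False

    def request_cs(self):
        if self.token is not None:
            self.in_cs = True                    # already holds the token: enter directly
            return
        self.RN[self.i] += 1                     # rule 1: new sequence number sn
        sn = self.RN[self.i]
        for j in range(self.n):
            if j != self.i:
                self.send(j, ("REQUEST", self.i, sn))

    def on_request(self, i, sn):
        self.RN[i] = max(self.RN[i], sn)         # rule 2
        if self.token is not None and not self.in_cs:
            queue, LN = self.token
            if self.RN[i] == LN[i] + 1:          # Si's request is outstanding, token is idle
                token, self.token = self.token, None
                self.send(i, ("TOKEN", token))
```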
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Analysis of the DME algorithm
Correctness
Mutual exclusion is trivial: only the site holding the token can execute the CS.
– Theorem:
A requesting site enters the CS in a finite amount of time.
– Proof:
A request enters the token queue in finite time. The queue is
served in FIFO order, and there can be at most N-1 sites ahead
of the request.
Performance
0 or N messages per CS invocation.
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next Session
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
SS ZG 526: Distributed Computing
Distributed Mutual Exclusion (DME)
Chittaranjan Hota
Professor, Dept. of Computer Sc. & Information Systems
BITS Pilani BITS, Pilani Hyderabad Campus, Hyderabad
Hyderabad Campus hota@hyderabad.bits-pilani.ac.in
Raymond's Tree-Based Algorithm
Sites are logically arranged as a tree in which each site's holder variable points to its
neighbour on the directed path toward the site that currently holds the token (the root).
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
The Algorithm
Requesting the critical section.
1. When a site wants to enter the CS, it sends a REQUEST message to the
node along the directed path to the root, provided it does not hold the token and
its request_q is empty. It then adds its own request to its request_q.
2. When a site on the path receives this message, it places the REQUEST in its
request_q and sends a REQUEST message along the directed path to the root,
provided it has not already sent out a REQUEST message on its outgoing edge.
3. When the root site receives a REQUEST message, it sends the token to the
site from which it received the REQUEST message and sets its holder
variable to point at that site.
4. When a site receives the token, it deletes the top entry from its request_q,
sends the token to the site indicated in this entry, and sets its holder variable
to point at that site. If the request_q is nonempty at this point, the site
sends a REQUEST message to the site pointed at by its holder variable
(see the sketch below).
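A sketch of the request path described above; it assumes an idle root for simplicity, leaves out CS release, and uses illustrative class and message names.

```python
class RaymondSite:
    """Raymond's tree-based DME, request path only (illustrative sketch)."""

    def __init__(self, sid, holder, send):
        self.id = sid
        self.holder = holder              # neighbour on the path toward the token (self if root)
        self.send = send                  # send(dest, message)
        self.request_q = []               # FIFO queue of requesting neighbours / self
        self.asked = False                # True if a REQUEST is already outstanding upward

    def request_cs(self):
        if self.holder == self.id:
            return "enter CS"             # holds the token: enter directly (simplified)
        # Rule 1: forward a REQUEST toward the root only if request_q was empty.
        if not self.request_q and not self.asked:
            self.send(self.holder, ("REQUEST", self.id))
            self.asked = True
        self.request_q.append(self.id)

    def on_request(self, frm):
        # Rule 2: enqueue, and forward one REQUEST toward the root if none is pending.
        self.request_q.append(frm)
        if self.holder == self.id:
            self.pass_token()             # rule 3: an idle root hands over the token
        elif not self.asked:
            self.send(self.holder, ("REQUEST", self.id))
            self.asked = True

    def on_token(self):
        self.holder = self.id             # rule 4: the token has arrived
        self.pass_token()

    def pass_token(self):
        # Serve the head of request_q: repoint holder and forward the token.
        nxt = self.request_q.pop(0)
        self.asked = False
        if nxt == self.id:
            return "enter CS"             # this site's own request is at the head
        self.holder = nxt
        self.send(nxt, ("TOKEN",))
        if self.request_q:                # more waiters behind: ask the new holder
            self.send(self.holder, ("REQUEST", self.id))
            self.asked = True
```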
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Continued…
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example1
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Example2
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Analysis
Proof of Correctness
Mutual exclusion is trivial: the token is held by at most one site at any time.
Finite waiting: all the requests in the system form a FIFO queue and the
token is passed in that order.
Performance
O(log N) messages per CS invocation (the average distance between two
nodes in the tree).
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)
Next module
SS ZG 526: Distributed Computing Birla Institute of Technology and Science, Pilani (Deemed University)