Chapter 1: Introduction

Ajay Kshemkalyani and Mukesh Singhal

Distributed Computing: Principles, Algorithms, and Systems

Cambridge University Press


Definition

Autonomous processors communicating over a communication network


Some characteristics
I No common physical clock
I No shared memory
I Geographical separation
I Autonomy and heterogeneity


Distributed System Model

Figure 1.1: A distributed system connects processors by a communication
network (WAN/LAN). Legend: P = processor(s), M = memory bank(s).


Relation between Software Components

Figure 1.2: Interaction of the software components at each process. The
distributed application runs on top of the distributed software (middleware
libraries), which in turn uses the operating system's network protocol stack
(application, transport, network, and data link layers); the distributed
protocols extend across the middleware and the upper layers of the stack.


Motivation for Distributed Systems

Inherently distributed computation


Resource sharing
Access to remote resources
Increased performance/cost ratio
Reliability
I availability, integrity, fault-tolerance
Scalability
Modularity and incremental expandability


Parallel Systems

Multiprocessor systems (direct access to shared memory, UMA model)


I Interconnection network - bus, multi-stage switch
I E.g., Omega, Butterfly, Clos, Shuffle-exchange networks
I Interconnection generation function, routing function
Multicomputer parallel systems (no direct access to shared memory, NUMA
model)
I bus, ring, mesh (with/without wraparound), hypercube topologies
I E.g., NYU Ultracomputer, CM* Connection Machine, IBM Blue Gene
Array processors (colocated, tightly coupled, common system clock)
I Niche market, e.g., DSP applications


UMA vs. NUMA Models

Figure 1.3: Two standard architectures for parallel systems. (a) Uniform
memory access (UMA) multiprocessor system: processors and memory units sit
on opposite sides of the interconnection network. (b) Non-uniform memory
access (NUMA) multiprocessor: processor-plus-memory (PM) nodes on either
side of the interconnection network. In both architectures, the processors
may locally cache data from memory. Legend: M = memory, P = processor.


Omega, Butterfly Interconnects

Figure 1.4: Interconnection networks for shared memory multiprocessor
systems, each connecting processors P0-P7 to memory banks M0-M7 (binary
addresses 000-111). (a) 3-stage Omega network (n = 8, M = 4). (b) 3-stage
Butterfly network (n = 8, M = 4).


Omega Network

n processors, n memory banks


log n stages: with n/2 switches of size 2x2 in each stage
Interconnection function: output i of a stage is connected to input j of the
next stage by the perfect shuffle:
    j = 2i              for 0 ≤ i ≤ n/2 − 1
    j = 2i + 1 − n      for n/2 ≤ i ≤ n − 1
Routing function: at any switch in any stage s, to route to destination j:
if the (s + 1)th MSB of j = 0, route on the upper wire;
else [the (s + 1)th MSB of j = 1], route on the lower wire.
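A minimal Python sketch (ours, not from the book; it assumes n is a power of
2) of the shuffle wiring and the destination-tag routing; note the switch
decisions depend only on the bits of the destination:

    # Perfect shuffle: output i of one stage feeds input j of the next stage.
    def shuffle(i, n):
        return 2 * i if i < n // 2 else 2 * i + 1 - n

    # Destination-tag routing: the decision at stage s is the (s+1)-th MSB
    # of the destination address dst (0 = upper wire, 1 = lower wire).
    def omega_route(dst, n):
        stages = n.bit_length() - 1            # log2(n) stages
        return ["upper" if (dst >> (stages - 1 - s)) & 1 == 0 else "lower"
                for s in range(stages)]

    print([shuffle(i, 8) for i in range(8)])   # [0, 2, 4, 6, 1, 3, 5, 7]
    print(omega_route(5, 8))                   # dst 101 -> ['lower', 'upper', 'lower']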


Interconnection Topologies for Multiprocessors

Figure 1.5: (a) 2-D mesh with wraparound (a.k.a. torus). (b) 4-D hypercube:
the 16 nodes carry 4-bit labels (0000-1111), and adjacent nodes differ in
exactly one bit. Each node consists of a processor plus memory.
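A quick check on the hypercube labeling (our sketch): each node's neighbors
are obtained by flipping exactly one bit of its label.

    # Neighbors of a node in a d-dimensional hypercube: flip each of the d bits.
    def hypercube_neighbors(label, d):
        return [label ^ (1 << b) for b in range(d)]

    print([format(x, "04b") for x in hypercube_neighbors(0b0000, 4)])
    # ['0001', '0010', '0100', '1000']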


Flynn’s Taxonomy

Figure 1.6: SIMD, MISD, and MIMD modes, shown as (a) SIMD, (b) MIMD, and
(c) MISD. Legend: C = control unit, P = processing unit, I = instruction
stream, D = data stream.


SISD: Single Instruction Stream Single Data Stream (traditional)
SIMD: Single Instruction Stream Multiple Data Stream
I scientific applications, applications on large arrays (see the sketch below)
I vector processors, systolic arrays, Pentium/SSE, DSP chips
MISD: Multiple Instruction Stream Single Data Stream
I E.g., visualization
MIMD: Multiple Instruction Stream Multiple Data Stream
I distributed systems, vast majority of parallel systems
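As a small illustration of the SIMD flavor (our example; NumPy stands in for
a vector unit), a single logical instruction operates on many data elements
at once:

    import numpy as np

    a = np.arange(8)
    b = np.ones(8, dtype=int)
    print(a + b)        # one elementwise-add "instruction", eight data elements
                        # -> [1 2 3 4 5 6 7 8]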


Terminology

Coupling
I Interdependency/binding among modules, whether hardware or software (e.g.,
OS, middleware)
Parallelism (speedup): T(1)/T(n) (worked example after this list)
I Function of program and system
Concurrency of a program
I Measures productive CPU time vs. waiting for synchronization operations
Granularity of a program
I Amt. of computation vs. amt. of communication
I Fine-grained program suited for tightly-coupled system
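Worked example (ours): if a program takes T(1) = 100 s on one processor and
T(8) = 20 s on eight, its parallelism is 100/20 = 5 rather than the ideal 8;
the gap typically reflects time spent waiting on synchronization (low
concurrency) and a high communication-to-computation ratio (granularity too
fine for the system's coupling).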


Message-passing vs. Shared Memory

Emulating MP over SM:


I Partition shared address space
I Send/Receive emulated by writing to/reading from a special mailbox per pair of
processes (see the sketch after this list)
Emulating SM over MP:
I Model each shared object as a process
I Write to shared object emulated by sending message to owner process for the
object
I Read from shared object emulated by sending query to owner of shared object
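A minimal sketch (ours, not the book's code) of the first emulation: message
passing over shared memory, with one mailbox in the shared address space per
ordered pair of processes. (The reverse direction would instead wrap each
shared object in an owner process that serves read/write requests.)

    import queue

    N = 3
    # mailbox[(i, j)] holds messages in transit from process i to process j.
    mailbox = {(i, j): queue.Queue() for i in range(N) for j in range(N) if i != j}

    def send(src, dst, msg):
        mailbox[(src, dst)].put(msg)        # Send = write into the shared mailbox

    def receive(src, dst):
        return mailbox[(src, dst)].get()    # Receive = read from it (blocks if empty)

    send(0, 1, "hello")
    print(receive(0, 1))                    # hello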


Classification of Primitives (1)

Synchronous (send/receive)
I Handshake between sender and receiver
I Send completes when Receive completes
I Receive completes when data copied into buffer
Asynchronous (send)
I Control returns to process when data copied out of user-specified buffer


Classification of Primitives (2)

Blocking (send/receive)
I Control returns to invoking process after processing of primitive (whether sync
or async) completes
Nonblocking (send/receive)
I Control returns to process immediately after invocation
I Send: even before data copied out of user buffer
I Receive: even before data may have arrived from sender


Non-blocking Primitive

Send(X, destination, handle_k)    // handle_k is a return parameter
...
...
Wait(handle_1, handle_2, . . . , handle_k, . . . , handle_m)    // Wait always blocks

Figure 1.7: A nonblocking send primitive. When the Wait call returns, at least
one of its parameters is posted.
The return parameter is a system-generated handle
I Use later to check for status of completion of call
I Keep checking (loop or periodically) if handle has been posted
I Issue Wait(handle_1, handle_2, . . .) call with list of handles
I Wait call blocks until one of the stipulated handles is posted
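Python's concurrent.futures offers a close analogue (our analogy, not the
book's primitive): submit() returns a future that plays the role of the
handle, and wait() blocks until at least one handle is posted:

    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
    import time

    def send(data, destination):            # stand-in for a nonblocking Send
        time.sleep(0.1)                     # copy out of user buffer + transmit
        return f"sent {data!r} to {destination}"

    with ThreadPoolExecutor() as pool:
        handles = [pool.submit(send, m, "P1") for m in ("m1", "m2", "m3")]
        done, pending = wait(handles, return_when=FIRST_COMPLETED)
        print(next(iter(done)).result())    # at least one handle is now posted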


Blocking/nonblocking; synchronous/asynchronous; send/receive primitives

Figure 1.8: Illustration of the four send and two receive primitives.
(a) Blocking synchronous Send, blocking Receive. (b) Nonblocking synchronous
Send, nonblocking Receive. (c) Blocking asynchronous Send. (d) Nonblocking
asynchronous Send. The timelines for processes i and j distinguish the
duration to copy data from or to the user buffer from the duration in which
the process issuing the send or receive primitive is blocked.
Legend: S = Send primitive issued; S_C = processing for Send completes;
R = Receive primitive issued; R_C = processing for Receive completes;
P = completion of the previously initiated nonblocking operation;
W = process may issue Wait to check completion of nonblocking operation.


Asynchronous Executions: Message-passing System

Figure 1.9: Asynchronous execution in a message-passing system: processes
P0-P3 exchange messages m1-m7, with internal, send, and receive events
interleaved arbitrarily.


Synchronous Executions: Message-passing System

Figure 1.10: Synchronous execution in a message-passing system: processes
P0-P3 proceed in lockstep through rounds 1, 2, and 3.
In any round/step/phase: (send | internal)∗ (receive | internal)∗
(1) Sync Execution(int k, n) //k rounds, n processes.
(2) for r = 1 to k do
(3) proc i sends msg to (i + 1) mod n and (i − 1) mod n;
(4) each proc i receives msg from (i + 1) mod n and (i − 1) mod n;
(5) compute app-specific function on received values.
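A minimal runnable sketch (ours) of this synchronous execution: threads stand
in for the n processes, a barrier marks the round boundaries, and the
app-specific function is min, so the global minimum spreads around the ring:

    import threading

    n, k = 4, 3
    inbox = [[None] * n for _ in range(n)]     # inbox[dst][src]
    barrier = threading.Barrier(n)
    result = [None] * n

    def proc(i):
        v = i
        for r in range(k):
            for nbr in ((i - 1) % n, (i + 1) % n):
                inbox[nbr][i] = v              # send phase
            barrier.wait()                     # all sends of round r done
            v = min(v, inbox[i][(i - 1) % n], inbox[i][(i + 1) % n])
            barrier.wait()                     # all receives done; next round
        result[i] = v

    threads = [threading.Thread(target=proc, args=(i,)) for i in range(n)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(result)                              # [0, 0, 0, 0] after k >= 2 rounds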


Synchronous vs. Asynchronous Executions (1)

Sync vs async processors; Sync vs async primitives


Sync vs async executions
Async execution
I No processor synchrony, no bound on drift rate of clocks
I Message delays finite but unbounded
I No bound on time for a step at a process
Sync execution
I Processors are synchronized; clock drift rate bounded
I Message delivery occurs in one logical step/round
I Known upper bound on time to execute a step at a process


Synchronous vs. Asynchronous Executions (2)

Difficult to build a truly synchronous system; can simulate this abstraction


Virtual synchrony:
I async execution, processes synchronize as per application requirement;
I execute in rounds/steps
Emulations:
I Async program on sync system: trivial (A is special case of S)
I Sync program on async system: tool called synchronizer


System Emulations

Figure 1.11: Sync ↔ async, and shared memory ↔ msg-passing emulations:
asynchronous message-passing (AMP), synchronous message-passing (SMP),
asynchronous shared memory (ASM), and synchronous shared memory (SSM) are
linked pairwise by the A->S, S->A, MP->SM, and SM->MP emulations.
Assumption: failure-free system
System A emulated by system B:
I If not solvable in B, not solvable in A
I If solvable in A, solvable in B


Challenges: System Perspective (1)

Communication mechanisms: e.g., Remote Procedure Call (RPC), remote
object invocation (ROI), message-oriented vs. stream-oriented
communication
Processes: Code migration, process/thread management at clients and
servers, design of software and mobile agents
Naming: Easy-to-use identifiers needed to locate resources and processes
transparently and scalably
Synchronization
Data storage and access
I Schemes for data storage, search, and lookup should be fast and scalable
across network
I Revisit file system design
Consistency and replication
I Replication for fast access, scalability, avoid bottlenecks
I Require consistency management among replicas


Challenges: System Perspective (2)

Fault-tolerance: correct and efficient operation despite link, node, and
process failures
Distributed systems security
I Secure channels, access control, key management (key generation and key
distribution), authorization, secure group management
Scalability and modularity of algorithms, data, services
Some experimental systems: Globe, Globus, Grid


Challenges: System Perspective (3)

API for communications, services: ease of use


Transparency: hiding implementation policies from user
I Access: hide differences in data representation across systems, provide uniform operations
to access resources
I Location: locations of resources are transparent
I Migration: relocate resources without renaming
I Relocation: relocate resources as they are being accessed
I Replication: hide replication from the users
I Concurrency: mask the use of shared resources
I Failure: reliable and fault-tolerant operation


Challenges: Algorithm/Design (1)

Useful execution models and frameworks: to reason with and design correct
distributed programs
I Interleaving model
I Partial order model
I Input/Output automata
I Temporal Logic of Actions
Dynamic distributed graph algorithms and routing algorithms
I System topology: distributed graph, with only local neighborhood knowledge
I Graph algorithms: building blocks for group communication, data
dissemination, object location
I Algorithms need to deal with dynamically changing graphs
I Algorithm efficiency: also impacts resource consumption, latency, traffic,
congestion


Challenges: Algorithm/Design (2)

Time and global state


I 3D space, 1D time
I Physical time (clock) accuracy
I Logical time captures inter-process dependencies and tracks relative time
progression
I Global state observation: inherent distributed nature of system
I Concurrency measures: concurrency depends on program logic, execution
speeds within logical threads, communication speeds


Challenges: Algorithm/Design (3)

Synchronization/coordination mechanisms
I Physical clock synchronization: hardware drift needs correction
I Leader election: select a distinguished process, breaking the inherent symmetry
I Mutual exclusion: coordinate access to critical resources
I Distributed deadlock detection and resolution: need to observe global state;
avoid duplicate detection, unnecessary aborts
I Termination detection: global state of quiescence; no CPU processing and no
in-transit messages
I Garbage collection: Reclaim objects no longer pointed to by any process


Challenges: Algorithm/Design (4)

Group communication, multicast, and ordered message delivery


I Group: processes sharing a context, collaborating
I Multiple joins, leaves, fails
I Concurrent sends: semantics of delivery order
Monitoring distributed events and predicates
I Predicate: condition on global system state
I Debugging, environmental sensing, industrial process control, analyzing event
streams
Distributed program design and verification tools
Debugging distributed programs


Challenges: Algorithm/Design (5)

Data replication, consistency models, and caching


I Fast, scalable access;
I coordinate replica updates;
I optimize replica placement
World Wide Web design: caching, searching, scheduling
I Global scale distributed system; end-users
I Read-intensive; prefetching over caching
I Object search and navigation are resource-intensive
I User-perceived latency


Challenges: Algorithm/Design (6)

Distributed shared memory abstraction


I Wait-free algorithm design: process completes execution, irrespective of
actions of other processes, i.e., n − 1 fault-resilience
I Mutual exclusion
F Bakery algorithm; semaphores; algorithms based on atomic hardware primitives; fast
algorithms for contention-free access (a Bakery sketch follows this list)
I Register constructions
F Revisit assumptions about memory access
F What behavior under concurrent unrestricted access to memory?
F Foundation for future architectures, decoupled from the underlying technology
(semiconductor, biocomputing, quantum, . . . )
I Consistency models:
F coherence versus access cost trade-off
F Weaker models than strict consistency of uniprocessors
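A minimal sketch (ours) of the Bakery algorithm named above: mutual exclusion
for N threads using only shared read/write variables.

    import threading

    N = 3
    choosing = [False] * N     # choosing[i]: thread i is picking a ticket
    number = [0] * N           # number[i]: i's ticket; 0 means not contending

    def lock(i):
        choosing[i] = True
        number[i] = 1 + max(number)            # take a ticket beyond all seen
        choosing[i] = False
        for j in range(N):
            if j == i:
                continue
            while choosing[j]:                 # wait until j's ticket is visible
                pass
            while number[j] != 0 and (number[j], j) < (number[i], i):
                pass                           # defer to smaller (ticket, id)

    def unlock(i):
        number[i] = 0

    count = 0
    def worker(i):
        global count
        for _ in range(1000):
            lock(i); count += 1; unlock(i)     # critical section: no lost updates

    ts = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
    for t in ts: t.start()
    for t in ts: t.join()
    print(count)                               # 3000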


Challenges: Algorithm/Design (7)

Reliable and fault-tolerant distributed systems


I Consensus algorithms: processes reach agreement in spite of faults (under
various fault models)
I Replication and replica management
I Voting and quorum systems
I Distributed databases, commit: ACID properties
I Self-stabilizing systems: "illegal" system states converge back to a "legal"
state; requires built-in redundancy
I Checkpointing and recovery algorithms: roll back and restart from an earlier
"saved" state
I Failure detectors:
F Difficult to distinguish a "slow" process or message from a failed process or
a message that was never sent
F Algorithms that "suspect" a process as having failed and converge on a
determination of its up/down status
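A minimal timeout-based sketch (ours) of such a failure detector; it is
unreliable by construction, since a slow process looks exactly like a failed
one:

    import time

    TIMEOUT = 2.0
    last_heartbeat = {}                  # process id -> time of last heartbeat

    def heartbeat(pid):                  # called when a heartbeat arrives
        last_heartbeat[pid] = time.monotonic()

    def suspected(pid):                  # True = currently suspected as failed
        t = last_heartbeat.get(pid)
        return t is None or time.monotonic() - t > TIMEOUT

    heartbeat("P1")
    print(suspected("P1"))               # False: heard from P1 recently
    print(suspected("P2"))               # True: never heard from P2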


Challenges: Algorithm/Design (8)

Load balancing: to reduce latency and increase throughput, dynamically; e.g.,
server farms
I Computation migration: relocate processes to redistribute workload
I Data migration: move data, based on access patterns
I Distributed scheduling: across processors
Real-time scheduling: difficult without global view, network delays make task
harder
Performance modeling and analysis: Network latency to access resources
must be reduced
I Metrics: theoretical measures for algorithms, practical measures for systems
I Measurement methodologies and tools


Applications and Emerging Challenges (1)

Mobile systems
I Wireless communication: unit disk model; broadcast medium (MAC), power
management etc.
I CS perspective: routing, location management, channel allocation, localization
and position estimation, mobility management
I Base station model (cellular model)
I Ad-hoc network model (rich in distributed graph theory problems)
Sensor networks: Processor with electro-mechanical interface
Ubiquitous or pervasive computing
I Processors embedded in and seamlessly pervading environment
I Wireless sensor and actuator mechanisms; self-organizing; network-centric,
resource-constrained
I E.g., intelligent home, smart workplace


Applications and Emerging Challenges (2)

Peer-to-peer computing
I No hierarchy; symmetric roles; self-organizing; efficient object storage and
lookup; scalable; dynamic reconfiguration
Publish/subscribe, content distribution
I Filtering information to extract that of interest
Distributed agents
I Processes that move and cooperate to perform specific tasks; coordination,
controlling mobility, software design and interfaces
Distributed data mining
I Extract patterns/trends of interest
I Data not available in a single repository


Applications and Emerging Challenges (3)

Grid computing
I Grid of shared computing resources; use idle CPU cycles
I Issues: scheduling, QoS guarantees, security of machines and jobs
Security
I Confidentiality, authentication, availability in a distributed setting
I Manage wireless, peer-to-peer, grid environments
F Issues: e.g., lack of trust, broadcast media, resource constraints, lack of
structure
