
Parallelism Levels

• Bit-level parallelism
• Number of bits processed per clock cycle (often called the word size)
• Increased from 4-bit to 8-bit, 16-bit, 32-bit, and 64-bit
• Instruction-level parallelism
• Computers now use multi-stage processing pipelines to speed up execution
• Data parallelism or loop parallelism
• The program loops can be processed in parallel
• Task parallelism
• The problem can be decomposed into tasks that can be carried out
concurrently. For example, SPMD. Note that data dependencies cause
different flows of control in individual tasks

2
Parallel Computer Architecture
• Flynn’s taxonomy of computer architectures
• Based on the # of concurrent control/instruction
& data streams
• SISD (Single Instruction Single Data)
• Scalar architecture with one processor/core
• SIMD (Single Instruction, Multiple Data)
• Supports vector processing
• Operations on individual vector components are
carried out concurrently

3
Parallel Computer Architecture
• MIMD (Multiple Instructions, Multiple Data)
• Several processors/cores function
asynchronously and independently
• At any time, different processors/cores may be
executing different instructions on different data
• Several types of systems:
• Uniform Memory Access (UMA)
• Cache Only Memory Access (COMA)
• Non-Uniform Memory Access (NUMA)

4
Distributed Systems
• A distributed system is a collection of:
• Autonomous computers
• Connected through a network

• Distribution software, called middleware, is used to:
• Coordinate computer activities
• Share system resources

5
Characteristics of Distributed Systems
• Users perceive the system as a single, integrated computing facility
• Components are autonomous
• Scheduling, resource management and security policies are
implemented by each system
• There are multiple:
• Points of control
• Points of failure
• Resources may not be accessible at all times
• Such distributed systems can be scaled via additional resources
• They can be designed to maintain availability even at low levels of
hardware/software/network reliability
6
Desirable Properties of a Distributed System
• Access Transparency
• Local and remote resources are accessed using identical operations
• Location Transparency
• Information objects are accessed without knowing their location
• Concurrency Transparency
• Several processes run concurrently using shared information objects without
interference among them
• Replication Transparency
• Multiple instances of information objects increase reliability without the
knowledge of users or applications

7
Desirable Properties of a Distributed System
• Failure Transparency
• Concealment of failures
• Migration Transparency
• Information objects in the system are moved without affecting the operation
performed on them
• Performance Transparency
• The system can be reconfigured based on the load and quality of service
(QoS) requirements
• Scaling Transparency
• The system and applications can scale without changing the system structure
and without affecting the applications
8
Processes, Threads and Events
• Dispatchable units of work:
• Process – is a program in execution
• Thread – is a lightweight process
• State of a process/thread:
• Information required to restart a suspended process/thread, e.g. program
counter and the current values of the registers
• Event
• A change of state of a process, e.g., local or communication events

9
Amdahl’s Law
 We parallelize our programs in order to run them faster

 How much faster will a parallel program run?

 Suppose that the sequential execution of a program takes T1 time units and the parallel execution on p processors takes Tp time units

 Suppose that out of the entire execution of the program, a fraction s is not parallelizable while the fraction 1-s is parallelizable

 Then the speedup (Amdahl's formula) is:

   Speedup = T1 / Tp = 1 / (s + (1-s)/p)

4
Amdahl’s Law: An Example
 Suppose that 80% of your program can be parallelized and that you use 4 processors to run your parallel version of the program

 The speedup you can get according to Amdahl's formula is:

   Speedup = 1 / (0.2 + 0.8/4) = 2.5

 Although you use 4 processors, you cannot get a speedup of more than 2.5 times (i.e., the parallel running time is at least 40% of the serial running time)
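A minimal C sketch of Amdahl's formula (an illustration added here, not part of the original slides; the function name amdahl_speedup and the sample values are assumptions):

#include <stdio.h>

/* Amdahl's formula: speedup = 1 / (s + (1 - s) / p),
 * where s is the non-parallelizable fraction and p is the number of processors. */
static double amdahl_speedup(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    /* The example above: 80% parallelizable (s = 0.2), 4 processors. */
    printf("p = 4:    speedup = %.2f\n", amdahl_speedup(0.2, 4));    /* 2.50 */
    /* Even with many more processors the speedup stays below 1/s = 5. */
    printf("p = 1000: speedup = %.2f\n", amdahl_speedup(0.2, 1000)); /* ~4.98 */
    return 0;
}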

5
Ideal Vs. Actual Cases
 Amdahl's argument is too simplified to apply directly to real cases

 When we run a parallel program, there is generally communication overhead and workload imbalance among the processes

[Figure: 1. Parallel Speed-up: An Ideal Case: the serial run consists of 20 units that cannot be parallelized and 80 units that can; in the ideal parallel run the 80 parallelizable units are split evenly (20 each) across 4 processes. 2. Parallel Speed-up: An Actual Case: the same decomposition, but each process also incurs communication overhead and the load across the processes is unbalanced.]

6
Guidelines
 In order to efficiently benefit from parallelization, we
ought to follow these guidelines:

1. Maximize the fraction of our program that can be parallelized

2. Balance the workload of parallel processes

3. Minimize the time spent for communication

7
Parallel Computer Architectures

[Diagram: Parallel Computer Architectures divide into Multi-Chip Multiprocessors and Single-Chip Multiprocessors.]

9
Multi-Chip Multiprocessors
 We can categorize the architecture of multi-chip multiprocessor
computers in terms of two aspects:

 Whether the memory is physically centralized or distributed


 Whether or not the address space is shared

Memory vs. Address Space:

 Centralized memory, shared address space: SMP (Symmetric Multiprocessor) / UMA (Uniform Memory Access) architecture
 Centralized memory, individual address space: N/A
 Distributed memory, shared address space: Distributed Shared Memory (DSM) / NUMA (Non-Uniform Memory Access) architecture
 Distributed memory, individual address space: MPP (Massively Parallel Processors) architecture

10
Symmetric Multiprocessors
 A system with Symmetric Multiprocessors (SMP) architecture uses a
shared memory that can be accessed equally from all processors

[Diagram: four processors, each with its own cache, connected through a bus or crossbar switch to a shared memory and I/O.]

 Usually, a single OS controls the SMP system

11
Massively Parallel Processors
 A system with a Massively Parallel Processors (MPP) architecture consists of nodes, each with its own processor, memory, and I/O subsystem

[Diagram: an interconnection network links the nodes; each node has its own processor, cache, local bus, memory, and I/O.]

 Typically, an independent OS runs at each node

12
Distributed Shared Memory
 A Distributed Shared Memory (DSM) system is typically built on a
similar hardware model as MPP

 DSM provides a shared address space to applications using a


hardware/software directory-based coherence protocol

 The memory latency varies according to whether the memory is


accessed directly (a local access) or through the interconnect
(a remote access) (hence, NUMA)

 As in an SMP system, typically a single OS controls a DSM system

13
Parallel Computer Architectures

[Diagram: Parallel Computer Architectures divide into Multi-Chip Multiprocessors and Single-Chip Multiprocessors; the following slides cover Single-Chip Multiprocessors.]

14
Chip Multiprocessors
 Integrating multiple processor cores on a single chip yields a single-chip multiprocessor, referred to as a Chip Multiprocessor (CMP)

 CMP is currently considered the architecture of choice

 Cores in a CMP might be coupled either tightly or loosely


 Cores may or may not share caches
 Cores may implement a message passing or a shared memory inter-core
communication method

 Common CMP interconnects (referred to as Network-on-Chips or NoCs)


include bus, ring, 2D mesh, and crossbar

 CMPs could be homogeneous or heterogeneous:


 Homogeneous CMPs include only identical cores
 Heterogeneous CMPs have cores which are not identical

16
Models of Parallel Programming
 What is a parallel programming model?

 A programming model is an abstraction provided by the hardware


to programmers

 It determines how easily programmers can express their algorithms as parallel units of computation (i.e., tasks) that the hardware understands

 It determines how efficiently parallel tasks can be executed on the hardware

 Main Goal: utilize all the processors of the underlying architecture


(e.g., SMP, MPP, CMP) and minimize the elapsed time of
your program

18
Traditional Parallel Programming Models

[Diagram: Parallel Programming Models divide into Shared Memory and Message Passing.]

19
Shared Memory Model
 In the shared memory programming model, the abstraction is that
parallel tasks can access any location of the memory

 Parallel tasks can communicate through reading and writing


common memory locations

 This is similar to threads from a single process which share a single


address space

 Multi-threaded programs (e.g., OpenMP programs) are the best fit


with shared memory programming model

20
Shared Memory Model
[Figure: execution timelines. Single thread: a process executes S1, P1, P2, P3, P4, S2 in sequence. Multi-thread: the process executes S1, spawns threads that run P1, P2, P3, P4 concurrently in a shared address space, joins them, and then executes S2. Si = serial part, Pj = parallel part.]

21
Shared Memory Example

Sequential:

for (i=0; i<8; i++)
    a[i] = b[i] + c[i];
sum = 0;
for (i=0; i<8; i++)
    if (a[i] > 0)
        sum = sum + a[i];
Print sum;

Parallel:

begin parallel              // spawn a child thread
private int start_iter, end_iter, i;
shared int local_iter = 4;
shared double sum = 0.0, a[], b[], c[];
shared lock_type mylock;

start_iter = getid() * local_iter;
end_iter = start_iter + local_iter;
for (i=start_iter; i<end_iter; i++)
    a[i] = b[i] + c[i];
barrier;

for (i=start_iter; i<end_iter; i++)
    if (a[i] > 0) {
        lock(mylock);
        sum = sum + a[i];
        unlock(mylock);
    }
barrier;                    // necessary

end parallel                // kill the child thread
Print sum;
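For comparison, a minimal OpenMP version of the same computation is sketched below. This is an illustration added here, not part of the original slides; it assumes a C compiler with OpenMP support (e.g., compiled with -fopenmp). The parallel for directive divides the iterations among threads, and the reduction clause replaces the explicit lock around sum.

#include <stdio.h>

#define N 8

int main(void) {
    double a[N], b[N], c[N], sum = 0.0;
    int i;

    /* Illustrative data so the sketch is self-contained. */
    for (i = 0; i < N; i++) { b[i] = i; c[i] = 1.0; }

    /* Iterations are split among threads that share the arrays. */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    /* Each thread accumulates a private partial sum; the partial sums
     * are combined at the end, replacing lock/unlock. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)
        if (a[i] > 0)
            sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}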
22
Traditional Parallel Programming Models

[Diagram: Parallel Programming Models divide into Shared Memory and Message Passing; next, the Message Passing model.]

28
Message Passing Model
 In message passing, parallel tasks have their own local memories

 One task cannot access another task’s memory

 Hence, to communicate data they have to rely on explicit messages


sent to each other

 This is similar to the abstraction of processes which do not share an


address space

 Message Passing Interface (MPI) programs are the best fit with the
message passing programming model
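A minimal MPI sketch of the message passing model (an illustration added here, not part of the original slides): each task owns its local data, and the only way to combine results is an explicit message exchange, here the MPI_Reduce collective.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of tasks */

    /* Each task owns only its local value; no other task can read it directly. */
    double local = (double)(rank + 1);
    double total = 0.0;

    /* Communication happens only through explicit messages;
     * MPI_Reduce combines the local values on rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of local values from %d tasks = %f\n", size, total);

    MPI_Finalize();
    return 0;
}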

29
Message Passing Model
[Figure: execution timelines. Single thread: one process executes S1, P1, P2, P3, P4, S2 in sequence. Message passing: four processes (Process 0 to Process 3 on Nodes 1 to 4) each execute S1, their own share of the parallel work, and S2, exchanging data by transmission over the network. S = serial part, P = parallel part.]
30
SPMD and MPMD
 When we run multiple processes with message-passing, there are
further categorizations regarding how many different programs are
cooperating in parallel execution

 We distinguish between two models:

1. Single Program Multiple Data (SPMD) model

2. Multiple Programs Multiple Data (MPMD) model

34
SPMD
 In the SPMD model, there is only one program and each process
uses the same executable working on different sets of data

[Diagram: the same executable a.out runs on Node 1, Node 2, and Node 3, each copy working on its own data.]
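A sketch of the SPMD pattern in MPI (an illustration added here, not from the original slides): every node runs the same executable, and each process picks its own slice of the data based on its rank.

#include <stdio.h>
#include <mpi.h>

#define N_PER_TASK 4

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Same program everywhere; the rank decides which data this copy works on. */
    int start = rank * N_PER_TASK;
    int end   = start + N_PER_TASK;
    long local_work = 0;
    for (int i = start; i < end; i++)
        local_work += i;                 /* stand-in for real per-element work */

    printf("rank %d processed elements [%d, %d): result %ld\n",
           rank, start, end, local_work);

    MPI_Finalize();
    return 0;
}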

35
MPMD
 The MPMD model uses different programs for different processes,
but the processes collaborate to solve the same problem

 MPMD has two styles, the master/worker and the coupled analysis

[Diagrams: 1. MPMD: Master/Slave, e.g., a.out on Node 1 and b.out on Nodes 2 and 3. 2. MPMD: Coupled Analysis, with a.out, b.out, and c.out on Nodes 1, 2, and 3, where, for example, a.out = structural analysis, b.out = fluid analysis, and c.out = thermal analysis.]


36
Concluding Remarks
 To summarize, keep the following 3 points in mind:

 The purpose of parallelization is to reduce the time spent


for computation

 Ideally, the parallel program is p times faster than the sequential


program, where p is the number of processes involved in the parallel
execution, but this is not always achievable

 Message-passing is the tool to consolidate what parallelization has


separated. It should not be regarded as the parallelization itself

39
8. Distributed systems
◼ Collection of autonomous computers, connected through a network and
operating under the control of distribution software.
◼ Middleware → software enabling individual systems to coordinate
their activities and to share system resources.
◼ Main characteristics of distributed systems:
 The users perceive the system as a single, integrated computing facility.
 The components are autonomous.
 Scheduling and other resource management and security policies are
implemented by each system.
 There are multiple points of control and multiple points of failure.
 The resources may not be accessible at all times.
 Can be scaled by adding additional resources.
 Can be designed to maintain availability even at low levels of
hardware/software/network reliability.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 27


Distributed systems - desirable properties
◼ Access transparency - local and remote information objects are
accessed using identical operations.
◼ Location transparency - information objects are accessed without
knowledge of their location.
◼ Concurrency transparency - several processes run concurrently using
shared information objects without interference among them.
◼ Replication transparency - multiple instances of information objects
increase reliability without the knowledge of users or applications.
◼ Failure transparency - the concealment of faults.
◼ Migration transparency - the information objects in the system are moved
without affecting the operation performed on them.
◼ Performance transparency - the system can be reconfigured based on
the load and quality of service requirements.
◼ Scaling transparency - the system and the applications can scale without
a change in the system structure and without affecting the applications.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 28


9. Modularity
◼ Modularity, layering, and hierarchy are means to cope with the
complexity of a distributed application software.
◼ Software modularity, the separation of a function into independent,
interchangeable modules requires well-defined interfaces specifying
the elements provided and supplied to a module.
◼ Requirement for modularity → clearly define the interfaces between
modules and enable the modules to work together.
◼ The steps involved in the transfer of the flow of control between the
caller and the callee:
 The caller saves its state including the registers, the arguments, and the
return address on the stack
 The callee loads the arguments from the stack, carries out the calculations
and then transfers control back to the caller.
 The caller adjusts the stack, restores its registers, and continues its
processing.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 29


Modular software design principles
◼ Information hiding → the user of a module does not need to know
anything about the internal mechanism of the module to make effective
use of it.
◼ Invariant behavior → the functional behavior of a module must be
independent of the site or context from which it is invoked.
◼ Data generality→ the interface to a module must be capable of passing
any data object an application may require.
◼ Secure arguments → the interface to a module must not allow side-
effects on arguments supplied to the interface.
◼ Recursive construction → a program constructed from modules must
be usable as a component in building larger programs/modules
◼ System resource management → resource management for program
modules must be performed by the computer system and not by
individual program modules.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 30


Soft modularity
◼ Soft modularity → divide a program into modules which call each other
and communicate using shared-memory or follow procedure call
convention.
 Hides module implementation details.
 Once the interfaces of the modules are defined, the modules can be
developed independently.
 A module can be replaced with a more elaborate, or with a more efficient
one, as long as its interfaces with the other modules are not changed.
 The modules can be written using different programming languages and
can be tested independently.
◼ Challenges:
 Increases the difficulty of debugging; for example, a call to a module with
an infinite loop will never return.
 There could be naming conflicts and wrong context specifications.
 The caller and the callee are in the same address space and may misuse
the stack, e.g., the callee may use registers that the caller has not saved
on the stack, and so on.
Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 31
Enforced modularity; the client-server paradigm
◼ Modules are forced to interact only by sending and receiving
messages.
◼ More robust design,
 Clients and servers are independent modules and may fail separately.
 Does not allow errors to propagate.
◼ Servers are stateless, they do not have to maintain state
information. A server may fail and then come back up without the
clients being affected, or even noticing the failure of the server.
◼ Enforced modularity makes an attack less likely because it is difficult
for an intruder to guess the format of the messages or the sequence
numbers of segments, when messages are transported by TCP.
◼ Often based on RPCs.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 32


Remote procedure calls (RPCs)
◼ Introduced in the early 1970s by Bruce Nelson and used for the first
time at PARC.
 Reduces fate sharing between the caller and the callee.
 RPCs take longer than local calls due to communication delays.
◼ RPC semantics
 At least once → a message is resent several times and an answer is
expected. The server may end up executing a request more than once,
but an answer may never be received. Suitable for operations free of
side effects.
 At most once → a message is acted upon at most once. The sender
sets up a timeout for receiving the response. When the timeout expires
an error code is delivered to the caller. Requires the sender to keep a
history of the time-stamps of all messages as messages may arrive
out-of-order. Suitable for operations which have side effects
 Exactly once → implements at most once semantics and requests an
acknowledgment from server.
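A toy sketch of the server side of at-most-once semantics (an illustration added here, not from the original slides; the request IDs and helper names are assumptions): the server remembers which request identifiers it has already executed and suppresses duplicates caused by client retransmissions, so the side effect happens at most once.

#include <stdio.h>
#include <stdbool.h>

#define SEEN_MAX 128

static int seen[SEEN_MAX];
static int seen_count = 0;

static bool already_executed(int request_id) {
    for (int i = 0; i < seen_count; i++)
        if (seen[i] == request_id)
            return true;
    return false;
}

static void execute_at_most_once(int request_id) {
    if (already_executed(request_id)) {
        printf("request %d: duplicate, ignored\n", request_id);
        return;
    }
    seen[seen_count++] = request_id;
    printf("request %d: executed\n", request_id);   /* side effect happens once */
}

int main(void) {
    execute_at_most_once(7);
    execute_at_most_once(7);   /* retransmission after a timeout; suppressed */
    execute_at_most_once(8);
    return 0;
}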

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 33


Client-server communication for World Wide Web.
Three-way handshake involves
the first three messages
exchanged between the client
browser and the server.
Once the TCP connection is
established the HTTP server
takes its time to construct the
page to respond to the first
request; to satisfy the second
request the HTTP server must
retrieve an image from the disk.
Response time components:
1. RTT (Round-trip time).
2. Server residence time.
3. Data transmission time.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 34
10. Layering and hierarchy
◼ Layering demands modularity → each layer fulfills a well-defined
function.
◼ Communication patterns are more restrictive, a layer is expected
to communicate only with the adjacent layers. This restriction
reduces the system complexity and makes it easier to understand
its behavior.
◼ Strictly enforced layering can prevent optimizations. For example,
cross-layer communication in networking was proposed to allow
wireless applications to take advantage of information available at
the Media Access Control (MAC) sub-layer of the data link layer.
◼ There are systems where it is difficult to envision a layered
organization because of the complexity of the interaction between
the individual modules.
◼ Could a layered cloud architecture be designed that has practical
implications for the future development of computing clouds?

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 35


Communication protocol layering
◼ Internet protocol stack:
 Physical layer → accommodates diverse physical communication
channels carrying electromagnetic, optical, or acoustic signals.
 Data link layer → how to transport bits, not signals, between two
systems directly connected to one another by a communication channel.
 Network layer → packets carrying bits have to traverse a chain of
intermediate nodes from the source to the destination; the concern is how
to forward the packets from one intermediate node to the next.
 Transport layer → the source and the recipient of packets are outside
the network; this layer guarantees delivery from source to destination.
 Application layer → data sent and received by the hosts at the network
periphery has a meaning only in the context of an application.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 36
11. Virtualization; layering and virtualization
◼ Virtualization abstracts the underlying physical resources of a
system and simplifies its use, isolates users from one another, and
supports replication which increases system elasticity and reliability.
◼ Virtualization simulates the interface to a physical object:
 Multiplexing → create multiple virtual objects from one instance of a
physical object. E.g., a processor is multiplexed among a number of
processes or threads.
 Aggregation → create one virtual object from multiple physical objects.
E.g., a number of physical disks are aggregated into a RAID disk.
 Emulation → construct a virtual object from a different type of a physical
object. E.g., a physical disk emulates Random Access Memory.
 Multiplexing and emulation → E.g., virtual memory with paging
multiplexes real memory and disk and a virtual address emulates a real
address; the TCP protocol emulates a reliable bit pipe and multiplexes a
physical communication channel and a processor.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 37


Virtualization and cloud computing
◼ Virtualization is a critical aspect of cloud computing, equally
important for providers and consumers of cloud services for several
reasons:
 System security → it allows isolation of services running on the same
hardware.
 Performance isolation → allows developers to optimize applications and
cloud service providers to exploit multi-tenancy.
 Performance and reliability → it allows applications to migrate from one
platform to another.
 Facilitates development and management of services offered by a
provider.
◼ A hypervisor runs on the physical hardware and exports hardware-
level abstractions to one or more guest operating systems.
◼ A guest OS interacts with the virtual hardware in the same manner it
would interact with the physical hardware, but under the watchful
eye of the hypervisor which traps all privileged operations and
mediates the interactions of the guest OS with the hardware.
Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 38
12. Peer-to-peer systems (P2P)
◼ P2P represents a significant departure from the client-server model
and have several desirable properties:
 Require a minimally dedicated infrastructure, as resources are contributed
by the participating systems.
 Highly decentralized.
 Scalable, individual nodes are not required to be aware of global state.
 Are resilient to faults and attacks, as few of their elements are critical for
the delivery of service and the abundance of resources can support a high
degree of replication.
 Individual nodes do not require excessive network bandwidth as servers
used by client-server model do.
 The systems are shielded from censorship due to the dynamic and often
unstructured system architecture.
◼ Undesirable properties:
 Decentralization raises the question if P2P systems can be managed
effectively and provide the security required by various applications.
 Shielding from censorship makes them a fertile ground for illegal activities.
Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 39
Resource sharing in P2P systems
◼ This distributed computing model promotes low-cost access to
storage and CPU cycles provided by participant systems.
 Resources are located in different administrative domains.
 P2P systems are self-organizing and decentralized, while the servers in
a cloud are in a single administrative domain and have a central
management.
◼ Napster, a music-sharing system developed in the late 1990s, gave
participants access to storage distributed over the network.
◼ The first volunteer-based scientific computing project, SETI@home, used
the free cycles of participating systems to carry out compute-intensive
tasks.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 40


Organization of P2P systems
◼ Regardless of the architecture, P2P systems are built around an
overlay network, a virtual network superimposed over the real network.
 Each node maintains a table of overlay links connecting it with other
nodes of this virtual network; each node is identified by its IP address.
 Two types of overlay networks, unstructured and structured, are used.
 Random walks starting from a few bootstrap nodes are usually used by
systems desiring to join an unstructured overlay.
◼ Each node of a structured overlay has a unique key which determines
its position in the structure; the keys are selected to guarantee a
uniform distribution in a very large name space.
◼ Structured overlay networks use key-based routing (KBR); given a
starting node v0 and a key k, the function KBR(v0,k) returns the path in
the graph from v0 to the vertex with key k.
◼ Epidemic algorithms are often used by unstructured overlays to
disseminate network topology.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 41


Examples of P2P systems

◼ Skype, a voice-over-IP telephony service, allows close to 700 million
registered users from many countries around the globe to
communicate using a proprietary voice-over-IP protocol.
◼ Data streaming applications such as Cool Streaming
◼ BBC's online video service,
◼ Content distribution networks such as CoDeeN.
◼ Volunteer computing applications based on the BOINC (Berkeley
Open Infrastructure for Network Computing) platform.

Dan C. Marinescu Cloud Computing Second Edition - Chapter 4. 42


Processes, Threads and Events
• Process Group
• A collection of cooperating processes
• Processes cooperate and communicate to reach a common goal
• Global State of a Distributed System
• Distributed Systems consist of several processes and communication channels
• Global State is the union of states of individual processes and channels

10
Messages and Communication Channels
• A message is a structured unit of information
• A communication channel provides the means for processes or
threads to:
• Communicate with one another
• Coordinate their actions by exchanging messages
• Communication is done using send(m) and receive(m) system calls, where m is
a message

11
Messages and Communication Channels
• State of a communication channel
• Given two processes 𝑝𝑖 and 𝑝𝑗 , the state of the channel 𝜉𝑖,𝑗 from 𝑝𝑖 to 𝑝𝑗
consists of messages sent by 𝑝𝑖 but not yet received by 𝑝𝑗
• Protocol
• A finite set of messages exchanged among processes to help them coordinate
their actions

12
Process Coordination – Communication Protocols
• A major challenge is to guarantee that 2 processes will reach an
agreement in case of channel failures
• Communication protocols ensure process coordination by
implementing:
• Error Control mechanisms
• Using error detection and error correction codes
• Flow Control
• Provides feedback from the receiver; it forces the sender to transmit only the amount of
data the receiver can handle
• Congestion Control
• Ensures that the offered load of the network does not exceed the network capacity

13
Process Coordination – Time and time intervals
• Process Coordination requires:
• A global concept of time shared by cooperating entities
• The measurement of time intervals, the time elapsed between 2 events
• Two events in the global history may be unrelated
• Neither one is the cause of the other
• Such events are said to be concurrent events
• Local timers provide relative time measurements
• An isolated system can be characterized by its history, i.e., a sequence of
events

2
Process Coordination – Time and time intervals
• Global agreement on time is necessary to trigger actions that should
occur concurrently
• Timestamps are often used for event ordering
• Using a global time base constructed on local virtual clocks

3
Causality Example: Event Ordering

4
Logical Clocks
• Logical Clock (LC)
• An abstraction necessary to ensure the clock condition in the absence of a
global clock
• A process maps events to positive integers
• LC(e) is the local variable associated with event e.
• Each process time-stamps the message m it sends with the value of the logical clock at the time of sending: TS(m) = LC(send(m))

• The rules to update the logical clock:

LC(e) = LC + 1 → if e is a local event or a send(m) event
LC(e) = max(LC + 1, TS(m) + 1) → if e = receive(m)
5
Logical Clocks
[Figure: three processes p1, p2, and p3 exchange messages m1 to m5; the logical clock values of successive events are 1, 2, 3, 4, 5, 12 on p1; 1, 2, 6, 7, 8, 9 on p2; and 1, 2, 3, 10, 11 on p3.]

• Three processes and their logical clocks

6
Logical Clocks

[Figure: the same three processes and their logical clock values, repeated from the previous slide.]

7
Message Delivery Rules; Causal Delivery
• A real-life network might reorder messages.
• First-In-First-Out (FIFO) delivery
• Messages are delivered in the same order they are sent.
• Causal delivery
• An extension of FIFO delivery
• Used when a process receives messages from different sources.
• A communication channel typically does not guarantee FIFO delivery
• However, FIFO delivery can be enforced by attaching a sequence number to each message sent
• The sequence numbers are also used to reassemble messages out of individual packets.
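A minimal sketch of FIFO delivery enforced with sequence numbers (an illustration added here, not from the original slides): the receiver buffers messages that arrive out of order and delivers a message only after every message with a smaller sequence number has been delivered.

#include <stdio.h>
#include <stdbool.h>

#define MAX_MSGS 16

static bool arrived[MAX_MSGS];     /* arrived[s] is true once message s was received */
static int  next_to_deliver = 0;

/* Record an arrival, then deliver as many consecutive messages as possible. */
static void on_receive(int seq) {
    arrived[seq] = true;
    while (next_to_deliver < MAX_MSGS && arrived[next_to_deliver]) {
        printf("deliver message %d\n", next_to_deliver);
        next_to_deliver++;
    }
}

int main(void) {
    /* The network reorders messages 0, 1, 2, 3 into 1, 0, 3, 2. */
    on_receive(1);   /* buffered, nothing delivered yet */
    on_receive(0);   /* delivers 0, then 1 */
    on_receive(3);   /* buffered */
    on_receive(2);   /* delivers 2, then 3 */
    return 0;
}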

8
Concurrency
• Required by system and application software:
• Reactive systems respond to external events
• e.g., operating system kernel, embedded systems.

• Improve performance
• Parallel applications partition workload & distribute it to multiple threads running
concurrently.

• Support variable load & shorten the response time of distributed applications, like
• Transaction management systems
• Client-server applications

9
Consensus Protocols
• Consensus
• Process of agreeing on one of several alternatives proposed by a number of
agents.
• Consensus Service
• Set of n processes
• Clients send requests, propose a value and wait for a response
• Goal is to get the set of processes to reach consensus on a single proposed
value.

10
Consensus Protocols
• Consensus protocol assumptions:
• Processes run on processors and communicate through a network
• Processors and the network may experience failures (but not the complicated failures).

• Processors:
• Operate at arbitrary speeds
• Have stable storage and may rejoin the protocol after a failure
• Send messages to one another.
• Network:
• May lose, reorder, or duplicate messages
• Messages are sent asynchronously
• Messages may take an arbitrarily long time to reach the destination.

11
Client-Server Paradigm
• This paradigm is based on the enforced modularity
• Modules are forced to interact only by sending and receiving messages.
• A more robust design
• Clients and servers are independent modules and may fail separately.
• Servers are stateless
• May fail and then come up without the clients being affected or even noticing
the failure of the server.
• An attack is less likely
• Difficult for an intruder to guess the:
• Format of the messages
• Sequence numbers of the segments, when messages are transported by TCP
12
Logical Clocks
• Logical Clock (LC)
• An abstraction necessary to ensure the clock condition in the absence of a global
clock
• A process maps events to positive integers
• LC(e) is the local variable associated with event e.
• Each process time-stamps the message m it sends with the value of the
logical clock at the time of sending:

• The rules to update the logical clock:


LC(e) = LC + 1 → if e is a local event or a send(m) event
LC(e) = max(LC + 1, TS(m) + 1) → if e = receive(m)
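The two update rules can be captured in a few lines of C. The sketch below is an illustration added here, not part of the original slides; it keeps one clock per process and treats the message timestamp TS(m) as the sender's clock value at send time.

#include <stdio.h>

#define NPROC 3

static int lc[NPROC];                        /* one logical clock per process, initially 0 */

static int local_event(int p) {              /* LC(e) = LC + 1 */
    return ++lc[p];
}

static int send_event(int p, int *ts) {      /* send is also LC(e) = LC + 1 */
    *ts = ++lc[p];                           /* TS(m) = LC(send(m)) */
    return *ts;
}

static int receive_event(int p, int ts) {    /* LC(e) = max(LC + 1, TS(m) + 1) */
    lc[p] = (lc[p] + 1 > ts + 1) ? lc[p] + 1 : ts + 1;
    return lc[p];
}

int main(void) {
    int ts;
    local_event(0);                                              /* p0: LC = 1 */
    send_event(0, &ts);                                          /* p0: LC = 2, TS(m) = 2 */
    local_event(1);                                              /* p1: LC = 1 */
    printf("p1 receives m: LC = %d\n", receive_event(1, ts));    /* max(2, 3) = 3 */
    return 0;
}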

2
Logical Clocks

[Figure: the same three-process logical clock example, annotated with the update rules:]

LC(e) = LC + 1 → if e is a local event or a send(m) event
LC(e) = max(LC + 1, TS(m) + 1) → if e = receive(m)
3
Client-Server Paradigm
• This paradigm is based on the enforced modularity
• Modules are forced to interact only by sending and receiving messages.
• A more robust design
• Clients and servers are independent modules and may fail separately.
• Servers are stateless
• May fail and then come up without the clients being affected or even noticing
the failure of the server.
• An attack is less likely
• Difficult for an intruder to guess the:
• Format of the messages
• Sequence numbers of the segments, when messages are transported by TCP
4
Services
• Email service
• Sender and receiver communicate asynchronously using inboxes and outboxes
• Mail daemons run at each site.
• Event service
• supports coordination in a distributed environment
• Based on the publish-subscribe paradigm
• An event producer publishes events and an event consumer subscribes to events
• Server maintains queues for each event and delivers notifications to clients when an
event occurs.

5
Services

[Figures (a) and (b): the email service and the event service described on the previous slide.]
6
WWW
• 3-way handshake
• First 3 messages exchanged between the client and the server
• Once a TCP connection is established, the HTTP server takes its time to
construct the page to respond to the first request
• To satisfy the second request, the HTTP server must retrieve an image
from the disk
• Response time includes
• Round Trip Time (RTT)
• Server residence time
• Data transmission time
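As a worked illustration (the numbers are assumed, not from the original slides): with RTT = 40 ms, a server residence time of 20 ms, and a data transmission time of 15 ms, the first page arrives roughly 40 + 40 + 20 + 15 = 115 ms after the browser sends its SYN, i.e., one RTT to establish the TCP connection, one RTT for the request and the start of the response, plus the residence and transmission times.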
7
WWW

[Figure: browser and Web server message timeline. The three-way handshake (SYN, SYN, ACK + HTTP request) establishes the TCP connection in one RTT; the server spends its residence time building the first page on the fly, then transmits the data; a second HTTP request requires the server to retrieve an image from disk before the image transmission time.]
8
HTTP Communication
• A Web client can:
• communicate directly with the server
• communicate through a proxy
• use tunneling to cross the network

[Diagrams: in each case the Web browser's HTTP request reaches the HTTP server listening on TCP port 80: directly; via a proxy, which forwards the request to the server and relays the response back to the client; or through a tunnel.]
