
Parallel Processing

Chapter - 5
Process Level Parallelism

Dr. Basant Tiwari


basanttiw@gmail.com
Department of Computer Science, Hawassa University
Process Level Parallelism

5.1. Distributed computers


5.2. Clusters
5.3. Grid
5.4. Mainframe computers
Distributed computers
• The growing popularity of the Internet and the availability of powerful
computers and high-speed networks as low-cost commodity components are
changing the way we do computing.
• Distributed computing is used on many systems to solve large-scale problems.
• There is a tremendous need for High Performance Computing (HPC) in many
applications, from space science to Artificial Intelligence. HPC can be achieved
through parallel and distributed computing.
• Distributed computing consists of a set of processes that cooperate to achieve
a common specific goal.
• Most social-networking sites are implemented as large distributed computing
systems, running in centrally controlled data centers.
• However, the trend in these massively scalable systems is toward the use of
peer-to-peer, utility, cluster, and jungle computing. Utility computing covers
grid computing and cloud computing, which are recent topics of research.
This classification is shown in Figure 1.
Distributed computers

Figure 1: Classification of Distributed Computing


Distributed System
• A distributed system is a collection of autonomous computers that
are connected through a computer network.
• Each computer runs its own processes and a layer of distribution
middleware; the middleware enables the processes to coordinate their
activities.
• Users perceive the system as a single, integrated computing
facility.

“A distributed system is one in which processes located at networked
computers communicate and coordinate their actions only by passing
messages.”

“A set of autonomous processes communicating among themselves to
perform a task.”
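
To make the "coordinate only by passing messages" idea concrete, here is a minimal Python sketch (not from the slides; the process names and the summing task are illustrative): two processes cooperate purely by exchanging messages through queues, without touching shared state directly.

```python
# Minimal sketch: two processes that coordinate only by passing messages.
from multiprocessing import Process, Queue

def worker(inbox: Queue, outbox: Queue) -> None:
    """Receive one request message, do the work, send back a reply message."""
    request = inbox.get()                 # blocks until a message arrives
    result = sum(request["numbers"])      # the task performed by the remote process
    outbox.put({"id": request["id"], "result": result})

if __name__ == "__main__":
    to_worker, from_worker = Queue(), Queue()
    p = Process(target=worker, args=(to_worker, from_worker))
    p.start()

    # The only coordination between the two processes is message passing:
    to_worker.put({"id": 1, "numbers": [1, 2, 3, 4]})
    print(from_worker.get())              # {'id': 1, 'result': 10}
    p.join()
```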
Characteristics of Distributed System
• Distributed computing allows autonomous components/computers to
execute the task.
• Distributed computing uses heterogeneous technology.
• Distributed computing components may be used exclusively.
• Distributed computing is performed using concurrent processes.
• Distributed computing has multiple points of failure.
Decentralized or Distributed System Architecture
• In Distributed System architecture, Data, Process, and Interface
components of an information system are distributed to multiple
locations in a computer network. Accordingly, the processing
workload is distributed across the network.

• In this architecture, there is a set of separate computers, each capable of
autonomous operation, linked by a computer network.

• It enables individual computers (at different locations) to share resources
over the network.

• The same interface may be implemented by servers located at different
sites.

• Example: Peer-to-Peer architecture


Peer-to-peer (P2P) architecture:
• Peer-to-peer computing or networking is a distributed application
architecture that partitions tasks or workloads between peers. Peers
are equally privileged, equipotent participants in the application.
• The nodes in peer-to-peer networks both use resources and provide
resources. So, if the nodes increase, then the resource-sharing
capacity of the peer-to-peer network increases.
• This is different from client-server networks, where the server gets
overwhelmed if the nodes increase.
• Since nodes in peer-to-peer networks act as both clients and servers,
it is difficult to provide adequate security for the nodes. This can lead
to denial-of-service attacks.
• Most modern operating systems such as Windows and Mac OS
contain software to implement peer-to-peer networks.
Peer-to-peer computing

Peer to peer …
Peer responsibilities, as a Client:
• Sending commands to other peers
to request a service
• Receiving responses to a request
for a service

Peer responsibilities, as a server:


• Receiving commands from other
peers requesting a service
• Processing service requests and
executing the requested service
• Sending response with results of
the requested service
• Propagating requests for service
to other peers
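
A small in-process sketch of this dual role (illustrative only; the Peer class, its methods, and the file names are invented for the example): each peer can serve requests from other peers and also issue requests of its own.

```python
# Sketch of the dual peer role: every peer is both a client and a server.

class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = files          # resources this peer can serve to others

    # --- server responsibilities: receive a command, process it, respond ---
    def handle_request(self, filename):
        if filename in self.files:
            return f"{self.name}: here is {filename}"
        return None                 # a real peer could also propagate the request

    # --- client responsibilities: send a command, receive the response ---
    def request_file(self, other_peers, filename):
        for peer in other_peers:
            response = peer.handle_request(filename)
            if response is not None:
                return response
        return f"{self.name}: {filename} not found"

alice = Peer("alice", {"song.mp3"})
bob = Peer("bob", {"paper.pdf"})
print(bob.request_file([alice], "song.mp3"))   # bob acts as client, alice as server
print(alice.request_file([bob], "paper.pdf"))  # the roles are reversed
```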
Peer-to-peer computing

Peer-to-peer …

P2P has three structural classifications:
 Centralized (e.g. Napster)
 Structured Decentralized (e.g. Chord, Pastry)
 Unstructured Decentralized (e.g. Gnutella, Kazaa)
Peer-to-peer computing

Peer-to-peer …
Centralized P2P
 The network has a central index that all peers contact
(like a DNS / directory service).
 Advantages: minimizes network traffic; the whole network
can be searched quickly.
 Disadvantages: central point of failure, limited scalability,
not a "true" P2P.
Peer-to-peer computing

Peer-to-peer …

Structured Decentralized P2P
 No central point
 Nodes organize themselves in relation to the data
 Advantages: true P2P, predictable (expected) search times
 Disadvantages: extra overhead to maintain the structure
Peer-to-peer computing

Peer-to-peer …
Unstructured Decentralized P2P
 No central point
 No relation between topology and data placement
 Advantages: true P2P, low cost to create
 Disadvantages: lack of overall data knowledge, lower search efficiency
Cluster Computing
• A Cluster computer is a group of linked computers, working together
closely so that in many respects they form a single computer. The
components of a cluster are commonly, but not always, connected to each
other through fast local area networks.
• Clusters are usually deployed to improve performance and/or availability
over that provided by a single computer, while being much more cost-effective.
• Cluster consists of:
 Nodes (master + computing),
 Network,
 OS, and
 Cluster middleware: the software layer that provides the concepts and
mechanisms needed to develop and execute clustering programs on the
distributed nodes. It enables the components to coordinate their activities.
Cluster Computing
• A type of distributed system
• A collection of workstations or PCs that are interconnected by a high-
speed network
• Work as an integrated collection of resources
• Have a single system image spanning all its nodes
Cluster Computer Architecture
Cluster Computing
There are many components of cluster computing, as follows:
• High-performance computers like PCs, workstations, etc.
• Micro-kernel based operating systems
• High-speed networks or switches, like Gigabit Ethernet
• NICs (Network Interface Cards)
• Fast communication protocols and services
• Cluster middleware, which may be implemented at the hardware level, in
the operating system kernel, or in applications and subsystems
• Parallel programming environment tools like compilers, parallel virtual
machines, etc.
• Sequential and parallel applications
Cluster Middleware handles:
 Resource management and scheduling
 Fault handling
 Migration
 Load balancing
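
As one concrete illustration of the scheduling, load-balancing, and fault-handling duties listed above (a simplified simulation, not any particular middleware's API; all names are invented), a dispatcher could send each job to the least-loaded node and re-queue the work of a failed node:

```python
# Sketch of two cluster-middleware duties: load balancing and fault handling.

class ClusterScheduler:
    def __init__(self, nodes):
        self.load = {node: 0 for node in nodes}      # outstanding jobs per node

    def submit(self, job):
        node = min(self.load, key=self.load.get)     # least-loaded node wins
        self.load[node] += 1
        return node

    def node_failed(self, node):
        """Fault handling / migration: move the failed node's work elsewhere."""
        orphaned = self.load.pop(node)
        for _ in range(orphaned):
            self.submit(f"migrated-from-{node}")

sched = ClusterScheduler(["node-1", "node-2", "node-3"])
for job in ["j1", "j2", "j3", "j4"]:
    print(job, "->", sched.submit(job))
sched.node_failed("node-3")
print(sched.load)        # node-3's jobs have been redistributed to the survivors
```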
Grid Computing
• Grid computing is a form of utility computing that enables coordinated resource
sharing and problem solving in dynamic, multi-institutional virtual
organizations.
• Like an electric-utility power grid, a computing grid offers an infrastructure
that couples computers, software/middleware, special instruments, people,
and sensors together. A grid is often constructed across LAN, WAN,
or Internet backbone networks at regional, national, or global scales.
• Enterprises or organizations present grids as integrated computing
resources.
• The computers used in a grid are primarily workstations, servers,
clusters, and supercomputers.
• Personal computers, laptops and PDAs can be used as access devices
to a grid system.
Grid Computing

• Grids can be of many types: Knowledge, Data, Computational,
Application Service Provisioning, Interaction, or Utility grids.
• Grids have both pros and cons.
Pros:
 They can solve larger and more complex problems in a shorter time.
 They make it easier to collaborate with other organizations, and they
make better use of existing hardware.
Cons:
 Grid software and standards are still evolving.
 Job submission is non-interactive.
Types of Grid
Computational Grid
• Executes the application in parallel on multiple machines to reduce the
completion time.
• Processing power is the main computing resource, shared amongst
nodes.
• Provides distributed supercomputing.
• Gives high throughput by increasing the completion rate of a stream of
jobs.
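
A minimal sketch of this computational-grid pattern (local processes stand in for the grid's machines; the prime-counting workload and all names are only an example): one large job is split into independent pieces that run in parallel, which shortens the completion time.

```python
# Split one large problem into pieces and execute them in parallel.
from multiprocessing import Pool

def count_primes(bounds):
    """One piece of the large problem, executed on one 'machine'."""
    lo, hi = bounds
    return sum(all(n % d for d in range(2, int(n ** 0.5) + 1)) and n > 1
               for n in range(lo, hi))

if __name__ == "__main__":
    pieces = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool(processes=4) as pool:               # 4 workers stand in for 4 nodes
        partials = pool.map(count_primes, pieces) # pieces computed in parallel
    print(sum(partials), "primes below 100000")
```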
Data Grid
• Data storage capacity is the main resource shared amongst the nodes.
Cluster Vs. Grid Computing

Characteristics           Cluster                  Grid
Population                Commodity computers      Commodity and high-end computers
Ownership                 Single                   Multiple
User Management           Centralized              Decentralized
Resource Management       Centralized              Decentralized
Allocation/Scheduling     Centralized              Decentralized
Single System Image       Yes                      No
Scalability               100s of nodes            1000s of nodes
Throughput                Medium                   High


Jungle Computing
• Jungle computing is a simultaneous combination of heterogeneous,
hierarchical, and distributed computing resources.
• In many realistic scientific research areas, domain experts are being
forced into concurrent use of multiple clusters, grids, clouds,
independent computers, and more.
• Jungle computing refers to the use of diverse, distributed and highly
non-uniform high performance computer systems to achieve peak
performance.
• These new distributed computing paradigms have led to a diverse
collection of resources available to research scientists, including stand-
alone machines, cluster systems, grids, clouds, desktop grids, etc., as
shown in the figure; this varied collection is what is termed jungle
computing.
Jungle Computing

Figure: Jungle computing - a diverse collection of computing resources

• Uses grid and cloud infrastructures in a variety of combinations, along with traditional
supercomputers, all connected via fast networks.
• Uses many-core technologies such as GPUs, as well as supercomputers-on-chip.
• Thus provides high-performance computing by using multiple diverse platforms and systems
simultaneously, giving rise to the term "computing jungle".
• The Ibis high-performance distributed programming system is an example of jungle computing.
Mainframe Computer
• Mainframe computers are data processing systems employed mainly
in large organizations for various applications.
• Mainframes are designed to handle very high-volume input and output
(I/O) and emphasize throughput computing.
• Typical applications include bulk data processing, process control,
industry and consumer statistics, and financial transaction processing.
• Mainframes use proprietary operating systems, most of which are
based on Unix, and a growing number run Linux.
• Over the years they have evolved from room-sized machines to networked
configurations of workstations and servers that are extremely
competitive and cost-effective platforms for e-commerce development
and hosting.
• Mainframes are so called because the earliest ones were housed in
large metal frames.
Mainframe Computer
Mainframe Characteristics
 Centralized control of resources

 HW and operating systems share disk access

 A style of operation: 2-tier computing (logic and data on the host)

 Thousands of simultaneous I/O operations

 Clustering technologies

 Data and resource sharing capabilities


Evolving Mainframe Computer Architecture

 More and faster processors... MIMD

 More physical memory and greater virtual memory capability

 Dynamic capabilities for upgrading HW and SW

 Enhanced I/O devices and more and faster paths (channels)

 Increased ability to divide resources into multiple, logically
independent and isolated systems (LPAR)
 Enhanced clustering technologies (e.g. Parallel Sysplex)
The Size/Capacity of Mainframes
A Sample Single System Configuration includes:
- Mainframe box with 12 CPUs (I-streams) and 32 GB of memory
- shared database of 2,500 Disks with 40 Disk control units
- 2 tape robots used primarily for logging

Some Performance numbers for that configuration:


- 2 million real I/Os per second to Disk during peak hours
- over 100,000 transactions per second during peak hours.
- 3,380,000,000 (3.38 billion) transactions in 24 hours
- capacity of the CPU complex to execute over 68 billion instructions per
second (68,000 MIPS)
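
As a rough consistency check on these figures (assuming, purely for illustration, that the 24-hour total were spread uniformly), the average rate is well below the quoted peak:

```python
# 3.38 billion transactions spread over 24 hours:
print(3_380_000_000 / (24 * 60 * 60))   # about 39,120 transactions/second on average,
                                        # versus a quoted peak above 100,000 per second
```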
Mainframe facts
Mainframes in our midst
• Hidden from the public eye – background servers

Who uses mainframes?


• Most Fortune 1000 companies use a mainframe environment
• 60% of all data available on the Internet is stored on mainframe computers

Why mainframes?
• Large-scale transaction processing
 Thousands of transactions per second
• Support thousands of users and application programs simultaneously
accessing terabytes of information in databases
• Large-bandwidth communications
End of Chapter - 5
