
Unit 5

Distributed Multiprocessor Architectures

• Loosely coupled and tightly coupled architectures
• Cluster computing as an application of loosely coupled architecture. Examples: Cm* and Hadoop.
Some Basics….
• When working on projects, several people coordinating together usually produce a better solution than one person trying to piece things together alone.
• This is the idea behind multiprocessing.
• Multiprocessing means two or more processors working and operating concurrently.
• A multiprocessing system is a system configuration that contains more than one central processing unit (CPU).
Why use a multiprocessing system?
• First, a multiprocessing system increases the overall rate at which work is accomplished, also referred to as throughput.
• By working together, a problem can be divided up among the processors for faster completion, also called "divide and conquer".
• Another reason for using multiprocessing systems is to increase system availability.
Introduction
• Key attributes of "multiprocessors":
– A single computer that includes multiple processors
– Processors may communicate at various levels
• Message passing or shared memory
• Multiprocessor and multicomputer systems
– A multicomputer system consists of several autonomous computers which may or may not communicate with each other.
– A multiprocessor system is controlled by a single operating system which provides mechanisms for interaction among the processors.
• Architectural models
– Tightly coupled multiprocessors
– Loosely coupled multiprocessors
Tightly Coupled Multiprocessor (Basics)

• Processors communicate via shared memory.
• There is complete connectivity between the processors and memory.
• This connectivity is accomplished by an interconnection network.
• Drawback:
  Performance degradation due to memory conflicts.
Tightly Coupled Architecture (Details)
• A tightly coupled multiprocessor system may
be used in cases where speed is more of a
concern.
• Models:-
– Without private cache
– With private cache
Architecture (Without Private Cache)
• This model consists of p processors, l memory modules, and d I/O channels.
• Everything is connected through a processor-memory interconnection network (PMIN).
• The PMIN is a switch that can connect every processor to every memory module.
• A memory module can satisfy only one processor's request in a given memory cycle; conflicting requests are arbitrated by the PMIN.
Tightly Coupled Architecture
• In this system, one way to reduce such conflicts is to make l equal to p (i.e. the number of memory modules equal to the number of processors).
• Another way of reducing conflicts is to use unmapped local memory (ULM), a reserved memory area for each processor.
• Adding the ULM reduces the amount of traffic through the PMIN and thereby reduces conflicts to and from memory.
Tightly coupled multiprocessor (contd.)

[Figure: tightly coupled multiprocessor without private caches: p processors, each with mapped and unmapped local memory, connected to l shared memory modules through the processor-memory interconnection network (PMIN), to d input/output channels and disks through the input/output interconnection network (IOPIN), and to one another through the interrupt signal interconnection network (ISIN).]
Problem
• In this type of system architecture, the memory references made by the processors usually go to shared main memory.
• Memory references common to all processors will cause conflicts.
• The PMIN will resolve these conflicts, but doing so delays the operation, which increases the instruction cycle time and decreases throughput.
Solution
• The delay can be reduced by giving each processor a private cache to hold its memory references.
• However, the cache coherence problem must then be taken care of.
• Refer to the diagram.
Tightly coupled multiprocessor (contd.)

[Figure: tightly coupled multiprocessor with private caches: the same organization as above, with a private cache placed between each processor and the PMIN.]
Tightly coupled multiprocessor
• The ISIN permits each processor to interrupt any other processor.
• The ISIN is also used by a failing processor to broadcast messages.
• The IOPIN permits the processors to communicate with the I/O channels.
Tightly coupled multiprocessor (contd.)
• Processor types
– Homogeneous, if all processors perform the same function
– Heterogeneous, if the processors perform different functions
Note: Two functionally identical processors may differ along other parameters such as I/O and memory size, i.e. they are asymmetric.
Loosely Coupled Architecture
• Each processor has its own set of I/O devices and memory, from which it accesses most of its instructions and data.
• Computer module: processor, I/O interface, and memory.
[Figure: a computer module: processor (P) with local memory (LM) and input/output (I/O), attached to a channel and arbiter switch (CAS).]
Loosely coupled multiprocessor contd.
• Inter-process communication across modules happens by exchange of messages, using the message transfer system (MTS).
• This is a distributed system; the degree of coupling is loose.
• The degree of memory conflict is low.
[Figure: N computer modules (Computer Module 0 … Computer Module N-1), each with a processor (P), local memory (LM), I/O, and CAS, connected through the Message Transfer System (MTS).]
Loosely coupled multiprocessor
• Inter-module communication
– Channel and arbiter switch (CAS)
– The arbiter decides which request to service when requests from two or more computer modules collide in accessing a physical segment of the MTS.
– It is also responsible for delaying the other requests until the request being serviced is completed.
Loosely coupled multiprocessor
• Message Transfer System (MTS)
– A time-shared bus or shared memory
– The latter can be implemented with a set of memory modules and a processor-memory interconnection network, or with a multiported main memory.
– The MTS largely determines the performance of the multiprocessor system.
Loosely coupled multiprocessor
• For an LCS that uses a single time-shared bus, performance is limited by the message arrival rate on the bus, the message length, and the bus capacity.
• For an LCS with shared memory, the limiting factor is the memory conflict problem imposed by the processor-memory interconnection network.
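The three limiting factors for a single time-shared bus can be related with a back-of-envelope utilization formula (the numbers below are illustrative, not from the slides): the bus saturates as the fraction of time it is occupied approaches 1.

```python
# Back-of-envelope check of how message arrival rate, message length, and
# bus capacity bound a single time-shared bus: utilization is the fraction
# of bus time occupied by message traffic, and the bus saturates near 1.
def bus_utilization(arrival_rate_msgs_per_s, msg_length_bytes, capacity_bytes_per_s):
    """Fraction of bus time occupied by message traffic."""
    return arrival_rate_msgs_per_s * msg_length_bytes / capacity_bytes_per_s

# e.g. 50,000 messages/s of 1 KB each on a 100 MB/s bus:
rho = bus_utilization(50_000, 1_024, 100_000_000)
print(f"utilization = {rho:.3f}")  # 0.512: about half the bus is busy
```

Doubling either the arrival rate or the message length on this example would push utilization past 1, i.e. beyond what the bus can carry.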
Cm* Architecture
• A project at Carnegie Mellon University.
• What is a computer module in Cm*?

[Figure: a Cm* computer module: processor (P) and Slocal (S) on top of local memory (LM) and I/O.]

• A computer module consists of a processor, an Slocal, local memory, and I/O.
• The Slocal is similar to the CAS in the loosely coupled architecture.
Cluster of Computer Modules

[Figure: a cluster: computer modules Cm1 … Cm10, each with a processor (P), Slocal (S), local memory (LM), and I/O, attached to the Map Bus; the KMAP connects the Map Bus to the Inter-cluster Bus.]
Role of Slocal
• Receives and interprets the processor's requests for access to local memory, I/O, and non-local memory.
• Allows the local processor to access resources external to its Cm.
• To distinguish local from external references, it performs a translation of local addresses.
Address Translation
Kmap Components
• The Slocal uses the 4 high-order bits of the address along with 1 PSW bit to access its map table.
• The map table determines whether the referenced memory is local or not.
• If the memory is non-local, control is given to the Kmap via the map bus.
• The Cms are connected to the Kmap via the map bus.
• The Kmap is responsible for routing data between Slocals.
Kmap Components

[Figure: Kmap components: the KBUS with RUN and OUT queues, the PMAP, and the Linc with send, service, and return ports onto Intercluster Bus 1 and Intercluster Bus 2; the Map Bus connects the Kmap to the Cms of its cluster.]
Kmap Components
• A request for non-local memory arrives at the KBUS via the map bus.
• The Linc manages communication between this Kmap and other Kmaps.
• The PMAP is the mapping processor; it responds to requests from the KBUS and the Linc.
Kmap Components
• The Kmap can handle up to 8 processor requests simultaneously.
• The PMAP uses queues to handle these requests.
Kmap Components
• A service request is signaled to the KBUS whenever a request for a non-local memory reference occurs.
• The computer module making the request is called the master Cm.
• The Kmap fetches the virtual address via the map bus and allocates a PMAP context for it.
• It places the virtual address in the PMAP's RUN queue.
• The PMAP performs the virtual-to-physical address translation.
Kmap Components
• Using the physical address, the Kmap can initiate a memory access in any Cm.
• The Kmap services the OUT request by sending the physical address of the memory request via the map bus.
• When the destination Cm completes the memory access, it sends a return signal to the Kmap.
Intracluster Communication

[Figure: steps 1-5 of an intracluster reference: the master Cm, the KBUS and PMAP (with RUN and OUT queues) inside the KMAP, the Map Bus, and the slave Cm.]

• The master Cm initiates a non-local memory access.
• The KBUS fetches the virtual address issued by the master Cm.
• The KBUS activates a context (creating the data structure for this transaction) on the PMAP's RUN queue.
• The PMAP processes the context and performs the address translation.
• The PMAP places in its OUT queue a request for a memory cycle at the slave Cm of the current cluster.
Intracluster Communication (contd.)

[Figure: steps 6-9 of the intracluster reference, continuing through the KMAP, the Map Bus, the master Cm, and the slave Cm.]

• The KBUS sends the physical address to the slave Cm over the Map Bus.
• The slave Cm performs the local memory access cycle.
• The KBUS "allows" the result of the memory access operation to be returned to the master Cm.
• The master Cm takes the data and continues execution.
Intercluster Communication

[Figure: steps 1-5 of an intercluster reference: the master Cm and master KMAP in one cluster, the Intercluster Bus, and the slave KMAP and slave Cm in another cluster.]

1. The master Cm sends a transfer request to the master KMAP.
2. The master KMAP prepares the message, encoding the request as an intercluster packet.
3. The intercluster message is transmitted on the intercluster bus by the routing algorithms.
4. The slave KMAP decodes the incoming request and sends a memory cycle request to the slave Cm in its cluster.
Intercluster Communication (contd.)

[Figure: steps 5-10 of the intercluster reference, showing the address formats in transit: Cop/segment/offset at the master, R/W + Cm# + page/offset on the Map Buses, and K/U + R/W + Cm# + page/offset at the slave.]

5. The slave Cm transmits the result to the slave KMAP.
6. The slave KMAP readies the intercluster message (i.e. reactivates the context).
7. The slave KMAP transmits the result to the master KMAP.
8. The master KMAP receives and interprets the message.
9. The result is sent to the master Cm.
10. The result is received by the master Cm.
Big Data Facts
• Data-intensive applications work with petabytes of data.
• Web pages: 20+ billion web pages x 20 KB = 400+ terabytes
– One computer reading 30-35 MB/sec from disk would need about four months to read the web.
– The same problem with 1,000 machines: less than 3 hours.
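The arithmetic behind these figures can be checked directly (using 35 MB/s, the upper end of the slide's 30-35 MB/s range):

```python
# Checking the slide's arithmetic: 20 billion pages x 20 KB of data, read
# from disk at 35 MB/s, on one machine versus 1,000 machines in parallel.
total_bytes = 20e9 * 20e3                 # 400 TB of web pages
rate = 35e6                               # bytes/sec one machine reads from disk

one_machine_days = total_bytes / rate / 86_400
thousand_machines_hours = total_bytes / (rate * 1_000) / 3_600

print(f"1 machine:     {one_machine_days:.0f} days")
print(f"1000 machines: {thousand_machines_hours:.1f} hours")
```

At 35 MB/s this comes out to roughly 132 days on one machine and about 3.2 hours on 1,000 machines, in line with the slide's rough figures.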
• Single-thread performance doesn't matter: we have large problems, and total throughput/price matters more than peak performance.

Stuff Breaks: more reliability needed
• If you have one server, it may stay up three years (1,000 days).
• If you have 10,000 servers, expect to lose ten a day.
• "Ultra-reliable" hardware doesn't really help: at large scale, super-fancy reliable hardware still fails, albeit less often.
– Software still needs to be fault-tolerant.
– Commodity machines without fancy hardware give better performance/price.
What is Hadoop?
• It is a framework for running applications on large clusters of commodity hardware, which store huge amounts of data and process it.
• Hadoop is a framework for the distributed processing of big data stored at different physical locations.
• The Apache Hadoop software library is a framework that allows
for the distributed processing of large data sets across clusters
of computers using simple programming models.
• It is designed to scale up from single servers to thousands of
machines, each offering local computation and storage.
• Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
Hadoop Includes
• HDFS, a distributed filesystem
• MapReduce, a programming model implemented by Hadoop on top of HDFS; it is an offline (batch) computing engine
Hadoop HDFS
• Hardware failure is the norm rather than the exception.
• Moving computation is cheaper than moving data.
HDFS
• runs on commodity hardware
• is highly fault-tolerant and designed to be deployed on low-cost hardware
• provides high-throughput access to application data
• is suitable for applications that have large data sets
NameNode and DataNodes
• HDFS has a master/slave architecture.
• The NameNode manages the file system namespace and regulates access to files by clients.
• DataNodes, usually one per node in the cluster, manage the storage attached to the nodes they run on.
• A file is split into one or more blocks.
• These blocks are stored in a set of DataNodes.
• The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests from the file system's clients.
• The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.
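The NameNode bookkeeping described above can be sketched as a toy model (this is not the real HDFS API, and the round-robin placement is a simplification of HDFS's rack-aware policy): a file is split into fixed-size blocks, and each block is assigned to a set of DataNodes.

```python
# Toy sketch of NameNode bookkeeping: split a file into blocks, replicate
# each block on several DataNodes, and record the block-to-node mapping.
BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the default block size in recent Hadoop
REPLICATION = 3                  # default replication factor

def place_blocks(file_size, datanodes):
    """Return {block_index: [DataNodes holding a replica of that block]}."""
    n_blocks = -(-file_size // BLOCK_SIZE)   # ceiling division
    mapping = {}
    for b in range(n_blocks):
        # Round-robin placement; real HDFS placement is rack-aware.
        mapping[b] = [datanodes[(b + r) % len(datanodes)]
                      for r in range(REPLICATION)]
    return mapping

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_blocks(300 * 1024 * 1024, nodes))
# a 300 MB file -> 3 blocks, each replicated on 3 of the 4 DataNodes
```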
HDFS Internal
Hadoop MapReduce
• A software framework for easily writing applications which process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which are then input to the reduce tasks.
• Typically the compute nodes and the storage nodes are the same.
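The split, map, sort, and reduce steps described above can be mimicked in a few lines of single-process Python (word count is the canonical example; real Hadoop jobs are written against Hadoop's Java API and run the map tasks in parallel across the cluster):

```python
# Single-process sketch of the MapReduce flow: split input into chunks,
# apply the map function to each chunk, sort the intermediate pairs so
# each key's values are grouped, then apply the reduce function per key.
from itertools import groupby
from operator import itemgetter

def map_fn(chunk):
    # map: emit (word, 1) for every word in an input chunk
    return [(word, 1) for word in chunk.split()]

def reduce_fn(word, counts):
    # reduce: sum the counts for one word
    return (word, sum(counts))

def mapreduce(chunks):
    # Each chunk could be handled by an independent map task in parallel.
    intermediate = [pair for chunk in chunks for pair in map_fn(chunk)]
    # The framework sorts map output so each reducer sees one key's values.
    intermediate.sort(key=itemgetter(0))
    return dict(reduce_fn(k, (c for _, c in group))
                for k, group in groupby(intermediate, key=itemgetter(0)))

print(mapreduce(["the cat sat", "the dog sat"]))
# {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```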
Hadoop MapReduce
• The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node.
• The master is responsible for scheduling the jobs' component tasks on the slaves, monitoring them, and re-executing the failed tasks.
• The slaves execute the tasks as directed by the master.
Hadoop MapReduce
• Applications specify the input/output locations and supply map and reduce functions via implementations of the appropriate interfaces and/or abstract classes.
• The Hadoop job client then submits the job and its configuration to the JobTracker.
• The JobTracker assumes responsibility for distributing the software/configuration to the slaves, scheduling tasks and monitoring them, and providing status and diagnostic information.