
Department of

Computer Science and Engineering

UNIT 4

Faculty Name : Dr. P. Jose, Dr. Umaneshan
Subject Name : MODERN COMPUTER ARCHITECTURE
Course Code : 10211CS129

School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of
Science and Technology
Unit-4::Syllabus
UNIT IV: High Performance Computing (HPC)
9 Hours

HPC Architecture, Parallel Processing, Parallel Memory Models, Data vs. Task Parallelism, High Throughput Computing, Vectorization, Multithreading.
Unit-4::HPC Architecture
• HPC technology uses clusters of powerful processors, working
in parallel, to process massive multi-dimensional datasets (big
data) and solve complex problems at extremely high speeds.
• HPC systems typically perform at speeds more than one
million times faster than the fastest commodity desktop, laptop
or server systems.
• In HPC architecture, multiple computer servers are networked
together to form a cluster that delivers far greater
performance than a single computer.
• HPC systems have shifted from monolithic supercomputers to
computing clusters.

Unit-4::Invention of HPC
• Need for ever increasing Performance
• Visionary concept of Parallel Computing

Unit-4::Invention of HPC
How does HPC work?

• HPC systems have three main components:


• Compute
• Network
• Storage
• To build a high performance computing architecture, compute
servers are networked together into a cluster. Software programs
and algorithms are run simultaneously on the servers in the cluster.
The cluster is networked to the data storage to capture the output.

• To build an efficient, high-performance computing architecture,
computational servers are interconnected so they can work on large or
complex operations in a cluster environment.
• In the cluster, software programs or algorithms run simultaneously to
produce an effective outcome.
• The cluster is then networked to the data storage to capture the output.
• To operate at the maximum level, each component in the cluster must keep
pace with the others (each component in the cluster is referred to as a node).
• The storage component must be able to feed and ingest data from the various
network sources.
• Each networking component must be able to deliver high-speed data transport
between the computational servers and the data storage devices.

HPC Application

Unit-4::Parallel Processing

• Parallel processing is described as a class of techniques
that enables a system to carry out simultaneous
data-processing tasks to increase the computational
speed of the computer system.
• A parallel processing system can carry out simultaneous
data-processing to achieve a faster execution time.
• The goal of parallel processing is to enhance the computer's
processing capability and increase its throughput.

Unit-4::Parallel Processing
• The diagram shows one possible way of separating the
execution unit into eight functional units operating in parallel.

Unit-4::Parallel Processing

• The adder and integer multiplier perform
arithmetic operations on integer numbers.
• The floating-point operations are separated into three
circuits operating in parallel.
• The logic, shift, and increment operations can be
performed concurrently on different data. All units are
independent of each other, so one number can be
shifted while another number is being incremented.

Unit-4::Parallel Memory Models

• Parallel memory models in computer organization and architecture
(COA) are ways of organizing memory access and communication among
multiple processors in a parallel computer. Some common models are:
• Uniform Memory Access (UMA): All processors share a single physical
memory and have equal access time to it. Also known as Symmetric
Multiprocessors (SMP).
• Non-uniform Memory Access (NUMA): Each processor has its own
local memory and can access the memory of other processors with
longer latency. Also known as Distributed Shared Memory (DSM).
• Cache-only Memory Access (COMA): There is no local memory for each
processor, only a large shared cache that is dynamically allocated
among the processors.
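As a rough software analogy to these hardware models (a minimal sketch, not from the original slides; the Python multiprocessing module and all names here are illustrative): a shared-memory model resembles workers updating one shared location, while a distributed-memory model resembles workers exchanging explicit messages.

# Sketch: shared-memory style vs. message-passing style, using Python's
# multiprocessing module (an analogy only; UMA/NUMA/COMA are hardware
# properties, not library features).
from multiprocessing import Process, Value, Pipe

def shared_increment(counter):
    # Shared-memory style: both workers update the same location.
    with counter.get_lock():
        counter.value += 1

def message_worker(conn):
    # Distributed-memory style: data moves only via explicit messages.
    conn.send(conn.recv() + 1)

if __name__ == "__main__":
    counter = Value("i", 0)
    workers = [Process(target=shared_increment, args=(counter,)) for _ in range(2)]
    for w in workers: w.start()
    for w in workers: w.join()
    print("shared counter:", counter.value)   # prints 2

    parent, child = Pipe()
    w = Process(target=message_worker, args=(child,))
    w.start()
    parent.send(41)
    print("message reply:", parent.recv())    # prints 42
    w.join()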
Unit-4::UMA

• All the processors share the physical memory uniformly.


All the processors have equal access time to all the
memory words. Each processor may have a private
cache memory. Same rule is followed for peripheral
devices.
• When all the processors have equal access to all the
peripheral devices, the system is called a symmetric
multiprocessor. When only one or a few processors can
access the peripheral devices, the system is called
an asymmetric multiprocessor.
Unit-4::NUMA

• In the NUMA multiprocessor model, the access time varies


with the location of the memory word. Here, the
shared memory is physically distributed among all the
processors, called local memories.

• The collection of all local memories forms a global


address space which can be accessed by all the
processors.

Unit-4::COMA

• The COMA model is a special case of the NUMA model.


Here, all the distributed main memories are converted to
cache memories.
• Distributed-memory multicomputers consist of
multiple computers, known as nodes, interconnected by a
message-passing network.
• Each node acts as an autonomous computer having a
processor, a local memory and sometimes I/O devices.
• In this case, all local memories are private and are
accessible only to the local processors.
Unit-4::Parallelism

Parallelism is an implementation property.
Parallelism is literally the simultaneous physical
execution of tasks at runtime, and it requires
hardware with multiple computing resources.

Unit-4::Parallelism

Parallel Computing
• Parallel computing uses multiple computer cores
to do several operations at once.
• Parallel architecture can break down a job into
its component parts and multi-task them.
• Parallel computer systems are well suited to
modeling and simulating real-world phenomena.

Unit-4:: Examples

Serial computing examples:
• Smartphones of 2010, such as the iPhone 4 and Motorola Droid
• The HP Spectre x360 laptop
Parallel computing examples:
(achieved by increasing processing and network speed)
• IBM released the first multi-core processors for
computers in 2001.
• Recent developments build on this: Big Data, IoT, and AI tools.
Unit-4::Parallelism-Advantages

Advantages of Parallelism:
• Execute code more efficiently - save time and
money by sorting through “big data” faster than
ever.
• Compared to serial computing, parallel computing is
much better suited for modeling, simulating and
understanding complex, real-world phenomena.

Unit-4::Parallelism-Advantages
Advantages of Parallelism:
• Throwing more resources at a task will shorten its time to
completion
• Parallel computers can be built from cheap, commodity
components.
• Many problems are so large and/or complex that it is
impractical or impossible to solve them on a single computer;
they can be solved by using multiple computing resources.
• It has massive data storage and quick data computations.

Unit-4::Parallelism-Disadvantages

• Programming to target a parallel architecture is more difficult.
• It creates data-intensive problems (the method influences the size
and complexity of the information source) when using multicore
processors.
• Extra costs (i.e., increased execution time) are incurred due to
data transfers, synchronization, communication, thread
creation/destruction, etc.
• These costs can sometimes be quite large, and may actually exceed
the gains due to parallelization.

Unit-4::Parallelism-Disadvantages
• Various code tweaking (adjusting the values of underlying variables
to match desired results) has to be performed for different target
architectures to improve performance.
• Better cooling technologies are required in the case of clusters.
• Power consumption is huge for multi-core architectures.
• Parallel solutions are harder to implement, debug, and prove
correct.
• Communication and coordination overhead increases as more cores
or nodes must cooperate.

Unit-4::Parallelism-Examples
• SMARTPHONES
• iPhone 5 has a 1.5 GHz dual-core processor.
• iPhone 11 has 6 cores.
• Samsung Galaxy Note 10 has 8 cores.
• LAPTOPS AND DESKTOPS
• Intel Core™ i5 and Core i7 chips in the HP Spectre Folio
and HP EliteBook x360 each have 4 processing cores.
• The HP Z8, billed as the world's most powerful workstation, packs
in 56 cores of computing power
(it can perform real-time editing of 8K video or run
complex 3D simulations).
Unit-4::Parallelism-Examples

• ILLIAC IV
- First “massively” parallel computer, built largely at
the University of Illinois.
- The machine was developed in the 1960s with help
from NASA and the U.S. Air Force.
- It had 64 processing elements capable of handling
131,072 bits at a time

Unit-4::Parallelism-Examples

• NASA’S SPACE SHUTTLE COMPUTER SYSTEM


- uses 5 IBM AP-101 computers in parallel
- They control the shuttle’s avionics, processing large amounts of fast-paced
real-time data.
- can perform 480,000 instructions per second.
- also been used in F-15 fighter jets and the B-1 bomber

• AMERICAN SUMMIT SUPERCOMPUTER


- was the world's most powerful supercomputer when it debuted in 2018
- The machine was built by the U.S. Department of Energy at their Oak
Ridge National Laboratory.
- It’s a 200-petaFLOPS machine that can process 200 quadrillion
operations per second.
Unit-4::Parallelism-Examples

• SETI
- The Search for Extraterrestrial Intelligence (SETI)
monitors millions of frequencies day and night.
- SETI uses parallel computing through the Berkeley Open
Infrastructure for Network Computing (BOINC); millions of
people donate unused computer time to process all those signals.
• BITCOIN
- uses multiple computers to validate transactions.
Unit-4::Parallelism-Examples

• THE INTERNET OF THINGS (IOT)


- From soil sensors to smart cars, drones, and pressure
sensors, parallel systems process real-time telemetry
data from the IoT.
• MULTITHREADING
- A parallel computing software method that works best
in parallel computer systems.

Unit-4::Parallelism-Examples
• PYTHON
- Python offers a dedicated multiprocessing module.
- It uses "subprocesses" in place of threads. (Threads
share memory, while subprocesses use separate memory;
see the sketch after this list.)
• PARALLEL COMPUTING IN R
- R is a serial coding language for statistical and graphical
computing. The parallel package, released in 2011, adds
parallel execution.
• PARALLEL COMPUTING TOOLBOX
- The Parallel Computing Toolbox from MathWorks adds parallel
computing to MATLAB.
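A minimal sketch of the multiprocessing module described above (the worker function and pool size are illustrative, not from the slides):

# Python's multiprocessing module runs "subprocesses" with separate
# memory instead of threads that share memory.
from multiprocessing import Pool

def square(n):
    return n * n  # stand-in for CPU-bound work

if __name__ == "__main__":
    with Pool(processes=4) as pool:             # 4 worker subprocesses
        results = pool.map(square, range(10))   # work split across workers
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]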
Unit-4::Parallelism-Architecture
One machine with multiple processors, or lots of machines
cooperating in a network.
There are 3 distinct architectures:
• Shared memory: parallel computers use multiple processors to
access the same memory resources (modern laptops, desktops, and
smartphones).
• Distributed memory: parallel computers use multiple processors,
each with their own memory, connected over a network (cloud
computing, distributed rendering of computer graphics, and shared
resource systems like SETI).
• Hybrid memory: parallel systems combine shared-memory
parallel computers and distributed-memory networks.

Unit-4::Parallelism-Architecture

• Quantum computers
- will further enhance parallel computation.
- Google has claimed that its quantum processor did in about 4
minutes what the most powerful supercomputer on Earth would
take 10,000 years to accomplish.
- A 300-qubit quantum computer could represent more states at
once than there are atoms in the observable universe.

Unit-4::Parallelism-Types

• Data Parallelism
Data parallelism means concurrent execution of the same task on
multiple computing cores, each working on a different part of the data.
•Let’s take an example, summing the contents of an array of size N.
For a single-core system, one thread would simply sum the elements
[0] . . . [N − 1].
• On a dual-core system, thread A, running on core 0, could sum the
elements [0] . . . [N/2 − 1], while thread B, running on core 1, sums
the elements [N/2] . . . [N − 1].
(Two threads would be running in parallel on separate computing
cores.)
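A minimal Python sketch of this two-thread array sum (the names and the choice of the threading module are illustrative; note that CPython's global interpreter lock limits true speedup for pure-Python code, so this shows the decomposition pattern rather than a performance win):

# Data parallelism: the same task (summing) on two halves of one array.
import threading

N = 1_000_000
data = list(range(N))
partial = [0, 0]  # one result slot per thread avoids a shared-counter race

def sum_range(slot, lo, hi):
    partial[slot] = sum(data[lo:hi])  # same task, different subset

a = threading.Thread(target=sum_range, args=(0, 0, N // 2))   # "core 0"
b = threading.Thread(target=sum_range, args=(1, N // 2, N))   # "core 1"
a.start(); b.start()
a.join(); b.join()
print(partial[0] + partial[1] == sum(data))  # True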
Unit-4::Parallelism-Types

• Task Parallelism
• Task parallelism means concurrent execution of
different tasks on multiple computing cores.
Task parallelism might involve two threads, each
performing a unique statistical operation on the same array of
elements.
The threads are operating in parallel on separate
computing cores, but each is performing a unique
operation.
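A minimal Python sketch of this idea, with two threads running different statistical operations on the same array (the specific functions are illustrative):

# Task parallelism: two threads perform *different* tasks on the same data.
import threading
import statistics

data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
results = {}

def compute(name, fn):
    results[name] = fn(data)  # each thread runs a unique operation

t1 = threading.Thread(target=compute, args=("mean", statistics.mean))
t2 = threading.Thread(target=compute, args=("stdev", statistics.stdev))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # {'mean': 18.0, 'stdev': ...}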
Unit-4::Parallelism-Types
• Bit-level parallelism
• Bit-level parallelism is a form of parallel computing which is based
on increasing processor word size.
• Increasing the word size reduces the number of instructions the
processor must execute in order to perform an operation on
variables whose sizes are greater than the length of the word.
• (E.g., consider a case where an 8-bit processor must add two 16-bit
integers. The processor must first add the 8 lower-order bits of each
integer, then add the 8 higher-order bits together with the carry,
taking two instructions to complete a single operation. A 16-bit
processor would be able to complete the operation with a single
instruction.)
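A small Python sketch of the limb-by-limb addition described above (purely illustrative; a real processor does this in hardware):

# Adding two 16-bit integers using only 8-bit operations shows why a
# wider word size needs fewer instructions.
def add16_with_8bit_ops(a, b):
    lo = (a & 0xFF) + (b & 0xFF)                 # step 1: low 8 bits
    carry = lo >> 8                              # carry out of the low byte
    hi = ((a >> 8) + (b >> 8) + carry) & 0xFF    # step 2: high 8 bits + carry
    return (hi << 8) | (lo & 0xFF)

print(hex(add16_with_8bit_ops(0x12FF, 0x0001)))  # 0x1300
# A 16-bit processor performs the same addition in a single instruction.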
Unit-4::Parallelism-Types

• Instruction-level parallelism
• Instruction-level parallelism means the simultaneous
execution of multiple instructions from a program.
Example
    for (i = 1; i <= 100; i = i + 1)
        y[i] = y[i] + x[i];
• This is a parallel loop. Every iteration of the loop can
overlap with any other iteration, although within each
loop iteration there is little opportunity for overlap.
The key differences between Data Parallelism and Task Parallelism

1. Data parallelism: the same task is performed on different subsets of the
   same data.
   Task parallelism: different tasks are performed on the same or different
   data.

2. Data parallelism: synchronous computation is performed.
   Task parallelism: asynchronous computation is performed.

3. Data parallelism: since a single execution thread operates on all sets of
   data, the speedup is greater.
   Task parallelism: since each processor executes a different thread or
   process on the same or a different set of data, the speedup is smaller.

4. Data parallelism: the amount of parallelization is proportional to the
   input size.
   Task parallelism: the amount of parallelization is proportional to the
   number of independent tasks performed.

5. Data parallelism: designed for optimum load balance on a multiprocessor
   system.
   Task parallelism: load balancing depends on the availability of hardware
   and on scheduling algorithms, such as static and dynamic scheduling.

Unit-4:: High Throughput Computing

• In computer science, high-throughput computing


(HTC) is the use of many computing resources over
long periods of time to accomplish a computational
task.
• High-throughput computing (HTC) is the use of
distributed computing facilities for applications
requiring large computing power over a long period
of time. HTC systems need to be robust and to
reliably operate over a long time scale.
Unit-4:: High Throughput Computing

• High Throughput Computing (HTC) is a computing paradigm that focuses on


executing a large number of tasks over an extended period of time. It is characterized
by the ability to handle a vast number of tasks in a parallel or distributed manner,
with the goal of maximizing the overall throughput of the system.
• HTC is commonly used in scientific and engineering applications, where a large
number of simulations, computations, or data processing tasks need to be
performed. Examples of HTC systems include grid computing, cloud computing, and
volunteer computing, which all rely on a distributed computing infrastructure to
handle a high volume of tasks.
• The main advantage of HTC is its ability to provide high computational capacity to
researchers and scientists, enabling them to run complex simulations and analyze
large datasets at scale. It allows for the efficient utilization of computing resources,
reducing the time and cost required to complete scientific experiments and
engineering projects.
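As a loose illustration of the HTC pattern (many independent tasks, maximizing overall throughput), here is a minimal Python sketch using a process pool; the task function and worker count are invented for illustration:

# HTC pattern: a large batch of independent tasks farmed out to workers.
from concurrent.futures import ProcessPoolExecutor

def simulate(seed):
    # stand-in for one independent simulation or data-processing task
    x = seed
    for _ in range(100_000):
        x = (1103515245 * x + 12345) % (2**31)  # toy workload
    return x

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(simulate, range(1000)))  # 1,000 tasks
    print(len(results), "tasks completed")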
Unit-4:: HTC-Advantages

• Flexibility: HTC is flexible and can be used for many computing tasks
related to business analytics and scientific research.
• Cost-effectiveness: HTC is more cost-effective than the solutions offered
by High-Performance Computing (HPC); it makes use of hardware and software
that are readily available and less expensive, and performs more tasks.
• Reliability: HTC systems are designed to provide high reliability and to
make sure that all tasks run efficiently even if an individual component
fails.
• Resource optimization: HTC allocates resources properly, ensuring that all
available resources are used efficiently, which increases the value of the
available computing resources.
Unit-4:: HTC-Examples

• Deep learning networks,


• image processing
• computer vision applications
• Speech Recognition
• NLP
• AI Tools etc

Unit-4:: HTC-Examples

What is an example of high-throughput computing?

• Racks of departmental servers,


• desktop machines,
• leased resources from the cloud, and
• allocations from national supercomputer centers
are all examples of these resources.
• This is an environment of distributed ownership, where
individuals throughout an organization own their own
resources.
Unit-4:: Vectorization

• A parallel computing method


• A processor can operate on an entire vector in one
instruction
• Work done automatically in parallel (simultaneously)
• Vector instructions access memory with known
pattern
• Reduces branches and branch problems in pipelines
• Vectorization is an example of Single Instruction,
Multiple Data (SIMD) processing
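A small Python sketch of the idea using NumPy, whose array operations map onto SIMD-style execution (NumPy is not mentioned in the slides; it is used here purely as an illustration):

# A vectorized expression replaces an explicit element-by-element loop:
# one operation is applied across the whole array, with a known memory
# access pattern and no per-element branch.
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)
y = np.arange(1_000_000, dtype=np.float64)

def scalar_add(x, y):
    out = np.empty_like(x)
    for i in range(len(x)):   # one element and one branch per iteration
        out[i] = x[i] + y[i]
    return out

vector_sum = x + y            # vectorized: one array expression

assert np.array_equal(scalar_add(x, y), vector_sum)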
Registers

Vector Instruction Format

• The vector instruction format typically includes


an opcode field that specifies the operation to be
performed, a vector register specifier field that
identifies the vector register that contains the data to
be operated on, and a length field that specifies the
number of elements in the vector
1. Operation Code
Operation code indicates the operation that has to be performed in the given instruction. It decides the
functional unit for the specified operation or reconfigures the multifunction unit.
2. Base Address
Base address field refers to the memory location from where the operands are to be fetched or to where
the result has to be stored. The base address is found in the memory reference instructions. In the vector
instruction, the operand and the result both are stored in the vector registers. Here, the base
address refers to the designated vector register.
3. Address Increment
A vector operand has several data elements, and the address increment specifies the address of the next
element in the operand. Some computers store the data elements consecutively in main memory, in which
case the increment is always 1. Computers that do not store the data elements consecutively require a
variable address increment.
4. Address Offset
The address offset is always specified relative to the base address. The effective memory address is calculated
using the address offset.
5. Vector Length
Vector length specifies the number of elements in a vector operand. It identifies the termination of a
vector instruction.
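To make the format concrete, here is a hypothetical encoding sketch in Python; the field names follow the list above, while the opcode value and addresses are invented for illustration:

# Hypothetical vector instruction built from the five fields above.
from dataclasses import dataclass

@dataclass
class VectorInstruction:
    opcode: str         # 1. operation code, e.g. vector add
    base_address: int   # 2. designated vector register or memory base
    increment: int      # 3. address step between consecutive elements
    offset: int         # 4. offset relative to the base address
    length: int         # 5. number of elements in the vector operand

# "Add 64 consecutive elements starting at the base address":
vadd = VectorInstruction(opcode="VADD", base_address=0x1000,
                         increment=1, offset=0, length=64)
print(vadd)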

Comparison Of General Processing vs
Vector Processing

Basic Vector Architecture

Application
•Multimedia Processing (compress., graphics, audio synth, image proc.)
•Cryptography (RSA, DES/IDEA, SHA/MD5)
•Speech and handwriting recognition
•Databases (hash/join, data mining, image/video serving)
•The following are some areas where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.

Unit-4: Multithreading

• Multithreading is the ability of a central processing unit (or of a
single core in a multi-core processor) to provide multiple
threads of execution concurrently, supported by the operating
system.
• Multithreading aims to increase utilization of a single core by
exploiting:
• thread-level parallelism
• instruction-level parallelism

Types of Multithreading

• Fine-grained multithreading: the processor switches between threads
on every cycle, hiding short stalls from any single thread.
• Coarse-grained multithreading: the processor switches to another
thread only on a costly stall, such as a cache miss.
• Simultaneous multithreading (SMT): instructions from multiple
threads are issued in the same cycle, filling otherwise unused
issue slots.

Multithreading vs. Multiprocessing

• In programming, an instruction stream is called a


thread and the instance of the computer program
that is executing is called a process. Each process has
its own memory space where it stores threads and
other data the process requires to execute.
• While multithreading allows a process to create more
threads to improve responsiveness, multiprocessing
simply adds more CPUs to increase speed.
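A minimal Python sketch of the distinction (module choices are illustrative): the threading module creates more threads inside one process and its memory space, while the multiprocessing module starts additional processes, each with its own memory.

# Threads share one process's memory; processes each get their own.
import threading
import multiprocessing

def work(tag):
    print(tag, "running")

if __name__ == "__main__":
    # Multithreading: another thread within the same process.
    t = threading.Thread(target=work, args=("thread",))
    t.start(); t.join()

    # Multiprocessing: another process, i.e. more CPU put to work.
    p = multiprocessing.Process(target=work, args=("process",))
    p.start(); p.join()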

Comparison of Multithreading vs. Multiprocessing

Department of
Computer Science and Engineering

Thank You

School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of
Science and Technology
