Lahore Garrison University


Parallel and Distributed
Computing
Session Fall-2020

Lecture – 05 Week – 03
INSTRUCTOR: MUHAMMAD ARSALAN RAZA
NAME: MUHAMMAD ARSALAN RAZA
EMAIL: MARSALANRAZA@LGU.EDU.PK
PHONE NUMBER: 0333-7843180 (PRIMARY), 0321-4069289
VISITING HOURS: MON-FRIDAY: 12:30PM-3:00PM; THURSDAY: 9:30PM-2:30PM
Preamble

 Fault Tolerance

 Types

 Faulty Systems


Lesson Plan

 GPU Architectures
 Conventional CPU architecture
 Modern GPGPU architectures
 AMD Southern Islands GPU Architecture
 Nvidia Fermi GPU Architecture
 Cell Broadband Engine
Conventional CPU Architecture

 CPUs are optimized to minimize the latency of a single thread.
 Can efficiently handle control-flow-intensive workloads.
 Much of the die area is devoted to caching and control logic.
 Multi-level caches are used to hide memory latency.
 Limited number of registers, since fewer threads are active.
 Control logic reorders execution, provides Instruction-Level Parallelism (ILP), and minimizes pipeline stalls.
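As a quick numeric illustration of how multi-level caches hide latency, the classic average memory access time (AMAT) formula can be evaluated. The latencies and miss rates below are illustrative classroom numbers, not figures from this lecture:

```python
# Sketch: average memory access time (AMAT) with a two-level cache.
# All latencies (cycles) and miss rates are illustrative examples.
l1_hit, l2_hit, dram = 4, 12, 200        # access latencies in cycles
l1_miss_rate, l2_miss_rate = 0.05, 0.20  # fraction of accesses that miss

# Each level's penalty is paid only by the accesses that miss above it.
amat = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram)
print(amat)  # 6.6 cycles on average, versus 200 for uncached DRAM
```

Even with modest hit rates, the effective latency collapses from hundreds of cycles to single digits, which is why CPUs spend so much area on caches.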
Modern GPGPU Architecture

 Array of independent "cores" called Compute Units.
 High-bandwidth, banked L2 caches and main memory.
 Banks allow multiple accesses to occur in parallel.
 Bandwidths of hundreds of GB/s.
 Memory and caches are generally non-coherent.
 Compute units are based on SIMD hardware.
 Both AMD and NVIDIA use 16-lane-wide SIMD units.
 Large register files are used for fast context switching.
 No saving/restoring of state.
 Data is persistent for the entire thread execution.
 Both vendors combine an automatic L1 cache with a user-managed scratchpad.
 The scratchpad is heavily banked and very high bandwidth (~terabytes/second).
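To see why banking matters, the sketch below maps scratchpad accesses to banks and counts worst-case conflicts. The 32-bank, 4-byte-word configuration is a typical assumption, not a figure from the slides:

```python
# Sketch: map scratchpad accesses to banks and measure serialization.
# Assumes 32 banks interleaved at 4-byte-word granularity (typical values).
NUM_BANKS = 32

def bank_of(byte_address):
    """Bank serving the 4-byte word at this byte address."""
    return (byte_address // 4) % NUM_BANKS

def max_conflict(addresses):
    """Worst-case serialization: the most accesses landing on one bank."""
    counts = {}
    for a in addresses:
        b = bank_of(a)
        counts[b] = counts.get(b, 0) + 1
    return max(counts.values())

# Unit-stride access: every lane hits a different bank -> fully parallel.
print(max_conflict([4 * lane for lane in range(32)]))      # 1
# Stride of 32 words: every lane hits bank 0 -> fully serialized.
print(max_conflict([128 * lane for lane in range(32)]))    # 32
```

A conflict count of 1 means all accesses complete in parallel; a count of 32 means the hardware must serialize them, losing the scratchpad's bandwidth advantage.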

Modern GPU Architecture

 Work-items are automatically grouped into hardware threads called "wavefronts" (AMD) or "warps" (NVIDIA).
 A single instruction stream is executed on SIMD hardware.
 64 work-items in a wavefront, 32 in a warp.
 Each instruction is issued multiple times on a 16-lane SIMD unit.
 Control flow is handled by masking SIMD lanes.
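A minimal sketch of lane masking: when a branch diverges, the SIMD unit executes both paths for every lane, and a per-lane mask decides which result each lane keeps. The function names below are illustrative, not a vendor API:

```python
# Sketch: SIMD control flow via lane masking (illustrative, not vendor API).
def simd_select(values, condition, then_fn, else_fn):
    """Run both branch paths across all lanes; the mask picks the result
    each lane commits. Divergent branches therefore pay for both paths."""
    mask = [condition(v) for v in values]
    then_results = [then_fn(v) for v in values]   # issued for every lane
    else_results = [else_fn(v) for v in values]   # issued for every lane
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]

# 8 lanes: even values are doubled, odd values are negated.
lanes = list(range(8))
out = simd_select(lanes, lambda v: v % 2 == 0, lambda v: v * 2, lambda v: -v)
print(out)  # [0, -1, 4, -3, 8, -5, 12, -7]
```

This is why heavily divergent control flow is expensive on GPUs: both sides of every branch occupy the SIMD unit, even for lanes whose results are masked off.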

AMD Southern Islands

 7000-series GPUs based on the Graphics Core Next (GCN) architecture.
 4 SIMDs per compute unit.
 1 scalar unit handles instructions common to the whole wavefront:
 Loop iterators, constant-variable accesses, branches.
 Has a single, integer-only ALU.
 A separate branch unit is used for some conditional instructions.
 Radeon HD 7970:
 32 compute units.
 Max performance.
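The peak performance can be reconstructed as a back-of-envelope calculation from the counts above. The 925 MHz reference clock and 2 FLOPs per lane per cycle (one fused multiply-add) are public spec-sheet figures, not values from the slides:

```python
# Back-of-envelope peak single-precision throughput for the Radeon HD 7970.
# Clock (925 MHz) and FMA rate (2 FLOPs/cycle) are public reference figures.
compute_units = 32
simds_per_cu = 4
lanes_per_simd = 16
clock_hz = 925e6
flops_per_lane_per_cycle = 2  # one fused multiply-add = multiply + add

alus = compute_units * simds_per_cu * lanes_per_simd
peak_tflops = alus * clock_hz * flops_per_lane_per_cycle / 1e12
print(alus, round(peak_tflops, 2))  # 2048 3.79
```

So 32 CUs x 4 SIMDs x 16 lanes gives 2048 ALUs, for roughly 3.79 TFLOPS single precision at the reference clock.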

AMD Southern Islands Architecture

[Architecture diagram]
NVIDIA Fermi Architecture

 GTX 480: Compute Capability 2.0.
 15 cores, or Streaming Multiprocessors (SMs).
 Each SM features 32 CUDA processors.
 480 CUDA processors in total.
 Global memory with ECC.
 An SM executes threads in groups of 32 called warps.
 Two warp issue units per SM.
 Concurrent kernel execution: multiple kernels execute simultaneously to improve efficiency.
 A CUDA core consists of a single ALU and a floating-point unit (FPU).
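The same back-of-envelope arithmetic applies to the Fermi-based GTX 480. The 1401 MHz shader clock and 2 FLOPs per core per cycle (fused multiply-add) are public reference figures, not values from the slides:

```python
# Back-of-envelope figures for the Fermi-based GTX 480.
# Shader clock (1401 MHz) and FMA rate (2 FLOPs/cycle) are public specs.
sms = 15
cuda_cores_per_sm = 32
shader_clock_hz = 1401e6
flops_per_core_per_cycle = 2  # one fused multiply-add per cycle

cuda_cores = sms * cuda_cores_per_sm
peak_tflops = cuda_cores * shader_clock_hz * flops_per_core_per_cycle / 1e12
print(cuda_cores, round(peak_tflops, 2))  # 480 1.34
```

This confirms the 15 x 32 = 480 CUDA cores stated above, for roughly 1.34 TFLOPS peak single precision.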

NVIDIA Fermi Architecture

[Architecture diagram]
Cell Broadband Engine

 Developed by Sony, Toshiba, and IBM.
 Transitioned from embedded platforms into HPC via the PlayStation 3.
 OpenCL drivers are available for Cell BladeCenter servers.
 Consists of one Power Processing Element (PPE) and multiple Synergistic Processing Elements (SPEs).
 Uses the IBM XL C for OpenCL compiler.


Cell Broadband Engine Architecture

[Architecture diagram]
Lesson Review

 GPU Architectures
 Conventional CPU architecture
 Modern GPGPU architectures
 AMD Southern Islands GPU Architecture
 Nvidia Fermi GPU Architecture
 Cell Broadband Engine

Next Lesson Preview

 GPU Programming
 Maryland CPU/GPU Cluster Architecture
 Intel’s response to NVIDIA GPUs
 To accelerate or not to accelerate
 When are GPUs appropriate?
 CPU vs. GPU hardware design philosophies
 CUDA-capable GPU hardware architecture
 Is it hard to program on a GPU?
 How to program on a GPU today?

References

 To cover this topic, different reference materials have been consulted.

 Textbooks:

Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M. van Steen, Prentice Hall, 2nd Edition, 2007.

Distributed and Cloud Computing: Clusters, Grids, Clouds, and the Future Internet, K. Hwang, J. Dongarra and G. C. Fox, Elsevier, 1st Edition.

 Google Search Engine
