
CPUs: Latency-Oriented Design (a few cores optimized for sequential, serial processing)

CPUs for sequential parts where latency matters


- High clock frequency
- Large caches
  Convert long-latency memory accesses into short-latency cache accesses
- Sophisticated control (adds more complexity)
  Branch prediction for reduced branch latency
  Data forwarding for reduced data latency
- Powerful ALUs
  Reduced operation latency

GPUs: Throughput-Oriented Design (a massively parallel architecture consisting of thousands of smaller,
more efficient cores designed to handle many tasks simultaneously)
GPUs for parallel parts where throughput wins.
- Moderate clock frequency (lower clock rates than CPUs)
- Small caches
  To boost memory throughput
- Simple control
  No branch prediction, no data forwarding
- Energy-efficient ALUs
  Many ALUs; long latency, but heavily pipelined for high throughput
- Requires a massive number of threads to tolerate latencies

The GPU is a slave device; it cannot run a program on its own.


The OS talks to the CPU, and the CPU then assigns the parallel tasks (applications) to the GPU to exploit its
memory throughput (there is no direct interaction between the OS and the GPU).
Anything related to latency and the sequential parts is handled by the CPU.
The OS and databases run on the CPU only.
The sequential component (1 - f) limits the speedup (Amdahl's law), so the sequential part that runs on the CPU
controls the speedup.
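
A short worked form of Amdahl's law (f for the parallel fraction and N for the number of processors are generic symbols, not from the notes):

\text{Speedup}(N) = \frac{1}{(1 - f) + \frac{f}{N}}

For example, with f = 0.9 and N = 100: Speedup = 1 / (0.1 + 0.009) ≈ 9.2, so the 10% sequential part running on the CPU caps the gain even with 100 parallel processors.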

How to improve the performance of applications:


1- Heterogeneous execution model (CPU is the host, GPU is the device)
2- Develop a C-like programming language for the GPU (Compute Unified Device Architecture, CUDA)
3- Unify all forms of GPU parallelism as the CUDA thread
4- Programming model is “Single Instruction Multiple Thread” (SIMT)
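
A minimal sketch of this model (the vecAdd kernel, launchVecAdd, and the device pointers d_a, d_b, d_c are illustrative assumptions): the host (CPU) launches the kernel on the device (GPU), and every CUDA thread runs the same code on its own element (SIMT).

#include <cuda_runtime.h>

// Hypothetical kernel: each CUDA thread handles one array element (SIMT).
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the leftover threads
        c[i] = a[i] + b[i];
}

// Host (CPU) side: launch the kernel on the device (GPU).
// d_a, d_b, d_c are assumed to point into GPU global memory (see the next section).
void launchVecAdd(const float *d_a, const float *d_b, float *d_c, int n) {
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // enough blocks to cover n
    vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
}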

Global Memory:
1. Faster than fetching data from the host's RAM, but the GPU still has other, faster on-chip options (caches, shared memory)
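
A host-side sketch of using global memory (roundTrip, h_a, d_a, and n are assumed names): data is copied from host RAM into the GPU's global memory before a kernel runs, and copied back afterwards.

#include <cuda_runtime.h>

// Hypothetical host routine: stage data through GPU global memory.
void roundTrip(float *h_a, int n) {
    float *d_a;                                           // pointer into GPU global memory
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_a, bytes);                              // allocate in global memory
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // host RAM -> global memory
    // ... launch a kernel that reads/writes d_a ...
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  // global memory -> host RAM
    cudaFree(d_a);                                        // release global memory
}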

Streaming Multiprocessors (SMs):


1. Has multiple processors
2. Only one instruction unit (threads share a program counter)
3. A group of processors must run the exact same set of instructions at any given time
4. Up to 32 blocks are assigned to each SM as resources allow; each block is executed as 32-thread warps (SIMD), as sketched below
5. Maintains thread/block IDs and manages/schedules thread execution
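
A quick sketch of the block-to-warp mapping (the block size of 256 is an arbitrary choice): the SM splits a block's threads into warps of 32.

// Illustrative: a 256-thread block is executed by the SM as 256 / 32 = 8 warps.
dim3 block(256);                           // threads per block
int warpsPerBlock = (block.x + 31) / 32;   // = 8 warps, each run in SIMD fashion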

Each block:
1. All threads execute the same kernel program (SPMD)
2. Each thread/warp handles a small portion of the given task
3. All threads share data and synchronize while doing their share of the work
4. Threads within the same block can cooperate (see the sketch after this list); threads in different blocks cannot
5. Hardware is free to assign blocks to any parallel processor at any time, so each block can execute in any
order relative to other blocks
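
A minimal sketch of threads in one block cooperating (the blockSum kernel and the fixed block size of 256 are assumptions): they share data through __shared__ memory and synchronize with __syncthreads(); threads in other blocks cannot take part.

// Hypothetical kernel: one block cooperatively sums 256 input elements.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float tile[256];                     // visible to this block only
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];                      // each thread loads one element
    __syncthreads();                                // wait for the whole block

    // Tree reduction inside the block (blockDim.x assumed to be 256).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];                  // one partial sum per block
}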

Warps:
1. The 32 threads in a warp execute the same set of instructions at the same time (SIMD) because there
is only one instruction unit
2. The warp is the scheduling unit in the SM, and warps run concurrently
3. There is no guaranteed ordering for executing blocks or warps
4. Zero-overhead warp scheduling (no empty slots while there is some pending instruction)
   a. Warps whose next instruction has its operands ready for consumption are eligible for execution
   b. Eligible warps are selected for execution based on a prioritized scheduling policy
5. Warp/thread divergence occurs when threads in a single warp take different instruction paths (e.g., because
of if statements); the divergent paths are executed sequentially, so extra cycles are needed (see the sketch below)
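
A minimal illustration of warp divergence (the kernel is hypothetical): the if/else splits the lanes of one warp onto two paths, and the hardware runs the two paths one after the other, costing extra cycles.

// Hypothetical kernel: threads in the same warp take different branches,
// so the even-lane path and the odd-lane path execute sequentially.
__global__ void divergent(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0)
        data[i] *= 2.0f;   // half of the warp runs this path...
    else
        data[i] += 1.0f;   // ...then the other half runs this one
}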

Operand scoreboarding is used to prevent hazards: an instruction becomes ready only after the values it needs
have been deposited.
# Cache and shared memory differ between GPU models and architectures.
