
CKV

Advanced VLSI Architecture


MEL G624

Lecture 29: Data Level Parallelism


Graphics Processing Units
Basic Idea

Heterogeneous execution model


CPU is the host, GPU is the device

Develop a C-like programming language for the GPU


Compute Unified Device Architecture (CUDA) from NVIDIA
OpenCL is the vendor-independent equivalent

Unify all forms of GPU parallelism as the CUDA thread

Programming model is “Single Instruction, Multiple Thread” (SIMT)


NVIDIA GPU Architecture
Similarities to vector machines:
Works well with data‐level parallel problems
Scatter‐gather transfers from memory into local store
Mask registers
Large register files

Differences:
No scalar processor; scalar code runs on the host system processor
Uses multithreading to hide memory latency
Has many functional units, as opposed to a few deeply
pipelined units like a vector processor
Example
Multiply two vectors of length 8192
Code that works over all elements is the grid

Thread blocks break this down into manageable sizes


512 elements per block

A SIMD instruction executes 32 elements at a time (one per CUDA thread)

Thus grid size = 8192 / 512 = 16 blocks

Block is analogous to a strip-mined vector loop (vector length = 32 elements)

Block is assigned to a multithreaded SIMD processor by the Thread Block Scheduler


[Figure: Grid for the 8192-element multiply.
The Grid consists of 16 Thread Blocks; each Thread Block contains SIMD Threads 0–15.
Thread Block 0: SIMD Thread 0 computes A[0]=B[0]*C[0] … A[31]=B[31]*C[31]; SIMD Thread 1 computes A[32]=B[32]*C[32] … A[63]=B[63]*C[63]; …; SIMD Thread 15 computes up to A[511]=B[511]*C[511].
…
Thread Block 15: SIMD Thread 15 computes up to A[8191]=B[8191]*C[8191].]
Fermi GTX 480: Thread Block assigned to Multithreaded SIMD Processor
[Figure: the Thread Block Scheduler distributes Thread Blocks across the multithreaded SIMD processors (SMs); each SM has its own SIMD Thread Scheduler.]
Terminology
Threads of SIMD instructions
Each has its own PC
Thread scheduler uses scoreboard to dispatch
No data dependencies between threads!
Keeps track of up to 48 threads of SIMD instructions
Hides Memory Latency

Thread block scheduler schedules blocks to SIMD processors

Within each SIMD processor:


16 SIMD lanes
Wide and shallow compared to vector processors
Thread Scheduler
[Figure: the SIMD Thread Scheduler picking among ready threads of SIMD instructions over time.]
GPU Register Example
NVIDIA GPU has 32,768 registers

Divided logically into lanes

Each SIMD thread is limited to 64 vector-like registers

SIMD thread has up to:


64 vector registers of 32 32-bit elements, or
32 vector registers of 32 64-bit elements
(i.e., 2048 32-bit registers per SIMD thread)

Each CUDA thread gets one element


How many registers do the CUDA threads of a thread block use?
Multi-Threaded SIMD Processor
[Figure: a multithreaded SIMD processor.
SIMD Thread: each instruction works on 32 elements across the SIMD lanes; the scheduler keeps track of 48 independent threads.
CUDA Thread: one lane's slice of a SIMD thread.
2048 32-bit registers are dynamically allocated to each SIMD thread.]
Multi-Threading
[Figure: an SM combines several forms of parallelism at once: multithreading across SIMD threads, MIMD across SIMD processors, SIMD across lanes, and ILP within a thread.]
Programming the GPU

Challenges for the GPU programmer:


Getting good performance

Coordinating the scheduling of computation on the system processor and the GPU

Transferring data between system memory and GPU memory


Programming the GPU

To distinguish between functions for the GPU (device) and the system processor (host):

__device__ or __global__ => GPU (device)

__host__ => system processor (host)

Variables declared in __device__ or __global__ functions are allocated to GPU memory

Function call: name<<<dimGrid, dimBlock>>>(...parameter list...)
threadIdx: identifier for a thread within its block
blockIdx: identifier for a block within the grid
blockDim: number of threads per block
Programming the GPU
// Invoke DAXPY
daxpy(n, 2.0, x, y);
// DAXPY in C
void daxpy(int n, double a, double* x, double* y) {
  for (int i = 0; i < n; i++)
    y[i] = a*x[i] + y[i];
}

// Invoke DAXPY with 256 threads per Thread Block
__host__
int nblocks = (n + 255) / 256;
daxpy<<<nblocks, 256>>>(n, 2.0, x, y);
// DAXPY in CUDA
__global__
void daxpy(int n, double a, double* x, double* y) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

Thank You for Attending
