
CS 179: GPU Computing
LECTURE 2: MORE BASICS
Recap
Can use GPU to solve highly parallelizable problems
Straightforward extension to C++
◦ Separate CUDA code into .cu and .cuh files and compile with nvcc to
create object files (.o files)
Looked at the a[] + b[] -> c[] example
Recap
If you forgot everything, just make sure you understand that CUDA is
simply an extension of other bits of code you write!!!!
◦ Evident in .cu/.cuh vs .cpp/.hpp distinction
◦ .cu/.cuh is compiled by nvcc to produce a .o file
◦ .cpp/.hpp is compiled by g++; the .cpp pulls in the declarations with an
#include "xxx.cuh" and the .o file from the CUDA code is simply linked in
◦ No different from how you link in .o files from normal C++ code
.cu/.cuh vs .cpp/.hpp
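As a rough sketch of what this split can look like in practice (all file, function, and variable names below are made up for illustration):

// gpu_add.cuh -- declarations visible to both nvcc and g++
void gpuAdd(const float *a, const float *b, float *c, int n);

// gpu_add.cu -- compiled by nvcc into gpu_add.o
#include "gpu_add.cuh"
__global__ void addKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one element per thread
    if (i < n)
        c[i] = a[i] + b[i];
}
void gpuAdd(const float *a, const float *b, float *c, int n) {
    addKernel<<<(n + 511) / 512, 512>>>(a, b, c, n); // launch with 512-thread blocks
}

// main.cpp -- compiled by g++, then linked against gpu_add.o
#include "gpu_add.cuh"
// ... allocate device arrays with cudaMalloc, then call gpuAdd(d_a, d_b, d_c, n) ...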
Thread Organization
We will now look at how threads are organized and used in GPUs
◦ Keywords you MUST know to code in CUDA:
◦ Thread
◦ Block
◦ Grid
◦ Keywords you MUST know to code WELL in CUDA:
◦ (Streaming) Multiprocessor
◦ Warp
◦ Warp Divergence
Inside a GPU

(The black Xs in the diagram cross out things you don't have to think about just yet; you'll learn about them later.)
Inside a GPU
Think of Device Memory (we will also
refer to it as Global Memory) as a
RAM for your GPU
◦ Faster for the GPU to access than the host's actual RAM, but still relatively slow
◦ Will come back to this in future lectures (a quick sketch of moving data into and out of it is shown below)
GPUs have many Streaming
Multiprocessors (SMs)
◦ Each SM has multiple processors but
only one instruction unit
◦ Groups of processors must run the exact same set of instructions at any given time within a single SM
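To make the "Device Memory as the GPU's RAM" idea concrete, here is a minimal sketch of moving data between host RAM and Global Memory (pointer names and the size n are illustrative):

float *d_a;                                    // pointer into Device (Global) Memory
cudaMalloc((void **) &d_a, n * sizeof(float)); // allocate n floats on the GPU
cudaMemcpy(d_a, h_a, n * sizeof(float),
           cudaMemcpyHostToDevice);            // copy host RAM -> Device Memory
// ... launch kernels that read/write d_a ...
cudaMemcpy(h_a, d_a, n * sizeof(float),
           cudaMemcpyDeviceToHost);            // copy the results back
cudaFree(d_a);                                 // release the Device Memory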
Inside a GPU
When a kernel (the thing you define in
.cu files) is called, the task is divided up
into threads
◦ Each thread handles a small portion of
the given task
The threads are divided into a Grid of
Blocks
◦ Both Grids and Blocks are 3 dimensional
◦ e.g.
dim3 dimBlock(8, 8, 8);
dim3 dimGrid(100, 100, 1);
Kernel<<<dimGrid, dimBlock>>>(…);
◦ However, we'll often only work with 1-dimensional grids and blocks
◦ e.g. Kernel<<<block_count, block_size>>>(…);
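Inside a kernel, each thread figures out which piece of the data it owns from its block and thread indices. A 1-dimensional sketch (kernel and variable names are illustrative):

__global__ void scaleKernel(float *data, int n) {
    // global index = (block offset) + (position within the block)
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // guard against threads past the end of the data
        data[i] *= 2.0f;          // each thread handles one element
}
// launched as: scaleKernel<<<block_count, block_size>>>(d_data, n);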
Inside a GPU
Maximum number of threads per block is usually 512 or 1024, depending on the machine
Maximum number of blocks per grid dimension is usually 65535
◦ If you go over either of these numbers
your GPU will just give up or output
garbage data
◦ Much of GPU programming is dealing with these kinds of hardware limitations! Get used to it
◦ This limitation also means that your
Kernel must compensate for the fact that
you may not have enough threads to
individually allocate to your data points
◦ Will show how to do this later (this lecture); a rough sketch of picking a launch configuration is shown below
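One common way to pick a launch configuration under these limits is to round the block count up and cap it; a sketch (someKernel, d_data, and n are illustrative names):

int block_size = 512;                                 // stay under the threads-per-block limit
int block_count = (n + block_size - 1) / block_size;  // ceiling division: enough blocks to cover n
if (block_count > 65535)
    block_count = 65535;                              // stay under the blocks-per-grid limit
someKernel<<<block_count, block_size>>>(d_data, n);
// if block_count was capped, there are fewer threads than elements,
// so each thread must loop over several elements (shown later this lecture)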
Inside a GPU
Each block is assigned to an SM
Inside the SM, the block is divided into
Warps of threads
◦ Warps consist of 32 threads
◦ All 32 threads MUST run the exact
same set of instructions at the same
time
◦ Due to the fact that there is only one
instruction unit
◦ Warps are run concurrently in an SM
◦ If your Kernel tries to have threads do
different things in a single warp (using
if statements for example), the two tasks
will be run sequentially
◦ Called Warp Divergence (NOT GOOD)
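A sketch of what warp divergence looks like in code (the branch below is purely for illustration):

__global__ void divergentKernel(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadIdx.x % 2 == 0)     // even and odd threads sit in the SAME warp...
        data[i] += 1.0f;          // ...so the warp runs this branch first
    else
        data[i] -= 1.0f;          // ...and then this branch, sequentially
}

Branching on something that is uniform across a warp (e.g. blockIdx.x) does not cause divergence; only differing paths within one warp do.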
Inside a GPU
(fun hardware info)
In Fermi Architecture (i.e. GPUs with
Compute Capability 2.x), each SM has
32 cores
◦ e.g. GTX 400, 500 series
◦ 32 cores is not what makes each warp have 32 threads. Previous architectures also had 32 threads per warp but had fewer than 32 cores per SM
Halo.cms.caltech.edu has 3 GTX 570s
◦ This course will cover CC 2.x
Streaming Multiprocessor
A[] + B[] -> C[] (again)
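A hedged sketch of the revisited vector add, written with a grid-stride loop so it stays correct even when there are fewer threads than data points (kernel and variable names are illustrative):

__global__ void vecAddKernel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's starting element
    int stride = gridDim.x * blockDim.x;             // total number of threads in the grid
    while (i < n) {                                  // loop in case n > total threads
        c[i] = a[i] + b[i];
        i += stride;                                 // jump ahead by one grid's worth of threads
    }
}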
Questions so far?
Stuff that will be useful later
Next Time...
Global Memory access is not that fast
◦ Tends to be the bottleneck in many GPU programs
◦ Especially true if done stupidly
◦ We'll look at what "stupidly" means

Optimize memory access by utilizing hardware-specific memory access patterns
Optimize memory access by utilizing different caches that come with the GPU
