

CSE 591: GPU Programming

Setting Up GPU Programming Environment

Before Start: CPU Emulator
Installation
Compilation Options
Browsing SDK examples

Ziyi Zheng

Computer Science Department Stony Brook University


Before Start: Emulation

For those who want to use CUDA but do not have a CUDA-enabled GPU

CPU Emulator


nvcc . -deviceemu -D_DEVICEEMU

Aimed at debugging, to help code development
Replaced by Parallel Nsight (which also requires a CUDA-enabled GPU)
NVIDIA started removing CPU emulator support in CUDA 3.0 (March 2010)
The latest CUDA version is CUDA 3.2 (September 2010)
Need to install CUDA 2.3 (June 2009), both toolkit and SDK
Older NVCC, older APIs
Can use the emulation versions of CUFFT and CUBLAS
No CUSPARSE, CURAND

CUDA for CPUs

CUDA C++ compiler in research: MCUDA
Developed by Wen-mei Hwu's group
Aimed at comparing the performance of GPUs and optimized CPUs
Translates CUDA code into optimized C++ code for multi-core CPUs
Linux based
Download

CUDA for the x86 platform

Commercial CUDA C compiler
Under development by the Portland Group (PGI)
No GPU required
Will be demonstrated at the SC10 Supercomputing conference, November 13-15, 2010

We are ahead of time


Not required in the course. Use it only when you want to fairly compare the performance between CPU and GPU


Without a CUDA-Enabled GPU

Steps



2. Install CUDA Toolkit 2.3
3. Install CUDA SDK code examples 2.3


1.
2. Download the appropriate GPU driver
3. Install CUDA Toolkit 3.2
4. Install GPU Computing SDK code examples 3.2

Available Resources

NVCC
Visual Studio syntax highlighting
CUDA BLAS (CUBLAS) and FFT (CUFFT) libraries
CUDA Visual Profiler
CUDA-GDB for Linux

Not in CUDA 2.3 but included in later versions

OpenCL
DirectCompute
CUDA Fortran compiler
CUDA LAPACK library
CUDA CUSPARSE and CURAND libraries


ATI/AMD Card + CUDA

Convert the CUDA code into OpenCL code, then build the OpenCL code and execute it on the ATI/AMD card.


Additional Steps

Download ATI Stream SDK 2.2
Download Swan (27 May 2010)


CUDA Programming Environment

Windows, Linux, Mac OS


Associated Environment Variables

Set automatically by the Toolkit:
CUDA_BIN_PATH
CUDA_INC_PATH
CUDA_LIB_PATH

Microsoft Visual Studio 2008

Need MS Visual Studio?
Go to this website: to get Visual Studio 2008 online.
Must validate via student ID.
Serves as an IDE (integrated development environment)
Serves as a C/C++ compiler and linker for the host

Integrate NVCC with Visual Studio

MS Visual Studio 2008

Either use the CUDA build rules (installed by the CUDA SDK)
GUI interface
Generates the compilation commands (options, parameters)

Or write custom build rules
Command line interface
Write the compilation commands directly, such as:
"C:\CUDA\bin\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/O2,/Zi,/MT -I"C:\CUDA\include" -I./ -I../../common/inc -o $(ConfigurationName)\$(InputName).obj $(InputFileName)

CUDA Build Rules 2.3

1. Right-click a project
2. Choose Custom Build Rules
3. Choose a CUDA 2.3 rule, which is available on your system once you have installed the CUDA SDK 2.3
4. Right-click a .cu file
5. Choose Properties
6. Click the CUDA 2.3 rule


Setting Build Options by Command

1. Click General
2. For Tool, choose Custom Build Tool
3. Then choose Custom Build Step
4. Enter your build command

CUDA Project

Create one from scratch?
Modify existing projects in the SDK
CUDA Visual Studio wizard: third party, independent updates, no documentation support

CPU Emulation Mode for CUDA 2.3

For projects in the CUDA SDK 2.3

In the Visual Studio configuration, choose EmuRelease or EmuDebug instead of Release or Debug.



For your own projects with CUDA 2.3

1. Add a build configuration
2. Change the build rule settings (or simply add -deviceemu -D_DEVICEEMU to the compilation command line)
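As an illustration of what the -D_DEVICEEMU define makes possible, the sketch below guards a debug trace inside a kernel with that macro. The kernel name scale and the array size are hypothetical and not taken from the SDK; the only assumption is that the file is compiled with the -deviceemu -D_DEVICEEMU flags shown above when a trace is wanted. Without the define, the trace is simply compiled out.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element of data.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= 2.0f;
#ifdef _DEVICEEMU
        // Compiled only when -D_DEVICEEMU is passed. In emulation mode the
        // kernel threads run on the CPU, so printf can be used for tracing.
        printf("thread %d wrote %f\n", i, data[i]);
#endif
    }
}

int main()
{
    const int n = 8;
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // Copy the input to the device, run one block of n threads, copy back.
    float *d = NULL;
    if (cudaMalloc((void **)&d, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<1, n>>>(d, n);
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < n; ++i) printf("%g ", h[i]);
    printf("\n");
    return 0;
}
```

Keeping the trace behind the macro lets the same source build unchanged in the Release and Debug configurations.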

Bandwidth Test

Memory transfer on an 8600M GT card:
CPU to GPU: 1236 MB/s
GPU to GPU: 11836 MB/s
GPU to CPU: 380 MB/s

Device Query

Graphics hardware capability on an 8600M GT card:
Capability: 1.1
# multiprocessors: 8
# cores: 32
Block limit per dimension: 512 x 512 x 64
Maximum # threads per block: 512
Grid limit per dimension: 65535 x 65535 x 1
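The two SDK samples above can be approximated with a few runtime API calls. The following is a minimal sketch, not the SDK's own code: the helper names print_device_limits and host_to_device_mbps are hypothetical, the 32 MB buffer size is an assumption, and only the CPU to GPU direction is timed, using pageable host memory and a single copy. The number of cores is not a field of cudaDeviceProp, so it is left out.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Print the limits listed under Device Query for one device.
static void print_device_limits(int dev)
{
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, dev);
    printf("Device %d: %s\n", dev, p.name);
    printf("  Capability:                %d.%d\n", p.major, p.minor);
    printf("  # multiprocessors:         %d\n", p.multiProcessorCount);
    printf("  Max # threads per block:   %d\n", p.maxThreadsPerBlock);
    printf("  Block limit per dimension: %d x %d x %d\n",
           p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
    printf("  Grid limit per dimension:  %d x %d x %d\n",
           p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
}

// Time one pageable host-to-device copy and return the rate in MB/s.
static float host_to_device_mbps(size_t bytes)
{
    void *h = malloc(bytes);
    void *d = NULL;
    memset(h, 0, bytes);
    cudaMalloc(&d, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    free(h);

    return ((float)bytes / (1024.0f * 1024.0f)) / (ms / 1000.0f);
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev)
        print_device_limits(dev);

    // Assumed transfer size: 32 MB on the current device.
    printf("CPU to GPU: %.0f MB/s\n", host_to_device_mbps(32u << 20));
    return 0;
}
```

A single copy gives a noisy figure; averaging several copies, as the SDK sample does, yields steadier numbers.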

Matrix Multiplication

8600M GT vs. Core2 Duo 2.4 GHz
GPU: 0.62 ms
CPU in emulation mode: around 850 ms
The GPU is about 1000x faster.

Matrix A: 80x48
Matrix B: 48x128
Matrix C: 80x128

Computationally intensive: the GPU is better than the CPU.

Template

8600M GT vs. Core2 Duo 2.4 GHz
GPU: 179 ms
CPU in emulation mode: 66 ms
The GPU is about 3 times slower?

Multiply 32 numbers by another 32 numbers (32 multiplications):

unsigned int num_threads = 32;
dim3 grid(1, 1, 1);
dim3 threads(num_threads, 1, 1);

Not computationally intensive at all: the CPU is better than the GPU.
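The launch configuration above can be completed into a small program. This is a sketch of the workload as described here (32 element-wise multiplications in one block of 32 threads), not the SDK template's actual source; the kernel name multiply, the helper multiply_reference, and the input values are hypothetical.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Each thread multiplies one pair of inputs.
__global__ void multiply(const float *a, const float *b, float *c)
{
    unsigned int tid = threadIdx.x;
    c[tid] = a[tid] * b[tid];
}

// CPU reference used to check the GPU result.
void multiply_reference(const float *a, const float *b, float *c, unsigned int n)
{
    for (unsigned int i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
}

int main()
{
    const unsigned int num_threads = 32;
    const size_t bytes = num_threads * sizeof(float);

    float a[num_threads], b[num_threads], c[num_threads], ref[num_threads];
    for (unsigned int i = 0; i < num_threads; ++i) {
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
        c[i] = 0.0f;
    }

    float *da = NULL, *db = NULL, *dc = NULL;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

    // One block of 32 threads, as on the slide.
    dim3 grid(1, 1, 1);
    dim3 threads(num_threads, 1, 1);
    multiply<<<grid, threads>>>(da, db, dc);

    cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);

    // Compare against the CPU result.
    multiply_reference(a, b, ref, num_threads);
    int mismatches = 0;
    for (unsigned int i = 0; i < num_threads; ++i)
        if (fabsf(c[i] - ref[i]) > 1e-5f) ++mismatches;

    printf("%s\n", mismatches == 0 ? "PASSED" : "FAILED");
    return mismatches == 0 ? 0 : 1;
}
```

With only 32 multiplications, the allocations, copies, and kernel launch are far more work than the arithmetic itself, which fits the observation that the CPU wins on this workload.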