NVIDIA CUDA C Programming Guide 3.1

Published by: 邱吉震 on Sep 08, 2010
Copyright:Attribution Non-commercial







To execute code on devices of a specific compute capability, an application must load
binary or PTX code that is compatible with this compute capability, as described in
Sections 3.1.2 and 3.1.3. In particular, to be able to execute code on future
architectures with higher compute capability, for which no binary code can be
generated yet, an application must load PTX code that will be compiled just-in-time
for these devices.
Which PTX and binary code gets embedded in a CUDA C application is controlled
by the -arch and -code compiler options or the -gencode compiler option, as
detailed in the nvcc user manual. For example,

    nvcc x.cu \
        -gencode arch=compute_10,code=sm_10 \
        -gencode arch=compute_11,code=\'compute_11,sm_11\'

embeds binary code compatible with compute capability 1.0 (first -gencode
option) and PTX and binary code compatible with compute capability 1.1 (second
-gencode option).
Host code is generated to automatically select at runtime the most appropriate code
to load and execute, which, in the above example, will be:
  - 1.0 binary code for devices with compute capability 1.0,
  - 1.1 binary code for devices with compute capability 1.1, 1.2, and 1.3,
  - binary code obtained by compiling 1.1 PTX code just-in-time for devices with
    compute capability 2.0 or higher.
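
The selection rule listed above can be modeled in a few lines of host-side C++. This is only an illustrative sketch, not the runtime's actual algorithm: the CodeImage struct and the selectImage function are hypothetical names, and the rule assumed here is "prefer the newest binary of the device's own major revision that is not newer than the device, otherwise fall back to the newest usable PTX". It assumes a C++11 compiler.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one code image embedded in the application.
struct CodeImage {
    bool isPtx;    // true: PTX (compute_XY), false: binary (sm_XY)
    int  ccMajor;  // compute capability the image was built for
    int  ccMinor;
};

// True if the image targets a compute capability <= ccMajor.ccMinor.
static bool atMost(const CodeImage& img, int ccMajor, int ccMinor) {
    return img.ccMajor < ccMajor ||
           (img.ccMajor == ccMajor && img.ccMinor <= ccMinor);
}

// Simplified model of the choice made for a device of compute capability
// devMajor.devMinor. Returns a human-readable description of the choice.
std::string selectImage(const std::vector<CodeImage>& images,
                        int devMajor, int devMinor) {
    const CodeImage* bestBinary = nullptr;
    const CodeImage* bestPtx = nullptr;
    for (const CodeImage& img : images) {
        if (!atMost(img, devMajor, devMinor)) continue;  // built for a newer device
        if (img.isPtx) {
            // Keep the PTX image with the highest usable compute capability.
            if (!bestPtx || !atMost(img, bestPtx->ccMajor, bestPtx->ccMinor))
                bestPtx = &img;
        } else if (img.ccMajor == devMajor) {
            // Assumption: binaries are only reused within one major revision.
            if (!bestBinary || img.ccMinor > bestBinary->ccMinor)
                bestBinary = &img;
        }
    }
    if (bestBinary)
        return "binary sm_" + std::to_string(bestBinary->ccMajor) +
               std::to_string(bestBinary->ccMinor);
    if (bestPtx)
        return "JIT-compiled PTX compute_" + std::to_string(bestPtx->ccMajor) +
               std::to_string(bestPtx->ccMinor);
    return "none";
}
```

With the three images from the nvcc example (sm_10, sm_11, and compute_11), this model picks the sm_10 binary for a 1.0 device, the sm_11 binary for 1.1 through 1.3 devices, and the compute_11 PTX for a 2.0 device.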
For example, x.cu can have an optimized code path that uses atomic operations,
which are only supported in devices of compute capability 1.1 and higher. The
__CUDA_ARCH__ macro can be used to differentiate various code paths based on
compute capability. It is only defined for device code. When compiling with
“arch=compute_11”, for example, __CUDA_ARCH__ is equal to 110.
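
A minimal sketch of how __CUDA_ARCH__ might guard such an atomic code path is given here. The kernel countPositive and the host helper countPositiveOnDevice are hypothetical names chosen for illustration, and the serial fallback is just one possible way to handle devices without global atomics.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Counts the positive entries of data[0..n-1] and adds the total to *count.
__global__ void countPositive(const int* data, int n, int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 110
    // Compiled for compute capability 1.1 or higher: each thread tests one
    // element and updates the counter with a global atomic operation.
    if (i < n && data[i] > 0)
        atomicAdd(count, 1);
#else
    // Compiled for compute capability 1.0 (no global atomics): a single
    // thread walks the whole array serially.
    if (i == 0) {
        int c = 0;
        for (int k = 0; k < n; ++k)
            if (data[k] > 0) ++c;
        *count += c;
    }
#endif
}

// Hypothetical host helper: returns the count, or -1 if a CUDA call fails.
int countPositiveOnDevice(const std::vector<int>& host)
{
    if (host.empty()) return 0;
    int n = static_cast<int>(host.size());
    int* dData = NULL;
    int* dCount = NULL;
    int result = -1;
    bool ok =
        cudaMalloc((void**)&dData, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc((void**)&dCount, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(dData, &host[0], n * sizeof(int),
                   cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemset(dCount, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        countPositive<<<(n + threads - 1) / threads, threads>>>(dData, n, dCount);
        int tmp = 0;
        // The blocking copy also reports errors from the kernel launch.
        if (cudaMemcpy(&tmp, dCount, sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess)
            result = tmp;
    }
    cudaFree(dData);   // freeing a null pointer is a no-op
    cudaFree(dCount);
    return result;
}
```

Both branches produce the same count; only the code generated for each target architecture differs, because the preprocessor test is evaluated once per compiled architecture.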
Applications using the driver API must compile code to separate files and explicitly
load and execute the most appropriate file at runtime.
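
With the driver API, that explicit choice might look like the sketch below. The file names (x_sm_10.cubin, x_sm_11.cubin, x_compute_11.ptx) and the functions chooseModuleFile and loadBestModule are hypothetical, and the code assumes cuInit has been called and a context for the device is current. The compute capability is queried with cuDeviceGetAttribute, which offers these attributes in later CUDA releases; toolkits of this guide's era provided cuDeviceComputeCapability for the same purpose.

```cuda
#include <cuda.h>
#include <string>

// Maps a compute capability to one of three separately compiled files
// (hypothetical names), mirroring the selection in the nvcc example above.
std::string chooseModuleFile(int ccMajor, int ccMinor)
{
    if (ccMajor == 1 && ccMinor == 0) return "x_sm_10.cubin";  // 1.0 binary
    if (ccMajor == 1)                 return "x_sm_11.cubin";  // 1.1 binary
    return "x_compute_11.ptx";  // 2.0 and higher: PTX, compiled just-in-time
}

// Loads the most appropriate file for dev into *module.
// Assumes cuInit() succeeded and a context for dev is current.
CUresult loadBestModule(CUdevice dev, CUmodule* module)
{
    int ccMajor = 0;
    int ccMinor = 0;
    CUresult rc = cuDeviceGetAttribute(
        &ccMajor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
    if (rc != CUDA_SUCCESS) return rc;
    rc = cuDeviceGetAttribute(
        &ccMinor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
    if (rc != CUDA_SUCCESS) return rc;
    // cuModuleLoad accepts both cubin and PTX files.
    return cuModuleLoad(module, chooseModuleFile(ccMajor, ccMinor).c_str());
}
```

Keeping the file choice in a separate function makes the policy easy to inspect without a device present.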
The nvcc user manual lists various shorthands for the -arch, -code, and
-gencode compiler options. For example, “-arch=sm_13” is a shorthand for
“-arch=compute_13 -code=compute_13,sm_13” (which is the same as
“-gencode arch=compute_13,code=\'compute_13,sm_13\'”).

Chapter 3. Programming Interface


CUDA C Programming Guide Version 3.1.1
