
NVIDIA CUDA

Reference Manual

Version 3.2 Beta
August 2010

Contents
1 Deprecated List
2 Module Index
  2.1 Modules
3 Data Structure Index
  3.1 Data Structures
4 Module Documentation
  4.1 CUDA Runtime API
    4.1.1 Detailed Description
    4.1.2 Define Documentation
      4.1.2.1 CUDART_VERSION
  4.2 Thread Management
    4.2.1 Detailed Description
    4.2.2 Function Documentation
      4.2.2.1 cudaThreadExit
      4.2.2.2 cudaThreadGetCacheConfig
      4.2.2.3 cudaThreadGetLimit
      4.2.2.4 cudaThreadSetCacheConfig
      4.2.2.5 cudaThreadSetLimit
      4.2.2.6 cudaThreadSynchronize
  4.3 Error Handling
    4.3.1 Detailed Description
    4.3.2 Function Documentation
      4.3.2.1 cudaGetErrorString
      4.3.2.2 cudaGetLastError
      4.3.2.3 cudaPeekAtLastError
  4.4 Device Management

    4.4.1 Detailed Description
    4.4.2 Function Documentation
      4.4.2.1 cudaChooseDevice
      4.4.2.2 cudaGetDevice
      4.4.2.3 cudaGetDeviceCount
      4.4.2.4 cudaGetDeviceProperties
      4.4.2.5 cudaSetDevice
      4.4.2.6 cudaSetDeviceFlags
      4.4.2.7 cudaSetValidDevices
  4.5 Stream Management
    4.5.1 Detailed Description
    4.5.2 Function Documentation
      4.5.2.1 cudaStreamCreate
      4.5.2.2 cudaStreamDestroy
      4.5.2.3 cudaStreamQuery
      4.5.2.4 cudaStreamSynchronize
      4.5.2.5 cudaStreamWaitEvent
  4.6 Event Management
    4.6.1 Detailed Description
    4.6.2 Function Documentation
      4.6.2.1 cudaEventCreate
      4.6.2.2 cudaEventCreateWithFlags
      4.6.2.3 cudaEventDestroy
      4.6.2.4 cudaEventElapsedTime
      4.6.2.5 cudaEventQuery
      4.6.2.6 cudaEventRecord
      4.6.2.7 cudaEventSynchronize
  4.7 Execution Control
    4.7.1 Detailed Description
    4.7.2 Function Documentation
      4.7.2.1 cudaConfigureCall
      4.7.2.2 cudaFuncGetAttributes
      4.7.2.3 cudaFuncSetCacheConfig
      4.7.2.4 cudaLaunch
      4.7.2.5 cudaSetDoubleForDevice
      4.7.2.6 cudaSetDoubleForHost
      4.7.2.7 cudaSetupArgument

Generated for NVIDIA CUDA Library by Doxygen

  4.8 Memory Management
    4.8.1 Detailed Description
    4.8.2 Function Documentation
      4.8.2.1 cudaFree
      4.8.2.2 cudaFreeArray
      4.8.2.3 cudaFreeHost
      4.8.2.4 cudaGetSymbolAddress
      4.8.2.5 cudaGetSymbolSize
      4.8.2.6 cudaHostAlloc
      4.8.2.7 cudaHostGetDevicePointer
      4.8.2.8 cudaHostGetFlags
      4.8.2.9 cudaMalloc
      4.8.2.10 cudaMalloc3D
      4.8.2.11 cudaMalloc3DArray
      4.8.2.12 cudaMallocArray
      4.8.2.13 cudaMallocHost
      4.8.2.14 cudaMallocPitch
      4.8.2.15 cudaMemcpy
      4.8.2.16 cudaMemcpy2D
      4.8.2.17 cudaMemcpy2DArrayToArray
      4.8.2.18 cudaMemcpy2DAsync
      4.8.2.19 cudaMemcpy2DFromArray
      4.8.2.20 cudaMemcpy2DFromArrayAsync
      4.8.2.21 cudaMemcpy2DToArray
      4.8.2.22 cudaMemcpy2DToArrayAsync
      4.8.2.23 cudaMemcpy3D
      4.8.2.24 cudaMemcpy3DAsync
      4.8.2.25 cudaMemcpyArrayToArray
      4.8.2.26 cudaMemcpyAsync
      4.8.2.27 cudaMemcpyFromArray
      4.8.2.28 cudaMemcpyFromArrayAsync
      4.8.2.29 cudaMemcpyFromSymbol
      4.8.2.30 cudaMemcpyFromSymbolAsync
      4.8.2.31 cudaMemcpyToArray
      4.8.2.32 cudaMemcpyToArrayAsync
      4.8.2.33 cudaMemcpyToSymbol
      4.8.2.34 cudaMemcpyToSymbolAsync

      4.8.2.35 cudaMemGetInfo
      4.8.2.36 cudaMemset
      4.8.2.37 cudaMemset2D
      4.8.2.38 cudaMemset2DAsync
      4.8.2.39 cudaMemset3D
      4.8.2.40 cudaMemset3DAsync
      4.8.2.41 cudaMemsetAsync
      4.8.2.42 make_cudaExtent
      4.8.2.43 make_cudaPitchedPtr
      4.8.2.44 make_cudaPos
  4.9 OpenGL Interoperability
    4.9.1 Detailed Description
    4.9.2 Enumeration Type Documentation
      4.9.2.1 cudaGLMapFlags
    4.9.3 Function Documentation
      4.9.3.1 cudaGLSetGLDevice
      4.9.3.2 cudaGraphicsGLRegisterBuffer
      4.9.3.3 cudaGraphicsGLRegisterImage
      4.9.3.4 cudaWGLGetDevice
  4.10 Direct3D 9 Interoperability
    4.10.1 Detailed Description
    4.10.2 Enumeration Type Documentation
      4.10.2.1 cudaD3D9DeviceList
      4.10.2.2 cudaD3D9MapFlags
      4.10.2.3 cudaD3D9RegisterFlags
    4.10.3 Function Documentation
      4.10.3.1 cudaD3D9GetDevice
      4.10.3.2 cudaD3D9GetDevices
      4.10.3.3 cudaD3D9GetDirect3DDevice
      4.10.3.4 cudaD3D9SetDirect3DDevice
      4.10.3.5 cudaGraphicsD3D9RegisterResource
  4.11 Direct3D 10 Interoperability
    4.11.1 Detailed Description
    4.11.2 Enumeration Type Documentation
      4.11.2.1 cudaD3D10DeviceList
      4.11.2.2 cudaD3D10MapFlags
      4.11.2.3 cudaD3D10RegisterFlags


    4.11.3 Function Documentation
      4.11.3.1 cudaD3D10GetDevice
      4.11.3.2 cudaD3D10GetDevices
      4.11.3.3 cudaD3D10GetDirect3DDevice
      4.11.3.4 cudaD3D10SetDirect3DDevice
      4.11.3.5 cudaGraphicsD3D10RegisterResource
  4.12 Direct3D 11 Interoperability
    4.12.1 Detailed Description
    4.12.2 Enumeration Type Documentation
      4.12.2.1 cudaD3D11DeviceList
    4.12.3 Function Documentation
      4.12.3.1 cudaD3D11GetDevice
      4.12.3.2 cudaD3D11GetDevices
      4.12.3.3 cudaD3D11GetDirect3DDevice
      4.12.3.4 cudaD3D11SetDirect3DDevice
      4.12.3.5 cudaGraphicsD3D11RegisterResource
  4.13 VDPAU Interoperability
    4.13.1 Detailed Description
    4.13.2 Function Documentation
      4.13.2.1 cudaGraphicsVDPAURegisterOutputSurface
      4.13.2.2 cudaGraphicsVDPAURegisterVideoSurface
      4.13.2.3 cudaVDPAUGetDevice
      4.13.2.4 cudaVDPAUSetVDPAUDevice
  4.14 Graphics Interoperability
    4.14.1 Detailed Description
    4.14.2 Function Documentation
      4.14.2.1 cudaGraphicsMapResources
      4.14.2.2 cudaGraphicsResourceGetMappedPointer
      4.14.2.3 cudaGraphicsResourceSetMapFlags
      4.14.2.4 cudaGraphicsSubResourceGetMappedArray
      4.14.2.5 cudaGraphicsUnmapResources
      4.14.2.6 cudaGraphicsUnregisterResource
  4.15 Texture Reference Management
    4.15.1 Detailed Description
    4.15.2 Function Documentation
      4.15.2.1 cudaBindTexture
      4.15.2.2 cudaBindTexture2D


      4.15.2.3 cudaBindTextureToArray
      4.15.2.4 cudaCreateChannelDesc
      4.15.2.5 cudaGetChannelDesc
      4.15.2.6 cudaGetTextureAlignmentOffset
      4.15.2.7 cudaGetTextureReference
      4.15.2.8 cudaUnbindTexture
  4.16 Surface Reference Management
    4.16.1 Detailed Description
    4.16.2 Function Documentation
      4.16.2.1 cudaBindSurfaceToArray
      4.16.2.2 cudaGetSurfaceReference
  4.17 Version Management
    4.17.1 Function Documentation
      4.17.1.1 cudaDriverGetVersion
      4.17.1.2 cudaRuntimeGetVersion
  4.18 C++ API Routines
    4.18.1 Detailed Description
    4.18.2 Function Documentation
      4.18.2.1 cudaBindSurfaceToArray
      4.18.2.2 cudaBindSurfaceToArray
      4.18.2.3 cudaBindTexture
      4.18.2.4 cudaBindTexture
      4.18.2.5 cudaBindTexture2D
      4.18.2.6 cudaBindTexture2D
      4.18.2.7 cudaBindTextureToArray
      4.18.2.8 cudaBindTextureToArray
      4.18.2.9 cudaCreateChannelDesc
      4.18.2.10 cudaEventCreate
      4.18.2.11 cudaFuncGetAttributes
      4.18.2.12 cudaFuncSetCacheConfig
      4.18.2.13 cudaGetSymbolAddress
      4.18.2.14 cudaGetSymbolSize
      4.18.2.15 cudaGetTextureAlignmentOffset
      4.18.2.16 cudaLaunch
      4.18.2.17 cudaMallocHost
      4.18.2.18 cudaSetupArgument
      4.18.2.19 cudaUnbindTexture

  4.19 Interactions with the CUDA Driver API
    4.19.1 Context Management
    4.19.2 Interactions between CUstream and cudaStream_t
    4.19.3 Interactions between CUevent and cudaEvent_t
    4.19.4 Interactions between CUarray and struct cudaArray *
    4.19.5 Interactions between CUgraphicsResource and cudaGraphicsResource_t
  4.20 Direct3D 9 Interoperability [DEPRECATED]
    4.20.1 Detailed Description
    4.20.2 Function Documentation
      4.20.2.1 cudaD3D9MapResources
      4.20.2.2 cudaD3D9RegisterResource
      4.20.2.3 cudaD3D9ResourceGetMappedArray
      4.20.2.4 cudaD3D9ResourceGetMappedPitch
      4.20.2.5 cudaD3D9ResourceGetMappedPointer
      4.20.2.6 cudaD3D9ResourceGetMappedSize
      4.20.2.7 cudaD3D9ResourceGetSurfaceDimensions
      4.20.2.8 cudaD3D9ResourceSetMapFlags
      4.20.2.9 cudaD3D9UnmapResources
      4.20.2.10 cudaD3D9UnregisterResource
  4.21 Direct3D 10 Interoperability [DEPRECATED]
    4.21.1 Detailed Description
    4.21.2 Function Documentation
      4.21.2.1 cudaD3D10MapResources
      4.21.2.2 cudaD3D10RegisterResource
      4.21.2.3 cudaD3D10ResourceGetMappedArray
      4.21.2.4 cudaD3D10ResourceGetMappedPitch
      4.21.2.5 cudaD3D10ResourceGetMappedPointer
      4.21.2.6 cudaD3D10ResourceGetMappedSize
      4.21.2.7 cudaD3D10ResourceGetSurfaceDimensions
      4.21.2.8 cudaD3D10ResourceSetMapFlags
      4.21.2.9 cudaD3D10UnmapResources
      4.21.2.10 cudaD3D10UnregisterResource
  4.22 OpenGL Interoperability [DEPRECATED]
    4.22.1 Detailed Description
    4.22.2 Function Documentation
      4.22.2.1 cudaGLMapBufferObject
      4.22.2.2 cudaGLMapBufferObjectAsync

      4.22.2.3 cudaGLRegisterBufferObject
      4.22.2.4 cudaGLSetBufferObjectMapFlags
      4.22.2.5 cudaGLUnmapBufferObject
      4.22.2.6 cudaGLUnmapBufferObjectAsync
      4.22.2.7 cudaGLUnregisterBufferObject
  4.23 Data types used by CUDA Runtime
    4.23.1 Define Documentation
      4.23.1.1 cudaArrayDefault
      4.23.1.2 cudaArraySurfaceLoadStore
      4.23.1.3 cudaDeviceBlockingSync
      4.23.1.4 cudaDeviceLmemResizeToMax
      4.23.1.5 cudaDeviceMapHost
      4.23.1.6 cudaDeviceMask
      4.23.1.7 cudaDevicePropDontCare
      4.23.1.8 cudaDeviceScheduleAuto
      4.23.1.9 cudaDeviceScheduleSpin
      4.23.1.10 cudaDeviceScheduleYield
      4.23.1.11 cudaEventBlockingSync
      4.23.1.12 cudaEventDefault
      4.23.1.13 cudaEventDisableTiming
      4.23.1.14 cudaHostAllocDefault
      4.23.1.15 cudaHostAllocMapped
      4.23.1.16 cudaHostAllocPortable
      4.23.1.17 cudaHostAllocWriteCombined
    4.23.2 Typedef Documentation
      4.23.2.1 cudaError_t
      4.23.2.2 cudaEvent_t
      4.23.2.3 cudaGraphicsResource_t
      4.23.2.4 cudaStream_t
      4.23.2.5 cudaUUID_t
    4.23.3 Enumeration Type Documentation
      4.23.3.1 cudaChannelFormatKind
      4.23.3.2 cudaComputeMode
      4.23.3.3 cudaError
      4.23.3.4 cudaFuncCache
      4.23.3.5 cudaGraphicsCubeFace
      4.23.3.6 cudaGraphicsMapFlags

      4.23.3.7 cudaGraphicsRegisterFlags
      4.23.3.8 cudaLimit
      4.23.3.9 cudaMemcpyKind
      4.23.3.10 cudaSurfaceBoundaryMode
      4.23.3.11 cudaSurfaceFormatMode
      4.23.3.12 cudaTextureAddressMode
      4.23.3.13 cudaTextureFilterMode
      4.23.3.14 cudaTextureReadMode
  4.24 CUDA Driver API
    4.24.1 Detailed Description
  4.25 Data types used by CUDA driver
    4.25.1 Define Documentation
      4.25.1.1 CU_MEMHOSTALLOC_DEVICEMAP
      4.25.1.2 CU_MEMHOSTALLOC_PORTABLE
      4.25.1.3 CU_MEMHOSTALLOC_WRITECOMBINED
      4.25.1.4 CU_PARAM_TR_DEFAULT
      4.25.1.5 CU_TRSA_OVERRIDE_FORMAT
      4.25.1.6 CU_TRSF_NORMALIZED_COORDINATES
      4.25.1.7 CU_TRSF_READ_AS_INTEGER
      4.25.1.8 CU_TRSF_SRGB
      4.25.1.9 CUDA_ARRAY3D_2DARRAY
      4.25.1.10 CUDA_ARRAY3D_SURFACE_LDST
      4.25.1.11 CUDA_VERSION
    4.25.2 Typedef Documentation
      4.25.2.1 CUaddress_mode
      4.25.2.2 CUarray
      4.25.2.3 CUarray_cubemap_face
      4.25.2.4 CUarray_format
      4.25.2.5 CUcomputemode
      4.25.2.6 CUcontext
      4.25.2.7 CUctx_flags
      4.25.2.8 CUDA_ARRAY3D_DESCRIPTOR
      4.25.2.9 CUDA_ARRAY_DESCRIPTOR
      4.25.2.10 CUDA_MEMCPY2D
      4.25.2.11 CUDA_MEMCPY3D
      4.25.2.12 CUdevice
      4.25.2.13 CUdevice_attribute

4.25.2.14 CUdeviceptr
4.25.2.15 CUdevprop
4.25.2.16 CUevent
4.25.2.17 CUevent_flags
4.25.2.18 CUfilter_mode
4.25.2.19 CUfunc_cache
4.25.2.20 CUfunction
4.25.2.21 CUfunction_attribute
4.25.2.22 CUgraphicsMapResourceFlags
4.25.2.23 CUgraphicsRegisterFlags
4.25.2.24 CUgraphicsResource
4.25.2.25 CUjit_fallback
4.25.2.26 CUjit_option
4.25.2.27 CUjit_target
4.25.2.28 CUlimit
4.25.2.29 CUmemorytype
4.25.2.30 CUmodule
4.25.2.31 CUresult
4.25.2.32 CUstream
4.25.2.33 CUsurfref
4.25.2.34 CUtexref
4.25.3 Enumeration Type Documentation
4.25.3.1 CUaddress_mode_enum
4.25.3.2 CUarray_cubemap_face_enum
4.25.3.3 CUarray_format_enum
4.25.3.4 CUcomputemode_enum
4.25.3.5 CUctx_flags_enum
4.25.3.6 cudaError_enum
4.25.3.7 CUdevice_attribute_enum
4.25.3.8 CUevent_flags_enum
4.25.3.9 CUfilter_mode_enum
4.25.3.10 CUfunc_cache_enum
4.25.3.11 CUfunction_attribute_enum
4.25.3.12 CUgraphicsMapResourceFlags_enum
4.25.3.13 CUgraphicsRegisterFlags_enum
4.25.3.14 CUjit_fallback_enum
4.25.3.15 CUjit_option_enum

4.25.3.16 CUjit_target_enum
4.25.3.17 CUlimit_enum
4.25.3.18 CUmemorytype_enum
4.26 Initialization
4.26.1 Detailed Description
4.26.2 Function Documentation
4.26.2.1 cuInit
4.27 Version Management
4.27.1 Detailed Description
4.27.2 Function Documentation
4.27.2.1 cuDriverGetVersion
4.28 Device Management
4.28.1 Detailed Description
4.28.2 Function Documentation
4.28.2.1 cuDeviceComputeCapability
4.28.2.2 cuDeviceGet
4.28.2.3 cuDeviceGetAttribute
4.28.2.4 cuDeviceGetCount
4.28.2.5 cuDeviceGetName
4.28.2.6 cuDeviceGetProperties
4.28.2.7 cuDeviceTotalMem
4.29 Context Management
4.29.1 Detailed Description
4.29.2 Function Documentation
4.29.2.1 cuCtxAttach
4.29.2.2 cuCtxCreate
4.29.2.3 cuCtxDestroy
4.29.2.4 cuCtxDetach
4.29.2.5 cuCtxGetApiVersion
4.29.2.6 cuCtxGetCacheConfig
4.29.2.7 cuCtxGetDevice
4.29.2.8 cuCtxGetLimit
4.29.2.9 cuCtxPopCurrent
4.29.2.10 cuCtxPushCurrent
4.29.2.11 cuCtxSetCacheConfig
4.29.2.12 cuCtxSetLimit
4.29.2.13 cuCtxSynchronize

4.30 Module Management
4.30.1 Detailed Description
4.30.2 Function Documentation
4.30.2.1 cuModuleGetFunction
4.30.2.2 cuModuleGetGlobal
4.30.2.3 cuModuleGetSurfRef
4.30.2.4 cuModuleGetTexRef
4.30.2.5 cuModuleLoad
4.30.2.6 cuModuleLoadData
4.30.2.7 cuModuleLoadDataEx
4.30.2.8 cuModuleLoadFatBinary
4.30.2.9 cuModuleUnload
4.31 Memory Management
4.31.1 Detailed Description
4.31.2 Function Documentation
4.31.2.1 cuArray3DCreate
4.31.2.2 cuArray3DGetDescriptor
4.31.2.3 cuArrayCreate
4.31.2.4 cuArrayDestroy
4.31.2.5 cuArrayGetDescriptor
4.31.2.6 cuMemAlloc
4.31.2.7 cuMemAllocHost
4.31.2.8 cuMemAllocPitch
4.31.2.9 cuMemcpy2D
4.31.2.10 cuMemcpy2DAsync
4.31.2.11 cuMemcpy2DUnaligned
4.31.2.12 cuMemcpy3D
4.31.2.13 cuMemcpy3DAsync
4.31.2.14 cuMemcpyAtoA
4.31.2.15 cuMemcpyAtoD
4.31.2.16 cuMemcpyAtoH
4.31.2.17 cuMemcpyAtoHAsync
4.31.2.18 cuMemcpyDtoA
4.31.2.19 cuMemcpyDtoD
4.31.2.20 cuMemcpyDtoDAsync
4.31.2.21 cuMemcpyDtoH
4.31.2.22 cuMemcpyDtoHAsync

4.31.2.23 cuMemcpyHtoA
4.31.2.24 cuMemcpyHtoAAsync
4.31.2.25 cuMemcpyHtoD
4.31.2.26 cuMemcpyHtoDAsync
4.31.2.27 cuMemFree
4.31.2.28 cuMemFreeHost
4.31.2.29 cuMemGetAddressRange
4.31.2.30 cuMemGetInfo
4.31.2.31 cuMemHostAlloc
4.31.2.32 cuMemHostGetDevicePointer
4.31.2.33 cuMemHostGetFlags
4.31.2.34 cuMemsetD16
4.31.2.35 cuMemsetD16Async
4.31.2.36 cuMemsetD2D16
4.31.2.37 cuMemsetD2D16Async
4.31.2.38 cuMemsetD2D32
4.31.2.39 cuMemsetD2D32Async
4.31.2.40 cuMemsetD2D8
4.31.2.41 cuMemsetD2D8Async
4.31.2.42 cuMemsetD32
4.31.2.43 cuMemsetD32Async
4.31.2.44 cuMemsetD8
4.31.2.45 cuMemsetD8Async
4.32 Stream Management
4.32.1 Detailed Description
4.32.2 Function Documentation
4.32.2.1 cuStreamCreate
4.32.2.2 cuStreamDestroy
4.32.2.3 cuStreamQuery
4.32.2.4 cuStreamSynchronize
4.32.2.5 cuStreamWaitEvent
4.33 Event Management
4.33.1 Detailed Description
4.33.2 Function Documentation
4.33.2.1 cuEventCreate
4.33.2.2 cuEventDestroy
4.33.2.3 cuEventElapsedTime

4.33.2.4 cuEventQuery
4.33.2.5 cuEventRecord
4.33.2.6 cuEventSynchronize
4.34 Execution Control
4.34.1 Detailed Description
4.34.2 Function Documentation
4.34.2.1 cuFuncGetAttribute
4.34.2.2 cuFuncSetBlockShape
4.34.2.3 cuFuncSetCacheConfig
4.34.2.4 cuFuncSetSharedSize
4.34.2.5 cuLaunch
4.34.2.6 cuLaunchGrid
4.34.2.7 cuLaunchGridAsync
4.34.2.8 cuParamSetf
4.34.2.9 cuParamSeti
4.34.2.10 cuParamSetSize
4.34.2.11 cuParamSetv
4.35 Execution Control [DEPRECATED]
4.35.1 Detailed Description
4.35.2 Function Documentation
4.35.2.1 cuParamSetTexRef
4.36 Texture Reference Management
4.36.1 Detailed Description
4.36.2 Function Documentation
4.36.2.1 cuTexRefGetAddress
4.36.2.2 cuTexRefGetAddressMode
4.36.2.3 cuTexRefGetArray
4.36.2.4 cuTexRefGetFilterMode
4.36.2.5 cuTexRefGetFlags
4.36.2.6 cuTexRefGetFormat
4.36.2.7 cuTexRefSetAddress
4.36.2.8 cuTexRefSetAddress2D
4.36.2.9 cuTexRefSetAddressMode
4.36.2.10 cuTexRefSetArray
4.36.2.11 cuTexRefSetFilterMode
4.36.2.12 cuTexRefSetFlags
4.36.2.13 cuTexRefSetFormat

4.37 Texture Reference Management [DEPRECATED]
4.37.1 Detailed Description
4.37.2 Function Documentation
4.37.2.1 cuTexRefCreate
4.37.2.2 cuTexRefDestroy
4.38 Surface Reference Management
4.38.1 Detailed Description
4.38.2 Function Documentation
4.38.2.1 cuSurfRefGetArray
4.38.2.2 cuSurfRefSetArray
4.39 Graphics Interoperability
4.39.1 Detailed Description
4.39.2 Function Documentation
4.39.2.1 cuGraphicsMapResources
4.39.2.2 cuGraphicsResourceGetMappedPointer
4.39.2.3 cuGraphicsResourceSetMapFlags
4.39.2.4 cuGraphicsSubResourceGetMappedArray
4.39.2.5 cuGraphicsUnmapResources
4.39.2.6 cuGraphicsUnregisterResource
4.40 OpenGL Interoperability
4.40.1 Detailed Description
4.40.2 Function Documentation
4.40.2.1 cuGLCtxCreate
4.40.2.2 cuGraphicsGLRegisterBuffer
4.40.2.3 cuGraphicsGLRegisterImage
4.40.2.4 cuWGLGetDevice
4.41 OpenGL Interoperability [DEPRECATED]
4.41.1 Detailed Description
4.41.2 Typedef Documentation
4.41.2.1 CUGLmap_flags
4.41.3 Enumeration Type Documentation
4.41.3.1 CUGLmap_flags_enum
4.41.4 Function Documentation
4.41.4.1 cuGLInit
4.41.4.2 cuGLMapBufferObject
4.41.4.3 cuGLMapBufferObjectAsync
4.41.4.4 cuGLRegisterBufferObject

4.41.4.5 cuGLSetBufferObjectMapFlags
4.41.4.6 cuGLUnmapBufferObject
4.41.4.7 cuGLUnmapBufferObjectAsync
4.41.4.8 cuGLUnregisterBufferObject
4.42 Direct3D 9 Interoperability
4.42.1 Detailed Description
4.42.2 Typedef Documentation
4.42.2.1 CUd3d9DeviceList
4.42.3 Enumeration Type Documentation
4.42.3.1 CUd3d9DeviceList_enum
4.42.4 Function Documentation
4.42.4.1 cuD3D9CtxCreate
4.42.4.2 cuD3D9CtxCreateOnDevice
4.42.4.3 cuD3D9GetDevice
4.42.4.4 cuD3D9GetDevices
4.42.4.5 cuD3D9GetDirect3DDevice
4.42.4.6 cuGraphicsD3D9RegisterResource
4.43 Direct3D 9 Interoperability [DEPRECATED]
4.43.1 Detailed Description
4.43.2 Typedef Documentation
4.43.2.1 CUd3d9map_flags
4.43.2.2 CUd3d9register_flags
4.43.3 Enumeration Type Documentation
4.43.3.1 CUd3d9map_flags_enum
4.43.3.2 CUd3d9register_flags_enum
4.43.4 Function Documentation
4.43.4.1 cuD3D9MapResources
4.43.4.2 cuD3D9RegisterResource
4.43.4.3 cuD3D9ResourceGetMappedArray
4.43.4.4 cuD3D9ResourceGetMappedPitch
4.43.4.5 cuD3D9ResourceGetMappedPointer
4.43.4.6 cuD3D9ResourceGetMappedSize
4.43.4.7 cuD3D9ResourceGetSurfaceDimensions
4.43.4.8 cuD3D9ResourceSetMapFlags
4.43.4.9 cuD3D9UnmapResources
4.43.4.10 cuD3D9UnregisterResource
4.44 Direct3D 10 Interoperability

4.44.1 Detailed Description
4.44.2 Typedef Documentation
4.44.2.1 CUd3d10DeviceList
4.44.3 Enumeration Type Documentation
4.44.3.1 CUd3d10DeviceList_enum
4.44.4 Function Documentation
4.44.4.1 cuD3D10CtxCreate
4.44.4.2 cuD3D10CtxCreateOnDevice
4.44.4.3 cuD3D10GetDevice
4.44.4.4 cuD3D10GetDevices
4.44.4.5 cuD3D10GetDirect3DDevice
4.44.4.6 cuGraphicsD3D10RegisterResource
4.45 Direct3D 10 Interoperability [DEPRECATED]
4.45.1 Detailed Description
4.45.2 Typedef Documentation
4.45.2.1 CUD3D10map_flags
4.45.2.2 CUD3D10register_flags
4.45.3 Enumeration Type Documentation
4.45.3.1 CUD3D10map_flags_enum
4.45.3.2 CUD3D10register_flags_enum
4.45.4 Function Documentation
4.45.4.1 cuD3D10MapResources
4.45.4.2 cuD3D10RegisterResource
4.45.4.3 cuD3D10ResourceGetMappedArray
4.45.4.4 cuD3D10ResourceGetMappedPitch
4.45.4.5 cuD3D10ResourceGetMappedPointer
4.45.4.6 cuD3D10ResourceGetMappedSize
4.45.4.7 cuD3D10ResourceGetSurfaceDimensions
4.45.4.8 cuD3D10ResourceSetMapFlags
4.45.4.9 cuD3D10UnmapResources
4.45.4.10 cuD3D10UnregisterResource
4.46 Direct3D 11 Interoperability
4.46.1 Detailed Description
4.46.2 Typedef Documentation
4.46.2.1 CUd3d11DeviceList
4.46.3 Enumeration Type Documentation
4.46.3.1 CUd3d11DeviceList_enum

4.46.4 Function Documentation
4.46.4.1 cuD3D11CtxCreate
4.46.4.2 cuD3D11CtxCreateOnDevice
4.46.4.3 cuD3D11GetDevice
4.46.4.4 cuD3D11GetDevices
4.46.4.5 cuD3D11GetDirect3DDevice
4.46.4.6 cuGraphicsD3D11RegisterResource

4.47 VDPAU Interoperability
4.47.1 Detailed Description
4.47.2 Function Documentation
4.47.2.1 cuGraphicsVDPAURegisterOutputSurface
4.47.2.2 cuGraphicsVDPAURegisterVideoSurface
4.47.2.3 cuVDPAUCtxCreate
4.47.2.4 cuVDPAUGetDevice

5 Data Structure Documentation
5.1 CUDA_ARRAY3D_DESCRIPTOR_st Struct Reference
5.1.1 Detailed Description
5.1.2 Field Documentation
5.1.2.1 Depth
5.1.2.2 Flags
5.1.2.3 Format
5.1.2.4 Height
5.1.2.5 NumChannels
5.1.2.6 Width

5.2

CUDA_ARRAY_DESCRIPTOR_st Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . 313 5.2.1 5.2.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 5.2.2.1 5.2.2.2 5.2.2.3 5.2.2.4 Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Height . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 NumChannels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

5.3

CUDA_MEMCPY2D_st Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 5.3.1 5.3.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 5.3.2.1 5.3.2.2 dstArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 dstDevice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Generated for NVIDIA CUDA Library by Doxygen

CONTENTS 5.3.2.3 5.3.2.4 5.3.2.5 5.3.2.6 5.3.2.7 5.3.2.8 5.3.2.9

xix dstHost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 dstMemoryType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 dstPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 dstXInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 dstY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 Height . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 srcArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

5.3.2.10 srcDevice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.11 srcHost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.12 srcMemoryType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.13 srcPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.14 srcXInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.15 srcY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.3.2.16 WidthInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 5.4 CUDA_MEMCPY3D_st Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 5.4.1 5.4.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 5.4.2.1 5.4.2.2 5.4.2.3 5.4.2.4 5.4.2.5 5.4.2.6 5.4.2.7 5.4.2.8 5.4.2.9 Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 dstArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 dstDevice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316 dstHeight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 dstHost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 dstLOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 dstMemoryType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 dstPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 dstXInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

5.4.2.10 dstY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.11 dstZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.12 Height . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.13 reserved0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.14 reserved1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.15 srcArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 5.4.2.16 srcDevice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.17 srcHeight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.18 srcHost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.19 srcLOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.20 srcMemoryType . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
Generated for NVIDIA CUDA Library by Doxygen

xx

CONTENTS 5.4.2.21 srcPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.22 srcXInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.23 srcY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.24 srcZ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.4.2.25 WidthInBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 5.5 cudaChannelFormatDesc Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 5.5.1 5.5.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 5.5.2.1 5.5.2.2 5.5.2.3 5.5.2.4 5.5.2.5 5.6 f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319 z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

cudaDeviceProp Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 5.6.1 5.6.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 5.6.2.1 5.6.2.2 5.6.2.3 5.6.2.4 5.6.2.5 5.6.2.6 5.6.2.7 5.6.2.8 5.6.2.9 canMapHostMemory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 clockRate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 computeMode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 concurrentKernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 deviceOverlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 ECCEnabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 integrated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 kernelExecTimeoutEnabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321

5.6.2.10 maxGridSize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 5.6.2.11 maxTexture1D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 5.6.2.12 maxTexture2D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 5.6.2.13 maxTexture2DArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 5.6.2.14 maxTexture3D . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 5.6.2.15 maxThreadsDim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.16 maxThreadsPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.17 memPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.18 minor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.19 multiProcessorCount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.20 name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.21 pciBusID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
Generated for NVIDIA CUDA Library by Doxygen

CONTENTS

xxi 5.6.2.22 pciDeviceID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.23 regsPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.24 sharedMemPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.25 surfaceAlignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.26 tccDriver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 5.6.2.27 textureAlignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 5.6.2.28 totalConstMem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 5.6.2.29 totalGlobalMem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 5.6.2.30 warpSize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323

5.7

cudaExtent Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 5.7.1 5.7.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 5.7.2.1 5.7.2.2 5.7.2.3 depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 height . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324 width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324

5.8

cudaFuncAttributes Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 5.8.1 5.8.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 5.8.2.1 5.8.2.2 5.8.2.3 5.8.2.4 5.8.2.5 5.8.2.6 5.8.2.7 binaryVersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 constSizeBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 localSizeBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 maxThreadsPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 numRegs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 ptxVersion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 sharedSizeBytes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326

5.9

cudaMemcpy3DParms Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 5.9.1 5.9.2 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 5.9.2.1 5.9.2.2 5.9.2.3 5.9.2.4 5.9.2.5 5.9.2.6 5.9.2.7 5.9.2.8 dstArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 dstPos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 dstPtr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 extent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 srcArray . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 srcPos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 srcPtr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328

5.10 cudaPitchedPtr Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Generated for NVIDIA CUDA Library by Doxygen

xxii

CONTENTS 5.10.1 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 5.10.2 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 5.10.2.1 pitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 5.10.2.2 ptr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 5.10.2.3 xsize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 5.10.2.4 ysize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

5.11 cudaPos Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.11.1 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.11.2 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.11.2.1 x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.11.2.2 y . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.11.2.3 z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 5.12 CUdevprop_st Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.1 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.1 clockRate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.2 maxGridSize . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.3 maxThreadsDim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.4 maxThreadsPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.5 memPitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.6 regsPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.7 sharedMemPerBlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 5.12.2.8 SIMDWidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 5.12.2.9 textureAlign . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 5.12.2.10 totalConstantMemory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
332 5.13 surfaceReference Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 5.13.1 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 5.13.2 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 5.13.2.1 channelDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 5.14 textureReference Struct Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.1 Detailed Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.2 Field Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.2.1 addressMode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.2.2 channelDesc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.2.3 filterMode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 5.14.2.4 normalized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334

Generated for NVIDIA CUDA Library by Doxygen

Chapter 1

Deprecated List
Global cudaD3D9MapResources This function is deprecated as of Cuda 3.0.

Global cudaD3D9RegisterResource This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceGetMappedArray This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceGetMappedPitch This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceGetMappedPointer This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceGetMappedSize This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceGetSurfaceDimensions This function is deprecated as of Cuda 3.0.

Global cudaD3D9ResourceSetMapFlags This function is deprecated as of Cuda 3.0.

Global cudaD3D9UnmapResources This function is deprecated as of Cuda 3.0.

Global cudaD3D9UnregisterResource This function is deprecated as of Cuda 3.0.

Global cudaD3D10MapResources This function is deprecated as of Cuda 3.0.

Global cudaD3D10RegisterResource This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceGetMappedArray This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceGetMappedPitch This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceGetMappedPointer This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceGetMappedSize This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceGetSurfaceDimensions This function is deprecated as of Cuda 3.0.

Global cudaD3D10ResourceSetMapFlags This function is deprecated as of Cuda 3.0.

Global cudaD3D10UnmapResources This function is deprecated as of Cuda 3.0.

Global cudaD3D10UnregisterResource This function is deprecated as of Cuda 3.0.

Global cudaGLMapBufferObject This function is deprecated as of Cuda 3.0.

Global cudaGLMapBufferObjectAsync This function is deprecated as of Cuda 3.0.

Global cudaGLRegisterBufferObject This function is deprecated as of Cuda 3.0.

Global cudaGLSetBufferObjectMapFlags This function is deprecated as of Cuda 3.0.

Global cudaGLUnmapBufferObject This function is deprecated as of Cuda 3.0.

Global cudaGLUnmapBufferObjectAsync This function is deprecated as of Cuda 3.0.

Global cudaGLUnregisterBufferObject This function is deprecated as of Cuda 3.0.

Global cudaErrorPriorLaunchFailure This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global cudaErrorAddressOfConstant This error return is deprecated as of CUDA 3.1. Variables in constant memory may now have their address taken by the runtime via cudaGetSymbolAddress().

Global cudaErrorTextureFetchFailed This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global cudaErrorTextureNotBound This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global cudaErrorSynchronizationError This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global cudaErrorMixedDeviceExecution This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global cudaErrorCudartUnloading This error return is deprecated as of CUDA 3.2.

Global cudaErrorMemoryValueTooLarge This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.

Global CUDA_ERROR_CONTEXT_ALREADY_CURRENT This error return is deprecated as of CUDA 3.2. It is no longer an error to attempt to push the active context via cuCtxPushCurrent().

Global cuParamSetTexRef

Global cuTexRefCreate

Global cuTexRefDestroy

Global cuGLInit This function is deprecated as of Cuda 3.0.

Global cuGLMapBufferObject This function is deprecated as of Cuda 3.0.

Global cuGLMapBufferObjectAsync This function is deprecated as of Cuda 3.0.

Global cuGLRegisterBufferObject This function is deprecated as of Cuda 3.0.

Global cuGLSetBufferObjectMapFlags This function is deprecated as of Cuda 3.0.

Global cuGLUnmapBufferObject This function is deprecated as of Cuda 3.0.

Global cuGLUnmapBufferObjectAsync This function is deprecated as of Cuda 3.0.

Global cuGLUnregisterBufferObject This function is deprecated as of Cuda 3.0.

Global cuD3D9MapResources This function is deprecated as of Cuda 3.0.

Global cuD3D9RegisterResource This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceGetMappedArray This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceGetMappedPitch This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceGetMappedPointer This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceGetMappedSize This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceGetSurfaceDimensions This function is deprecated as of Cuda 3.0.

Global cuD3D9ResourceSetMapFlags This function is deprecated as of Cuda 3.0.

Global cuD3D9UnmapResources This function is deprecated as of Cuda 3.0.

Global cuD3D9UnregisterResource This function is deprecated as of Cuda 3.0.

Global cuD3D10MapResources This function is deprecated as of Cuda 3.0.

Global cuD3D10RegisterResource This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceGetMappedArray This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceGetMappedPitch This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceGetMappedPointer This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceGetMappedSize This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceGetSurfaceDimensions This function is deprecated as of Cuda 3.0.

Global cuD3D10ResourceSetMapFlags This function is deprecated as of Cuda 3.0.

Global cuD3D10UnmapResources This function is deprecated as of Cuda 3.0.

Global cuD3D10UnregisterResource This function is deprecated as of Cuda 3.0.

Chapter 2

Module Index

2.1 Modules

Here is a list of all modules:

CUDA Runtime API
    Thread Management
    Error Handling
    Device Management
    Stream Management
    Event Management
    Execution Control
    Memory Management
    OpenGL Interoperability
        OpenGL Interoperability [DEPRECATED]
    Direct3D 9 Interoperability
        Direct3D 9 Interoperability [DEPRECATED]
    Direct3D 10 Interoperability
        Direct3D 10 Interoperability [DEPRECATED]
    Direct3D 11 Interoperability
    VDPAU Interoperability
    Graphics Interoperability
    Texture Reference Management
    Surface Reference Management
    Version Management
    C++ API Routines
    Interactions with the CUDA Driver API
    Data types used by CUDA Runtime
CUDA Driver API
    Data types used by CUDA driver
    Initialization
    Version Management
    Device Management
    Context Management
    Module Management
    Memory Management
    Stream Management
    Event Management
    Execution Control
        Execution Control [DEPRECATED]
    Texture Reference Management
        Texture Reference Management [DEPRECATED]
    Surface Reference Management
    Graphics Interoperability
    OpenGL Interoperability
        OpenGL Interoperability [DEPRECATED]
    Direct3D 9 Interoperability
        Direct3D 9 Interoperability [DEPRECATED]
    Direct3D 10 Interoperability
        Direct3D 10 Interoperability [DEPRECATED]
    Direct3D 11 Interoperability
    VDPAU Interoperability

Chapter 3

Data Structure Index

3.1 Data Structures

Here are the data structures with brief descriptions:

CUDA_ARRAY3D_DESCRIPTOR_st
CUDA_ARRAY_DESCRIPTOR_st
CUDA_MEMCPY2D_st
CUDA_MEMCPY3D_st
cudaChannelFormatDesc
cudaDeviceProp
cudaExtent
cudaFuncAttributes
cudaMemcpy3DParms
cudaPitchedPtr
cudaPos
CUdevprop_st
surfaceReference
textureReference

Chapter 4

Module Documentation

4.1 CUDA Runtime API

Modules
• Thread Management
• Error Handling
• Device Management
• Stream Management
• Event Management
• Execution Control
• Memory Management
• OpenGL Interoperability
• Direct3D 9 Interoperability
• Direct3D 10 Interoperability
• Direct3D 11 Interoperability
• VDPAU Interoperability
• Graphics Interoperability
• Texture Reference Management
• Surface Reference Management
• Version Management
• C++ API Routines
  C++-style interface built on top of CUDA runtime API.
• Interactions with the CUDA Driver API
  Interactions between the CUDA Driver API and the CUDA Runtime API.
• Data types used by CUDA Runtime

Defines
• #define CUDART_VERSION 3020

4.1.1 Detailed Description

There are two levels for the runtime API.

The C API (cuda_runtime_api.h) is a C-style interface that does not require compiling with nvcc.

The C++ API (cuda_runtime.h) is a C++-style interface built on top of the C API. It wraps some of the C API routines, using overloading, references and default arguments. These wrappers can be used from C++ code and can be compiled with any C++ compiler. The C++ API also has some CUDA-specific wrappers that wrap C API routines that deal with symbols, textures, and device functions. These wrappers require the use of nvcc because they depend on code being generated by the compiler. For example, the execution configuration syntax to invoke kernels is only available in source code compiled with nvcc.

4.1.2 Define Documentation

4.1.2.1 #define CUDART_VERSION 3020

CUDA Runtime API Version 3.2
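The CUDART_VERSION value documented above packs the release number into a single integer. The following is a minimal sketch of unpacking it, assuming the encoding is major * 1000 + minor * 10. That encoding is an assumption: it is consistent with 3020 denoting version 3.2 but is not stated in this section. The helper names are hypothetical and are not part of the CUDA runtime.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helpers, not part of the CUDA runtime API.
   Assumed encoding: version = major * 1000 + minor * 10,
   which is consistent with 3020 denoting version 3.2. */
int cudart_version_major(int version) {
    return version / 1000;
}

int cudart_version_minor(int version) {
    return (version % 1000) / 10;
}

/* Writes "major.minor" into buf (at most len bytes, always terminated
   when len > 0) and returns the value reported by snprintf. */
int cudart_version_format(int version, char *buf, size_t len) {
    assert(buf != NULL && version >= 0);
    return snprintf(buf, len, "%d.%d",
                    cudart_version_major(version),
                    cudart_version_minor(version));
}
```

In a translation unit that includes cuda_runtime_api.h, passing CUDART_VERSION to these helpers would report the runtime version the code was compiled against, which can be useful in diagnostic output.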

4.2 Thread Management

Functions
• cudaError_t cudaThreadExit (void)
  Exit and clean up from CUDA launches.
• cudaError_t cudaThreadGetCacheConfig (enum cudaFuncCache *pCacheConfig)
  Returns the preferred cache configuration for the current host thread.
• cudaError_t cudaThreadGetLimit (size_t *pValue, enum cudaLimit limit)
  Returns resource limits.
• cudaError_t cudaThreadSetCacheConfig (enum cudaFuncCache cacheConfig)
  Sets the preferred cache configuration for the current host thread.
• cudaError_t cudaThreadSetLimit (enum cudaLimit limit, size_t value)
  Set resource limits.
• cudaError_t cudaThreadSynchronize (void)
  Wait for compute device to finish.

4.2.1 Detailed Description

This section describes the thread management functions of the CUDA runtime application programming interface.

4.2.2 Function Documentation

4.2.2.1 cudaError_t cudaThreadExit (void)

Explicitly cleans up all runtime-related resources associated with the calling host thread. Any subsequent API call reinitializes the runtime. cudaThreadExit() is implicitly called on host thread exit.

Returns:
cudaSuccess

Note:
This function may also return error codes from previous, asynchronous launches.

See also:
cudaThreadSynchronize

4.2.2.2 cudaError_t cudaThreadGetCacheConfig (enum cudaFuncCache * pCacheConfig)

On devices where the L1 cache and shared memory use the same hardware resources, this returns through pCacheConfig the preferred cache configuration for the current host thread. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.

This will return a pCacheConfig of cudaFuncCachePreferNone on devices where the size of the L1 cache and shared memory are fixed.

The supported cache configurations are:
• cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
• cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
• cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

Parameters:
    pCacheConfig - Returned cache configuration

Returns:
    cudaSuccess, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaThreadSetCacheConfig, cudaFuncSetCacheConfig (C API), cudaFuncSetCacheConfig (C++ API)

4.2.2.3 cudaError_t cudaThreadGetLimit (size_t * pValue, enum cudaLimit limit)

Returns in *pValue the current size of limit. The supported cudaLimit values are:
• cudaLimitStackSize: stack size of each GPU thread.
• cudaLimitPrintfFifoSize: size of the FIFO used by the printf() device system call.
• cudaLimitMallocHeapSize: size of the heap used by the malloc() and free() device system calls.

Parameters:
    limit - Limit to query
    pValue - Returned size in bytes of limit

Returns:
    cudaSuccess, cudaErrorUnsupportedLimit, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaThreadSetLimit

4.2.2.4 cudaError_t cudaThreadSetCacheConfig (enum cudaFuncCache cacheConfig)

On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the current host thread. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cudaFuncSetCacheConfig (C API) or cudaFuncSetCacheConfig (C++ API) will be preferred over this thread-wide setting. Setting the thread-wide cache configuration to cudaFuncCachePreferNone will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:
• cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
• cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
• cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

Parameters:
    cacheConfig - Requested cache configuration

Returns:
    cudaSuccess, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaThreadGetCacheConfig, cudaFuncSetCacheConfig (C API), cudaFuncSetCacheConfig (C++ API)

4.2.2.5 cudaError_t cudaThreadSetLimit (enum cudaLimit limit, size_t value)

Setting limit to value is a request by the application to update the current limit maintained by the thread. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cudaThreadGetLimit() to find out exactly what the limit has been set to.

Setting each cudaLimit has its own specific restrictions, so each is discussed here.

• cudaLimitStackSize controls the stack size of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
• cudaLimitPrintfFifoSize controls the size of the FIFO used by the printf() device system call. Setting cudaLimitPrintfFifoSize must be performed before launching any kernel that uses the printf() device system call, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.

• cudaLimitMallocHeapSize controls the size of the heap used by the malloc() and free() device system calls. Setting cudaLimitMallocHeapSize must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.

Parameters:
    limit - Limit to set
    value - Size in bytes of limit

Returns:
    cudaSuccess, cudaErrorUnsupportedLimit, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaThreadGetLimit

4.2.2.6 cudaError_t cudaThreadSynchronize (void)

Blocks until the device has completed all preceding requested tasks. cudaThreadSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceBlockingSync flag was set for this device, the host thread will block until the device has finished its work.

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaThreadExit
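The description of cudaThreadSetLimit() says the driver may change a requested value, which is why the effective value should be read back with cudaThreadGetLimit(). The sketch below only models the two adjustments the text names (clamping to a range and rounding up to an element size). The function adjustRequestedLimit and its minBytes, maxBytes and elementBytes parameters are hypothetical illustrations and do not describe the driver's actual policy.

```cpp
#include <cstddef>

// Hypothetical model of a limit adjustment; not a CUDA runtime function.
// Assumes maxBytes is itself a multiple of elementBytes.
inline std::size_t adjustRequestedLimit(std::size_t requested,
                                        std::size_t minBytes,
                                        std::size_t maxBytes,
                                        std::size_t elementBytes) {
    std::size_t value = requested;

    // Clamp the request into [minBytes, maxBytes].
    if (value < minBytes) value = minBytes;
    if (value > maxBytes) value = maxBytes;

    // Round up to the next multiple of the element size.
    if (elementBytes > 0) {
        const std::size_t remainder = value % elementBytes;
        if (remainder != 0) value += elementBytes - remainder;
    }
    return value;
}
```

Under this model a request of 1500 bytes with a 256-byte element size becomes 1536 bytes, so an application that assumes it received exactly what it asked for would be wrong by 36 bytes.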

4.3 Error Handling

Functions

• const char * cudaGetErrorString (cudaError_t error)
    Returns the message string from an error code.
• cudaError_t cudaGetLastError (void)
    Returns the last error from a runtime call.
• cudaError_t cudaPeekAtLastError (void)
    Returns the last error from a runtime call.

4.3.1 Detailed Description

This section describes the error handling functions of the CUDA runtime application programming interface.

4.3.2 Function Documentation

4.3.2.1 const char* cudaGetErrorString (cudaError_t error)

Returns the message string from an error code.

Parameters:
    error - Error code to convert to string

Returns:
    char* pointer to a NULL-terminated string

See also:
    cudaGetLastError, cudaPeekAtLastError, cudaError

4.3.2.2 cudaError_t cudaGetLastError (void)

Returns the last error that has been produced by any of the runtime calls in the same host thread and resets it to cudaSuccess.

Returns:
    cudaSuccess, cudaErrorMissingConfiguration, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidPitchValue, cudaErrorInvalidSymbol, cudaErrorUnmapBufferObjectFailed, cudaErrorInvalidHostPointer, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture, cudaErrorInvalidTextureBinding, cudaErrorInvalidChannelDescriptor, cudaErrorInvalidMemcpyDirection, cudaErrorInvalidFilterSetting, cudaErrorInvalidNormSetting, cudaErrorUnknown, cudaErrorNotYetImplemented, cudaErrorInvalidResourceHandle, cudaErrorInsufficientDriver, cudaErrorSetOnActiveProcess, cudaErrorStartupFailure, cudaErrorApiFailureBase

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaPeekAtLastError, cudaGetErrorString, cudaError

4.3.2.3 cudaError_t cudaPeekAtLastError (void)

Returns the last error that has been produced by any of the runtime calls in the same host thread. Note that this call does not reset the error to cudaSuccess like cudaGetLastError().

Returns:
    cudaSuccess, cudaErrorMissingConfiguration, cudaErrorMemoryAllocation, cudaErrorInitializationError, cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidPitchValue, cudaErrorInvalidSymbol, cudaErrorUnmapBufferObjectFailed, cudaErrorInvalidHostPointer, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture, cudaErrorInvalidTextureBinding, cudaErrorInvalidChannelDescriptor, cudaErrorInvalidMemcpyDirection, cudaErrorInvalidFilterSetting, cudaErrorInvalidNormSetting, cudaErrorUnknown, cudaErrorNotYetImplemented, cudaErrorInvalidResourceHandle, cudaErrorInsufficientDriver, cudaErrorSetOnActiveProcess, cudaErrorStartupFailure, cudaErrorApiFailureBase

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetLastError, cudaGetErrorString, cudaError
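The only documented difference between cudaGetLastError() and cudaPeekAtLastError() is whether the stored error is reset. The following sketch models that difference with a small class. ModelError, LastErrorModel and their members are hypothetical stand-ins rather than runtime types, and the rule that only failures are recorded is an assumption made for the illustration.

```cpp
// Hypothetical stand-ins; real code uses cudaError_t and the runtime calls.
enum ModelError { kModelSuccess, kModelLaunchFailure };

class LastErrorModel {
public:
    LastErrorModel() : last_(kModelSuccess) {}

    // Assumption: only failures replace the stored error.
    void record(ModelError e) {
        if (e != kModelSuccess) last_ = e;
    }

    // Mirrors cudaPeekAtLastError(): reports the error and leaves it in place.
    ModelError peek() const { return last_; }

    // Mirrors cudaGetLastError(): reports the error and resets it to success.
    ModelError get() {
        const ModelError e = last_;
        last_ = kModelSuccess;
        return e;
    }

private:
    ModelError last_;
};
```

In practice this means a diagnostic helper that must not disturb later error handling would call cudaPeekAtLastError(), while the code that actually handles the failure would call cudaGetLastError() and pass the result to cudaGetErrorString().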

4.4 Device Management

Functions

• cudaError_t cudaChooseDevice (int *device, const struct cudaDeviceProp *prop)
    Select compute-device which best matches criteria.
• cudaError_t cudaGetDevice (int *device)
    Returns which device is currently being used.
• cudaError_t cudaGetDeviceCount (int *count)
    Returns the number of compute-capable devices.
• cudaError_t cudaGetDeviceProperties (struct cudaDeviceProp *prop, int device)
    Returns information about the compute-device.
• cudaError_t cudaSetDevice (int device)
    Set device to be used for GPU executions.
• cudaError_t cudaSetDeviceFlags (unsigned int flags)
    Sets flags to be used for device executions.
• cudaError_t cudaSetValidDevices (int *device_arr, int len)
    Set a list of devices that can be used for CUDA.

4.4.1 Detailed Description

This section describes the device management functions of the CUDA runtime application programming interface.

4.4.2 Function Documentation

4.4.2.1 cudaError_t cudaChooseDevice (int * device, const struct cudaDeviceProp * prop)

Returns in *device the device which has properties that best match *prop.

Parameters:
    device - Device with best match
    prop - Desired device properties

Returns:
    cudaSuccess, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetDeviceCount, cudaGetDevice, cudaSetDevice, cudaGetDeviceProperties

4.4.2.2 cudaError_t cudaGetDevice (int * device)

Returns in *device the device on which the active host thread executes the device code.

Parameters:
    device - Returns the device on which the active host thread executes the device code.

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetDeviceCount, cudaSetDevice, cudaGetDeviceProperties, cudaChooseDevice

4.4.2.3 cudaError_t cudaGetDeviceCount (int * count)

Returns in *count the number of devices with compute capability greater or equal to 1.0 that are available for execution. If there is no such device, cudaGetDeviceCount() returns 1 and device 0 only supports device emulation mode. Since this device will be able to emulate all hardware features, this device will report major and minor compute capability versions of 9999.

Parameters:
    count - Returns the number of devices with compute capability greater or equal to 1.0

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetDevice, cudaSetDevice, cudaGetDeviceProperties, cudaChooseDevice

4.4.2.4 cudaError_t cudaGetDeviceProperties (struct cudaDeviceProp * prop, int device)

Returns in *prop the properties of device device. The cudaDeviceProp structure is defined as:

    struct cudaDeviceProp {
        char name[256];
        size_t totalGlobalMem;
        size_t sharedMemPerBlock;
        int regsPerBlock;
        int warpSize;
        size_t memPitch;
        int maxThreadsPerBlock;
        int maxThreadsDim[3];
        int maxGridSize[3];
        size_t totalConstMem;
        int major;
        int minor;
        int clockRate;
        size_t textureAlignment;
        int deviceOverlap;
        int multiProcessorCount;
        int kernelExecTimeoutEnabled;
        int integrated;
        int canMapHostMemory;
        int computeMode;
        int concurrentKernels;
        int ECCEnabled;
        int pciBusID;
        int pciDeviceID;
        int tccDriver;
    }

where:
• name[256] is an ASCII string identifying the device.
• totalGlobalMem is the total amount of global memory available on the device in bytes.
• sharedMemPerBlock is the maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor.
• regsPerBlock is the maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor.
• warpSize is the warp size in threads.
• memPitch is the maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch().
• maxThreadsPerBlock is the maximum number of threads per block.
• maxThreadsDim[3] contains the maximum size of each dimension of a block.
• maxGridSize[3] contains the maximum size of each dimension of a grid.
• totalConstMem is the total amount of constant memory available on the device in bytes.
• major, minor are the major and minor revision numbers defining the device's compute capability.
• clockRate is the clock frequency in kilohertz.
• textureAlignment is the alignment requirement; texture base addresses that are aligned to textureAlignment bytes do not need an offset applied to texture fetches.
• deviceOverlap is 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not.
• multiProcessorCount is the number of multiprocessors on the device.
• kernelExecTimeoutEnabled is 1 if there is a run time limit for kernels executed on the device, or 0 if not.
• integrated is 1 if the device is an integrated (motherboard) GPU and 0 if it is a discrete (card) component.
• canMapHostMemory is 1 if the device can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer(), or 0 if not.
• computeMode is the compute mode that the device is currently in. Available modes are as follows:

cudaSetDevice. cudaGetDevice. cudaErrorSetOnActiveProcess Note: Note that this function may also return error codes from previous. • pciDeviceID is the PCI device (sometimes called slot) identifier of the device. cudaErrorInvalidDevice. Returns: cudaSuccess. cudaChooseDevice Generated for NVIDIA CUDA Library by Doxygen .4. • concurrentKernels is 1 if the device supports executing multiple kernels within the same context simultaneously. – cudaComputeModeProhibited: Compute-prohibited mode .Device is not restricted and multiple threads can use cudaSetDevice() with this device. – cudaComputeModeExclusive: Compute-exclusive mode . cudaErrorNoDevice will be returned. cudaErrorInvalidDevice See also: cudaGetDeviceCount.20 Module Documentation – cudaComputeModeDefault: Default mode .No threads can use cudaSetDevice() with this device. then this call returns cudaErrorSetOnActiveProcess.Properties for the specified device device . Parameters: device . See also: cudaGetDeviceCount. or 0 if not.Device number to get properties for Returns: cudaSuccess.5 cudaError_t cudaSetDevice (int device) Records device as the device on which the active host thread executes the device code. cudaChooseDevice 4. Any errors from calling cudaSetDevice() with an exclusive (and occupied) or prohibited device will only show up after a non-device management runtime function is called. At that time. • tccDriver is 1 if the device is using a TCC driver or 0 if not. asynchronous launches.2. • pciBusID is the PCI bus identifier of the device. Parameters: prop .Only one thread will be able to use cudaSetDevice() with this device.Device on which the active host thread should execute the device code. cudaGetDeviceProperties. cudaGetDevice. or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness. 
If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread. • ECCEnabled is 1 if the device has ECC support turned on.
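The three compute modes listed under computeMode decide whether a further host thread can use a device through cudaSetDevice(). A minimal sketch of that rule follows. ModelComputeMode and threadMayUseDevice are hypothetical names that mirror the documented modes, and the count of threads already using the device is an assumed input that the runtime does not expose in this form.

```cpp
// Hypothetical mirror of the documented compute modes.
enum ModelComputeMode {
    kModeDefault,     // corresponds to cudaComputeModeDefault
    kModeExclusive,   // corresponds to cudaComputeModeExclusive
    kModeProhibited   // corresponds to cudaComputeModeProhibited
};

// Returns true if one more host thread could use the device.
inline bool threadMayUseDevice(ModelComputeMode mode,
                               int threadsAlreadyUsingDevice) {
    switch (mode) {
        case kModeDefault:
            return true;                             // unrestricted
        case kModeExclusive:
            return threadsAlreadyUsingDevice == 0;   // one thread only
        case kModeProhibited:
            return false;                            // no thread may use it
    }
    return false;
}
```

Because the text states that such failures only surface after a later non-device management call, a real application would still need to check the result of the first such call rather than rely on the return value of cudaSetDevice() alone.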

4.4.2.6 cudaError_t cudaSetDeviceFlags (unsigned int flags)

Records flags as the flags to use when the active host thread executes device code. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

The two LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.

• cudaDeviceScheduleAuto: The default value if the flags parameter is zero, uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the device, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
• cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
• cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
• cudaDeviceBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
• cudaDeviceMapHost: This flag must be set in order to allocate pinned host memory that is accessible to the device. If this flag is not set, cudaHostGetDevicePointer() will always return a failure code.
• cudaDeviceLmemResizeToMax: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.

Parameters:
    flags - Parameters for device operation

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorSetOnActiveProcess

See also:
    cudaGetDeviceCount, cudaGetDevice, cudaGetDeviceProperties, cudaSetDevice, cudaSetValidDevices, cudaChooseDevice

4.4.2.7 cudaError_t cudaSetValidDevices (int * device_arr, int len)

Sets a list of devices for CUDA execution in priority order using device_arr. The parameter len specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a len of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return cudaErrorInvalidDevice. If len is not 0 and device_arr is NULL or if len exceeds the number of devices in the system, then cudaErrorInvalidValue is returned.

Parameters:
    device_arr - List of devices to try
    len - Number of devices in specified list

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevice

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetDeviceCount, cudaSetDevice, cudaGetDeviceProperties, cudaSetDeviceFlags, cudaChooseDevice
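The cudaDeviceScheduleAuto heuristic described for cudaSetDeviceFlags() reduces to a single comparison between the number of active CUDA contexts C and the number of logical processors P. The sketch below expresses only that comparison. WaitPolicy and autoWaitPolicy are hypothetical names, and how the runtime actually counts C and P is not covered here.

```cpp
// Hypothetical result type for the illustration.
enum WaitPolicy {
    kWaitSpin,   // actively spin on the processor while waiting
    kWaitYield   // yield to other OS threads while waiting
};

// Models the documented rule: yield only when contexts outnumber processors.
inline WaitPolicy autoWaitPolicy(int activeContexts, int logicalProcessors) {
    return activeContexts > logicalProcessors ? kWaitYield : kWaitSpin;
}
```

With four logical processors, for example, a process holding four contexts would spin under this rule, and a fifth context would switch the behavior to yielding.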

4.5 Stream Management

Functions

• cudaError_t cudaStreamCreate (cudaStream_t *pStream)
    Create an asynchronous stream.
• cudaError_t cudaStreamDestroy (cudaStream_t stream)
    Destroys and cleans up an asynchronous stream.
• cudaError_t cudaStreamQuery (cudaStream_t stream)
    Queries an asynchronous stream for completion status.
• cudaError_t cudaStreamSynchronize (cudaStream_t stream)
    Waits for stream tasks to complete.
• cudaError_t cudaStreamWaitEvent (cudaStream_t stream, cudaEvent_t event, unsigned int flags)
    Make a compute stream wait on an event.

4.5.1 Detailed Description

This section describes the stream management functions of the CUDA runtime application programming interface.

4.5.2 Function Documentation

4.5.2.1 cudaError_t cudaStreamCreate (cudaStream_t * pStream)

Creates a new asynchronous stream.

Parameters:
    pStream - Pointer to new stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaStreamQuery, cudaStreamSynchronize, cudaStreamWaitEvent, cudaStreamDestroy

4.5.2.2 cudaError_t cudaStreamDestroy (cudaStream_t stream)

Destroys and cleans up the asynchronous stream specified by stream.

Parameters:
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaStreamCreate, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamSynchronize

4.5.2.3 cudaError_t cudaStreamQuery (cudaStream_t stream)

Returns cudaSuccess if all operations in stream have completed, or cudaErrorNotReady if not.

Parameters:
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorNotReady, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaStreamCreate, cudaStreamWaitEvent, cudaStreamSynchronize, cudaStreamDestroy

4.5.2.4 cudaError_t cudaStreamSynchronize (cudaStream_t stream)

Blocks until stream has completed all operations. If the cudaDeviceBlockingSync flag was set for this device, the host thread will block until the stream is finished with all of its tasks.

Parameters:
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaStreamCreate, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamDestroy

4.5.2.5 cudaError_t cudaStreamWaitEvent (cudaStream_t stream, cudaEvent_t event, unsigned int flags)

Makes all future work submitted to stream wait until event reports completion before beginning execution. This synchronization will be performed efficiently on the device.

The stream stream will wait only for the completion of the most recent host call to cudaEventRecord() on event. Once this call has returned, any functions (including cudaEventRecord() and cudaEventDestroy()) may be called on event again, and the subsequent calls will not have any effect on stream.

If stream is NULL, any future work submitted in any stream will wait for event to complete before beginning execution. This effectively creates a barrier for all future work submitted to the device on this thread.

If cudaEventRecord() has not been called on event, this call acts as if the record has already completed, and so is a functional no-op.

Parameters:
    stream - Stream to wait
    event - Event to wait on
    flags - Parameters for the operation (must be 0)

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaStreamCreate, cudaStreamQuery, cudaStreamSynchronize, cudaStreamDestroy

4.6 Event Management

Functions

• cudaError_t cudaEventCreate (cudaEvent_t *event)
    Creates an event object.
• cudaError_t cudaEventCreateWithFlags (cudaEvent_t *event, unsigned int flags)
    Creates an event object with the specified flags.
• cudaError_t cudaEventDestroy (cudaEvent_t event)
    Destroys an event object.
• cudaError_t cudaEventElapsedTime (float *ms, cudaEvent_t start, cudaEvent_t end)
    Computes the elapsed time between events.
• cudaError_t cudaEventQuery (cudaEvent_t event)
    Queries an event's status.
• cudaError_t cudaEventRecord (cudaEvent_t event, cudaStream_t stream=0)
    Records an event.
• cudaError_t cudaEventSynchronize (cudaEvent_t event)
    Waits for an event to complete.

4.6.1 Detailed Description

This section describes the event management functions of the CUDA runtime application programming interface.

4.6.2 Function Documentation

4.6.2.1 cudaError_t cudaEventCreate (cudaEvent_t * event)

Creates an event object using cudaEventDefault.

Parameters:
    event - Newly created event

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorLaunchFailure, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C++ API), cudaEventCreateWithFlags, cudaEventRecord, cudaEventQuery, cudaEventSynchronize, cudaEventDestroy, cudaEventElapsedTime, cudaStreamWaitEvent

4.6.2.2 cudaError_t cudaEventCreateWithFlags (cudaEvent_t * event, unsigned int flags)

Creates an event object with the specified flags. Valid flags include:
• cudaEventDefault: Default event creation flag.
• cudaEventBlockingSync: Specifies that event should use blocking synchronization. A host thread that uses cudaEventSynchronize() to wait on an event created with this flag will block until the event actually completes.
• cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().

Parameters:
    event - Newly created event
    flags - Flags for new event

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorLaunchFailure, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventSynchronize, cudaEventDestroy, cudaEventElapsedTime, cudaStreamWaitEvent

4.6.2.3 cudaError_t cudaEventDestroy (cudaEvent_t event)

Destroys the event specified by event.

Parameters:
    event - Event to destroy

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventQuery, cudaEventSynchronize, cudaEventRecord, cudaEventElapsedTime

4.6.2.4 cudaError_t cudaEventElapsedTime (float * ms, cudaEvent_t start, cudaEvent_t end)

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

If cudaEventRecord() has not been called on either event, then cudaErrorInvalidResourceHandle is returned. If cudaEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cudaEventQuery() would return cudaErrorNotReady on at least one of the events), cudaErrorNotReady is returned. If either event was created with the cudaEventDisableTiming flag, then this function will return cudaErrorInvalidResourceHandle.

Parameters:
    ms - Time between start and end in ms
    start - Starting event
    end - Ending event

Returns:
    cudaSuccess, cudaErrorNotReady, cudaErrorInvalidValue, cudaErrorInitializationError, cudaErrorInvalidResourceHandle, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventQuery, cudaEventSynchronize, cudaEventDestroy, cudaEventRecord

4.6.2.5 cudaError_t cudaEventQuery (cudaEvent_t event)

Query the status of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).

If this work has successfully been completed by the device, or if cudaEventRecord() has not been called on event, then cudaSuccess is returned. If this work has not yet been completed by the device then cudaErrorNotReady is returned.

Parameters:
    event - Event to query

Returns:
    cudaSuccess, cudaErrorNotReady, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventRecord, cudaEventSynchronize, cudaEventDestroy, cudaEventElapsedTime
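The three failure rules given for cudaEventElapsedTime() can be read as a small decision table. The sketch below models that table on plain flags so the cases are easy to compare. EventState, ModelStatus and elapsedTimeStatus are hypothetical names, the flags are assumed inputs rather than values the runtime exposes, and the order of the checks is an assumption that does not change the outcome for the documented cases.

```cpp
// Hypothetical status codes standing in for the cudaError_t values.
enum ModelStatus {
    kStatusSuccess,               // cudaSuccess
    kStatusNotReady,              // cudaErrorNotReady
    kStatusInvalidResourceHandle  // cudaErrorInvalidResourceHandle
};

// Hypothetical description of one event's state.
struct EventState {
    bool recorded;        // cudaEventRecord() has been called on the event
    bool completed;       // the recorded work has finished on the device
    bool timingDisabled;  // created with cudaEventDisableTiming
};

// Models which status the documented rules would produce.
inline ModelStatus elapsedTimeStatus(const EventState& start,
                                     const EventState& end) {
    if (start.timingDisabled || end.timingDisabled)
        return kStatusInvalidResourceHandle;
    if (!start.recorded || !end.recorded)
        return kStatusInvalidResourceHandle;
    if (!start.completed || !end.completed)
        return kStatusNotReady;
    return kStatusSuccess;
}
```

The kStatusNotReady case is the one a caller can recover from: waiting on the ending event with cudaEventSynchronize() before asking for the elapsed time avoids it.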

4.6.2.6 cudaError_t cudaEventRecord (cudaEvent_t event, cudaStream_t stream = 0)

Records an event. If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, cudaEventQuery() and/or cudaEventSynchronize() must be used to determine when the event has actually been recorded.

If cudaEventRecord() has previously been called on event, then this call will overwrite any existing state in event. Any subsequent calls which examine the status of event will only examine the completion of this most recent call to cudaEventRecord().

Parameters:
    event - Event to record
    stream - Stream in which to record event

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInitializationError, cudaErrorInvalidResourceHandle, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventQuery, cudaEventSynchronize, cudaEventDestroy, cudaEventElapsedTime, cudaStreamWaitEvent

4.6.2.7 cudaError_t cudaEventSynchronize (cudaEvent_t event)

Wait until the completion of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).

If cudaEventRecord() has not been called on event, cudaSuccess is returned immediately.

Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling CPU thread to block until the event has been completed by the device. If the cudaEventBlockingSync flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.

Parameters:
    event - Event to wait for

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventRecord, cudaEventQuery, cudaEventDestroy, cudaEventElapsedTime

4.7 Execution Control

Functions

• cudaError_t cudaConfigureCall (dim3 gridDim, dim3 blockDim, size_t sharedMem=0, cudaStream_t stream=0)
    Configure a device-launch.
• cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr, const char *func)
    Find out attributes for a given function.
• cudaError_t cudaFuncSetCacheConfig (const char *func, enum cudaFuncCache cacheConfig)
    Sets the preferred cache configuration for a device function.
• cudaError_t cudaLaunch (const char *entry)
    Launches a device function.
• cudaError_t cudaSetDoubleForDevice (double *d)
    Converts a double argument to be executed on a device.
• cudaError_t cudaSetDoubleForHost (double *d)
    Converts a double argument after execution on a device.
• cudaError_t cudaSetupArgument (const void *arg, size_t size, size_t offset)
    Configure a device launch.

4.7.1 Detailed Description

This section describes the execution control functions of the CUDA runtime application programming interface.

4.7.2 Function Documentation

4.7.2.1 cudaError_t cudaConfigureCall (dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0)

Specifies the grid and block dimensions for the device call to be executed similar to the execution configuration syntax. cudaConfigureCall() is stack based. Each call pushes data on top of an execution stack. This data contains the dimension for the grid and thread blocks, together with any arguments for the call.

Parameters:
    gridDim - Grid dimensions
    blockDim - Block dimensions
    sharedMem - Shared memory
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidConfiguration

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)

4.7.2.2 cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr, const char *func)

This function obtains the attributes of a function specified via func, which is a character string that specifies the fully-decorated (C++) name for a function that executes on the device. The parameter specified by func must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.

Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is currently being used.

Parameters:
    attr - Return pointer to function's attributes
    func - Function to get attributes of

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C++ API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)

4.7.2.3 cudaError_t cudaFuncSetCacheConfig (const char *func, enum cudaFuncCache cacheConfig)

On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the function specified via func. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute func.

func is a character string that specifies the fully-decorated (C++) name for a function that executes on the device. The parameter specified by func must be declared as a __global__ function. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:

• cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
• cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache

• cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

Parameters:
    func - Char string naming device function
    cacheConfig - Requested cache configuration

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig

4.7.2.4 cudaError_t cudaLaunch (const char *entry)

Launches the function entry on the device. The parameter entry must be a character string naming a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall() since it pops the data that was pushed by cudaConfigureCall() from the execution stack.

Parameters:
    entry - Device char string naming device function to execute

Returns:
    cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorSharedObjectInitFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C++ API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig

4.7.2.5 cudaError_t cudaSetDoubleForDevice (double *d)

Converts the double value of d to an internal float representation if the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.

Parameters:
    d - Double to convert

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForHost, cudaSetupArgument (C API)

4.7.2.6 cudaError_t cudaSetDoubleForHost (double *d)

Converts the double value of d from a potentially internal float representation if the device does not support double arithmetic. If the device does natively support doubles, then this function does nothing.

Parameters:
    d - Double to convert

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetupArgument (C API)

4.7.2.7 cudaError_t cudaSetupArgument (const void *arg, size_t size, size_t offset)

Pushes size bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored in the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().

Parameters:
    arg - Argument to push for a kernel launch
    size - Size of argument
    offset - Offset in argument stack to push new arg

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API)
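The offset bookkeeping that cudaSetupArgument() expects can be sketched on the host without a device. The helper below is hypothetical (ParamLayout and push are illustration names, not part of the CUDA runtime). It also assumes, beyond what this section states, that each argument should sit at an offset that is a multiple of its own alignment. The values returned by push are what a caller would pass as the offset parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the CUDA runtime: tracks the next free
// byte in the parameter passing area, which starts at offset 0.
struct ParamLayout {
    std::size_t next = 0;

    // Reserve `size` bytes at the next offset that is a multiple of `align`
    // (assumed to be a power of two) and return that offset.
    std::size_t push(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t offset = (next + align - 1) & ~(align - 1);
        next = offset + size;
        return offset;
    }
};
```

For an illustrative kernel taking (float *data, int n, double scale) with 8-byte pointers, successive calls push(8, 8), push(4, 4) and push(8, 8) give offsets 0, 8 and 16; the double is moved past the 4 bytes of padding that follow the int.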

4.8 Memory Management

Functions

• cudaError_t cudaFree (void *devPtr)
    Frees memory on the device.
• cudaError_t cudaFreeArray (struct cudaArray *array)
    Frees an array on the device.
• cudaError_t cudaFreeHost (void *ptr)
    Frees page-locked memory.
• cudaError_t cudaGetSymbolAddress (void **devPtr, const char *symbol)
    Finds the address associated with a CUDA symbol.
• cudaError_t cudaGetSymbolSize (size_t *size, const char *symbol)
    Finds the size of the object associated with a CUDA symbol.
• cudaError_t cudaHostAlloc (void **pHost, size_t size, unsigned int flags)
    Allocates page-locked memory on the host.
• cudaError_t cudaHostGetDevicePointer (void **pDevice, void *pHost, unsigned int flags)
    Passes back device pointer of mapped host memory allocated by cudaHostAlloc().
• cudaError_t cudaHostGetFlags (unsigned int *pFlags, void *pHost)
    Passes back flags used to allocate pinned host memory allocated by cudaHostAlloc().
• cudaError_t cudaMalloc (void **devPtr, size_t size)
    Allocate memory on the device.
• cudaError_t cudaMalloc3D (struct cudaPitchedPtr *pitchedDevPtr, struct cudaExtent extent)
    Allocates logical 1D, 2D, or 3D memory objects on the device.
• cudaError_t cudaMalloc3DArray (struct cudaArray **array, const struct cudaChannelFormatDesc *desc, struct cudaExtent extent, unsigned int flags=0)
    Allocate an array on the device.
• cudaError_t cudaMallocArray (struct cudaArray **array, const struct cudaChannelFormatDesc *desc, size_t width, size_t height=0, unsigned int flags=0)
    Allocate an array on the device.
• cudaError_t cudaMallocHost (void **ptr, size_t size)
    Allocates page-locked memory on the host.
• cudaError_t cudaMallocPitch (void **devPtr, size_t *pitch, size_t width, size_t height)
    Allocates pitched memory on the device.
• cudaError_t cudaMemcpy (void *dst, const void *src, size_t count, enum cudaMemcpyKind kind)
    Copies data between host and device.

• cudaError_t cudaMemcpy2D (void *dst, size_t dpitch, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DArrayToArray (struct cudaArray *dst, size_t wOffsetDst, size_t hOffsetDst, const struct cudaArray *src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, enum cudaMemcpyKind kind=cudaMemcpyDeviceToDevice)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DAsync (void *dst, size_t dpitch, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DFromArray (void *dst, size_t dpitch, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DFromArrayAsync (void *dst, size_t dpitch, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DToArray (struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)
    Copies data between host and device.
• cudaError_t cudaMemcpy2DToArrayAsync (struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpy3D (const struct cudaMemcpy3DParms *p)
    Copies data between 3D objects.
• cudaError_t cudaMemcpy3DAsync (const struct cudaMemcpy3DParms *p, cudaStream_t stream=0)
    Copies data between 3D objects.
• cudaError_t cudaMemcpyArrayToArray (struct cudaArray *dst, size_t wOffsetDst, size_t hOffsetDst, const struct cudaArray *src, size_t wOffsetSrc, size_t hOffsetSrc, size_t count, enum cudaMemcpyKind kind=cudaMemcpyDeviceToDevice)
    Copies data between host and device.
• cudaError_t cudaMemcpyAsync (void *dst, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpyFromArray (void *dst, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t count, enum cudaMemcpyKind kind)
    Copies data between host and device.
• cudaError_t cudaMemcpyFromArrayAsync (void *dst, const struct cudaArray *src, size_t wOffset, size_t hOffset, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpyFromSymbol (void *dst, const char *symbol, size_t count, size_t offset=0, enum cudaMemcpyKind kind=cudaMemcpyDeviceToHost)
    Copies data from the given symbol on the device.
• cudaError_t cudaMemcpyFromSymbolAsync (void *dst, const char *symbol, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data from the given symbol on the device.
• cudaError_t cudaMemcpyToArray (struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t count, enum cudaMemcpyKind kind)
    Copies data between host and device.
• cudaError_t cudaMemcpyToArrayAsync (struct cudaArray *dst, size_t wOffset, size_t hOffset, const void *src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data between host and device.
• cudaError_t cudaMemcpyToSymbol (const char *symbol, const void *src, size_t count, size_t offset=0, enum cudaMemcpyKind kind=cudaMemcpyHostToDevice)
    Copies data to the given symbol on the device.
• cudaError_t cudaMemcpyToSymbolAsync (const char *symbol, const void *src, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream=0)
    Copies data to the given symbol on the device.
• cudaError_t cudaMemGetInfo (size_t *free, size_t *total)
    Gets free and total device memory.
• cudaError_t cudaMemset (void *devPtr, int value, size_t count)
    Initializes or sets device memory to a value.
• cudaError_t cudaMemset2D (void *devPtr, size_t pitch, int value, size_t width, size_t height)
    Initializes or sets device memory to a value.
• cudaError_t cudaMemset2DAsync (void *devPtr, size_t pitch, int value, size_t width, size_t height, cudaStream_t stream=0)
    Initializes or sets device memory to a value.
• cudaError_t cudaMemset3D (struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent)
    Initializes or sets device memory to a value.
• cudaError_t cudaMemset3DAsync (struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent, cudaStream_t stream=0)
    Initializes or sets device memory to a value.
• cudaError_t cudaMemsetAsync (void *devPtr, int value, size_t count, cudaStream_t stream=0)
    Initializes or sets device memory to a value.
• struct cudaExtent make_cudaExtent (size_t w, size_t h, size_t d)
    Returns a cudaExtent based on input parameters.

• struct cudaPitchedPtr make_cudaPitchedPtr (void *d, size_t p, size_t xsz, size_t ysz)
    Returns a cudaPitchedPtr based on input parameters.
• struct cudaPos make_cudaPos (size_t x, size_t y, size_t z)
    Returns a cudaPos based on input parameters.

4.8.1 Detailed Description

This section describes the memory management functions of the CUDA runtime application programming interface.

4.8.2 Function Documentation

4.8.2.1 cudaError_t cudaFree (void *devPtr)

Frees the memory space pointed to by devPtr, which must have been returned by a previous call to cudaMalloc() or cudaMallocPitch(). Otherwise, or if cudaFree(devPtr) has already been called before, an error is returned. If devPtr is 0, no operation is performed. cudaFree() returns cudaErrorInvalidDevicePointer in case of failure.

Parameters:
    devPtr - Device pointer to memory to free

Returns:
    cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaMallocPitch, cudaMallocArray, cudaFreeArray, cudaMallocHost (C API), cudaFreeHost, cudaMalloc3D, cudaMalloc3DArray, cudaHostAlloc

4.8.2.2 cudaError_t cudaFreeArray (struct cudaArray *array)

Frees the CUDA array array, which must have been returned by a previous call to cudaMallocArray(). If cudaFreeArray(array) has already been called before, cudaErrorInvalidValue is returned. If array is 0, no operation is performed.

Parameters:
    array - Pointer to array to free

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaMallocPitch, cudaFree, cudaMallocArray, cudaMallocHost (C API), cudaFreeHost, cudaHostAlloc

4.8.2.3 cudaError_t cudaFreeHost (void *ptr)

Frees the memory space pointed to by ptr, which must have been returned by a previous call to cudaMallocHost() or cudaHostAlloc().

Parameters:
    ptr - Pointer to memory to free

Returns:
    cudaSuccess, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaMallocPitch, cudaFree, cudaMallocArray, cudaFreeArray, cudaMallocHost (C API), cudaMalloc3D, cudaMalloc3DArray, cudaHostAlloc

4.8.2.4 cudaError_t cudaGetSymbolAddress (void **devPtr, const char *symbol)

Returns in *devPtr the address of symbol symbol on the device. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in the global or constant memory space, *devPtr is unchanged and the error cudaErrorInvalidSymbol is returned. If there are multiple global or constant variables with the same string name (from separate files) and the lookup is done via character string, cudaErrorDuplicateVariableName is returned.

Parameters:
    devPtr - Return device pointer associated with symbol
    symbol - Global variable or string symbol to search for

Returns:
    cudaSuccess, cudaErrorInvalidSymbol, cudaErrorDuplicateVariableName

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetSymbolAddress (C++ API), cudaGetSymbolSize (C API)

4.8.2.5 cudaError_t cudaGetSymbolSize (size_t *size, const char *symbol)

Returns in *size the size of symbol symbol. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in global or constant memory space, *size is unchanged and the error cudaErrorInvalidSymbol is returned.

Parameters:
    size - Size of object associated with symbol
    symbol - Global variable or string symbol to find size of

Returns:
    cudaSuccess, cudaErrorInvalidSymbol

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetSymbolAddress (C API), cudaGetSymbolSize (C++ API)

4.8.2.6 cudaError_t cudaHostAlloc (void **pHost, size_t size, unsigned int flags)

Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

The flags parameter enables different options to be specified that affect the allocation, as follows.

• cudaHostAllocDefault: This flag's value is defined to be 0 and causes cudaHostAlloc() to emulate cudaMallocHost().
• cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
• cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
• cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.

cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.

The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.

Memory allocated by this function must be freed with cudaFreeHost().

Parameters:
    pHost - Pointer to allocated host memory
    size - Requested allocation size in bytes
    flags - Requested properties of allocated memory

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaSetDeviceFlags, cudaMallocHost (C API), cudaFreeHost

4.8.2.7 cudaError_t cudaHostGetDevicePointer (void **pDevice, void *pHost, unsigned int flags)

Passes back the device pointer corresponding to the mapped, pinned host buffer allocated by cudaHostAlloc().

cudaHostGetDevicePointer() will fail if the cudaDeviceMapHost flag was not specified before deferred context creation occurred, or if called on a device that does not support mapped, pinned memory.

flags is reserved for future releases. For now, it must be set to 0.

Parameters:
    pDevice - Returned device pointer for mapped memory
    pHost - Requested host pointer mapping
    flags - Flags for extensions (must be 0 for now)

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaSetDeviceFlags, cudaHostAlloc

4.8.2.8 cudaError_t cudaHostGetFlags (unsigned int *pFlags, void *pHost)

cudaHostGetFlags() will fail if the input pointer does not reside in an address range allocated by cudaHostAlloc().

Parameters:
    pFlags - Returned flags word
    pHost - Host pointer

Returns:
    cudaSuccess, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaHostAlloc

4.8.2.9 cudaError_t cudaMalloc (void **devPtr, size_t size)

Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns cudaErrorMemoryAllocation in case of failure.

Parameters:
    devPtr - Pointer to allocated device memory
    size - Requested allocation size in bytes

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

See also:
    cudaMallocPitch, cudaFree, cudaMallocArray, cudaFreeArray, cudaMalloc3D, cudaMalloc3DArray, cudaMallocHost (C API), cudaFreeHost, cudaHostAlloc

4.8.2.10 cudaError_t cudaMalloc3D (struct cudaPitchedPtr *pitchedDevPtr, struct cudaExtent extent)

Allocates at least width * height * depth bytes of linear memory on the device and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure hardware alignment requirements are met. The pitch returned in the pitch field of pitchedDevPtr is the width in bytes of the allocation.

The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.

For allocations of 2D and 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).

Parameters:
    pitchedDevPtr - Pointer to allocated pitched device memory
    extent - Requested allocation size (width field in bytes)

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMallocPitch, cudaFree, cudaMemcpy3D, cudaMemset3D, cudaMalloc3DArray, cudaMallocArray, cudaFreeArray, cudaMallocHost (C API), cudaFreeHost, cudaHostAlloc, make_cudaPitchedPtr, make_cudaExtent

4.8.2.11 cudaError_t cudaMalloc3DArray (struct cudaArray **array, const struct cudaChannelFormatDesc *desc, struct cudaExtent extent, unsigned int flags = 0)

Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.

The cudaChannelFormatDesc is defined as:

    struct cudaChannelFormatDesc {
        int x, y, z, w;
        enum cudaChannelFormatKind f;
    };

where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.

cudaMalloc3DArray() is able to allocate 1D, 2D, or 3D arrays.

• A 1D array is allocated if the height and depth extent are both zero. For 1D arrays valid extent ranges are {(1, 8192), 0, 0}.
• A 2D array is allocated if only the depth extent is zero. For 2D arrays valid extent ranges are {(1, 65536), (1, 32768), 0}.
• A 3D array is allocated if all three extents are non-zero. For 3D arrays valid extent ranges are {(1, 2048), (1, 2048), (1, 2048)}.

flags is reserved for future releases. For now, it must be set to 0.

Note:
    Due to the differing extent limits, it may be advantageous to use a degenerate array (with unused dimensions set to one) of higher dimensionality. For instance, a degenerate 2D array allows for significantly more linear storage than a 1D array.

Parameters:
    array - Pointer to allocated array in device memory

    desc - Requested channel format
    extent - Requested allocation size (width field in elements)
    flags - Flags for extensions (must be 0 for now)

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc3D, cudaMalloc, cudaMallocPitch, cudaFree, cudaFreeArray, cudaMallocHost (C API), cudaFreeHost, cudaHostAlloc, make_cudaExtent

4.8.2.12 cudaError_t cudaMallocArray (struct cudaArray **array, const struct cudaChannelFormatDesc *desc, size_t width, size_t height = 0, unsigned int flags = 0)

Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.

The cudaChannelFormatDesc is defined as:

    struct cudaChannelFormatDesc {
        int x, y, z, w;
        enum cudaChannelFormatKind f;
    };

where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.

The flags parameter enables different options to be specified that affect the allocation, as follows.

• cudaArrayDefault: This flag's value is defined to be 0 and provides default array allocation
• cudaArraySurfaceLoadStore: Allocates an array that can be read from or written to using a surface reference

Parameters:
    array - Pointer to allocated array in device memory
    desc - Requested channel format
    width - Requested array allocation width
    height - Requested array allocation height
    flags - Requested properties of allocated array

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaMallocPitch, cudaFree, cudaFreeArray, cudaMallocHost (C API), cudaFreeHost, cudaMalloc3D, cudaMalloc3DArray, cudaHostAlloc
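The extent rules listed for cudaMalloc3DArray() can be restated as a small host-side check. This is only a sketch: ArrayKind, inRange and classifyExtent are hypothetical names, the limits are copied from the ranges quoted in this section, and the runtime itself remains the authority on what a given device accepts.

```cpp
#include <cstddef>

// Kind of CUDA array that the documented extent rules would select.
enum class ArrayKind { Invalid, Array1D, Array2D, Array3D };

// True when lo <= v <= hi.
static bool inRange(std::size_t v, std::size_t lo, std::size_t hi) {
    return v >= lo && v <= hi;
}

// Hypothetical check mirroring the ranges quoted in the text:
//   1D: {(1, 8192), 0, 0}
//   2D: {(1, 65536), (1, 32768), 0}
//   3D: {(1, 2048), (1, 2048), (1, 2048)}
ArrayKind classifyExtent(std::size_t width, std::size_t height, std::size_t depth) {
    if (height == 0 && depth == 0)
        return inRange(width, 1, 8192) ? ArrayKind::Array1D : ArrayKind::Invalid;
    if (depth == 0)
        return (inRange(width, 1, 65536) && inRange(height, 1, 32768))
                   ? ArrayKind::Array2D
                   : ArrayKind::Invalid;
    return (inRange(width, 1, 2048) && inRange(height, 1, 2048) &&
            inRange(depth, 1, 2048))
               ? ArrayKind::Array3D
               : ArrayKind::Invalid;
}
```

Under these rules a width of 8193 is rejected as a 1D extent but accepted as the degenerate 2D extent (8193, 1, 0), which is the situation the note about degenerate arrays describes.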

4.8.2.13 cudaError_t cudaMallocHost (void **ptr, size_t size)

Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy*(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cudaMallocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

Parameters:
    ptr - Pointer to allocated host memory
    size - Requested allocation size in bytes

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaMallocPitch, cudaMallocArray, cudaMalloc3D, cudaMalloc3DArray, cudaHostAlloc, cudaFree, cudaFreeArray, cudaMallocHost (C++ API), cudaFreeHost

4.8.2.14 cudaError_t cudaMallocPitch (void **devPtr, size_t *pitch, size_t width, size_t height)

Allocates at least width (in bytes) * height bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. The pitch returned in *pitch by cudaMallocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:

    T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;

For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cudaMallocPitch(). Due to pitch alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).

Parameters:
    devPtr - Pointer to allocated pitched device memory
    pitch - Pitch for allocation
    width - Requested pitched allocation width (in bytes)
    height - Requested pitched allocation height

Returns:
    cudaSuccess, cudaErrorMemoryAllocation
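The pitch-based address computation given for cudaMallocPitch() can be tried on ordinary host memory. In the sketch below, paddedPitch and elementAt are hypothetical helper names, and the alignment passed to paddedPitch is a made-up stand-in: on a device, the pitch is whatever cudaMallocPitch() returns in *pitch and should not be computed by hand.

```cpp
#include <cassert>
#include <cstddef>

// Round a row width in bytes up to a multiple of `alignment`.
// Illustration only; a real pitch is reported by the allocator.
std::size_t paddedPitch(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment != 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// The address computation from the text for an element of type T:
// step `row` pitches in bytes from the base, then `column` elements.
template <typename T>
T* elementAt(void* baseAddress, std::size_t pitch, std::size_t row, std::size_t column) {
    return (T*)((char*)baseAddress + row * pitch) + column;
}
```

With rows of 100 floats and an assumed 256-byte alignment, the row width of 400 bytes pads to a pitch of 512, so the element at row 2, column 3 sits 2 * 512 + 3 * sizeof(float) bytes past the base. Indexing with the unpadded row width instead of the pitch would land in the wrong row.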

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc, cudaFree, cudaMallocArray, cudaFreeArray, cudaMallocHost (C API), cudaFreeHost, cudaMalloc3D, cudaMalloc3DArray, cudaHostAlloc

4.8.2.15 cudaError_t cudaMemcpy (void *dst, const void *src, size_t count, enum cudaMemcpyKind kind)

Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas may not overlap. Calling cudaMemcpy() with dst and src pointers that do not match the direction of the copy results in undefined behavior.

Parameters:
    dst - Destination memory address
    src - Source memory address
    count - Size in bytes to copy
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.16 cudaError_t cudaMemcpy2D (void *dst, size_t dpitch, const void *src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)

Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas may not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2D() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2D() returns an error if dpitch or spitch exceeds the maximum allowed.

Parameters:
    dst - Destination memory address

    dpitch - Pitch of destination memory
    src - Source memory address
    spitch - Pitch of source memory
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidPitchValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.17 cudaError_t cudaMemcpy2DArrayToArray (struct cudaArray ∗ dst, size_t wOffsetDst, size_t hOffsetDst, const struct cudaArray ∗ src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, enum cudaMemcpyKind kind = cudaMemcpyDeviceToDevice)

Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffsetSrc, hOffsetSrc) to the CUDA array dst starting at the upper left corner (wOffsetDst, hOffsetDst), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. wOffsetDst + width must not exceed the width of the CUDA array dst. wOffsetSrc + width must not exceed the width of the CUDA array src.

Parameters:
    dst - Destination memory address
    wOffsetDst - Destination starting X offset
    hOffsetDst - Destination starting Y offset
    src - Source memory address
    wOffsetSrc - Source starting X offset
    hOffsetSrc - Source starting Y offset
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.18 cudaError_t cudaMemcpy2DAsync (void ∗ dst, size_t dpitch, const void ∗ src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas must not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2DAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2DAsync() returns an error if dpitch or spitch is greater than the maximum allowed.

cudaMemcpy2DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address
    dpitch - Pitch of destination memory
    src - Source memory address
    spitch - Pitch of source memory
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidPitchValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync
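The roles of dpitch, spitch, width, and height in the 2D copy functions can be illustrated with a host-only sketch. This is a CPU model under stated assumptions, not the CUDA runtime: the helper name copy2d is hypothetical, and it returns false where the real functions would return an error code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-only model of a pitched 2D copy (not a CUDA API).
// Copies `height` rows of `width` bytes each. `dpitch` and `spitch` are the
// full row strides in bytes, including any padding at the end of each row.
bool copy2d(unsigned char* dst, std::size_t dpitch,
            const unsigned char* src, std::size_t spitch,
            std::size_t width, std::size_t height) {
    // width must not exceed either pitch.
    if (width > dpitch || width > spitch) {
        return false;
    }
    for (std::size_t row = 0; row < height; ++row) {
        // Row r starts r * pitch bytes into each buffer; only `width`
        // bytes are copied, so the padding bytes are left untouched.
        std::memcpy(dst + row * dpitch, src + row * spitch, width);
    }
    return true;
}
```

In this model, a source with a 4-byte pitch and a destination with an 8-byte pitch can exchange 3-byte rows: row 1 is read from byte offset 4 and written at byte offset 8.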

4.8.2.19 cudaError_t cudaMemcpy2DFromArray (void ∗ dst, size_t dpitch, const struct cudaArray ∗ src, size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind)

Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory in bytes of the 2D array pointed to by dst, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArray() returns an error if dpitch exceeds the maximum allowed.

Parameters:
    dst - Destination memory address
    dpitch - Pitch of destination memory
    src - Source memory address
    wOffset - Source starting X offset
    hOffset - Source starting Y offset
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.20 cudaError_t cudaMemcpy2DFromArrayAsync (void ∗ dst, size_t dpitch, const struct cudaArray ∗ src, size_t wOffset, size_t hOffset, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory in bytes of the 2D array pointed to by dst, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArrayAsync() returns an error if dpitch exceeds the maximum allowed.

cudaMemcpy2DFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
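The offset and pitch constraints stated for the copies out of a CUDA array can be sketched as a plain host-side predicate. Everything here is illustrative: ArrayShape and validFromArrayCopy are hypothetical names, the array width is assumed to be expressed in bytes, and the row check is an assumption added by analogy with the documented column check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a 2D array's size; not a CUDA type.
// widthBytes is assumed to be in the same units as the copy's width.
struct ArrayShape {
    std::size_t widthBytes;
    std::size_t height;
};

// Returns true when a copy of `height` rows of `width` bytes, starting at
// (wOffset, hOffset) in the source array, satisfies the stated constraints.
bool validFromArrayCopy(const ArrayShape& src,
                        std::size_t wOffset, std::size_t hOffset,
                        std::size_t width, std::size_t height,
                        std::size_t dpitch) {
    // width must not exceed dpitch.
    if (width > dpitch) return false;
    // wOffset + width must not exceed the array width. The comparison is
    // written as a subtraction so that it cannot overflow.
    if (wOffset > src.widthBytes || width > src.widthBytes - wOffset) return false;
    // Assumption (not stated in the text): the rows must fit as well.
    if (hOffset > src.height || height > src.height - hOffset) return false;
    return true;
}
```

For a 64-byte-wide array, a 48-byte-wide transfer is acceptable at wOffset 16 but not at wOffset 32, since 32 + 48 exceeds 64.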

Parameters:
    dst - Destination memory address
    dpitch - Pitch of destination memory
    src - Source memory address
    wOffset - Source starting X offset
    hOffset - Source starting Y offset
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.21 cudaError_t cudaMemcpy2DToArray (struct cudaArray ∗ dst, size_t wOffset, size_t hOffset, const void ∗ src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind)

Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset) where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory in bytes of the 2D array pointed to by src, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArray() returns an error if spitch exceeds the maximum allowed.

Parameters:
    dst - Destination memory address
    wOffset - Destination starting X offset
    hOffset - Destination starting Y offset
    src - Source memory address
    spitch - Pitch of source memory
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.22 cudaError_t cudaMemcpy2DToArrayAsync (struct cudaArray ∗ dst, size_t wOffset, size_t hOffset, const void ∗ src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset) where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory in bytes of the 2D array pointed to by src, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArrayAsync() returns an error if spitch exceeds the maximum allowed.

cudaMemcpy2DToArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address
    wOffset - Destination starting X offset
    hOffset - Destination starting Y offset
    src - Source memory address
    spitch - Pitch of source memory
    width - Width of matrix transfer (columns in bytes)
    height - Height of matrix transfer (rows)
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.23 cudaError_t cudaMemcpy3D (const struct cudaMemcpy3DParms ∗ p)

    struct cudaExtent {
      size_t width;
      size_t height;
      size_t depth;
    };
    struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);

    struct cudaPos {
      size_t x;
      size_t y;
      size_t z;
    };
    struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);

    struct cudaMemcpy3DParms {
      struct cudaArray *srcArray;
      struct cudaPos srcPos;
      struct cudaPitchedPtr srcPtr;
      struct cudaArray *dstArray;
      struct cudaPos dstPos;
      struct cudaPitchedPtr dstPtr;
      struct cudaExtent extent;
      enum cudaMemcpyKind kind;
    };

cudaMemcpy3D() copies data between two 3D objects. The source and destination objects may be in either host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:

    cudaMemcpy3DParms myParms = {0};

The struct passed to cudaMemcpy3D() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3D() to return an error.

The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object’s elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.

The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array’s elements. If no CUDA array is participating in the copy then the extents are defined in elements of unsigned char.

The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.

If the source and destination are both arrays, cudaMemcpy3D() will return an error if they do not have the same element size.

The source and destination objects must not overlap. If overlapping source and destination objects are specified, undefined behavior will result.

The source object must lie entirely within the region defined by srcPos and extent. The destination object must lie entirely within the region defined by dstPos and extent.

cudaMemcpy3D() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.

Parameters:
    p - 3D memory copy parameters

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc3D, cudaMalloc3DArray, cudaMemset3D, cudaMemcpy3DAsync, cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync, make_cudaExtent, make_cudaPos

4.8.2.24 cudaError_t cudaMemcpy3DAsync (const struct cudaMemcpy3DParms ∗ p, cudaStream_t stream = 0)

    struct cudaExtent {
      size_t width;
      size_t height;
      size_t depth;
    };
    struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);

    struct cudaPos {
      size_t x;
      size_t y;
      size_t z;
    };
    struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);

    struct cudaMemcpy3DParms {
      struct cudaArray *srcArray;
      struct cudaPos srcPos;
      struct cudaPitchedPtr srcPtr;
      struct cudaArray *dstArray;
      struct cudaPos dstPos;
      struct cudaPitchedPtr dstPtr;
      struct cudaExtent extent;
      enum cudaMemcpyKind kind;
    };

cudaMemcpy3DAsync() copies data between two 3D objects. The source and destination objects may be in either host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:

    cudaMemcpy3DParms myParms = {0};

The struct passed to cudaMemcpy3DAsync() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3DAsync() to return an error.

The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object’s elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.

The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array’s elements. If no CUDA array is participating in the copy then the extents are defined in elements of unsigned char.

The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.

If the source and destination are both arrays, cudaMemcpy3DAsync() will return an error if they do not have the same element size.

The source and destination objects must not overlap. If overlapping source and destination objects are specified, undefined behavior will result.

The source object must lie entirely within the region defined by srcPos and extent. The destination object must lie entirely within the region defined by dstPos and extent.

cudaMemcpy3DAsync() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.

cudaMemcpy3DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    p - 3D memory copy parameters
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMalloc3D, cudaMalloc3DArray, cudaMemset3D, cudaMemcpy3D, cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync, make_cudaExtent, make_cudaPos
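Two of the rules above, "exactly one source and one destination" and "positions and extents select a sub-volume", can be sketched with a host-only model. The types Extent3, Pos3, and Pitched3 and the helpers exactlyOne, byteOffset, and copy3d are hypothetical stand-ins, not CUDA types or calls. The model assumes no CUDA array takes part, so all units are bytes, and it assumes the usual pitched layout in which slice z, row y starts at (z * ysize + y) * pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side stand-ins (not CUDA types); all units are bytes.
struct Extent3 { std::size_t width, height, depth; };
struct Pos3 { std::size_t x, y, z; };
struct Pitched3 {
    unsigned char* ptr;  // base address
    std::size_t pitch;   // bytes per row, padding included
    std::size_t ysize;   // rows per 2D slice
};

// True when exactly one of the array field and the pitched-pointer field is
// set, mirroring "one of srcArray or srcPtr" (likewise for the destination).
bool exactlyOne(const void* arrayField, const void* pitchedField) {
    return (arrayField != nullptr) != (pitchedField != nullptr);
}

// Byte offset of position `pos` inside a pitched 3D allocation.
std::size_t byteOffset(const Pitched3& p, const Pos3& pos) {
    return (pos.z * p.ysize + pos.y) * p.pitch + pos.x;
}

// Copies an extent-sized sub-volume, one row at a time.
void copy3d(const Pitched3& dst, const Pos3& dstPos,
            const Pitched3& src, const Pos3& srcPos, const Extent3& e) {
    for (std::size_t z = 0; z < e.depth; ++z) {
        for (std::size_t y = 0; y < e.height; ++y) {
            Pos3 s{srcPos.x, srcPos.y + y, srcPos.z + z};
            Pos3 d{dstPos.x, dstPos.y + y, dstPos.z + z};
            std::memcpy(dst.ptr + byteOffset(dst, d),
                        src.ptr + byteOffset(src, s), e.width);
        }
    }
}
```

The model performs no bounds or overlap checks; it only shows how srcPos, dstPos, and extent translate into row copies.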

4.8.2.25 cudaError_t cudaMemcpyArrayToArray (struct cudaArray ∗ dst, size_t wOffsetDst, size_t hOffsetDst, const struct cudaArray ∗ src, size_t wOffsetSrc, size_t hOffsetSrc, size_t count, enum cudaMemcpyKind kind = cudaMemcpyDeviceToDevice)

Copies count bytes from the CUDA array src starting at the upper left corner (wOffsetSrc, hOffsetSrc) to the CUDA array dst starting at the upper left corner (wOffsetDst, hOffsetDst) where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

Parameters:
    dst - Destination memory address
    wOffsetDst - Destination starting X offset
    hOffsetDst - Destination starting Y offset
    src - Source memory address
    wOffsetSrc - Source starting X offset
    hOffsetSrc - Source starting Y offset
    count - Size in bytes to copy
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.26 cudaError_t cudaMemcpyAsync (void ∗ dst, const void ∗ src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas must not overlap. Calling cudaMemcpyAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior.

cudaMemcpyAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and the stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address

    src - Source memory address
    count - Size in bytes to copy
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.27 cudaError_t cudaMemcpyFromArray (void ∗ dst, const struct cudaArray ∗ src, size_t wOffset, size_t hOffset, size_t count, enum cudaMemcpyKind kind)

Copies count bytes from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

Parameters:
    dst - Destination memory address
    src - Source memory address
    wOffset - Source starting X offset
    hOffset - Source starting Y offset
    count - Size in bytes to copy
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.28 cudaError_t cudaMemcpyFromArrayAsync (void ∗ dst, const struct cudaArray ∗ src, size_t wOffset, size_t hOffset, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies count bytes from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

cudaMemcpyFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address
    src - Source memory address
    wOffset - Source starting X offset
    hOffset - Source starting Y offset
    count - Size in bytes to copy
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.29 cudaError_t cudaMemcpyFromSymbol (void ∗ dst, const char ∗ symbol, size_t count, size_t offset = 0, enum cudaMemcpyKind kind = cudaMemcpyDeviceToHost)

Copies count bytes from the memory area pointed to by offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas must not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.

Parameters:
    dst - Destination memory address
    symbol - Symbol source from device
    count - Size in bytes to copy

    offset - Offset from start of symbol in bytes
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.30 cudaError_t cudaMemcpyFromSymbolAsync (void ∗ dst, const char ∗ symbol, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies count bytes from the memory area pointed to by offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas must not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.

cudaMemcpyFromSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address
    symbol - Symbol source from device
    count - Size in bytes to copy
    offset - Offset from start of symbol in bytes
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync
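The meaning of count, offset, and kind in the symbol copies can be sketched with a host-only model. The names CopyKind and copyFromSymbol are hypothetical, the symbol is represented by an ordinary byte buffer of known size, and the range check is an assumption made for the sketch; the text above only says that invalid arguments produce an error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the copy direction; not the CUDA enum.
enum class CopyKind { HostToHost, HostToDevice, DeviceToHost, DeviceToDevice };

// Host-only model: symbolData/symbolSize stand in for a device variable.
// Copies `count` bytes starting `offset` bytes from the start of the symbol.
bool copyFromSymbol(void* dst, const unsigned char* symbolData,
                    std::size_t symbolSize, std::size_t count,
                    std::size_t offset, CopyKind kind) {
    // Only device-to-host and device-to-device are accepted when reading
    // from a symbol.
    if (kind != CopyKind::DeviceToHost && kind != CopyKind::DeviceToDevice) {
        return false;
    }
    // Assumption for this sketch: the requested range must stay inside the
    // symbol. Written as a subtraction to avoid overflow.
    if (offset > symbolSize || count > symbolSize - offset) {
        return false;
    }
    std::memcpy(dst, symbolData + offset, count);
    return true;
}
```

With a 5-byte symbol, a request for 2 bytes at offset 3 reads the last two bytes, while the same request at offset 4 is rejected by the model.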

4.8.2.31 cudaError_t cudaMemcpyToArray (struct cudaArray ∗ dst, size_t wOffset, size_t hOffset, const void ∗ src, size_t count, enum cudaMemcpyKind kind)

Copies count bytes from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

Parameters:
    dst - Destination memory address
    wOffset - Destination starting X offset
    hOffset - Destination starting Y offset
    src - Source memory address
    count - Size in bytes to copy
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.32 cudaError_t cudaMemcpyToArrayAsync (struct cudaArray ∗ dst, size_t wOffset, size_t hOffset, const void ∗ src, size_t count, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies count bytes from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.

cudaMemcpyToArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    dst - Destination memory address
    wOffset - Destination starting X offset
    hOffset - Destination starting Y offset
    src - Source memory address
    count - Size in bytes to copy

    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.33 cudaError_t cudaMemcpyToSymbol (const char ∗ symbol, const void ∗ src, size_t count, size_t offset = 0, enum cudaMemcpyKind kind = cudaMemcpyHostToDevice)

Copies count bytes from the memory area pointed to by src to the memory area pointed to by offset bytes from the start of symbol symbol. The memory areas may not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.

Parameters:
    symbol - Symbol destination on device
    src - Source memory address
    count - Size in bytes to copy
    offset - Offset from start of symbol in bytes
    kind - Type of transfer

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyToSymbolAsync, cudaMemcpyFromSymbolAsync

4.8.2.34 cudaError_t cudaMemcpyToSymbolAsync (const char ∗ symbol, const void ∗ src, size_t count, size_t offset, enum cudaMemcpyKind kind, cudaStream_t stream = 0)

Copies count bytes from the memory area pointed to by src to the memory area pointed to by offset bytes from the start of symbol symbol. The memory areas may not overlap. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.

cudaMemcpyToSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice and stream is non-zero, the copy may overlap with operations in other streams.

Parameters:
    symbol - Symbol destination on device
    src - Source memory address
    count - Size in bytes to copy
    offset - Offset from start of symbol in bytes
    kind - Type of transfer
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSymbol, cudaErrorInvalidDevicePointer, cudaErrorInvalidMemcpyDirection

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemcpy, cudaMemcpy2D, cudaMemcpyToArray, cudaMemcpy2DToArray, cudaMemcpyFromArray, cudaMemcpy2DFromArray, cudaMemcpyArrayToArray, cudaMemcpy2DArrayToArray, cudaMemcpyToSymbol, cudaMemcpyFromSymbol, cudaMemcpyAsync, cudaMemcpy2DAsync, cudaMemcpyToArrayAsync, cudaMemcpy2DToArrayAsync, cudaMemcpyFromArrayAsync, cudaMemcpy2DFromArrayAsync, cudaMemcpyFromSymbolAsync

4.8.2.35 cudaError_t cudaMemGetInfo (size_t ∗ free, size_t ∗ total)

Returns in ∗free and ∗total respectively, the free and total amount of memory available for allocation by the device in bytes.

Parameters:
    free - Returned free memory in bytes
    total - Returned total memory in bytes

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorLaunchFailure

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

4.8.2.36 cudaError_t cudaMemset (void ∗ devPtr, int value, size_t count)

Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.

Parameters:
    devPtr - Pointer to device memory
    value - Value to set for each byte of specified memory
    count - Size in bytes to set

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset2D, cudaMemset3D, cudaMemsetAsync, cudaMemset2DAsync, cudaMemset3DAsync

4.8.2.37 cudaError_t cudaMemset2D (void ∗ devPtr, size_t pitch, int value, size_t width, size_t height)

Sets to the specified value value a matrix (height rows of width bytes each) pointed to by devPtr. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().

Parameters:
    devPtr - Pointer to 2D device memory
    pitch - Pitch in bytes of 2D device memory
    value - Value to set for each byte of specified memory
    width - Width of matrix set (columns in bytes)
    height - Height of matrix set (rows)

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset, cudaMemset3D, cudaMemsetAsync, cudaMemset2DAsync, cudaMemset3DAsync
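The row addressing described for cudaMemset2D can be pictured with a small host-side model. This is a sketch under stated assumptions, not the runtime's implementation: memset2d_model is a hypothetical name, it operates on ordinary host memory rather than device memory, and it assumes the bytes between width and pitch in each row are left unwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-side model only (hypothetical helper, not part of the CUDA runtime).
// Writes `height` rows of `width` bytes each; consecutive rows start
// `pitch` bytes apart, so any per-row padding is skipped.
void memset2d_model(unsigned char* base, std::size_t pitch, int value,
                    std::size_t width, std::size_t height) {
    assert(width <= pitch);  // a row cannot be wider than its pitch
    for (std::size_t row = 0; row < height; ++row) {
        std::memset(base + row * pitch, value, width);
    }
}
```

With a pitch of 8 and a width of 5, for example, bytes 0 to 4 of each row are written and bytes 5 to 7 are treated as padding by this model.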

4.8.2.38 cudaError_t cudaMemset2DAsync (void ∗ devPtr, size_t pitch, int value, size_t width, size_t height, cudaStream_t stream = 0)

Sets to the specified value value a matrix (height rows of width bytes each) pointed to by devPtr. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().

cudaMemset2DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters:
    devPtr - Pointer to 2D device memory
    pitch - Pitch in bytes of 2D device memory
    value - Value to set for each byte of specified memory
    width - Width of matrix set (columns in bytes)
    height - Height of matrix set (rows)
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset, cudaMemset2D, cudaMemset3D, cudaMemsetAsync, cudaMemset3DAsync

4.8.2.39 cudaError_t cudaMemset3D (struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent)

Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.

The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.

Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.

This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().

Parameters:
    pitchedDevPtr - Pointer to pitched device memory
    value - Value to set for each byte of specified memory
    extent - Size parameters for where to set device memory (width field in bytes)

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset, cudaMemset2D, cudaMemsetAsync, cudaMemset2DAsync, cudaMemset3DAsync, cudaMalloc3D, make_cudaPitchedPtr, make_cudaExtent

4.8.2.40 cudaError_t cudaMemset3DAsync (struct cudaPitchedPtr pitchedDevPtr, int value, struct cudaExtent extent, cudaStream_t stream = 0)

Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.

The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.

Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.

This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().

cudaMemset3DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters:
    pitchedDevPtr - Pointer to pitched device memory
    value - Value to set for each byte of specified memory
    extent - Size parameters for where to set device memory (width field in bytes)
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset, cudaMemset2D, cudaMemset3D, cudaMemsetAsync, cudaMemset2DAsync, cudaMalloc3D, make_cudaPitchedPtr, make_cudaExtent
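The relationship between pitch, ysize, and the extent in the 3D memset entries can be illustrated with a host-side model. This is only a sketch: PitchedPtrModel, ExtentModel, and memset3d_model are hypothetical stand-ins that mirror the field names used in the text, the code runs on host memory, and it assumes that consecutive 2D slices are pitch × ysize bytes apart, which follows from the description of ysize as the height of each slice in rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins mirroring the fields named in the text; these are
// not the CUDA runtime's cudaPitchedPtr and cudaExtent types.
struct PitchedPtrModel {
    unsigned char* ptr;
    std::size_t pitch;  // bytes per row in memory, including padding
    std::size_t xsize;  // logical row width in bytes
    std::size_t ysize;  // rows per 2D slice
};

struct ExtentModel {
    std::size_t width;   // bytes per row to set
    std::size_t height;  // rows per slice to set
    std::size_t depth;   // number of slices to set
};

// Host-side model: fills extent.width bytes in each of extent.height rows
// of each of extent.depth slices.
void memset3d_model(PitchedPtrModel p, int value, ExtentModel e) {
    assert(e.width <= p.pitch);
    assert(e.height <= p.ysize);
    const std::size_t slicePitch = p.pitch * p.ysize;  // assumed slice stride
    for (std::size_t z = 0; z < e.depth; ++z) {
        for (std::size_t y = 0; y < e.height; ++y) {
            std::memset(p.ptr + z * slicePitch + y * p.pitch, value, e.width);
        }
    }
}
```

In this model, an extent whose height is smaller than ysize leaves the trailing rows of every slice untouched, which is the case the text describes as slower than setting full slices.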

4.8.2.41 cudaError_t cudaMemsetAsync (void ∗ devPtr, int value, size_t count, cudaStream_t stream = 0)

Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.

cudaMemsetAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.

Parameters:
    devPtr - Pointer to device memory
    value - Value to set for each byte of specified memory
    count - Size in bytes to set
    stream - Stream identifier

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaMemset, cudaMemset2D, cudaMemset3D, cudaMemset2DAsync, cudaMemset3DAsync

4.8.2.42 struct cudaExtent make_cudaExtent (size_t w, size_t h, size_t d) [read]

Returns a cudaExtent based on the specified input parameters w, h, and d.

Parameters:
    w - Width in bytes
    h - Height in elements
    d - Depth in elements

Returns:
    cudaExtent specified by w, h, and d

See also:
    make_cudaPitchedPtr, make_cudaPos

4.8.2.43 struct cudaPitchedPtr make_cudaPitchedPtr (void ∗ d, size_t p, size_t xsz, size_t ysz) [read]

Returns a cudaPitchedPtr based on the specified input parameters d, p, xsz, and ysz.

Parameters:
    d - Pointer to allocated memory
    p - Pitch of allocated memory in bytes

    xsz - Logical width of allocation in elements
    ysz - Logical height of allocation in elements

Returns:
    cudaPitchedPtr specified by d, p, xsz, and ysz

See also:
    make_cudaExtent, make_cudaPos

4.8.2.44 struct cudaPos make_cudaPos (size_t x, size_t y, size_t z) [read]

Returns a cudaPos based on the specified input parameters x, y, and z.

Parameters:
    x - X position
    y - Y position
    z - Z position

Returns:
    cudaPos specified by x, y, and z

See also:
    make_cudaExtent, make_cudaPitchedPtr

4.9 OpenGL Interoperability

Modules
• OpenGL Interoperability [DEPRECATED]

Enumerations
• enum cudaGLMapFlags { cudaGLMapFlagsNone = 0, cudaGLMapFlagsReadOnly = 1, cudaGLMapFlagsWriteDiscard = 2 }

Functions
• cudaError_t cudaGLSetGLDevice (int device)
    Sets the CUDA device for use with OpenGL interoperability.
• cudaError_t cudaGraphicsGLRegisterBuffer (struct cudaGraphicsResource ∗∗resource, GLuint buffer, unsigned int flags)
    Registers an OpenGL buffer object.
• cudaError_t cudaGraphicsGLRegisterImage (struct cudaGraphicsResource ∗∗resource, GLuint image, GLenum target, unsigned int flags)
    Register an OpenGL texture or renderbuffer object.
• cudaError_t cudaWGLGetDevice (int ∗device, HGPUNV hGpu)
    Gets the CUDA device associated with hGpu.

4.9.1 Detailed Description

This section describes the OpenGL interoperability functions of the CUDA runtime application programming interface.

4.9.2 Enumeration Type Documentation

4.9.2.1 enum cudaGLMapFlags

CUDA GL Map Flags

Enumerator:
    cudaGLMapFlagsNone - Default. Assume resource can be read/written
    cudaGLMapFlagsReadOnly - CUDA kernels will not write to this resource
    cudaGLMapFlagsWriteDiscard - CUDA kernels will only write to and will not read from this resource

4.9.3 Function Documentation

4.9.3.1 cudaError_t cudaGLSetGLDevice (int device)

Records device as the device on which the active host thread executes the device code. Records the thread as using OpenGL interoperability. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

Parameters:
    device - Device to use for OpenGL interoperability

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorSetOnActiveProcess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGLRegisterBufferObject, cudaGLMapBufferObject, cudaGLUnmapBufferObject, cudaGLUnregisterBufferObject, cudaGLMapBufferObjectAsync, cudaGLUnmapBufferObjectAsync

4.9.3.2 cudaError_t cudaGraphicsGLRegisterBuffer (struct cudaGraphicsResource ∗∗ resource, GLuint buffer, unsigned int flags)

Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as resource. The map flags flags specify the intended usage, as follows:

• cudaGraphicsMapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to this resource.
• cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

Parameters:
    resource - Pointer to the returned object handle
    buffer - name of buffer object to be registered
    flags - Map flags

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGLCtxCreate, cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsResourceGetMappedPointer

4.9.3.3 cudaError_t cudaGraphicsGLRegisterImage (struct cudaGraphicsResource ∗∗ resource, GLuint image, GLenum target, unsigned int flags)

Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object. A handle to the registered object is returned as resource. The map flags flags specify the intended usage, as follows:

• cudaGraphicsMapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to this resource.
• cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

The following image classes are currently disallowed:

• Textures with borders
• Multisampled renderbuffers

Parameters:
    resource - Pointer to the returned object handle
    image - name of texture or renderbuffer object to be registered
    target - Identifies the type of object specified by image, and must be one of GL_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_TEXTURE_CUBE_MAP, GL_TEXTURE_3D, GL_TEXTURE_2D_ARRAY, or GL_RENDERBUFFER.
    flags - Map flags

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGLSetGLDevice, cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray

4.9.3.4 cudaError_t cudaWGLGetDevice (int ∗ device, HGPUNV hGpu)

Returns the CUDA device associated with a hGpu, if applicable.

Parameters:
    device - Returns the device associated with hGpu, or -1 if hGpu is not a compute device.
    hGpu - Handle to a GPU, as queried via WGL_NV_gpu_affinity()

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    WGL_NV_gpu_affinity, cudaGLSetGLDevice

4.10 Direct3D 9 Interoperability

Modules
• Direct3D 9 Interoperability [DEPRECATED]

Enumerations
• enum cudaD3D9DeviceList { cudaD3D9DeviceListAll = 1, cudaD3D9DeviceListCurrentFrame = 2, cudaD3D9DeviceListNextFrame = 3 }
• enum cudaD3D9MapFlags { cudaD3D9MapFlagsNone = 0, cudaD3D9MapFlagsReadOnly = 1, cudaD3D9MapFlagsWriteDiscard = 2 }
• enum cudaD3D9RegisterFlags { cudaD3D9RegisterFlagsNone = 0, cudaD3D9RegisterFlagsArray = 1 }

Functions
• cudaError_t cudaD3D9GetDevice (int ∗device, const char ∗pszAdapterName)
    Gets the device number for an adapter.
• cudaError_t cudaD3D9GetDevices (unsigned int ∗pCudaDeviceCount, int ∗pCudaDevices, unsigned int cudaDeviceCount, IDirect3DDevice9 ∗pD3D9Device, enum cudaD3D9DeviceList deviceList)
    Gets the CUDA devices corresponding to a Direct3D 9 device.
• cudaError_t cudaD3D9GetDirect3DDevice (IDirect3DDevice9 ∗∗ppD3D9Device)
    Gets the Direct3D device against which the current CUDA context was created.
• cudaError_t cudaD3D9SetDirect3DDevice (IDirect3DDevice9 ∗pD3D9Device, int device=-1)
    Sets the Direct3D device to use for interoperability in this thread.
• cudaError_t cudaGraphicsD3D9RegisterResource (struct cudaGraphicsResource ∗∗resource, IDirect3DResource9 ∗pD3DResource, unsigned int flags)
    Register a Direct3D 9 resource for access by CUDA.

4.10.1 Detailed Description

This section describes the Direct3D 9 interoperability functions of the CUDA runtime application programming interface.

4.10.2 Enumeration Type Documentation

4.10.2.1 enum cudaD3D9DeviceList

CUDA devices corresponding to a D3D9 device

Enumerator:
    cudaD3D9DeviceListAll - The CUDA devices for all GPUs used by a D3D9 device
    cudaD3D9DeviceListCurrentFrame - The CUDA devices for the GPUs used by a D3D9 device in its currently rendering frame
    cudaD3D9DeviceListNextFrame - The CUDA devices for the GPUs to be used by a D3D9 device in the next frame

4.10.2.2 enum cudaD3D9MapFlags

CUDA D3D9 Map Flags

Enumerator:
    cudaD3D9MapFlagsNone - Default. Assume resource can be read/written
    cudaD3D9MapFlagsReadOnly - CUDA kernels will not write to this resource
    cudaD3D9MapFlagsWriteDiscard - CUDA kernels will only write to and will not read from this resource

4.10.2.3 enum cudaD3D9RegisterFlags

CUDA D3D9 Register Flags

Enumerator:
    cudaD3D9RegisterFlagsNone - Default. Resource can be accessed through a void∗
    cudaD3D9RegisterFlagsArray - Resource can be accessed through a CUarray∗

4.10.3 Function Documentation

4.10.3.1 cudaError_t cudaD3D9GetDevice (int ∗ device, const char ∗ pszAdapterName)

Returns in ∗device the CUDA-compatible device corresponding to the adapter name pszAdapterName obtained from EnumDisplayDevices or IDirect3D9::GetAdapterIdentifier(). If no device on the adapter with name pszAdapterName is CUDA-compatible then the call will fail.

Parameters:
    device - Returns the device corresponding to pszAdapterName
    pszAdapterName - D3D9 adapter to get device for

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D9SetDirect3DDevice, cudaGraphicsD3D9RegisterResource

4.10.3.2 cudaError_t cudaD3D9GetDevices (unsigned int ∗ pCudaDeviceCount, int ∗ pCudaDevices, unsigned int cudaDeviceCount, IDirect3DDevice9 ∗ pD3D9Device, enum cudaD3D9DeviceList deviceList)

Returns in ∗pCudaDeviceCount the number of CUDA-compatible devices corresponding to the Direct3D 9 device pD3D9Device. Also returns in ∗pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the Direct3D 9 device pD3D9Device.

If any of the GPUs being used to render pD3D9Device are not CUDA capable then the call will return cudaErrorNoDevice.

Parameters:
    pCudaDeviceCount - Returned number of CUDA devices corresponding to pD3D9Device
    pCudaDevices - Returned CUDA devices corresponding to pD3D9Device
    cudaDeviceCount - The size of the output device array pCudaDevices
    pD3D9Device - Direct3D 9 device to query for CUDA devices
    deviceList - The set of devices to return. This set may be cudaD3D9DeviceListAll for all devices, cudaD3D9DeviceListCurrentFrame for the devices used to render the current frame (in SLI), or cudaD3D9DeviceListNextFrame for the devices used to render the next frame (in SLI).

Returns:
    cudaSuccess, cudaErrorNoDevice, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer

4.10.3.3 cudaError_t cudaD3D9GetDirect3DDevice (IDirect3DDevice9 ∗∗ ppD3D9Device)

Returns in ∗ppD3D9Device the Direct3D device against which this CUDA context was created in cudaD3D9SetDirect3DDevice().

Parameters:
    ppD3D9Device - Returns the Direct3D device for this thread

Returns:
    cudaSuccess, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D9SetDirect3DDevice

4.10.3.4 cudaError_t cudaD3D9SetDirect3DDevice (IDirect3DDevice9 ∗ pD3D9Device, int device = -1)

Records pD3D9Device as the Direct3D device to use for Direct3D interoperability on this host thread. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

Successful context creation on pD3D9Device will increase the internal reference count on pD3D9Device. This reference count will be decremented upon destruction of this context through cudaThreadExit().

Parameters:
    pD3D9Device - Direct3D device to use for this thread
    device - The CUDA device to use. This device must be among the devices returned when querying cudaD3D9DeviceListAll from cudaD3D9GetDevices; it may be set to -1 to automatically select an appropriate CUDA device.

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorSetOnActiveProcess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D9GetDevice, cudaGraphicsD3D9RegisterResource

4.10.3.5 cudaError_t cudaGraphicsD3D9RegisterResource (struct cudaGraphicsResource ∗∗ resource, IDirect3DResource9 ∗ pD3DResource, unsigned int flags)

Registers the Direct3D 9 resource pD3DResource for access by CUDA.

If this call is successful then the application will be able to map and unmap this resource until it is unregistered through cudaGraphicsUnregisterResource(). Also on success, this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cudaGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.

• IDirect3DVertexBuffer9: may be accessed through a device pointer
• IDirect3DIndexBuffer9: may be accessed through a device pointer
• IDirect3DSurface9: may be accessed through an array. Only stand-alone objects of type IDirect3DSurface9 may be explicitly shared. In particular, individual mipmap levels and faces of cube maps may not be registered directly. To access individual surfaces associated with a texture, one must register the base texture object.
• IDirect3DBaseTexture9: individual surfaces on this texture may be accessed through an array.

The flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is

• cudaGraphicsRegisterFlagsNone

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.

• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized using cudaD3D9SetDirect3DDevice then cudaErrorInvalidDevice is returned. If pD3DResource is of incorrect type or is already registered, then cudaErrorInvalidResourceHandle is returned. If pD3DResource cannot be registered, then cudaErrorUnknown is returned.

Parameters:
    resource - Pointer to returned resource handle
    pD3DResource - Direct3D resource to register
    flags - Parameters for resource registration

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D9SetDirect3DDevice, cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer
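The texture format limitation listed for cudaGraphicsD3D9RegisterResource can be expressed as a simple predicate. This is a sketch only: format_layout_is_shareable is a hypothetical helper, it checks just the channel-count and bit-width rule, and it does not inspect real Direct3D format enumerants or the other limitations (depth or stencil formats, shared allocations, the primary rendertarget).

```cpp
#include <cassert>

// Hypothetical helper, not part of the CUDA runtime or Direct3D.
// Returns true when a texture layout has 1, 2, or 4 channels of
// 8, 16, or 32 bits each, which is the layout rule stated in the text.
bool format_layout_is_shareable(int channels, int bitsPerChannel) {
    const bool channelsOk = (channels == 1 || channels == 2 || channels == 4);
    const bool bitsOk = (bitsPerChannel == 8 || bitsPerChannel == 16 ||
                         bitsPerChannel == 32);
    return channelsOk && bitsOk;
}
```

Under this rule a four-channel 8-bit layout passes, while a three-channel layout or a 10-bit channel width does not.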

4.11 Direct3D 10 Interoperability

Modules
• Direct3D 10 Interoperability [DEPRECATED]

Enumerations
• enum cudaD3D10DeviceList { cudaD3D10DeviceListAll = 1, cudaD3D10DeviceListCurrentFrame = 2, cudaD3D10DeviceListNextFrame = 3 }
• enum cudaD3D10MapFlags { cudaD3D10MapFlagsNone = 0, cudaD3D10MapFlagsReadOnly = 1, cudaD3D10MapFlagsWriteDiscard = 2 }
• enum cudaD3D10RegisterFlags { cudaD3D10RegisterFlagsNone = 0, cudaD3D10RegisterFlagsArray = 1 }

Functions
• cudaError_t cudaD3D10GetDevice (int ∗device, IDXGIAdapter ∗pAdapter)
    Gets the device number for an adapter.
• cudaError_t cudaD3D10GetDevices (unsigned int ∗pCudaDeviceCount, int ∗pCudaDevices, unsigned int cudaDeviceCount, ID3D10Device ∗pD3D10Device, enum cudaD3D10DeviceList deviceList)
    Gets the CUDA devices corresponding to a Direct3D 10 device.
• cudaError_t cudaD3D10GetDirect3DDevice (ID3D10Device ∗∗ppD3D10Device)
    Gets the Direct3D device against which the current CUDA context was created.
• cudaError_t cudaD3D10SetDirect3DDevice (ID3D10Device ∗pD3D10Device, int device=-1)
    Sets the Direct3D 10 device to use for interoperability in this thread.
• cudaError_t cudaGraphicsD3D10RegisterResource (struct cudaGraphicsResource ∗∗resource, ID3D10Resource ∗pD3DResource, unsigned int flags)
    Register a Direct3D 10 resource for access by CUDA.

4.11.1 Detailed Description

This section describes the Direct3D 10 interoperability functions of the CUDA runtime application programming interface.

4.11.2 Enumeration Type Documentation

4.11.2.1 enum cudaD3D10DeviceList

CUDA devices corresponding to a D3D10 device

Enumerator:
    cudaD3D10DeviceListAll - The CUDA devices for all GPUs used by a D3D10 device
    cudaD3D10DeviceListCurrentFrame - The CUDA devices for the GPUs used by a D3D10 device in its currently rendering frame
    cudaD3D10DeviceListNextFrame - The CUDA devices for the GPUs to be used by a D3D10 device in the next frame

4.11.2.2 enum cudaD3D10MapFlags

CUDA D3D10 Map Flags

Enumerator:
    cudaD3D10MapFlagsNone - Default. Assume resource can be read/written
    cudaD3D10MapFlagsReadOnly - CUDA kernels will not write to this resource
    cudaD3D10MapFlagsWriteDiscard - CUDA kernels will only write to and will not read from this resource

4.11.2.3 enum cudaD3D10RegisterFlags

CUDA D3D10 Register Flags

Enumerator:
    cudaD3D10RegisterFlagsNone - Default. Resource can be accessed through a void∗
    cudaD3D10RegisterFlagsArray - Resource can be accessed through a CUarray∗

4.11.3 Function Documentation

4.11.3.1 cudaError_t cudaD3D10GetDevice (int ∗ device, IDXGIAdapter ∗ pAdapter)

Returns in ∗device the CUDA-compatible device corresponding to the adapter pAdapter obtained from IDXGIFactory::EnumAdapters. This call will succeed only if a device on adapter pAdapter is CUDA-compatible.

Parameters:
    device - Returns the device corresponding to pAdapter
    pAdapter - D3D10 adapter to get device for

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D10SetDirect3DDevice, cudaGraphicsD3D10RegisterResource

4.11.3.2 cudaError_t cudaD3D10GetDevices (unsigned int ∗ pCudaDeviceCount, int ∗ pCudaDevices, unsigned int cudaDeviceCount, ID3D10Device ∗ pD3D10Device, enum cudaD3D10DeviceList deviceList)

Returns in ∗pCudaDeviceCount the number of CUDA-compatible devices corresponding to the Direct3D 10 device pD3D10Device. Also returns in ∗pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the Direct3D 10 device pD3D10Device.

If any of the GPUs being used to render pD3D10Device are not CUDA capable then the call will return cudaErrorNoDevice.

Parameters:
    pCudaDeviceCount - Returned number of CUDA devices corresponding to pD3D10Device
    pCudaDevices - Returned CUDA devices corresponding to pD3D10Device
    cudaDeviceCount - The size of the output device array pCudaDevices
    pD3D10Device - Direct3D 10 device to query for CUDA devices
    deviceList - The set of devices to return. This set may be cudaD3D10DeviceListAll for all devices, cudaD3D10DeviceListCurrentFrame for the devices used to render the current frame (in SLI), or cudaD3D10DeviceListNextFrame for the devices used to render the next frame (in SLI).

Returns:
    cudaSuccess, cudaErrorNoDevice, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer

4.11.3.3 cudaError_t cudaD3D10GetDirect3DDevice (ID3D10Device ∗∗ ppD3D10Device)

Returns in ∗ppD3D10Device the Direct3D device against which this CUDA context was created in cudaD3D10SetDirect3DDevice().

Parameters:
    ppD3D10Device - Returns the Direct3D device for this thread

Returns:
    cudaSuccess, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaD3D10SetDirect3DDevice

4.11.3.4 cudaError_t cudaD3D10SetDirect3DDevice (ID3D10Device ∗pD3D10Device, int device = -1)

Records pD3D10Device as the Direct3D 10 device to use for Direct3D 10 interoperability on this host thread. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

Successful context creation on pD3D10Device will increase the internal reference count on pD3D10Device. This reference count will be decremented upon destruction of this context through cudaThreadExit().

Parameters:
pD3D10Device - Direct3D device to use for interoperability
device - The CUDA device to use. This device must be among the devices returned when querying cudaD3D10DeviceListAll from cudaD3D10GetDevices; it may be set to -1 to automatically select an appropriate CUDA device.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorSetOnActiveProcess

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaD3D10GetDevice, cudaGraphicsD3D10RegisterResource

4.11.3.5 cudaError_t cudaGraphicsD3D10RegisterResource (struct cudaGraphicsResource ∗∗resource, ID3D10Resource ∗pD3DResource, unsigned int flags)

Registers the Direct3D 10 resource pD3DResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cudaGraphicsUnregisterResource(). Also on success, this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cudaGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.
• ID3D10Buffer: may be accessed via a device pointer
• ID3D10Texture1D: individual subresources of the texture may be accessed via arrays
• ID3D10Texture2D: individual subresources of the texture may be accessed via arrays
• ID3D10Texture3D: individual subresources of the texture may be accessed via arrays

The flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is
• cudaGraphicsRegisterFlagsNone

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized using cudaD3D10SetDirect3DDevice then cudaErrorInvalidDevice is returned. If pD3DResource is of incorrect type or is already registered, then cudaErrorInvalidResourceHandle is returned. If pD3DResource cannot be registered, then cudaErrorUnknown is returned.

Parameters:
resource - Pointer to returned resource handle
pD3DResource - Direct3D resource to register
flags - Parameters for resource registration

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaD3D10SetDirect3DDevice, cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer
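The texture-format limitation listed for cudaGraphicsD3D10RegisterResource() (1, 2, or 4 channels of 8, 16, or 32 bits each) can be checked by an application before it attempts registration. The following is a minimal sketch in plain C under stated assumptions: format_may_be_shared and its two parameters are hypothetical names chosen for illustration and are not part of the CUDA runtime or Direct3D, and the sketch covers only the channel-count and bit-width rule, not the other limitations (primary rendertarget, shared resources, depth or stencil formats).

```c
#include <stdbool.h>

/* Hypothetical helper (not a CUDA runtime function).
 * Returns true when a texture format has 1, 2, or 4 channels and each
 * channel is 8, 16, or 32 bits wide, which is the format rule quoted in
 * the limitations list. It does not check the other limitations. */
bool format_may_be_shared(int channels, int bits_per_channel)
{
    bool channels_ok = (channels == 1 || channels == 2 || channels == 4);
    bool bits_ok = (bits_per_channel == 8 ||
                    bits_per_channel == 16 ||
                    bits_per_channel == 32);
    return channels_ok && bits_ok;
}
```

An application might call such a helper with the channel layout it derives from the resource's format and skip registration when it returns false; how that layout is derived is left to the application.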

4.12 Direct3D 11 Interoperability

Enumerations
• enum cudaD3D11DeviceList { cudaD3D11DeviceListAll = 1, cudaD3D11DeviceListCurrentFrame = 2, cudaD3D11DeviceListNextFrame = 3 }

Functions
• cudaError_t cudaD3D11GetDevice (int ∗device, IDXGIAdapter ∗pAdapter)
Gets the device number for an adapter.
• cudaError_t cudaD3D11GetDevices (unsigned int ∗pCudaDeviceCount, int ∗pCudaDevices, unsigned int cudaDeviceCount, ID3D11Device ∗pD3D11Device, enum cudaD3D11DeviceList deviceList)
Gets the CUDA devices corresponding to a Direct3D 11 device.
• cudaError_t cudaD3D11GetDirect3DDevice (ID3D11Device ∗∗ppD3D11Device)
Gets the Direct3D device against which the current CUDA context was created.
• cudaError_t cudaD3D11SetDirect3DDevice (ID3D11Device ∗pD3D11Device, int device=-1)
Sets the Direct3D 11 device to use for interoperability in this thread.
• cudaError_t cudaGraphicsD3D11RegisterResource (struct cudaGraphicsResource ∗∗resource, ID3D11Resource ∗pD3DResource, unsigned int flags)
Register a Direct3D 11 resource for access by CUDA.

4.12.1 Detailed Description

This section describes the Direct3D 11 interoperability functions of the CUDA runtime application programming interface.

4.12.2 Enumeration Type Documentation

4.12.2.1 enum cudaD3D11DeviceList

CUDA devices corresponding to a D3D11 device

Enumerator:
cudaD3D11DeviceListAll The CUDA devices for all GPUs used by a D3D11 device
cudaD3D11DeviceListCurrentFrame The CUDA devices for the GPUs used by a D3D11 device in its currently rendering frame
cudaD3D11DeviceListNextFrame The CUDA devices for the GPUs to be used by a D3D11 device in the next frame

4.12.3 Function Documentation

4.12.3.1 cudaError_t cudaD3D11GetDevice (int ∗device, IDXGIAdapter ∗pAdapter)

Returns in ∗device the CUDA-compatible device corresponding to the adapter pAdapter obtained from IDXGIFactory::EnumAdapters. This call will succeed only if a device on adapter pAdapter is CUDA-compatible.

Parameters:
device - Returns the device corresponding to pAdapter
pAdapter - D3D11 adapter to get device for

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer

4.12.3.2 cudaError_t cudaD3D11GetDevices (unsigned int ∗pCudaDeviceCount, int ∗pCudaDevices, unsigned int cudaDeviceCount, ID3D11Device ∗pD3D11Device, enum cudaD3D11DeviceList deviceList)

Returns in ∗pCudaDeviceCount the number of CUDA-compatible devices corresponding to the Direct3D 11 device pD3D11Device. Also returns in ∗pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the Direct3D 11 device pD3D11Device.

If any of the GPUs being used to render pD3D11Device are not CUDA capable then the call will return cudaErrorNoDevice.

Parameters:
pCudaDeviceCount - Returned number of CUDA devices corresponding to pD3D11Device
pCudaDevices - Returned CUDA devices corresponding to pD3D11Device
cudaDeviceCount - The size of the output device array pCudaDevices
pD3D11Device - Direct3D 11 device to query for CUDA devices
deviceList - The set of devices to return. This set may be cudaD3D11DeviceListAll for all devices, cudaD3D11DeviceListCurrentFrame for the devices used to render the current frame (in SLI), or cudaD3D11DeviceListNextFrame for the devices used to render the next frame (in SLI).

Returns:
cudaSuccess, cudaErrorNoDevice, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer

4.12.3.3 cudaError_t cudaD3D11GetDirect3DDevice (ID3D11Device ∗∗ppD3D11Device)

Returns in ∗ppD3D11Device the Direct3D device against which this CUDA context was created in cudaD3D11SetDirect3DDevice().

Parameters:
ppD3D11Device - Returns the Direct3D device for this thread

Returns:
cudaSuccess, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaD3D11SetDirect3DDevice

4.12.3.4 cudaError_t cudaD3D11SetDirect3DDevice (ID3D11Device ∗pD3D11Device, int device = -1)

Records pD3D11Device as the Direct3D 11 device to use for Direct3D 11 interoperability on this host thread. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

Successful context creation on pD3D11Device will increase the internal reference count on pD3D11Device. This reference count will be decremented upon destruction of this context through cudaThreadExit().

Parameters:
pD3D11Device - Direct3D device to use for interoperability
device - The CUDA device to use. This device must be among the devices returned when querying cudaD3D11DeviceListAll from cudaD3D11GetDevices; it may be set to -1 to automatically select an appropriate CUDA device.

Returns:
cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorSetOnActiveProcess

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaD3D11GetDevice, cudaGraphicsD3D11RegisterResource

4.12.3.5 cudaError_t cudaGraphicsD3D11RegisterResource (struct cudaGraphicsResource ∗∗resource, ID3D11Resource ∗pD3DResource, unsigned int flags)

Registers the Direct3D 11 resource pD3DResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cudaGraphicsUnregisterResource(). Also on success, this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cudaGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.
• ID3D11Buffer: may be accessed via a device pointer
• ID3D11Texture1D: individual subresources of the texture may be accessed via arrays
• ID3D11Texture2D: individual subresources of the texture may be accessed via arrays
• ID3D11Texture3D: individual subresources of the texture may be accessed via arrays

The flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is
• cudaGraphicsRegisterFlagsNone

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized using cudaD3D11SetDirect3DDevice then cudaErrorInvalidDevice is returned. If pD3DResource is of incorrect type or is already registered, then cudaErrorInvalidResourceHandle is returned. If pD3DResource cannot be registered, then cudaErrorUnknown is returned.

Parameters:
resource - Pointer to returned resource handle
pD3DResource - Direct3D resource to register
flags - Parameters for resource registration

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaD3D11SetDirect3DDevice, cudaGraphicsUnregisterResource, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsResourceGetMappedPointer

4.13 VDPAU Interoperability

Functions
• cudaError_t cudaGraphicsVDPAURegisterOutputSurface (struct cudaGraphicsResource ∗∗resource, VdpOutputSurface vdpSurface, unsigned int flags)
Register a VdpOutputSurface object.
• cudaError_t cudaGraphicsVDPAURegisterVideoSurface (struct cudaGraphicsResource ∗∗resource, VdpVideoSurface vdpSurface, unsigned int flags)
Register a VdpVideoSurface object.
• cudaError_t cudaVDPAUGetDevice (int ∗device, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)
Gets the CUDA device associated with a VdpDevice.
• cudaError_t cudaVDPAUSetVDPAUDevice (int device, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)
Sets the CUDA device for use with VDPAU interoperability.

4.13.1 Detailed Description

This section describes the VDPAU interoperability functions of the CUDA runtime application programming interface.

4.13.2 Function Documentation

4.13.2.1 cudaError_t cudaGraphicsVDPAURegisterOutputSurface (struct cudaGraphicsResource ∗∗resource, VdpOutputSurface vdpSurface, unsigned int flags)

Registers the VdpOutputSurface specified by vdpSurface for access by CUDA. A handle to the registered object is returned as resource. The surface's intended usage is specified using flags, as follows:
• cudaGraphicsMapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to this resource.
• cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

Parameters:
resource - Pointer to the returned object handle
vdpSurface - VDPAU object to be registered
flags - Map flags

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaVDPAUSetVDPAUDevice, cudaGraphicsUnregisterResource, cudaGraphicsSubResourceGetMappedArray

4.13.2.2 cudaError_t cudaGraphicsVDPAURegisterVideoSurface (struct cudaGraphicsResource ∗∗resource, VdpVideoSurface vdpSurface, unsigned int flags)

Registers the VdpVideoSurface specified by vdpSurface for access by CUDA. A handle to the registered object is returned as resource. The surface's intended usage is specified using flags, as follows:
• cudaGraphicsMapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to this resource.
• cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

Parameters:
resource - Pointer to the returned object handle
vdpSurface - VDPAU object to be registered
flags - Map flags

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaVDPAUSetVDPAUDevice, cudaGraphicsUnregisterResource, cudaGraphicsSubResourceGetMappedArray

4.13.2.3 cudaError_t cudaVDPAUGetDevice (int ∗device, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)

Returns the CUDA device associated with a VdpDevice, if applicable.

Parameters:
device - Returns the device associated with vdpDevice, or -1 if the device associated with vdpDevice is not a compute device.
vdpDevice - A VdpDevice handle
vdpGetProcAddress - VDPAU's VdpGetProcAddress function pointer

Returns:
cudaSuccess

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaVDPAUSetVDPAUDevice

4.13.2.4 cudaError_t cudaVDPAUSetVDPAUDevice (int device, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)

Records device as the device on which the active host thread executes the device code. Records the thread as using VDPAU interoperability. If the host thread has already initialized the CUDA runtime by calling non-device management runtime functions or if there exists a CUDA driver context active on the host thread, then this call returns cudaErrorSetOnActiveProcess.

Parameters:
device - Device to use for VDPAU interoperability
vdpDevice - The VdpDevice to interoperate with
vdpGetProcAddress - VDPAU's VdpGetProcAddress function pointer

Returns:
cudaSuccess, cudaErrorInvalidDevice, cudaErrorSetOnActiveProcess

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsVDPAURegisterVideoSurface, cudaGraphicsVDPAURegisterOutputSurface

4.14 Graphics Interoperability

Functions
• cudaError_t cudaGraphicsMapResources (int count, cudaGraphicsResource_t ∗resources, cudaStream_t stream=0)
Map graphics resources for access by CUDA.
• cudaError_t cudaGraphicsResourceGetMappedPointer (void ∗∗devPtr, size_t ∗size, cudaGraphicsResource_t resource)
Get a device pointer through which to access a mapped graphics resource.
• cudaError_t cudaGraphicsResourceSetMapFlags (cudaGraphicsResource_t resource, unsigned int flags)
Set usage flags for mapping a graphics resource.
• cudaError_t cudaGraphicsSubResourceGetMappedArray (struct cudaArray ∗∗array, cudaGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel)
Get an array through which to access a subresource of a mapped graphics resource.
• cudaError_t cudaGraphicsUnmapResources (int count, cudaGraphicsResource_t ∗resources, cudaStream_t stream=0)
Unmap graphics resources.
• cudaError_t cudaGraphicsUnregisterResource (cudaGraphicsResource_t resource)
Unregisters a graphics resource for access by CUDA.

4.14.1 Detailed Description

This section describes the graphics interoperability functions of the CUDA runtime application programming interface.

4.14.2 Function Documentation

4.14.2.1 cudaError_t cudaGraphicsMapResources (int count, cudaGraphicsResource_t ∗resources, cudaStream_t stream = 0)

Maps the count graphics resources in resources for access by CUDA.

The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any graphics calls issued before cudaGraphicsMapResources() will complete before any subsequent CUDA work issued in stream begins.

If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
count - Number of resources to map
resources - Resources to map for CUDA
stream - Stream for synchronization

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsResourceGetMappedPointer, cudaGraphicsSubResourceGetMappedArray, cudaGraphicsUnmapResources

4.14.2.2 cudaError_t cudaGraphicsResourceGetMappedPointer (void ∗∗devPtr, size_t ∗size, cudaGraphicsResource_t resource)

Returns in ∗devPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in ∗size the size of the memory in bytes which may be accessed from that pointer. The value set in devPtr may change every time that resource is mapped.

If resource is not a buffer then it cannot be accessed via a pointer and cudaErrorUnknown is returned. If resource is not mapped then cudaErrorUnknown is returned.

Parameters:
devPtr - Returned pointer through which resource may be accessed
size - Returned size of the buffer accessible starting at ∗devPtr
resource - Mapped resource to access

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray

4.14.2.3 cudaError_t cudaGraphicsResourceSetMapFlags (cudaGraphicsResource_t resource, unsigned int flags)

Set flags for mapping the graphics resource resource.

Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:
• cudaGraphicsMapFlagsNone: Specifies no hints about how resource will be used. It is therefore assumed that CUDA may read from or write to resource.
• cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to resource.
• cudaGraphicsMapFlagsWriteDiscard: Specifies CUDA will not read from resource and will write over the entire contents of resource, so none of the data previously stored in resource will be preserved.

If resource is presently mapped for access by CUDA then cudaErrorUnknown is returned. If flags is not one of the above values then cudaErrorInvalidValue is returned.

Parameters:
resource - Registered resource to set flags for
flags - Parameters for resource mapping

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsMapResources

4.14.2.4 cudaError_t cudaGraphicsSubResourceGetMappedArray (struct cudaArray ∗∗array, cudaGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel)

Returns in ∗array an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in array may change every time that resource is mapped.

If resource is not a texture then it cannot be accessed via an array and cudaErrorUnknown is returned. If arrayIndex is not a valid array index for resource then cudaErrorInvalidValue is returned. If mipLevel is not a valid mipmap level for resource then cudaErrorInvalidValue is returned. If resource is not mapped then cudaErrorUnknown is returned.

Parameters:
array - Returned array through which a subresource of resource may be accessed
resource - Mapped resource to access
arrayIndex - Array index for array textures or cubemap face index as defined by cudaGraphicsCubeFace for cubemap textures for the subresource to access
mipLevel - Mipmap level for the subresource to access

Returns:
cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsResourceGetMappedPointer

4.14.2.5 cudaError_t cudaGraphicsUnmapResources (int count, cudaGraphicsResource_t ∗resources, cudaStream_t stream = 0)

Unmaps the count graphics resources in resources.

Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.

This function provides the synchronization guarantee that any CUDA work issued in stream before cudaGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.

If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are not presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
count - Number of resources to unmap
resources - Resources to unmap
stream - Stream for synchronization

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsMapResources

4.14.2.6 cudaError_t cudaGraphicsUnregisterResource (cudaGraphicsResource_t resource)

Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.

If resource is invalid then cudaErrorInvalidResourceHandle is returned.

Parameters:
resource - Resource to unregister

Returns:
cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cudaGraphicsD3D9RegisterResource, cudaGraphicsD3D10RegisterResource, cudaGraphicsD3D11RegisterResource, cudaGraphicsGLRegisterBuffer, cudaGraphicsGLRegisterImage
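Both cudaGraphicsMapResources() and cudaGraphicsUnmapResources() return cudaErrorInvalidResourceHandle when the resources array contains duplicate entries. An application that assembles this array dynamically may want to detect duplicates itself before making the call. The following is a minimal sketch in plain C under stated assumptions: has_duplicate_handles is a hypothetical helper, not part of the CUDA runtime, and it treats each handle as an opaque pointer (const void ∗ stands in for cudaGraphicsResource_t so that the sketch compiles without the CUDA headers).

```c
#include <stdbool.h>

/* Hypothetical pre-flight check (not a CUDA runtime function).
 * Returns true if any two entries in handles[0..count-1] are the same
 * pointer. The quadratic scan is assumed to be acceptable because the
 * number of resources mapped in one call is typically small. */
bool has_duplicate_handles(const void *const *handles, int count)
{
    for (int i = 0; i < count; ++i) {
        for (int j = i + 1; j < count; ++j) {
            if (handles[i] == handles[j]) {
                return true;
            }
        }
    }
    return false;
}
```

If the helper reports a duplicate, the application can remove the repeated entry before mapping rather than handling the error code afterwards.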

const void ∗ devPtr.15 Texture Reference Management Functions • cudaError_t cudaBindTexture (size_t ∗offset. const struct cudaChannelFormatDesc ∗ desc. size_t width. const struct textureReference ∗texref. const void ∗devPtr. const char ∗symbol) Get the texture reference associated with a symbol. • cudaError_t cudaGetTextureAlignmentOffset (size_t ∗offset. size_t size=UINT_MAX) Binds a memory area to a texture. size_t pitch) Binds a 2D memory area to a texture. • struct cudaChannelFormatDesc cudaCreateChannelDesc (int x. 4. • cudaError_t cudaBindTexture2D (size_t ∗offset. const struct cudaArray ∗array) Get the channel descriptor of an array. const struct textureReference ∗texref) Get the alignment offset of a texture. const struct cudaChannelFormatDesc ∗desc. int w.1 Detailed Description This section describes the low level texture reference management functions of the CUDA runtime application programming interface.15. enum cudaChannelFormatKind f) Returns a channel descriptor using the specified format. • cudaError_t cudaGetTextureReference (const struct textureReference ∗∗texref. const struct cudaChannelFormatDesc ∗desc.4. Since the hardware enforces an alignment requirement on texture base addresses.15. const struct cudaArray ∗array. cudaBindTexture() returns in ∗offset a byte offset that must be applied to texture fetches in order to read from the desired memory. size_t size = UINT_MAX) Binds size bytes of the memory area pointed to by devPtr to the texture reference texref. • cudaError_t cudaGetChannelDesc (struct cudaChannelFormatDesc ∗desc. const struct textureReference ∗ texref. size_t height.15 Texture Reference Management 91 4.2 4. desc describes how the memory is interpreted when fetching values from the texture. • cudaError_t cudaBindTextureToArray (const struct textureReference ∗texref. • cudaError_t cudaUnbindTexture (const struct textureReference ∗texref) Unbinds a texture. Any memory previously bound to texref is unbound.2. const void ∗devPtr. 
const struct textureReference ∗texref. int z. 4.1 Function Documentation cudaError_t cudaBindTexture (size_t ∗ offset. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the Generated for NVIDIA CUDA Library by Doxygen . int y.15. const struct cudaChannelFormatDesc ∗desc) Binds an array to a texture.

cudaErrorInvalidDevicePointer.Channel format size . cudaBindTexture2D() returns in ∗offset a byte offset that must be applied to texture fetches in order to read from the desired memory. cudaBindTexture2D (C API). If the device memory pointer was returned from cudaMalloc().Texture reference to bind devPtr . asynchronous launches. Since the hardware enforces an alignment requirement on texture base addresses.Height in texel units pitch . cudaGetTextureAlignmentOffset (C API) 4.2D memory area on device desc .2 cudaError_t cudaBindTexture2D (size_t ∗ offset. cudaGetChannelDesc. cudaErrorInvalidValue.92 Module Documentation tex1Dfetch() function. const struct textureReference ∗ texref. cudaUnbindTexture (C API). size_t pitch) Binds the 2D memory area pointed to by devPtr to the texture reference texref. size_t width. the offset is guaranteed to be 0 and NULL may be passed as the offset parameter. cudaErrorInvalidTexture Generated for NVIDIA CUDA Library by Doxygen .Width in texel units height .Pitch in bytes Returns: cudaSuccess. cudaErrorInvalidDevicePointer. cudaGetTextureReference. const void ∗ devPtr. cudaBindTextureToArray (C API).15. If the device memory pointer was returned from cudaMalloc().2.Offset in bytes texref .Memory area on device desc . Parameters: offset . the offset is guaranteed to be 0 and NULL may be passed as the offset parameter. height in texel units. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the tex2D() function. desc describes how the memory is interpreted when fetching values from the texture. cudaErrorInvalidValue.Size of the memory area pointed to by devPtr Returns: cudaSuccess. Parameters: offset . Any memory previously bound to texref is unbound. cudaBindTexture (C++ API).Texture to bind devPtr . See also: cudaCreateChannelDesc (C API).Channel format width . size_t height. and pitch in byte units. 
const struct cudaChannelFormatDesc ∗ desc.Offset in bytes texref . The size of the area is constrained by width in texel units. cudaErrorInvalidTexture Note: Note that this function may also return error codes from previous.

3 cudaError_t cudaBindTextureToArray (const struct textureReference ∗ texref.15 Texture Reference Management Note: Note that this function may also return error codes from previous. cudaGetChannelDesc. enum cudaChannelFormatKind f.15.Memory array on device desc .2. cudaChannelFormatKindUnsigned. The cudaChannelFormatDesc is defined as: struct cudaChannelFormatDesc { int x. cudaBindTexture2D (C API). cudaBindTexture2D (C++ API). cudaGetTextureReference. and w. cudaGetTextureAlignmentOffset (C API) 4. y. z. Any CUDA array previously bound to texref is unbound. cudaGetChannelDesc. desc describes how the memory is interpreted when fetching values from the texture. cudaErrorInvalidDevicePointer. inherited channel descriptor).X component y . cudaUnbindTexture (C API). See also: 93 cudaCreateChannelDesc (C API). See also: cudaCreateChannelDesc (C API). cudaBindTexture (C API). w. cudaErrorInvalidValue. cudaGetTextureReference.15. enum cudaChannelFormatKind f) [read] Returns a channel descriptor with format f and number of bits of each component x. int y. const struct cudaArray ∗ array. y.Channel format Returns: cudaSuccess. Parameters: x . cudaBindTextureToArray (C API). Parameters: texref . asynchronous launches. const struct cudaChannelFormatDesc ∗ desc) Binds the CUDA array array to the texture reference texref.2.Texture to bind array . cudaBindTexture (C API). }. cudaErrorInvalidTexture Note: Note that this function may also return error codes from previous. asynchronous launches.4 struct cudaChannelFormatDesc cudaCreateChannelDesc (int x. cudaGetTextureAlignmentOffset (C API) 4. where cudaChannelFormatKind is one of cudaChannelFormatKindSigned. cudaBindTexture2D (C++ API.Y component Generated for NVIDIA CUDA Library by Doxygen . z.4. cudaBindTextureToArray (C API). int w. or cudaChannelFormatKindFloat. cudaBindTextureToArray (C++ API). int z.

Z component w . cudaUnbindTexture (C API). cudaErrorInvalidTexture. const struct textureReference ∗ texref) Returns in ∗offset the offset that was returned when texture reference texref was bound.15. Generated for NVIDIA CUDA Library by Doxygen . cudaErrorInvalidTextureBinding Note: Note that this function may also return error codes from previous.5 cudaError_t cudaGetChannelDesc (struct cudaChannelFormatDesc ∗ desc.Texture to get offset of Returns: cudaSuccess. cudaGetTextureAlignmentOffset (C API) 4. cudaBindTexture2D (C API).Offset of texture reference in bytes texref . cudaGetChannelDesc.W component f . cudaBindTexture2D (C API).Channel format array .Channel format Returns: Channel descriptor with format f See also: Module Documentation cudaCreateChannelDesc (C++ API). const struct cudaArray ∗ array) Returns in ∗desc the channel descriptor of the CUDA array array. Parameters: offset . cudaGetTextureAlignmentOffset (C API) 4.15.6 cudaError_t cudaGetTextureAlignmentOffset (size_t ∗ offset. cudaUnbindTexture (C API). cudaErrorInvalidValue Note: Note that this function may also return error codes from previous. cudaBindTextureToArray (C API). cudaGetTextureReference. cudaBindTextureToArray (C API).Memory array on device Returns: cudaSuccess.2. cudaBindTexture (C API). asynchronous launches. See also: cudaCreateChannelDesc (C API).2.94 z . cudaBindTexture (C API). asynchronous launches. cudaGetTextureReference. Parameters: desc .

See also:
    cudaCreateChannelDesc (C API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C API), cudaBindTexture2D (C API), cudaBindTextureToArray (C API), cudaUnbindTexture (C API), cudaGetTextureAlignmentOffset (C++ API)

4.15.2.7 cudaError_t cudaGetTextureReference (const struct textureReference ** texref, const char * symbol)

Returns in *texref the structure associated to the texture reference defined by symbol symbol.

Parameters:
    texref - Texture associated with symbol
    symbol - Symbol to find texture reference for

Returns:
    cudaSuccess, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C API), cudaGetChannelDesc, cudaGetTextureAlignmentOffset (C API), cudaBindTexture (C API), cudaBindTexture2D (C API), cudaBindTextureToArray (C API), cudaUnbindTexture (C API)

4.15.2.8 cudaError_t cudaUnbindTexture (const struct textureReference * texref)

Unbinds the texture bound to texref.

Parameters:
    texref - Texture to unbind

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C API), cudaBindTexture2D (C API), cudaBindTextureToArray (C API), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C API)

4.16 Surface Reference Management

Functions

• cudaError_t cudaBindSurfaceToArray (const struct surfaceReference *surfref, const struct cudaArray *array, const struct cudaChannelFormatDesc *desc)
    Binds an array to a surface.
• cudaError_t cudaGetSurfaceReference (const struct surfaceReference **surfref, const char *symbol)
    Get the surface reference associated with a symbol.

4.16.1 Detailed Description

This section describes the low level surface reference management functions of the CUDA runtime application programming interface.

4.16.2 Function Documentation

4.16.2.1 cudaError_t cudaBindSurfaceToArray (const struct surfaceReference * surfref, const struct cudaArray * array, const struct cudaChannelFormatDesc * desc)

Binds the CUDA array array to the surface reference surfref. desc describes how the memory is interpreted when fetching values from the surface. Any CUDA array previously bound to surfref is unbound.

Parameters:
    surfref - Surface to bind
    array - Memory array on device
    desc - Channel format

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSurface

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaBindSurfaceToArray (C++ API), cudaBindSurfaceToArray (C++ API, inherited channel descriptor), cudaGetSurfaceReference

4.16.2.2 cudaError_t cudaGetSurfaceReference (const struct surfaceReference ** surfref, const char * symbol)

Returns in *surfref the structure associated to the surface reference defined by symbol symbol.

Parameters:
    surfref - Surface associated with symbol
    symbol - Symbol to find surface reference for

Returns:
    cudaSuccess, cudaErrorInvalidSurface

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaBindSurfaceToArray (C API)

4.17 Version Management

Functions

• cudaError_t cudaDriverGetVersion (int *driverVersion)
    Returns the CUDA driver version.
• cudaError_t cudaRuntimeGetVersion (int *runtimeVersion)
    Returns the CUDA Runtime version.

4.17.1 Function Documentation

4.17.1.1 cudaError_t cudaDriverGetVersion (int * driverVersion)

Returns in *driverVersion the version number of the installed CUDA driver. If no driver is installed, then 0 is returned as the driver version (via driverVersion). This function automatically returns cudaErrorInvalidValue if the driverVersion argument is NULL.

Parameters:
    driverVersion - Returns the CUDA driver version.

Returns:
    cudaSuccess, cudaErrorInvalidValue

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaRuntimeGetVersion

4.17.1.2 cudaError_t cudaRuntimeGetVersion (int * runtimeVersion)

Returns in *runtimeVersion the version number of the installed CUDA Runtime. This function automatically returns cudaErrorInvalidValue if the runtimeVersion argument is NULL.

Parameters:
    runtimeVersion - Returns the CUDA Runtime version.

Returns:
    cudaSuccess, cudaErrorInvalidValue

See also:
    cudaDriverGetVersion
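The integer written by cudaDriverGetVersion() and cudaRuntimeGetVersion() is usually split into a major and a minor part before it is shown to a user. The following is a minimal sketch of that step, not part of the CUDA Runtime API. It assumes the encoding 1000 * major + 10 * minor (so that 3020 would mean version 3.2), which this section does not state; the type CudaVersionParts and the functions decodeCudaVersion and isDriverInstalled are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Hypothetical helper type, not part of the CUDA Runtime API.
struct CudaVersionParts {
    int majorVersion;
    int minorVersion;
};

// Assumption: version = 1000 * major + 10 * minor (for example 3020 for 3.2).
inline CudaVersionParts decodeCudaVersion(int version) {
    assert(version >= 0);
    return CudaVersionParts{version / 1000, (version % 1000) / 10};
}

// A driver version of 0 means that no driver is installed (see cudaDriverGetVersion).
inline bool isDriverInstalled(int driverVersion) {
    return driverVersion != 0;
}
```

In an application, the argument would be the value obtained through the driverVersion or runtimeVersion pointer after checking that the call returned cudaSuccess.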

4.18 C++ API Routines

C++-style interface built on top of CUDA runtime API.

Functions

• template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface< T, dim > &surf, const struct cudaArray *array)
    [C++ API] Binds an array to a surface
• template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface< T, dim > &surf, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc)
    [C++ API] Binds an array to a surface
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture (size_t *offset, const struct texture< T, dim, readMode > &tex, const void *devPtr, size_t size=UINT_MAX)
    [C++ API] Binds a memory area to a texture
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture (size_t *offset, const struct texture< T, dim, readMode > &tex, const void *devPtr, const struct cudaChannelFormatDesc &desc, size_t size=UINT_MAX)
    [C++ API] Binds a memory area to a texture
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture2D (size_t *offset, const struct texture< T, dim, readMode > &tex, const void *devPtr, size_t width, size_t height, size_t pitch)
    [C++ API] Binds a 2D memory area to a texture
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture2D (size_t *offset, const struct texture< T, dim, readMode > &tex, const void *devPtr, const struct cudaChannelFormatDesc &desc, size_t width, size_t height, size_t pitch)
    [C++ API] Binds a 2D memory area to a texture
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTextureToArray (const struct texture< T, dim, readMode > &tex, const struct cudaArray *array)
    [C++ API] Binds an array to a texture
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTextureToArray (const struct texture< T, dim, readMode > &tex, const struct cudaArray *array, const struct cudaChannelFormatDesc &desc)
    [C++ API] Binds an array to a texture
• template<class T> cudaChannelFormatDesc cudaCreateChannelDesc (void)
    [C++ API] Returns a channel descriptor using the specified format
• cudaError_t cudaEventCreate (cudaEvent_t *event, unsigned int flags)
    [C++ API] Creates an event object with the specified flags

• template<class T> cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes *attr, T *entry)
    [C++ API] Find out attributes for a given function
• template<class T> cudaError_t cudaFuncSetCacheConfig (T *func, enum cudaFuncCache cacheConfig)
    Sets the preferred cache configuration for a device function.
• template<class T> cudaError_t cudaGetSymbolAddress (void **devPtr, const T &symbol)
    [C++ API] Finds the address associated with a CUDA symbol
• template<class T> cudaError_t cudaGetSymbolSize (size_t *size, const T &symbol)
    [C++ API] Finds the size of the object associated with a CUDA symbol
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaGetTextureAlignmentOffset (size_t *offset, const struct texture< T, dim, readMode > &tex)
    [C++ API] Get the alignment offset of a texture
• template<class T> cudaError_t cudaLaunch (T *entry)
    [C++ API] Launches a device function
• cudaError_t cudaMallocHost (void **ptr, size_t size, unsigned int flags)
    [C++ API] Allocates page-locked memory on the host
• template<class T> cudaError_t cudaSetupArgument (T arg, size_t offset)
    [C++ API] Configure a device launch
• template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaUnbindTexture (const struct texture< T, dim, readMode > &tex)
    [C++ API] Unbinds a texture

4.18.1 Detailed Description

This section describes the C++ high level API functions of the CUDA runtime application programming interface. To use these functions, your application needs to be compiled with the nvcc compiler.

4.18.2 Function Documentation

4.18.2.1 template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface< T, dim > & surf, const struct cudaArray * array)

Binds the CUDA array array to the surface reference surf. The channel descriptor is inherited from the CUDA array. Any CUDA array previously bound to surf is unbound.

Parameters:
    surf - Surface to bind
    array - Memory array on device

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSurface

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaBindSurfaceToArray (C API), cudaBindSurfaceToArray (C++ API)

4.18.2.2 template<class T, int dim> cudaError_t cudaBindSurfaceToArray (const struct surface< T, dim > & surf, const struct cudaArray * array, const struct cudaChannelFormatDesc & desc)

Binds the CUDA array array to the surface reference surf. desc describes how the memory is interpreted when dealing with the surface. Any CUDA array previously bound to surf is unbound.

Parameters:
    surf - Surface to bind
    array - Memory array on device
    desc - Channel format

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidSurface

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaBindSurfaceToArray (C API), cudaBindSurfaceToArray (C++ API, inherited channel descriptor)

4.18.2.3 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture (size_t * offset, const struct texture< T, dim, readMode > & tex, const void * devPtr, size_t size = UINT_MAX)

Binds size bytes of the memory area pointed to by devPtr to texture reference tex. The channel descriptor is inherited from the texture reference type. The offset parameter is an optional byte offset as with the low-level cudaBindTexture(size_t*, const struct textureReference*, const void*, const struct cudaChannelFormatDesc*, size_t) function. Any memory previously bound to tex is unbound.

Parameters:
    offset - Offset in bytes
    tex - Texture to bind

    devPtr - Memory area on device
    size - Size of the memory area pointed to by devPtr

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C API), cudaBindTexture (C++ API), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)

4.18.2.4 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture (size_t * offset, const struct texture< T, dim, readMode > & tex, const void * devPtr, const struct cudaChannelFormatDesc & desc, size_t size = UINT_MAX)

Binds size bytes of the memory area pointed to by devPtr to texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. The offset parameter is an optional byte offset as with the low-level cudaBindTexture() function. Any memory previously bound to tex is unbound.

Parameters:
    offset - Offset in bytes
    tex - Texture to bind
    devPtr - Memory area on device
    desc - Channel format
    size - Size of the memory area pointed to by devPtr

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)

4.18.2.5 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture2D (size_t * offset, const struct texture< T, dim, readMode > & tex, const void * devPtr, size_t width, size_t height, size_t pitch)

Binds the 2D memory area pointed to by devPtr to the texture reference tex. The size of the area is constrained by width in texel units, height in texel units, and pitch in byte units. The channel descriptor is inherited from the texture reference type. Any memory previously bound to tex is unbound.

Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture2D() returns in *offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the tex2D() function. If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.

Parameters:
    offset - Offset in bytes
    tex - Texture reference to bind
    devPtr - 2D memory area on device
    width - Width in texel units
    height - Height in texel units
    pitch - Pitch in bytes

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C API), cudaBindTexture2D (C++ API), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)

4.18.2.6 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTexture2D (size_t * offset, const struct texture< T, dim, readMode > & tex, const void * devPtr, const struct cudaChannelFormatDesc & desc, size_t width, size_t height, size_t pitch)

Binds the 2D memory area pointed to by devPtr to the texture reference tex. The size of the area is constrained by width in texel units, height in texel units, and pitch in byte units. desc describes how the memory is interpreted when fetching values from the texture. Any memory previously bound to tex is unbound.

Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture2D() returns in *offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the tex2D() function. If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.

Parameters:
    offset - Offset in bytes

    tex - Texture reference to bind
    devPtr - 2D memory area on device
    desc - Channel format
    width - Width in texel units
    height - Height in texel units
    pitch - Pitch in bytes

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)

4.18.2.7 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTextureToArray (const struct texture< T, dim, readMode > & tex, const struct cudaArray * array)

Binds the CUDA array array to the texture reference tex. The channel descriptor is inherited from the CUDA array. Any CUDA array previously bound to tex is unbound.

Parameters:
    tex - Texture to bind
    array - Memory array on device

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C API), cudaBindTextureToArray (C++ API), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)
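The description of cudaBindTexture2D() says that the byte offset returned in *offset must be divided by the texel size before it is passed to a kernel. A minimal sketch of that conversion follows. It is plain host-side arithmetic with no CUDA calls; texelOffsetFromBytes is a hypothetical helper name, and the check that the byte offset is a whole number of texels is an assumption of this sketch rather than a guarantee stated in this manual.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: converts the byte offset reported by a texture bind
// into an offset counted in texels, as required by the kernel-side fetch.
inline std::size_t texelOffsetFromBytes(std::size_t byteOffset,
                                        std::size_t texelSizeBytes) {
    assert(texelSizeBytes > 0);
    // Assumption of this sketch: the offset is a whole number of texels.
    assert(byteOffset % texelSizeBytes == 0);
    return byteOffset / texelSizeBytes;
}
```

For a texture of float texels, texelOffsetFromBytes(offset, sizeof(float)) gives the value a kernel would add to its x coordinate. When the memory came from cudaMalloc() the byte offset is 0, so the result is 0 as well.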

4.18.2.8 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaBindTextureToArray (const struct texture< T, dim, readMode > & tex, const struct cudaArray * array, const struct cudaChannelFormatDesc & desc)

Binds the CUDA array array to the texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. Any CUDA array previously bound to tex is unbound.

Parameters:
    tex - Texture to bind
    array - Memory array on device
    desc - Channel format

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidTexture

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C++ API)

4.18.2.9 template<class T> cudaChannelFormatDesc cudaCreateChannelDesc (void)

Returns a channel descriptor with format f and number of bits of each component x, y, z, and w. The cudaChannelFormatDesc is defined as:

struct cudaChannelFormatDesc {
    int x, y, z, w;
    enum cudaChannelFormatKind f;
};

where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.

Returns:
    Channel descriptor with format f

See also:
    cudaCreateChannelDesc (Low level), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (High level), cudaBindTexture (High level, inherited channel descriptor), cudaBindTexture2D (High level), cudaBindTextureToArray (High level), cudaBindTextureToArray (High level, inherited channel descriptor), cudaUnbindTexture (High level), cudaGetTextureAlignmentOffset (High level)
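To make the meaning of the x, y, z, w and f fields of a channel descriptor concrete, here is a small sketch that mirrors the cudaChannelFormatDesc layout with stand-in types, so it can be compiled without the CUDA toolkit. DemoChannelKind, DemoChannelDesc, makeDemoChannelDesc and demoTexelSizeBytes are hypothetical names introduced for illustration only; real code would use cudaChannelFormatDesc and cudaCreateChannelDesc() from the CUDA headers. The texel-size computation assumes that the four bit counts add up to a whole number of bytes.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins that mirror the definition of cudaChannelFormatDesc; they are
// not the CUDA types themselves.
enum class DemoChannelKind { Signed, Unsigned, Float };

struct DemoChannelDesc {
    int x, y, z, w;      // number of bits of each component
    DemoChannelKind f;   // how the bits are interpreted
};

// Same argument order as the C API cudaCreateChannelDesc(x, y, z, w, f).
inline DemoChannelDesc makeDemoChannelDesc(int x, int y, int z, int w,
                                           DemoChannelKind f) {
    assert(x >= 0 && y >= 0 && z >= 0 && w >= 0);
    return DemoChannelDesc{x, y, z, w, f};
}

// Texel size in bytes, assuming the component bit counts sum to a multiple of 8.
inline std::size_t demoTexelSizeBytes(const DemoChannelDesc& d) {
    const int bits = d.x + d.y + d.z + d.w;
    assert(bits % 8 == 0);
    return static_cast<std::size_t>(bits / 8);
}
```

With these stand-ins, a single 32-bit float channel is described as (32, 0, 0, 0, Float) and a four-channel float texel as (32, 32, 32, 32, Float).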

4.18.2.10 cudaError_t cudaEventCreate (cudaEvent_t * event, unsigned int flags)

Creates an event object with the specified flags. Valid flags include:

• cudaEventDefault: Default event creation flag.
• cudaEventBlockingSync: Specifies that event should use blocking synchronization. A host thread that uses cudaEventSynchronize() to wait on an event created with this flag will block until the event actually completes.
• cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().

Parameters:
    event - Newly created event
    flags - Flags for new event

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidValue, cudaErrorLaunchFailure, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaEventCreate (C API), cudaEventCreateWithFlags, cudaEventRecord, cudaEventQuery, cudaEventSynchronize, cudaEventDestroy, cudaEventElapsedTime, cudaStreamWaitEvent

4.18.2.11 template<class T> cudaError_t cudaFuncGetAttributes (struct cudaFuncAttributes * attr, T * entry)

This function obtains the attributes of a function specified via entry. The parameter entry can either be a pointer to a function that executes on the device, or it can be a character string specifying the fully-decorated (C++) name of a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.

Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is currently being used.

Parameters:
    attr - Return pointer to function's attributes
    entry - Function to get attributes of

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C API), cudaLaunch (C++ API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API)

4.18.2.12 template<class T> cudaError_t cudaFuncSetCacheConfig (T * func, enum cudaFuncCache cacheConfig)

On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the function specified via func. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute func.

func can either be a pointer to a function that executes on the device, or it can be a character string specifying the fully-decorated (C++) name for a function that executes on the device. The parameter specified by func must be declared as a __global__ function. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:

• cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
• cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
• cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory

Parameters:
    func - Char string naming device function
    cacheConfig - Requested cache configuration

Returns:
    cudaSuccess, cudaErrorInitializationError, cudaErrorInvalidDeviceFunction

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C API), cudaFuncGetAttributes (C++ API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig

4.18.2.13 template<class T> cudaError_t cudaGetSymbolAddress (void ** devPtr, const T & symbol)

Returns in *devPtr the address of symbol symbol on the device. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in the global or constant memory space, *devPtr is unchanged and the error cudaErrorInvalidSymbol is returned. If there are multiple global or constant variables with the same string name (from separate files) and the lookup is done via character string, cudaErrorDuplicateVariableName is returned.

Parameters:
    devPtr - Return device pointer associated with symbol
    symbol - Global/constant variable or string symbol to search for

Returns:
    cudaSuccess, cudaErrorInvalidSymbol, cudaErrorDuplicateVariableName

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetSymbolAddress (C API), cudaGetSymbolSize (C++ API)

4.18.2.14 template<class T> cudaError_t cudaGetSymbolSize (size_t * size, const T & symbol)

Returns in *size the size of symbol symbol. symbol can either be a variable that resides in global or constant memory space, or it can be a character string, naming a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in global or constant memory space, *size is unchanged and the error cudaErrorInvalidSymbol is returned. If there are multiple global variables with the same string name (from separate files) and the lookup is done via character string, cudaErrorDuplicateVariableName is returned.

Parameters:
    size - Size of object associated with symbol
    symbol - Global variable or string symbol to find size of

Returns:
    cudaSuccess, cudaErrorInvalidSymbol, cudaErrorDuplicateVariableName

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGetSymbolAddress (C++ API), cudaGetSymbolSize (C API)

4.18.2.15 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaGetTextureAlignmentOffset (size_t * offset, const struct texture< T, dim, readMode > & tex)

Returns in *offset the offset that was returned when texture reference tex was bound.

Parameters:
    offset - Offset of texture reference in bytes
    tex - Texture to get offset of

Returns:
    cudaSuccess, cudaErrorInvalidTexture, cudaErrorInvalidTextureBinding

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C++ API), cudaGetTextureAlignmentOffset (C API)

4.18.2.16 template<class T> cudaError_t cudaLaunch (T * entry)

Launches the function entry on the device. The parameter entry can either be a function that executes on the device, or it can be a character string, naming a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall() since it pops the data that was pushed by cudaConfigureCall() from the execution stack.

Parameters:
    entry - Device function pointer or char string naming device function to execute

Returns:
    cudaSuccess, cudaErrorInvalidDeviceFunction, cudaErrorInvalidConfiguration, cudaErrorLaunchFailure, cudaErrorLaunchTimeout, cudaErrorLaunchOutOfResources, cudaErrorSharedObjectSymbolNotFound, cudaErrorSharedObjectInitFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncSetCacheConfig (C++ API), cudaFuncGetAttributes (C++ API), cudaLaunch (C API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C++ API), cudaThreadGetCacheConfig, cudaThreadSetCacheConfig

4.18.2.17 cudaError_t cudaMallocHost (void ** ptr, size_t size, unsigned int flags)

Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

The flags parameter enables different options to be specified that affect the allocation, as follows.

• cudaHostAllocDefault: This flag's value is defined to be 0.
• cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.

• cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
• cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.

cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.

The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.

Memory allocated by this function must be freed with cudaFreeHost().

Parameters:
    ptr - Device pointer to allocated memory
    size - Requested allocation size in bytes
    flags - Requested properties of allocated memory

Returns:
    cudaSuccess, cudaErrorMemoryAllocation

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaSetDeviceFlags, cudaMallocHost (C API), cudaFreeHost, cudaHostAlloc

4.18.2.18 template<class T> cudaError_t cudaSetupArgument (T arg, size_t offset)

Pushes size bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored in the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().

Parameters:
    arg - Argument to push for a kernel launch
    offset - Offset in argument stack to push new arg

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaConfigureCall, cudaFuncGetAttributes (C++ API), cudaLaunch (C++ API), cudaSetDoubleForDevice, cudaSetDoubleForHost, cudaSetupArgument (C API)

4.18.2.19 template<class T, int dim, enum cudaTextureReadMode readMode> cudaError_t cudaUnbindTexture (const struct texture< T, dim, readMode > & tex)

Unbinds the texture bound to tex.

Parameters:
    tex - Texture to unbind

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaCreateChannelDesc (C++ API), cudaGetChannelDesc, cudaGetTextureReference, cudaBindTexture (C++ API), cudaBindTexture (C++ API, inherited channel descriptor), cudaBindTexture2D (C++ API), cudaBindTexture2D (C++ API, inherited channel descriptor), cudaBindTextureToArray (C++ API), cudaBindTextureToArray (C++ API, inherited channel descriptor), cudaUnbindTexture (C API), cudaGetTextureAlignmentOffset (C++ API)
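cudaSetupArgument() places each kernel argument at a caller-chosen byte offset in the parameter passing area, which starts at offset 0. The sketch below shows one way the successive offsets could be computed on the host. It contains no CUDA calls; DemoArgSlot, DemoArgLayout and demoAlignUp are hypothetical names, and the rule that each argument is aligned to alignof(T) is an assumption of this sketch that is not stated in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds offset up to the next multiple of alignment (alignment must be > 0).
inline std::size_t demoAlignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

// Where one argument would live in the parameter passing area.
struct DemoArgSlot {
    std::size_t offset;
    std::size_t size;
};

// Hands out offsets for arguments in push order, starting at offset 0.
class DemoArgLayout {
public:
    template <class T>
    DemoArgSlot push() {
        // Assumption: every argument is aligned to its natural alignment.
        cursor_ = demoAlignUp(cursor_, alignof(T));
        const DemoArgSlot slot{cursor_, sizeof(T)};
        cursor_ += sizeof(T);
        return slot;
    }

    std::size_t totalSize() const { return cursor_; }

private:
    std::size_t cursor_ = 0;
};
```

Each slot's offset is what would be passed as the offset parameter of cudaSetupArgument() for the corresponding argument, after cudaConfigureCall() and before cudaLaunch().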

4.19 Interactions with the CUDA Driver API

Interactions between the CUDA Driver API and the CUDA Runtime API.

This section describes the interactions between the CUDA Driver API and the CUDA Runtime API.

4.19.1 Context Management

CUDA Runtime API calls operate on the CUDA Driver API CUcontext which is bound to the current host thread.

If there exists no CUDA Driver API CUcontext bound to the current thread at the time of a CUDA Runtime API call which requires a CUcontext then the CUDA Runtime will implicitly create a new CUcontext before executing the call.

If the CUDA Runtime creates a CUcontext then the CUcontext will be created using the parameters specified by the CUDA Runtime API functions cudaSetDevice, cudaSetValidDevices, cudaSetDeviceFlags, cudaGLSetGLDevice, cudaD3D9SetDirect3DDevice, cudaD3D10SetDirect3DDevice, and cudaD3D11SetDirect3DDevice. Note that these functions will fail with cudaErrorSetOnActiveProcess if they are called when a CUcontext is bound to the current host thread.

The lifetime of a CUcontext is managed by a reference counting mechanism. The reference count of a CUcontext is initially set to 0, and is incremented by cuCtxAttach and decremented by cuCtxDetach.

If a CUcontext is created by the CUDA Runtime, then the CUDA Runtime will decrement the reference count of that CUcontext in the function cudaThreadExit. If a CUcontext is created by the CUDA Driver API (or is created by a separate instance of the CUDA Runtime API library), then the CUDA Runtime will not increment or decrement the reference count of that CUcontext.

All CUDA Runtime API state (e.g., global variables' addresses and values) travels with its underlying CUcontext. In particular, if a CUcontext is moved from one thread to another (using cuCtxPopCurrent and cuCtxPushCurrent) then all CUDA Runtime API state will move to that thread as well.

Please note that attaching to legacy contexts (those with a version of 3010 as returned by cuCtxGetApiVersion()) is not possible. The CUDA Runtime will return cudaErrorIncompatibleDriverContext in such cases.

4.19.2 Interactions between CUstream and cudaStream_t

The types CUstream and cudaStream_t are identical and may be used interchangeably.

4.19.3 Interactions between CUevent and cudaEvent_t

The types CUevent and cudaEvent_t are identical and may be used interchangeably.

4.19.4 Interactions between CUarray and struct cudaArray *

The types CUarray and struct cudaArray * represent the same data type and may be used interchangeably by casting the two types between each other.

In order to use a CUarray in a CUDA Runtime API function which takes a struct cudaArray *, it is necessary to explicitly cast the CUarray to a struct cudaArray *.

In order to use a struct cudaArray * in a CUDA Driver API function which takes a CUarray, it is necessary to explicitly cast the struct cudaArray * to a CUarray.

4.19.5 Interactions between CUgraphicsResource and cudaGraphicsResource_t

The types CUgraphicsResource and struct cudaGraphicsResource * represent the same data type and may be used interchangeably by casting the two types between each other.

In order to use a CUgraphicsResource in a CUDA Runtime API function which takes a struct cudaGraphicsResource *, it is necessary to explicitly cast the CUgraphicsResource to a struct cudaGraphicsResource *.

In order to use a struct cudaGraphicsResource * in a CUDA Driver API function which takes a CUgraphicsResource, it is necessary to explicitly cast the struct cudaGraphicsResource * to a CUgraphicsResource.

4.20 Direct3D 9 Interoperability [DEPRECATED]

Functions

• cudaError_t cudaD3D9MapResources (int count, IDirect3DResource9 **ppResources)
    Map Direct3D resources for access by CUDA.
• cudaError_t cudaD3D9RegisterResource (IDirect3DResource9 *pResource, unsigned int flags)
    Registers a Direct3D resource for access by CUDA.
• cudaError_t cudaD3D9ResourceGetMappedArray (cudaArray **ppArray, IDirect3DResource9 *pResource, unsigned int face, unsigned int level)
    Get an array through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D9ResourceGetMappedPitch (size_t *pPitch, size_t *pPitchSlice, IDirect3DResource9 *pResource, unsigned int face, unsigned int level)
    Get the pitch of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D9ResourceGetMappedPointer (void **pPointer, IDirect3DResource9 *pResource, unsigned int face, unsigned int level)
    Get a pointer through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D9ResourceGetMappedSize (size_t *pSize, IDirect3DResource9 *pResource, unsigned int face, unsigned int level)
    Get the size of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D9ResourceGetSurfaceDimensions (size_t *pWidth, size_t *pHeight, size_t *pDepth, IDirect3DResource9 *pResource, unsigned int face, unsigned int level)
    Get the dimensions of a registered Direct3D surface.
• cudaError_t cudaD3D9ResourceSetMapFlags (IDirect3DResource9 *pResource, unsigned int flags)
    Set usage flags for mapping a Direct3D resource.
• cudaError_t cudaD3D9UnmapResources (int count, IDirect3DResource9 **ppResources)
    Unmap Direct3D resources for access by CUDA.
• cudaError_t cudaD3D9UnregisterResource (IDirect3DResource9 *pResource)
    Unregisters a Direct3D resource for access by CUDA.

4.20.1 Detailed Description

This section describes deprecated Direct3D 9 interoperability functions.

4.20.2 Function Documentation

4.20.2.1 cudaError_t cudaD3D9MapResources (int count, IDirect3DResource9 ** ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the count Direct3D resources in ppResources for access by CUDA.

The resources in ppResources may be accessed in CUDA kernels until they are unmapped. Direct3D should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any Direct3D calls issued before cudaD3D9MapResources() will complete before any CUDA kernels issued after cudaD3D9MapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of ppResources are presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
    count - Number of resources to map for CUDA
    ppResources - Resources to map for CUDA

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsMapResources

4.20.2.2 cudaError_t cudaD3D9RegisterResource (IDirect3DResource9 * pResource, unsigned int flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Registers the Direct3D resource pResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cudaD3D9UnregisterResource(). Also on success, this call will increase the internal reference count on pResource. This reference count will be decremented when this resource is unregistered through cudaD3D9UnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pResource must be one of the following.

• IDirect3DVertexBuffer9: No notes.
• IDirect3DIndexBuffer9: No notes.
• IDirect3DSurface9: Only stand-alone objects of type IDirect3DSurface9 may be explicitly shared. In particular, individual mipmap levels and faces of cube maps may not be registered directly. To access individual surfaces associated with a texture, one must register the base texture object.
• IDirect3DBaseTexture9: When a texture is registered, all surfaces associated with all mipmap levels of all faces of the texture will be accessible to CUDA.

The flags argument specifies the mechanism through which CUDA will access the Direct3D resource. The following value is allowed:

• cudaD3D9RegisterFlagsNone: Specifies that CUDA will access this resource through a void*. The pointer, size, and pitch for each subresource of this resource may be queried through cudaD3D9ResourceGetMappedPointer(), cudaD3D9ResourceGetMappedSize(), and cudaD3D9ResourceGetMappedPitch() respectively. This option is valid for all resource types.

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations:

• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Any resources allocated in D3DPOOL_SYSTEMMEM or D3DPOOL_MANAGED may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized on this context, then cudaErrorInvalidDevice is returned. If pResource is of incorrect type (e.g., is a non-stand-alone IDirect3DSurface9) or is already registered, then cudaErrorInvalidResourceHandle is returned. If pResource cannot be registered then cudaErrorUnknown is returned.

Parameters:
    pResource - Resource to register
    flags - Parameters for resource registration

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsD3D9RegisterResource

4.20.2.3 cudaError_t cudaD3D9ResourceGetMappedArray (cudaArray ** ppArray, IDirect3DResource9 * pResource, unsigned int face, unsigned int level)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *ppArray an array through which the subresource of the mapped Direct3D resource pResource, which corresponds to face and level, may be accessed. The value set in ppArray may change every time that pResource is mapped.

If pResource is not registered then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D9RegisterFlagsArray, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped, then cudaErrorUnknown is returned.

For usage requirements of face and level parameters, see cudaD3D9ResourceGetMappedPointer().

Parameters:
    ppArray - Returned array corresponding to subresource
    pResource - Mapped resource to access
    face - Face of resource to access
    level - Level of resource to access

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsSubResourceGetMappedArray

4.20.2.4 cudaError_t cudaD3D9ResourceGetMappedPitch (size_t * pPitch, size_t * pPitchSlice, IDirect3DResource9 * pResource, unsigned int face, unsigned int level)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pPitch and *pPitchSlice the pitch and Z-slice pitch of the subresource of the mapped Direct3D resource pResource, which corresponds to face and level. The values set in pPitch and pPitchSlice may change every time that pResource is mapped.

The pitch and Z-slice pitch values may be used to compute the location of a sample on a surface as follows.

For a 2D surface, the byte offset of the sample at position x, y from the base pointer of the surface is:

    y * pitch + (bytes per pixel) * x

For a 3D surface, the byte offset of the sample at position x, y, z from the base pointer of the surface is:

    z * slicePitch + y * pitch + (bytes per pixel) * x

Both parameters pPitch and pPitchSlice are optional and may be set to NULL.

If pResource is not of type IDirect3DBaseTexture9 or one of its sub-types or if pResource has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D9RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped for access by CUDA then cudaErrorUnknown is returned.

For usage requirements of face and level parameters, see cudaD3D9ResourceGetMappedPointer().

Parameters:
    pPitch - Returned pitch of subresource
    pPitchSlice - Returned Z-slice pitch of subresource
    pResource - Mapped resource to access
    face - Face of resource to access
    level - Level of resource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceGetMappedPointer

4.20.2.5 cudaError_t cudaD3D9ResourceGetMappedPointer (void ** pPointer, IDirect3DResource9 * pResource, unsigned int face, unsigned int level)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pPointer the base pointer of the subresource of the mapped Direct3D resource pResource, which corresponds to face and level. The value set in pPointer may change every time that pResource is mapped.

If pResource is not registered, then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D9RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped, then cudaErrorUnknown is returned.

If pResource is of type IDirect3DCubeTexture9, then face must be one of the values enumerated by type D3DCUBEMAP_FACES. For all other types, face must be 0. If face is invalid, then cudaErrorInvalidValue is returned.

If pResource is of type IDirect3DBaseTexture9, then level must correspond to a valid mipmap level. Only mipmap level 0 is supported for now. For all other types level must be 0. If level is invalid, then cudaErrorInvalidValue is returned.

Parameters:
    pPointer - Returned pointer corresponding to subresource
    pResource - Mapped resource to access
    face - Face of resource to access
    level - Level of resource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceGetMappedPointer
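The byte-offset formulas given for cudaD3D9ResourceGetMappedPitch() can be sketched in host code as follows. This is a minimal illustration, not part of the CUDA API: the function names sampleOffset2D and sampleOffset3D and the parameter name bytesPerPixel are hypothetical, and pitch and slicePitch are assumed to hold the values returned in *pPitch and *pPitchSlice.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of the sample at (x, y) on a 2D surface,
// following  y * pitch + (bytes per pixel) * x.
inline std::size_t sampleOffset2D(std::size_t x, std::size_t y,
                                  std::size_t pitch,
                                  std::size_t bytesPerPixel) {
    return y * pitch + bytesPerPixel * x;
}

// Hypothetical helper: byte offset of the sample at (x, y, z) on a 3D surface,
// following  z * slicePitch + y * pitch + (bytes per pixel) * x.
inline std::size_t sampleOffset3D(std::size_t x, std::size_t y, std::size_t z,
                                  std::size_t pitch, std::size_t slicePitch,
                                  std::size_t bytesPerPixel) {
    return z * slicePitch + y * pitch + bytesPerPixel * x;
}
```

The resulting offset would be added to the base pointer of the mapped subresource, treated as a byte pointer. Because the pitch may change each time the resource is mapped, the offset should be recomputed after every map rather than cached.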

4.20.2.6 cudaError_t cudaD3D9ResourceGetMappedSize (size_t * pSize, IDirect3DResource9 * pResource, unsigned int face, unsigned int level)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pSize the size of the subresource of the mapped Direct3D resource pResource, which corresponds to face and level. The value set in pSize may change every time that pResource is mapped.

If pResource has not been registered for use with CUDA then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D9RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped for access by CUDA then cudaErrorUnknown is returned.

For usage requirements of face and level parameters, see cudaD3D9ResourceGetMappedPointer().

Parameters:
    pSize - Returned size of subresource
    pResource - Mapped resource to access
    face - Face of resource to access
    level - Level of resource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceGetMappedPointer

4.20.2.7 cudaError_t cudaD3D9ResourceGetSurfaceDimensions (size_t * pWidth, size_t * pHeight, size_t * pDepth, IDirect3DResource9 * pResource, unsigned int face, unsigned int level)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pWidth, *pHeight, and *pDepth the dimensions of the subresource of the mapped Direct3D resource pResource which corresponds to face and level.

Because anti-aliased surfaces may have multiple samples per pixel, it is possible that the dimensions of a resource will be an integer factor larger than the dimensions reported by the Direct3D runtime.

The parameters pWidth, pHeight, and pDepth are optional. For 2D surfaces, the value returned in *pDepth will be 0.

If pResource is not of type IDirect3DBaseTexture9 or IDirect3DSurface9 or if pResource has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned.

For usage requirements of face and level parameters, see cudaD3D9ResourceGetMappedPointer().

Parameters:
    pWidth - Returned width of surface
    pHeight - Returned height of surface
    pDepth - Returned depth of surface
    pResource - Registered resource to access
    face - Face of resource to access
    level - Level of resource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsSubResourceGetMappedArray

4.20.2.8 cudaError_t cudaD3D9ResourceSetMapFlags (IDirect3DResource9 * pResource, unsigned int flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Set flags for mapping the Direct3D resource pResource.

Changes to flags will take effect the next time pResource is mapped. The flags argument may be any of the following:

• cudaD3D9MapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• cudaD3D9MapFlagsReadOnly: Specifies that CUDA kernels which access this resource will not write to this resource.
• cudaD3D9MapFlagsWriteDiscard: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If pResource has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If pResource is presently mapped for access by CUDA, then cudaErrorUnknown is returned.

Parameters:
    pResource - Registered resource to set flags for
    flags - Parameters for resource mapping

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceSetMapFlags

4.20.2.9 cudaError_t cudaD3D9UnmapResources (int count, IDirect3DResource9 ** ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the count Direct3D resources in ppResources.

This function provides the synchronization guarantee that any CUDA kernels issued before cudaD3D9UnmapResources() will complete before any Direct3D calls issued after cudaD3D9UnmapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries, then cudaErrorInvalidResourceHandle is returned. If any of ppResources are not presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
    count - Number of resources to unmap for CUDA
    ppResources - Resources to unmap for CUDA

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnmapResources

4.20.2.10 cudaError_t cudaD3D9UnregisterResource (IDirect3DResource9 * pResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unregisters the Direct3D resource pResource so it is not accessible by CUDA unless registered again.

If pResource is not registered, then cudaErrorInvalidResourceHandle is returned.

Parameters:
    pResource - Resource to unregister

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnregisterResource

4.21 Direct3D 10 Interoperability [DEPRECATED]

Functions

• cudaError_t cudaD3D10MapResources (int count, ID3D10Resource **ppResources)
    Map Direct3D Resources for access by CUDA.
• cudaError_t cudaD3D10RegisterResource (ID3D10Resource *pResource, unsigned int flags)
    Register a Direct3D 10 resource for access by CUDA.
• cudaError_t cudaD3D10ResourceGetMappedArray (cudaArray **ppArray, ID3D10Resource *pResource, unsigned int subResource)
    Get an array through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D10ResourceGetMappedPitch (size_t *pPitch, size_t *pPitchSlice, ID3D10Resource *pResource, unsigned int subResource)
    Get the pitch of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D10ResourceGetMappedPointer (void **pPointer, ID3D10Resource *pResource, unsigned int subResource)
    Get a pointer through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D10ResourceGetMappedSize (size_t *pSize, ID3D10Resource *pResource, unsigned int subResource)
    Get the size of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• cudaError_t cudaD3D10ResourceGetSurfaceDimensions (size_t *pWidth, size_t *pHeight, size_t *pDepth, ID3D10Resource *pResource, unsigned int subResource)
    Get the dimensions of a registered Direct3D surface.
• cudaError_t cudaD3D10ResourceSetMapFlags (ID3D10Resource *pResource, unsigned int flags)
    Set usage flags for mapping a Direct3D resource.
• cudaError_t cudaD3D10UnmapResources (int count, ID3D10Resource **ppResources)
    Unmaps Direct3D resources.
• cudaError_t cudaD3D10UnregisterResource (ID3D10Resource *pResource)
    Unregisters a Direct3D resource.

4.21.1 Detailed Description

This section describes deprecated Direct3D 10 interoperability functions.

4.21.2 Function Documentation

4.21.2.1 cudaError_t cudaD3D10MapResources (int count, ID3D10Resource ** ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the count Direct3D resources in ppResources for access by CUDA.

The resources in ppResources may be accessed in CUDA kernels until they are unmapped. Direct3D should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any Direct3D calls issued before cudaD3D10MapResources() will complete before any CUDA kernels issued after cudaD3D10MapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of ppResources are presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
    count - Number of resources to map for CUDA
    ppResources - Resources to map for CUDA

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsMapResources

4.21.2.2 cudaError_t cudaD3D10RegisterResource (ID3D10Resource * pResource, unsigned int flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Registers the Direct3D resource pResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cudaD3D10UnregisterResource(). Also on success, this call will increase the internal reference count on pResource. This reference count will be decremented when this resource is unregistered through cudaD3D10UnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pResource must be one of the following:

• ID3D10Buffer: Cannot be used with flags set to cudaD3D10RegisterFlagsArray.
• ID3D10Texture1D: No restrictions.
• ID3D10Texture2D: No restrictions.
• ID3D10Texture3D: No restrictions.

The flags argument specifies the mechanism through which CUDA will access the Direct3D resource. The following values are allowed.

• cudaD3D10RegisterFlagsNone: Specifies that CUDA will access this resource through a void*. The pointer, size, and pitch for each subresource of this resource may be queried through cudaD3D10ResourceGetMappedPointer(), cudaD3D10ResourceGetMappedSize(), and cudaD3D10ResourceGetMappedPitch() respectively. This option is valid for all resource types.
• cudaD3D10RegisterFlagsArray: Specifies that CUDA will access this resource through a CUarray queried on a sub-resource basis through cuD3D10ResourceGetMappedArray(). This option is only valid for resources of type ID3D10Texture1D, ID3D10Texture2D, and ID3D10Texture3D.

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.

• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized on this context then cudaErrorInvalidDevice is returned. If pResource is of incorrect type or is already registered then cudaErrorInvalidResourceHandle is returned. If pResource cannot be registered then cudaErrorUnknown is returned.

Parameters:
    pResource - Resource to register
    flags - Parameters for resource registration

Returns:
    cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsD3D10RegisterResource

4.21.2.3 cudaError_t cudaD3D10ResourceGetMappedArray (cudaArray ** ppArray, ID3D10Resource * pResource, unsigned int subResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *ppArray an array through which the subresource of the mapped Direct3D resource pResource which corresponds to subResource may be accessed. The value set in ppArray may change every time that pResource is mapped.

If pResource is not registered, then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D10RegisterFlagsArray, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped then cudaErrorUnknown is returned.

For usage requirements of the subResource parameter, see cudaD3D10ResourceGetMappedPointer().

Parameters:
    ppArray - Returned array corresponding to subresource
    pResource - Mapped resource to access
    subResource - Subresource of pResource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsSubResourceGetMappedArray

4.21.2.4 cudaError_t cudaD3D10ResourceGetMappedPitch (size_t * pPitch, size_t * pPitchSlice, ID3D10Resource * pResource, unsigned int subResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pPitch and *pPitchSlice the pitch and Z-slice pitch of the subresource of the mapped Direct3D resource pResource, which corresponds to subResource. The values set in pPitch and pPitchSlice may change every time that pResource is mapped.

The pitch and Z-slice pitch values may be used to compute the location of a sample on a surface as follows.

For a 2D surface, the byte offset of the sample at position x, y from the base pointer of the surface is:

    y * pitch + (bytes per pixel) * x

For a 3D surface, the byte offset of the sample at position x, y, z from the base pointer of the surface is:

    z * slicePitch + y * pitch + (bytes per pixel) * x

Both parameters pPitch and pPitchSlice are optional and may be set to NULL.

If pResource is not of type ID3D10Texture1D, ID3D10Texture2D, or ID3D10Texture3D, or if pResource has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D10RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped for access by CUDA then cudaErrorUnknown is returned.

For usage requirements of the subResource parameter see cudaD3D10ResourceGetMappedPointer().

Parameters:
    pPitch - Returned pitch of subresource
    pPitchSlice - Returned Z-slice pitch of subresource
    pResource - Mapped resource to access
    subResource - Subresource of pResource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsSubResourceGetMappedArray

4.21.2.5 cudaError_t cudaD3D10ResourceGetMappedPointer (void ** pPointer, ID3D10Resource * pResource, unsigned int subResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pPointer the base pointer of the subresource of the mapped Direct3D resource pResource which corresponds to subResource. The value set in pPointer may change every time that pResource is mapped.

If pResource is not registered, then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D10RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped then cudaErrorUnknown is returned.

If pResource is of type ID3D10Buffer then subResource must be 0. If pResource is of any other type, then the value of subResource must come from the subresource calculation in D3D10CalcSubResource().

Parameters:
    pPointer - Returned pointer corresponding to subresource
    pResource - Mapped resource to access
    subResource - Subresource of pResource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceGetMappedPointer

4.21.2.6 cudaError_t cudaD3D10ResourceGetMappedSize (size_t * pSize, ID3D10Resource * pResource, unsigned int subResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pSize the size of the subresource of the mapped Direct3D resource pResource which corresponds to subResource. The value set in pSize may change every time that pResource is mapped.

If pResource has not been registered for use with CUDA then cudaErrorInvalidResourceHandle is returned. If pResource was not registered with usage flags cudaD3D10RegisterFlagsNone, then cudaErrorInvalidResourceHandle is returned. If pResource is not mapped for access by CUDA then cudaErrorUnknown is returned.

For usage requirements of the subResource parameter see cudaD3D10ResourceGetMappedPointer().

Parameters:
    pSize - Returned size of subresource
    pResource - Mapped resource to access
    subResource - Subresource of pResource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceGetMappedPointer

4.21.2.7 cudaError_t cudaD3D10ResourceGetSurfaceDimensions (size_t * pWidth, size_t * pHeight, size_t * pDepth, ID3D10Resource * pResource, unsigned int subResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in *pWidth, *pHeight, and *pDepth the dimensions of the subresource of the mapped Direct3D resource pResource which corresponds to subResource.

Because anti-aliased surfaces may have multiple samples per pixel, it is possible that the dimensions of a resource will be an integer factor larger than the dimensions reported by the Direct3D runtime.

The parameters pWidth, pHeight, and pDepth are optional. For 2D surfaces, the value returned in *pDepth will be 0.

If pResource is not of type ID3D10Texture1D, ID3D10Texture2D, or ID3D10Texture3D, or if pResource has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned.

For usage requirements of the subResource parameter see cudaD3D10ResourceGetMappedPointer().

Parameters:
    pWidth - Returned width of surface
    pHeight - Returned height of surface
    pDepth - Returned depth of surface
    pResource - Registered resource to access
    subResource - Subresource of pResource to access

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsSubResourceGetMappedArray
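For resources other than ID3D10Buffer, cudaD3D10ResourceGetMappedPointer() expects subResource to come from the Direct3D 10 subresource calculation. The sketch below shows that calculation under the assumption that it follows the formula Direct3D 10 documents for its helper, MipSlice + ArraySlice * MipLevels. The function name subresourceIndex is hypothetical and stands in for the Direct3D helper so the example can be read without the Direct3D headers.

```cpp
// Hypothetical stand-in for the Direct3D 10 subresource calculation.
// Assumption: index = mipSlice + arraySlice * mipLevels, where mipLevels is
// the number of mipmap levels in each array slice of the texture.
inline unsigned int subresourceIndex(unsigned int mipSlice,
                                     unsigned int arraySlice,
                                     unsigned int mipLevels) {
    return mipSlice + arraySlice * mipLevels;
}
```

Under this assumption, a texture with no array slices and a single mipmap level has exactly one subresource, index 0, which matches the value required for ID3D10Buffer resources.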

4.21.2.8 cudaError_t cudaD3D10ResourceSetMapFlags (ID3D10Resource ∗ pResource, unsigned int flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Set usage flags for mapping the Direct3D resource pResource.

Changes to flags will take effect the next time pResource is mapped. The flags argument may be any of the following:
• cudaD3D10MapFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• cudaD3D10MapFlagsReadOnly: Specifies that CUDA kernels which access this resource will not write to this resource.
• cudaD3D10MapFlagsWriteDiscard: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If pResource has not been registered for use with CUDA then cudaErrorInvalidHandle is returned. If pResource is presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
    pResource - Registered resource to set flags for
    flags - Parameters for resource mapping

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceSetMapFlags

4.21.2.9 cudaError_t cudaD3D10UnmapResources (int count, ID3D10Resource ∗∗ ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the count Direct3D resources in ppResources.

This function provides the synchronization guarantee that any CUDA kernels issued before cudaD3D10UnmapResources() will complete before any Direct3D calls issued after cudaD3D10UnmapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries, then cudaErrorInvalidResourceHandle is returned. If any of ppResources are not presently mapped for access by CUDA then cudaErrorUnknown is returned.

Parameters:
    count - Number of resources to unmap for CUDA
    ppResources - Resources to unmap for CUDA

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnmapResources

4.21.2.10 cudaError_t cudaD3D10UnregisterResource (ID3D10Resource ∗ pResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unregisters the Direct3D resource pResource so it is not accessible by CUDA unless registered again.

If pResource is not registered, then cudaErrorInvalidResourceHandle is returned.

Parameters:
    pResource - Resource to unregister

Returns:
    cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnregisterResource

4.22 OpenGL Interoperability [DEPRECATED]

Functions
• cudaError_t cudaGLMapBufferObject (void ∗∗devPtr, GLuint bufObj)
    Maps a buffer object for access by CUDA.
• cudaError_t cudaGLMapBufferObjectAsync (void ∗∗devPtr, GLuint bufObj, cudaStream_t stream)
    Maps a buffer object for access by CUDA.
• cudaError_t cudaGLRegisterBufferObject (GLuint bufObj)
    Registers a buffer object for access by CUDA.
• cudaError_t cudaGLSetBufferObjectMapFlags (GLuint bufObj, unsigned int flags)
    Set usage flags for mapping an OpenGL buffer.
• cudaError_t cudaGLUnmapBufferObject (GLuint bufObj)
    Unmaps a buffer object for access by CUDA.
• cudaError_t cudaGLUnmapBufferObjectAsync (GLuint bufObj, cudaStream_t stream)
    Unmaps a buffer object for access by CUDA.
• cudaError_t cudaGLUnregisterBufferObject (GLuint bufObj)
    Unregisters a buffer object for access by CUDA.

4.22.1 Detailed Description

This section describes deprecated OpenGL interoperability functionality.

4.22.2 Function Documentation

4.22.2.1 cudaError_t cudaGLMapBufferObject (void ∗∗ devPtr, GLuint bufObj)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the buffer object of ID bufObj into the address space of CUDA and returns in ∗devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

All streams in the current thread are synchronized with the current GL context.

Parameters:
    devPtr - Returned device pointer to CUDA object
    bufObj - Buffer object ID to map

Returns:
    cudaSuccess, cudaErrorMapBufferObjectFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsMapResources

4.22.2.2 cudaError_t cudaGLMapBufferObjectAsync (void ∗∗ devPtr, GLuint bufObj, cudaStream_t stream)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the buffer object of ID bufObj into the address space of CUDA and returns in ∗devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

Stream stream is synchronized with the current GL context.

Parameters:
    devPtr - Returned device pointer to CUDA object
    bufObj - Buffer object ID to map
    stream - Stream to synchronize

Returns:
    cudaSuccess, cudaErrorMapBufferObjectFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsMapResources

4.22.2.3 cudaError_t cudaGLRegisterBufferObject (GLuint bufObj)

Deprecated
    This function is deprecated as of CUDA 3.0.

Registers the buffer object of ID bufObj for access by CUDA. This function must be called before CUDA can map the buffer object. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

Parameters:
    bufObj - Buffer object ID to register

Returns:
    cudaSuccess, cudaErrorInitializationError

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsGLRegisterBuffer

4.22.2.4 cudaError_t cudaGLSetBufferObjectMapFlags (GLuint bufObj, unsigned int flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Set flags for mapping the OpenGL buffer bufObj.

Changes to flags will take effect the next time bufObj is mapped. The flags argument may be any of the following:
• cudaGLMapFlagsNone: Specifies no hints about how this buffer will be used. It is therefore assumed that this buffer will be read from and written to by CUDA kernels. This is the default value.
• cudaGLMapFlagsReadOnly: Specifies that CUDA kernels which access this buffer will not write to the buffer.
• cudaGLMapFlagsWriteDiscard: Specifies that CUDA kernels which access this buffer will not read from the buffer and will write over the entire contents of the buffer, so none of the data previously stored in the buffer will be preserved.

If bufObj has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If bufObj is presently mapped for access by CUDA, then cudaErrorUnknown is returned.

Parameters:
    bufObj - Registered buffer object to set flags for
    flags - Parameters for buffer mapping

Returns:
    cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidResourceHandle, cudaErrorUnknown

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsResourceSetMapFlags
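The failure cases documented for cudaGLSetBufferObjectMapFlags() can be summarized as a small decision function. The sketch below is a host-only model in C, not the runtime's implementation: the BufState type and model_set_map_flags() are hypothetical names introduced for illustration, and the numeric codes are copied from enum cudaError so that the block builds without the CUDA headers.

```c
#include <assert.h>

/* Values copied from enum cudaError; real code would use the runtime's names. */
enum {
    MODEL_SUCCESS = 0,                /* cudaSuccess */
    MODEL_ERROR_UNKNOWN = 30,         /* cudaErrorUnknown */
    MODEL_ERROR_INVALID_HANDLE = 33   /* cudaErrorInvalidResourceHandle */
};

/* Hypothetical interop state of one OpenGL buffer object. */
typedef enum { BUF_UNREGISTERED, BUF_REGISTERED, BUF_MAPPED } BufState;

/* Models the documented outcome of setting map flags in each state. */
int model_set_map_flags(BufState state)
{
    if (state == BUF_UNREGISTERED)
        return MODEL_ERROR_INVALID_HANDLE;  /* never registered with CUDA */
    if (state == BUF_MAPPED)
        return MODEL_ERROR_UNKNOWN;         /* presently mapped by CUDA */
    return MODEL_SUCCESS;                   /* takes effect at the next map */
}

/* Self-check: flags may be set on a registered, unmapped buffer. */
void model_set_map_flags_demo(void)
{
    assert(model_set_map_flags(BUF_REGISTERED) == MODEL_SUCCESS);
}
```

In practice this suggests setting the flags after registering the buffer and before mapping it, since a change made while the buffer is mapped is rejected.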

4.22.2.5 cudaError_t cudaGLUnmapBufferObject (GLuint bufObj)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

All streams in the current thread are synchronized with the current GL context.

Parameters:
    bufObj - Buffer object to unmap

Returns:
    cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorUnmapBufferObjectFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnmapResources

4.22.2.6 cudaError_t cudaGLUnmapBufferObjectAsync (GLuint bufObj, cudaStream_t stream)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

Stream stream is synchronized with the current GL context.

Parameters:
    bufObj - Buffer object to unmap
    stream - Stream to synchronize

Returns:
    cudaSuccess, cudaErrorInvalidDevicePointer, cudaErrorUnmapBufferObjectFailed

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnmapResources

4.22.2.7 cudaError_t cudaGLUnregisterBufferObject (GLuint bufObj)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unregisters the buffer object of ID bufObj for access by CUDA and releases any CUDA resources associated with the buffer. Once a buffer is unregistered, it may no longer be mapped by CUDA. The GL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.

Parameters:
    bufObj - Buffer object to unregister

Returns:
    cudaSuccess

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cudaGraphicsUnregisterResource

4.23 Data types used by CUDA Runtime

Data Structures
• struct cudaChannelFormatDesc
• struct cudaDeviceProp
• struct cudaExtent
• struct cudaFuncAttributes
• struct cudaMemcpy3DParms
• struct cudaPitchedPtr
• struct cudaPos
• struct surfaceReference
• struct textureReference

Enumerations
• enum cudaSurfaceBoundaryMode { cudaBoundaryModeZero = 0, cudaBoundaryModeClamp = 1, cudaBoundaryModeTrap = 2 }
• enum cudaSurfaceFormatMode { cudaFormatModeForced = 0, cudaFormatModeAuto = 1 }
• enum cudaTextureAddressMode { cudaAddressModeWrap = 0, cudaAddressModeClamp = 1, cudaAddressModeMirror = 2, cudaAddressModeBorder = 3 }
• enum cudaTextureFilterMode { cudaFilterModePoint = 0, cudaFilterModeLinear = 1 }
• enum cudaTextureReadMode { cudaReadModeElementType = 0, cudaReadModeNormalizedFloat = 1 }

Data types used by CUDA Runtime

Author:
    NVIDIA Corporation

• enum cudaChannelFormatKind { cudaChannelFormatKindSigned = 0, cudaChannelFormatKindUnsigned = 1, cudaChannelFormatKindFloat = 2, cudaChannelFormatKindNone = 3 }
• enum cudaComputeMode { cudaComputeModeDefault = 0, cudaComputeModeExclusive = 1, cudaComputeModeProhibited = 2 }
• enum cudaError {
    cudaSuccess = 0,
    cudaErrorMissingConfiguration = 1,
    cudaErrorMemoryAllocation = 2,
    cudaErrorInitializationError = 3,
    cudaErrorLaunchFailure = 4,
    cudaErrorPriorLaunchFailure = 5,
    cudaErrorLaunchTimeout = 6,
    cudaErrorLaunchOutOfResources = 7,
    cudaErrorInvalidDeviceFunction = 8,
    cudaErrorInvalidConfiguration = 9,
    cudaErrorInvalidDevice = 10,
    cudaErrorInvalidValue = 11,
    cudaErrorInvalidPitchValue = 12,
    cudaErrorInvalidSymbol = 13,
    cudaErrorMapBufferObjectFailed = 14,
    cudaErrorUnmapBufferObjectFailed = 15,
    cudaErrorInvalidHostPointer = 16,
    cudaErrorInvalidDevicePointer = 17,
    cudaErrorInvalidTexture = 18,
    cudaErrorInvalidTextureBinding = 19,
    cudaErrorInvalidChannelDescriptor = 20,
    cudaErrorInvalidMemcpyDirection = 21,
    cudaErrorAddressOfConstant = 22,
    cudaErrorTextureFetchFailed = 23,
    cudaErrorTextureNotBound = 24,
    cudaErrorSynchronizationError = 25,
    cudaErrorInvalidFilterSetting = 26,
    cudaErrorInvalidNormSetting = 27,
    cudaErrorMixedDeviceExecution = 28,
    cudaErrorCudartUnloading = 29,
    cudaErrorUnknown = 30,
    cudaErrorNotYetImplemented = 31,
    cudaErrorMemoryValueTooLarge = 32,
    cudaErrorInvalidResourceHandle = 33,
    cudaErrorNotReady = 34,
    cudaErrorInsufficientDriver = 35,
    cudaErrorSetOnActiveProcess = 36,
    cudaErrorInvalidSurface = 37,
    cudaErrorNoDevice = 38,
    cudaErrorECCUncorrectable = 39,
    cudaErrorSharedObjectSymbolNotFound = 40,
    cudaErrorSharedObjectInitFailed = 41,
    cudaErrorUnsupportedLimit = 42,
    cudaErrorDuplicateVariableName = 43,
    cudaErrorDuplicateTextureName = 44,
    cudaErrorDuplicateSurfaceName = 45,
    cudaErrorDevicesUnavailable = 46,
    cudaErrorInvalidKernelImage = 47,
    cudaErrorNoKernelImageForDevice = 48,
    cudaErrorIncompatibleDriverContext = 49,
    cudaErrorStartupFailure = 0x7f,
    cudaErrorApiFailureBase = 10000 }
• enum cudaFuncCache { cudaFuncCachePreferNone = 0, cudaFuncCachePreferShared = 1, cudaFuncCachePreferL1 = 2 }
• enum cudaGraphicsCubeFace { cudaGraphicsCubeFacePositiveX = 0x00, cudaGraphicsCubeFaceNegativeX = 0x01, cudaGraphicsCubeFacePositiveY = 0x02, cudaGraphicsCubeFaceNegativeY = 0x03, cudaGraphicsCubeFacePositiveZ = 0x04, cudaGraphicsCubeFaceNegativeZ = 0x05 }
• enum cudaGraphicsMapFlags { cudaGraphicsMapFlagsNone = 0, cudaGraphicsMapFlagsReadOnly = 1, cudaGraphicsMapFlagsWriteDiscard = 2 }
• enum cudaGraphicsRegisterFlags { cudaGraphicsRegisterFlagsNone = 0 }
• enum cudaLimit { cudaLimitStackSize = 0x00, cudaLimitPrintfFifoSize = 0x01, cudaLimitMallocHeapSize = 0x02 }
• enum cudaMemcpyKind { cudaMemcpyHostToHost = 0, cudaMemcpyHostToDevice = 1, cudaMemcpyDeviceToHost = 2, cudaMemcpyDeviceToDevice = 3 }
• typedef enum cudaError cudaError_t

• typedef struct CUevent_st ∗ cudaEvent_t
• typedef struct cudaGraphicsResource ∗ cudaGraphicsResource_t
• typedef struct CUstream_st ∗ cudaStream_t
• typedef struct CUuuid_st cudaUUID_t
• #define cudaArrayDefault 0x00
• #define cudaArraySurfaceLoadStore 0x02
• #define cudaDeviceBlockingSync 4
• #define cudaDeviceLmemResizeToMax 16
• #define cudaDeviceMapHost 8
• #define cudaDeviceMask 0x1f
• #define cudaDevicePropDontCare
• #define cudaDeviceScheduleAuto 0
• #define cudaDeviceScheduleSpin 1
• #define cudaDeviceScheduleYield 2
• #define cudaEventBlockingSync 1
• #define cudaEventDefault 0
• #define cudaEventDisableTiming 2
• #define cudaHostAllocDefault 0
• #define cudaHostAllocMapped 2
• #define cudaHostAllocPortable 1
• #define cudaHostAllocWriteCombined 4

4.23.1 Define Documentation

4.23.1.1 #define cudaArrayDefault 0x00
Default CUDA array allocation flag

4.23.1.2 #define cudaArraySurfaceLoadStore 0x02
Must be set in cudaMallocArray in order to bind surfaces to the CUDA array

4.23.1.3 #define cudaDeviceBlockingSync 4
Device flag - Use blocking synchronization

4.23.1.4 #define cudaDeviceLmemResizeToMax 16
Device flag - Keep local memory allocation after launch

4.23.1.5 #define cudaDeviceMapHost 8
Device flag - Support mapped pinned allocations

4.23.1.6 #define cudaDeviceMask 0x1f
Device flags mask

4.23.1.7 #define cudaDevicePropDontCare
Empty device properties

4.23.1.8 #define cudaDeviceScheduleAuto 0
Device flag - Automatic scheduling

4.23.1.9 #define cudaDeviceScheduleSpin 1
Device flag - Spin default scheduling

4.23.1.10 #define cudaDeviceScheduleYield 2
Device flag - Yield default scheduling

4.23.1.11 #define cudaEventBlockingSync 1
Event uses blocking synchronization

4.23.1.12 #define cudaEventDefault 0
Default event flag

4.23.1.13 #define cudaEventDisableTiming 2
Event will not record timing data

4.23.1.14 #define cudaHostAllocDefault 0
Default page-locked allocation flag

4.23.1.15 #define cudaHostAllocMapped 2
Map allocation into device space

4.23.1.16 #define cudaHostAllocPortable 1
Pinned memory accessible by all CUDA contexts

4.23.1.17 #define cudaHostAllocWriteCombined 4
Write-combined memory
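The device flag values listed in the define documentation are distinct bits, and cudaDeviceMask (0x1f) is their bitwise OR. The following C sketch illustrates that relationship under stated assumptions: the FLAG_ macros are local copies of the documented values, renamed so the block builds without the CUDA headers, and device_flags_in_mask() is a hypothetical helper rather than a runtime function.

```c
#include <assert.h>

/* Local copies of the documented device flag values (names are stand-ins). */
#define FLAG_SCHEDULE_AUTO       0u     /* cudaDeviceScheduleAuto */
#define FLAG_SCHEDULE_SPIN       1u     /* cudaDeviceScheduleSpin */
#define FLAG_SCHEDULE_YIELD      2u     /* cudaDeviceScheduleYield */
#define FLAG_BLOCKING_SYNC       4u     /* cudaDeviceBlockingSync */
#define FLAG_MAP_HOST            8u     /* cudaDeviceMapHost */
#define FLAG_LMEM_RESIZE_TO_MAX  16u    /* cudaDeviceLmemResizeToMax */
#define FLAG_MASK                0x1fu  /* cudaDeviceMask */

/* Returns 1 when flags sets no bit outside the device flags mask. */
int device_flags_in_mask(unsigned int flags)
{
    return (flags & ~FLAG_MASK) == 0u;
}

/* Self-check: the mask is exactly the OR of the individual flag bits. */
void device_flags_demo(void)
{
    unsigned int all = FLAG_SCHEDULE_SPIN | FLAG_SCHEDULE_YIELD |
                       FLAG_BLOCKING_SYNC | FLAG_MAP_HOST |
                       FLAG_LMEM_RESIZE_TO_MAX;
    assert(all == FLAG_MASK);
    assert(device_flags_in_mask(FLAG_MAP_HOST | FLAG_BLOCKING_SYNC));
}
```

The check only rejects bits outside the mask. Whether a particular combination inside the mask is meaningful is not stated in this section, so the sketch makes no claim about it.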

4.23.2 Typedef Documentation

4.23.2.1 typedef enum cudaError cudaError_t
CUDA Error types

4.23.2.2 typedef struct CUevent_st∗ cudaEvent_t
CUDA event types

4.23.2.3 typedef struct cudaGraphicsResource∗ cudaGraphicsResource_t
CUDA graphics resource types

4.23.2.4 typedef struct CUstream_st∗ cudaStream_t
CUDA stream

4.23.2.5 typedef struct CUuuid_st cudaUUID_t
CUDA UUID types

4.23.3 Enumeration Type Documentation

4.23.3.1 enum cudaChannelFormatKind
Channel format kind

Enumerator:
    cudaChannelFormatKindSigned Signed channel format
    cudaChannelFormatKindUnsigned Unsigned channel format
    cudaChannelFormatKindFloat Float channel format
    cudaChannelFormatKindNone No channel format

4.23.3.2 enum cudaComputeMode
CUDA device compute modes

Enumerator:
    cudaComputeModeDefault Default compute mode (Multiple threads can use cudaSetDevice() with this device)
    cudaComputeModeExclusive Compute-exclusive mode (Only one thread will be able to use cudaSetDevice() with this device)
    cudaComputeModeProhibited Compute-prohibited mode (No threads can use cudaSetDevice() with this device)

4.23.3.3 enum cudaError
CUDA error types

Enumerator:
    cudaSuccess The API call returned with no errors. In the case of query calls, this can also mean that the operation being queried is complete (see cudaEventQuery() and cudaStreamQuery()).
    cudaErrorMissingConfiguration The device function being invoked (usually via cudaLaunch()) was not previously configured via the cudaConfigureCall() function.
    cudaErrorMemoryAllocation The API call failed because it was unable to allocate enough memory to perform the requested operation.
    cudaErrorInitializationError The API call failed because the CUDA driver and runtime could not be initialized.
    cudaErrorLaunchFailure An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.
    cudaErrorPriorLaunchFailure This indicated that a previous kernel launch failed. This was previously used for device emulation of kernel launches.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorLaunchTimeout This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled - see the device property kernelExecTimeoutEnabled for more information. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.
    cudaErrorLaunchOutOfResources This indicates that a launch did not occur because it did not have appropriate resources. Although this error is similar to cudaErrorInvalidConfiguration, this error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel's register count.
    cudaErrorInvalidDeviceFunction The requested device function does not exist or is not compiled for the proper device architecture.
    cudaErrorInvalidConfiguration This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.
    cudaErrorInvalidDevice This indicates that the device ordinal supplied by the user does not correspond to a valid CUDA device.
    cudaErrorInvalidValue This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
    cudaErrorInvalidPitchValue This indicates that one or more of the pitch-related parameters passed to the API call is not within the acceptable range for pitch.
    cudaErrorInvalidSymbol This indicates that the symbol name/identifier passed to the API call is not a valid name or identifier.
    cudaErrorMapBufferObjectFailed This indicates that the buffer object could not be mapped.
    cudaErrorUnmapBufferObjectFailed This indicates that the buffer object could not be unmapped.
    cudaErrorInvalidHostPointer This indicates that at least one host pointer passed to the API call is not a valid host pointer.

    cudaErrorInvalidDevicePointer This indicates that at least one device pointer passed to the API call is not a valid device pointer.
    cudaErrorInvalidTexture This indicates that the texture passed to the API call is not a valid texture.
    cudaErrorInvalidTextureBinding This indicates that the texture binding is not valid. This occurs if you call cudaGetTextureAlignmentOffset() with an unbound texture.
    cudaErrorInvalidChannelDescriptor This indicates that the channel descriptor passed to the API call is not valid. This occurs if the format is not one of the formats specified by cudaChannelFormatKind, or if one of the dimensions is invalid.
    cudaErrorInvalidMemcpyDirection This indicates that the direction of the memcpy passed to the API call is not one of the types specified by cudaMemcpyKind.
    cudaErrorAddressOfConstant This indicated that the user has taken the address of a constant variable, which was forbidden up until the CUDA 3.1 release.
        Deprecated This error return is deprecated as of CUDA 3.1. Variables in constant memory may now have their address taken by the runtime via cudaGetSymbolAddress().
    cudaErrorTextureFetchFailed This indicated that a texture fetch was not able to be performed. This was previously used for device emulation of texture operations.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorTextureNotBound This indicated that a texture was not bound for access. This was previously used for device emulation of texture operations.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorSynchronizationError This indicated that a synchronization operation had failed. This was previously used for some device emulation functions.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorInvalidFilterSetting This indicates that a non-float texture was being accessed with linear filtering. This is not supported by CUDA.
    cudaErrorInvalidNormSetting This indicates that an attempt was made to read a non-float texture as a normalized float. This is not supported by CUDA.
    cudaErrorMixedDeviceExecution Mixing of device and device emulation code was not allowed.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorCudartUnloading This indicated an issue with calling API functions during the unload process of the CUDA runtime in prior releases.
        Deprecated This error return is deprecated as of CUDA 3.2.
    cudaErrorUnknown This indicates that an unknown internal error has occurred.

    cudaErrorNotYetImplemented This indicates that the API call is not yet implemented. Production releases of CUDA will never return this error.
    cudaErrorMemoryValueTooLarge This indicated that an emulated device pointer exceeded the 32-bit address range.
        Deprecated This error return is deprecated as of CUDA 3.1. Device emulation mode was removed with the CUDA 3.1 release.
    cudaErrorInvalidResourceHandle This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like cudaStream_t and cudaEvent_t.
    cudaErrorNotReady This indicates that asynchronous operations issued previously have not completed yet. This result is not actually an error, but must be indicated differently than cudaSuccess (which indicates completion). Calls that may return this value include cudaEventQuery() and cudaStreamQuery().
    cudaErrorInsufficientDriver This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration. Users should install an updated NVIDIA display driver to allow the application to run.
    cudaErrorSetOnActiveProcess This indicates that the user has called cudaSetDevice(), cudaSetValidDevices(), cudaSetDeviceFlags(), cudaD3D9SetDirect3DDevice(), cudaD3D10SetDirect3DDevice(), cudaD3D11SetDirect3DDevice(), or cudaVDPAUSetVDPAUDevice() after initializing the CUDA runtime by calling non-device management operations (allocating memory and launching kernels are examples of non-device management operations). This error can also be returned if using runtime/driver interoperability and there is an existing CUcontext active on the host thread.
    cudaErrorInvalidSurface This indicates that the surface passed to the API call is not a valid surface.
    cudaErrorNoDevice This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
    cudaErrorECCUncorrectable This indicates that an uncorrectable ECC error was detected during execution.
    cudaErrorSharedObjectSymbolNotFound This indicates that a link to a shared object failed to resolve.
    cudaErrorSharedObjectInitFailed This indicates that initialization of a shared object failed.
    cudaErrorUnsupportedLimit This indicates that the cudaLimit passed to the API call is not supported by the active device.
    cudaErrorDuplicateVariableName This indicates that multiple global or constant variables (across separate CUDA source files in the application) share the same string name.
    cudaErrorDuplicateTextureName This indicates that multiple textures (across separate CUDA source files in the application) share the same string name.
    cudaErrorDuplicateSurfaceName This indicates that multiple surfaces (across separate CUDA source files in the application) share the same string name.
    cudaErrorDevicesUnavailable This indicates that all CUDA devices are busy or unavailable at the current time. Devices are often busy/unavailable due to use of cudaComputeModeExclusive or cudaComputeModeProhibited. They can also be unavailable due to memory constraints on a device that already has active CUDA work being performed.
    cudaErrorInvalidKernelImage This indicates that the device kernel image is invalid.
    cudaErrorNoKernelImageForDevice This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
    cudaErrorIncompatibleDriverContext This indicates that the current context is not compatible with this version of the CUDA Runtime. This can only occur if you are using CUDA Runtime/Driver interoperability and have created an existing Driver context using an older API. Please see Interactions with the CUDA Driver API for more information.

    cudaErrorStartupFailure This indicates an internal startup failure in the CUDA runtime.
    cudaErrorApiFailureBase Any unhandled CUDA driver error is added to this value and returned via the runtime. Production releases of CUDA should not return such errors.

4.23.3.4 enum cudaFuncCache
CUDA function cache configurations

Enumerator:
    cudaFuncCachePreferNone Default function cache configuration, no preference
    cudaFuncCachePreferShared Prefer larger shared memory and smaller L1 cache
    cudaFuncCachePreferL1 Prefer larger L1 cache and smaller shared memory

4.23.3.5 enum cudaGraphicsCubeFace
CUDA graphics interop array indices for cube maps

Enumerator:
    cudaGraphicsCubeFacePositiveX Positive X face of cubemap
    cudaGraphicsCubeFaceNegativeX Negative X face of cubemap
    cudaGraphicsCubeFacePositiveY Positive Y face of cubemap
    cudaGraphicsCubeFaceNegativeY Negative Y face of cubemap
    cudaGraphicsCubeFacePositiveZ Positive Z face of cubemap
    cudaGraphicsCubeFaceNegativeZ Negative Z face of cubemap

4.23.3.6 enum cudaGraphicsMapFlags
CUDA graphics interop map flags

Enumerator:
    cudaGraphicsMapFlagsNone Default. Assume resource can be read/written
    cudaGraphicsMapFlagsReadOnly CUDA will not write to this resource
    cudaGraphicsMapFlagsWriteDiscard CUDA will only write to and will not read from this resource

4.23.3.7 enum cudaGraphicsRegisterFlags
CUDA graphics interop register flags

Enumerator:
    cudaGraphicsRegisterFlagsNone Default
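Two entries of enum cudaError need special handling by callers: cudaErrorNotReady is described as not actually an error, and cudaErrorApiFailureBase is an offset added to an unhandled driver error. The C sketch below shows one way to interpret such codes. It is an illustration under stated assumptions, not runtime code: the CODE_ constants are local copies of the documented values, classify_query() and wrapped_driver_error() are hypothetical helpers, and the second helper assumes that every value at or above the base is a wrapped driver error.

```c
#include <assert.h>

/* Values copied from enum cudaError so the sketch builds without CUDA headers. */
enum {
    CODE_SUCCESS = 0,              /* cudaSuccess */
    CODE_NOT_READY = 34,           /* cudaErrorNotReady */
    CODE_API_FAILURE_BASE = 10000  /* cudaErrorApiFailureBase */
};

typedef enum { QUERY_COMPLETE, QUERY_PENDING, QUERY_FAILED } QueryOutcome;

/* Interprets the return value of a query call such as cudaEventQuery(). */
QueryOutcome classify_query(int code)
{
    if (code == CODE_SUCCESS)
        return QUERY_COMPLETE;   /* the queried operation has finished */
    if (code == CODE_NOT_READY)
        return QUERY_PENDING;    /* still running; not a failure */
    return QUERY_FAILED;
}

/* Recovers the driver error folded into a runtime code, or -1 if the code
   is below the base (assumption: all codes at or above the base are wrapped). */
int wrapped_driver_error(int code)
{
    return code >= CODE_API_FAILURE_BASE ? code - CODE_API_FAILURE_BASE : -1;
}

/* Self-check of both helpers. */
void error_helpers_demo(void)
{
    assert(classify_query(CODE_NOT_READY) == QUERY_PENDING);
    assert(wrapped_driver_error(CODE_API_FAILURE_BASE + 7) == 7);
}
```

A polling loop built on classify_query() would keep waiting on QUERY_PENDING and report an error only on QUERY_FAILED.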

4.23.3.8 enum cudaLimit
CUDA Limits

Enumerator:
    cudaLimitStackSize GPU thread stack size
    cudaLimitPrintfFifoSize GPU printf FIFO size
    cudaLimitMallocHeapSize GPU malloc heap size

4.23.3.9 enum cudaMemcpyKind
CUDA memory copy types

Enumerator:
    cudaMemcpyHostToHost Host -> Host
    cudaMemcpyHostToDevice Host -> Device
    cudaMemcpyDeviceToHost Device -> Host
    cudaMemcpyDeviceToDevice Device -> Device

4.23.3.10 enum cudaSurfaceBoundaryMode
CUDA Surface boundary modes

Enumerator:
    cudaBoundaryModeZero Zero boundary mode
    cudaBoundaryModeClamp Clamp boundary mode
    cudaBoundaryModeTrap Trap boundary mode

4.23.3.11 enum cudaSurfaceFormatMode
CUDA Surface format modes

Enumerator:
    cudaFormatModeForced Forced format mode
    cudaFormatModeAuto Auto format mode

4.23.3.12 enum cudaTextureAddressMode
CUDA texture address modes

Enumerator:
    cudaAddressModeWrap Wrapping address mode
    cudaAddressModeClamp Clamp to edge address mode
    cudaAddressModeMirror Mirror address mode
    cudaAddressModeBorder Border address mode
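With the values listed for enum cudaMemcpyKind (0 to 3), the copy direction happens to encode two bits: whether the source is on the device and whether the destination is on the device. The C sketch below makes that observation concrete. The KIND_ constants are local copies of the documented values and memcpy_kind_for() is a hypothetical helper; the bit layout is a property of these particular values, not something the manual promises.

```c
#include <assert.h>

/* Local copies of the documented cudaMemcpyKind values. */
enum {
    KIND_HOST_TO_HOST = 0,     /* cudaMemcpyHostToHost */
    KIND_HOST_TO_DEVICE = 1,   /* cudaMemcpyHostToDevice */
    KIND_DEVICE_TO_HOST = 2,   /* cudaMemcpyDeviceToHost */
    KIND_DEVICE_TO_DEVICE = 3  /* cudaMemcpyDeviceToDevice */
};

/* Builds the copy kind from where the source and destination live:
   bit 1 is set for a device source, bit 0 for a device destination. */
int memcpy_kind_for(int src_on_device, int dst_on_device)
{
    return ((src_on_device != 0) << 1) | (dst_on_device != 0);
}

/* Self-check against one of the documented values. */
void memcpy_kind_demo(void)
{
    assert(memcpy_kind_for(0, 1) == KIND_HOST_TO_DEVICE);
}
```

Code that must stay portable across toolkit versions should still pass the named enumerators rather than rely on this arithmetic.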

4.23.3.13 enum cudaTextureFilterMode
CUDA texture filter modes

Enumerator:
    cudaFilterModePoint Point filter mode
    cudaFilterModeLinear Linear filter mode

4.23.3.14 enum cudaTextureReadMode
CUDA texture read modes

Enumerator:
    cudaReadModeElementType Read texture as specified element type
    cudaReadModeNormalizedFloat Read texture as normalized float

4.24 CUDA Driver API

Modules
• Data types used by CUDA driver
• Initialization
• Version Management
• Device Management
• Context Management
• Module Management
• Memory Management
• Stream Management
• Event Management
• Execution Control
• Texture Reference Management
• Surface Reference Management
• Graphics Interoperability
• OpenGL Interoperability
• Direct3D 9 Interoperability
• Direct3D 10 Interoperability
• Direct3D 11 Interoperability
• VDPAU Interoperability

4.24.1 Detailed Description

This section describes the low-level CUDA driver application programming interface.

4.25 Data types used by CUDA driver

Data Structures
• struct CUDA_ARRAY3D_DESCRIPTOR_st
• struct CUDA_ARRAY_DESCRIPTOR_st
• struct CUDA_MEMCPY2D_st
• struct CUDA_MEMCPY3D_st
• struct CUdevprop_st

Defines
• #define CU_MEMHOSTALLOC_DEVICEMAP 0x02
• #define CU_MEMHOSTALLOC_PORTABLE 0x01
• #define CU_MEMHOSTALLOC_WRITECOMBINED 0x04
• #define CU_PARAM_TR_DEFAULT -1
• #define CU_TRSA_OVERRIDE_FORMAT 0x01
• #define CU_TRSF_NORMALIZED_COORDINATES 0x02
• #define CU_TRSF_READ_AS_INTEGER 0x01
• #define CU_TRSF_SRGB 0x10
• #define CUDA_ARRAY3D_2DARRAY 0x01
• #define CUDA_ARRAY3D_SURFACE_LDST 0x02
• #define CUDA_VERSION 3020

Typedefs
• typedef enum CUaddress_mode_enum CUaddress_mode
• typedef struct CUarray_st ∗ CUarray
• typedef enum CUarray_cubemap_face_enum CUarray_cubemap_face
• typedef enum CUarray_format_enum CUarray_format
• typedef enum CUcomputemode_enum CUcomputemode
• typedef struct CUctx_st ∗ CUcontext
• typedef enum CUctx_flags_enum CUctx_flags
• typedef struct CUDA_ARRAY3D_DESCRIPTOR_st CUDA_ARRAY3D_DESCRIPTOR
• typedef struct CUDA_ARRAY_DESCRIPTOR_st CUDA_ARRAY_DESCRIPTOR
• typedef struct CUDA_MEMCPY2D_st CUDA_MEMCPY2D
• typedef struct CUDA_MEMCPY3D_st CUDA_MEMCPY3D
• typedef int CUdevice
• typedef enum CUdevice_attribute_enum CUdevice_attribute
• typedef unsigned int CUdeviceptr
• typedef struct CUdevprop_st CUdevprop
• typedef struct CUevent_st ∗ CUevent
• typedef enum CUevent_flags_enum CUevent_flags
• typedef enum CUfilter_mode_enum CUfilter_mode
• typedef enum CUfunc_cache_enum CUfunc_cache
• typedef struct CUfunc_st ∗ CUfunction
• typedef enum CUfunction_attribute_enum CUfunction_attribute
• typedef enum CUgraphicsMapResourceFlags_enum CUgraphicsMapResourceFlags
• typedef enum CUgraphicsRegisterFlags_enum CUgraphicsRegisterFlags

• typedef struct CUgraphicsResource_st ∗ CUgraphicsResource
• typedef enum CUjit_fallback_enum CUjit_fallback
• typedef enum CUjit_option_enum CUjit_option
• typedef enum CUjit_target_enum CUjit_target
• typedef enum CUlimit_enum CUlimit
• typedef enum CUmemorytype_enum CUmemorytype
• typedef struct CUmod_st ∗ CUmodule
• typedef enum cudaError_enum CUresult
• typedef struct CUstream_st ∗ CUstream
• typedef struct CUsurfref_st ∗ CUsurfref
• typedef struct CUtexref_st ∗ CUtexref

Enumerations
• enum CUaddress_mode_enum {
    CU_TR_ADDRESS_MODE_WRAP = 0,
    CU_TR_ADDRESS_MODE_CLAMP = 1,
    CU_TR_ADDRESS_MODE_MIRROR = 2,
    CU_TR_ADDRESS_MODE_BORDER = 3 }
• enum CUarray_cubemap_face_enum {
    CU_CUBEMAP_FACE_POSITIVE_X = 0x00,
    CU_CUBEMAP_FACE_NEGATIVE_X = 0x01,
    CU_CUBEMAP_FACE_POSITIVE_Y = 0x02,
    CU_CUBEMAP_FACE_NEGATIVE_Y = 0x03,
    CU_CUBEMAP_FACE_POSITIVE_Z = 0x04,
    CU_CUBEMAP_FACE_NEGATIVE_Z = 0x05 }
• enum CUarray_format_enum {
    CU_AD_FORMAT_UNSIGNED_INT8 = 0x01,
    CU_AD_FORMAT_UNSIGNED_INT16 = 0x02,
    CU_AD_FORMAT_UNSIGNED_INT32 = 0x03,
    CU_AD_FORMAT_SIGNED_INT8 = 0x08,
    CU_AD_FORMAT_SIGNED_INT16 = 0x09,
    CU_AD_FORMAT_SIGNED_INT32 = 0x0a,
    CU_AD_FORMAT_HALF = 0x10,
    CU_AD_FORMAT_FLOAT = 0x20 }
• enum CUcomputemode_enum {
    CU_COMPUTEMODE_DEFAULT = 0,
    CU_COMPUTEMODE_EXCLUSIVE = 1,
    CU_COMPUTEMODE_PROHIBITED = 2 }
• enum CUctx_flags_enum {
    CU_CTX_SCHED_AUTO = 0,
    CU_CTX_SCHED_SPIN = 1,
    CU_CTX_SCHED_YIELD = 2,
    CU_CTX_BLOCKING_SYNC = 4,
    CU_CTX_MAP_HOST = 8,
    CU_CTX_LMEM_RESIZE_TO_MAX = 16 }

• enum cudaError_enum {
    CUDA_SUCCESS = 0,
    CUDA_ERROR_INVALID_VALUE = 1,
    CUDA_ERROR_OUT_OF_MEMORY = 2,
    CUDA_ERROR_NOT_INITIALIZED = 3,
    CUDA_ERROR_DEINITIALIZED = 4,
    CUDA_ERROR_NO_DEVICE = 100,
    CUDA_ERROR_INVALID_DEVICE = 101,
    CUDA_ERROR_INVALID_IMAGE = 200,
    CUDA_ERROR_INVALID_CONTEXT = 201,
    CUDA_ERROR_CONTEXT_ALREADY_CURRENT = 202,
    CUDA_ERROR_MAP_FAILED = 205,
    CUDA_ERROR_UNMAP_FAILED = 206,
    CUDA_ERROR_ARRAY_IS_MAPPED = 207,
    CUDA_ERROR_ALREADY_MAPPED = 208,
    CUDA_ERROR_NO_BINARY_FOR_GPU = 209,
    CUDA_ERROR_ALREADY_ACQUIRED = 210,
    CUDA_ERROR_NOT_MAPPED = 211,
    CUDA_ERROR_NOT_MAPPED_AS_ARRAY = 212,
    CUDA_ERROR_NOT_MAPPED_AS_POINTER = 213,
    CUDA_ERROR_ECC_UNCORRECTABLE = 214,
    CUDA_ERROR_UNSUPPORTED_LIMIT = 215,
    CUDA_ERROR_INVALID_SOURCE = 300,
    CUDA_ERROR_FILE_NOT_FOUND = 301,
    CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND = 302,
    CUDA_ERROR_SHARED_OBJECT_INIT_FAILED = 303,
    CUDA_ERROR_OPERATING_SYSTEM = 304,
    CUDA_ERROR_INVALID_HANDLE = 400,
    CUDA_ERROR_NOT_FOUND = 500,
    CUDA_ERROR_NOT_READY = 600,
    CUDA_ERROR_LAUNCH_FAILED = 700,
    CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES = 701,
    CUDA_ERROR_LAUNCH_TIMEOUT = 702,
    CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING = 703,
    CUDA_ERROR_UNKNOWN = 999 }
• enum CUdevice_attribute_enum {
    CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 1,
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X = 2,
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y = 3,
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z = 4,
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X = 5,
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y = 6,
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z = 7,
    CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK = 8,
    CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK = 8,
    CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY = 9,
    CU_DEVICE_ATTRIBUTE_WARP_SIZE = 10,
    CU_DEVICE_ATTRIBUTE_MAX_PITCH = 11,
    CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK = 12,
    CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK = 12,
    CU_DEVICE_ATTRIBUTE_CLOCK_RATE = 13,
    CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT = 14,
    CU_DEVICE_ATTRIBUTE_GPU_OVERLAP = 15,
    CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16,
    CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT = 17,
    CU_DEVICE_ATTRIBUTE_INTEGRATED = 18,
    CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY = 19,
    CU_DEVICE_ATTRIBUTE_COMPUTE_MODE = 20,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH = 21,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH = 22,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT = 23,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH = 24,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT = 25,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH = 26,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH = 27,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT = 28,
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES = 29,
    CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT = 30,
    CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
    CU_DEVICE_ATTRIBUTE_ECC_ENABLED = 32,
    CU_DEVICE_ATTRIBUTE_PCI_BUS_ID = 33,
    CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID = 34,
    CU_DEVICE_ATTRIBUTE_TCC_DRIVER = 35 }
• enum CUevent_flags_enum {
    CU_EVENT_DEFAULT = 0,
    CU_EVENT_BLOCKING_SYNC = 1,
    CU_EVENT_DISABLE_TIMING = 2 }
• enum CUfilter_mode_enum {
    CU_TR_FILTER_MODE_POINT = 0,
    CU_TR_FILTER_MODE_LINEAR = 1 }

• enum CUfunc_cache_enum {
    CU_FUNC_CACHE_PREFER_NONE = 0x00,
    CU_FUNC_CACHE_PREFER_SHARED = 0x01,
    CU_FUNC_CACHE_PREFER_L1 = 0x02 }
• enum CUfunction_attribute_enum {
    CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK = 0,
    CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES = 1,
    CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES = 2,
    CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES = 3,
    CU_FUNC_ATTRIBUTE_NUM_REGS = 4,
    CU_FUNC_ATTRIBUTE_PTX_VERSION = 5,
    CU_FUNC_ATTRIBUTE_BINARY_VERSION = 6 }
• enum CUgraphicsMapResourceFlags_enum
• enum CUgraphicsRegisterFlags_enum
• enum CUjit_fallback_enum {
    CU_PREFER_PTX = 0,
    CU_PREFER_BINARY }
• enum CUjit_option_enum {
    CU_JIT_MAX_REGISTERS = 0,
    CU_JIT_THREADS_PER_BLOCK,
    CU_JIT_WALL_TIME,
    CU_JIT_INFO_LOG_BUFFER,
    CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES,
    CU_JIT_ERROR_LOG_BUFFER,
    CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES,
    CU_JIT_OPTIMIZATION_LEVEL,
    CU_JIT_TARGET_FROM_CUCONTEXT,
    CU_JIT_TARGET,
    CU_JIT_FALLBACK_STRATEGY }
• enum CUjit_target_enum {
    CU_TARGET_COMPUTE_10 = 0,
    CU_TARGET_COMPUTE_11,
    CU_TARGET_COMPUTE_12,
    CU_TARGET_COMPUTE_13,
    CU_TARGET_COMPUTE_20,
    CU_TARGET_COMPUTE_21 }
• enum CUlimit_enum {
    CU_LIMIT_STACK_SIZE = 0x00,
    CU_LIMIT_PRINTF_FIFO_SIZE = 0x01,
    CU_LIMIT_MALLOC_HEAP_SIZE = 0x02 }
• enum CUmemorytype_enum {
    CU_MEMORYTYPE_HOST = 0x01,
    CU_MEMORYTYPE_DEVICE = 0x02,
    CU_MEMORYTYPE_ARRAY = 0x03 }

4.25.1 Define Documentation

4.25.1.1 #define CU_MEMHOSTALLOC_DEVICEMAP 0x02

If set, host memory is mapped into CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer. Flag for cuMemHostAlloc()

4.25.1.2 #define CU_MEMHOSTALLOC_PORTABLE 0x01

If set, host memory is portable between CUDA contexts. Flag for cuMemHostAlloc()

4.25.1.3 #define CU_MEMHOSTALLOC_WRITECOMBINED 0x04

If set, host memory is allocated as write-combined - fast to write, faster to DMA, slow to read except via SSE4 streaming load instruction (MOVNTDQA). Flag for cuMemHostAlloc()

4.25.1.4 #define CU_PARAM_TR_DEFAULT -1

For texture references loaded into the module, use default texunit from texture reference.

4.25.1.5 #define CU_TRSA_OVERRIDE_FORMAT 0x01

Override the texref format with a format inferred from the array. Flag for cuTexRefSetArray()

4.25.1.6 #define CU_TRSF_NORMALIZED_COORDINATES 0x02

Use normalized texture coordinates in the range [0,1) instead of [0,dim). Flag for cuTexRefSetFlags()

4.25.1.7 #define CU_TRSF_READ_AS_INTEGER 0x01

Read the texture as integers rather than promoting the values to floats in the range [0,1]. Flag for cuTexRefSetFlags()

4.25.1.8 #define CU_TRSF_SRGB 0x10

Perform sRGB->linear conversion during texture read. Flag for cuTexRefSetFlags()

4.25.1.9 #define CUDA_ARRAY3D_2DARRAY 0x01

If set, the CUDA array contains an array of 2D slices and the Depth member of CUDA_ARRAY3D_DESCRIPTOR specifies the number of slices, not the depth of a 3D array.

4.25.1.10 #define CUDA_ARRAY3D_SURFACE_LDST 0x02

This flag must be set in order to bind a surface reference to the CUDA array

4.25.1.11 #define CUDA_VERSION 3020

CUDA API version number

4.25.2 Typedef Documentation

4.25.2.1 typedef enum CUaddress_mode_enum CUaddress_mode

Texture reference addressing modes

4.25.2.2 typedef struct CUarray_st∗ CUarray

CUDA array

4.25.2.3 typedef enum CUarray_cubemap_face_enum CUarray_cubemap_face

Array indices for cube faces

4.25.2.4 typedef enum CUarray_format_enum CUarray_format

Array formats

4.25.2.5 typedef enum CUcomputemode_enum CUcomputemode

Compute Modes

4.25.2.6 typedef struct CUctx_st∗ CUcontext

CUDA context

4.25.2.7 typedef enum CUctx_flags_enum CUctx_flags

Context creation flags

4.25.2.8 typedef struct CUDA_ARRAY3D_DESCRIPTOR_st CUDA_ARRAY3D_DESCRIPTOR

3D array descriptor

4.25.2.9 typedef struct CUDA_ARRAY_DESCRIPTOR_st CUDA_ARRAY_DESCRIPTOR

Array descriptor

4.25.2.10 typedef struct CUDA_MEMCPY2D_st CUDA_MEMCPY2D

2D memory copy parameters

4.25.2.11 typedef struct CUDA_MEMCPY3D_st CUDA_MEMCPY3D

3D memory copy parameters

4.25.2.12 typedef int CUdevice

CUDA device

4.25.2.13 typedef enum CUdevice_attribute_enum CUdevice_attribute

Device properties

4.25.2.14 typedef unsigned int CUdeviceptr

CUDA device pointer

4.25.2.15 typedef struct CUdevprop_st CUdevprop

Legacy device properties

4.25.2.16 typedef struct CUevent_st∗ CUevent

CUDA event

4.25.2.17 typedef enum CUevent_flags_enum CUevent_flags

Event creation flags

4.25.2.18 typedef enum CUfilter_mode_enum CUfilter_mode

Texture reference filtering modes

4.25.2.19 typedef enum CUfunc_cache_enum CUfunc_cache

Function cache configurations

4.25.2.20 typedef struct CUfunc_st∗ CUfunction

CUDA function

4.25.2.21 typedef enum CUfunction_attribute_enum CUfunction_attribute

Function properties

4.25.2.22 typedef enum CUgraphicsMapResourceFlags_enum CUgraphicsMapResourceFlags

Flags for mapping and unmapping interop resources

4.25.2.23 typedef enum CUgraphicsRegisterFlags_enum CUgraphicsRegisterFlags

Flags to register a graphics resource

4.25.2.24 typedef struct CUgraphicsResource_st∗ CUgraphicsResource

CUDA graphics interop resource

4.25.2.25 typedef enum CUjit_fallback_enum CUjit_fallback

Cubin matching fallback strategies

4.25.2.26 typedef enum CUjit_option_enum CUjit_option

Online compiler options

4.25.2.27 typedef enum CUjit_target_enum CUjit_target

Online compilation targets

4.25.2.28 typedef enum CUlimit_enum CUlimit

Limits

4.25.2.29 typedef enum CUmemorytype_enum CUmemorytype

Memory types

4.25.2.30 typedef struct CUmod_st∗ CUmodule

CUDA module

4.25.2.31 typedef enum cudaError_enum CUresult

Error codes

4.25.2.32 typedef struct CUstream_st∗ CUstream

CUDA stream

4.25.2.33 typedef struct CUsurfref_st∗ CUsurfref

CUDA surface reference

4.25.2.34 typedef struct CUtexref_st∗ CUtexref

CUDA texture reference

4.25.3 Enumeration Type Documentation

4.25.3.1 enum CUaddress_mode_enum

Texture reference addressing modes

Enumerator:
    CU_TR_ADDRESS_MODE_WRAP Wrapping address mode
    CU_TR_ADDRESS_MODE_CLAMP Clamp to edge address mode
    CU_TR_ADDRESS_MODE_MIRROR Mirror address mode
    CU_TR_ADDRESS_MODE_BORDER Border address mode

4.25.3.2 enum CUarray_cubemap_face_enum

Array indices for cube faces

Enumerator:
    CU_CUBEMAP_FACE_POSITIVE_X Positive X face of cubemap
    CU_CUBEMAP_FACE_NEGATIVE_X Negative X face of cubemap
    CU_CUBEMAP_FACE_POSITIVE_Y Positive Y face of cubemap
    CU_CUBEMAP_FACE_NEGATIVE_Y Negative Y face of cubemap
    CU_CUBEMAP_FACE_POSITIVE_Z Positive Z face of cubemap
    CU_CUBEMAP_FACE_NEGATIVE_Z Negative Z face of cubemap

4.25.3.3 enum CUarray_format_enum

Array formats

Enumerator:
    CU_AD_FORMAT_UNSIGNED_INT8 Unsigned 8-bit integers
    CU_AD_FORMAT_UNSIGNED_INT16 Unsigned 16-bit integers
    CU_AD_FORMAT_UNSIGNED_INT32 Unsigned 32-bit integers
    CU_AD_FORMAT_SIGNED_INT8 Signed 8-bit integers
    CU_AD_FORMAT_SIGNED_INT16 Signed 16-bit integers
    CU_AD_FORMAT_SIGNED_INT32 Signed 32-bit integers
    CU_AD_FORMAT_HALF 16-bit floating point
    CU_AD_FORMAT_FLOAT 32-bit floating point

4.25.3.4 enum CUcomputemode_enum

Compute Modes

Enumerator:
    CU_COMPUTEMODE_DEFAULT Default compute mode (Multiple contexts allowed per device)

    CU_COMPUTEMODE_EXCLUSIVE Compute-exclusive mode (Only one context can be present on this device at a time)
    CU_COMPUTEMODE_PROHIBITED Compute-prohibited mode (No contexts can be created on this device at this time)

4.25.3.5 enum CUctx_flags_enum

Context creation flags

Enumerator:
    CU_CTX_SCHED_AUTO Automatic scheduling
    CU_CTX_SCHED_SPIN Set spin as default scheduling
    CU_CTX_SCHED_YIELD Set yield as default scheduling
    CU_CTX_BLOCKING_SYNC Use blocking synchronization
    CU_CTX_MAP_HOST Support mapped pinned allocations
    CU_CTX_LMEM_RESIZE_TO_MAX Keep local memory allocation after launch

4.25.3.6 enum cudaError_enum

Error codes

Enumerator:
    CUDA_SUCCESS The API call returned with no errors. In the case of query calls, this can also mean that the operation being queried is complete (see cuEventQuery() and cuStreamQuery()).
    CUDA_ERROR_INVALID_VALUE This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
    CUDA_ERROR_OUT_OF_MEMORY The API call failed because it was unable to allocate enough memory to perform the requested operation.
    CUDA_ERROR_NOT_INITIALIZED This indicates that the CUDA driver has not been initialized with cuInit() or that initialization has failed.
    CUDA_ERROR_DEINITIALIZED This indicates that the CUDA driver is in the process of shutting down.
    CUDA_ERROR_NO_DEVICE This indicates that no CUDA-capable devices were detected by the installed CUDA driver.
    CUDA_ERROR_INVALID_DEVICE This indicates that the device ordinal supplied by the user does not correspond to a valid CUDA device.
    CUDA_ERROR_INVALID_IMAGE This indicates that the device kernel image is invalid. This can also indicate an invalid CUDA module.
    CUDA_ERROR_INVALID_CONTEXT This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See cuCtxGetApiVersion() for more details.
    CUDA_ERROR_CONTEXT_ALREADY_CURRENT This indicates that the context being supplied as a parameter to the API call was already the active context.
        Deprecated: This error return is deprecated as of CUDA 3.2. It is no longer an error to attempt to push the active context via cuCtxPushCurrent().

    CUDA_ERROR_MAP_FAILED This indicates that a map or register operation has failed.
    CUDA_ERROR_UNMAP_FAILED This indicates that an unmap or unregister operation has failed.
    CUDA_ERROR_ARRAY_IS_MAPPED This indicates that the specified array is currently mapped and thus cannot be destroyed.
    CUDA_ERROR_ALREADY_MAPPED This indicates that the resource is already mapped.
    CUDA_ERROR_NO_BINARY_FOR_GPU This indicates that there is no kernel image available that is suitable for the device. This can occur when a user specifies code generation options for a particular CUDA source file that do not include the corresponding device configuration.
    CUDA_ERROR_ALREADY_ACQUIRED This indicates that a resource has already been acquired.
    CUDA_ERROR_NOT_MAPPED This indicates that a resource is not mapped.
    CUDA_ERROR_NOT_MAPPED_AS_ARRAY This indicates that a mapped resource is not available for access as an array.
    CUDA_ERROR_NOT_MAPPED_AS_POINTER This indicates that a mapped resource is not available for access as a pointer.
    CUDA_ERROR_ECC_UNCORRECTABLE This indicates that an uncorrectable ECC error was detected during execution.
    CUDA_ERROR_UNSUPPORTED_LIMIT This indicates that the CUlimit passed to the API call is not supported by the active device.
    CUDA_ERROR_INVALID_SOURCE This indicates that the device kernel source is invalid.
    CUDA_ERROR_FILE_NOT_FOUND This indicates that the file specified was not found.
    CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND This indicates that a link to a shared object failed to resolve.
    CUDA_ERROR_SHARED_OBJECT_INIT_FAILED This indicates that initialization of a shared object failed.
    CUDA_ERROR_OPERATING_SYSTEM This indicates that an OS call failed.
    CUDA_ERROR_INVALID_HANDLE This indicates that a resource handle passed to the API call was not valid. Resource handles are opaque types like CUstream and CUevent.
    CUDA_ERROR_NOT_FOUND This indicates that a named symbol was not found. Examples of symbols are global/constant variable names, texture names, and surface names.
    CUDA_ERROR_NOT_READY This indicates that asynchronous operations issued previously have not completed yet. This result is not actually an error, but must be indicated differently than CUDA_SUCCESS (which indicates completion). Calls that may return this value include cuEventQuery() and cuStreamQuery().
    CUDA_ERROR_LAUNCH_FAILED An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The context cannot be used, so it must be destroyed (and a new one should be created). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.
    CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES This indicates that a launch did not occur because it did not have appropriate resources. This error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel's register count. Passing arguments of the wrong size (i.e. a 64-bit pointer when a 32-bit int is expected) is equivalent to passing too many arguments and can also result in this error.
    CUDA_ERROR_LAUNCH_TIMEOUT This indicates that the device kernel took too long to execute. This can only occur if timeouts are enabled; see the device attribute CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT for more information. The context cannot be used (and must be destroyed similar to CUDA_ERROR_LAUNCH_FAILED). All existing device memory allocations from this context are invalid and must be reconstructed if the program is to continue using CUDA.

    CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING This error indicates a kernel launch that uses an incompatible texturing mode.
    CUDA_ERROR_UNKNOWN This indicates that an unknown internal error has occurred.

4.25.3.7 enum CUdevice_attribute_enum

Device properties

Enumerator:
    CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK Maximum number of threads per block
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X Maximum block dimension X
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y Maximum block dimension Y
    CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z Maximum block dimension Z
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X Maximum grid dimension X
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y Maximum grid dimension Y
    CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z Maximum grid dimension Z
    CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK Maximum shared memory available per block in bytes
    CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK
    CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY Memory available on device for __constant__ variables in a CUDA C kernel in bytes
    CU_DEVICE_ATTRIBUTE_WARP_SIZE Warp size in threads
    CU_DEVICE_ATTRIBUTE_MAX_PITCH Maximum pitch in bytes allowed by memory copies
    CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK Maximum number of 32-bit registers available per block
    CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK Deprecated, use CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK
    CU_DEVICE_ATTRIBUTE_CLOCK_RATE Peak clock frequency in kilohertz
    CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT Alignment requirement for textures
    CU_DEVICE_ATTRIBUTE_GPU_OVERLAP Device can possibly copy memory and execute a kernel concurrently
    CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT Number of multiprocessors on device
    CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT Specifies whether there is a run time limit on kernels
    CU_DEVICE_ATTRIBUTE_INTEGRATED Device is integrated with host memory
    CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY Device can map host memory into CUDA address space
    CU_DEVICE_ATTRIBUTE_COMPUTE_MODE Compute mode (See CUcomputemode for details)
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH Maximum 1D texture width
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH Maximum 2D texture width
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT Maximum 2D texture height
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH Maximum 3D texture width
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT Maximum 3D texture height

    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH Maximum 3D texture depth
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH Maximum texture array width
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT Maximum texture array height
    CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES Maximum slices in a texture array
    CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT Alignment requirement for surfaces
    CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS Device can possibly execute multiple kernels concurrently
    CU_DEVICE_ATTRIBUTE_ECC_ENABLED Device has ECC support enabled
    CU_DEVICE_ATTRIBUTE_PCI_BUS_ID PCI bus ID of the device
    CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID PCI device ID of the device
    CU_DEVICE_ATTRIBUTE_TCC_DRIVER Device is using TCC driver model

4.25.3.8 enum CUevent_flags_enum

Event creation flags

Enumerator:
    CU_EVENT_DEFAULT Default event flag
    CU_EVENT_BLOCKING_SYNC Event uses blocking synchronization
    CU_EVENT_DISABLE_TIMING Event will not record timing data

4.25.3.9 enum CUfilter_mode_enum

Texture reference filtering modes

Enumerator:
    CU_TR_FILTER_MODE_POINT Point filter mode
    CU_TR_FILTER_MODE_LINEAR Linear filter mode

4.25.3.10 enum CUfunc_cache_enum

Function cache configurations

Enumerator:
    CU_FUNC_CACHE_PREFER_NONE no preference for shared memory or L1 (default)
    CU_FUNC_CACHE_PREFER_SHARED prefer larger shared memory and smaller L1 cache
    CU_FUNC_CACHE_PREFER_L1 prefer larger L1 cache and smaller shared memory

4.25.3.11 enum CUfunction_attribute_enum

Function properties

Enumerator:
    CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
    CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES The size in bytes of statically-allocated shared memory required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
    CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES The size in bytes of user-allocated constant memory required by this function.
    CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES The size in bytes of local memory used by each thread of this function.
    CU_FUNC_ATTRIBUTE_NUM_REGS The number of registers used by each thread of this function.
    CU_FUNC_ATTRIBUTE_PTX_VERSION The PTX virtual architecture version for which the function was compiled. This value is the major PTX version ∗ 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0.
    CU_FUNC_ATTRIBUTE_BINARY_VERSION The binary architecture version for which the function was compiled. This value is the major binary version ∗ 10 + the minor binary version, so a binary version 1.3 function would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version.

4.25.3.12 enum CUgraphicsMapResourceFlags_enum

Flags for mapping and unmapping interop resources

4.25.3.13 enum CUgraphicsRegisterFlags_enum

Flags to register a graphics resource

4.25.3.14 enum CUjit_fallback_enum

Cubin matching fallback strategies

Enumerator:
    CU_PREFER_PTX Prefer to compile ptx
    CU_PREFER_BINARY Prefer to fall back to compatible binary code

4.25.3.15 enum CUjit_option_enum

Online compiler options

Enumerator:
    CU_JIT_MAX_REGISTERS Max number of registers that a thread may use.
        Option type: unsigned int

    CU_JIT_THREADS_PER_BLOCK IN: Specifies minimum number of threads per block to target compilation for
        OUT: Returns the number of threads the compiler actually targeted. This restricts the resource utilization of the compiler (e.g. max registers) such that a block with the given number of threads should be able to launch based on register limitations. Note, this option does not currently take into account any other resource limitations, such as shared memory utilization.
        Option type: unsigned int
    CU_JIT_WALL_TIME Returns a float value in the option of the wall clock time, in milliseconds, spent creating the cubin
        Option type: float
    CU_JIT_INFO_LOG_BUFFER Pointer to a buffer in which to print any log messages from PTXAS that are informational in nature (the buffer size is specified via option CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES)
        Option type: char∗
    CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator)
        OUT: Amount of log buffer filled with messages
        Option type: unsigned int
    CU_JIT_ERROR_LOG_BUFFER Pointer to a buffer in which to print any log messages from PTXAS that reflect errors (the buffer size is specified via option CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES)
        Option type: char∗
    CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator)
        OUT: Amount of log buffer filled with messages
        Option type: unsigned int
    CU_JIT_OPTIMIZATION_LEVEL Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations.
        Option type: unsigned int
    CU_JIT_TARGET_FROM_CUCONTEXT No option value required. Determines the target based on the current attached context (default)
        Option type: No option value needed
    CU_JIT_TARGET Target is chosen based on supplied CUjit_target_enum.
        Option type: unsigned int for enumerated type CUjit_target_enum
    CU_JIT_FALLBACK_STRATEGY Specifies choice of fallback strategy if matching cubin is not found. Choice is based on supplied CUjit_fallback_enum.
        Option type: unsigned int for enumerated type CUjit_fallback_enum

4.25.3.16 enum CUjit_target_enum

Online compilation targets

Enumerator:
    CU_TARGET_COMPUTE_10 Compute device class 1.0
    CU_TARGET_COMPUTE_11 Compute device class 1.1
    CU_TARGET_COMPUTE_12 Compute device class 1.2
    CU_TARGET_COMPUTE_13 Compute device class 1.3
    CU_TARGET_COMPUTE_20 Compute device class 2.0
    CU_TARGET_COMPUTE_21 Compute device class 2.1

4.25.3.17 enum CUlimit_enum

Limits

Enumerator:
    CU_LIMIT_STACK_SIZE GPU thread stack size
    CU_LIMIT_PRINTF_FIFO_SIZE GPU printf FIFO size
    CU_LIMIT_MALLOC_HEAP_SIZE GPU malloc heap size

4.25.3.18 enum CUmemorytype_enum

Memory types

Enumerator:
    CU_MEMORYTYPE_HOST Host memory
    CU_MEMORYTYPE_DEVICE Device memory
    CU_MEMORYTYPE_ARRAY Array memory

4.26 Initialization

Functions
• CUresult cuInit (unsigned int Flags)
    Initialize the CUDA driver API.

4.26.1 Detailed Description

This section describes the initialization functions of the low-level CUDA driver application programming interface.

4.26.2 Function Documentation

4.26.2.1 CUresult cuInit (unsigned int Flags)

Initializes the driver API and must be called before any other function from the driver API. Currently, the Flags parameter must be 0. If cuInit() has not been called, any function from the driver API will return CUDA_ERROR_NOT_INITIALIZED.

Parameters:
    Flags - Initialization flag for CUDA.

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

4.27 Version Management

Functions
• CUresult cuDriverGetVersion (int ∗driverVersion)
    Returns the CUDA driver version.

4.27.1 Detailed Description

This section describes the version management functions of the low-level CUDA driver application programming interface.

4.27.2 Function Documentation

4.27.2.1 CUresult cuDriverGetVersion (int ∗ driverVersion)

Returns in ∗driverVersion the version number of the installed CUDA driver. This function automatically returns CUDA_ERROR_INVALID_VALUE if the driverVersion argument is NULL.

Parameters:
    driverVersion - Returns the CUDA driver version

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

4.28 Device Management

Functions
• CUresult cuDeviceComputeCapability (int ∗major, int ∗minor, CUdevice dev)
    Returns the compute capability of the device.
• CUresult cuDeviceGet (CUdevice ∗device, int ordinal)
    Returns a handle to a compute device.
• CUresult cuDeviceGetAttribute (int ∗pi, CUdevice_attribute attrib, CUdevice dev)
    Returns information about the device.
• CUresult cuDeviceGetCount (int ∗count)
    Returns the number of compute-capable devices.
• CUresult cuDeviceGetName (char ∗name, int len, CUdevice dev)
    Returns an identifier string for the device.
• CUresult cuDeviceGetProperties (CUdevprop ∗prop, CUdevice dev)
    Returns properties for a selected device.
• CUresult cuDeviceTotalMem (size_t ∗bytes, CUdevice dev)
    Returns the total amount of memory on the device.

4.28.1 Detailed Description

This section describes the device management functions of the low-level CUDA driver application programming interface.

4.28.2 Function Documentation

4.28.2.1 CUresult cuDeviceComputeCapability (int ∗ major, int ∗ minor, CUdevice dev)

Returns in ∗major and ∗minor the major and minor revision numbers that define the compute capability of the device dev.

Parameters:
    major - Major revision number
    minor - Minor revision number
    dev - Device handle

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGetName, cuDeviceGet, cuDeviceGetProperties, cuDeviceTotalMem

4.28.2.2 CUresult cuDeviceGet (CUdevice * device, int ordinal)
Returns in *device a device handle given an ordinal in the range [0, cuDeviceGetCount()-1].

Parameters:
    device - Returned device handle
    ordinal - Device number to get handle for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGetName, cuDeviceGetProperties, cuDeviceTotalMem

4.28.2.3 CUresult cuDeviceGetAttribute (int * pi, CUdevice_attribute attrib, CUdevice dev)
Returns in *pi the integer value of the attribute attrib on device dev. The supported attributes are:
• CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK: Maximum number of threads per block.
• CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X: Maximum x-dimension of a block.
• CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y: Maximum y-dimension of a block.
• CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z: Maximum z-dimension of a block.
• CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X: Maximum x-dimension of a grid.
• CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y: Maximum y-dimension of a grid.
• CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z: Maximum z-dimension of a grid.
• CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK: Maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor.
• CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY: Memory available on device for __constant__ variables in a CUDA C kernel in bytes.
• CU_DEVICE_ATTRIBUTE_WARP_SIZE: Warp size in threads.
• CU_DEVICE_ATTRIBUTE_MAX_PITCH: Maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cuMemAllocPitch().

• CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK: Maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor.
• CU_DEVICE_ATTRIBUTE_CLOCK_RATE: Peak clock frequency in kilohertz.
• CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT: Alignment requirement; texture base addresses aligned to textureAlign bytes do not need an offset applied to texture fetches.
• CU_DEVICE_ATTRIBUTE_GPU_OVERLAP: 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not.
• CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT: Number of multiprocessors on the device.
• CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT: 1 if there is a run time limit for kernels executed on the device, or 0 if not.
• CU_DEVICE_ATTRIBUTE_INTEGRATED: 1 if the device is integrated with the memory subsystem, or 0 if not.
• CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY: 1 if the device can map host memory into the CUDA address space, or 0 if not.
• CU_DEVICE_ATTRIBUTE_COMPUTE_MODE: Compute mode that device is currently in. Available modes are as follows:
    – CU_COMPUTEMODE_DEFAULT: Default mode - Device is not restricted and can have multiple CUDA contexts present at a single time.
    – CU_COMPUTEMODE_EXCLUSIVE: Compute-exclusive mode - Device can have only one CUDA context present on it at a time.
    – CU_COMPUTEMODE_PROHIBITED: Compute-prohibited mode - Device is prohibited from creating new CUDA contexts.
• CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS: 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness.
• CU_DEVICE_ATTRIBUTE_ECC_ENABLED: 1 if error correction is enabled on the device, 0 if error correction is disabled or not supported by the device.
• CU_DEVICE_ATTRIBUTE_PCI_BUS_ID: PCI bus identifier of the device.
• CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID: PCI device (also known as slot) identifier of the device.
• CU_DEVICE_ATTRIBUTE_TCC_DRIVER: 1 if the device is using a TCC driver. TCC is only available on Tesla hardware running Windows Vista or later.

Parameters:
    pi - Returned device attribute value
    attrib - Device attribute to query
    dev - Device handle

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetCount, cuDeviceGetName, cuDeviceGet, cuDeviceGetProperties, cuDeviceTotalMem

4.28.2.4 CUresult cuDeviceGetCount (int * count)
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, cuDeviceGetCount() returns 0.

Parameters:
    count - Returned number of compute-capable devices

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetAttribute, cuDeviceGetName, cuDeviceGet, cuDeviceGetProperties, cuDeviceTotalMem

4.28.2.5 CUresult cuDeviceGetName (char * name, int len, CUdevice dev)
Returns an ASCII string identifying the device dev in the NULL-terminated string pointed to by name. len specifies the maximum length of the string that may be returned.

Parameters:
    name - Returned identifier string for the device
    len - Maximum length of string to store in name
    dev - Device to get identifier string for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGet, cuDeviceGetProperties, cuDeviceTotalMem

4.28.2.6 CUresult cuDeviceGetProperties (CUdevprop * prop, CUdevice dev)
Returns in *prop the properties of device dev. The CUdevprop structure is defined as:

    typedef struct CUdevprop_st {
        int maxThreadsPerBlock;
        int maxThreadsDim[3];
        int maxGridSize[3];
        int sharedMemPerBlock;
        int totalConstantMemory;
        int SIMDWidth;
        int memPitch;
        int regsPerBlock;
        int clockRate;
        int textureAlign;
    } CUdevprop;

where:
• maxThreadsPerBlock is the maximum number of threads per block.
• maxThreadsDim[3] is the maximum sizes of each dimension of a block.
• maxGridSize[3] is the maximum sizes of each dimension of a grid.
• sharedMemPerBlock is the total amount of shared memory available per block in bytes.
• totalConstantMemory is the total amount of constant memory available on the device in bytes.
• SIMDWidth is the warp size.
• memPitch is the maximum pitch allowed by the memory copy functions that involve memory regions allocated through cuMemAllocPitch().
• regsPerBlock is the total number of registers available per block.
• clockRate is the clock frequency in kilohertz.
• textureAlign is the alignment requirement; texture base addresses that are aligned to textureAlign bytes do not need an offset applied to texture fetches.

Parameters:
    prop - Returned properties of device
    dev - Device to get properties for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGetName, cuDeviceGet, cuDeviceTotalMem
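One common use of these properties is validating a launch configuration before calling into the driver. The sketch below checks a proposed thread-block shape against maxThreadsDim and maxThreadsPerBlock. It uses a local stand-in structure (DevLimits, a hypothetical name) holding only the two fields it needs, so that it compiles without <cuda.h>; real code would read the same fields from the CUdevprop filled in by cuDeviceGetProperties().

```c
/* Local stand-in for the two CUdevprop fields used here (hypothetical type). */
typedef struct {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
} DevLimits;

/* Returns 1 if an x * y * z thread block respects both the per-dimension
 * limits and the total threads-per-block limit, 0 otherwise. */
static int block_fits(const DevLimits *p, int x, int y, int z)
{
    long long xy;

    if (x < 1 || y < 1 || z < 1)
        return 0;
    if (x > p->maxThreadsDim[0] || y > p->maxThreadsDim[1] || z > p->maxThreadsDim[2])
        return 0;

    /* Multiply in two steps using long long so the product cannot overflow. */
    xy = (long long)x * y;
    if (xy > p->maxThreadsPerBlock)
        return 0;
    return xy * z <= p->maxThreadsPerBlock;
}
```

The limits passed in are whatever the device reports; no particular values are implied by this manual.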

4.28.2.7 CUresult cuDeviceTotalMem (size_t * bytes, CUdevice dev)
Returns in *bytes the total amount of memory available on the device dev in bytes.

Parameters:
    bytes - Returned memory available on device in bytes
    dev - Device handle

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_DEVICE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuDeviceComputeCapability, cuDeviceGetAttribute, cuDeviceGetCount, cuDeviceGetName, cuDeviceGet, cuDeviceGetProperties

4.29 Context Management

Functions
• CUresult cuCtxAttach (CUcontext *pctx, unsigned int flags)
    Increment a context's usage-count.
• CUresult cuCtxCreate (CUcontext *pctx, unsigned int flags, CUdevice dev)
    Create a CUDA context.
• CUresult cuCtxDestroy (CUcontext ctx)
    Destroy the current context or a floating CUDA context.
• CUresult cuCtxDetach (CUcontext ctx)
    Decrement a context's usage-count.
• CUresult cuCtxGetApiVersion (CUcontext ctx, unsigned int *version)
    Gets the context's API version.
• CUresult cuCtxGetCacheConfig (CUfunc_cache *pconfig)
    Returns the preferred cache configuration for the current context.
• CUresult cuCtxGetDevice (CUdevice *device)
    Returns the device ID for the current context.
• CUresult cuCtxGetLimit (size_t *pvalue, CUlimit limit)
    Returns resource limits.
• CUresult cuCtxPopCurrent (CUcontext *pctx)
    Pops the current CUDA context from the current CPU thread.
• CUresult cuCtxPushCurrent (CUcontext ctx)
    Pushes a floating context on the current CPU thread.
• CUresult cuCtxSetCacheConfig (CUfunc_cache config)
    Sets the preferred cache configuration for the current context.
• CUresult cuCtxSetLimit (CUlimit limit, size_t value)
    Set resource limits.
• CUresult cuCtxSynchronize (void)
    Block for a context's tasks to complete.

4.29.1 Detailed Description
This section describes the context management functions of the low-level CUDA driver application programming interface.

4.29.2 Function Documentation

4.29.2.1 CUresult cuCtxAttach (CUcontext * pctx, unsigned int flags)
Increments the usage count of the context and passes back a context handle in *pctx that must be passed to cuCtxDetach() when the application is done with the context. cuCtxAttach() fails if there is no context current to the thread.
Currently, the flags parameter must be 0.

Parameters:
    pctx - Returned context handle of the current context
    flags - Context attach flags (must be 0)

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.2 CUresult cuCtxCreate (CUcontext * pctx, unsigned int flags, CUdevice dev)
Creates a new CUDA context and associates it with the calling thread. The flags parameter is described below. The context is created with a usage count of 1 and the caller of cuCtxCreate() must call cuCtxDestroy() or cuCtxDetach() when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and may be restored by a subsequent call to cuCtxPopCurrent().
The two LSBs of the flags parameter can be used to control how the OS thread, which owns the CUDA context at the time of an API call, interacts with the OS scheduler when waiting for results from the GPU.
• CU_CTX_SCHED_AUTO: The default value if the flags parameter is zero, uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the GPU, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
• CU_CTX_SCHED_SPIN: Instruct CUDA to actively spin when waiting for results from the GPU. This can decrease latency when waiting for the GPU, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
• CU_CTX_SCHED_YIELD: Instruct CUDA to yield its thread when waiting for results from the GPU. This can increase latency when waiting for the GPU, but can increase the performance of CPU threads performing work in parallel with the GPU.
• CU_CTX_BLOCKING_SYNC: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the GPU to finish work.

• CU_CTX_MAP_HOST: Instruct CUDA to support mapped pinned allocations. This flag must be set in order to allocate pinned host memory that is accessible to the GPU.
• CU_CTX_LMEM_RESIZE_TO_MAX: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.

Note to Linux users:
Context creation will fail with CUDA_ERROR_UNKNOWN if the compute mode of the device is CU_COMPUTEMODE_PROHIBITED. Similarly, context creation will also fail with CUDA_ERROR_UNKNOWN if the compute mode for the device is set to CU_COMPUTEMODE_EXCLUSIVE and there is already an active context on the device. The function cuDeviceGetAttribute() can be used with CU_DEVICE_ATTRIBUTE_COMPUTE_MODE to determine the compute mode of the device. The nvidia-smi tool can be used to set the compute mode for devices. Documentation for nvidia-smi can be obtained by passing a -h option to it.

Parameters:
    pctx - Returned context handle of the new context
    flags - Context creation flags
    dev - Device to create context on

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_DEVICE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.3 CUresult cuCtxDestroy (CUcontext ctx)
Destroys the CUDA context specified by ctx. If the context usage count is not equal to 1, or the context is current to any CPU thread other than the current one, this function fails. Floating contexts (detached from a CPU thread via cuCtxPopCurrent()) may be destroyed by this function.

Parameters:
    ctx - Context to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.
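Two of the rules stated for cuCtxCreate() are simple predicates: the CU_CTX_SCHED_AUTO heuristic (yield when C > P, otherwise spin) and the Linux compute-mode conditions under which context creation fails. The sketch below restates them as plain C functions. It is a model of the documented behavior only and does not call the driver; the enum and function names are hypothetical stand-ins chosen so that the code compiles without <cuda.h>.

```c
/* Stand-in for the CU_COMPUTEMODE_* values; for this sketch only. */
typedef enum { MODE_DEFAULT, MODE_EXCLUSIVE, MODE_PROHIBITED } ComputeMode;

/* CU_CTX_SCHED_AUTO as documented: yield if the number of active CUDA
 * contexts in the process (C) exceeds the number of logical processors (P),
 * otherwise spin. Returns 1 for "yield", 0 for "spin". */
static int sched_auto_yields(int active_contexts, int logical_processors)
{
    return active_contexts > logical_processors;
}

/* Linux note as documented: creation fails in compute-prohibited mode, and
 * in compute-exclusive mode when the device already has an active context.
 * Returns 1 if creation would fail for one of these reasons. */
static int create_would_fail(ComputeMode mode, int contexts_on_device)
{
    if (mode == MODE_PROHIBITED)
        return 1;
    if (mode == MODE_EXCLUSIVE && contexts_on_device > 0)
        return 1;
    return 0;
}
```

An application can obtain the real compute mode with cuDeviceGetAttribute() and CU_DEVICE_ATTRIBUTE_COMPUTE_MODE before attempting to create a context.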

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.4 CUresult cuCtxDetach (CUcontext ctx)
Decrements the usage count of the context ctx, and destroys the context if the usage count goes to 0. The context must be a handle that was passed back by cuCtxCreate() or cuCtxAttach(), and must be current to the calling thread.

Parameters:
    ctx - Context to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.5 CUresult cuCtxGetApiVersion (CUcontext ctx, unsigned int * version)
Returns the API version used to create ctx in version. If ctx is NULL, returns the API version used to create the currently bound context.
This will return the API version used to create a context (for example, 3010 or 3020), which library developers can use to direct callers to a specific API version. Note that this API version may not be the same as returned by cuDriverGetVersion.

Parameters:
    ctx - Context to check
    version - Pointer to version

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
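The usage-count rules for cuCtxCreate(), cuCtxAttach(), cuCtxDetach() and cuCtxDestroy() interact, and a small host-side model may make them easier to follow. The sketch below is such a model under the rules stated in this section: a count of 1 on creation, attach increments, detach decrements and destroys at 0, and destroy fails unless the count is exactly 1. CtxModel and the ctx_* functions are hypothetical names; nothing here calls the driver, and the "current to another thread" condition of cuCtxDestroy() is not modeled.

```c
/* Hypothetical model of a context's usage count; not a driver type. */
typedef struct {
    int usage;  /* usage count as described for cuCtxAttach/cuCtxDetach */
    int alive;  /* 1 while the modeled context exists */
} CtxModel;

/* cuCtxCreate(): the context starts with a usage count of 1. */
static void ctx_create(CtxModel *c) { c->usage = 1; c->alive = 1; }

/* cuCtxAttach(): increments the usage count. */
static void ctx_attach(CtxModel *c) { c->usage++; }

/* cuCtxDetach(): decrements the count and destroys the context at 0. */
static void ctx_detach(CtxModel *c)
{
    if (--c->usage == 0)
        c->alive = 0;
}

/* cuCtxDestroy(): fails unless the usage count is exactly 1.
 * Returns 1 on success, 0 on failure. */
static int ctx_destroy(CtxModel *c)
{
    if (c->usage != 1)
        return 0;
    c->usage = 0;
    c->alive = 0;
    return 1;
}
```

The practical consequence is that every successful cuCtxAttach() needs a matching cuCtxDetach() before the creator's cuCtxDestroy() can succeed.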

4.29.2.6 CUresult cuCtxGetCacheConfig (CUfunc_cache * pconfig)
On devices where the L1 cache and shared memory use the same hardware resources, this returns through pconfig the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pconfig of CU_FUNC_CACHE_PREFER_NONE on devices where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
• CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
• CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
• CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory

Parameters:
    pconfig - Returned cache configuration

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig

4.29.2.7 CUresult cuCtxGetDevice (CUdevice * device)
Returns in *device the ordinal of the current context's device.

Parameters:
    device - Returned device ID for the current context

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.8 CUresult cuCtxGetLimit (size_t * pvalue, CUlimit limit)
Returns in *pvalue the current size of limit. The supported CUlimit values are:
• CU_LIMIT_STACK_SIZE: stack size of each GPU thread.
• CU_LIMIT_PRINTF_FIFO_SIZE: size of the FIFO used by the printf() device system call.
• CU_LIMIT_MALLOC_HEAP_SIZE: size of the heap used by the malloc() and free() device system calls.

Parameters:
    limit - Limit to query
    pvalue - Returned size in bytes of limit

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNSUPPORTED_LIMIT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.9 CUresult cuCtxPopCurrent (CUcontext * pctx)
Pops the current CUDA context from the CPU thread. The CUDA context must have a usage count of 1. CUDA contexts have a usage count of 1 upon creation; the usage count may be incremented with cuCtxAttach() and decremented with cuCtxDetach().
If successful, cuCtxPopCurrent() passes back the old context handle in *pctx. That context may then be made current to a different CPU thread by calling cuCtxPushCurrent().
Floating contexts may be destroyed by calling cuCtxDestroy().
If a context was current to the CPU thread before cuCtxCreate() or cuCtxPushCurrent() was called, this function makes that context current to the CPU thread again.

Parameters:
    pctx - Returned new context handle

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize
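The push and pop semantics described for the per-thread stack of current contexts can be pictured with an ordinary array-backed stack. The sketch below is a host-side model only: contexts are represented by plain integers, and CtxStack, stack_push, stack_pop and stack_current are hypothetical names that do not exist in the driver API. The fixed capacity is an arbitrary choice for the sketch and says nothing about any driver limit.

```c
#define STACK_MAX 8  /* arbitrary capacity chosen for this sketch */

/* Hypothetical model of one CPU thread's stack of current contexts. */
typedef struct {
    int ids[STACK_MAX];
    int depth;
} CtxStack;

/* Models cuCtxPushCurrent(): ctx becomes the thread's current context.
 * Returns 0 on success, -1 if the model's stack is full. */
static int stack_push(CtxStack *s, int ctx)
{
    if (s->depth == STACK_MAX)
        return -1;
    s->ids[s->depth++] = ctx;
    return 0;
}

/* Models cuCtxPopCurrent(): passes back the popped context in *out; the
 * context that was current before the matching push becomes current again.
 * Returns 0 on success, -1 if no context is current. */
static int stack_pop(CtxStack *s, int *out)
{
    if (s->depth == 0)
        return -1;
    *out = s->ids[--s->depth];
    return 0;
}

/* Returns the current context, or -1 if the stack is empty. */
static int stack_current(const CtxStack *s)
{
    return s->depth ? s->ids[s->depth - 1] : -1;
}
```

After a pop, the returned context is floating in the sense used by this section: it is attached to no thread and may be pushed onto another thread's stack or destroyed.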

4.29.2.10 CUresult cuCtxPushCurrent (CUcontext ctx)
Pushes the given context ctx onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all CUDA functions that operate on the current context are affected.
The previous current context may be made current again by calling cuCtxDestroy() or cuCtxPopCurrent().
The context must be "floating," i.e. not attached to any thread. Contexts are made to float by calling cuCtxPopCurrent().

Parameters:
    ctx - Floating context to attach

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxSetCacheConfig, cuCtxSetLimit, cuCtxSynchronize

4.29.2.11 CUresult cuCtxSetCacheConfig (CUfunc_cache config)
On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the current context. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cuFuncSetCacheConfig() will be preferred over this context-wide setting. Setting the context-wide cache configuration to CU_FUNC_CACHE_PREFER_NONE will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are:
• CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
• CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
• CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory

Parameters:
    config - Requested cache configuration

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetLimit, cuCtxSynchronize, cuFuncSetCacheConfig

4.29.2.12 CUresult cuCtxSetLimit (CUlimit limit, size_t value)
Setting limit to value is a request by the application to update the current limit maintained by the context. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cuCtxGetLimit() to find out exactly what the limit has been set to.
Setting each CUlimit has its own specific restrictions, so each is discussed here.
• CU_LIMIT_STACK_SIZE controls the stack size of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error CUDA_ERROR_UNSUPPORTED_LIMIT being returned.
• CU_LIMIT_PRINTF_FIFO_SIZE controls the size of the FIFO used by the printf() device system call. Setting CU_LIMIT_PRINTF_FIFO_SIZE must be performed before launching any kernel that uses the printf() device system call, otherwise CUDA_ERROR_INVALID_VALUE will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error CUDA_ERROR_UNSUPPORTED_LIMIT being returned.
• CU_LIMIT_MALLOC_HEAP_SIZE controls the size of the heap used by the malloc() and free() device system calls. Setting CU_LIMIT_MALLOC_HEAP_SIZE must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise CUDA_ERROR_INVALID_VALUE will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error CUDA_ERROR_UNSUPPORTED_LIMIT being returned.

Parameters:
    limit - Limit to set
    value - Size in bytes of limit

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNSUPPORTED_LIMIT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxAttach, cuCtxCreate, cuCtxDestroy, cuCtxDetach, cuCtxGetApiVersion, cuCtxGetCacheConfig, cuCtxGetDevice, cuCtxGetLimit, cuCtxPopCurrent, cuCtxPushCurrent, cuCtxSetCacheConfig, cuCtxSynchronize
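The restrictions listed for cuCtxSetLimit() amount to a small decision table, restated below as a plain C function. This is a model of the documented outcomes only and does not call the driver. The enum and function names are hypothetical stand-ins, and the order in which the two error conditions are checked is an assumption, since the text does not say which error takes precedence when both apply.

```c
/* Stand-ins for CUlimit and CUresult values; for this sketch only. */
typedef enum { LIM_STACK_SIZE, LIM_PRINTF_FIFO_SIZE, LIM_MALLOC_HEAP_SIZE } LimitKind;
typedef enum { RES_SUCCESS, RES_INVALID_VALUE, RES_UNSUPPORTED_LIMIT } Result;

/* cc_major: major compute capability of the device.
 * feature_kernel_launched: nonzero if a kernel using the corresponding
 * device system call (printf, or malloc/free) has already been launched. */
static Result set_limit_outcome(int cc_major, LimitKind kind, int feature_kernel_launched)
{
    /* All three limits require compute capability 2.0 or higher.
     * Assumption: this check comes first. */
    if (cc_major < 2)
        return RES_UNSUPPORTED_LIMIT;

    /* The printf FIFO and malloc heap sizes must be set before any kernel
     * that uses those device system calls is launched. */
    if (kind != LIM_STACK_SIZE && feature_kernel_launched)
        return RES_INVALID_VALUE;

    return RES_SUCCESS;
}
```

Even when the request succeeds, the driver may have adjusted the value, so reading the limit back with cuCtxGetLimit() is the reliable way to learn what is in effect.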

2. cuCtxDestroy.29.4. cuCtxGetApiVersion. CUDA_- Generated for NVIDIA CUDA Library by Doxygen . cuCtxGetLimit. cuCtxSetLimit CUDA_ERROR_NOT_INITIALIZED. cuCtxDetach. asynchronous launches. See also: cuCtxAttach. cuCtxPopCurrent. Returns: CUDA_SUCCESS. ERROR_INVALID_CONTEXT Note: Note that this function may also return error codes from previous. cuCtxGetCacheConfig.29 Context Management 4. If the context was created with the CU_CTX_BLOCKING_SYNC flag. cuCtxGetDevice.13 CUresult cuCtxSynchronize (void) 181 Blocks until the device has completed all preceding requested tasks. cuCtxSynchronize() returns an error if one of the preceding tasks failed. cuCtxPushCurrent cuCtxSetCacheConfig. CUDA_ERROR_DEINITIALIZED. the CPU thread will block until the GPU context has finished its work. cuCtxCreate.

4.30 Module Management

Functions
• CUresult cuModuleGetFunction (CUfunction *hfunc, CUmodule hmod, const char *name)
    Returns a function handle.
• CUresult cuModuleGetGlobal (CUdeviceptr *dptr, size_t *bytes, CUmodule hmod, const char *name)
    Returns a global pointer from a module.
• CUresult cuModuleGetSurfRef (CUsurfref *pSurfRef, CUmodule hmod, const char *name)
    Returns a handle to a surface reference.
• CUresult cuModuleGetTexRef (CUtexref *pTexRef, CUmodule hmod, const char *name)
    Returns a handle to a texture reference.
• CUresult cuModuleLoad (CUmodule *module, const char *fname)
    Loads a compute module.
• CUresult cuModuleLoadData (CUmodule *module, const void *image)
    Load a module's data.
• CUresult cuModuleLoadDataEx (CUmodule *module, const void *image, unsigned int numOptions, CUjit_option *options, void **optionValues)
    Load a module's data with options.
• CUresult cuModuleLoadFatBinary (CUmodule *module, const void *fatCubin)
    Load a module's data.
• CUresult cuModuleUnload (CUmodule hmod)
    Unloads a module.

4.30.1 Detailed Description
This section describes the module management functions of the low-level CUDA driver application programming interface.

4.30.2 Function Documentation

4.30.2.1 CUresult cuModuleGetFunction (CUfunction * hfunc, CUmodule hmod, const char * name)
Returns in *hfunc the handle of the function of name name located in module hmod. If no function of that name exists, cuModuleGetFunction() returns CUDA_ERROR_NOT_FOUND.

Parameters:
    hfunc - Returned function handle
    hmod - Module to retrieve function from
    name - Name of function to retrieve

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.2 CUresult cuModuleGetGlobal (CUdeviceptr * dptr, size_t * bytes, CUmodule hmod, const char * name)
Returns in *dptr and *bytes the base pointer and size of the global of name name located in module hmod. If no variable of that name exists, cuModuleGetGlobal() returns CUDA_ERROR_NOT_FOUND. Both parameters dptr and bytes are optional. If one of them is NULL, it is ignored.

Parameters:
    dptr - Returned global device pointer
    bytes - Returned global size in bytes
    hmod - Module to retrieve global from
    name - Name of global to retrieve

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.3 CUresult cuModuleGetSurfRef (CUsurfref * pSurfRef, CUmodule hmod, const char * name)
Returns in *pSurfRef the handle of the surface reference of name name in the module hmod. If no surface reference of that name exists, cuModuleGetSurfRef() returns CUDA_ERROR_NOT_FOUND.

Parameters:
    pSurfRef - Returned surface reference
    hmod - Module to retrieve surface reference from
    name - Name of surface reference to retrieve

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.4 CUresult cuModuleGetTexRef (CUtexref * pTexRef, CUmodule hmod, const char * name)
Returns in *pTexRef the handle of the texture reference of name name in the module hmod. If no texture reference of that name exists, cuModuleGetTexRef() returns CUDA_ERROR_NOT_FOUND. This texture reference handle should not be destroyed, since it will be destroyed when the module is unloaded.

Parameters:
    pTexRef - Returned texture reference
    hmod - Module to retrieve texture reference from
    name - Name of texture reference to retrieve

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetSurfRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.5 CUresult cuModuleLoad (CUmodule * module, const char * fname)
Takes a filename fname and loads the corresponding module module into the current context. The CUDA driver API does not attempt to lazily allocate the resources needed by a module; if the memory for functions and data (constant and global) needed by the module cannot be allocated, cuModuleLoad() fails. The file should be a cubin file as output by nvcc or a PTX file, either as output by nvcc or handwritten.

Parameters:
    module - Returned module
    fname - Filename of module to load

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_FILE_NOT_FOUND, CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.6 CUresult cuModuleLoadData (CUmodule * module, const void * image)

Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX file, passing a cubin or PTX file as a NULL-terminated text string, or incorporating a cubin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer.

Parameters:
    module - Returned module
    image - Module data to load

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadDataEx, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.7 CUresult cuModuleLoadDataEx (CUmodule * module, const void * image, unsigned int numOptions, CUjit_option * options, void ** optionValues)

Takes a pointer image and loads the corresponding module module into the current context. The pointer may be obtained by mapping a cubin or PTX file, passing a cubin or PTX file as a NULL-terminated text string, or incorporating a cubin object into the executable resources and using operating system calls such as Windows FindResource() to obtain the pointer. Options are passed as an array via options and any corresponding parameters are passed in optionValues. The number of total options is supplied via numOptions. Any outputs will be returned via optionValues. Supported options are (types for the option values are specified in parentheses after the option name):

• CU_JIT_MAX_REGISTERS: (unsigned int) input specifies the maximum number of registers per thread;
• CU_JIT_THREADS_PER_BLOCK: (unsigned int) input specifies number of threads per block to target compilation for; output returns the number of threads the compiler actually targeted;
• CU_JIT_WALL_TIME: (float) output returns the float value of wall clock time, in milliseconds, spent compiling the PTX code;
• CU_JIT_INFO_LOG_BUFFER: (char*) input is a pointer to a buffer in which to print any informational log messages from PTX assembly (the buffer size is specified via option CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES);
• CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES: (unsigned int) input is the size in bytes of the buffer; output is the number of bytes filled with messages;
• CU_JIT_ERROR_LOG_BUFFER: (char*) input is a pointer to a buffer in which to print any error log messages from PTX assembly (the buffer size is specified via option CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES);
• CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES: (unsigned int) input is the size in bytes of the buffer; output is the number of bytes filled with messages;
• CU_JIT_OPTIMIZATION_LEVEL: (unsigned int) input is the level of optimization to apply to generated code (0 - 4), with 4 being the default and highest level;
• CU_JIT_TARGET_FROM_CUCONTEXT: (No option value) causes compilation target to be determined based on current attached context (default);
• CU_JIT_TARGET: (unsigned int for enumerated type CUjit_target_enum) input is the compilation target based on supplied CUjit_target_enum; possible values are:
    – CU_TARGET_COMPUTE_10
    – CU_TARGET_COMPUTE_11
    – CU_TARGET_COMPUTE_12
    – CU_TARGET_COMPUTE_13
    – CU_TARGET_COMPUTE_20
• CU_JIT_FALLBACK_STRATEGY: (unsigned int for enumerated type CUjit_fallback_enum) chooses fallback strategy if matching cubin is not found; possible values are:
    – CU_PREFER_PTX
    – CU_PREFER_BINARY

Parameters:
    module - Returned module
    image - Module data to load
    numOptions - Number of options
    options - Options for JIT
    optionValues - Option values for JIT

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_NO_BINARY_FOR_GPU, CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadFatBinary, cuModuleUnload

4.30.2.8 CUresult cuModuleLoadFatBinary (CUmodule * module, const void * fatCubin)

Takes a pointer fatCubin and loads the corresponding module module into the current context. The pointer represents a fat binary object, which is a collection of different cubin files, all representing the same device code, but compiled and optimized for different architectures. There is currently no documented API for constructing and using fat binary objects by programmers, and therefore this function is an internal function in this version of CUDA. More information can be found in the nvcc document.

Parameters:
    module - Returned module
    fatCubin - Fat binary to load

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_FOUND, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_NO_BINARY_FOR_GPU, CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleUnload

4.30.2.9 CUresult cuModuleUnload (CUmodule hmod)

Unloads a module hmod from the current context.

Parameters:
    hmod - Module to unload

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuModuleGetFunction, cuModuleGetGlobal, cuModuleGetTexRef, cuModuleLoad, cuModuleLoadData, cuModuleLoadDataEx, cuModuleLoadFatBinary

4.31 Memory Management

Functions

• CUresult cuArray3DCreate (CUarray *pHandle, const CUDA_ARRAY3D_DESCRIPTOR *pAllocateArray)
    Creates a 3D CUDA array.
• CUresult cuArray3DGetDescriptor (CUDA_ARRAY3D_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
    Get a 3D CUDA array descriptor.
• CUresult cuArrayCreate (CUarray *pHandle, const CUDA_ARRAY_DESCRIPTOR *pAllocateArray)
    Creates a 1D or 2D CUDA array.
• CUresult cuArrayDestroy (CUarray hArray)
    Destroys a CUDA array.
• CUresult cuArrayGetDescriptor (CUDA_ARRAY_DESCRIPTOR *pArrayDescriptor, CUarray hArray)
    Get a 1D or 2D CUDA array descriptor.
• CUresult cuMemAlloc (CUdeviceptr *dptr, size_t bytesize)
    Allocates device memory.
• CUresult cuMemAllocHost (void **pp, size_t bytesize)
    Allocates page-locked host memory.
• CUresult cuMemAllocPitch (CUdeviceptr *dptr, size_t *pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes)
    Allocates pitched device memory.
• CUresult cuMemcpy2D (const CUDA_MEMCPY2D *pCopy)
    Copies memory for 2D arrays.
• CUresult cuMemcpy2DAsync (const CUDA_MEMCPY2D *pCopy, CUstream hStream)
    Copies memory for 2D arrays.
• CUresult cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *pCopy)
    Copies memory for 2D arrays.
• CUresult cuMemcpy3D (const CUDA_MEMCPY3D *pCopy)
    Copies memory for 3D arrays.
• CUresult cuMemcpy3DAsync (const CUDA_MEMCPY3D *pCopy, CUstream hStream)
    Copies memory for 3D arrays.
• CUresult cuMemcpyAtoA (CUarray dstArray, size_t dstOffset, CUarray srcArray, size_t srcOffset, size_t ByteCount)
    Copies memory from Array to Array.
• CUresult cuMemcpyAtoD (CUdeviceptr dstDevice, CUarray srcArray, size_t srcOffset, size_t ByteCount)

    Copies memory from Array to Device.
• CUresult cuMemcpyAtoH (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount)
    Copies memory from Array to Host.
• CUresult cuMemcpyAtoHAsync (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount, CUstream hStream)
    Copies memory from Array to Host.
• CUresult cuMemcpyDtoA (CUarray dstArray, size_t dstOffset, CUdeviceptr srcDevice, size_t ByteCount)
    Copies memory from Device to Array.
• CUresult cuMemcpyDtoD (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount)
    Copies memory from Device to Device.
• CUresult cuMemcpyDtoDAsync (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
    Copies memory from Device to Device.
• CUresult cuMemcpyDtoH (void *dstHost, CUdeviceptr srcDevice, size_t ByteCount)
    Copies memory from Device to Host.
• CUresult cuMemcpyDtoHAsync (void *dstHost, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)
    Copies memory from Device to Host.
• CUresult cuMemcpyHtoA (CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount)
    Copies memory from Host to Array.
• CUresult cuMemcpyHtoAAsync (CUarray dstArray, size_t dstOffset, const void *srcHost, size_t ByteCount, CUstream hStream)
    Copies memory from Host to Array.
• CUresult cuMemcpyHtoD (CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount)
    Copies memory from Host to Device.
• CUresult cuMemcpyHtoDAsync (CUdeviceptr dstDevice, const void *srcHost, size_t ByteCount, CUstream hStream)
    Copies memory from Host to Device.
• CUresult cuMemFree (CUdeviceptr dptr)
    Frees device memory.
• CUresult cuMemFreeHost (void *p)
    Frees page-locked host memory.
• CUresult cuMemGetAddressRange (CUdeviceptr *pbase, size_t *psize, CUdeviceptr dptr)
    Get information on memory allocations.
• CUresult cuMemGetInfo (size_t *free, size_t *total)
    Gets free and total memory.

• CUresult cuMemHostAlloc (void **pp, size_t bytesize, unsigned int Flags)
    Allocates page-locked host memory.
• CUresult cuMemHostGetDevicePointer (CUdeviceptr *pdptr, void *p, unsigned int Flags)
    Passes back device pointer of mapped pinned memory.
• CUresult cuMemHostGetFlags (unsigned int *pFlags, void *p)
    Passes back flags that were used for a pinned allocation.
• CUresult cuMemsetD16 (CUdeviceptr dstDevice, unsigned short us, size_t N)
    Initializes device memory.
• CUresult cuMemsetD16Async (CUdeviceptr dstDevice, unsigned short us, size_t N, CUstream hStream)
    Sets device memory.
• CUresult cuMemsetD2D16 (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height)
    Initializes device memory.
• CUresult cuMemsetD2D16Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height, CUstream hStream)
    Sets device memory.
• CUresult cuMemsetD2D32 (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height)
    Initializes device memory.
• CUresult cuMemsetD2D32Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height, CUstream hStream)
    Sets device memory.
• CUresult cuMemsetD2D8 (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height)
    Initializes device memory.
• CUresult cuMemsetD2D8Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height, CUstream hStream)
    Sets device memory.
• CUresult cuMemsetD32 (CUdeviceptr dstDevice, unsigned int ui, size_t N)
    Initializes device memory.
• CUresult cuMemsetD32Async (CUdeviceptr dstDevice, unsigned int ui, size_t N, CUstream hStream)
    Sets device memory.
• CUresult cuMemsetD8 (CUdeviceptr dstDevice, unsigned char uc, size_t N)
    Initializes device memory.
• CUresult cuMemsetD8Async (CUdeviceptr dstDevice, unsigned char uc, size_t N, CUstream hStream)
    Sets device memory.

4.31.1 Detailed Description

This section describes the memory management functions of the low-level CUDA driver application programming interface.

4.31.2 Function Documentation

4.31.2.1 CUresult cuArray3DCreate (CUarray * pHandle, const CUDA_ARRAY3D_DESCRIPTOR * pAllocateArray)

Creates a CUDA array according to the CUDA_ARRAY3D_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY3D_DESCRIPTOR is defined as:

    typedef struct {
        unsigned int Width;
        unsigned int Height;
        unsigned int Depth;
        CUarray_format Format;
        unsigned int NumChannels;
        unsigned int Flags;
    } CUDA_ARRAY3D_DESCRIPTOR;

where:

• Width, Height, and Depth are the width, height, and depth of the CUDA array (in elements); the CUDA array is one-dimensional if height and depth are 0, two-dimensional if depth is 0, and three-dimensional otherwise;
• Format specifies the format of the elements; CUarray_format is defined as:

    typedef enum CUarray_format_enum {
        CU_AD_FORMAT_UNSIGNED_INT8 = 0x01,
        CU_AD_FORMAT_UNSIGNED_INT16 = 0x02,
        CU_AD_FORMAT_UNSIGNED_INT32 = 0x03,
        CU_AD_FORMAT_SIGNED_INT8 = 0x08,
        CU_AD_FORMAT_SIGNED_INT16 = 0x09,
        CU_AD_FORMAT_SIGNED_INT32 = 0x0a,
        CU_AD_FORMAT_HALF = 0x10,
        CU_AD_FORMAT_FLOAT = 0x20
    } CUarray_format;

• NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4;
• Flags may be set to CUDA_ARRAY3D_SURFACE_LDST to enable surface references to be bound to the CUDA array. If this flag is not set, cuSurfRefSetArray will fail when attempting to bind the CUDA array to a surface reference.

Here are examples of CUDA array descriptions:

Description for a CUDA array of 2048 floats:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 2048;
    desc.Height = 0;
    desc.Depth = 0;

Description for a 64 x 64 CUDA array of floats:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 64;
    desc.Height = 64;
    desc.Depth = 0;

Description for a width x height x depth CUDA array of 64-bit, 4x16-bit float16's:

    CUDA_ARRAY3D_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_HALF;
    desc.NumChannels = 4;
    desc.Width = width;
    desc.Height = height;
    desc.Depth = depth;

Parameters:
    pHandle - Returned array
    pAllocateArray - 3D array descriptor

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.2 CUresult cuArray3DGetDescriptor (CUDA_ARRAY3D_DESCRIPTOR * pArrayDescriptor, CUarray hArray)

Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.

This function may be called on 1D and 2D arrays, in which case the Height and/or Depth members of the descriptor struct will be set to 0.

Parameters:
    pArrayDescriptor - Returned 3D array descriptor
    hArray - 3D array to get descriptor of

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.3 CUresult cuArrayCreate (CUarray * pHandle, const CUDA_ARRAY_DESCRIPTOR * pAllocateArray)

Creates a CUDA array according to the CUDA_ARRAY_DESCRIPTOR structure pAllocateArray and returns a handle to the new CUDA array in *pHandle. The CUDA_ARRAY_DESCRIPTOR is defined as:

    typedef struct {
        unsigned int Width;
        unsigned int Height;
        CUarray_format Format;
        unsigned int NumChannels;
    } CUDA_ARRAY_DESCRIPTOR;

where:

• Width and Height are the width and height of the CUDA array (in elements); the CUDA array is one-dimensional if height is 0, two-dimensional otherwise;
• Format specifies the format of the elements; CUarray_format is defined as:

    typedef enum CUarray_format_enum {
        CU_AD_FORMAT_UNSIGNED_INT8 = 0x01,
        CU_AD_FORMAT_UNSIGNED_INT16 = 0x02,
        CU_AD_FORMAT_UNSIGNED_INT32 = 0x03,
        CU_AD_FORMAT_SIGNED_INT8 = 0x08,
        CU_AD_FORMAT_SIGNED_INT16 = 0x09,
        CU_AD_FORMAT_SIGNED_INT32 = 0x0a,
        CU_AD_FORMAT_HALF = 0x10,
        CU_AD_FORMAT_FLOAT = 0x20
    } CUarray_format;

• NumChannels specifies the number of packed components per CUDA array element; it may be 1, 2, or 4;

Here are examples of CUDA array descriptions:

Description for a CUDA array of 2048 floats:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 2048;
    desc.Height = 1;

Description for a 64 x 64 CUDA array of floats:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_FLOAT;
    desc.NumChannels = 1;
    desc.Width = 64;
    desc.Height = 64;

Description for a width x height CUDA array of 64-bit, 4x16-bit float16's:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_HALF;
    desc.NumChannels = 4;
    desc.Width = width;
    desc.Height = height;

Description for a width x height CUDA array of 16-bit elements, each of which is two 8-bit unsigned chars:

    CUDA_ARRAY_DESCRIPTOR desc;
    desc.Format = CU_AD_FORMAT_UNSIGNED_INT8;
    desc.NumChannels = 2;
    desc.Width = width;
    desc.Height = height;

Parameters:
    pHandle - Returned array
    pAllocateArray - Array descriptor

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
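The element sizes quoted in the cuArrayCreate examples (64-bit for four half-precision channels, 16-bit for two 8-bit channels) follow from multiplying the per-channel size by NumChannels. The following is a minimal host-only sketch of that arithmetic. It does not include cuda.h: the demo_ types and functions are hypothetical stand-ins that copy only the enum values and field names quoted in this section, and the per-channel byte sizes are an assumption derived from the format names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CUarray_format; values copied from the text. */
typedef enum demo_array_format {
    DEMO_AD_FORMAT_UNSIGNED_INT8  = 0x01,
    DEMO_AD_FORMAT_UNSIGNED_INT16 = 0x02,
    DEMO_AD_FORMAT_UNSIGNED_INT32 = 0x03,
    DEMO_AD_FORMAT_SIGNED_INT8    = 0x08,
    DEMO_AD_FORMAT_SIGNED_INT16   = 0x09,
    DEMO_AD_FORMAT_SIGNED_INT32   = 0x0a,
    DEMO_AD_FORMAT_HALF           = 0x10,
    DEMO_AD_FORMAT_FLOAT          = 0x20
} demo_array_format;

/* Hypothetical stand-in for CUDA_ARRAY_DESCRIPTOR (same field order). */
typedef struct demo_array_descriptor {
    unsigned int Width;
    unsigned int Height;
    demo_array_format Format;
    unsigned int NumChannels;
} demo_array_descriptor;

/* Bytes occupied by one channel of the given format; 0 if unrecognized. */
static size_t demo_channel_bytes(demo_array_format fmt)
{
    switch (fmt) {
    case DEMO_AD_FORMAT_UNSIGNED_INT8:
    case DEMO_AD_FORMAT_SIGNED_INT8:
        return 1;
    case DEMO_AD_FORMAT_UNSIGNED_INT16:
    case DEMO_AD_FORMAT_SIGNED_INT16:
    case DEMO_AD_FORMAT_HALF:
        return 2;
    case DEMO_AD_FORMAT_UNSIGNED_INT32:
    case DEMO_AD_FORMAT_SIGNED_INT32:
    case DEMO_AD_FORMAT_FLOAT:
        return 4;
    }
    return 0;
}

/* Bytes per array element; 0 if NumChannels is not 1, 2, or 4. */
static size_t demo_element_bytes(const demo_array_descriptor *d)
{
    assert(d != NULL);
    if (d->NumChannels != 1 && d->NumChannels != 2 && d->NumChannels != 4)
        return 0;
    return demo_channel_bytes(d->Format) * d->NumChannels;
}

/* 1 if the descriptor describes a 1D array (Height == 0), otherwise 2. */
static int demo_dimensions(const demo_array_descriptor *d)
{
    assert(d != NULL);
    return d->Height == 0 ? 1 : 2;
}
```

A caller could run such a check on a descriptor before passing the real structure to cuArrayCreate(), for instance to reject a NumChannels value of 3 early.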

4.31.2.4 CUresult cuArrayDestroy (CUarray hArray)

Destroys the CUDA array hArray.

Parameters:
    hArray - Array to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ARRAY_IS_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.5 CUresult cuArrayGetDescriptor (CUDA_ARRAY_DESCRIPTOR * pArrayDescriptor, CUarray hArray)

Returns in *pArrayDescriptor a descriptor containing information on the format and dimensions of the CUDA array hArray. It is useful for subroutines that have been passed a CUDA array, but need to know the CUDA array parameters for validation or other purposes.

Parameters:
    pArrayDescriptor - Returned array descriptor
    hArray - Array to get descriptor of

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.6 CUresult cuMemAlloc (CUdeviceptr * dptr, size_t bytesize)

Allocates bytesize bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE.

Parameters:
    dptr - Returned device pointer
    bytesize - Requested allocation size in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.7 CUresult cuMemAllocHost (void ** pp, size_t bytesize)

Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of memory with cuMemAllocHost() may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

Parameters:
    pp - Returned host pointer to page-locked memory
    bytesize - Requested allocation size in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.8 CUresult cuMemAllocPitch (CUdeviceptr * dptr, size_t * pPitch, size_t WidthInBytes, size_t Height, unsigned int ElementSizeBytes)

Allocates at least WidthInBytes * Height bytes of linear memory on the device and returns in *dptr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. ElementSizeBytes specifies the size of the largest reads and writes that will be performed on the memory range. ElementSizeBytes may be 4, 8 or 16 (since coalesced memory transactions are not possible on other data sizes). If ElementSizeBytes is smaller than the actual read/write size of a kernel, the kernel will run correctly, but possibly at reduced speed. The pitch returned in *pPitch by cuMemAllocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:

    T* pElement = (T*)((char*)BaseAddress + Row * Pitch) + Column;

The pitch returned by cuMemAllocPitch() is guaranteed to work with cuMemcpy2D() under all circumstances. For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cuMemAllocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).

The byte alignment of the pitch returned by cuMemAllocPitch() is guaranteed to match or exceed the alignment requirement for texture binding with cuTexRefSetAddress2D().

Parameters:
    dptr - Returned device pointer
    pPitch - Returned pitch of allocation in bytes
    WidthInBytes - Requested allocation width in bytes
    Height - Requested allocation height in rows
    ElementSizeBytes - Size of largest reads/writes for range

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA,

void *dstHost. cuMemcpyDtoH. unsigned int dstPitch. CUdeviceptr dstDevice. cuMemcpyHtoAAsync. If srcMemoryType is CU_MEMORYTYPE_ARRAY. unsigned int srcPitch. If srcMemoryType is CU_MEMORYTYPE_HOST. Generated for NVIDIA CUDA Library by Doxygen . cuMemHostAlloc. srcArray is ignored. CU_MEMORYTYPE_DEVICE = 0x02. dstY. unsigned int WidthInBytes. srcHost. cuMemsetD32 4. CUdeviceptr srcDevice. } CUDA_MEMCPY2D. cuMemGetAddressRange. unsigned int Height. The CUDA_MEMCPY2D structure is defined as: typedef struct CUDA_MEMCPY2D_st { unsigned int srcXInBytes. CU_MEMORYTYPE_ARRAY = 0x03 } CUmemorytype. srcY. where: • srcMemoryType and dstMemoryType specify the type of memory of the source and destination. CUmemorytype dstMemoryType. CUmemorytype srcMemoryType. cuMemGetInfo. If srcMemoryType is CU_MEMORYTYPE_DEVICE. dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. cuMemFree. If dstMemoryType is CU_MEMORYTYPE_HOST. respectively. srcDevice and srcPitch are ignored. srcArray is ignored. const void *srcHost. srcArray specifies the handle of the source data. cuMemsetD2D16. cuMemHostGetDevicePointer. cuMemcpyHtoA. cuMemsetD2D32. cuMemcpyDtoHAsync.2. cuMemcpyHtoD. srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. cuMemcpyDtoDAsync. cuMemsetD2D8. srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. cuMemFreeHost. cuMemsetD16. cuMemcpyHtoDAsync. CUmemorytype_enum is defined as: typedef enum CUmemorytype_enum { CU_MEMORYTYPE_HOST = 0x01. CUarray srcArray.198 Module Documentation cuMemcpyDtoD.31. cuMemsetD8. CUarray dstArray. dstArray is ignored.9 CUresult cuMemcpy2D (const CUDA_MEMCPY2D ∗ pCopy) Perform a 2D memory copy according to the parameters specified in pCopy. unsigned int dstXInBytes.

the starting address is void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes). ERROR_INVALID_CONTEXT. For CUDA arrays. the base address is void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes). For device pointers. dstHost. On intra-device memory copies (device ? device. If dstMemoryType is CU_MEMORYTYPE_ARRAY.Parameters for the memory copy Returns: CUDA_SUCCESS. but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code. • WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed. the starting address is CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes. the starting address is CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes. CUDA_ERROR_DEINITIALIZED. cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). srcXInBytes must be evenly divisible by the array element size. CUDA array ? device. cuMemcpy2DUnaligned() does not have this restriction.31 Memory Management 199 If dstMemoryType is CU_MEMORYTYPE_DEVICE. • srcXInBytes and srcY specify the base address of the source data for the copy. For device pointers.4. For CUDA arrays. dstXInBytes must be evenly divisible by the array element size. dstDevice and dstPitch are ignored. cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). Parameters: pCopy . srcPitch must be greater than or equal to WidthInBytes + srcXInBytes. For host pointers. dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. For host pointers. dstArray is ignored. CUDA_ERROR_NOT_INITIALIZED. • If specified. • dstXInBytes and dstY specify the base address of the destination data for the copy. cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes. CUDA array ? CUDA array). 
dstArray specifies the handle of the destination data. CUDA_ERROR_INVALID_VALUE Generated for NVIDIA CUDA Library by Doxygen CUDA_- .
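The host-pointer addressing and pitch rules described for cuMemcpy2D() can be illustrated without a GPU. The following is a minimal host-only sketch, not the driver's implementation: HostCopy2D, host_copy2d_pitches_ok(), host_copy2d() and host_copy2d_demo() are hypothetical names introduced here, and only the host-memory fields of CUDA_MEMCPY2D are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only subset of CUDA_MEMCPY2D (host pointers only). */
typedef struct {
    size_t srcXInBytes, srcY;
    const void *srcHost;
    size_t srcPitch;
    size_t dstXInBytes, dstY;
    void *dstHost;
    size_t dstPitch;
    size_t WidthInBytes, Height;
} HostCopy2D;

/* Pitch rule from the text: each pitch >= WidthInBytes + XInBytes. */
int host_copy2d_pitches_ok(const HostCopy2D *p)
{
    return p->srcPitch >= p->WidthInBytes + p->srcXInBytes &&
           p->dstPitch >= p->WidthInBytes + p->dstXInBytes;
}

/* Copies Height rows of WidthInBytes bytes each.
 * Returns 0 on success, -1 if the pitch rule is violated. */
int host_copy2d(const HostCopy2D *p)
{
    assert(p != NULL && p->srcHost != NULL && p->dstHost != NULL);
    if (!host_copy2d_pitches_ok(p))
        return -1;
    /* Start addresses computed as documented for host pointers. */
    const char *src = (const char *)p->srcHost + p->srcY * p->srcPitch + p->srcXInBytes;
    char *dst = (char *)p->dstHost + p->dstY * p->dstPitch + p->dstXInBytes;
    for (size_t row = 0; row < p->Height; ++row)
        memcpy(dst + row * p->dstPitch, src + row * p->srcPitch, p->WidthInBytes);
    return 0;
}

/* Usage: copy the 2x2 block at column 1, row 2 of a 4x4 byte image.
 * Returns 1 if the copied bytes are the expected ones, 0 otherwise. */
int host_copy2d_demo(void)
{
    unsigned char src[16], dst[4] = {0};
    for (int i = 0; i < 16; ++i)
        src[i] = (unsigned char)i;
    HostCopy2D p = { 1, 2, src, 4, 0, 0, dst, 2, 2, 2 };
    if (host_copy2d(&p) != 0)
        return 0;
    return dst[0] == 9 && dst[1] == 10 && dst[2] == 13 && dst[3] == 14;
}
```

The row loop advances source and destination by their own pitches, which is why the two pitches may differ as long as each satisfies the rule above.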

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.10

CUresult cuMemcpy2DAsync (const CUDA_MEMCPY2D *pCopy, CUstream hStream)

Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

where:

• srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; CUmemorytype_enum is defined as:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.

If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice and dstPitch are ignored.

• srcXInBytes and srcY specify the base address of the source data for the copy.

For host pointers, the starting address is

    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);

For device pointers, the starting address is

    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;

For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.

• dstXInBytes and dstY specify the base address of the destination data for the copy.

For host pointers, the base address is

    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);

For device pointers, the starting address is

    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;

For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.

• WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed.

• If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.

• If specified, srcHeight must be greater than or equal to Height + srcY, and dstHeight must be greater than or equal to Height + dstY.

cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). On intra-device memory copies (device ↔ device, CUDA array ↔ device, CUDA array ↔ CUDA array), cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). cuMemcpy2DUnaligned() does not have this restriction, but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code.

cuMemcpy2DAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
    pCopy - Parameters for the memory copy
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.11

CUresult cuMemcpy2DUnaligned (const CUDA_MEMCPY2D *pCopy)

Perform a 2D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY2D structure is defined as:

    typedef struct CUDA_MEMCPY2D_st {
        unsigned int srcXInBytes, srcY;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;
        unsigned int dstXInBytes, dstY;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;
        unsigned int WidthInBytes;
        unsigned int Height;
    } CUDA_MEMCPY2D;

where:

• srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; CUmemorytype_enum is defined as:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost and srcPitch specify the (host) base address of the source data and the bytes per row to apply. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice and srcPitch specify the (device) base address of the source data and the bytes per row to apply. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice and srcPitch are ignored.

If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost and dstPitch specify the (host) base address of the destination data and the bytes per row to apply. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice and dstPitch specify the (device) base address of the destination data and the bytes per row to apply. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice and dstPitch are ignored.

• srcXInBytes and srcY specify the base address of the source data for the copy.

For host pointers, the starting address is

    void* Start = (void*)((char*)srcHost+srcY*srcPitch + srcXInBytes);

For device pointers, the starting address is

    CUdeviceptr Start = srcDevice+srcY*srcPitch+srcXInBytes;

For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.

• dstXInBytes and dstY specify the base address of the destination data for the copy.

For host pointers, the base address is

    void* dstStart = (void*)((char*)dstHost+dstY*dstPitch + dstXInBytes);

For device pointers, the starting address is

    CUdeviceptr dstStart = dstDevice+dstY*dstPitch+dstXInBytes;

For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.

• WidthInBytes and Height specify the width (in bytes) and height of the 2D copy being performed.

• If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.

cuMemcpy2D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH). cuMemAllocPitch() passes back pitches that always work with cuMemcpy2D(). On intra-device memory copies (device ↔ device, CUDA array ↔ device, CUDA array ↔ CUDA array), cuMemcpy2D() may fail for pitches not computed by cuMemAllocPitch(). cuMemcpy2DUnaligned() does not have this restriction, but may run significantly slower in the cases where cuMemcpy2D() would have returned an error code.

Parameters:
    pCopy - Parameters for the memory copy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.12

CUresult cuMemcpy3D (const CUDA_MEMCPY3D *pCopy)

Perform a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:

    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;

where:

• srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; CUmemorytype_enum is defined as:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost, srcPitch and srcHeight specify the (host) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice, srcPitch and srcHeight specify the (device) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice, srcPitch and srcHeight are ignored.

If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost, dstPitch and dstHeight specify the (host) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice, dstPitch and dstHeight specify the (device) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice, dstPitch and dstHeight are ignored.

• srcXInBytes, srcY and srcZ specify the base address of the source data for the copy.

For host pointers, the starting address is

    void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);

For device pointers, the starting address is

    CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;

For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.

• dstXInBytes, dstY and dstZ specify the base address of the destination data for the copy.

For host pointers, the base address is

    void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);

For device pointers, the starting address is

    CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;

For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.

• WidthInBytes, Height and Depth specify the width (in bytes), height and depth of the 3D copy being performed.

• If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.

• If specified, srcHeight must be greater than or equal to Height + srcY, and dstHeight must be greater than or equal to Height + dstY.

cuMemcpy3D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH).

The srcLOD and dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.

Parameters:
    pCopy - Parameters for the memory copy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
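The 3D starting-address formula and the two "If specified" bounds given for cuMemcpy3D() reduce to a little integer arithmetic. The sketch below is host-only and illustrative: start_offset_3d(), extent_ok_3d() and check_3d_rules() are hypothetical helpers, and treating a pitch or height of 0 as "not specified" is an assumption of this sketch rather than something the text states.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset from the base pointer to the first element of the copy,
 * per the documented formula: (z*height + y)*pitch + xInBytes. */
size_t start_offset_3d(size_t xInBytes, size_t y, size_t z,
                       size_t pitch, size_t height)
{
    return (z * height + y) * pitch + xInBytes;
}

/* Checks one side (source or destination) of a 3D copy against the
 * documented bounds. A pitch or height of 0 is treated as "not
 * specified" here (assumption of this sketch). Returns 1 if OK. */
int extent_ok_3d(size_t xInBytes, size_t y, size_t pitch, size_t height,
                 size_t widthInBytes, size_t copyHeight)
{
    if (pitch != 0 && pitch < widthInBytes + xInBytes)
        return 0;
    if (height != 0 && height < copyHeight + y)
        return 0;
    return 1;
}

/* Usage: a 256-byte pitch, 16-row slices, copy starting at x=8, y=2, z=3. */
void check_3d_rules(void)
{
    assert(start_offset_3d(8, 2, 3, 256, 16) == 12808); /* (3*16+2)*256+8 */
    assert(extent_ok_3d(8, 2, 256, 16, 64, 14) == 1);   /* 14 rows fit    */
    assert(extent_ok_3d(8, 2, 256, 16, 64, 15) == 0);   /* 15+2 > 16      */
}
```

With z fixed at 0 the offset formula collapses to the 2D form y*pitch + xInBytes used by the cuMemcpy2D family.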

4.31.2.13

CUresult cuMemcpy3DAsync (const CUDA_MEMCPY3D *pCopy, CUstream hStream)

Perform a 3D memory copy according to the parameters specified in pCopy. The CUDA_MEMCPY3D structure is defined as:

    typedef struct CUDA_MEMCPY3D_st {
        unsigned int srcXInBytes, srcY, srcZ;
        unsigned int srcLOD;
        CUmemorytype srcMemoryType;
        const void *srcHost;
        CUdeviceptr srcDevice;
        CUarray srcArray;
        unsigned int srcPitch;  // ignored when src is array
        unsigned int srcHeight; // ignored when src is array; may be 0 if Depth==1
        unsigned int dstXInBytes, dstY, dstZ;
        unsigned int dstLOD;
        CUmemorytype dstMemoryType;
        void *dstHost;
        CUdeviceptr dstDevice;
        CUarray dstArray;
        unsigned int dstPitch;  // ignored when dst is array
        unsigned int dstHeight; // ignored when dst is array; may be 0 if Depth==1
        unsigned int WidthInBytes;
        unsigned int Height;
        unsigned int Depth;
    } CUDA_MEMCPY3D;

where:

• srcMemoryType and dstMemoryType specify the type of memory of the source and destination, respectively; CUmemorytype_enum is defined as:

    typedef enum CUmemorytype_enum {
        CU_MEMORYTYPE_HOST = 0x01,
        CU_MEMORYTYPE_DEVICE = 0x02,
        CU_MEMORYTYPE_ARRAY = 0x03
    } CUmemorytype;

If srcMemoryType is CU_MEMORYTYPE_HOST, srcHost, srcPitch and srcHeight specify the (host) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_DEVICE, srcDevice, srcPitch and srcHeight specify the (device) base address of the source data, the bytes per row, and the height of each 2D slice of the 3D array. srcArray is ignored.

If srcMemoryType is CU_MEMORYTYPE_ARRAY, srcArray specifies the handle of the source data. srcHost, srcDevice, srcPitch and srcHeight are ignored.

If dstMemoryType is CU_MEMORYTYPE_HOST, dstHost, dstPitch and dstHeight specify the (host) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_DEVICE, dstDevice, dstPitch and dstHeight specify the (device) base address of the destination data, the bytes per row, and the height of each 2D slice of the 3D array. dstArray is ignored.

If dstMemoryType is CU_MEMORYTYPE_ARRAY, dstArray specifies the handle of the destination data. dstHost, dstDevice, dstPitch and dstHeight are ignored.

• srcXInBytes, srcY and srcZ specify the base address of the source data for the copy.

For host pointers, the starting address is

    void* Start = (void*)((char*)srcHost+(srcZ*srcHeight+srcY)*srcPitch + srcXInBytes);

For device pointers, the starting address is

    CUdeviceptr Start = srcDevice+(srcZ*srcHeight+srcY)*srcPitch+srcXInBytes;

For CUDA arrays, srcXInBytes must be evenly divisible by the array element size.

• dstXInBytes, dstY and dstZ specify the base address of the destination data for the copy.

For host pointers, the base address is

    void* dstStart = (void*)((char*)dstHost+(dstZ*dstHeight+dstY)*dstPitch + dstXInBytes);

For device pointers, the starting address is

    CUdeviceptr dstStart = dstDevice+(dstZ*dstHeight+dstY)*dstPitch+dstXInBytes;

For CUDA arrays, dstXInBytes must be evenly divisible by the array element size.

• WidthInBytes, Height and Depth specify the width (in bytes), height and depth of the 3D copy being performed.

• If specified, srcPitch must be greater than or equal to WidthInBytes + srcXInBytes, and dstPitch must be greater than or equal to WidthInBytes + dstXInBytes.

• If specified, srcHeight must be greater than or equal to Height + srcY, and dstHeight must be greater than or equal to Height + dstY.

cuMemcpy3D() returns an error if any pitch is greater than the maximum allowed (CU_DEVICE_ATTRIBUTE_MAX_PITCH).

cuMemcpy3DAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

The srcLOD and dstLOD members of the CUDA_MEMCPY3D structure must be set to 0.

Parameters:
    pCopy - Parameters for the memory copy
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.14

CUresult cuMemcpyAtoA (CUarray dstArray, size_t dstOffset, CUarray srcArray, size_t srcOffset, size_t ByteCount)

Copies from one 1D CUDA array to another. dstArray and srcArray specify the handles of the destination and source CUDA arrays for the copy, respectively. dstOffset and srcOffset specify the destination and source offsets in bytes into the CUDA arrays. ByteCount is the number of bytes to be copied. The elements of the two CUDA arrays need not have the same format, but they must be the same size, and ByteCount must be evenly divisible by that size.

Parameters:
    dstArray - Destination array
    dstOffset - Offset in bytes of destination array
    srcArray - Source array
    srcOffset - Offset in bytes of source array
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
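The two argument rules stated for cuMemcpyAtoA() (equal element sizes, and a byte count that is a whole number of elements) can be checked on the host before issuing the call. This is a minimal sketch under assumptions: atoa_args_ok() and check_atoa_rules() are hypothetical helpers, and the caller is assumed to already know each array's element size in bytes, for example from the descriptor it used to create the array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the arguments satisfy the rules stated for cuMemcpyAtoA():
 * both arrays have the same (non-zero) element size, and byteCount is
 * evenly divisible by that size. Element sizes are supplied by the
 * caller; this helper does not query the driver. */
int atoa_args_ok(size_t dstElemSize, size_t srcElemSize, size_t byteCount)
{
    if (dstElemSize == 0 || dstElemSize != srcElemSize)
        return 0;
    return byteCount % dstElemSize == 0;
}

/* Usage: 4-byte elements on both sides (formats may differ). */
void check_atoa_rules(void)
{
    assert(atoa_args_ok(4, 4, 64) == 1); /* 16 whole elements          */
    assert(atoa_args_ok(4, 4, 66) == 0); /* not a multiple of 4 bytes  */
    assert(atoa_args_ok(4, 8, 64) == 0); /* element sizes differ       */
}
```

Only sizes are compared because the text allows the element formats to differ as long as the sizes match.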

4.31.2.15

CUresult cuMemcpyAtoD (CUdeviceptr dstDevice, CUarray srcArray, size_t srcOffset, size_t ByteCount)

Copies from one 1D CUDA array to device memory. dstDevice specifies the base pointer of the destination and must be naturally aligned with the CUDA array elements. srcArray and srcOffset specify the CUDA array handle and the offset in bytes into the array where the copy is to begin. ByteCount specifies the number of bytes to copy and must be evenly divisible by the array element size.

Parameters:
    dstDevice - Destination device pointer
    srcArray - Source array
    srcOffset - Offset in bytes of source array
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.16

CUresult cuMemcpyAtoH (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount)

Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.

Parameters:
    dstHost - Destination host pointer
    srcArray - Source array
    srcOffset - Offset in bytes of source array
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
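For cuMemcpyAtoD (4.31.2.15), the text requires the destination device pointer to be naturally aligned with the array elements and the byte count to be a multiple of the element size. The host-side pre-check below is only a sketch: atod_args_ok() and check_atod_rules() are hypothetical helpers, the device address is passed as a plain 64-bit integer, and "naturally aligned" is interpreted here as "address is a multiple of the element size", which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a destination address and byte count satisfy the rules
 * stated for cuMemcpyAtoD(), assuming natural alignment means the
 * address is a multiple of the element size. */
int atod_args_ok(uint64_t dstDeviceAddr, size_t elemSize, size_t byteCount)
{
    if (elemSize == 0)
        return 0;
    return dstDeviceAddr % elemSize == 0 && byteCount % elemSize == 0;
}

/* Usage: 16-byte elements (for example, four 32-bit channels). */
void check_atod_rules(void)
{
    assert(atod_args_ok(0x1000, 16, 64) == 1); /* aligned, 4 elements   */
    assert(atod_args_ok(0x1004, 16, 64) == 0); /* misaligned address    */
    assert(atod_args_ok(0x1000, 4, 6) == 0);   /* partial element       */
}
```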

4.31.2.17

CUresult cuMemcpyAtoHAsync (void *dstHost, CUarray srcArray, size_t srcOffset, size_t ByteCount, CUstream hStream)

Copies from one 1D CUDA array to host memory. dstHost specifies the base pointer of the destination. srcArray and srcOffset specify the CUDA array handle and starting offset in bytes of the source data. ByteCount specifies the number of bytes to copy.

cuMemcpyAtoHAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked host memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
    dstHost - Destination pointer
    srcArray - Source array
    srcOffset - Offset in bytes of source array
    ByteCount - Size of memory copy in bytes
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.18

CUresult cuMemcpyDtoA (CUarray dstArray, size_t dstOffset, CUdeviceptr srcDevice, size_t ByteCount)

Copies from device memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcDevice specifies the base pointer of the source. ByteCount specifies the number of bytes to copy.

Parameters:
    dstArray - Destination array
    dstOffset - Offset in bytes of destination array
    srcDevice - Source device pointer
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.19

CUresult cuMemcpyDtoD (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount)

Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous.

Parameters:
    dstDevice - Destination device pointer
    srcDevice - Source device pointer
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.20

CUresult cuMemcpyDtoDAsync (CUdeviceptr dstDevice, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)

Copies from device memory to device memory. dstDevice and srcDevice are the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument.

Parameters:
    dstDevice - Destination device pointer
    srcDevice - Source device pointer
    ByteCount - Size of memory copy in bytes
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.21

CUresult cuMemcpyDtoH (void *dstHost, CUdeviceptr srcDevice, size_t ByteCount)

Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.

Parameters:
    dstHost - Destination host pointer
    srcDevice - Source device pointer
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.22 CUresult cuMemcpyDtoHAsync (void ∗ dstHost, CUdeviceptr srcDevice, size_t ByteCount, CUstream hStream)

Copies from device to host memory. dstHost and srcDevice specify the base pointers of the destination and source, respectively. ByteCount specifies the number of bytes to copy.

cuMemcpyDtoHAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
    dstHost - Destination host pointer
    srcDevice - Source device pointer
    ByteCount - Size of memory copy in bytes
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.23 CUresult cuMemcpyHtoA (CUarray dstArray, size_t dstOffset, const void ∗ srcHost, size_t ByteCount)

Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.

Parameters:
    dstArray - Destination array
    dstOffset - Offset in bytes of destination array
    srcHost - Source host pointer
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
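Because dstOffset and ByteCount for cuMemcpyHtoA() are both expressed in bytes rather than elements, callers that think in element indices need to scale by the element size. The following host-only sketch shows that arithmetic. The helper name model_htoa_span and the bounds check are illustrative assumptions, not part of the driver API, and the sketch does not describe how the driver itself validates its arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts an element range of a 1D array into the
 * byte offset and byte count that a byte-addressed copy expects.
 * array_elems : number of elements in the destination array
 * elem_size   : size of one element in bytes
 * first_elem  : index of the first element to overwrite
 * count       : number of elements to copy
 * Returns 0 and fills the outputs if the range fits, -1 otherwise. */
static int model_htoa_span(size_t array_elems, size_t elem_size,
                           size_t first_elem, size_t count,
                           size_t *dst_offset, size_t *byte_count)
{
    assert(dst_offset != NULL && byte_count != NULL);
    if (elem_size == 0 || first_elem > array_elems ||
        count > array_elems - first_elem)
        return -1;
    *dst_offset = first_elem * elem_size;  /* offset in bytes, not elements */
    *byte_count = count * elem_size;       /* size in bytes, not elements */
    return 0;
}
```

For a 100-element array of 4-byte values, copying 20 elements starting at element 10 gives a dstOffset of 40 and a ByteCount of 80.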

4.31.2.24 CUresult cuMemcpyHtoAAsync (CUarray dstArray, size_t dstOffset, const void ∗ srcHost, size_t ByteCount, CUstream hStream)

Copies from host memory to a 1D CUDA array. dstArray and dstOffset specify the CUDA array handle and starting offset in bytes of the destination data. srcHost specifies the base address of the source. ByteCount specifies the number of bytes to copy.

cuMemcpyHtoAAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
    dstArray - Destination array
    dstOffset - Offset in bytes of destination array
    srcHost - Source host pointer
    ByteCount - Size of memory copy in bytes
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.25 CUresult cuMemcpyHtoD (CUdeviceptr dstDevice, const void ∗ srcHost, size_t ByteCount)

Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy. Note that this function is synchronous.

Parameters:
    dstDevice - Destination device pointer
    srcHost - Source host pointer
    ByteCount - Size of memory copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.26 CUresult cuMemcpyHtoDAsync (CUdeviceptr dstDevice, const void ∗ srcHost, size_t ByteCount, CUstream hStream)

Copies from host memory to device memory. dstDevice and srcHost are the base addresses of the destination and source, respectively. ByteCount specifies the number of bytes to copy.

cuMemcpyHtoDAsync() is asynchronous and can optionally be associated to a stream by passing a non-zero hStream argument. It only works on page-locked memory and returns an error if a pointer to pageable memory is passed as input.

Parameters:
    dstDevice - Destination device pointer
    srcHost - Source host pointer
    ByteCount - Size of memory copy in bytes
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.27 CUresult cuMemFree (CUdeviceptr dptr)

Frees the memory space pointed to by dptr, which must have been returned by a previous call to cuMemAlloc() or cuMemAllocPitch().

Parameters:
    dptr - Pointer to memory to free

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32


4.31.2.28 CUresult cuMemFreeHost (void ∗ p)

Frees the memory space pointed to by p, which must have been returned by a previous call to cuMemAllocHost().

Parameters:
    p - Pointer to memory to free

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.29 CUresult cuMemGetAddressRange (CUdeviceptr ∗ pbase, size_t ∗ psize, CUdeviceptr dptr)

Returns the base address in ∗pbase and size in ∗psize of the allocation by cuMemAlloc() or cuMemAllocPitch() that contains the input pointer dptr. Both parameters pbase and psize are optional. If one of them is NULL, it is ignored.

Parameters:
    pbase - Returned base address
    psize - Returned size of device memory allocation
    dptr - Device pointer to query

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
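The lookup performed by cuMemGetAddressRange(), including its optional output parameters, can be pictured with a small host-side model. This is only a sketch under stated assumptions: the model_alloc table, the function name model_get_address_range and the linear search are hypothetical and say nothing about how the driver tracks allocations internally. The -1 return stands in for whatever error the driver reports when dptr lies in no allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocation: base address and size in bytes. */
typedef struct {
    uintptr_t base;
    size_t    size;
} model_alloc;

/* Finds the allocation containing ptr. pbase and psize are optional:
 * a NULL output pointer is simply skipped, mirroring the documented
 * behaviour of the pbase and psize parameters.
 * Returns 0 if an allocation was found, -1 otherwise. */
static int model_get_address_range(const model_alloc *allocs, size_t n,
                                   uintptr_t ptr,
                                   uintptr_t *pbase, size_t *psize)
{
    assert(allocs != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (ptr >= allocs[i].base && ptr - allocs[i].base < allocs[i].size) {
            if (pbase != NULL)
                *pbase = allocs[i].base;
            if (psize != NULL)
                *psize = allocs[i].size;
            return 0;
        }
    }
    return -1;
}
```

A pointer anywhere inside an allocation, not just its first byte, resolves to the same base and size.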


4.31.2.30 CUresult cuMemGetInfo (size_t ∗ free, size_t ∗ total)

Returns in ∗free and ∗total, respectively, the free and total amounts of memory, in bytes, available for allocation by the CUDA context.

Parameters:
    free - Returned free memory in bytes
    total - Returned total memory in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.31 CUresult cuMemHostAlloc (void ∗∗ pp, size_t bytesize, unsigned int Flags)

Allocates bytesize bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cuMemcpyHtoD(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.

The Flags parameter enables different options to be specified that affect the allocation, as follows.

• CU_MEMHOSTALLOC_PORTABLE: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
• CU_MEMHOSTALLOC_DEVICEMAP: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cuMemHostGetDevicePointer(). This feature is available only on GPUs with compute capability greater than or equal to 1.1.
• CU_MEMHOSTALLOC_WRITECOMBINED: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the GPU via mapped pinned memory or host-to-device transfers.

All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.

The CUDA context must have been created with the CU_CTX_MAP_HOST flag in order for the CU_MEMHOSTALLOC_DEVICEMAP flag to have any effect.

The CU_MEMHOSTALLOC_DEVICEMAP flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cuMemHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the CU_MEMHOSTALLOC_PORTABLE flag.

The memory allocated by this function must be freed with cuMemFreeHost().

Parameters:
    pp - Returned host pointer to page-locked memory
    bytesize - Requested allocation size in bytes
    Flags - Flags for allocation request

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32
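Since the three cuMemHostAlloc() flags are orthogonal, they combine as independent bits of the Flags word. The host-only sketch below illustrates that composition. The MODEL_ constants are local stand-ins whose values (0x01, 0x02, 0x04) are assumed to mirror the CU_MEMHOSTALLOC_* definitions in cuda.h; check the header shipped with the toolkit in use before relying on them. Both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for CU_MEMHOSTALLOC_PORTABLE, _DEVICEMAP and
 * _WRITECOMBINED. The numeric values are an assumption; verify them
 * against cuda.h. */
#define MODEL_MEMHOSTALLOC_PORTABLE      0x01u
#define MODEL_MEMHOSTALLOC_DEVICEMAP     0x02u
#define MODEL_MEMHOSTALLOC_WRITECOMBINED 0x04u

/* True if flags contains only the three documented option bits. */
static bool hostalloc_flags_valid(unsigned int flags)
{
    const unsigned int known = MODEL_MEMHOSTALLOC_PORTABLE |
                               MODEL_MEMHOSTALLOC_DEVICEMAP |
                               MODEL_MEMHOSTALLOC_WRITECOMBINED;
    return (flags & ~known) == 0;
}

/* Builds a Flags word from three independent choices. Any of the eight
 * combinations is allowed because the options do not interact. */
static unsigned int make_hostalloc_flags(bool portable, bool devicemap,
                                         bool write_combined)
{
    unsigned int flags = 0;
    if (portable)
        flags |= MODEL_MEMHOSTALLOC_PORTABLE;
    if (devicemap)
        flags |= MODEL_MEMHOSTALLOC_DEVICEMAP;
    if (write_combined)
        flags |= MODEL_MEMHOSTALLOC_WRITECOMBINED;
    assert(hostalloc_flags_valid(flags));
    return flags;
}
```

A portable, write-combined staging buffer that is not mapped into the device address space would use make_hostalloc_flags(true, false, true).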

4.31.2.32 CUresult cuMemHostGetDevicePointer (CUdeviceptr ∗ pdptr, void ∗ p, unsigned int Flags)

Passes back the device pointer pdptr corresponding to the mapped, pinned host buffer p allocated by cuMemHostAlloc().

cuMemHostGetDevicePointer() will fail if the CU_MEMHOSTALLOC_DEVICEMAP flag was not specified at the time the memory was allocated, or if the function is called on a GPU that does not support mapped pinned memory.

Flags is reserved for future releases. For now, it must be set to 0.

Parameters:
    pdptr - Returned device pointer
    p - Host pointer
    Flags - Options (must be 0)

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD8, cuMemsetD16, cuMemsetD32

4.31.2.33 CUresult cuMemHostGetFlags (unsigned int ∗ pFlags, void ∗ p)

Passes back the flags pFlags that were specified when allocating the pinned host buffer p allocated by cuMemHostAlloc.

cuMemHostGetFlags() will fail if the pointer does not reside in an allocation performed by cuMemAllocHost() or cuMemHostAlloc().

Parameters:
    pFlags - Returned flags word
    p - Host pointer

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuMemAllocHost, cuMemHostAlloc

4.31.2.34 CUresult cuMemsetD16 (CUdeviceptr dstDevice, unsigned short us, size_t N)

Sets the memory range of N 16-bit values to the specified value us.

Parameters:
    dstDevice - Destination device pointer
    us - Value to set
    N - Number of elements

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.35 CUresult cuMemsetD16Async (CUdeviceptr dstDevice, unsigned short us, size_t N, CUstream hStream)

Sets the memory range of N 16-bit values to the specified value us.

cuMemsetD16Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    us - Value to set
    N - Number of elements
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD32, cuMemsetD32Async

4.31.2.36 CUresult cuMemsetD2D16 (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height)

Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().
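The relationship between Width (counted in 16-bit elements) and dstPitch (counted in bytes) in the 2D memset functions can be made concrete with a host-side model that fills an ordinary buffer. This is a sketch only: model_memset_d2d16 is a hypothetical name, it runs on the CPU, and its check that a row fits within the pitch is an assumption of the model rather than documented driver behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of a pitched 2D fill with 16-bit values.
 * base   : start of the buffer (stands in for dstDevice)
 * pitch  : bytes between the starts of consecutive rows (dstPitch)
 * value  : 16-bit value written to every element (us)
 * width  : elements per row (Width), not bytes
 * height : number of rows (Height)
 * Returns 0 on success, -1 if a row of width elements exceeds the pitch. */
static int model_memset_d2d16(unsigned char *base, size_t pitch,
                              uint16_t value, size_t width, size_t height)
{
    assert(base != NULL);
    if (width * sizeof(uint16_t) > pitch)
        return -1;
    for (size_t row = 0; row < height; ++row) {
        /* Rows start pitch bytes apart, regardless of width. */
        unsigned char *row_start = base + row * pitch;
        for (size_t col = 0; col < width; ++col)
            memcpy(row_start + col * sizeof(uint16_t), &value, sizeof value);
    }
    return 0;
}
```

With a pitch of 16 bytes and a Width of 3, each row has 6 bytes written and the remaining 10 bytes of padding are left untouched.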

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    us - Value to set
    Width - Width of row
    Height - Number of rows

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.37 CUresult cuMemsetD2D16Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned short us, size_t Width, size_t Height, CUstream hStream)

Sets the 2D memory range of Width 16-bit values to the specified value us. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D16Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    us - Value to set
    Width - Width of row
    Height - Number of rows
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.38 CUresult cuMemsetD2D32 (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height)

Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    ui - Value to set
    Width - Width of row
    Height - Number of rows

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.39 CUresult cuMemsetD2D32Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned int ui, size_t Width, size_t Height, CUstream hStream)

Sets the 2D memory range of Width 32-bit values to the specified value ui. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D32Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    ui - Value to set
    Width - Width of row
    Height - Number of rows
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.40 CUresult cuMemsetD2D8 (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height)

Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    uc - Value to set
    Width - Width of row
    Height - Number of rows

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.41 CUresult cuMemsetD2D8Async (CUdeviceptr dstDevice, size_t dstPitch, unsigned char uc, size_t Width, size_t Height, CUstream hStream)

Sets the 2D memory range of Width 8-bit values to the specified value uc. Height specifies the number of rows to set, and dstPitch specifies the number of bytes between each row. This function performs fastest when the pitch is one that has been passed back by cuMemAllocPitch().

cuMemsetD2D8Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    dstPitch - Pitch of destination device pointer
    uc - Value to set
    Width - Width of row
    Height - Number of rows
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.42 CUresult cuMemsetD32 (CUdeviceptr dstDevice, unsigned int ui, size_t N)

Sets the memory range of N 32-bit values to the specified value ui.
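For the 1D memset functions, N counts elements of the stated width, so cuMemsetD32() with N elements covers N × 4 bytes. The host-only sketch below models that on an ordinary buffer. The function name model_memset_d32 is hypothetical and the code runs entirely on the CPU; it illustrates the element-versus-byte accounting and is not the driver's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of a 1D fill with 32-bit values.
 * dst   : start of the buffer (stands in for dstDevice)
 * value : 32-bit value written to every element (ui)
 * count : number of 32-bit elements (N), not a byte count
 * Returns the number of bytes written, which is count * 4. */
static size_t model_memset_d32(void *dst, uint32_t value, size_t count)
{
    unsigned char *bytes = dst;
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; ++i)
        memcpy(bytes + i * sizeof value, &value, sizeof value);
    return count * sizeof value;
}
```

A caller holding a buffer size in bytes would therefore pass the size divided by 4 as N. Passing the byte size directly would overrun the buffer by a factor of four.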

Parameters:
    dstDevice - Destination device pointer
    ui - Value to set
    N - Number of elements

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32Async

4.31.2.43 CUresult cuMemsetD32Async (CUdeviceptr dstDevice, unsigned int ui, size_t N, CUstream hStream)

Sets the memory range of N 32-bit values to the specified value ui. cuMemsetD32Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    ui - Value to set
    N - Number of elements
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32

4.31.2.44 CUresult cuMemsetD8 (CUdeviceptr dstDevice, unsigned char uc, size_t N)

Sets the memory range of N 8-bit values to the specified value uc.

Parameters:
    dstDevice - Destination device pointer
    uc - Value to set
    N - Number of elements

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8Async, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async

4.31.2.45 CUresult cuMemsetD8Async (CUdeviceptr dstDevice, unsigned char uc, size_t N, CUstream hStream)

Sets the memory range of N 8-bit values to the specified value uc. cuMemsetD8Async() is asynchronous and can optionally be associated to a stream by passing a non-zero stream argument.

Parameters:
    dstDevice - Destination device pointer
    uc - Value to set
    N - Number of elements
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuArray3DCreate, cuArray3DGetDescriptor, cuArrayCreate, cuArrayDestroy, cuArrayGetDescriptor, cuMemAlloc, cuMemAllocHost, cuMemAllocPitch, cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoA, cuMemcpyAtoD, cuMemcpyAtoH, cuMemcpyAtoHAsync, cuMemcpyDtoA, cuMemcpyDtoD, cuMemcpyDtoDAsync, cuMemcpyDtoH, cuMemcpyDtoHAsync, cuMemcpyHtoA, cuMemcpyHtoAAsync, cuMemcpyHtoD, cuMemcpyHtoDAsync, cuMemFree, cuMemFreeHost, cuMemGetAddressRange, cuMemGetInfo, cuMemHostAlloc, cuMemHostGetDevicePointer, cuMemsetD2D8, cuMemsetD2D8Async, cuMemsetD2D16, cuMemsetD2D16Async, cuMemsetD2D32, cuMemsetD2D32Async, cuMemsetD8, cuMemsetD16, cuMemsetD16Async, cuMemsetD32, cuMemsetD32Async
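The cuMemsetD8 and cuMemsetD32 families take N as a count of elements, not bytes. The following host-side C sketch only illustrates that convention and is not the driver's implementation; host_memset_d32 is a hypothetical helper that operates on ordinary host memory rather than on a CUdeviceptr.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side analogue of a 32-bit memset.
 * `n` counts 32-bit elements, so n * 4 bytes are written in total. */
static void host_memset_d32(uint32_t *dst, uint32_t value, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}
```

With n equal to 3, the helper writes three 32-bit words (12 bytes) and leaves anything after them untouched.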

4.32 Stream Management

Functions
• CUresult cuStreamCreate (CUstream *phStream, unsigned int Flags)
    Create a stream.
• CUresult cuStreamDestroy (CUstream hStream)
    Destroys a stream.
• CUresult cuStreamQuery (CUstream hStream)
    Determine status of a compute stream.
• CUresult cuStreamSynchronize (CUstream hStream)
    Wait until a stream's tasks are completed.
• CUresult cuStreamWaitEvent (CUstream hStream, CUevent hEvent, unsigned int Flags)
    Make a compute stream wait on an event.

4.32.1 Detailed Description

This section describes the stream management functions of the low-level CUDA driver application programming interface.

4.32.2 Function Documentation

4.32.2.1 CUresult cuStreamCreate (CUstream *phStream, unsigned int Flags)

Creates a stream and returns a handle in phStream. Flags is required to be 0.

Parameters:
    phStream - Returned newly created stream
    Flags - Parameters for stream creation (must be 0)

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuStreamDestroy, cuStreamWaitEvent, cuStreamQuery, cuStreamSynchronize

4.32.2.2 CUresult cuStreamDestroy (CUstream hStream)

Destroys the stream specified by hStream.

Parameters:
    hStream - Stream to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuStreamCreate, cuStreamWaitEvent, cuStreamQuery, cuStreamSynchronize

4.32.2.3 CUresult cuStreamQuery (CUstream hStream)

Returns CUDA_SUCCESS if all operations in the stream specified by hStream have completed, or CUDA_ERROR_NOT_READY if not.

Parameters:
    hStream - Stream to query status of

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_READY

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuStreamCreate, cuStreamWaitEvent, cuStreamDestroy, cuStreamSynchronize

4.32.2.4 CUresult cuStreamSynchronize (CUstream hStream)

Waits until the device has completed all operations in the stream specified by hStream. If the context was created with the CU_CTX_BLOCKING_SYNC flag, the CPU thread will block until the stream is finished with all of its tasks.

Parameters:
    hStream - Stream to wait for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuStreamCreate, cuStreamDestroy, cuStreamWaitEvent, cuStreamQuery

4.32.2.5 CUresult cuStreamWaitEvent (CUstream hStream, CUevent hEvent, unsigned int Flags)

Makes all future work submitted to hStream wait until hEvent reports completion before beginning execution. This synchronization will be performed efficiently on the device.

The stream hStream will wait only for the completion of the most recent host call to cuEventRecord() on hEvent. Once this call has returned, any functions (including cuEventRecord() and cuEventDestroy()) may be called on hEvent again, and the subsequent calls will not have any effect on hStream.

If hStream is 0 (the NULL stream), any future work submitted in any stream will wait for hEvent to complete before beginning execution. This effectively creates a barrier for all future work submitted to the context.

If cuEventRecord() has not been called on hEvent, this call acts as if the record has already completed, and so is a functional no-op.

Parameters:
    hStream - Stream to wait
    hEvent - Event to wait on (may not be NULL)
    Flags - Parameters for the operation (must be 0)

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventRecord, cuStreamCreate, cuStreamQuery, cuStreamSynchronize, cuStreamDestroy

4.33 Event Management

Functions
• CUresult cuEventCreate (CUevent *phEvent, unsigned int Flags)
    Creates an event.
• CUresult cuEventDestroy (CUevent hEvent)
    Destroys an event.
• CUresult cuEventElapsedTime (float *pMilliseconds, CUevent hStart, CUevent hEnd)
    Computes the elapsed time between two events.
• CUresult cuEventQuery (CUevent hEvent)
    Queries an event's status.
• CUresult cuEventRecord (CUevent hEvent, CUstream hStream)
    Records an event.
• CUresult cuEventSynchronize (CUevent hEvent)
    Waits for an event to complete.

4.33.1 Detailed Description

This section describes the event management functions of the low-level CUDA driver application programming interface.

4.33.2 Function Documentation

4.33.2.1 CUresult cuEventCreate (CUevent *phEvent, unsigned int Flags)

Creates an event *phEvent with the flags specified via Flags. Valid flags include:
• CU_EVENT_DEFAULT: Default event creation flag.
• CU_EVENT_BLOCKING_SYNC: Specifies that the created event should use blocking synchronization. A CPU thread that uses cuEventSynchronize() to wait on an event created with this flag will block until the event has actually been recorded.
• CU_EVENT_DISABLE_TIMING: Specifies that the created event does not need to record timing data. Events created with this flag specified and the CU_EVENT_BLOCKING_SYNC flag not specified will provide the best performance when used with cuStreamWaitEvent() and cuEventQuery().

Parameters:
    phEvent - Returns newly created event
    Flags - Event creation flags

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventDestroy, cuEventElapsedTime

4.33.2.2 CUresult cuEventDestroy (CUevent hEvent)

Destroys the event specified by hEvent.

Parameters:
    hEvent - Event to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventCreate, cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventElapsedTime

4.33.2.3 CUresult cuEventElapsedTime (float *pMilliseconds, CUevent hStart, CUevent hEnd)

Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).

If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cuEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.

If cuEventRecord() has not been called on either event then CUDA_ERROR_INVALID_HANDLE is returned. If cuEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cuEventQuery() would return CUDA_ERROR_NOT_READY on at least one of the events), CUDA_ERROR_NOT_READY is returned. If either event was created with the CU_EVENT_DISABLE_TIMING flag, then this function will return CUDA_ERROR_INVALID_HANDLE.

Parameters:
    pMilliseconds - Time between hStart and hEnd in ms
    hStart - Starting event
    hEnd - Ending event

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_READY

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventCreate, cuEventRecord, cuEventQuery, cuEventSynchronize, cuEventDestroy

4.33.2.4 CUresult cuEventQuery (CUevent hEvent)

Queries the status of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).

If this work has successfully been completed by the device, or if cuEventRecord() has not been called on hEvent, then CUDA_SUCCESS is returned. If this work has not yet been completed by the device then CUDA_ERROR_NOT_READY is returned.

Parameters:
    hEvent - Event to query

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_NOT_READY

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventCreate, cuEventRecord, cuEventSynchronize, cuEventDestroy, cuEventElapsedTime

4.33.2.5 CUresult cuEventRecord (CUevent hEvent, CUstream hStream)

Records an event. If hStream is non-zero, the event is recorded after all preceding operations in hStream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, cuEventQuery() and/or cuEventSynchronize() must be used to determine when the event has actually been recorded.

If cuEventRecord() has previously been called on hEvent, then this call will overwrite any existing state in hEvent. Any subsequent calls which examine the status of hEvent will only examine the completion of this most recent call to cuEventRecord().

Parameters:
    hEvent - Event to record
    hStream - Stream to record event for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventCreate, cuEventQuery, cuEventSynchronize, cuStreamWaitEvent, cuEventDestroy, cuEventElapsedTime

4.33.2.6 CUresult cuEventSynchronize (CUevent hEvent)

Waits until the completion of all device work preceding the most recent call to cuEventRecord() (in the appropriate compute streams, as specified by the arguments to cuEventRecord()).

If cuEventRecord() has not been called on hEvent, CUDA_SUCCESS is returned immediately.

Waiting for an event that was created with the CU_EVENT_BLOCKING_SYNC flag will cause the calling CPU thread to block until the event has been completed by the device. If the CU_EVENT_BLOCKING_SYNC flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.

Parameters:
    hEvent - Event to wait for

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuEventCreate, cuEventRecord, cuEventQuery, cuEventDestroy, cuEventElapsedTime

4.34 Execution Control

Modules
• Execution Control [DEPRECATED]

Functions
• CUresult cuFuncGetAttribute (int *pi, CUfunction_attribute attrib, CUfunction hfunc)
    Returns information about a function.
• CUresult cuFuncSetBlockShape (CUfunction hfunc, int x, int y, int z)
    Sets the block-dimensions for the function.
• CUresult cuFuncSetCacheConfig (CUfunction hfunc, CUfunc_cache config)
    Sets the preferred cache configuration for a device function.
• CUresult cuFuncSetSharedSize (CUfunction hfunc, unsigned int bytes)
    Sets the dynamic shared-memory size for the function.
• CUresult cuLaunch (CUfunction f)
    Launches a CUDA function.
• CUresult cuLaunchGrid (CUfunction f, int grid_width, int grid_height)
    Launches a CUDA function.
• CUresult cuLaunchGridAsync (CUfunction f, int grid_width, int grid_height, CUstream hStream)
    Launches a CUDA function.
• CUresult cuParamSetf (CUfunction hfunc, int offset, float value)
    Adds a floating-point parameter to the function's argument list.
• CUresult cuParamSeti (CUfunction hfunc, int offset, unsigned int value)
    Adds an integer parameter to the function's argument list.
• CUresult cuParamSetSize (CUfunction hfunc, unsigned int numbytes)
    Sets the parameter size for the function.
• CUresult cuParamSetv (CUfunction hfunc, int offset, void *ptr, unsigned int numbytes)
    Adds arbitrary data to the function's argument list.

4.34.1 Detailed Description

This section describes the execution control functions of the low-level CUDA driver application programming interface.

4.34.2 Function Documentation

4.34.2.1 CUresult cuFuncGetAttribute (int *pi, CUfunction_attribute attrib, CUfunction hfunc)

Returns in *pi the integer value of the attribute attrib on the kernel given by hfunc. The supported attributes are:
• CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK: The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.
• CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES: The size in bytes of statically-allocated shared memory per block required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
• CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES: The size in bytes of user-allocated constant memory required by this function.
• CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES: The size in bytes of local memory used by each thread of this function.
• CU_FUNC_ATTRIBUTE_NUM_REGS: The number of registers used by each thread of this function.
• CU_FUNC_ATTRIBUTE_PTX_VERSION: The PTX virtual architecture version for which the function was compiled. This value is the major PTX version * 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0.
• CU_FUNC_ATTRIBUTE_BINARY_VERSION: The binary architecture version for which the function was compiled. This value is the major binary version * 10 + the minor binary version, so a binary version 1.3 function would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version.

Parameters:
    pi - Returned attribute value
    attrib - Attribute requested
    hfunc - Function to query attribute of

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncSetCacheConfig, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync
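The packed encoding used by CU_FUNC_ATTRIBUTE_PTX_VERSION and CU_FUNC_ATTRIBUTE_BINARY_VERSION (major * 10 + minor) can be unpacked with integer division and remainder. The following C sketch shows this; the PackedVersion type and decode_packed_version function are hypothetical helpers introduced for illustration and are not part of the driver API.

```c
/* Hypothetical helper type, not part of the driver API. */
typedef struct {
    int major;
    int minor;
} PackedVersion;

/* Split a packed version value (major * 10 + minor) into its parts.
 * For example, the value 13 corresponds to version 1.3. */
static PackedVersion decode_packed_version(int value)
{
    PackedVersion v;
    v.major = value / 10;
    v.minor = value % 10;
    return v;
}
```

A caller would pass the integer written to *pi by cuFuncGetAttribute(). A PTX version value of 0 carries no version information, as noted above for cubins compiled prior to CUDA 3.0.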

4.34.2.2 CUresult cuFuncSetBlockShape (CUfunction hfunc, int x, int y, int z)

Specifies the x, y, and z dimensions of the thread blocks that are created when the kernel given by hfunc is launched.

Parameters:
    hfunc - Kernel to specify dimensions of
    x - X dimension
    y - Y dimension
    z - Z dimension

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetSharedSize, cuFuncSetCacheConfig, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.3 CUresult cuFuncSetCacheConfig (CUfunction hfunc, CUfunc_cache config)

On devices where the L1 cache and shared memory use the same hardware resources, this sets through config the preferred cache configuration for the device function hfunc. This is only a preference. The driver will use the requested configuration if possible, but it is free to choose a different configuration if required to execute hfunc. Any context-wide preference set via cuCtxSetCacheConfig() will be overridden by this per-function setting unless the per-function setting is CU_FUNC_CACHE_PREFER_NONE. In that case, the current context-wide setting will be used.

This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.

Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.

The supported cache configurations are:
• CU_FUNC_CACHE_PREFER_NONE: no preference for shared memory or L1 (default)
• CU_FUNC_CACHE_PREFER_SHARED: prefer larger shared memory and smaller L1 cache
• CU_FUNC_CACHE_PREFER_L1: prefer larger L1 cache and smaller shared memory

Parameters:
    hfunc - Kernel to configure cache for
    config - Requested cache configuration

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxGetCacheConfig, cuCtxSetCacheConfig, cuFuncSetBlockShape, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.4 CUresult cuFuncSetSharedSize (CUfunction hfunc, unsigned int bytes)

Sets through bytes the amount of dynamic shared memory that will be available to each thread block when the kernel given by hfunc is launched.

Parameters:
    hfunc - Kernel to specify dynamic shared-memory size for
    bytes - Dynamic shared-memory size per thread block in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetCacheConfig, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.5 CUresult cuLaunch (CUfunction f)

Invokes the kernel f on a 1 x 1 x 1 grid of blocks. The block contains the number of threads specified by a previous call to cuFuncSetBlockShape().

Parameters:
    f - Kernel to launch

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_LAUNCH_FAILED, CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, CUDA_ERROR_LAUNCH_TIMEOUT, CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunchGrid, cuLaunchGridAsync
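The precedence rule described for cuFuncSetCacheConfig() (the per-function preference overrides the context-wide one unless it is CU_FUNC_CACHE_PREFER_NONE) can be modeled in a few lines of C. This is only a sketch of the documented rule: the CachePref enum is a stand-in for CUfunc_cache, requested_cache_pref is a hypothetical function, and the driver remains free to choose a different configuration.

```c
/* Stand-in for the three CUfunc_cache preferences; the names here are
 * hypothetical and only mirror CU_FUNC_CACHE_PREFER_NONE, _SHARED, _L1. */
typedef enum {
    PREFER_NONE,
    PREFER_SHARED,
    PREFER_L1
} CachePref;

/* Returns the preference that applies to a launch: the per-function
 * setting wins unless it is PREFER_NONE, in which case the context-wide
 * setting is used. This models the request, not the driver's final choice. */
static CachePref requested_cache_pref(CachePref context_pref,
                                      CachePref function_pref)
{
    return (function_pref != PREFER_NONE) ? function_pref : context_pref;
}
```

For example, a context-wide preference for L1 combined with a per-function preference for shared memory yields a shared-memory request, while a per-function PREFER_NONE falls back to L1.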

4.34.2.6 CUresult cuLaunchGrid (CUfunction f, int grid_width, int grid_height)

Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().

Parameters:
    f - Kernel to launch
    grid_width - Width of grid in blocks
    grid_height - Height of grid in blocks

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_LAUNCH_FAILED, CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, CUDA_ERROR_LAUNCH_TIMEOUT, CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGridAsync

4.34.2.7 CUresult cuLaunchGridAsync (CUfunction f, int grid_width, int grid_height, CUstream hStream)

Invokes the kernel f on a grid_width x grid_height grid of blocks. Each block contains the number of threads specified by a previous call to cuFuncSetBlockShape().

cuLaunchGridAsync() can optionally be associated to a stream by passing a non-zero hStream argument.

Parameters:
    f - Kernel to launch
    grid_width - Width of grid in blocks
    grid_height - Height of grid in blocks
    hStream - Stream identifier

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_LAUNCH_FAILED, CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, CUDA_ERROR_LAUNCH_TIMEOUT, CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING, CUDA_ERROR_SHARED_OBJECT_INIT_FAILED

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid

4.34.2.8 CUresult cuParamSetf (CUfunction hfunc, int offset, float value)

Sets a floating-point parameter that will be specified the next time the kernel corresponding to hfunc will be invoked. offset is a byte offset.

Parameters:
    hfunc - Kernel to add parameter to
    offset - Offset to add parameter to argument list
    value - Value of parameter

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.9 CUresult cuParamSeti (CUfunction hfunc, int offset, unsigned int value)

Sets an integer parameter that will be specified the next time the kernel corresponding to hfunc will be invoked. offset is a byte offset.

Parameters:
    hfunc - Kernel to add parameter to
    offset - Offset to add parameter to argument list
    value - Value of parameter

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.10 CUresult cuParamSetSize (CUfunction hfunc, unsigned int numbytes)

Sets through numbytes the total size in bytes needed by the function parameters of the kernel corresponding to hfunc.

Parameters:
    hfunc - Kernel to set parameter size for
    numbytes - Size of parameter list in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetf, cuParamSeti, cuParamSetv, cuLaunch, cuLaunchGrid, cuLaunchGridAsync

4.34.2.11 CUresult cuParamSetv (CUfunction hfunc, int offset, void *ptr, unsigned int numbytes)

Copies an arbitrary amount of data (specified in numbytes) from ptr into the parameter space of the kernel corresponding to hfunc. offset is a byte offset.

Parameters:
    hfunc - Kernel to add data to
    offset - Offset to add data to argument list
    ptr - Pointer to arbitrary data
    numbytes - Size of data to copy in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

See also:
    cuFuncSetBlockShape, cuFuncSetSharedSize, cuFuncGetAttribute, cuParamSetSize, cuParamSetf, cuParamSeti, cuLaunch, cuLaunchGrid, cuLaunchGridAsync
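The byte offsets passed to cuParamSeti(), cuParamSetf(), and cuParamSetv(), and the total passed to cuParamSetSize(), are computed by the caller. The following C sketch shows one way to track them. It assumes that each parameter is placed at an offset aligned to that parameter's own alignment requirement; this section does not state that rule, so treat it as an assumption. The helpers align_up and place_param are hypothetical and not part of the driver API.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`.
 * `alignment` must be a power of two. */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical layout helper: reserves space for one parameter and
 * returns the byte offset at which it would be set. `*cursor` holds the
 * running size of the parameter list and is advanced past the parameter. */
static size_t place_param(size_t *cursor, size_t size, size_t alignment)
{
    size_t offset = align_up(*cursor, alignment);
    *cursor = offset + size;
    return offset;
}
```

Under these assumptions, a parameter list made of a 1-byte value, a 4-byte integer, and an 8-byte pointer is placed at offsets 0, 4, and 8. Each returned offset would be supplied as the offset argument of the matching cuParamSet call, and the final cursor value (16) would be supplied as numbytes to cuParamSetSize().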

4.35 Execution Control [DEPRECATED]

Functions
• CUresult cuParamSetTexRef (CUfunction hfunc, int texunit, CUtexref hTexRef)
    Adds a texture-reference to the function's argument list.

4.35.1 Detailed Description

This section describes the deprecated execution control functions of the low-level CUDA driver application programming interface.

4.35.2 Function Documentation

4.35.2.1 CUresult cuParamSetTexRef (CUfunction hfunc, int texunit, CUtexref hTexRef)

Deprecated

Makes the CUDA array or linear memory bound to the texture reference hTexRef available to a device program as a texture. In this version of CUDA, the texture-reference must be obtained via cuModuleGetTexRef() and the texunit parameter must be set to CU_PARAM_TR_DEFAULT.

Parameters:
    hfunc - Kernel to add texture-reference to
    texunit - Texture unit (must be CU_PARAM_TR_DEFAULT)
    hTexRef - Texture-reference to add to argument list

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    This function may also return error codes from previous, asynchronous launches.

4.36 Texture Reference Management

Modules
• Texture Reference Management [DEPRECATED]

Functions
• CUresult cuTexRefGetAddress (CUdeviceptr *pdptr, CUtexref hTexRef)
    Gets the address associated with a texture reference.
• CUresult cuTexRefGetAddressMode (CUaddress_mode *pam, CUtexref hTexRef, int dim)
    Gets the addressing mode used by a texture reference.
• CUresult cuTexRefGetArray (CUarray *phArray, CUtexref hTexRef)
    Gets the array bound to a texture reference.
• CUresult cuTexRefGetFilterMode (CUfilter_mode *pfm, CUtexref hTexRef)
    Gets the filter-mode used by a texture reference.
• CUresult cuTexRefGetFlags (unsigned int *pFlags, CUtexref hTexRef)
    Gets the flags used by a texture reference.
• CUresult cuTexRefGetFormat (CUarray_format *pFormat, int *pNumChannels, CUtexref hTexRef)
    Gets the format used by a texture reference.
• CUresult cuTexRefSetAddress (size_t *ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, size_t bytes)
    Binds an address as a texture reference.
• CUresult cuTexRefSetAddress2D (CUtexref hTexRef, const CUDA_ARRAY_DESCRIPTOR *desc, CUdeviceptr dptr, size_t Pitch)
    Binds an address as a 2D texture reference.
• CUresult cuTexRefSetAddressMode (CUtexref hTexRef, int dim, CUaddress_mode am)
    Sets the addressing mode for a texture reference.
• CUresult cuTexRefSetArray (CUtexref hTexRef, CUarray hArray, unsigned int Flags)
    Binds an array as a texture reference.
• CUresult cuTexRefSetFilterMode (CUtexref hTexRef, CUfilter_mode fm)
    Sets the filtering mode for a texture reference.
• CUresult cuTexRefSetFlags (CUtexref hTexRef, unsigned int Flags)
    Sets the flags for a texture reference.
• CUresult cuTexRefSetFormat (CUtexref hTexRef, CUarray_format fmt, int NumPackedComponents)
    Sets the format for a texture reference.

4.36.1 Detailed Description

This section describes the texture reference management functions of the low-level CUDA driver application programming interface.

4.36.2 Function Documentation

4.36.2.1 CUresult cuTexRefGetAddress (CUdeviceptr *pdptr, CUtexref hTexRef)

Returns in *pdptr the base address bound to the texture reference hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference is not bound to any device memory range.

Parameters:
    pdptr - Returned device address
    hTexRef - Texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.2 CUresult cuTexRefGetAddressMode (CUaddress_mode *pam, CUtexref hTexRef, int dim)

Returns in *pam the addressing mode corresponding to the dimension dim of the texture reference hTexRef. Currently, the only valid values for dim are 0 and 1.

Parameters:
    pam - Returned addressing mode
    hTexRef - Texture reference
    dim - Dimension

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.3 CUresult cuTexRefGetArray (CUarray * phArray, CUtexref hTexRef)

Returns in *phArray the CUDA array bound to the texture reference hTexRef, or returns CUDA_ERROR_INVALID_VALUE if the texture reference is not bound to any CUDA array.

Parameters:
    phArray - Returned array
    hTexRef - Texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.4 CUresult cuTexRefGetFilterMode (CUfilter_mode * pfm, CUtexref hTexRef)

Returns in *pfm the filtering mode of the texture reference hTexRef.

Parameters:
    pfm - Returned filtering mode
    hTexRef - Texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.5 CUresult cuTexRefGetFlags (unsigned int * pFlags, CUtexref hTexRef)

Returns in *pFlags the flags of the texture reference hTexRef.

Parameters:
    pFlags - Returned flags
    hTexRef - Texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFormat

4.36.2.6 CUresult cuTexRefGetFormat (CUarray_format * pFormat, int * pNumChannels, CUtexref hTexRef)

Returns in *pFormat and *pNumChannels the format and number of components of the CUDA array bound to the texture reference hTexRef. If pFormat or pNumChannels is NULL, it will be ignored.

Parameters:
    pFormat - Returned format
    pNumChannels - Returned number of components
    hTexRef - Texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags

4.36.2.7 CUresult cuTexRefSetAddress (size_t * ByteOffset, CUtexref hTexRef, CUdeviceptr dptr, size_t bytes)

Binds a linear address range to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Any memory previously bound to hTexRef is unbound.

Since the hardware enforces an alignment requirement on texture base addresses, cuTexRefSetAddress() passes back a byte offset in *ByteOffset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so they can be applied to the tex1Dfetch() function.

If the device memory pointer was returned from cuMemAlloc(), the offset is guaranteed to be 0 and NULL may be passed as the ByteOffset parameter.

Parameters:
    ByteOffset - Returned byte offset
    hTexRef - Texture reference to bind
    dptr - Device pointer to bind
    bytes - Size of memory to bind in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE
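The byte-to-texel conversion that cuTexRefSetAddress() asks of the caller can be sketched on the host as follows. This is a minimal illustration, not part of the driver API: the helper name byte_offset_to_texels and its error convention are assumptions made for this example, and the texel size is taken to be supplied by the caller (for example, 4 bytes for a single-channel 32-bit texture).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a driver API function).
 * Converts the byte offset passed back in *ByteOffset by
 * cuTexRefSetAddress() into a texel offset.
 * texel_size is the size in bytes of one texture element.
 * Returns 0 and writes *texel_offset on success. Returns -1 if an
 * argument is invalid or the byte offset is not a whole number of texels. */
int byte_offset_to_texels(size_t byte_offset, size_t texel_size,
                          size_t *texel_offset)
{
    if (texel_offset == NULL || texel_size == 0)
        return -1;
    if (byte_offset % texel_size != 0)
        return -1;
    *texel_offset = byte_offset / texel_size;
    return 0;
}
```

The resulting texel offset would then be passed to the kernel as an ordinary argument and added to the index given to tex1Dfetch().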

See also:
    cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.8 CUresult cuTexRefSetAddress2D (CUtexref hTexRef, const CUDA_ARRAY_DESCRIPTOR * desc, CUdeviceptr dptr, size_t Pitch)

Binds a linear address range to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Any memory previously bound to hTexRef is unbound.

Using a tex2D() function inside a kernel requires a call to either cuTexRefSetArray() to bind the corresponding texture reference to an array, or cuTexRefSetAddress2D() to bind the texture reference to linear memory.

Function calls to cuTexRefSetFormat() cannot follow calls to cuTexRefSetAddress2D() for the same texture reference.

It is required that dptr be aligned to the appropriate hardware-specific texture alignment. You can query this value using the device attribute CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT. If an unaligned dptr is supplied, CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
    hTexRef - Texture reference to bind
    desc - Descriptor of CUDA array
    dptr - Device pointer to bind
    Pitch - Line pitch in bytes

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.9 CUresult cuTexRefSetAddressMode (CUtexref hTexRef, int dim, CUaddress_mode am)

Specifies the addressing mode am for the given dimension dim of the texture reference hTexRef. If dim is zero, the addressing mode is applied to the first parameter of the functions used to fetch from the texture; if dim is 1, the second, and so on. CUaddress_mode is defined as:

    typedef enum CUaddress_mode_enum {
        CU_TR_ADDRESS_MODE_WRAP = 0,
        CU_TR_ADDRESS_MODE_CLAMP = 1,
        CU_TR_ADDRESS_MODE_MIRROR = 2,
        CU_TR_ADDRESS_MODE_BORDER = 3
    } CUaddress_mode;

Note that this call has no effect if hTexRef is bound to linear memory.

Parameters:
    hTexRef - Texture reference
    dim - Dimension
    am - Addressing mode to set

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.10 CUresult cuTexRefSetArray (CUtexref hTexRef, CUarray hArray, unsigned int Flags)

Binds the CUDA array hArray to the texture reference hTexRef. Any previous address or CUDA array state associated with the texture reference is superseded by this function. Flags must be set to CU_TRSA_OVERRIDE_FORMAT. Any CUDA array previously bound to hTexRef is unbound.

Parameters:
    hTexRef - Texture reference to bind
    hArray - Array to bind
    Flags - Options (must be CU_TRSA_OVERRIDE_FORMAT)

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.11 CUresult cuTexRefSetFilterMode (CUtexref hTexRef, CUfilter_mode fm)

Specifies the filtering mode fm to be used when reading memory through the texture reference hTexRef. CUfilter_mode is defined as:

    typedef enum CUfilter_mode_enum {
        CU_TR_FILTER_MODE_POINT = 0,
        CU_TR_FILTER_MODE_LINEAR = 1
    } CUfilter_mode;

Note that this call has no effect if hTexRef is bound to linear memory.

Parameters:
    hTexRef - Texture reference

    fm - Filtering mode to set

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFlags, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.12 CUresult cuTexRefSetFlags (CUtexref hTexRef, unsigned int Flags)

Specifies optional flags via Flags to specify the behavior of data returned through the texture reference hTexRef. The valid flags are:

• CU_TRSF_READ_AS_INTEGER, which suppresses the default behavior of having the texture promote integer data to floating point data in the range [0, 1];
• CU_TRSF_NORMALIZED_COORDINATES, which suppresses the default behavior of having the texture coordinates range from [0, Dim) where Dim is the width or height of the CUDA array. Instead, the texture coordinates [0, 1.0) reference the entire breadth of the array dimension.

Parameters:
    hTexRef - Texture reference
    Flags - Optional flags to set

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFormat, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat

4.36.2.13 CUresult cuTexRefSetFormat (CUtexref hTexRef, CUarray_format fmt, int NumPackedComponents)

Specifies the format of the data to be read by the texture reference hTexRef. fmt and NumPackedComponents are exactly analogous to the Format and NumChannels members of the CUDA_ARRAY_DESCRIPTOR structure: They specify the format of each component and the number of components per array element.

Parameters:
    hTexRef - Texture reference
    fmt - Format to set
    NumPackedComponents - Number of components per array element

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefSetAddress, cuTexRefSetAddress2D, cuTexRefSetAddressMode, cuTexRefSetArray, cuTexRefSetFilterMode, cuTexRefSetFlags, cuTexRefGetAddress, cuTexRefGetAddressMode, cuTexRefGetArray, cuTexRefGetFilterMode, cuTexRefGetFlags, cuTexRefGetFormat
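The coordinate ranges described under cuTexRefSetFlags() for CU_TRSF_NORMALIZED_COORDINATES can be related to each other with a small host-side sketch. The two helper functions below are hypothetical names introduced only for this illustration. They assume a simple linear relationship between the default range [0, Dim) and the normalized range [0, 1.0), and they do not model filtering or addressing modes.

```c
#include <assert.h>

/* Hypothetical helper: maps a normalized coordinate u in [0, 1.0) to the
 * default (unnormalized) range [0, dim), where dim is the width or height
 * of the CUDA array. Returns 0 when dim is 0. */
float normalized_to_unnormalized(float u, unsigned int dim)
{
    if (dim == 0)
        return 0.0f;
    return u * (float)dim;
}

/* Hypothetical helper: inverse mapping from [0, dim) to [0, 1.0).
 * Returns 0 when dim is 0 to avoid a division by zero. */
float unnormalized_to_normalized(float x, unsigned int dim)
{
    if (dim == 0)
        return 0.0f;
    return x / (float)dim;
}
```

Under this assumption, a kernel written against normalized coordinates addresses the same position regardless of the array dimensions, while a kernel using the default range must know Dim.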

4.37 Texture Reference Management [DEPRECATED]

Functions
• CUresult cuTexRefCreate (CUtexref *pTexRef)
    Creates a texture reference.
• CUresult cuTexRefDestroy (CUtexref hTexRef)
    Destroys a texture reference.

4.37.1 Detailed Description

This section describes the deprecated texture reference management functions of the low-level CUDA driver application programming interface.

4.37.2 Function Documentation

4.37.2.1 CUresult cuTexRefCreate (CUtexref * pTexRef)

Deprecated

Creates a texture reference and returns its handle in *pTexRef. Once created, the application must call cuTexRefSetArray() or cuTexRefSetAddress() to associate the reference with allocated memory. Other texture reference functions are used to specify the format and interpretation (addressing, filtering, etc.) to be used when the memory is read through this texture reference.

Parameters:
    pTexRef - Returned texture reference

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefDestroy

4.37.2.2 CUresult cuTexRefDestroy (CUtexref hTexRef)

Deprecated

Destroys the texture reference specified by hTexRef.

Parameters:
    hTexRef - Texture reference to destroy

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuTexRefCreate

4.38 Surface Reference Management

Functions
• CUresult cuSurfRefGetArray (CUarray *phArray, CUsurfref hSurfRef)
    Passes back the CUDA array bound to a surface reference.
• CUresult cuSurfRefSetArray (CUsurfref hSurfRef, CUarray hArray, unsigned int Flags)
    Sets the CUDA array for a surface reference.

4.38.1 Detailed Description

This section describes the surface reference management functions of the low-level CUDA driver application programming interface.

4.38.2 Function Documentation

4.38.2.1 CUresult cuSurfRefGetArray (CUarray * phArray, CUsurfref hSurfRef)

Returns in *phArray the CUDA array bound to the surface reference hSurfRef, or returns CUDA_ERROR_INVALID_VALUE if the surface reference is not bound to any CUDA array.

Parameters:
    phArray - Returned CUDA array handle
    hSurfRef - Surface reference handle

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuModuleGetSurfRef, cuSurfRefSetArray

4.38.2.2 CUresult cuSurfRefSetArray (CUsurfref hSurfRef, CUarray hArray, unsigned int Flags)

Sets the CUDA array hArray to be read and written by the surface reference hSurfRef. Any previous CUDA array state associated with the surface reference is superseded by this function. Flags must be set to 0. The CUDA_ARRAY3D_SURFACE_LDST flag must have been set for the CUDA array. Any CUDA array previously bound to hSurfRef is unbound.

Parameters:
    hSurfRef - Surface reference handle
    hArray - CUDA array handle
    Flags - set to 0

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

See also:
    cuModuleGetSurfRef, cuSurfRefGetArray

4.39 Graphics Interoperability

Functions
• CUresult cuGraphicsMapResources (unsigned int count, CUgraphicsResource *resources, CUstream hStream)
    Map graphics resources for access by CUDA.
• CUresult cuGraphicsResourceGetMappedPointer (CUdeviceptr *pDevPtr, size_t *pSize, CUgraphicsResource resource)
    Get a device pointer through which to access a mapped graphics resource.
• CUresult cuGraphicsResourceSetMapFlags (CUgraphicsResource resource, unsigned int flags)
    Set usage flags for mapping a graphics resource.
• CUresult cuGraphicsSubResourceGetMappedArray (CUarray *pArray, CUgraphicsResource resource, unsigned int arrayIndex, unsigned int mipLevel)
    Get an array through which to access a subresource of a mapped graphics resource.
• CUresult cuGraphicsUnmapResources (unsigned int count, CUgraphicsResource *resources, CUstream hStream)
    Unmap graphics resources.
• CUresult cuGraphicsUnregisterResource (CUgraphicsResource resource)
    Unregisters a graphics resource for access by CUDA.

4.39.1 Detailed Description

This section describes the graphics interoperability functions of the low-level CUDA driver application programming interface.

4.39.2 Function Documentation

4.39.2.1 CUresult cuGraphicsMapResources (unsigned int count, CUgraphicsResource * resources, CUstream hStream)

Maps the count graphics resources in resources for access by CUDA.

The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any graphics calls issued before cuGraphicsMapResources() will complete before any subsequent CUDA work issued in hStream begins.

If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned.

Parameters:
    count - Number of resources to map

    resources - Resources to map for CUDA usage
    hStream - Stream with which to synchronize

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsResourceGetMappedPointer, cuGraphicsSubResourceGetMappedArray, cuGraphicsUnmapResources

4.39.2.2 CUresult cuGraphicsResourceGetMappedPointer (CUdeviceptr * pDevPtr, size_t * pSize, CUgraphicsResource resource)

Returns in *pDevPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *pSize the size of the memory in bytes which may be accessed from that pointer. The value set in *pDevPtr may change every time that resource is mapped.

If resource is not a buffer then it cannot be accessed via a pointer and CUDA_ERROR_NOT_MAPPED_AS_POINTER is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.

Parameters:
    pDevPtr - Returned pointer through which resource may be accessed
    pSize - Returned size of the buffer accessible starting at *pDevPtr
    resource - Mapped resource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED, CUDA_ERROR_NOT_MAPPED_AS_POINTER

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray

4.39.2.3 CUresult cuGraphicsResourceSetMapFlags (CUgraphicsResource resource, unsigned int flags)

Set flags for mapping the graphics resource resource.

Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:

• CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA kernels which access this resource will not write to this resource.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If resource is presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned. If flags is not one of the above values then CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
    resource - Registered resource to set flags for
    flags - Parameters for resource mapping

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources

4.39.2.4 CUresult cuGraphicsSubResourceGetMappedArray (CUarray * pArray, CUgraphicsResource resource, unsigned int arrayIndex, unsigned int mipLevel)

Returns in *pArray an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in *pArray may change every time that resource is mapped.

If resource is not a texture then it cannot be accessed via an array and CUDA_ERROR_NOT_MAPPED_AS_ARRAY is returned. If arrayIndex is not a valid array index for resource then CUDA_ERROR_INVALID_VALUE is returned. If mipLevel is not a valid mipmap level for resource then CUDA_ERROR_INVALID_VALUE is returned. If resource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.

Parameters:
    pArray - Returned array through which a subresource of resource may be accessed
    resource - Mapped resource to access
    arrayIndex - Array index for array textures or cubemap face index as defined by CUarray_cubemap_face for cubemap textures for the subresource to access
    mipLevel - Mipmap level for the subresource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED, CUDA_ERROR_NOT_MAPPED_AS_ARRAY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsResourceGetMappedPointer

4.39.2.5 CUresult cuGraphicsUnmapResources (unsigned int count, CUgraphicsResource * resources, CUstream hStream)

Unmaps the count graphics resources in resources.

Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.

This function provides the synchronization guarantee that any CUDA work issued in hStream before cuGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.

If resources includes any duplicate entries then CUDA_ERROR_INVALID_HANDLE is returned. If any of resources are not presently mapped for access by CUDA then CUDA_ERROR_NOT_MAPPED is returned.

Parameters:
    count - Number of resources to unmap
    resources - Resources to unmap
    hStream - Stream with which to synchronize

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources

4.39.2.6 CUresult cuGraphicsUnregisterResource (CUgraphicsResource resource)

Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.

If resource is invalid then CUDA_ERROR_INVALID_HANDLE is returned.

Parameters:
    resource - Resource to unregister

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsD3D9RegisterResource, cuGraphicsD3D10RegisterResource, cuGraphicsD3D11RegisterResource, cuGraphicsGLRegisterBuffer, cuGraphicsGLRegisterImage
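The mapping rules stated for cuGraphicsMapResources() and cuGraphicsUnmapResources() (duplicate entries are rejected, a mapped resource cannot be mapped again, and only mapped resources can be unmapped) can be summarized with a small host-side model. This is only a sketch of the documented state rules. It does not call the driver, every type and function name in it (model_result, model_resource, model_map, model_unmap) is hypothetical, and the order in which the checks are applied is an assumption, since the text above does not specify which error takes precedence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this model only. They echo the names of
 * the documented error codes but are not the real CUresult values. */
typedef enum {
    MODEL_SUCCESS,
    MODEL_INVALID_HANDLE,
    MODEL_ALREADY_MAPPED,
    MODEL_NOT_MAPPED
} model_result;

/* Hypothetical stand-in for a registered graphics resource. */
typedef struct {
    int mapped; /* nonzero while mapped for access by CUDA */
} model_resource;

/* Returns nonzero if the list holds a NULL entry or the same entry twice. */
static int list_is_invalid(model_resource **res, unsigned int count)
{
    for (unsigned int i = 0; i < count; ++i) {
        if (res[i] == NULL)
            return 1;
        for (unsigned int j = i + 1; j < count; ++j)
            if (res[i] == res[j])
                return 1;
    }
    return 0;
}

/* Models the map rules: no duplicates, nothing already mapped.
 * Assumption: the duplicate check is performed first. */
model_result model_map(model_resource **res, unsigned int count)
{
    if (list_is_invalid(res, count))
        return MODEL_INVALID_HANDLE;
    for (unsigned int i = 0; i < count; ++i)
        if (res[i]->mapped)
            return MODEL_ALREADY_MAPPED;
    for (unsigned int i = 0; i < count; ++i)
        res[i]->mapped = 1;
    return MODEL_SUCCESS;
}

/* Models the unmap rules: no duplicates, everything currently mapped. */
model_result model_unmap(model_resource **res, unsigned int count)
{
    if (list_is_invalid(res, count))
        return MODEL_INVALID_HANDLE;
    for (unsigned int i = 0; i < count; ++i)
        if (!res[i]->mapped)
            return MODEL_NOT_MAPPED;
    for (unsigned int i = 0; i < count; ++i)
        res[i]->mapped = 0;
    return MODEL_SUCCESS;
}
```

In a real application the same ordering applies to the driver calls: a resource is mapped, accessed through the pointer or array obtained while it is mapped, and unmapped before the graphics API touches it again.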

4.40 OpenGL Interoperability

Modules
• OpenGL Interoperability [DEPRECATED]

Functions
• CUresult cuGLCtxCreate (CUcontext *pCtx, unsigned int Flags, CUdevice device)
    Create a CUDA context for interoperability with OpenGL.
• CUresult cuGraphicsGLRegisterBuffer (CUgraphicsResource *pCudaResource, GLuint buffer, unsigned int Flags)
    Registers an OpenGL buffer object.
• CUresult cuGraphicsGLRegisterImage (CUgraphicsResource *pCudaResource, GLuint image, GLenum target, unsigned int Flags)
    Register an OpenGL texture or renderbuffer object.
• CUresult cuWGLGetDevice (CUdevice *pDevice, HGPUNV hGpu)
    Gets the CUDA device associated with hGpu.

4.40.1 Detailed Description

This section describes the OpenGL interoperability functions of the low-level CUDA driver application programming interface.

4.40.2 Function Documentation

4.40.2.1 CUresult cuGLCtxCreate (CUcontext * pCtx, unsigned int Flags, CUdevice device)

Creates a new CUDA context, initializes OpenGL interoperability, and associates the CUDA context with the calling thread. It must be called before performing any other OpenGL interoperability operations. It may fail if the needed OpenGL driver facilities are not available. For usage of the Flags parameter, see cuCtxCreate().

Parameters:
    pCtx - Returned CUDA context
    Flags - Options for CUDA context creation
    device - Device on which to create the context

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuCtxCreate, cuGLInit, cuGLMapBufferObject, cuGLRegisterBufferObject, cuGLUnmapBufferObject, cuGLUnregisterBufferObject, cuGLMapBufferObjectAsync, cuGLUnmapBufferObjectAsync, cuGLSetBufferObjectMapFlags, cuWGLGetDevice

4.40.2.2 CUresult cuGraphicsGLRegisterBuffer (CUgraphicsResource * pCudaResource, GLuint buffer, unsigned int Flags)

Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as pCudaResource. The map flags Flags specify the intended usage, as follows:

• CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA will not write to this resource.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

Parameters:
    pCudaResource - Pointer to the returned object handle
    buffer - name of buffer object to be registered
    Flags - Map flags

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_INVALID_CONTEXT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGLCtxCreate, cuGraphicsUnregisterResource, cuGraphicsMapResources, cuGraphicsResourceGetMappedPointer

4.40.2.3 CUresult cuGraphicsGLRegisterImage (CUgraphicsResource * pCudaResource, GLuint image, GLenum target, unsigned int Flags)

Registers the texture or renderbuffer object specified by image for access by CUDA. target must match the type of the object. A handle to the registered object is returned as pCudaResource. The map flags Flags specify the intended usage, as follows:

• CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA will not write to this resource.

• CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

The following image classes are currently disallowed:

• Textures with borders
• Multisampled renderbuffers

Parameters:
    pCudaResource - Pointer to the returned object handle
    image - name of texture or renderbuffer object to be registered
    target - Identifies the type of object specified by image, and must be one of GL_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_TEXTURE_CUBE_MAP, GL_TEXTURE_3D, GL_TEXTURE_2D_ARRAY, or GL_RENDERBUFFER.
    Flags - Map flags

Returns:
    CUDA_SUCCESS, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_INVALID_CONTEXT

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGLCtxCreate, cuGraphicsUnregisterResource, cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray

4.40.2.4 CUresult cuWGLGetDevice (CUdevice * pDevice, HGPUNV hGpu)

Returns in *pDevice the CUDA device associated with hGpu, if applicable.

Parameters:
    pDevice - Device associated with hGpu
    hGpu - Handle to a GPU, as queried via WGL_NV_gpu_affinity()

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGLCtxCreate, cuGLInit, cuGLMapBufferObject, cuGLRegisterBufferObject, cuGLUnmapBufferObject, cuGLUnregisterBufferObject, cuGLUnmapBufferObjectAsync, cuGLSetBufferObjectMapFlags

4.41 OpenGL Interoperability [DEPRECATED]

Typedefs
• typedef enum CUGLmap_flags_enum CUGLmap_flags

Enumerations
• enum CUGLmap_flags_enum

Functions
• CUresult cuGLInit (void)
    Initializes OpenGL interoperability.
• CUresult cuGLMapBufferObject (CUdeviceptr *dptr, size_t *size, GLuint buffer)
    Maps an OpenGL buffer object.
• CUresult cuGLMapBufferObjectAsync (CUdeviceptr *dptr, size_t *size, GLuint buffer, CUstream hStream)
    Maps an OpenGL buffer object.
• CUresult cuGLRegisterBufferObject (GLuint buffer)
    Registers an OpenGL buffer object.
• CUresult cuGLSetBufferObjectMapFlags (GLuint buffer, unsigned int Flags)
    Set the map flags for an OpenGL buffer object.
• CUresult cuGLUnmapBufferObject (GLuint buffer)
    Unmaps an OpenGL buffer object.
• CUresult cuGLUnmapBufferObjectAsync (GLuint buffer, CUstream hStream)
    Unmaps an OpenGL buffer object.
• CUresult cuGLUnregisterBufferObject (GLuint buffer)
    Unregister an OpenGL buffer object.

4.41.1 Detailed Description

This section describes deprecated OpenGL interoperability functionality.

4.41.2 Typedef Documentation

4.41.2.1 typedef enum CUGLmap_flags_enum CUGLmap_flags

Flags to map or unmap a resource

4.41.3 Enumeration Type Documentation

4.41.3.1 enum CUGLmap_flags_enum

Flags to map or unmap a resource

4.41.4 Function Documentation

4.41.4.1 CUresult cuGLInit (void)

Deprecated
    This function is deprecated as of CUDA 3.0.

Initializes OpenGL interoperability. This function is deprecated and calling it is no longer required. It may fail if the needed OpenGL driver facilities are not available.

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGLCtxCreate, cuGLMapBufferObject, cuGLRegisterBufferObject, cuGLUnmapBufferObject, cuGLUnregisterBufferObject, cuGLMapBufferObjectAsync, cuGLUnmapBufferObjectAsync, cuGLSetBufferObjectMapFlags, cuWGLGetDevice

4.41.4.2 CUresult cuGLMapBufferObject (CUdeviceptr * dptr, size_t * size, GLuint buffer)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

All streams in the current CUDA context are synchronized with the current GL context.

Parameters:
    dptr - Returned mapped base pointer
    size - Returned size of mapping
    buffer - The name of the buffer object to map

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_MAP_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources

4.41.4.3 CUresult cuGLMapBufferObjectAsync (CUdeviceptr * dptr, size_t * size, GLuint buffer, CUstream hStream)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the buffer object specified by buffer into the address space of the current CUDA context and returns in *dptr and *size the base pointer and size of the resulting mapping.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

Stream hStream in the current CUDA context is synchronized with the current GL context.

Parameters:
    dptr - Returned mapped base pointer
    size - Returned size of mapping
    buffer - The name of the buffer object to map
    hStream - Stream to synchronize

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_MAP_FAILED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources

4.41.4.4 CUresult cuGLRegisterBufferObject (GLuint buffer)

Deprecated
    This function is deprecated as of CUDA 3.0.

Registers the buffer object specified by buffer for access by CUDA. This function must be called before CUDA can map the buffer object. There must be a valid OpenGL context bound to the current thread when this function is called, and the buffer name is resolved by that context.

Parameters:
    buffer - The name of the buffer object to register.

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_ALREADY_MAPPED

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsGLRegisterBuffer

4.41.4.5 CUresult cuGLSetBufferObjectMapFlags (GLuint buffer, unsigned int Flags)

Deprecated
This function is deprecated as of CUDA 3.0.

Sets the map flags for the buffer object specified by buffer.

Changes to Flags will take effect the next time buffer is mapped. The Flags argument may be any of the following:
• CU_GL_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• CU_GL_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA kernels which access this resource will not write to this resource.
• CU_GL_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If buffer has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If buffer is presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

Parameters:
buffer - Buffer object to unmap
Flags - Map flags

Returns:
CUDA_SUCCESS, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_INVALID_CONTEXT

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsResourceSetMapFlags

4.41.4.6 CUresult cuGLUnmapBufferObject (GLuint buffer)

Deprecated
This function is deprecated as of CUDA 3.0.

Unmaps the buffer object specified by buffer for access by CUDA.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

All streams in the current CUDA context are synchronized with the current GL context.

Parameters:
buffer - Buffer object to unmap

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsUnmapResources

4.41.4.7 CUresult cuGLUnmapBufferObjectAsync (GLuint buffer, CUstream hStream)

Deprecated
This function is deprecated as of CUDA 3.0.

Unmaps the buffer object specified by buffer for access by CUDA.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

Stream hStream in the current CUDA context is synchronized with the current GL context.

Parameters:
buffer - Name of the buffer object to unmap
hStream - Stream to synchronize

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsUnmapResources

4.41.4.8 CUresult cuGLUnregisterBufferObject (GLuint buffer)

Deprecated
This function is deprecated as of CUDA 3.0.

Unregisters the buffer object specified by buffer. This releases any resources associated with the registered buffer. After this call, the buffer may no longer be mapped for access by CUDA.

There must be a valid OpenGL context bound to the current thread when this function is called. This must be the same context, or a member of the same shareGroup, as the context that was bound when the buffer was registered.

Parameters:
buffer - Name of the buffer object to unregister

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsUnregisterResource

4.42 Direct3D 9 Interoperability

Modules
• Direct3D 9 Interoperability [DEPRECATED]

Typedefs
• typedef enum CUd3d9DeviceList_enum CUd3d9DeviceList

Enumerations
• enum CUd3d9DeviceList_enum { CU_D3D9_DEVICE_LIST_ALL = 0x01, CU_D3D9_DEVICE_LIST_CURRENT_FRAME = 0x02, CU_D3D9_DEVICE_LIST_NEXT_FRAME = 0x03 }

Functions
• CUresult cuD3D9CtxCreate (CUcontext *pCtx, CUdevice *pCudaDevice, unsigned int Flags, IDirect3DDevice9 *pD3DDevice)
Create a CUDA context for interoperability with Direct3D 9.
• CUresult cuD3D9CtxCreateOnDevice (CUcontext *pCtx, unsigned int flags, IDirect3DDevice9 *pD3DDevice, CUdevice cudaDevice)
Create a CUDA context for interoperability with Direct3D 9.
• CUresult cuD3D9GetDevice (CUdevice *pCudaDevice, const char *pszAdapterName)
Gets the CUDA device corresponding to a display adapter.
• CUresult cuD3D9GetDevices (unsigned int *pCudaDeviceCount, CUdevice *pCudaDevices, unsigned int cudaDeviceCount, IDirect3DDevice9 *pD3D9Device, CUd3d9DeviceList deviceList)
Gets the CUDA devices corresponding to a Direct3D 9 device.
• CUresult cuD3D9GetDirect3DDevice (IDirect3DDevice9 **ppD3DDevice)
Get the Direct3D 9 device against which the current CUDA context was created.
• CUresult cuGraphicsD3D9RegisterResource (CUgraphicsResource *pCudaResource, IDirect3DResource9 *pD3DResource, unsigned int Flags)
Register a Direct3D 9 resource for access by CUDA.

4.42.1 Detailed Description

This section describes the Direct3D 9 interoperability functions of the low-level CUDA driver application programming interface.

4.42.2 Typedef Documentation

4.42.2.1 typedef enum CUd3d9DeviceList_enum CUd3d9DeviceList

CUDA devices corresponding to a D3D9 device

4.42.3 Enumeration Type Documentation

4.42.3.1 enum CUd3d9DeviceList_enum

CUDA devices corresponding to a D3D9 device

Enumerator:
CU_D3D9_DEVICE_LIST_ALL The CUDA devices for all GPUs used by a D3D9 device
CU_D3D9_DEVICE_LIST_CURRENT_FRAME The CUDA devices for the GPUs used by a D3D9 device in its currently rendering frame
CU_D3D9_DEVICE_LIST_NEXT_FRAME The CUDA devices for the GPUs to be used by a D3D9 device in the next frame

4.42.4 Function Documentation

4.42.4.1 CUresult cuD3D9CtxCreate (CUcontext * pCtx, CUdevice * pCudaDevice, unsigned int Flags, IDirect3DDevice9 * pD3DDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in *pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context. If pCudaDevice is non-NULL then the CUdevice on which this CUDA context was created will be returned in *pCudaDevice.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
pCtx - Returned newly created CUDA context
pCudaDevice - Returned pointer to the device on which the context was created
Flags - Context creation flags (see cuCtxCreate() for details)
pD3DDevice - Direct3D device to create interoperability context with

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9GetDevice, cuGraphicsD3D9RegisterResource

4.42.4.2 CUresult cuD3D9CtxCreateOnDevice (CUcontext * pCtx, unsigned int flags, IDirect3DDevice9 * pD3DDevice, CUdevice cudaDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in *pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
pCtx - Returned newly created CUDA context
flags - Context creation flags (see cuCtxCreate() for details)
pD3DDevice - Direct3D device to create interoperability context with
cudaDevice - The CUDA device on which to create the context. This device must be among the devices returned when querying CU_D3D9_DEVICE_LIST_ALL from cuD3D9GetDevices.

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9GetDevices, cuGraphicsD3D9RegisterResource

4.42.4.3 CUresult cuD3D9GetDevice (CUdevice * pCudaDevice, const char * pszAdapterName)

Returns in *pCudaDevice the CUDA-compatible device corresponding to the adapter name pszAdapterName obtained from EnumDisplayDevices() or IDirect3D9::GetAdapterIdentifier().

If no device on the adapter with name pszAdapterName is CUDA-compatible, then the call will fail.

Parameters:
pCudaDevice - Returned CUDA device corresponding to pszAdapterName
pszAdapterName - Adapter name to query for device

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9CtxCreate

4.42.4.4 CUresult cuD3D9GetDevices (unsigned int * pCudaDeviceCount, CUdevice * pCudaDevices, unsigned int cudaDeviceCount, IDirect3DDevice9 * pD3D9Device, CUd3d9DeviceList deviceList)

Returns in *pCudaDeviceCount the number of CUDA-compatible devices corresponding to the Direct3D 9 device pD3D9Device. Also returns in *pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the Direct3D 9 device pD3D9Device.

If any of the GPUs being used to render pD3D9Device are not CUDA capable then the call will return CUDA_ERROR_NO_DEVICE.

Parameters:
pCudaDeviceCount - Returned number of CUDA devices corresponding to pD3D9Device
pCudaDevices - Returned CUDA devices corresponding to pD3D9Device
cudaDeviceCount - The size of the output device array pCudaDevices
pD3D9Device - Direct3D 9 device to query for CUDA devices
deviceList - The set of devices to return. This set may be CU_D3D9_DEVICE_LIST_ALL for all devices, CU_D3D9_DEVICE_LIST_CURRENT_FRAME for the devices used to render the current frame (in SLI), or CU_D3D9_DEVICE_LIST_NEXT_FRAME for the devices used to render the next frame (in SLI).

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_NO_DEVICE, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9CtxCreate

4.42.4.5 CUresult cuD3D9GetDirect3DDevice (IDirect3DDevice9 ** ppD3DDevice)

Returns in *ppD3DDevice the Direct3D device against which this CUDA context was created in cuD3D9CtxCreate().

Parameters:
ppD3DDevice - Returned Direct3D device corresponding to CUDA context

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9GetDevice
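The cuD3D9GetDevices description says that *pCudaDeviceCount receives the number of matching devices while at most cudaDeviceCount entries are written to pCudaDevices. Read literally, the reported count can exceed the size of the caller's array. The sketch below shows how a caller might bound its loop under that reading. The helper names are hypothetical and not part of the CUDA driver API, and the code does not call the driver.

```c
#include <assert.h>

/* Hypothetical helper (not a CUDA API): given the count reported in
 * *pCudaDeviceCount and the capacity passed as cudaDeviceCount, return how
 * many entries of the caller's array can be assumed to hold devices. */
static unsigned valid_device_entries(unsigned reported_count, unsigned capacity)
{
    return reported_count < capacity ? reported_count : capacity;
}

/* Small self-check illustrating the intended use. */
static void valid_device_entries_self_check(void)
{
    assert(valid_device_entries(2, 8) == 2); /* array larger than needed */
    assert(valid_device_entries(4, 2) == 2); /* array too small: clamp   */
    assert(valid_device_entries(0, 8) == 0); /* no devices reported      */
}
```

A caller would iterate over the first valid_device_entries(count, capacity) elements only, and could treat a reported count above the capacity as a hint to retry with a larger array.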

4.42.4.6 CUresult cuGraphicsD3D9RegisterResource (CUgraphicsResource * pCudaResource, IDirect3DResource9 * pD3DResource, unsigned int Flags)

Registers the Direct3D 9 resource pD3DResource for access by CUDA and returns a CUDA handle to pD3DResource in pCudaResource. The handle returned in pCudaResource may be used to map and unmap this resource until it is unregistered. On success this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cuGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.
• IDirect3DVertexBuffer9: may be accessed through a device pointer
• IDirect3DIndexBuffer9: may be accessed through a device pointer
• IDirect3DSurface9: may be accessed through an array. Only stand-alone objects of type IDirect3DSurface9 may be explicitly shared. In particular, individual mipmap levels and faces of cube maps may not be registered directly. To access individual surfaces associated with a texture, one must register the base texture object.
• IDirect3DBaseTexture9: individual surfaces on this texture may be accessed through an array.

The Flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is
• CU_GRAPHICS_REGISTER_FLAGS_NONE

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized for this context using cuD3D9CtxCreate then CUDA_ERROR_INVALID_CONTEXT is returned. If pD3DResource is of incorrect type or is already registered then CUDA_ERROR_INVALID_HANDLE is returned. If pD3DResource cannot be registered then CUDA_ERROR_UNKNOWN is returned. If Flags is not one of the values specified above then CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
pCudaResource - Returned graphics resource handle
pD3DResource - Direct3D resource to register
Flags - Parameters for resource registration

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D9CtxCreate, cuGraphicsUnregisterResource, cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray, cuGraphicsResourceGetMappedPointer
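As a rough illustration of the texture-format limitation listed for cuGraphicsD3D9RegisterResource (1, 2, or 4 channels of 8, 16, or 32-bit data), the rule can be sketched as a plain C predicate. The function names are hypothetical and not part of the CUDA driver API. The check covers only channel count and channel width, so passing it is necessary but not sufficient: depth and stencil surfaces, for example, are excluded by a separate limitation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a CUDA API): true when a texture layout has
 * 1, 2, or 4 channels of 8-, 16-, or 32-bit data, which are the only
 * layouts the limitation list describes as shareable. */
static bool layout_is_shareable(unsigned channels, unsigned bits_per_channel)
{
    bool channels_ok = (channels == 1 || channels == 2 || channels == 4);
    bool bits_ok = (bits_per_channel == 8 || bits_per_channel == 16 ||
                    bits_per_channel == 32);
    return channels_ok && bits_ok;
}

/* Small self-check illustrating the rule. */
static void layout_self_check(void)
{
    assert(layout_is_shareable(4, 8));   /* e.g. a 4 x 8-bit layout such as D3DFMT_A8R8G8B8 */
    assert(layout_is_shareable(1, 32));  /* single 32-bit channel                            */
    assert(!layout_is_shareable(3, 8));  /* three channels are not listed                    */
    assert(!layout_is_shareable(4, 10)); /* 10-bit channels are not listed                   */
}
```

An application could run such a check before registration to fail early with a clearer diagnostic than the error code returned by the driver.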

4.43 Direct3D 9 Interoperability [DEPRECATED]

Typedefs
• typedef enum CUd3d9map_flags_enum CUd3d9map_flags
• typedef enum CUd3d9register_flags_enum CUd3d9register_flags

Enumerations
• enum CUd3d9map_flags_enum
• enum CUd3d9register_flags_enum

Functions
• CUresult cuD3D9MapResources (unsigned int count, IDirect3DResource9 **ppResource)
Map Direct3D resources for access by CUDA.
• CUresult cuD3D9RegisterResource (IDirect3DResource9 *pResource, unsigned int Flags)
Register a Direct3D resource for access by CUDA.
• CUresult cuD3D9ResourceGetMappedArray (CUarray *pArray, IDirect3DResource9 *pResource, unsigned int Face, unsigned int Level)
Get an array through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D9ResourceGetMappedPitch (size_t *pPitch, size_t *pPitchSlice, IDirect3DResource9 *pResource, unsigned int Face, unsigned int Level)
Get the pitch of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D9ResourceGetMappedPointer (CUdeviceptr *pDevPtr, IDirect3DResource9 *pResource, unsigned int Face, unsigned int Level)
Get the pointer through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D9ResourceGetMappedSize (size_t *pSize, IDirect3DResource9 *pResource, unsigned int Face, unsigned int Level)
Get the size of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D9ResourceGetSurfaceDimensions (size_t *pWidth, size_t *pHeight, size_t *pDepth, IDirect3DResource9 *pResource, unsigned int Face, unsigned int Level)
Get the dimensions of a registered surface.
• CUresult cuD3D9ResourceSetMapFlags (IDirect3DResource9 *pResource, unsigned int Flags)
Set usage flags for mapping a Direct3D resource.
• CUresult cuD3D9UnmapResources (unsigned int count, IDirect3DResource9 **ppResource)
Unmaps Direct3D resources.
• CUresult cuD3D9UnregisterResource (IDirect3DResource9 *pResource)
Unregister a Direct3D resource.

4.43.1 Detailed Description

This section describes deprecated Direct3D 9 interoperability functionality.

4.43.2 Typedef Documentation

4.43.2.1 typedef enum CUd3d9map_flags_enum CUd3d9map_flags

Flags to map or unmap a resource

4.43.2.2 typedef enum CUd3d9register_flags_enum CUd3d9register_flags

Flags to register a resource

4.43.3 Enumeration Type Documentation

4.43.3.1 enum CUd3d9map_flags_enum

Flags to map or unmap a resource

4.43.3.2 enum CUd3d9register_flags_enum

Flags to register a resource

4.43.4 Function Documentation

4.43.4.1 CUresult cuD3D9MapResources (unsigned int count, IDirect3DResource9 ** ppResource)

Deprecated
This function is deprecated as of CUDA 3.0.

Maps the count Direct3D resources in ppResource for access by CUDA.

The resources in ppResource may be accessed in CUDA kernels until they are unmapped. Direct3D should not access any resources while they are mapped by CUDA. If an application does so the results are undefined.

This function provides the synchronization guarantee that any Direct3D calls issued before cuD3D9MapResources() will complete before any CUDA kernels issued after cuD3D9MapResources() begin.

If any of ppResource have not been registered for use with CUDA or if ppResource contains any duplicate entries, then CUDA_ERROR_INVALID_HANDLE is returned. If any of ppResource are presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.

Parameters:
count - Number of resources in ppResource
ppResource - Resources to map for CUDA usage

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsMapResources

4.43.4.2 CUresult cuD3D9RegisterResource (IDirect3DResource9 * pResource, unsigned int Flags)

Deprecated
This function is deprecated as of CUDA 3.0.

Registers the Direct3D resource pResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cuD3D9UnregisterResource(). Also on success, this call will increase the internal reference count on pResource. This reference count will be decremented when this resource is unregistered through cuD3D9UnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pResource must be one of the following.
• IDirect3DVertexBuffer9: Cannot be used with Flags set to CU_D3D9_REGISTER_FLAGS_ARRAY.
• IDirect3DIndexBuffer9: Cannot be used with Flags set to CU_D3D9_REGISTER_FLAGS_ARRAY.
• IDirect3DSurface9: Only stand-alone objects of type IDirect3DSurface9 may be explicitly shared. In particular, individual mipmap levels and faces of cube maps may not be registered directly. To access individual surfaces associated with a texture, one must register the base texture object. For restrictions on the Flags parameter, see type IDirect3DBaseTexture9.
• IDirect3DBaseTexture9: When a texture is registered, all surfaces associated with all mipmap levels of all faces of the texture will be accessible to CUDA.

The Flags argument specifies the mechanism through which CUDA will access the Direct3D resource. The following values are allowed.
• CU_D3D9_REGISTER_FLAGS_NONE: Specifies that CUDA will access this resource through a CUdeviceptr. The pointer, size, and (for textures) pitch for each subresource of this allocation may be queried through cuD3D9ResourceGetMappedPointer(), cuD3D9ResourceGetMappedSize(), and cuD3D9ResourceGetMappedPitch() respectively. This option is valid for all resource types.
• CU_D3D9_REGISTER_FLAGS_ARRAY: Specifies that CUDA will access this resource through a CUarray queried on a sub-resource basis through cuD3D9ResourceGetMappedArray(). This option is only valid for resources of type IDirect3DSurface9 and subtypes of IDirect3DBaseTexture9.

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.

• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Any resources allocated in D3DPOOL_SYSTEMMEM or D3DPOOL_MANAGED may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized on this context, then CUDA_ERROR_INVALID_CONTEXT is returned. If pResource is of incorrect type (e.g. is a non-stand-alone IDirect3DSurface9) or is already registered, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource cannot be registered then CUDA_ERROR_UNKNOWN is returned.

Parameters:
pResource - Resource to register for CUDA access
Flags - Flags for resource registration

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsD3D9RegisterResource

4.43.4.3 CUresult cuD3D9ResourceGetMappedArray (CUarray * pArray, IDirect3DResource9 * pResource, unsigned int Face, unsigned int Level)

Deprecated
This function is deprecated as of CUDA 3.0.

Returns in *pArray an array through which the subresource of the mapped Direct3D resource pResource which corresponds to Face and Level may be accessed. The value set in pArray may change every time that pResource is mapped.

If pResource is not registered then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D9_REGISTER_FLAGS_ARRAY then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of Face and Level parameters, see cuD3D9ResourceGetMappedPointer().

Parameters:
pArray - Returned array corresponding to subresource
pResource - Mapped resource to access

Face - Face of resource to access
Level - Level of resource to access

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsSubResourceGetMappedArray

4.43.4.4 CUresult cuD3D9ResourceGetMappedPitch (size_t * pPitch, size_t * pPitchSlice, IDirect3DResource9 * pResource, unsigned int Face, unsigned int Level)

Deprecated
This function is deprecated as of CUDA 3.0.

Returns in *pPitch and *pPitchSlice the pitch and Z-slice pitch of the subresource of the mapped Direct3D resource pResource, which corresponds to Face and Level. The values set in pPitch and pPitchSlice may change every time that pResource is mapped.

The pitch and Z-slice pitch values may be used to compute the location of a sample on a surface as follows.

For a 2D surface, the byte offset of the sample at position x, y from the base pointer of the surface is:
y * pitch + (bytes per pixel) * x

For a 3D surface, the byte offset of the sample at position x, y, z from the base pointer of the surface is:
z * slicePitch + y * pitch + (bytes per pixel) * x

Both parameters pPitch and pPitchSlice are optional and may be set to NULL.

If pResource is not of type IDirect3DBaseTexture9 or one of its sub-types or if pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D9_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped for access by CUDA then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of Face and Level parameters, see cuD3D9ResourceGetMappedPointer().

Parameters:
pPitch - Returned pitch of subresource
pPitchSlice - Returned Z-slice pitch of subresource
pResource - Mapped resource to access
Face - Face of resource to access
Level - Level of resource to access

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED
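The two offset formulas given for cuD3D9ResourceGetMappedPitch can be written as small host-side helpers. This is a minimal sketch in plain C: the function names are hypothetical and not part of the CUDA driver API, and the pitch values are assumed to be the ones returned in *pPitch and *pPitchSlice for the currently mapped subresource.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of sample (x, y) on a 2D surface:
 * y * pitch + (bytes per pixel) * x */
static size_t sample_offset_2d(size_t pitch, size_t bytes_per_pixel,
                               size_t x, size_t y)
{
    assert(bytes_per_pixel != 0); /* a pixel occupies at least one byte */
    return y * pitch + bytes_per_pixel * x;
}

/* Byte offset of sample (x, y, z) on a 3D surface:
 * z * slicePitch + y * pitch + (bytes per pixel) * x */
static size_t sample_offset_3d(size_t pitch, size_t slice_pitch,
                               size_t bytes_per_pixel,
                               size_t x, size_t y, size_t z)
{
    return z * slice_pitch + sample_offset_2d(pitch, bytes_per_pixel, x, y);
}
```

The resulting offset is added to the base pointer of the subresource. Because the pitch may differ from width times bytes per pixel, and may change each time the resource is mapped, it should be queried after every map rather than cached.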

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsSubResourceGetMappedArray

4.43.4.5 CUresult cuD3D9ResourceGetMappedPointer (CUdeviceptr * pDevPtr, IDirect3DResource9 * pResource, unsigned int Face, unsigned int Level)

Deprecated
This function is deprecated as of CUDA 3.0.

Returns in *pDevPtr the base pointer of the subresource of the mapped Direct3D resource pResource, which corresponds to Face and Level. The value set in pDevPtr may change every time that pResource is mapped.

If pResource is not registered, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D9_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped, then CUDA_ERROR_NOT_MAPPED is returned.

If pResource is of type IDirect3DCubeTexture9, then Face must be one of the values enumerated by type D3DCUBEMAP_FACES. For all other types Face must be 0. If Face is invalid, then CUDA_ERROR_INVALID_VALUE is returned.

If pResource is of type IDirect3DBaseTexture9, then Level must correspond to a valid mipmap level. At present only mipmap level 0 is supported. For all other types Level must be 0. If Level is invalid, then CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
pDevPtr - Returned pointer corresponding to subresource
pResource - Mapped resource to access
Face - Face of resource to access
Level - Level of resource to access

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsResourceGetMappedPointer

4.43.4.6 CUresult cuD3D9ResourceGetMappedSize (size_t * pSize, IDirect3DResource9 * pResource, unsigned int Face, unsigned int Level)

Deprecated
This function is deprecated as of CUDA 3.0.

Returns in *pSize the size of the subresource of the mapped Direct3D resource pResource, which corresponds to Face and Level. The value set in pSize may change every time that pResource is mapped.

If pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D9_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped for access by CUDA, then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of Face and Level parameters, see cuD3D9ResourceGetMappedPointer().

Parameters:
pSize - Returned size of subresource
pResource - Mapped resource to access
Face - Face of resource to access
Level - Level of resource to access

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsResourceGetMappedPointer

4.43.4.7 CUresult cuD3D9ResourceGetSurfaceDimensions (size_t * pWidth, size_t * pHeight, size_t * pDepth, IDirect3DResource9 * pResource, unsigned int Face, unsigned int Level)

Deprecated
This function is deprecated as of CUDA 3.0.

Returns in *pWidth, *pHeight, and *pDepth the dimensions of the subresource of the mapped Direct3D resource pResource, which corresponds to Face and Level.

Because anti-aliased surfaces may have multiple samples per pixel, it is possible that the dimensions of a resource will be an integer factor larger than the dimensions reported by the Direct3D runtime.

The parameters pWidth, pHeight, and pDepth are optional. For 2D surfaces, the value returned in *pDepth will be 0.

If pResource is not of type IDirect3DBaseTexture9 or IDirect3DSurface9 or if pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned.

For usage requirements of Face and Level parameters, see cuD3D9ResourceGetMappedPointer().

Parameters:
pWidth - Returned width of surface
pHeight - Returned height of surface
pDepth - Returned depth of surface

pResource - Registered resource to access
Face - Face of resource to access
Level - Level of resource to access

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsSubResourceGetMappedArray

4.43.4.8 CUresult cuD3D9ResourceSetMapFlags (IDirect3DResource9 * pResource, unsigned int Flags)

Deprecated
This function is deprecated as of CUDA 3.0.

Set Flags for mapping the Direct3D resource pResource.

Changes to Flags will take effect the next time pResource is mapped. The Flags argument may be any of the following:
• CU_D3D9_MAPRESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• CU_D3D9_MAPRESOURCE_FLAGS_READONLY: Specifies that CUDA kernels which access this resource will not write to this resource.
• CU_D3D9_MAPRESOURCE_FLAGS_WRITEDISCARD: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.

Parameters:
pResource - Registered resource to set flags for
Flags - Parameters for resource mapping

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuGraphicsResourceSetMapFlags

4.43.4.9 CUresult cuD3D9UnmapResources (unsigned int count, IDirect3DResource9 ∗∗ ppResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the count Direct3D resources in ppResource.

This function provides the synchronization guarantee that any CUDA kernels issued before cuD3D9UnmapResources() will complete before any Direct3D calls issued after cuD3D9UnmapResources() begin.

If any of ppResource have not been registered for use with CUDA or if ppResource contains any duplicate entries, then CUDA_ERROR_INVALID_HANDLE is returned. If any of ppResource are not presently mapped for access by CUDA, then CUDA_ERROR_NOT_MAPPED is returned.

Parameters:
    count - Number of resources to unmap for CUDA
    ppResource - Resources to unmap for CUDA

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsUnmapResources

4.43.4.10 CUresult cuD3D9UnregisterResource (IDirect3DResource9 ∗ pResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unregisters the Direct3D resource pResource so it is not accessible by CUDA unless registered again.

If pResource is not registered, then CUDA_ERROR_INVALID_HANDLE is returned.

Parameters:
    pResource - Resource to unregister

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsUnregisterResource

4.44 Direct3D 10 Interoperability

Modules
• Direct3D 10 Interoperability [DEPRECATED]

Typedefs
• typedef enum CUd3d10DeviceList_enum CUd3d10DeviceList

Enumerations
• enum CUd3d10DeviceList_enum { CU_D3D10_DEVICE_LIST_ALL = 0x01, CU_D3D10_DEVICE_LIST_CURRENT_FRAME = 0x02, CU_D3D10_DEVICE_LIST_NEXT_FRAME = 0x03 }

Functions
• CUresult cuD3D10CtxCreate (CUcontext ∗pCtx, CUdevice ∗pCudaDevice, unsigned int Flags, ID3D10Device ∗pD3DDevice)
    Create a CUDA context for interoperability with Direct3D 10.
• CUresult cuD3D10CtxCreateOnDevice (CUcontext ∗pCtx, unsigned int flags, ID3D10Device ∗pD3DDevice, CUdevice cudaDevice)
    Create a CUDA context for interoperability with Direct3D 10.
• CUresult cuD3D10GetDevice (CUdevice ∗pCudaDevice, IDXGIAdapter ∗pAdapter)
    Gets the CUDA device corresponding to a display adapter.
• CUresult cuD3D10GetDevices (unsigned int ∗pCudaDeviceCount, CUdevice ∗pCudaDevices, unsigned int cudaDeviceCount, ID3D10Device ∗pD3D10Device, CUd3d10DeviceList deviceList)
    Gets the CUDA devices corresponding to a Direct3D 10 device.
• CUresult cuD3D10GetDirect3DDevice (ID3D10Device ∗∗ppD3DDevice)
    Get the Direct3D 10 device against which the current CUDA context was created.
• CUresult cuGraphicsD3D10RegisterResource (CUgraphicsResource ∗pCudaResource, ID3D10Resource ∗pD3DResource, unsigned int Flags)
    Register a Direct3D 10 resource for access by CUDA.

4.44.1 Detailed Description

This section describes the Direct3D 10 interoperability functions of the low-level CUDA driver application programming interface.

4.44.2 Typedef Documentation

4.44.2.1 typedef enum CUd3d10DeviceList_enum CUd3d10DeviceList

CUDA devices corresponding to a D3D10 device

4.44.3 Enumeration Type Documentation

4.44.3.1 enum CUd3d10DeviceList_enum

CUDA devices corresponding to a D3D10 device

Enumerator:
    CU_D3D10_DEVICE_LIST_ALL The CUDA devices for all GPUs used by a D3D10 device
    CU_D3D10_DEVICE_LIST_CURRENT_FRAME The CUDA devices for the GPUs used by a D3D10 device in its currently rendering frame
    CU_D3D10_DEVICE_LIST_NEXT_FRAME The CUDA devices for the GPUs to be used by a D3D10 device in the next frame

4.44.4 Function Documentation

4.44.4.1 CUresult cuD3D10CtxCreate (CUcontext ∗ pCtx, CUdevice ∗ pCudaDevice, unsigned int Flags, ID3D10Device ∗ pD3DDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in ∗pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context. If pCudaDevice is non-NULL then the CUdevice on which this CUDA context was created will be returned in ∗pCudaDevice.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
    pCtx - Returned newly created CUDA context
    pCudaDevice - Returned pointer to the device on which the context was created
    Flags - Context creation flags (see cuCtxCreate() for details)
    pD3DDevice - Direct3D device to create interoperability context with

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuD3D10GetDevice, cuGraphicsD3D10RegisterResource

4.44.4.2 CUresult cuD3D10CtxCreateOnDevice (CUcontext ∗ pCtx, unsigned int flags, ID3D10Device ∗ pD3DDevice, CUdevice cudaDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in ∗pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
    pCtx - Returned newly created CUDA context
    flags - Context creation flags (see cuCtxCreate() for details)
    pD3DDevice - Direct3D device to create interoperability context with
    cudaDevice - The CUDA device on which to create the context. This device must be among the devices returned when querying CU_D3D10_DEVICE_LIST_ALL from cuD3D10GetDevices.

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuD3D10GetDevices, cuGraphicsD3D10RegisterResource

4.44.4.3 CUresult cuD3D10GetDevice (CUdevice ∗ pCudaDevice, IDXGIAdapter ∗ pAdapter)

Returns in ∗pCudaDevice the CUDA-compatible device corresponding to the adapter pAdapter obtained from IDXGIFactory::EnumAdapters.

If no device on pAdapter is CUDA-compatible then the call will fail.

Parameters:
    pCudaDevice - Returned CUDA device corresponding to pAdapter
    pAdapter - Adapter to query for CUDA device

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuD3D10CtxCreate

CUDA_- Generated for NVIDIA CUDA Library by Doxygen .4. ERROR_INVALID_CONTEXT Note: Note that this function may also return error codes from previous. asynchronous launches. Parameters: ppD3DDevice .The size of the output device array pCudaDevices pD3D10Device .5 CUresult cuD3D10GetDirect3DDevice (ID3D10Device ∗∗ ppD3DDevice) Returns in ∗ppD3DDevice the Direct3D device against which this CUDA context was created in cuD3D10CtxCreate(). CUdevice ∗ pCudaDevices.44 Direct3D 10 Interoperability 4.4.44.44. or CU_D3D10_DEVICE_LIST_NEXT_FRAME for the devices used to render the next frame (in SLI).The set of devices to return.Returned number of CUDA devices corresponding to pD3D10Device pCudaDevices . ID3D10Device ∗ pD3D10Device. CUDA_ERROR_UNKNOWN Note: Note that this function may also return error codes from previous. CUDA_- 4.Returned CUDA devices corresponding to pD3D10Device cudaDeviceCount .4 CUresult cuD3D10GetDevices (unsigned int ∗ pCudaDeviceCount. Returns: CUDA_SUCCESS. See also: cuD3D10GetDevice CUDA_ERROR_NOT_INITIALIZED. CUDA_ERROR_DEINITIALIZED. Also returns in ∗pCudaDevices at most cudaDeviceCount of the the CUDA-compatible devices corresponding to the Direct3D 10 device pD3D10Device. Parameters: pCudaDeviceCount . CUd3d10DeviceList deviceList) 289 Returns in ∗pCudaDeviceCount the number of CUDA-compatible device corresponding to the Direct3D 10 device pD3D10Device. asynchronous launches. CUDA_ERROR_DEINITIALIZED. See also: cuD3D10CtxCreate CUDA_ERROR_NOT_INITIALIZED.Returned Direct3D device corresponding to CUDA context Returns: CUDA_SUCCESS. CU_D3D10_DEVICE_LIST_CURRENT_FRAME for the devices used to render the current frame (in SLI). ERROR_NO_DEVICE.Direct3D 10 device to query for CUDA devices deviceList . This set may be CU_D3D10_DEVICE_LIST_ALL for all devices. unsigned int cudaDeviceCount. If any of the GPUs being used to render pDevice are not CUDA capable then the call will return CUDA_ERROR_NO_DEVICE.4.

4.44.4.6 CUresult cuGraphicsD3D10RegisterResource (CUgraphicsResource ∗ pCudaResource, ID3D10Resource ∗ pD3DResource, unsigned int Flags)

Registers the Direct3D 10 resource pD3DResource for access by CUDA and returns a CUDA handle to pD3DResource in pCudaResource. The handle returned in pCudaResource may be used to map and unmap this resource until it is unregistered. On success this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cuGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.
• ID3D10Buffer: may be accessed through a device pointer.
• ID3D10Texture1D: individual subresources of the texture may be accessed via arrays
• ID3D10Texture2D: individual subresources of the texture may be accessed via arrays
• ID3D10Texture3D: individual subresources of the texture may be accessed via arrays

The Flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is
• CU_GRAPHICS_REGISTER_FLAGS_NONE

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized for this context using cuD3D10CtxCreate then CUDA_ERROR_INVALID_CONTEXT is returned. If pD3DResource is of incorrect type or is already registered then CUDA_ERROR_INVALID_HANDLE is returned. If pD3DResource cannot be registered then CUDA_ERROR_UNKNOWN is returned. If Flags is not one of the values specified above then CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
    pCudaResource - Returned graphics resource handle
    pD3DResource - Direct3D resource to register
    Flags - Parameters for resource registration

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuD3D10CtxCreate, cuGraphicsUnregisterResource, cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray, cuGraphicsResourceGetMappedPointer
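The texture-format limitations listed for cuGraphicsD3D10RegisterResource can be restated as a small predicate. The following is only a sketch in plain C: the function name and its parameters are hypothetical and are not part of the CUDA driver API, and the check covers only the channel-count, bit-width, and depth/stencil rules quoted above (it says nothing about primary rendertargets or shared resources).

```c
#include <stdbool.h>

/* Hypothetical helper (not a CUDA API call): returns true when a texture
 * format satisfies the documented sharing rules, i.e. 1, 2, or 4 channels
 * of 8-, 16-, or 32-bit data, and not a depth or stencil format. */
bool format_is_shareable(unsigned channels, unsigned bits_per_channel,
                         bool is_depth_or_stencil)
{
    if (is_depth_or_stencil) {
        return false; /* depth or stencil surfaces cannot be shared */
    }
    bool channels_ok = (channels == 1 || channels == 2 || channels == 4);
    bool bits_ok = (bits_per_channel == 8 || bits_per_channel == 16 ||
                    bits_per_channel == 32);
    return channels_ok && bits_ok;
}
```

An application might run such a check before attempting registration so that an unsupported format is reported early instead of surfacing as a failed registration call.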

4.45 Direct3D 10 Interoperability [DEPRECATED]

Typedefs
• typedef enum CUD3D10map_flags_enum CUD3D10map_flags
• typedef enum CUD3D10register_flags_enum CUD3D10register_flags

Enumerations
• enum CUD3D10map_flags_enum
• enum CUD3D10register_flags_enum

Functions
• CUresult cuD3D10MapResources (unsigned int count, ID3D10Resource ∗∗ppResources)
    Map Direct3D resources for access by CUDA.
• CUresult cuD3D10RegisterResource (ID3D10Resource ∗pResource, unsigned int Flags)
    Register a Direct3D resource for access by CUDA.
• CUresult cuD3D10ResourceGetMappedArray (CUarray ∗pArray, ID3D10Resource ∗pResource, unsigned int SubResource)
    Get an array through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D10ResourceGetMappedPitch (size_t ∗pPitch, size_t ∗pPitchSlice, ID3D10Resource ∗pResource, unsigned int SubResource)
    Get the pitch of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D10ResourceGetMappedPointer (CUdeviceptr ∗pDevPtr, ID3D10Resource ∗pResource, unsigned int SubResource)
    Get a pointer through which to access a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D10ResourceGetMappedSize (size_t ∗pSize, ID3D10Resource ∗pResource, unsigned int SubResource)
    Get the size of a subresource of a Direct3D resource which has been mapped for access by CUDA.
• CUresult cuD3D10ResourceGetSurfaceDimensions (size_t ∗pWidth, size_t ∗pHeight, size_t ∗pDepth, ID3D10Resource ∗pResource, unsigned int SubResource)
    Get the dimensions of a registered surface.
• CUresult cuD3D10ResourceSetMapFlags (ID3D10Resource ∗pResource, unsigned int Flags)
    Set usage flags for mapping a Direct3D resource.
• CUresult cuD3D10UnmapResources (unsigned int count, ID3D10Resource ∗∗ppResources)
    Unmap Direct3D resources.
• CUresult cuD3D10UnregisterResource (ID3D10Resource ∗pResource)
    Unregister a Direct3D resource.

4.45.1 Detailed Description

This section describes deprecated Direct3D 10 interoperability functionality.

4.45.2 Typedef Documentation

4.45.2.1 typedef enum CUD3D10map_flags_enum CUD3D10map_flags

Flags to map or unmap a resource

4.45.2.2 typedef enum CUD3D10register_flags_enum CUD3D10register_flags

Flags to register a resource

4.45.3 Enumeration Type Documentation

4.45.3.1 enum CUD3D10map_flags_enum

Flags to map or unmap a resource

4.45.3.2 enum CUD3D10register_flags_enum

Flags to register a resource

4.45.4 Function Documentation

4.45.4.1 CUresult cuD3D10MapResources (unsigned int count, ID3D10Resource ∗∗ ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Maps the count Direct3D resources in ppResources for access by CUDA.

The resources in ppResources may be accessed in CUDA kernels until they are unmapped. Direct3D should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.

This function provides the synchronization guarantee that any Direct3D calls issued before cuD3D10MapResources() will complete before any CUDA kernels issued after cuD3D10MapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries, then CUDA_ERROR_INVALID_HANDLE is returned. If any of ppResources are presently mapped for access by CUDA, then CUDA_ERROR_ALREADY_MAPPED is returned.

Parameters:
    count - Number of resources to map for CUDA
    ppResources - Resources to map for CUDA

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsMapResources

4.45.4.2 CUresult cuD3D10RegisterResource (ID3D10Resource ∗ pResource, unsigned int Flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Registers the Direct3D resource pResource for access by CUDA.

If this call is successful, then the application will be able to map and unmap this resource until it is unregistered through cuD3D10UnregisterResource(). Also on success, this call will increase the internal reference count on pResource. This reference count will be decremented when this resource is unregistered through cuD3D10UnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pResource must be one of the following.
• ID3D10Buffer: Cannot be used with Flags set to CU_D3D10_REGISTER_FLAGS_ARRAY.
• ID3D10Texture1D: No restrictions.
• ID3D10Texture2D: No restrictions.
• ID3D10Texture3D: No restrictions.

The Flags argument specifies the mechanism through which CUDA will access the Direct3D resource. The following values are allowed.
• CU_D3D10_REGISTER_FLAGS_NONE: Specifies that CUDA will access this resource through a CUdeviceptr. The pointer, size, and (for textures) pitch for each subresource of this allocation may be queried through cuD3D10ResourceGetMappedPointer(), cuD3D10ResourceGetMappedSize(), and cuD3D10ResourceGetMappedPitch() respectively. This option is valid for all resource types.
• CU_D3D10_REGISTER_FLAGS_ARRAY: Specifies that CUDA will access this resource through a CUarray queried on a sub-resource basis through cuD3D10ResourceGetMappedArray(). This option is only valid for resources of type ID3D10Texture1D, ID3D10Texture2D, and ID3D10Texture3D.

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary rendertarget may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.

• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized on this context then CUDA_ERROR_INVALID_CONTEXT is returned. If pResource is of incorrect type or is already registered, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource cannot be registered, then CUDA_ERROR_UNKNOWN is returned.

Parameters:
    pResource - Resource to register
    Flags - Parameters for resource registration

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsD3D10RegisterResource

4.45.4.3 CUresult cuD3D10ResourceGetMappedArray (CUarray ∗ pArray, ID3D10Resource ∗ pResource, unsigned int SubResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in ∗pArray an array through which the subresource of the mapped Direct3D resource pResource, which corresponds to SubResource, may be accessed. The value set in pArray may change every time that pResource is mapped.

If pResource is not registered, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D10_REGISTER_FLAGS_ARRAY, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped, then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of the SubResource parameter, see cuD3D10ResourceGetMappedPointer().

Parameters:
    pArray - Returned array corresponding to subresource
    pResource - Mapped resource to access
    SubResource - Subresource of pResource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsSubResourceGetMappedArray

4.45.4.4 CUresult cuD3D10ResourceGetMappedPitch (size_t ∗ pPitch, size_t ∗ pPitchSlice, ID3D10Resource ∗ pResource, unsigned int SubResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in ∗pPitch and ∗pPitchSlice the pitch and Z-slice pitch of the subresource of the mapped Direct3D resource pResource, which corresponds to SubResource. The values set in pPitch and pPitchSlice may change every time that pResource is mapped.

The pitch and Z-slice pitch values may be used to compute the location of a sample on a surface as follows.

For a 2D surface, the byte offset of the sample at position x, y from the base pointer of the surface is:

    y ∗ pitch + (bytes per pixel) ∗ x

For a 3D surface, the byte offset of the sample at position x, y, z from the base pointer of the surface is:

    z ∗ slicePitch + y ∗ pitch + (bytes per pixel) ∗ x

Both parameters pPitch and pPitchSlice are optional and may be set to NULL.

If pResource is not of type IDirect3DBaseTexture10 or one of its sub-types or if pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D10_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped for access by CUDA, then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of the SubResource parameter, see cuD3D10ResourceGetMappedPointer().

Parameters:
    pPitch - Returned pitch of subresource
    pPitchSlice - Returned Z-slice pitch of subresource
    pResource - Mapped resource to access
    SubResource - Subresource of pResource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsSubResourceGetMappedArray
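The two byte-offset formulas given for cuD3D10ResourceGetMappedPitch translate directly into code. The sketch below is plain C with hypothetical function and parameter names; pitch and slice_pitch stand for the values that would be returned in ∗pPitch and ∗pPitchSlice, and bytes_per_pixel is assumed to be known from the texture format.

```c
#include <stddef.h>

/* Byte offset of the sample at (x, y) in a 2D surface:
 * y * pitch + (bytes per pixel) * x */
size_t sample_offset_2d(size_t x, size_t y, size_t pitch,
                        size_t bytes_per_pixel)
{
    return y * pitch + bytes_per_pixel * x;
}

/* Byte offset of the sample at (x, y, z) in a 3D surface:
 * z * slicePitch + y * pitch + (bytes per pixel) * x */
size_t sample_offset_3d(size_t x, size_t y, size_t z, size_t pitch,
                        size_t slice_pitch, size_t bytes_per_pixel)
{
    return z * slice_pitch + y * pitch + bytes_per_pixel * x;
}
```

For example, with a pitch of 256 bytes and 4 bytes per pixel, the sample at x = 3, y = 2 lies 2 ∗ 256 + 4 ∗ 3 = 524 bytes from the base pointer. Because the pitch values may change every time the resource is mapped, they should be queried again after each map rather than cached.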

4.45.4.5 CUresult cuD3D10ResourceGetMappedPointer (CUdeviceptr ∗ pDevPtr, ID3D10Resource ∗ pResource, unsigned int SubResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in ∗pDevPtr the base pointer of the subresource of the mapped Direct3D resource pResource, which corresponds to SubResource. The value set in pDevPtr may change every time that pResource is mapped.

If pResource is not registered, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D10_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped, then CUDA_ERROR_NOT_MAPPED is returned.

If pResource is of type ID3D10Buffer, then SubResource must be 0. If pResource is of any other type, then the value of SubResource must come from the subresource calculation in D3D10CalcSubResource().

Parameters:
    pDevPtr - Returned pointer corresponding to subresource
    pResource - Mapped resource to access
    SubResource - Subresource of pResource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsResourceGetMappedPointer

4.45.4.6 CUresult cuD3D10ResourceGetMappedSize (size_t ∗ pSize, ID3D10Resource ∗ pResource, unsigned int SubResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in ∗pSize the size of the subresource of the mapped Direct3D resource pResource, which corresponds to SubResource. The value set in pSize may change every time that pResource is mapped.

If pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource was not registered with usage flags CU_D3D10_REGISTER_FLAGS_NONE, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is not mapped for access by CUDA, then CUDA_ERROR_NOT_MAPPED is returned.

For usage requirements of the SubResource parameter, see cuD3D10ResourceGetMappedPointer().

Parameters:
    pSize - Returned size of subresource

    pResource - Mapped resource to access
    SubResource - Subresource of pResource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_NOT_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsResourceGetMappedPointer

4.45.4.7 CUresult cuD3D10ResourceGetSurfaceDimensions (size_t ∗ pWidth, size_t ∗ pHeight, size_t ∗ pDepth, ID3D10Resource ∗ pResource, unsigned int SubResource)

Deprecated
    This function is deprecated as of CUDA 3.0.

Returns in ∗pWidth, ∗pHeight, and ∗pDepth the dimensions of the subresource of the mapped Direct3D resource pResource, which corresponds to SubResource.

Because anti-aliased surfaces may have multiple samples per pixel, it is possible that the dimensions of a resource will be an integer factor larger than the dimensions reported by the Direct3D runtime.

The parameters pWidth, pHeight, and pDepth are optional. For 2D surfaces, the value returned in ∗pDepth will be 0.

If pResource is not of type IDirect3DBaseTexture10 or IDirect3DSurface10 or if pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned.

For usage requirements of the SubResource parameter, see cuD3D10ResourceGetMappedPointer().

Parameters:
    pWidth - Returned width of surface
    pHeight - Returned height of surface
    pDepth - Returned depth of surface
    pResource - Registered resource to access
    SubResource - Subresource of pResource to access

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsSubResourceGetMappedArray
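The SubResource rule that these functions defer to cuD3D10ResourceGetMappedPointer() (0 for an ID3D10Buffer, otherwise the Direct3D 10 subresource calculation) can be sketched in plain C. The function names below are hypothetical; the arithmetic mirrors the Direct3D 10 helper, which the Direct3D headers spell D3D10CalcSubresource and define as MipSlice + ArraySlice ∗ MipLevels. Real code would call the Direct3D helper itself.

```c
/* Mirrors the Direct3D 10 subresource calculation:
 * MipSlice + ArraySlice * MipLevels. Hypothetical stand-in for the
 * Direct3D helper so the sketch compiles without the Direct3D headers. */
unsigned calc_subresource(unsigned mip_slice, unsigned array_slice,
                          unsigned mip_levels)
{
    return mip_slice + array_slice * mip_levels;
}

/* Hypothetical helper: picks the SubResource argument according to the
 * documented rule. Buffers always use 0; textures use the calculation. */
unsigned subresource_for(int is_buffer, unsigned mip_slice,
                         unsigned array_slice, unsigned mip_levels)
{
    return is_buffer ? 0u : calc_subresource(mip_slice, array_slice,
                                             mip_levels);
}
```

For a texture array with 5 mip levels, mip level 2 of array slice 1 is subresource 2 + 1 ∗ 5 = 7.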

4.45.4.8 CUresult cuD3D10ResourceSetMapFlags (ID3D10Resource ∗ pResource, unsigned int Flags)

Deprecated
    This function is deprecated as of CUDA 3.0.

Set flags for mapping the Direct3D resource pResource.

Changes to flags will take effect the next time pResource is mapped. The Flags argument may be any of the following.
• CU_D3D10_MAPRESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA kernels. This is the default value.
• CU_D3D10_MAPRESOURCE_FLAGS_READONLY: Specifies that CUDA kernels which access this resource will not write to this resource.
• CU_D3D10_MAPRESOURCE_FLAGS_WRITEDISCARD: Specifies that CUDA kernels which access this resource will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

If pResource has not been registered for use with CUDA, then CUDA_ERROR_INVALID_HANDLE is returned. If pResource is presently mapped for access by CUDA then CUDA_ERROR_ALREADY_MAPPED is returned.

Parameters:
    pResource - Registered resource to set flags for
    Flags - Parameters for resource mapping

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuGraphicsResourceSetMapFlags

4.45.4.9 CUresult cuD3D10UnmapResources (unsigned int count, ID3D10Resource ∗∗ ppResources)

Deprecated
    This function is deprecated as of CUDA 3.0.

Unmaps the count Direct3D resources in ppResources.

This function provides the synchronization guarantee that any CUDA kernels issued before cuD3D10UnmapResources() will complete before any Direct3D calls issued after cuD3D10UnmapResources() begin.

If any of ppResources have not been registered for use with CUDA or if ppResources contains any duplicate entries, then CUDA_ERROR_INVALID_HANDLE is returned. If any of ppResources are not presently mapped for access by CUDA, then CUDA_ERROR_NOT_MAPPED is returned.

CUDA_ERROR_INVALID_HANDLE. then CUDA_ERROR_INVALID_HANDLE is returned. CUDA_ERROR_INVALID_VALUE. If pResource is not registered. Parameters: pResource . CUDA_ERROR_UNKNOWN Note: Note that this function may also return error codes from previous. CUDA_ERROR_INVALID_HANDLE. asynchronous launches. CUDA_ERROR_INVALID_CONTEXT.0. CUDA_ERROR_DEINITIALIZED. CUDA_ERROR_DEINITIALIZED. See also: cuGraphicsUnmapResources 4. See also: cuGraphicsUnregisterResource Generated for NVIDIA CUDA Library by Doxygen . CUDA_ERROR_NOT_MAPPED.300 Parameters: count . CUDA_ERROR_NOT_INITIALIZED.Resources to unmap for CUDA Returns: Module Documentation CUDA_SUCCESS.Resources to unregister Returns: CUDA_SUCCESS.45.10 CUresult cuD3D10UnregisterResource (ID3D10Resource ∗ pResource) Deprecated This function is deprecated as of Cuda 3.Number of resources to unmap for CUDA ppResources . CUDA_ERROR_NOT_INITIALIZED. CUDA_ERROR_INVALID_CONTEXT.4. asynchronous launches. CUDA_ERROR_UNKNOWN Note: Note that this function may also return error codes from previous. Unregisters the Direct3D resource pResource so it is not accessible by CUDA unless registered again.

4.46 Direct3D 11 Interoperability

Typedefs
• typedef enum CUd3d11DeviceList_enum CUd3d11DeviceList

Enumerations
• enum CUd3d11DeviceList_enum { CU_D3D11_DEVICE_LIST_ALL = 0x01, CU_D3D11_DEVICE_LIST_CURRENT_FRAME = 0x02, CU_D3D11_DEVICE_LIST_NEXT_FRAME = 0x03 }

Functions
• CUresult cuD3D11CtxCreate (CUcontext ∗pCtx, CUdevice ∗pCudaDevice, unsigned int Flags, ID3D11Device ∗pD3DDevice)
    Create a CUDA context for interoperability with Direct3D 11.
• CUresult cuD3D11CtxCreateOnDevice (CUcontext ∗pCtx, unsigned int flags, ID3D11Device ∗pD3DDevice, CUdevice cudaDevice)
    Create a CUDA context for interoperability with Direct3D 11.
• CUresult cuD3D11GetDevice (CUdevice ∗pCudaDevice, IDXGIAdapter ∗pAdapter)
    Gets the CUDA device corresponding to a display adapter.
• CUresult cuD3D11GetDevices (unsigned int ∗pCudaDeviceCount, CUdevice ∗pCudaDevices, unsigned int cudaDeviceCount, ID3D11Device ∗pD3D11Device, CUd3d11DeviceList deviceList)
    Gets the CUDA devices corresponding to a Direct3D 11 device.
• CUresult cuD3D11GetDirect3DDevice (ID3D11Device ∗∗ppD3DDevice)
    Get the Direct3D 11 device against which the current CUDA context was created.
• CUresult cuGraphicsD3D11RegisterResource (CUgraphicsResource ∗pCudaResource, ID3D11Resource ∗pD3DResource, unsigned int Flags)
    Register a Direct3D 11 resource for access by CUDA.

4.46.1 Detailed Description

This section describes the Direct3D 11 interoperability functions of the low-level CUDA driver application programming interface.

4.46.2 Typedef Documentation

4.46.2.1 typedef enum CUd3d11DeviceList_enum CUd3d11DeviceList

CUDA devices corresponding to a D3D11 device

4.46.3 Enumeration Type Documentation

4.46.3.1 enum CUd3d11DeviceList_enum

CUDA devices corresponding to a D3D11 device

Enumerator:
    CU_D3D11_DEVICE_LIST_ALL The CUDA devices for all GPUs used by a D3D11 device
    CU_D3D11_DEVICE_LIST_CURRENT_FRAME The CUDA devices for the GPUs used by a D3D11 device in its currently rendering frame
    CU_D3D11_DEVICE_LIST_NEXT_FRAME The CUDA devices for the GPUs to be used by a D3D11 device in the next frame

4.46.4 Function Documentation

4.46.4.1 CUresult cuD3D11CtxCreate (CUcontext ∗ pCtx, CUdevice ∗ pCudaDevice, unsigned int Flags, ID3D11Device ∗ pD3DDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in ∗pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context. If pCudaDevice is non-NULL then the CUdevice on which this CUDA context was created will be returned in ∗pCudaDevice.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
    pCtx - Returned newly created CUDA context
    pCudaDevice - Returned pointer to the device on which the context was created
    Flags - Context creation flags (see cuCtxCreate() for details)
    pD3DDevice - Direct3D device to create interoperability context with

Returns:
    CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
    Note that this function may also return error codes from previous, asynchronous launches.

See also:
    cuD3D11GetDevice, cuGraphicsD3D11RegisterResource

4.46.4.2 CUresult cuD3D11CtxCreateOnDevice (CUcontext ∗ pCtx, unsigned int flags, ID3D11Device ∗ pD3DDevice, CUdevice cudaDevice)

Creates a new CUDA context, enables interoperability for that context with the Direct3D device pD3DDevice, and associates the created CUDA context with the calling thread. The created CUcontext will be returned in ∗pCtx. Direct3D resources from this device may be registered and mapped through the lifetime of this CUDA context.

On success, this call will increase the internal reference count on pD3DDevice. This reference count will be decremented upon destruction of this context through cuCtxDestroy(). This context will cease to function if pD3DDevice is destroyed or encounters an error.

Parameters:
pCtx - Returned newly created CUDA context
flags - Context creation flags (see cuCtxCreate() for details)
pD3DDevice - Direct3D device to create interoperability context with
cudaDevice - The CUDA device on which to create the context. This device must be among the devices returned when querying CU_D3D11_DEVICE_LIST_ALL from cuD3D11GetDevices.

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D11GetDevices, cuGraphicsD3D11RegisterResource

4.46.4.3 CUresult cuD3D11GetDevice (CUdevice ∗ pCudaDevice, IDXGIAdapter ∗ pAdapter)

Returns in ∗pCudaDevice the CUDA-compatible device corresponding to the adapter pAdapter obtained from IDXGIFactory::EnumAdapters. If no device on pAdapter is CUDA-compatible the call will return CUDA_ERROR_NO_DEVICE.

Parameters:
pCudaDevice - Returned CUDA device corresponding to pAdapter
pAdapter - Adapter to query for CUDA device

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_NO_DEVICE, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D11CtxCreate

4.46.4.4 CUresult cuD3D11GetDevices (unsigned int ∗ pCudaDeviceCount, CUdevice ∗ pCudaDevices, unsigned int cudaDeviceCount, ID3D11Device ∗ pD3D11Device, CUd3d11DeviceList deviceList)

Returns in ∗pCudaDeviceCount the number of CUDA-compatible devices corresponding to the Direct3D 11 device pD3D11Device. Also returns in ∗pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the Direct3D 11 device pD3D11Device.

If any of the GPUs being used to render pD3D11Device are not CUDA capable then the call will return CUDA_ERROR_NO_DEVICE.

Parameters:
pCudaDeviceCount - Returned number of CUDA devices corresponding to pD3D11Device
pCudaDevices - Returned CUDA devices corresponding to pD3D11Device
cudaDeviceCount - The size of the output device array pCudaDevices
pD3D11Device - Direct3D 11 device to query for CUDA devices
deviceList - The set of devices to return. This set may be CU_D3D11_DEVICE_LIST_ALL for all devices, CU_D3D11_DEVICE_LIST_CURRENT_FRAME for the devices used to render the current frame (in SLI), or CU_D3D11_DEVICE_LIST_NEXT_FRAME for the devices used to render the next frame (in SLI).

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_NO_DEVICE, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D11CtxCreate

4.46.4.5 CUresult cuD3D11GetDirect3DDevice (ID3D11Device ∗∗ ppD3DDevice)

Returns in ∗ppD3DDevice the Direct3D device against which this CUDA context was created in cuD3D11CtxCreate().

Parameters:
ppD3DDevice - Returned Direct3D device corresponding to CUDA context

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D11GetDevice

4.46.4.6 CUresult cuGraphicsD3D11RegisterResource (CUgraphicsResource ∗ pCudaResource, ID3D11Resource ∗ pD3DResource, unsigned int Flags)

Registers the Direct3D 11 resource pD3DResource for access by CUDA and returns a CUDA handle to pD3DResource in pCudaResource. The handle returned in pCudaResource may be used to map and unmap this resource until it is unregistered. On success this call will increase the internal reference count on pD3DResource. This reference count will be decremented when this resource is unregistered through cuGraphicsUnregisterResource().

This call is potentially high-overhead and should not be called every frame in interactive applications.

The type of pD3DResource must be one of the following.
• ID3D11Buffer: may be accessed through a device pointer.
• ID3D11Texture1D: individual subresources of the texture may be accessed via arrays
• ID3D11Texture2D: individual subresources of the texture may be accessed via arrays
• ID3D11Texture3D: individual subresources of the texture may be accessed via arrays

The Flags argument may be used to specify additional parameters at register time. The only valid value for this parameter is
• CU_GRAPHICS_REGISTER_FLAGS_NONE

Not all Direct3D resources of the above types may be used for interoperability with CUDA. The following are some limitations.
• The primary render target may not be registered with CUDA.
• Resources allocated as shared may not be registered with CUDA.
• Textures which are not of a format which is 1, 2, or 4 channels of 8, 16, or 32-bit integer or floating-point data cannot be shared.
• Surfaces of depth or stencil formats cannot be shared.

If Direct3D interoperability is not initialized for this context using cuD3D11CtxCreate then CUDA_ERROR_INVALID_CONTEXT is returned. If pD3DResource is of incorrect type or is already registered then CUDA_ERROR_INVALID_HANDLE is returned. If pD3DResource cannot be registered then CUDA_ERROR_UNKNOWN is returned. If Flags is not one of the values specified above then CUDA_ERROR_INVALID_VALUE is returned.

Parameters:
pCudaResource - Returned graphics resource handle
pD3DResource - Direct3D resource to register
Flags - Parameters for resource registration

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_OUT_OF_MEMORY, CUDA_ERROR_UNKNOWN

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuD3D11CtxCreate, cuGraphicsUnregisterResource, cuGraphicsMapResources, cuGraphicsSubResourceGetMappedArray, cuGraphicsResourceGetMappedPointer
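The texture-format limitation listed for cuGraphicsD3D11RegisterResource (1, 2, or 4 channels of 8, 16, or 32 bits each) can be checked before attempting registration. The following is a minimal sketch in plain C; the function name texture_layout_is_shareable is hypothetical and is not part of the CUDA driver API, and the caller is assumed to already know the channel count and per-channel bit width of the Direct3D format. It does not cover the other limitations (primary render target, shared resources, depth or stencil formats).

```c
#include <stdbool.h>

/* Hypothetical helper, not a CUDA API call: returns true when a texture
 * layout has 1, 2, or 4 channels of 8, 16, or 32 bits per channel. */
static bool texture_layout_is_shareable(unsigned channels,
                                        unsigned bits_per_channel)
{
    bool channels_ok = (channels == 1 || channels == 2 || channels == 4);
    bool bits_ok = (bits_per_channel == 8 || bits_per_channel == 16 ||
                    bits_per_channel == 32);
    return channels_ok && bits_ok;
}
```

A three-channel 32-bit layout or any layout with 10-bit channels is rejected by this check, which matches the wording of the limitation; whether a particular DXGI format is actually accepted still depends on the driver.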

4.47 VDPAU Interoperability

Functions
• CUresult cuGraphicsVDPAURegisterOutputSurface (CUgraphicsResource ∗pCudaResource, VdpOutputSurface vdpSurface, unsigned int flags)
Registers a VDPAU VdpOutputSurface object.
• CUresult cuGraphicsVDPAURegisterVideoSurface (CUgraphicsResource ∗pCudaResource, VdpVideoSurface vdpSurface, unsigned int flags)
Registers a VDPAU VdpVideoSurface object.
• CUresult cuVDPAUCtxCreate (CUcontext ∗pCtx, unsigned int flags, CUdevice device, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)
Create a CUDA context for interoperability with VDPAU.
• CUresult cuVDPAUGetDevice (CUdevice ∗pDevice, VdpDevice vdpDevice, VdpGetProcAddress ∗vdpGetProcAddress)
Gets the CUDA device associated with a VDPAU device.

4.47.1 Detailed Description

This section describes the VDPAU interoperability functions of the low-level CUDA driver application programming interface.

4.47.2 Function Documentation
4.47.2.1 CUresult cuGraphicsVDPAURegisterOutputSurface (CUgraphicsResource ∗ pCudaResource, VdpOutputSurface vdpSurface, unsigned int flags)

Registers the VdpOutputSurface specified by vdpSurface for access by CUDA. A handle to the registered object is returned as pCudaResource. The surface’s intended usage is specified using flags, as follows:
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA will not write to this resource.
• CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.

The VdpOutputSurface is presented as an array of subresources that may be accessed using pointers returned by cuGraphicsSubResourceGetMappedArray. The exact number of valid arrayIndex values depends on the VDPAU surface format. The mapping is shown in the table below. mipLevel must be 0.

VdpRGBAFormat                  arrayIndex  Size   Format   Content
VDP_RGBA_FORMAT_B8G8R8A8       0           w x h  ARGB8    Entire surface
VDP_RGBA_FORMAT_R10G10B10A2    0           w x h  A2BGR10  Entire surface

CUDA_ERROR_ALREADY_MAPPED. cuVDPAUGetDevice 4.The VdpVideoSurface to be registered Generated for NVIDIA CUDA Library by Doxygen .2.308 Parameters: pCudaResource . so none of the data previously stored in the resource will be preserved. This is the default value. See also: cuCtxCreate. The VdpVideoSurface is presented as an array of subresources that may be accessed using pointers returned by cuGraphicsSubResourceGetMappedArray.Pointer to the returned object handle vdpSurface . Note: Note that this function may also return error codes from previous. asynchronous launches. cuGraphicsUnregisterResource. cuGraphicsMapResources. A handle to the registered object is returned as pCudaResource. The surface’s intended usage is specified using flags.2 CUresult cuGraphicsVDPAURegisterVideoSurface (CUgraphicsResource ∗ pCudaResource. cuGraphicsResourceSetMapFlags. as follows: • CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE: Specifies no hints about how this resource will be used.Pointer to the returned object handle vdpSurface . The exact number of valid arrayIndex values depends on the VDPAU surface format. CUDA_ERROR_INVALID_HANDLE.47. • CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource. unsigned int flags) Registers the VdpVideoSurface specified by vdpSurface for access by CUDA. VdpVideoSurface vdpSurface. mipLevel must be 0. cuGraphicsSubResourceGetMappedArray. The mapping is shown in the table below. CUDA_ERROR_INVALID_CONTEXT.The VdpOutputSurface to be registered flags . It is therefore assumed that this resource will be read from and written to by CUDA. cuVDPAUCtxCreate. cuGraphicsVDPAURegisterVideoSurface. • CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY: Specifies that CUDA will not write to this resource. 
VdpChromaType VDP_CHROMA_TYPE_420 arrayIndex 0 1 2 3 0 1 2 3 Size w x h/2 w x h/2 w/2 x h/4 w/2 x h/4 w x h/2 w x h/2 w/2 x h/2 w/2 x h/2 Format R8 R8 R8G8 R8G8 R8 R8 R8G8 R8G8 Content Top-field luma Bottom-field luma Top-field chroma Bottom-field chroma Top-field luma Bottom-field luma Top-field chroma Bottom-field chroma VDP_CHROMA_TYPE_422 Parameters: pCudaResource . cuGraphicsUnmapResources.Map flags Returns: Module Documentation CUDA_SUCCESS.

flags - Map flags

Returns:
CUDA_SUCCESS, CUDA_ERROR_INVALID_HANDLE, CUDA_ERROR_ALREADY_MAPPED, CUDA_ERROR_INVALID_CONTEXT

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuCtxCreate, cuVDPAUCtxCreate, cuGraphicsVDPAURegisterOutputSurface, cuGraphicsUnregisterResource, cuGraphicsResourceSetMapFlags, cuGraphicsMapResources, cuGraphicsUnmapResources, cuGraphicsSubResourceGetMappedArray, cuVDPAUGetDevice

4.47.2.3 CUresult cuVDPAUCtxCreate (CUcontext ∗ pCtx, unsigned int flags, CUdevice device, VdpDevice vdpDevice, VdpGetProcAddress ∗ vdpGetProcAddress)

Creates a new CUDA context, initializes VDPAU interoperability, and associates the CUDA context with the calling thread. It must be called before performing any other VDPAU interoperability operations. It may fail if the needed VDPAU driver facilities are not available. For usage of the flags parameter, see cuCtxCreate().

Parameters:
pCtx - Returned CUDA context
flags - Options for CUDA context creation
device - Device on which to create the context
vdpDevice - The VdpDevice to interop with
vdpGetProcAddress - VDPAU’s VdpGetProcAddress function pointer

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE, CUDA_ERROR_OUT_OF_MEMORY

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuCtxCreate, cuGraphicsVDPAURegisterVideoSurface, cuGraphicsVDPAURegisterOutputSurface, cuGraphicsUnregisterResource, cuGraphicsResourceSetMapFlags, cuGraphicsMapResources, cuGraphicsUnmapResources, cuGraphicsSubResourceGetMappedArray, cuVDPAUGetDevice

4.47.2.4 CUresult cuVDPAUGetDevice (CUdevice ∗ pDevice, VdpDevice vdpDevice, VdpGetProcAddress ∗ vdpGetProcAddress)

Returns in ∗pDevice the CUDA device associated with a vdpDevice, if applicable.

Parameters:
pDevice - Device associated with vdpDevice
vdpDevice - A VdpDevice handle
vdpGetProcAddress - VDPAU’s VdpGetProcAddress function pointer

Returns:
CUDA_SUCCESS, CUDA_ERROR_DEINITIALIZED, CUDA_ERROR_NOT_INITIALIZED, CUDA_ERROR_INVALID_CONTEXT, CUDA_ERROR_INVALID_VALUE

Note:
Note that this function may also return error codes from previous, asynchronous launches.

See also:
cuCtxCreate, cuVDPAUCtxCreate, cuGraphicsVDPAURegisterVideoSurface, cuGraphicsVDPAURegisterOutputSurface, cuGraphicsUnregisterResource, cuGraphicsResourceSetMapFlags, cuGraphicsMapResources, cuGraphicsUnmapResources, cuGraphicsSubResourceGetMappedArray
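The VdpChromaType table given for cuGraphicsVDPAURegisterVideoSurface maps each arrayIndex to a plane size derived from the surface width w and height h. The following is a minimal sketch of that mapping in plain C. The enum, struct, and function names are hypothetical stand-ins (the real VdpChromaType constants come from the VDPAU headers), and the sketch assumes w and h divide evenly so that integer division matches the table.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two chroma types listed in the table. */
typedef enum { SKETCH_CHROMA_420, SKETCH_CHROMA_422 } sketch_chroma_type;

typedef struct {
    size_t width;
    size_t height;
} sketch_plane_size;

/* Fills *out with the size of subresource array_index for a w x h surface.
 * Returns 0 on success, -1 if array_index is outside 0..3 or out is NULL. */
static int sketch_plane_dims(sketch_chroma_type type, unsigned array_index,
                             size_t w, size_t h, sketch_plane_size *out)
{
    if (array_index > 3 || out == NULL)
        return -1;
    if (array_index < 2) {
        /* Indices 0 and 1: top-field and bottom-field luma, w x h/2. */
        out->width = w;
        out->height = h / 2;
    } else {
        /* Indices 2 and 3: top-field and bottom-field chroma. */
        out->width = w / 2;
        out->height = (type == SKETCH_CHROMA_420) ? h / 4 : h / 2;
    }
    return 0;
}
```

For a 1920 x 1080 surface this gives 1920 x 540 luma fields for both chroma types, with chroma fields of 960 x 270 for 4:2:0 and 960 x 540 for 4:2:2.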

Chapter 5

Data Structure Documentation

5.1 CUDA_ARRAY3D_DESCRIPTOR_st Struct Reference

Data Fields
• size_t Depth
• unsigned int Flags
• CUarray_format Format
• size_t Height
• unsigned int NumChannels
• size_t Width

5.1.1 Detailed Description

3D array descriptor

5.1.2 Field Documentation
5.1.2.1 size_t CUDA_ARRAY3D_DESCRIPTOR_st::Depth

Depth of 3D array

5.1.2.2 unsigned int CUDA_ARRAY3D_DESCRIPTOR_st::Flags

Flags

5.1.2.3 CUarray_format CUDA_ARRAY3D_DESCRIPTOR_st::Format

Array format

5.1.2.4 size_t CUDA_ARRAY3D_DESCRIPTOR_st::Height

Height of 3D array

5.1.2.5 unsigned int CUDA_ARRAY3D_DESCRIPTOR_st::NumChannels

Channels per array element

5.1.2.6 size_t CUDA_ARRAY3D_DESCRIPTOR_st::Width

Width of 3D array

5.2 CUDA_ARRAY_DESCRIPTOR_st Struct Reference

Data Fields
• CUarray_format Format
• size_t Height
• unsigned int NumChannels
• size_t Width

5.2.1 Detailed Description

Array descriptor

5.2.2 Field Documentation
5.2.2.1 CUarray_format CUDA_ARRAY_DESCRIPTOR_st::Format

Array format

5.2.2.2 size_t CUDA_ARRAY_DESCRIPTOR_st::Height

Height of array

5.2.2.3 unsigned int CUDA_ARRAY_DESCRIPTOR_st::NumChannels

Channels per array element

5.2.2.4 size_t CUDA_ARRAY_DESCRIPTOR_st::Width

Width of array

5.3 CUDA_MEMCPY2D_st Struct Reference

Data Fields
• CUarray dstArray
• CUdeviceptr dstDevice
• void ∗ dstHost
• CUmemorytype dstMemoryType
• size_t dstPitch
• size_t dstXInBytes
• size_t dstY
• size_t Height
• CUarray srcArray
• CUdeviceptr srcDevice
• const void ∗ srcHost
• CUmemorytype srcMemoryType
• size_t srcPitch
• size_t srcXInBytes
• size_t srcY
• size_t WidthInBytes

5.3.1 Detailed Description

2D memory copy parameters

5.3.2 Field Documentation
5.3.2.1 CUarray CUDA_MEMCPY2D_st::dstArray

Destination array reference

5.3.2.2 CUdeviceptr CUDA_MEMCPY2D_st::dstDevice

Destination device pointer

5.3.2.3 void∗ CUDA_MEMCPY2D_st::dstHost

Destination host pointer

5.3.2.4 CUmemorytype CUDA_MEMCPY2D_st::dstMemoryType

Destination memory type (host, device, array)

5.3.2.5 size_t CUDA_MEMCPY2D_st::dstPitch

Destination pitch (ignored when dst is array)

5.3.2.6 size_t CUDA_MEMCPY2D_st::dstXInBytes

Destination X in bytes

5.3.2.7 size_t CUDA_MEMCPY2D_st::dstY

Destination Y

5.3.2.8 size_t CUDA_MEMCPY2D_st::Height

Height of 2D memory copy

5.3.2.9 CUarray CUDA_MEMCPY2D_st::srcArray

Source array reference

5.3.2.10 CUdeviceptr CUDA_MEMCPY2D_st::srcDevice

Source device pointer

5.3.2.11 const void∗ CUDA_MEMCPY2D_st::srcHost

Source host pointer

5.3.2.12 CUmemorytype CUDA_MEMCPY2D_st::srcMemoryType

Source memory type (host, device, array)

5.3.2.13 size_t CUDA_MEMCPY2D_st::srcPitch

Source pitch (ignored when src is array)

5.3.2.14 size_t CUDA_MEMCPY2D_st::srcXInBytes

Source X in bytes

5.3.2.15 size_t CUDA_MEMCPY2D_st::srcY

Source Y

5.3.2.16 size_t CUDA_MEMCPY2D_st::WidthInBytes

Width of 2D memory copy in bytes
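To illustrate how the pitch, X-in-bytes, and Y fields of CUDA_MEMCPY2D_st combine, the following sketch emulates a host-to-host 2D copy in plain C. It assumes the usual pitched-memory convention, in which the start address is the base pointer plus Y times the pitch plus the X offset in bytes; that formula is an assumption here, not something stated in this struct reference. The struct sketch_memcpy2d is a trimmed, hypothetical mirror that carries only the host-pointer fields, and sketch_copy2d_host is not a CUDA function.

```c
#include <stddef.h>
#include <string.h>

/* Trimmed, hypothetical mirror of the host-pointer fields of
 * CUDA_MEMCPY2D_st; array and device fields are omitted. */
typedef struct {
    const void *srcHost;
    size_t srcPitch, srcXInBytes, srcY;
    void *dstHost;
    size_t dstPitch, dstXInBytes, dstY;
    size_t WidthInBytes, Height;
} sketch_memcpy2d;

/* Copies Height rows of WidthInBytes bytes each, stepping each side by
 * its own pitch. Assumes both regions lie inside their allocations. */
static void sketch_copy2d_host(const sketch_memcpy2d *p)
{
    const char *src = (const char *)p->srcHost
                      + p->srcY * p->srcPitch + p->srcXInBytes;
    char *dst = (char *)p->dstHost
                + p->dstY * p->dstPitch + p->dstXInBytes;
    for (size_t row = 0; row < p->Height; ++row)
        memcpy(dst + row * p->dstPitch, src + row * p->srcPitch,
               p->WidthInBytes);
}
```

The two pitches are independent, which is why the source and destination can have different row strides while sharing a single WidthInBytes and Height.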

5.4 CUDA_MEMCPY3D_st Struct Reference

Data Fields
• size_t Depth
• CUarray dstArray
• CUdeviceptr dstDevice
• size_t dstHeight
• void ∗ dstHost
• size_t dstLOD
• CUmemorytype dstMemoryType
• size_t dstPitch
• size_t dstXInBytes
• size_t dstY
• size_t dstZ
• size_t Height
• void ∗ reserved0
• void ∗ reserved1
• CUarray srcArray
• CUdeviceptr srcDevice
• size_t srcHeight
• const void ∗ srcHost
• size_t srcLOD
• CUmemorytype srcMemoryType
• size_t srcPitch
• size_t srcXInBytes
• size_t srcY
• size_t srcZ
• size_t WidthInBytes

5.4.1 Detailed Description

3D memory copy parameters

5.4.2 Field Documentation
5.4.2.1 size_t CUDA_MEMCPY3D_st::Depth

Depth of 3D memory copy

5.4.2.2 CUarray CUDA_MEMCPY3D_st::dstArray

Destination array reference

5.4.2.3 CUdeviceptr CUDA_MEMCPY3D_st::dstDevice

Destination device pointer

5.4.2.4 size_t CUDA_MEMCPY3D_st::dstHeight

Destination height (ignored when dst is array; may be 0 if Depth==1)

5.4.2.5 void∗ CUDA_MEMCPY3D_st::dstHost

Destination host pointer

5.4.2.6 size_t CUDA_MEMCPY3D_st::dstLOD

Destination LOD

5.4.2.7 CUmemorytype CUDA_MEMCPY3D_st::dstMemoryType

Destination memory type (host, device, array)

5.4.2.8 size_t CUDA_MEMCPY3D_st::dstPitch

Destination pitch (ignored when dst is array)

5.4.2.9 size_t CUDA_MEMCPY3D_st::dstXInBytes

Destination X in bytes

5.4.2.10 size_t CUDA_MEMCPY3D_st::dstY

Destination Y

5.4.2.11 size_t CUDA_MEMCPY3D_st::dstZ

Destination Z

5.4.2.12 size_t CUDA_MEMCPY3D_st::Height

Height of 3D memory copy

5.4.2.13 void∗ CUDA_MEMCPY3D_st::reserved0

Must be NULL

5.4.2.14 void∗ CUDA_MEMCPY3D_st::reserved1

Must be NULL

5.4.2.15 CUarray CUDA_MEMCPY3D_st::srcArray

Source array reference

5.4.2.16 CUdeviceptr CUDA_MEMCPY3D_st::srcDevice

Source device pointer

5.4.2.17 size_t CUDA_MEMCPY3D_st::srcHeight

Source height (ignored when src is array; may be 0 if Depth==1)

5.4.2.18 const void∗ CUDA_MEMCPY3D_st::srcHost

Source host pointer

5.4.2.19 size_t CUDA_MEMCPY3D_st::srcLOD

Source LOD

5.4.2.20 CUmemorytype CUDA_MEMCPY3D_st::srcMemoryType

Source memory type (host, device, array)

5.4.2.21 size_t CUDA_MEMCPY3D_st::srcPitch

Source pitch (ignored when src is array)

5.4.2.22 size_t CUDA_MEMCPY3D_st::srcXInBytes

Source X in bytes

5.4.2.23 size_t CUDA_MEMCPY3D_st::srcY

Source Y

5.4.2.24 size_t CUDA_MEMCPY3D_st::srcZ

Source Z

5.4.2.25 size_t CUDA_MEMCPY3D_st::WidthInBytes

Width of 3D memory copy in bytes

5.5 cudaChannelFormatDesc Struct Reference

Data Fields
• enum cudaChannelFormatKind f
• int w
• int x
• int y
• int z

5.5.1 Detailed Description

CUDA Channel format descriptor

5.5.2 Field Documentation
5.5.2.1 enum cudaChannelFormatKind cudaChannelFormatDesc::f

Channel format kind

5.5.2.2 int cudaChannelFormatDesc::w

w

5.5.2.3 int cudaChannelFormatDesc::x

x

5.5.2.4 int cudaChannelFormatDesc::y

y

5.5.2.5 int cudaChannelFormatDesc::z

z

5.6 cudaDeviceProp Struct Reference

Data Fields
• int canMapHostMemory
• int clockRate
• int computeMode
• int concurrentKernels
• int deviceOverlap
• int ECCEnabled
• int integrated
• int kernelExecTimeoutEnabled
• int major
• int maxGridSize [3]
• int maxTexture1D
• int maxTexture2D [2]
• int maxTexture2DArray [3]
• int maxTexture3D [3]
• int maxThreadsDim [3]
• int maxThreadsPerBlock
• size_t memPitch
• int minor
• int multiProcessorCount
• char name [256]
• int pciBusID
• int pciDeviceID
• int regsPerBlock
• size_t sharedMemPerBlock
• size_t surfaceAlignment
• int tccDriver
• size_t textureAlignment
• size_t totalConstMem
• size_t totalGlobalMem
• int warpSize

5.6.1 Detailed Description

CUDA device properties

5.6.2 Field Documentation
5.6.2.1 int cudaDeviceProp::canMapHostMemory

Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer

5.6.2.2 int cudaDeviceProp::clockRate

Clock frequency in kilohertz

5.6.2.3 int cudaDeviceProp::computeMode

Compute mode (See cudaComputeMode)

5.6.2.4 int cudaDeviceProp::concurrentKernels

Device can possibly execute multiple kernels concurrently

5.6.2.5 int cudaDeviceProp::deviceOverlap

Device can concurrently copy memory and execute a kernel

5.6.2.6 int cudaDeviceProp::ECCEnabled

Device has ECC support enabled

5.6.2.7 int cudaDeviceProp::integrated

Device is integrated as opposed to discrete

5.6.2.8 int cudaDeviceProp::kernelExecTimeoutEnabled

Specifies whether there is a run time limit on kernels

5.6.2.9 int cudaDeviceProp::major

Major compute capability

5.6.2.10 int cudaDeviceProp::maxGridSize[3]

Maximum size of each dimension of a grid

5.6.2.11 int cudaDeviceProp::maxTexture1D

Maximum 1D texture size

5.6.2.12 int cudaDeviceProp::maxTexture2D[2]

Maximum 2D texture dimensions

5.6.2.13 int cudaDeviceProp::maxTexture2DArray[3]

Maximum 2D texture array dimensions

5.6.2.14 int cudaDeviceProp::maxTexture3D[3]

Maximum 3D texture dimensions

5.6.2.15 int cudaDeviceProp::maxThreadsDim[3]

Maximum size of each dimension of a block

5.6.2.16 int cudaDeviceProp::maxThreadsPerBlock

Maximum number of threads per block

5.6.2.17 size_t cudaDeviceProp::memPitch

Maximum pitch in bytes allowed by memory copies

5.6.2.18 int cudaDeviceProp::minor

Minor compute capability

5.6.2.19 int cudaDeviceProp::multiProcessorCount

Number of multiprocessors on device

5.6.2.20 char cudaDeviceProp::name[256]

ASCII string identifying device

5.6.2.21 int cudaDeviceProp::pciBusID

PCI bus ID of the device

5.6.2.22 int cudaDeviceProp::pciDeviceID

PCI device ID of the device

5.6.2.23 int cudaDeviceProp::regsPerBlock

32-bit registers available per block

5.6.2.24 size_t cudaDeviceProp::sharedMemPerBlock

Shared memory available per block in bytes

5.6.2.25 size_t cudaDeviceProp::surfaceAlignment

Alignment requirements for surfaces

5.6.2.26 int cudaDeviceProp::tccDriver

1 if device is a Tesla device using TCC driver, 0 otherwise

5.6.2.27 size_t cudaDeviceProp::textureAlignment

Alignment requirement for textures

5.6.2.28 size_t cudaDeviceProp::totalConstMem

Constant memory available on device in bytes

5.6.2.29 size_t cudaDeviceProp::totalGlobalMem

Global memory available on device in bytes

5.6.2.30 int cudaDeviceProp::warpSize

Warp size in threads

5.7 cudaExtent Struct Reference

Data Fields
• size_t depth
• size_t height
• size_t width

5.7.1 Detailed Description

CUDA extent

See also:
make_cudaExtent

5.7.2 Field Documentation
5.7.2.1 size_t cudaExtent::depth

Depth in elements

5.7.2.2 size_t cudaExtent::height

Height in elements

5.7.2.3 size_t cudaExtent::width

Width in elements when referring to array memory, in bytes when referring to linear memory

5.8 cudaFuncAttributes Struct Reference

Data Fields
• int binaryVersion
• size_t constSizeBytes
• size_t localSizeBytes
• int maxThreadsPerBlock
• int numRegs
• int ptxVersion
• size_t sharedSizeBytes

5.8.1 Detailed Description

CUDA function attributes

5.8.2 Field Documentation
5.8.2.1 int cudaFuncAttributes::binaryVersion

The binary architecture version for which the function was compiled. This value is the major binary version ∗ 10 + the minor binary version, so a binary version 1.3 function would return the value 13.

5.8.2.2 size_t cudaFuncAttributes::constSizeBytes

The size in bytes of user-allocated constant memory required by this function.

5.8.2.3 size_t cudaFuncAttributes::localSizeBytes

The size in bytes of local memory used by each thread of this function.

5.8.2.4 int cudaFuncAttributes::maxThreadsPerBlock

The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.

5.8.2.5 int cudaFuncAttributes::numRegs

The number of registers used by each thread of this function.

5.8.2.6 int cudaFuncAttributes::ptxVersion

The PTX virtual architecture version for which the function was compiled. This value is the major PTX version ∗ 10 + the minor PTX version, so a PTX version 1.3 function would return the value 13.

5.8.2.7 size_t cudaFuncAttributes::sharedSizeBytes

The size in bytes of statically-allocated shared memory per block required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.
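The binaryVersion and ptxVersion fields both pack a version as major ∗ 10 + minor. The following is a minimal sketch of encoding and decoding that value in plain C; the helper names are hypothetical and are not part of the CUDA runtime, and the decoding assumes the minor version is a single decimal digit, as the encoding implies.

```c
/* Hypothetical helpers for the "major * 10 + minor" packing used by
 * cudaFuncAttributes::binaryVersion and cudaFuncAttributes::ptxVersion. */
static int sketch_version_encode(int major, int minor)
{
    return major * 10 + minor;
}

static int sketch_version_major(int encoded)
{
    return encoded / 10;
}

static int sketch_version_minor(int encoded)
{
    return encoded % 10;
}
```

With these helpers, version 1.3 encodes to 13, and 13 decodes back to major 1 and minor 3, matching the example in the field descriptions.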

5.9 cudaMemcpy3DParms Struct Reference

Data Fields
• struct cudaArray ∗ dstArray
• struct cudaPos dstPos
• struct cudaPitchedPtr dstPtr
• struct cudaExtent extent
• enum cudaMemcpyKind kind
• struct cudaArray ∗ srcArray
• struct cudaPos srcPos
• struct cudaPitchedPtr srcPtr

5.9.1 Detailed Description

CUDA 3D memory copying parameters

5.9.2 Field Documentation
5.9.2.1 struct cudaArray∗ cudaMemcpy3DParms::dstArray [read]

Destination memory address

5.9.2.2 struct cudaPos cudaMemcpy3DParms::dstPos [read]

Destination position offset

5.9.2.3 struct cudaPitchedPtr cudaMemcpy3DParms::dstPtr [read]

Pitched destination memory address

5.9.2.4 struct cudaExtent cudaMemcpy3DParms::extent [read]

Requested memory copy size

5.9.2.5 enum cudaMemcpyKind cudaMemcpy3DParms::kind

Type of transfer

5.9.2.6 struct cudaArray∗ cudaMemcpy3DParms::srcArray [read]

Source memory address

5.9.2.7 struct cudaPos cudaMemcpy3DParms::srcPos [read]

Source position offset

5.9.2.8 struct cudaPitchedPtr cudaMemcpy3DParms::srcPtr [read]

Pitched source memory address

5.10 cudaPitchedPtr Struct Reference

Data Fields
• size_t pitch
• void ∗ ptr
• size_t xsize
• size_t ysize

5.10.1 Detailed Description

CUDA Pitched memory pointer

See also:
make_cudaPitchedPtr

5.10.2 Field Documentation
5.10.2.1 size_t cudaPitchedPtr::pitch

Pitch of allocated memory in bytes

5.10.2.2 void∗ cudaPitchedPtr::ptr

Pointer to allocated memory

5.10.2.3 size_t cudaPitchedPtr::xsize

Logical width of allocation in elements

5.10.2.4 size_t cudaPitchedPtr::ysize

Logical height of allocation in elements
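The cudaPitchedPtr fields are typically combined to locate an element inside a pitched 3D allocation. The following sketch shows one common addressing scheme in plain C, under the assumption that rows are pitch bytes apart and slices are pitch ∗ ysize bytes apart; this layout is an assumption made for illustration and is not stated in the field descriptions here. The struct sketch_pitched_ptr is a hypothetical mirror of cudaPitchedPtr, and the buffer in the usage is ordinary host memory.

```c
#include <stddef.h>

/* Hypothetical mirror of cudaPitchedPtr for host-side illustration. */
typedef struct {
    void *ptr;
    size_t pitch;  /* bytes between the starts of consecutive rows */
    size_t xsize;
    size_t ysize;
} sketch_pitched_ptr;

/* Returns the address of element (x, y, z), where each element occupies
 * elem_size bytes. Assumes the indices are within the allocation. */
static void *sketch_element_addr(sketch_pitched_ptr p, size_t x, size_t y,
                                 size_t z, size_t elem_size)
{
    char *slice = (char *)p.ptr + z * p.pitch * p.ysize;
    char *row = slice + y * p.pitch;
    return row + x * elem_size;
}
```

The pitch, rather than xsize times the element size, is used as the row stride because an allocator may pad each row for alignment.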

5.11 cudaPos Struct Reference

Data Fields
• size_t x
• size_t y
• size_t z

5.11.1 Detailed Description

CUDA 3D position

See also:
make_cudaPos

5.11.2 Field Documentation
5.11.2.1 size_t cudaPos::x

x

5.11.2.2 size_t cudaPos::y

y

5.11.2.3 size_t cudaPos::z

z

5.12 CUdevprop_st Struct Reference

Data Fields
• int clockRate
• int maxGridSize [3]
• int maxThreadsDim [3]
• int maxThreadsPerBlock
• int memPitch
• int regsPerBlock
• int sharedMemPerBlock
• int SIMDWidth
• int textureAlign
• int totalConstantMemory

5.12.1 Detailed Description

Legacy device properties

5.12.2 Field Documentation
5.12.2.1 int CUdevprop_st::clockRate

Clock frequency in kilohertz

5.12.2.2 int CUdevprop_st::maxGridSize[3]

Maximum size of each dimension of a grid

5.12.2.3 int CUdevprop_st::maxThreadsDim[3]

Maximum size of each dimension of a block

5.12.2.4 int CUdevprop_st::maxThreadsPerBlock

Maximum number of threads per block

5.12.2.5 int CUdevprop_st::memPitch

Maximum pitch in bytes allowed by memory copies

5.12.2.6 int CUdevprop_st::regsPerBlock

32-bit registers available per block

5.12.2.7 int CUdevprop_st::sharedMemPerBlock

Shared memory available per block in bytes

5.12.2.8 int CUdevprop_st::SIMDWidth

Warp size in threads

5.12.2.9 int CUdevprop_st::textureAlign

Alignment requirement for textures

5.12.2.10 int CUdevprop_st::totalConstantMemory

Constant memory available on device in bytes

5.13 surfaceReference Struct Reference

Data Fields
• struct cudaChannelFormatDesc channelDesc

5.13.1 Detailed Description

CUDA Surface reference

5.13.2 Field Documentation
5.13.2.1 struct cudaChannelFormatDesc surfaceReference::channelDesc [read]

Channel descriptor for surface reference

5.14 textureReference Struct Reference

Data Fields
• enum cudaTextureAddressMode addressMode [3]
• struct cudaChannelFormatDesc channelDesc
• enum cudaTextureFilterMode filterMode
• int normalized

5.14.1 Detailed Description

CUDA texture reference

5.14.2 Field Documentation
5.14.2.1 enum cudaTextureAddressMode textureReference::addressMode[3]

Texture address mode for up to 3 dimensions

5.14.2.2 struct cudaChannelFormatDesc textureReference::channelDesc [read]

Channel descriptor for the texture reference

5.14.2.3 enum cudaTextureFilterMode textureReference::filterMode

Texture filter mode

5.14.2.4 int textureReference::normalized

Indicates whether texture reads are normalized or not

Index
addressMode
    textureReference, 334
binaryVersion
    cudaFuncAttributes, 325
C++ API Routines, 99
canMapHostMemory
    cudaDeviceProp, 320
channelDesc
    surfaceReference, 333
    textureReference, 334
clockRate
    cudaDeviceProp, 320
    CUdevprop_st, 331
computeMode
    cudaDeviceProp, 320
concurrentKernels
    cudaDeviceProp, 321
constSizeBytes
    cudaFuncAttributes, 325
Context Management, 173
CU_AD_FORMAT_FLOAT
    CUDA_TYPES, 157
CU_AD_FORMAT_HALF
    CUDA_TYPES, 157
CU_AD_FORMAT_SIGNED_INT16
    CUDA_TYPES, 157
CU_AD_FORMAT_SIGNED_INT32
    CUDA_TYPES, 157
CU_AD_FORMAT_SIGNED_INT8
    CUDA_TYPES, 157
CU_AD_FORMAT_UNSIGNED_INT16
    CUDA_TYPES, 157
CU_AD_FORMAT_UNSIGNED_INT32
    CUDA_TYPES, 157
CU_AD_FORMAT_UNSIGNED_INT8
    CUDA_TYPES, 157
CU_COMPUTEMODE_DEFAULT
    CUDA_TYPES, 157
CU_COMPUTEMODE_EXCLUSIVE
    CUDA_TYPES, 157
CU_COMPUTEMODE_PROHIBITED
    CUDA_TYPES, 158
CU_CTX_BLOCKING_SYNC
    CUDA_TYPES, 158
CU_CTX_LMEM_RESIZE_TO_MAX
    CUDA_TYPES, 158
CU_CTX_MAP_HOST
    CUDA_TYPES, 158
CU_CTX_SCHED_AUTO
    CUDA_TYPES, 158
CU_CTX_SCHED_SPIN
    CUDA_TYPES, 158
CU_CTX_SCHED_YIELD
    CUDA_TYPES, 158
CU_CUBEMAP_FACE_NEGATIVE_X
    CUDA_TYPES, 157
CU_CUBEMAP_FACE_NEGATIVE_Y
    CUDA_TYPES, 157
CU_CUBEMAP_FACE_NEGATIVE_Z
    CUDA_TYPES, 157
CU_CUBEMAP_FACE_POSITIVE_X
    CUDA_TYPES, 157
CU_CUBEMAP_FACE_POSITIVE_Y
    CUDA_TYPES, 157
CU_CUBEMAP_FACE_POSITIVE_Z
    CUDA_TYPES, 157
CU_D3D10_DEVICE_LIST_ALL
    CUDA_D3D10, 287
CU_D3D10_DEVICE_LIST_CURRENT_FRAME
    CUDA_D3D10, 287
CU_D3D10_DEVICE_LIST_NEXT_FRAME
    CUDA_D3D10, 287
CU_D3D11_DEVICE_LIST_ALL
    CUDA_D3D11, 302
CU_D3D11_DEVICE_LIST_CURRENT_FRAME
    CUDA_D3D11, 302
CU_D3D11_DEVICE_LIST_NEXT_FRAME
    CUDA_D3D11, 302
CU_D3D9_DEVICE_LIST_ALL
    CUDA_D3D9, 272
CU_D3D9_DEVICE_LIST_CURRENT_FRAME
    CUDA_D3D9, 272
CU_D3D9_DEVICE_LIST_NEXT_FRAME
    CUDA_D3D9, 272
CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_CLOCK_RATE
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_COMPUTE_MODE
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_ECC_ENABLED
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_GPU_OVERLAP
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_INTEGRATED
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_PITCH
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_PCI_BUS_ID
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_TCC_DRIVER
    CUDA_TYPES, 161
CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY
    CUDA_TYPES, 160
CU_DEVICE_ATTRIBUTE_WARP_SIZE
    CUDA_TYPES, 160
CU_EVENT_BLOCKING_SYNC
    CUDA_TYPES, 161
CU_EVENT_DEFAULT
    CUDA_TYPES, 161
CU_EVENT_DISABLE_TIMING
    CUDA_TYPES, 161
CU_FUNC_ATTRIBUTE_BINARY_VERSION
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_NUM_REGS
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_PTX_VERSION
    CUDA_TYPES, 162
CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES
    CUDA_TYPES, 162
CU_FUNC_CACHE_PREFER_L1
    CUDA_TYPES, 161
CU_FUNC_CACHE_PREFER_NONE
    CUDA_TYPES, 161
CU_FUNC_CACHE_PREFER_SHARED
    CUDA_TYPES, 161
CU_JIT_ERROR_LOG_BUFFER
    CUDA_TYPES, 163
CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES
    CUDA_TYPES, 163
CU_JIT_FALLBACK_STRATEGY
    CUDA_TYPES, 163
CU_JIT_INFO_LOG_BUFFER
    CUDA_TYPES, 163
CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES
    CUDA_TYPES, 163
CU_JIT_MAX_REGISTERS
    CUDA_TYPES, 162
CU_JIT_OPTIMIZATION_LEVEL
    CUDA_TYPES, 163
CU_JIT_TARGET
    CUDA_TYPES, 163
CU_JIT_TARGET_FROM_CUCONTEXT
    CUDA_TYPES, 163
CU_JIT_THREADS_PER_BLOCK
    CUDA_TYPES, 162
CU_JIT_WALL_TIME
    CUDA_TYPES, 163
CU_LIMIT_MALLOC_HEAP_SIZE
    CUDA_TYPES, 164
CU_LIMIT_PRINTF_FIFO_SIZE
    CUDA_TYPES, 164
CU_LIMIT_STACK_SIZE
    CUDA_TYPES, 164
CU_MEMORYTYPE_ARRAY
    CUDA_TYPES, 164
CU_MEMORYTYPE_DEVICE
    CUDA_TYPES, 164
CU_MEMORYTYPE_HOST
    CUDA_TYPES, 164
CU_PREFER_BINARY
    CUDA_TYPES, 162
CU_PREFER_PTX
    CUDA_TYPES, 162
CU_TARGET_COMPUTE_10
    CUDA_TYPES, 163
CU_TARGET_COMPUTE_11
    CUDA_TYPES, 163
CU_TARGET_COMPUTE_12
    CUDA_TYPES, 163
CU_TARGET_COMPUTE_13
    CUDA_TYPES, 163
CU_TARGET_COMPUTE_20
    CUDA_TYPES, 163
CU_TARGET_COMPUTE_21
    CUDA_TYPES, 163
CU_TR_ADDRESS_MODE_BORDER
    CUDA_TYPES, 157
CU_TR_ADDRESS_MODE_CLAMP
    CUDA_TYPES, 157
CU_TR_ADDRESS_MODE_MIRROR
    CUDA_TYPES, 157
CU_TR_ADDRESS_MODE_WRAP
    CUDA_TYPES, 157
CU_TR_FILTER_MODE_LINEAR
    CUDA_TYPES, 161
CU_TR_FILTER_MODE_POINT
    CUDA_TYPES, 161
CU_MEMHOSTALLOC_DEVICEMAP
    CUDA_TYPES, 153
CU_MEMHOSTALLOC_PORTABLE
    CUDA_TYPES, 153
CU_MEMHOSTALLOC_WRITECOMBINED
    CUDA_TYPES, 153
CU_PARAM_TR_DEFAULT
    CUDA_TYPES, 153
CU_TRSA_OVERRIDE_FORMAT
    CUDA_TYPES, 153
CU_TRSF_NORMALIZED_COORDINATES
    CUDA_TYPES, 153
CU_TRSF_READ_AS_INTEGER
    CUDA_TYPES, 153
CU_TRSF_SRGB
    CUDA_TYPES, 153
CUaddress_mode
    CUDA_TYPES, 154
CUaddress_mode_enum
    CUDA_TYPES, 157
CUarray
    CUDA_TYPES, 154
cuArray3DCreate
    CUDA_MEM, 191
cuArray3DGetDescriptor
    CUDA_MEM, 192
CUarray_cubemap_face
    CUDA_TYPES, 154
CUarray_cubemap_face_enum
    CUDA_TYPES, 157
CUarray_format
    CUDA_TYPES, 154
CUarray_format_enum
    CUDA_TYPES, 157
cuArrayCreate
    CUDA_MEM, 193
cuArrayDestroy
    CUDA_MEM, 194
cuArrayGetDescriptor
    CUDA_MEM, 195
CUcomputemode
    CUDA_TYPES, 154
CUcomputemode_enum
    CUDA_TYPES, 157
CUcontext
    CUDA_TYPES, 154
CUctx_flags
    CUDA_TYPES, 154
CUctx_flags_enum
    CUDA_TYPES, 158
cuCtxAttach
    CUDA_CTX, 174
cuCtxCreate
    CUDA_CTX, 174
cuCtxDestroy
    CUDA_CTX, 175
cuCtxDetach
    CUDA_CTX, 176
cuCtxGetApiVersion
    CUDA_CTX, 176
cuCtxGetCacheConfig
    CUDA_CTX, 176
cuCtxGetDevice
    CUDA_CTX, 177
cuCtxGetLimit
    CUDA_CTX, 177
cuCtxPopCurrent
    CUDA_CTX, 178
cuCtxPushCurrent
    CUDA_CTX, 178
cuCtxSetCacheConfig
    CUDA_CTX, 179
cuCtxSetLimit
    CUDA_CTX, 180
cuCtxSynchronize
    CUDA_CTX, 180
cuD3D10CtxCreate
    CUDA_D3D10, 287
cuD3D10CtxCreateOnDevice
    CUDA_D3D10, 287
CUd3d10DeviceList
    CUDA_D3D10, 287
CUd3d10DeviceList_enum
    CUDA_D3D10, 287
cuD3D10GetDevice
    CUDA_D3D10, 288
cuD3D10GetDevices
    CUDA_D3D10, 288
cuD3D10GetDirect3DDevice
    CUDA_D3D10, 289
CUD3D10map_flags
    CUDA_D3D10_DEPRECATED, 293
CUD3D10map_flags_enum
    CUDA_D3D10_DEPRECATED, 293
cuD3D10MapResources
    CUDA_D3D10_DEPRECATED, 293
CUD3D10register_flags
    CUDA_D3D10_DEPRECATED, 293
CUD3D10register_flags_enum
    CUDA_D3D10_DEPRECATED, 293
cuD3D10RegisterResource
    CUDA_D3D10_DEPRECATED, 294
cuD3D10ResourceGetMappedArray
    CUDA_D3D10_DEPRECATED, 295
cuD3D10ResourceGetMappedPitch
    CUDA_D3D10_DEPRECATED, 296
cuD3D10ResourceGetMappedPointer
    CUDA_D3D10_DEPRECATED, 296
cuD3D10ResourceGetMappedSize
    CUDA_D3D10_DEPRECATED, 297
cuD3D10ResourceGetSurfaceDimensions
    CUDA_D3D10_DEPRECATED, 298
cuD3D10ResourceSetMapFlags
    CUDA_D3D10_DEPRECATED, 298
cuD3D10UnmapResources
    CUDA_D3D10_DEPRECATED, 299
cuD3D10UnregisterResource
    CUDA_D3D10_DEPRECATED, 300
cuD3D11CtxCreate
    CUDA_D3D11, 302
cuD3D11CtxCreateOnDevice
    CUDA_D3D11, 302
CUd3d11DeviceList
    CUDA_D3D11, 301
CUd3d11DeviceList_enum
    CUDA_D3D11, 302
cuD3D11GetDevice
    CUDA_D3D11, 303
cuD3D11GetDevices
    CUDA_D3D11, 303
cuD3D11GetDirect3DDevice
    CUDA_D3D11, 304
cuD3D9CtxCreate
    CUDA_D3D9, 272
cuD3D9CtxCreateOnDevice
    CUDA_D3D9, 272
CUd3d9DeviceList
    CUDA_D3D9, 272
CUd3d9DeviceList_enum
    CUDA_D3D9, 272
cuD3D9GetDevice
    CUDA_D3D9, 273
cuD3D9GetDevices
    CUDA_D3D9, 273
cuD3D9GetDirect3DDevice
    CUDA_D3D9, 274
CUd3d9map_flags
    CUDA_D3D9_DEPRECATED, 278
CUd3d9map_flags_enum
    CUDA_D3D9_DEPRECATED, 278
cuD3D9MapResources
    CUDA_D3D9_DEPRECATED, 278
CUd3d9register_flags
    CUDA_D3D9_DEPRECATED, 278
CUd3d9register_flags_enum
    CUDA_D3D9_DEPRECATED, 278
cuD3D9RegisterResource
    CUDA_D3D9_DEPRECATED, 279
cuD3D9ResourceGetMappedArray
    CUDA_D3D9_DEPRECATED, 280
cuD3D9ResourceGetMappedPitch
    CUDA_D3D9_DEPRECATED, 281
cuD3D9ResourceGetMappedPointer
    CUDA_D3D9_DEPRECATED, 282
cuD3D9ResourceGetMappedSize
    CUDA_D3D9_DEPRECATED, 282
cuD3D9ResourceGetSurfaceDimensions
    CUDA_D3D9_DEPRECATED, 283
cuD3D9ResourceSetMapFlags
    CUDA_D3D9_DEPRECATED, 284
cuD3D9UnmapResources
    CUDA_D3D9_DEPRECATED, 284
cuD3D9UnregisterResource
    CUDA_D3D9_DEPRECATED, 285
CUDA Driver API, 147
CUDA Runtime API, 9
CUDA_D3D10
    CU_D3D10_DEVICE_LIST_ALL, 287
    CU_D3D10_DEVICE_LIST_CURRENT_FRAME, 287
    CU_D3D10_DEVICE_LIST_NEXT_FRAME, 287
CUDA_D3D11
    CU_D3D11_DEVICE_LIST_ALL, 302
    CU_D3D11_DEVICE_LIST_CURRENT_FRAME, 302
    CU_D3D11_DEVICE_LIST_NEXT_FRAME, 302
CUDA_D3D9
    CU_D3D9_DEVICE_LIST_ALL, 272
    CU_D3D9_DEVICE_LIST_CURRENT_FRAME, 272
    CU_D3D9_DEVICE_LIST_NEXT_FRAME, 272
CUDA_ERROR_ALREADY_ACQUIRED
    CUDA_TYPES, 159
CUDA_ERROR_ALREADY_MAPPED
    CUDA_TYPES, 159
CUDA_ERROR_ARRAY_IS_MAPPED
    CUDA_TYPES, 159
CUDA_ERROR_CONTEXT_ALREADY_CURRENT
    CUDA_TYPES, 158
CUDA_ERROR_DEINITIALIZED
    CUDA_TYPES, 158
CUDA_ERROR_ECC_UNCORRECTABLE
    CUDA_TYPES, 159
CUDA_ERROR_FILE_NOT_FOUND
    CUDA_TYPES, 159
CUDA_ERROR_INVALID_CONTEXT
    CUDA_TYPES, 158
CUDA_ERROR_INVALID_DEVICE
    CUDA_TYPES, 158
CUDA_ERROR_INVALID_HANDLE
    CUDA_TYPES, 159
CUDA_ERROR_INVALID_IMAGE
    CUDA_TYPES, 158
CUDA_ERROR_INVALID_SOURCE
    CUDA_TYPES, 159
CUDA_ERROR_INVALID_VALUE
    CUDA_TYPES, 158
CUDA_ERROR_LAUNCH_FAILED
    CUDA_TYPES, 159
CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING
    CUDA_TYPES, 159
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES
    CUDA_TYPES, 159
CUDA_ERROR_LAUNCH_TIMEOUT
    CUDA_TYPES, 159
CUDA_ERROR_MAP_FAILED
    CUDA_TYPES, 159
CUDA_ERROR_NO_BINARY_FOR_GPU
    CUDA_TYPES, 159
CUDA_ERROR_NO_DEVICE
    CUDA_TYPES, 158
CUDA_ERROR_NOT_FOUND
    CUDA_TYPES, 159
CUDA_ERROR_NOT_INITIALIZED
    CUDA_TYPES, 158
CUDA_ERROR_NOT_MAPPED
    CUDA_TYPES, 159
CUDA_ERROR_NOT_MAPPED_AS_ARRAY
    CUDA_TYPES, 159
CUDA_ERROR_NOT_MAPPED_AS_POINTER
    CUDA_TYPES, 159
CUDA_ERROR_NOT_READY
    CUDA_TYPES, 159
CUDA_ERROR_OPERATING_SYSTEM
    CUDA_TYPES, 159
CUDA_ERROR_OUT_OF_MEMORY
    CUDA_TYPES, 158
CUDA_ERROR_SHARED_OBJECT_INIT_FAILED
    CUDA_TYPES, 159
CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND
    CUDA_TYPES, 159

159 CUDA_ERROR_UNSUPPORTED_LIMIT CUDA_TYPES. 157 CU_COMPUTEMODE_DEFAULT. 161 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_HEIGHT. 158 CU_CUBEMAP_FACE_NEGATIVE_X. 157 CU_COMPUTEMODE_PROHIBITED. 161 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_WIDTH. 161 CU_DEVICE_ATTRIBUTE_REGISTERS_PER_BLOCK. 160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_DEPTH. 160 CU_DEVICE_ATTRIBUTE_SURFACE_ALIGNMENT. 160 CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Z. 157 CU_AD_FORMAT_SIGNED_INT16. 160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE1D_WIDTH. 160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_HEIGHT. 161 CU_FUNC_ATTRIBUTE_BINARY_VERSION. 161 CU_EVENT_DISABLE_TIMING. 157 CU_CUBEMAP_FACE_NEGATIVE_Z. 160 CU_DEVICE_ATTRIBUTE_COMPUTE_MODE. 157 CU_AD_FORMAT_SIGNED_INT32. 160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE3D_WIDTH. 160 CU_DEVICE_ATTRIBUTE_SHARED_MEMORY_PER_BLOCK. 158 CU_CTX_LMEM_RESIZE_TO_MAX. 160 CU_DEVICE_ATTRIBUTE_WARP_SIZE. 160 INDEX CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Y. 158 CU_CTX_SCHED_SPIN. 158 CU_CTX_BLOCKING_SYNC. 158 CU_CTX_MAP_HOST. 160 CU_DEVICE_ATTRIBUTE_MAX_PITCH. 157 CU_AD_FORMAT_UNSIGNED_INT16. 161 CU_DEVICE_ATTRIBUTE_PCI_DEVICE_ID. 161 CU_DEVICE_ATTRIBUTE_TEXTURE_ALIGNMENT. 157 CU_COMPUTEMODE_EXCLUSIVE. 159 CUDA_SUCCESS CUDA_TYPES. 160 CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT. 160 CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_Y. 157 CU_AD_FORMAT_UNSIGNED_INT32. 160 CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_BLOCK. 161 CU_DEVICE_ATTRIBUTE_TCC_DRIVER. 157 CU_CUBEMAP_FACE_POSITIVE_Z. 158 CU_CTX_SCHED_AUTO. 158 CU_CTX_SCHED_YIELD. 157 CU_CUBEMAP_FACE_POSITIVE_Y. 160 CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_Z. 160 CU_DEVICE_ATTRIBUTE_TOTAL_CONSTANT_MEMORY. 160 CU_DEVICE_ATTRIBUTE_INTEGRATED. Generated for NVIDIA CUDA Library by Doxygen . 160 CU_DEVICE_ATTRIBUTE_CLOCK_RATE. 160 CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT. 162 CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES. 160 CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS. 160 CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_BLOCK. 160 CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_BLOCK. 157 CU_AD_FORMAT_HALF. 
160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_HEIGHT. 160 CU_DEVICE_ATTRIBUTE_MAX_BLOCK_DIM_X. 162 CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES. 158 CUDA_TYPES CU_AD_FORMAT_FLOAT.340 CUDA_ERROR_UNKNOWN CUDA_TYPES. 160 CU_EVENT_BLOCKING_SYNC. 161 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES. 160 CU_DEVICE_ATTRIBUTE_MAXIMUM_TEXTURE2D_WIDTH. 161 CU_DEVICE_ATTRIBUTE_GPU_OVERLAP. 160 CUDA_ERROR_UNMAP_FAILED CUDA_TYPES. 160 CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X. 161 CU_EVENT_DEFAULT. 157 CU_CUBEMAP_FACE_POSITIVE_X. 157 CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY. 160 CU_DEVICE_ATTRIBUTE_PCI_BUS_ID. 161 CU_DEVICE_ATTRIBUTE_ECC_ENABLED. 157 CU_CUBEMAP_FACE_NEGATIVE_Y. 157 CU_AD_FORMAT_SIGNED_INT8. 157 CU_AD_FORMAT_UNSIGNED_INT8.

159 CUDA_ERROR_NOT_MAPPED_AS_POINTER. 163 CU_JIT_TARGET_FROM_CUCONTEXT. 164 CU_MEMORYTYPE_DEVICE. 158 CUDA_ARRAY3D_2DARRAY CUDA_TYPES. 163 CU_JIT_INFO_LOG_BUFFER. 159 CUDA_SUCCESS. 163 CU_TARGET_COMPUTE_21. 311 Height. 161 CU_JIT_ERROR_LOG_BUFFER. 158 CUDA_ERROR_NOT_FOUND. 157 CU_TR_FILTER_MODE_LINEAR. 158 CUDA_ERROR_DEINITIALIZED. 162 CU_JIT_OPTIMIZATION_LEVEL. 159 CUDA_ERROR_FILE_NOT_FOUND. 164 CU_MEMORYTYPE_HOST. 161 CU_FUNC_CACHE_PREFER_NONE. 153 CUDA_ARRAY_DESCRIPTOR CUDA_TYPES. 163 CU_JIT_MAX_REGISTERS. 312 CUDA_ARRAY3D_SURFACE_LDST CUDA_TYPES. 163 CU_TARGET_COMPUTE_20. 159 CUDA_ERROR_ARRAY_IS_MAPPED. 159 CUDA_ERROR_LAUNCH_TIMEOUT. 162 CU_FUNC_CACHE_PREFER_L1. 161 CU_TR_FILTER_MODE_POINT. 158 CUDA_ERROR_INVALID_HANDLE. 313 Height. 162 CU_PREFER_PTX. 163 CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES. 158 CUDA_ERROR_ECC_UNCORRECTABLE. 159 CUDA_ERROR_UNKNOWN. 164 CU_LIMIT_STACK_SIZE. 163 CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES. 159 CUDA_ERROR_OUT_OF_MEMORY. 311 Format. 163 CU_TR_ADDRESS_MODE_BORDER. 311 Width. 159 CUDA_ERROR_OPERATING_SYSTEM. 162 CU_JIT_WALL_TIME. 159 CUDA_ERROR_NO_BINARY_FOR_GPU. 157 CU_TR_ADDRESS_MODE_CLAMP. 162 CU_TARGET_COMPUTE_10. 311 Depth. 311 NumChannels. 159 CUDA_ERROR_INVALID_CONTEXT.INDEX 162 CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK. 154 CUDA_ARRAY3D_DESCRIPTOR_st. 160 CUDA_ERROR_UNMAP_FAILED. 163 CU_JIT_THREADS_PER_BLOCK. 159 Generated for NVIDIA CUDA Library by Doxygen 341 CUDA_ERROR_INVALID_IMAGE. 157 CU_TR_ADDRESS_MODE_WRAP. 174 cuCtxCreate. 158 CUDA_ERROR_INVALID_DEVICE. 158 CUDA_ERROR_NOT_MAPPED. 164 CU_MEMORYTYPE_ARRAY. 162 CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES. 159 CUDA_ERROR_INVALID_VALUE. 161 CUDA_ERROR_ALREADY_ACQUIRED. 159 CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND. 153 CUDA_ARRAY3D_DESCRIPTOR CUDA_TYPES. 174 cuCtxDestroy. 158 CUDA_ERROR_INVALID_SOURCE. 313 CUDA_CTX cuCtxAttach. 163 CU_TARGET_COMPUTE_11. 162 CU_FUNC_ATTRIBUTE_NUM_REGS. 159 CUDA_ERROR_ALREADY_MAPPED. 313 Width. 163 CU_LIMIT_MALLOC_HEAP_SIZE. 313 Format. 164 CU_PREFER_BINARY. 
154 CUDA_ARRAY_DESCRIPTOR_st. 157 CU_TR_ADDRESS_MODE_MIRROR. 159 CUDA_ERROR_NO_DEVICE. 161 CU_FUNC_CACHE_PREFER_SHARED. 159 CUDA_ERROR_UNSUPPORTED_LIMIT. 313 NumChannels. 311 Flags. 175 . 159 CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES. 159 CUDA_ERROR_NOT_MAPPED_AS_ARRAY. 163 CU_TARGET_COMPUTE_13. 162 CU_FUNC_ATTRIBUTE_PTX_VERSION. 159 CUDA_ERROR_NOT_READY. 159 CUDA_ERROR_CONTEXT_ALREADY_CURRENT. 164 CU_LIMIT_PRINTF_FIFO_SIZE. 159 CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING. 159 CUDA_ERROR_NOT_INITIALIZED. 158 CUDA_ERROR_SHARED_OBJECT_INIT_FAILED. 163 CU_TARGET_COMPUTE_12. 158 CUDA_ERROR_LAUNCH_FAILED. 163 CU_JIT_TARGET. 163 CU_JIT_FALLBACK_STRATEGY. 159 CUDA_ERROR_MAP_FAILED.

287 CUd3d10DeviceList_enum. 239 cuFuncSetSharedSize. 273 cuD3D9GetDirect3DDevice. 302 cuD3D11GetDevice. 295 cuD3D10ResourceGetMappedPitch. 180 CUDA_D3D10 cuD3D10CtxCreate. 267 Generated for NVIDIA CUDA Library by Doxygen . 178 cuCtxSetCacheConfig. 273 cuD3D9GetDevices. 242 cuParamSetSize. 288 cuD3D10GetDevices. 266 cuGLMapBufferObject. 263 cuGraphicsGLRegisterImage. 244 CUDA_GL cuGLCtxCreate. 279 cuD3D9ResourceGetMappedArray. 263 cuWGLGetDevice. 168 cuDeviceGetAttribute. 235 cuEventSynchronize. 240 cuLaunchGrid. 299 cuD3D10UnregisterResource. 293 cuD3D10MapResources. 296 cuD3D10ResourceGetMappedSize. 278 cuD3D9MapResources. 168 cuDeviceGetCount. 267 cuGLRegisterBufferObject. 274 CUDA_D3D9_DEPRECATED CUd3d9map_flags. 272 CUd3d9DeviceList_enum. 284 cuD3D9UnmapResources. 171 CUDA_EVENT cuEventCreate. 304 cuGraphicsD3D11RegisterResource. 238 cuFuncSetBlockShape. 294 cuD3D10ResourceGetMappedArray. 176 cuCtxGetCacheConfig. 298 cuD3D10ResourceSetMapFlags. 240 cuLaunchGridAsync. 300 CUDA_D3D11 cuD3D11CtxCreate. 304 CUDA_D3D9 cuD3D9CtxCreate. 278 cuD3D9RegisterResource. 234 cuEventQuery. 170 cuDeviceGetProperties. 278 CUd3d9register_flags_enum. 265 CUGLmap_flags_enum. 285 CUDA_DEVICE cuDeviceComputeCapability. 243 CUDA_EXEC_DEPRECATED cuParamSetTexRef. 289 CUDA_D3D10_DEPRECATED CUD3D10map_flags. 298 cuD3D10UnmapResources. 278 CUd3d9register_flags. 287 cuD3D10CtxCreateOnDevice. 235 cuEventRecord. 272 CUd3d9DeviceList. 293 CUD3D10map_flags_enum. 284 cuD3D9UnregisterResource. 233 cuEventDestroy.342 cuCtxDetach. 278 INDEX CUd3d9map_flags_enum. 234 cuEventElapsedTime. 281 cuD3D9ResourceGetMappedPointer. 176 cuCtxGetDevice. 282 cuD3D9ResourceGetMappedSize. 293 CUD3D10register_flags. 170 cuDeviceGetName. 283 cuD3D9ResourceSetMapFlags. 241 cuParamSetf. 177 cuCtxPopCurrent. 240 cuLaunch. 242 cuParamSetv. 170 cuDeviceTotalMem. 176 cuCtxGetApiVersion. 282 cuD3D9ResourceGetSurfaceDimensions. 303 cuD3D11GetDirect3DDevice. 241 cuParamSeti. 293 cuD3D10RegisterResource. 
301 CUd3d11DeviceList_enum. 167 cuDeviceGet. 287 CUd3d10DeviceList. 262 cuGraphicsGLRegisterBuffer. 288 cuD3D10GetDirect3DDevice. 287 cuD3D10GetDevice. 264 CUDA_GL_DEPRECATED cuGLInit. 177 cuCtxGetLimit. 280 cuD3D9ResourceGetMappedPitch. 272 cuD3D9CtxCreateOnDevice. 302 CUd3d11DeviceList. 266 cuGLMapBufferObjectAsync. 274 cuGraphicsD3D9RegisterResource. 236 CUDA_EXEC cuFuncGetAttribute. 293 CUD3D10register_flags_enum. 302 cuD3D11CtxCreateOnDevice. 266 CUGLmap_flags. 303 cuD3D11GetDevices. 238 cuFuncSetCacheConfig. 297 cuD3D10ResourceGetSurfaceDimensions. 180 cuCtxSynchronize. 179 cuCtxSetLimit. 272 cuD3D9GetDevice. 296 cuD3D10ResourceGetMappedPointer. 178 cuCtxPushCurrent. 289 cuGraphicsD3D10RegisterResource.

314 dstHost. 217 cuMemGetAddressRange. 317 Height. 195 cuMemAllocHost. 258 cuGraphicsSubResourceGetMappedArray. 228 cuMemsetD8Async. 193 cuArrayDestroy. 224 cuMemsetD2D32Async. 315 srcXInBytes. 221 cuMemsetD16. 195 cuMemAlloc. 317 dstLOD.INDEX cuGLSetBufferObjectMapFlags. 225 Generated for NVIDIA CUDA Library by Doxygen 343 cuMemsetD2D8Async. 228 CUDA_MEMCPY2D CUDA_TYPES. 200 cuMemcpy2DUnaligned. 216 cuMemcpyHtoDAsync. 217 cuMemFreeHost. 222 cuMemsetD2D16. 202 cuMemcpy3D. 196 cuMemAllocPitch. 182 . 154 CUDA_MEMCPY3D_st. 269 CUDA_GRAPHICS cuGraphicsMapResources. 314 dstY. 210 cuMemcpyAtoHAsync. 317 reserved1. 197 cuMemcpy2D. 315 srcMemoryType. 191 cuArray3DGetDescriptor. 227 cuMemsetD8. 214 cuMemcpyHtoA. 315 CUDA_MEMCPY3D CUDA_TYPES. 209 cuMemcpyAtoH. 165 CUDA_MEM cuArray3DCreate. 220 cuMemHostGetFlags. 260 cuGraphicsUnregisterResource. 209 cuMemcpyAtoD. 198 cuMemcpy2DAsync. 226 cuMemsetD32Async. 192 cuArrayCreate. 221 cuMemsetD16Async. 260 CUDA_INITIALIZE cuInit. 314 dstMemoryType. 218 cuMemGetInfo. 316 Depth. 318 CUDA_MODULE cuModuleGetFunction. 206 cuMemcpyAtoA. 317 dstZ. 214 cuMemcpyHtoAAsync. 315 Height. 317 reserved0. 316 dstHeight. 315 srcHost. 314 dstPitch. 318 srcHost. 194 cuArrayGetDescriptor. 318 srcY. 204 cuMemcpy3DAsync. 258 cuGraphicsResourceSetMapFlags. 316 dstArray. 317 srcDevice. 268 cuGLUnmapBufferObject. 317 srcHeight. 268 cuGLUnmapBufferObjectAsync. 218 cuMemHostAlloc. 318 srcXInBytes. 315 WidthInBytes. 317 dstY. 154 CUDA_MEMCPY2D_st. 315 srcPitch. 317 dstMemoryType. 314 dstDevice. 216 cuMemFree. 226 cuMemsetD32. 317 dstXInBytes. 212 cuMemcpyDtoDAsync. 222 cuMemsetD2D16Async. 315 srcY. 219 cuMemHostGetDevicePointer. 317 srcArray. 215 cuMemcpyHtoD. 213 cuMemcpyDtoHAsync. 257 cuGraphicsResourceGetMappedPointer. 315 srcDevice. 269 cuGLUnregisterBufferObject. 318 srcPitch. 259 cuGraphicsUnmapResources. 223 cuMemsetD2D32. 318 srcMemoryType. 224 cuMemsetD2D8. 212 cuMemcpyDtoH. 316 dstHost. 317 dstPitch. 211 cuMemcpyDtoD. 314 dstXInBytes. 
211 cuMemcpyDtoA. 315 srcArray. 318 srcZ. 316 dstDevice. 318 srcLOD. 318 WidthInBytes. 314 dstArray.

246 cuTexRefGetArray. 246 cuTexRefGetFilterMode. 155 CUfilter_mode_enum. 154 CUctx_flags_enum. 157 CUarray_format. 156 CUDA_VDPAU cuGraphicsVDPAURegisterOutputSurface. 251 cuTexRefSetFormat. 153 CU_MEMHOSTALLOC_WRITECOMBINED. 162 CUgraphicsRegisterFlags. 307 cuGraphicsVDPAURegisterVideoSurface. 157 CUcontext. 156 CUsurfref. 154 CUarray_cubemap_face_enum. 156 CUjit_fallback_enum. 156 CUlimit_enum. 163 CUmemorytype. 153 CU_TRSF_NORMALIZED_COORDINATES. 154 CUDA_ARRAY3D_SURFACE_LDST. 185 cuModuleLoadDataEx. 162 CUjit_option. 157 CUcomputemode. 162 CUjit_target. 247 cuTexRefGetFlags. 154 CUarray_cubemap_face. 155 CUdevprop. 230 cuStreamDestroy. 161 CUgraphicsMapResourceFlags. 156 CUjit_target_enum. 154 INDEX CUctx_flags. 164 CUmodule. 187 cuModuleUnload. 232 CUDA_SURFREF cuSurfRefGetArray. 154 CUDA_MEMCPY2D. 153 cuDriverGetVersion. 156 CUmemorytype_enum. 158 CUDA_ARRAY3D_2DARRAY. 153 CU_MEMHOSTALLOC_PORTABLE. 155 CUgraphicsMapResourceFlags_enum. 183 cuModuleGetTexRef. 309 cuVDPAUGetDevice. 153 CU_PARAM_TR_DEFAULT. 156 CUjit_fallback. 156 CUstream. 250 cuTexRefSetFlags. 155 CUdevice_attribute. 155 CUevent. 160 CUdeviceptr. 163 CUlimit. 155 CUfunction_attribute. 156 CUtexref. 153 CU_TRSF_SRGB. 248 cuTexRefSetAddress2D. 247 cuTexRefGetFormat. 155 CUfunc_cache_enum. 255 CUDA_TEXREF cuTexRefGetAddress. 161 CUfunc_cache. 249 cuTexRefSetArray. 184 cuModuleLoad. 155 CUgraphicsRegisterFlags_enum. 154 CUaddress_mode_enum. 153 CUaddress_mode. 253 CUDA_TYPES CU_MEMHOSTALLOC_DEVICEMAP. 248 cuTexRefSetAddress. 153 CU_TRSA_OVERRIDE_FORMAT. 308 cuVDPAUCtxCreate.344 cuModuleGetGlobal. 231 cuStreamWaitEvent. 166 cudaAddressModeBorder Generated for NVIDIA CUDA Library by Doxygen . 251 CUDA_TEXREF_DEPRECATED cuTexRefCreate. 153 cudaError_enum. 155 CUfunction_attribute_enum. 153 CUDA_ARRAY3D_DESCRIPTOR. 156 CUjit_option_enum. 183 cuModuleGetSurfRef. 158 CUdevice. 155 CUevent_flags. 250 cuTexRefSetFilterMode. 156 CUresult. 155 CUdevice_attribute_enum. 162 CUgraphicsResource. 
231 cuStreamSynchronize. 154 CUarray_format_enum. 154 CUDA_VERSION. 153 CUDA_ARRAY_DESCRIPTOR. 249 cuTexRefSetAddressMode. 161 CUfunction. 157 CUarray. 253 cuTexRefDestroy. 230 cuStreamQuery. 309 CUDA_VERSION CUDA_TYPES. 154 CUcomputemode_enum. 255 cuSurfRefSetArray. 184 cuModuleLoadData. 154 CUDA_MEMCPY3D. 246 cuTexRefGetAddressMode. 185 cuModuleLoadFatBinary. 155 CUevent_flags_enum. 161 CUfilter_mode. 187 CUDA_STREAM cuStreamCreate. 153 CU_TRSF_READ_AS_INTEGER.

140 cudaChannelFormatKindNone CUDART_TYPES. 145 cudaArrayDefault CUDART_TYPES. 103 CUDART_TEXTURE. 140 cudaComputeModeDefault CUDART_TYPES. 124 cudaD3D10ResourceGetMappedPitch CUDART_D3D10_DEPRECATED. 100. 101. 76 cudaD3D10MapResources CUDART_D3D10_DEPRECATED. 127 cudaD3D10SetDirect3DDevice CUDART_D3D10. 77 cudaD3D10GetDirect3DDevice CUDART_D3D10. 319 y. 17 cudaComputeMode CUDART_TYPES. 319 z. 76 cudaD3D10GetDevices CUDART_D3D10. 76 cudaD3D10RegisterFlagsNone CUDART_D3D10. 123 cudaD3D10ResourceGetMappedArray CUDART_D3D10_DEPRECATED. 125 cudaD3D10ResourceGetMappedPointer CUDART_D3D10_DEPRECATED. 30 cudaCreateChannelDesc CUDART_HIGHLEVEL. 76 cudaD3D10DeviceListNextFrame CUDART_D3D10. 126 cudaD3D10ResourceGetMappedSize CUDART_D3D10_DEPRECATED. 127 cudaD3D10ResourceSetMapFlags CUDART_D3D10_DEPRECATED. 140 cudaChannelFormatKindFloat CUDART_TYPES. 101 CUDART_SURFACE. 104 CUDART_TEXTURE. 140 cudaConfigureCall CUDART_EXECUTION. 93 cudaD3D10DeviceList CUDART_D3D10. 138 cudaBindSurfaceToArray CUDART_HIGHLEVEL. 76 cudaD3D10RegisterResource CUDART_D3D10_DEPRECATED. 140 cudaChannelFormatKindSigned CUDART_TYPES. 76 cudaD3D10DeviceListCurrentFrame CUDART_D3D10. 128 . 145 cudaBoundaryModeTrap CUDART_TYPES. 140 cudaComputeModeProhibited Generated for NVIDIA CUDA Library by Doxygen 345 CUDART_TYPES. 76 cudaD3D10GetDevice CUDART_D3D10. 126 cudaD3D10ResourceGetSurfaceDimensions CUDART_D3D10_DEPRECATED. 138 cudaArraySurfaceLoadStore CUDART_TYPES. 91 cudaBindTexture2D CUDART_HIGHLEVEL. 76 cudaD3D10MapFlagsReadOnly CUDART_D3D10. 319 w. 76 cudaD3D10DeviceListAll CUDART_D3D10. 319 x. 76 cudaD3D10MapFlagsNone CUDART_D3D10. 76 cudaD3D10MapFlagsWriteDiscard CUDART_D3D10. 92 cudaBindTextureToArray CUDART_HIGHLEVEL.INDEX CUDART_TYPES. 102 CUDART_TEXTURE. 145 cudaAddressModeWrap CUDART_TYPES. 77 cudaD3D10UnmapResources CUDART_D3D10_DEPRECATED. 319 f. 122 cudaD3D10RegisterFlags CUDART_D3D10. 140 cudaComputeModeExclusive CUDART_TYPES. 145 cudaAddressModeMirror CUDART_TYPES. 105 CUDART_TEXTURE. 
102. 140 cudaChooseDevice CUDART_DEVICE. 145 cudaAddressModeClamp CUDART_TYPES. 140 cudaChannelFormatKindUnsigned CUDART_TYPES. 76 cudaD3D10RegisterFlagsArray CUDART_D3D10. 145 cudaBoundaryModeZero CUDART_TYPES. 319 cudaChannelFormatKind CUDART_TYPES. 96 cudaBindTexture CUDART_HIGHLEVEL. 93 cudaBoundaryModeClamp CUDART_TYPES. 145 cudaChannelFormatDesc. 77 cudaD3D10MapFlags CUDART_D3D10.

323 totalGlobalMem. 71 cudaD3D9DeviceListCurrentFrame CUDART_D3D9. 72 cudaD3D9MapFlags CUDART_D3D9. 323 cudaDevicePropDontCare INDEX Generated for NVIDIA CUDA Library by Doxygen . 322 totalConstMem. 322 surfaceAlignment. 322 pciBusID. 72 cudaD3D9UnmapResources CUDART_D3D9_DEPRECATED. 321 major. 71 cudaD3D9GetDevices CUDART_D3D9. 322 minor. 71 cudaD3D9DeviceListAll CUDART_D3D9. 320 canMapHostMemory. 320 computeMode.346 cudaD3D10UnregisterResource CUDART_D3D10_DEPRECATED. 321 maxThreadsDim. 321 deviceOverlap. 322 multiProcessorCount. 114 cudaD3D9RegisterFlags CUDART_D3D9. 81 cudaD3D11GetDevices CUDART_D3D11. 71 cudaD3D9GetDevice CUDART_D3D9. 80 cudaD3D11DeviceListCurrentFrame CUDART_D3D11. 81 cudaD3D11GetDirect3DDevice CUDART_D3D11. 80 cudaD3D11GetDevice CUDART_D3D11. 322 pciDeviceID. 115 cudaD3D9ResourceGetMappedArray CUDART_D3D9_DEPRECATED. 71 cudaD3D9MapFlagsReadOnly CUDART_D3D9. 321 integrated. 116 cudaD3D9ResourceGetMappedPitch CUDART_D3D9_DEPRECATED. 138 cudaDeviceMask CUDART_TYPES. 118 cudaD3D9ResourceGetSurfaceDimensions CUDART_D3D9_DEPRECATED. 321 maxThreadsPerBlock. 320 concurrentKernels. 138 cudaDeviceProp. 321 kernelExecTimeoutEnabled. 129 cudaD3D11DeviceList CUDART_D3D11. 121 cudaDeviceBlockingSync CUDART_TYPES. 120 cudaD3D9UnregisterResource CUDART_D3D9_DEPRECATED. 80 cudaD3D11DeviceListAll CUDART_D3D11. 80 cudaD3D11DeviceListNextFrame CUDART_D3D11. 120 cudaD3D9SetDirect3DDevice CUDART_D3D9. 138 cudaDeviceLmemResizeToMax CUDART_TYPES. 323 warpSize. 321 maxTexture1D. 71 cudaD3D9MapFlagsWriteDiscard CUDART_D3D9. 71 cudaD3D9DeviceListNextFrame CUDART_D3D9. 117 cudaD3D9ResourceGetMappedPointer CUDART_D3D9_DEPRECATED. 71 cudaD3D9MapFlagsNone CUDART_D3D9. 321 maxTexture3D. 322 tccDriver. 118 cudaD3D9ResourceGetMappedSize CUDART_D3D9_DEPRECATED. 322 regsPerBlock. 320 clockRate. 321 maxGridSize. 81 cudaD3D11SetDirect3DDevice CUDART_D3D11. 322 memPitch. 71 cudaD3D9RegisterFlagsArray CUDART_D3D9. 322 name. 72 cudaD3D9GetDirect3DDevice CUDART_D3D9. 
71 cudaD3D9RegisterFlagsNone CUDART_D3D9. 322 sharedMemPerBlock. 321 maxTexture2D. 321 ECCEnabled. 71 cudaD3D9RegisterResource CUDART_D3D9_DEPRECATED. 322 textureAlignment. 82 cudaD3D9DeviceList CUDART_D3D9. 119 cudaD3D9ResourceSetMapFlags CUDART_D3D9_DEPRECATED. 321 maxTexture2DArray. 138 cudaDeviceMapHost CUDART_TYPES. 71 cudaD3D9MapResources CUDART_D3D9_DEPRECATED.

143 cudaErrorSharedObjectInitFailed CUDART_TYPES. 143 cudaErrorMissingConfiguration CUDART_TYPES. 98 cudaError CUDART_TYPES. 141 cudaErrorInvalidDevicePointer CUDART_TYPES. 143 cudaErrorIncompatibleDriverContext CUDART_TYPES. 138 cudaDeviceScheduleAuto CUDART_TYPES. 141 cudaErrorSetOnActiveProcess CUDART_TYPES. 139 cudaDeviceScheduleSpin CUDART_TYPES. 140 cudaErrorAddressOfConstant CUDART_TYPES. 141 cudaErrorInvalidResourceHandle CUDART_TYPES. 143 cudaErrorSharedObjectSymbolNotFound CUDART_TYPES. 141 cudaErrorMapBufferObjectFailed CUDART_TYPES. 142 cudaErrorTextureFetchFailed . 143 cudaErrorECCUncorrectable CUDART_TYPES. 141 cudaErrorLaunchOutOfResources CUDART_TYPES. 143 cudaErrorInitializationError CUDART_TYPES. 142 cudaErrorInvalidNormSetting CUDART_TYPES. 143 cudaErrorInvalidSurface CUDART_TYPES. 143 cudaErrorInvalidMemcpyDirection Generated for NVIDIA CUDA Library by Doxygen 347 CUDART_TYPES. 141 cudaErrorInsufficientDriver CUDART_TYPES. 143 cudaErrorNoKernelImageForDevice CUDART_TYPES. 143 cudaErrorStartupFailure CUDART_TYPES. 143 cudaErrorDuplicateVariableName CUDART_TYPES. 143 cudaErrorNotReady CUDART_TYPES. 142 cudaErrorDevicesUnavailable CUDART_TYPES. 142 cudaErrorApiFailureBase CUDART_TYPES. 140 cudaError_enum CUDA_TYPES. 141 cudaErrorInvalidDeviceFunction CUDART_TYPES. 143 cudaErrorNotYetImplemented CUDART_TYPES. 142 cudaErrorInvalidPitchValue CUDART_TYPES. 141 cudaErrorMixedDeviceExecution CUDART_TYPES. 143 cudaErrorInvalidChannelDescriptor CUDART_TYPES. 141 cudaErrorLaunchFailure CUDART_TYPES. 139 cudaDeviceScheduleYield CUDART_TYPES. 143 cudaErrorInvalidSymbol CUDART_TYPES. 143 cudaErrorSynchronizationError CUDART_TYPES. 141 cudaErrorInvalidTexture CUDART_TYPES. 142 cudaErrorInvalidConfiguration CUDART_TYPES. 141 cudaErrorInvalidFilterSetting CUDART_TYPES. 139 cudaDriverGetVersion CUDART__VERSION. 142 cudaErrorInvalidTextureBinding CUDART_TYPES. 142 cudaErrorPriorLaunchFailure CUDART_TYPES.INDEX CUDART_TYPES. 141 cudaErrorLaunchTimeout CUDART_TYPES. 
141 cudaErrorMemoryAllocation CUDART_TYPES. 141 cudaErrorMemoryValueTooLarge CUDART_TYPES. 143 cudaErrorDuplicateTextureName CUDART_TYPES. 142 cudaErrorInvalidHostPointer CUDART_TYPES. 142 cudaErrorNoDevice CUDART_TYPES. 141 cudaErrorInvalidKernelImage CUDART_TYPES. 143 cudaErrorDuplicateSurfaceName CUDART_TYPES. 158 cudaError_t CUDART_TYPES. 142 cudaErrorInvalidValue CUDART_TYPES. 144 cudaErrorCudartUnloading CUDART_TYPES. 141 cudaErrorInvalidDevice CUDART_TYPES.

29 cudaExtent. 15 cudaGetLastError CUDART_ERROR. 325 binaryVersion. 28 cudaEventRecord CUDART_EVENT. 144 cudaFuncCachePreferNone CUDART_TYPES. 325 numRegs. 15 cudaGetSurfaceReference CUDART_SURFACE. 139 cudaEventCreate CUDART_EVENT. 27 cudaEventDisableTiming CUDART_TYPES. 107 cudaGetChannelDesc CUDART_TEXTURE. 37 cudaFreeArray CUDART_MEMORY. 325 sharedSizeBytes. 28 cudaEventSynchronize CUDART_EVENT. 130 cudaGLMapBufferObjectAsync CUDART_OPENGL_DEPRECATED. 108 CUDART_MEMORY. 38 cudaFuncAttributes. 27 cudaEventQuery CUDART_EVENT. 141 cudaErrorUnsupportedLimit CUDART_TYPES. 96 cudaGetSymbolAddress CUDART_HIGHLEVEL. 325 ptxVersion. 66 cudaGLMapFlagsWriteDiscard INDEX Generated for NVIDIA CUDA Library by Doxygen . 31 CUDART_HIGHLEVEL. 26 CUDART_HIGHLEVEL. 144 cudaFuncCachePreferL1 CUDART_TYPES. 324 width. 324 height.348 CUDART_TYPES. 144 cudaFuncGetAttributes CUDART_EXECUTION. 106 cudaFuncSetCacheConfig CUDART_EXECUTION. 140 cudaEventBlockingSync CUDART_TYPES. 144 cudaFuncCachePreferShared CUDART_TYPES. 107 CUDART_MEMORY. 94 cudaGetTextureReference CUDART_TEXTURE. 139 cudaEventElapsedTime CUDART_EVENT. 325 maxThreadsPerBlock. 139 cudaEventDestroy CUDART_EVENT. 142 cudaErrorUnmapBufferObjectFailed CUDART_TYPES. 325 constSizeBytes. 324 cudaFilterModeLinear CUDART_TYPES. 38 cudaGetTextureAlignmentOffset CUDART_HIGHLEVEL. 31 CUDART_HIGHLEVEL. 66 cudaGLMapFlagsReadOnly CUDART_OPENGL. 17 cudaGetDeviceCount CUDART_DEVICE. 145 cudaFree CUDART_MEMORY. 146 cudaFormatModeAuto CUDART_TYPES. 325 cudaFuncCache CUDART_TYPES. 108 CUDART_TEXTURE. 143 cudaEvent_t CUDART_TYPES. 325 localSizeBytes. 66 cudaGLMapFlagsNone CUDART_OPENGL. 142 cudaErrorUnknown CUDART_TYPES. 95 cudaGLMapBufferObject CUDART_OPENGL_DEPRECATED. 324 depth. 94 cudaGetDevice CUDART_DEVICE. 18 cudaGetDeviceProperties CUDART_DEVICE. 105 cudaEventCreateWithFlags CUDART_EVENT. 145 cudaFormatModeForced CUDART_TYPES. 26 cudaEventDefault CUDART_TYPES. 37 cudaFreeHost CUDART_MEMORY. 38 cudaGetSymbolSize CUDART_HIGHLEVEL. 

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA, the NVIDIA logo, GeForce, Tesla, and Quadro are trademarks or registered trademarks of NVIDIA Corporation. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright © 2007-2010 NVIDIA Corporation. All rights reserved.

NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050
www.nvidia.com