CUDA_C_Programming_Guide

Published by: Kevin Kim on Mar 02, 2011
Copyright: Attribution Non-commercial


02/24/2013


Any call to a __global__ function must specify the execution configuration for that
call. The execution configuration defines the dimension of the grid and blocks that
will be used to execute the function on the device, as well as the associated stream
(see Section 3.3.9.1 for a description of streams).
When using the driver API, the execution configuration is specified through a series
of driver function calls as detailed in Section 3.3.3.
When using the runtime API (Section 3.2), the execution configuration is specified
by inserting an expression of the form <<< Dg, Db, Ns, S >>> between the
function name and the parenthesized argument list, where:
Dg is of type dim3 (see Section B.3.2) and specifies the dimension and size of
the grid, such that Dg.x * Dg.y equals the number of blocks being launched;
Dg.z must be equal to 1;
Db is of type dim3 (see Section B.3.2) and specifies the dimension and size of
each block, such that Db.x * Db.y * Db.z equals the number of threads
per block;
Ns is of type size_t and specifies the number of bytes in shared memory that
is dynamically allocated per block for this call in addition to the statically
allocated memory; this dynamically allocated memory is used by any of the
variables declared as an external array as mentioned in Section B.2.3; Ns is an
optional argument which defaults to 0;
S is of type cudaStream_t and specifies the associated stream; S is an
optional argument which defaults to 0.
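
As an illustration of all four arguments together, the following is a minimal sketch rather than code from the guide. The kernel reverseTile and the host helper launchReverseTile are hypothetical names introduced here; the sketch assumes n is a multiple of the block size, and it uses an external shared array so that the Ns bytes requested at launch back the array tile.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block reverses its own tile of the input.
// The unsized extern array is backed by the Ns bytes given at launch.
__global__ void reverseTile(float* data)
{
    extern __shared__ float tile[];
    unsigned int base = blockIdx.x * blockDim.x;
    tile[threadIdx.x] = data[base + threadIdx.x];
    __syncthreads();  // the whole tile must be loaded before it is read back
    data[base + threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper. Assumes n is a multiple of threadsPerBlock.
// Reverses hostData in place, one tile per block, and returns the first
// error encountered (cudaSuccess if none).
cudaError_t launchReverseTile(float* hostData, unsigned int n,
                              unsigned int threadsPerBlock)
{
    size_t bytes = n * sizeof(float);
    float* devData = 0;
    cudaStream_t S;

    cudaError_t err = cudaMalloc((void**)&devData, bytes);
    if (err != cudaSuccess) return err;
    err = cudaStreamCreate(&S);
    if (err != cudaSuccess) { cudaFree(devData); return err; }

    dim3 Dg(n / threadsPerBlock, 1, 1);           // number of blocks
    dim3 Db(threadsPerBlock, 1, 1);               // threads per block
    size_t Ns = threadsPerBlock * sizeof(float);  // dynamic shared memory per block

    // Copy in, launch, and copy out are all issued to the same stream S,
    // so they execute in order.
    err = cudaMemcpyAsync(devData, hostData, bytes, cudaMemcpyHostToDevice, S);
    if (err == cudaSuccess) {
        reverseTile<<< Dg, Db, Ns, S >>>(devData);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(hostData, devData, bytes, cudaMemcpyDeviceToHost, S);
    if (err == cudaSuccess)
        err = cudaStreamSynchronize(S);

    cudaStreamDestroy(S);
    cudaFree(devData);
    return err;
}
```

With eight input values 0 through 7 and four threads per block, this sketch launches two blocks, each of which reverses its own four-element tile.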
As an example, a function declared as

__global__ void Func(float* parameter);

must be called like this:

Func<<< Dg, Db, Ns >>>(parameter);
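
Because Ns and S are optional, the trailing arguments can be omitted. The sketch below is an illustration, not part of the guide: the body given to Func and the host helper runFuncTwice are assumptions made for the example. It launches the same kernel once with only Dg and Db and once with the defaults written out, using a two-dimensional grid so that Dg.x * Dg.y blocks are launched.

```cuda
#include <cuda_runtime.h>

// Illustrative body for Func (the guide only declares it): each thread adds 1
// to one element, using an index flattened over a 2D grid of 3D blocks.
__global__ void Func(float* parameter)
{
    unsigned int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    unsigned int blockId  = blockIdx.y * gridDim.x + blockIdx.x;
    unsigned int threadId = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x
                          + threadIdx.x;
    parameter[blockId * threadsPerBlock + threadId] += 1.0f;
}

// Hypothetical host helper. hostData must hold exactly
// Dg.x * Dg.y * Db.x * Db.y * Db.z floats; each is incremented twice.
cudaError_t runFuncTwice(float* hostData, dim3 Dg, dim3 Db)
{
    size_t count = (size_t)Dg.x * Dg.y * Db.x * Db.y * Db.z;
    size_t bytes = count * sizeof(float);
    float* devData = 0;

    cudaError_t err = cudaMalloc((void**)&devData, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        Func<<< Dg, Db >>>(devData);        // Ns and S omitted
        Func<<< Dg, Db, 0, 0 >>>(devData);  // the same defaults, written out
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devData);
    return err;
}
```

For example, Dg = dim3(4, 2) and Db = dim3(16, 16) give 8 blocks of 256 threads each, so the helper expects an array of 2048 floats.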

The arguments to the execution configuration are evaluated before the actual
function arguments and, like the function arguments, are currently passed via
shared memory to the device.
The function call will fail if Dg or Db exceeds the maximum sizes allowed
for the device as specified in Appendix G, or if Ns is greater than the maximum
amount of shared memory available on the device, minus the amount of shared
memory required for static allocation, function arguments (for devices of compute
capability 1.x), and the execution configuration.
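
One way to detect such a failure is to compare the configuration against the limits reported by the runtime and to check the launch status afterwards. The following is a rough sketch under stated assumptions, not the guide's own code: emptyKernel, configWithinLimits, and tryLaunch are hypothetical names, the shared memory check ignores static allocations and argument space (so it is only an upper bound), and cudaDeviceSynchronize assumes a toolkit newer than the version this guide describes.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that does nothing; only the launch itself matters here.
__global__ void emptyKernel() {}

// Hypothetical pre-check against the limits reported for the device.
// Assumption: the Ns test ignores shared memory used by static allocations
// and kernel arguments, so a configuration that passes may still fail.
bool configWithinLimits(dim3 Dg, dim3 Db, size_t Ns, const cudaDeviceProp& prop)
{
    size_t threads = (size_t)Db.x * Db.y * Db.z;
    return Dg.x >= 1 && Dg.y >= 1 && Dg.z >= 1
        && Db.x >= 1 && Db.y >= 1 && Db.z >= 1
        && Dg.x <= (unsigned int)prop.maxGridSize[0]
        && Dg.y <= (unsigned int)prop.maxGridSize[1]
        && Dg.z <= (unsigned int)prop.maxGridSize[2]
        && Db.x <= (unsigned int)prop.maxThreadsDim[0]
        && Db.y <= (unsigned int)prop.maxThreadsDim[1]
        && Db.z <= (unsigned int)prop.maxThreadsDim[2]
        && threads <= (size_t)prop.maxThreadsPerBlock
        && Ns <= prop.sharedMemPerBlock;
}

// Launches emptyKernel with the given configuration and returns the status.
// A rejected configuration shows up in cudaGetLastError() right after the
// launch; errors raised during execution show up at synchronization.
cudaError_t tryLaunch(dim3 Dg, dim3 Db, size_t Ns)
{
    emptyKernel<<< Dg, Db, Ns >>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

A caller would typically fill a cudaDeviceProp with cudaGetDeviceProperties, reject configurations that fail the pre-check, and still test the status returned after the launch, since the pre-check is only approximate.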

Appendix B. C Language Extensions

CUDA C Programming Guide Version 3.2

