Memory    Location  Cached  Access  Scope               Lifetime
Register  On-chip   N/A     R/W     One thread          Thread
Local     Off-chip  No      R/W     One thread          Thread
Shared    On-chip   N/A     R/W     Threads in block    Block
Global    Off-chip  No      R/W     All threads + host  Application
Constant  Off-chip  Yes     R       All threads + host  Application
Texture   Off-chip  Yes     R       All threads + host  Application
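To make the scopes and lifetimes listed above concrete, here is a minimal sketch of how the register, shared, global and constant spaces typically appear in CUDA source. The names kScale, gBias, memorySpaces and runMemorySpaces are hypothetical and chosen for illustration; only the qualifiers (__constant__, __device__, __shared__) are CUDA's own. Error checking is omitted for brevity, and the placement of automatic variables in registers is the compiler's decision, not a guarantee.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical names throughout; only the qualifiers are part of CUDA.
__constant__ float kScale;        // constant memory: read-only in kernels, written by the host
__device__ float gBias = 1.0f;    // global memory: visible to all threads, application lifetime

__global__ void memorySpaces(const float *in, float *out, int n)
{
    __shared__ float tile[256];   // shared memory: one copy per block, block lifetime
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // scalar automatic: normally held in a register

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();              // wait until every thread in the block has written its slot
    if (i < n)
        out[i] = tile[threadIdx.x] * kScale + gBias;
}

// Host helper (illustrative): runs the kernel on in[i] = i and returns out[index].
// Assumes a launch with 256 threads per block to match the size of tile.
float runMemorySpaces(int n, float scale, int index)
{
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(kScale, &scale, sizeof(float));  // host writes constant memory

    memorySpaces<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    cudaMemcpy(h.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return h[index];
}
```

Calling runMemorySpaces(1000, 2.0f, 10) computes 10 * 2 + 1 for element 10. Texture memory is left out of the sketch because it needs additional setup through the texture API.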
Shared Memory
The total size of shared memory may be set to 16KB, 32KB or
48KB (with the remaining amount automatically used for L1
Cache)
Shared memory defaults to 48KB (with 16KB remaining for L1
Cache).
More than 1 Tbyte/sec aggregate memory bandwidth
Uses:
As a cache.
To reorganize global memory accesses into a coalesced pattern.
To share data between threads.
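A matrix transpose is a common illustration of the second and third uses: a block stages a tile in shared memory so that both the global read and the global write touch consecutive addresses. The sketch below is one possible version, not a tuned implementation. TILE, transposeTiled and runTranspose are hypothetical names, and the call to cudaFuncSetCacheConfig is only a preference hint for the shared memory / L1 split; whether it has any effect depends on the device.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 32

// Transposes a row-major matrix with `height` rows and `width` columns.
// Expects TILE x TILE thread blocks.
__global__ void transposeTiled(const float *in, float *out, int width, int height)
{
    __shared__ float tile[TILE][TILE + 1];  // extra column avoids shared-memory bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < width && y < height)
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];    // coalesced read along a row

    __syncthreads();  // the tile is shared: wait until the whole block has loaded it

    // Swap the block indices so consecutive threads write consecutive output addresses.
    x = blockIdx.y * TILE + threadIdx.x;  // column in out (out has `height` columns)
    y = blockIdx.x * TILE + threadIdx.y;  // row in out
    if (x < height && y < width)
        out[y * height + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Host helper (illustrative): transposes in[r][c] = r * width + c and returns out[row][col].
float runTranspose(int width, int height, int row, int col)
{
    std::vector<float> h(width * height);
    for (int i = 0; i < width * height; ++i) h[i] = static_cast<float>(i);

    size_t bytes = h.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    // Hint that this kernel prefers the larger shared-memory partition over L1.
    cudaFuncSetCacheConfig(transposeTiled, cudaFuncCachePreferShared);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(dIn, dOut, width, height);
    cudaMemcpy(h.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return h[row * height + col];
}
```

Without the shared tile, either the read or the write would have to stride through global memory by a full row, which cannot be coalesced.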
Local Memory
Local memory is not a physical type of memory, but an
abstraction of global memory.
Its scope is local to the thread and it resides off-chip,
which makes it as expensive to access as global memory.
Local memory is used only to hold automatic variables.
The compiler makes use of local memory when it
determines that there is not enough register space to
hold the variable.
Automatic variables that are large structures or arrays
are also typically placed in local memory.
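As an illustration of an automatic variable that is a likely candidate for local memory, the sketch below gives each thread a small private array indexed by a value known only at run time. Registers cannot be indexed dynamically, so the compiler will probably place counts in local memory, although the final decision is the compiler's. The names digitHistogram and runDigitHistogram are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread counts the decimal digits of its own index and stores the most frequent one.
__global__ void digitHistogram(int *mostCommonDigit, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // scalar: normally a register
    if (i >= n) return;

    int counts[10] = {0};        // automatic array, private to this thread
    for (int v = i; v > 0; v /= 10)
        counts[v % 10]++;        // run-time index: likely forces the array into local memory

    int best = 0;
    for (int d = 1; d < 10; ++d)
        if (counts[d] > counts[best]) best = d;      // ties keep the smaller digit
    mostCommonDigit[i] = best;
}

// Host helper (illustrative): runs the kernel for indices 0..n-1 and returns one result.
int runDigitHistogram(int n, int index)
{
    int *dOut = nullptr;
    cudaMalloc(&dOut, n * sizeof(int));
    digitHistogram<<<(n + 255) / 256, 256>>>(dOut, n);

    std::vector<int> h(n);
    cudaMemcpy(h.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return h[index];
}
```

Compiling with nvcc -Xptxas -v prints per-kernel resource usage, including register counts and any stack frame or spill bytes, which is one way to check whether a kernel actually uses local memory.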
Constant Memory
Constant memory holds data that does not change over the course of a
kernel execution; it is read-only from the device.
Using constant rather than global memory can reduce the required
memory bandwidth. However, this performance gain is realized only
when all threads of a warp read the same location.
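Polynomial evaluation is a simple case of the favorable access pattern: in every loop iteration all threads of a warp read the same coefficient. The sketch below assumes a fixed cubic, and the names DEGREE, coeffs, evalPoly and runEvalPoly are hypothetical. The host fills constant memory with cudaMemcpyToSymbol before the launch.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define DEGREE 3

__constant__ float coeffs[DEGREE + 1];   // c0..c3, read-only inside kernels

// Evaluates c0 + c1*x + c2*x^2 + c3*x^3 for every element of x.
__global__ void evalPoly(const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];
    float acc = 0.0f;
    // Horner's rule. Every thread reads the same coeffs[k] in a given iteration,
    // which is the access pattern constant memory is meant for.
    for (int k = DEGREE; k >= 0; --k)
        acc = acc * xi + coeffs[k];
    y[i] = acc;
}

// Host helper (illustrative): evaluates 1 + 2x + 3x^2 + 4x^3 at x[i] = i and returns y[index].
float runEvalPoly(int n, int index)
{
    const float hostCoeffs[DEGREE + 1] = {1.0f, 2.0f, 3.0f, 4.0f};
    cudaMemcpyToSymbol(coeffs, hostCoeffs, sizeof(hostCoeffs));  // host writes constant memory

    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    size_t bytes = n * sizeof(float);
    float *dX = nullptr, *dY = nullptr;
    cudaMalloc(&dX, bytes);
    cudaMalloc(&dY, bytes);
    cudaMemcpy(dX, h.data(), bytes, cudaMemcpyHostToDevice);

    evalPoly<<<(n + 255) / 256, 256>>>(dX, dY, n);
    cudaMemcpy(h.data(), dY, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dX);
    cudaFree(dY);
    return h[index];
}
```

If each thread instead indexed coeffs with a per-thread value such as threadIdx.x, the reads within a warp would go to different addresses and the benefit described above would not be expected.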