
CUDA Memory Types


Register Memory
Local Memory
Shared Memory
Global Memory
Constant Memory
Texture Memory

Access, Scope & Lifetime

Memory     Location   Cached   Access   Scope                  Lifetime
Register   On-chip    N/A      R/W      One thread             Thread
Local      Off-chip   No       R/W      One thread             Thread
Shared     On-chip    N/A      R/W      All threads in block   Block
Global     Off-chip   No       R/W      All threads + host     Application
Constant   Off-chip   Yes      R        All threads + host     Application
Texture    Off-chip   Yes      R        All threads + host     Application
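The table above can be illustrated with a short kernel. This is only a sketch; the variable and kernel names are hypothetical, chosen to show which declaration lands in which memory space:

```cuda
__constant__ float coeff[16];        // constant memory: read-only on device, cached

__global__ void memorySpaces(const float *in, float *out)  // in/out point to global memory
{
    int tid = threadIdx.x;           // scalar automatic variable: lives in a register

    __shared__ float tile[128];      // shared memory: visible to all threads in the block
    int gid = blockIdx.x * blockDim.x + tid;
    tile[tid] = in[gid];
    __syncthreads();                 // make shared-memory writes visible block-wide

    float big[256];                  // large automatic array: typically placed in local memory
    big[tid % 256] = tile[tid] * coeff[tid % 16];
    out[gid] = big[tid % 256];
}
```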

Shared Memory
The total size of shared memory may be configured as 16 KB, 32 KB, or
48 KB (with the remaining amount automatically used for L1
cache).
Shared memory defaults to 48 KB (with 16 KB remaining for L1
cache).
More than 1 TB/s aggregate memory bandwidth.
Uses:
As a software-managed cache.
To reorganize global memory accesses into a coalesced pattern.
To share data between threads in a block.
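A classic use of shared memory to turn uncoalesced global accesses into coalesced ones is a tiled matrix transpose. A minimal sketch, assuming a square n-by-n matrix with n a multiple of the tile size (TILE and the kernel name are illustrative):

```cuda
#define TILE 32

__global__ void transposeTiled(float *out, const float *in, int n)
{
    __shared__ float tile[TILE][TILE + 1];  // +1 column pads away shared-memory bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * n + x];    // coalesced read from global memory
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;    // swap block indices for the transposed write
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * n + x] = tile[threadIdx.x][threadIdx.y];   // coalesced write from shared memory
}
```

The shared/L1 split mentioned above can be requested per kernel on the host side with the runtime API, e.g. `cudaFuncSetCacheConfig(transposeTiled, cudaFuncCachePreferShared)`.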

Local Memory
Local memory is not a physical type of memory, but an
abstraction of global memory.
Its scope is local to the thread and it resides off-chip,
which makes it as expensive to access as global memory.
Local memory is used only to hold automatic variables.
The compiler makes use of local memory when it
determines that there is not enough register space to
hold the variable.
Automatic variables that are large structures or arrays
are also typically placed in local memory.
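Whether an automatic variable stays in registers or spills to local memory can be observed by compiling with `nvcc -Xptxas -v`, which makes ptxas report per-kernel register and local-memory usage. A sketch of a kernel likely to spill (the array size is an illustrative choice):

```cuda
__global__ void likelySpills(const int *idx, float *out)
{
    // A large automatic array accessed with a runtime-computed index
    // cannot be kept entirely in registers, so the compiler places it
    // in (off-chip) local memory.
    float scratch[512];
    for (int i = 0; i < 512; ++i)
        scratch[i] = i * 0.5f;
    out[threadIdx.x] = scratch[idx[threadIdx.x] % 512];
}
```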

Constant and Texture


Texture memory
Is a variety of read-only memory on the device.
When all reads in a warp are physically adjacent,
texture memory can reduce memory traffic and increase
performance compared to global memory.

Constant:
Is used for data that will not change over the course of a kernel
execution and is read-only.
Using constant rather than global memory can reduce the
required memory bandwidth; however, this performance gain
is only realized when all threads in a warp read the same
location.
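Constant memory is declared at file scope with the `__constant__` qualifier and initialized from the host with `cudaMemcpyToSymbol`. A minimal sketch (the symbol and kernel names are illustrative); note that every thread in the warp reads the same `filter[k]` in each loop iteration, which is exactly the broadcast pattern constant memory is optimized for:

```cuda
#include <cuda_runtime.h>

__constant__ float filter[8];   // read-only on the device, served by the constant cache

__global__ void applyFilter(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + 8 <= n) {
        float acc = 0.0f;
        for (int k = 0; k < 8; ++k)
            acc += filter[k] * in[i + k];  // filter[k] is broadcast to the whole warp
        out[i] = acc;
    }
}

// Host side:
// float h_filter[8] = { /* coefficients */ };
// cudaMemcpyToSymbol(filter, h_filter, sizeof(h_filter));
```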
