Published by Mariusz Wawrzyński

More info:

Published by: Mariusz Wawrzyński on Dec 06, 2010
Copyright:Attribution Non-commercial








ldu   Load read-only data from an address that is common across threads in the warp.


ldu{.ss}.type       d, [a];   // load from address
ldu{.ss}.vec.type   d, [a];   // vec load from address

.ss   = { .global };          // state space
.vec  = { .v2, .v4 };
.type = { .b8, .b16, .b32, .b64,
          .u8, .u16, .u32, .u64,
          .s8, .s16, .s32, .s64,
          .f32, .f64 };


Load read-only data into register variable d from the location specified by the source address operand a in the global state space, where the address is guaranteed to be the same across all threads in the warp. If no state space is given, perform the load using generic addressing. In generic addressing, an address maps to global memory unless it falls within the local memory window or the shared memory window. Within these windows, an address maps to the corresponding location in local or shared memory; that is, the offset in the implied state space is formed by subtracting the window base from the generic address. For ldu, only generic addresses that map to global memory are legal.
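
The window rule above can be sketched as a small Python model. The window bases and sizes below are hypothetical placeholders (this section does not give them, and the real values depend on the target); only the subtract-the-base rule and the "global only" restriction for ldu come from the text.

```python
# Hypothetical window layout: real bases and sizes are target-specific
# and are NOT given in this section.
LOCAL_WINDOW = (0x0100_0000, 0x0100_0000)   # (base, size), assumed
SHARED_WINDOW = (0x0200_0000, 0x0001_0000)  # (base, size), assumed


def resolve_generic(addr):
    """Map a generic address to (state_space, offset_in_that_space)."""
    for space, (base, size) in (("local", LOCAL_WINDOW),
                                ("shared", SHARED_WINDOW)):
        if base <= addr < base + size:
            # Inside a window: offset = generic address - window base.
            return space, addr - base
    # Outside both windows the address refers to global memory as-is.
    return "global", addr


def ldu_address_is_legal(addr):
    """ldu accepts only generic addresses that resolve to global memory."""
    space, _ = resolve_generic(addr)
    return space == "global"
```

With these assumed windows, resolve_generic(0x02000004) yields ("shared", 4), so a generic ldu from that address would not be legal.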

The addressable operand a is one of:

[var]          the name of an addressable variable var,
[reg]          a register reg containing a byte address,
[reg+immOff]   a sum of register reg containing a byte address plus a constant integer byte offset (signed, 32-bit), or
[immAddr]      an immediate absolute byte address (unsigned, 32-bit).

The address must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to round the address down to an aligned one, or the instruction may fault.
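
As an illustration of the alignment rule, here is a minimal Python sketch. It assumes that a vector access counts as element size times element count, and the helper names are invented for this example. It does not check which type and vector combinations the ISA accepts, and the masking helper shows only one of the possible outcomes of a misaligned access, since the behavior is undefined.

```python
# Element sizes in bytes for the .type suffixes listed in the syntax.
TYPE_BYTES = {
    "b8": 1, "u8": 1, "s8": 1,
    "b16": 2, "u16": 2, "s16": 2,
    "b32": 4, "u32": 4, "s32": 4, "f32": 4,
    "b64": 8, "u64": 8, "s64": 8, "f64": 8,
}


def access_size(type_name, vec=1):
    """Bytes touched by one load; vec is 1 (scalar), 2 (.v2) or 4 (.v4).

    Assumption: a vector access counts as elements * element size.
    """
    if vec not in (1, 2, 4):
        raise ValueError("vec must be 1, 2 or 4")
    return TYPE_BYTES[type_name] * vec


def is_naturally_aligned(addr, type_name, vec=1):
    """True if addr is a multiple of the access size."""
    return addr % access_size(type_name, vec) == 0


def masked_address(addr, type_name, vec=1):
    """One *possible* outcome of a misaligned access: low bits dropped."""
    size = access_size(type_name, vec)   # always a power of two here
    return addr & ~(size - 1)
```

Under that assumption, a .v4.f32 load touches 16 bytes, so address 0x1004 is aligned for a scalar .b32 load but not for the vector form.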

The data at the specified address must be read-only.

The address size may be either 32-bit or 64-bit. Addresses are zero-extended to the specified width as needed, and truncated if the register width exceeds the state space address width for the target architecture.

A register containing an address may be declared as a bit-size type or integer type.


d = a;             // named variable a
d = *a;            // register
d = *(a+immOff);   // register-plus-offset
d = *(immAddr);    // immediate address
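
The four forms can be mimicked with a toy memory model in Python. Everything here is illustrative: the bytearray standing in for global memory, the SYMBOLS table, the helper name ldu_b32, and the little-endian byte order are assumptions of this sketch, not part of the specification text.

```python
GLOBAL = bytearray(64)                 # stand-in for global memory (hypothetical)
GLOBAL[8:12] = (0x11223344).to_bytes(4, "little")
GLOBAL[12:16] = (0xAABBCCDD).to_bytes(4, "little")

SYMBOLS = {"var": 8}                   # hypothetical: variable name -> byte address


def ldu_b32(addr):
    """Model of a 32-bit load: read 4 bytes at a byte address."""
    return int.from_bytes(GLOBAL[addr:addr + 4], "little")


p = 8                                  # a register holding a byte address
d_var = ldu_b32(SYMBOLS["var"])        # d = a;            named variable
d_reg = ldu_b32(p)                     # d = *a;           register
d_off = ldu_b32(p + 4)                 # d = *(a+immOff);  register-plus-offset
d_imm = ldu_b32(12)                    # d = *(immAddr);   immediate address
```

In this model the register-plus-offset and immediate forms read the same word, because p + 4 and 12 name the same byte address.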


Destination d must be in the .reg state space.

A destination register wider than the specified type may be used. The value loaded is sign-extended to the destination register width for signed integers, and is zero-extended to the destination register width for unsigned and bit-size types.
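
The widening rule can be made concrete with a short Python sketch. The function name and the convention of passing the type as a string such as "s8" are choices made for this example; only integer and bit-size types are covered, as in the rule above.

```python
def extend_to_register(raw, type_name, dest_bits):
    """Widen a loaded value to dest_bits, returning the register bit pattern.

    raw is the unsigned bit pattern read from memory; type_name is a
    type suffix without the dot, e.g. "s8", "u16" or "b32".
    """
    kind, src_bits = type_name[0], int(type_name[1:])
    if dest_bits < src_bits:
        raise ValueError("destination narrower than loaded type")
    raw &= (1 << src_bits) - 1
    if kind == "s" and raw >> (src_bits - 1):
        # Negative signed value: replicate the sign bit into the upper bits.
        raw |= ((1 << dest_bits) - 1) & ~((1 << src_bits) - 1)
    # Unsigned (.u) and bit-size (.b) types are zero-extended: unchanged.
    return raw
```

For instance, the byte 0x80 loaded as s8 into a 32-bit register becomes 0xFFFFFF80, while the same byte loaded as u8 or b8 stays 0x00000080.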

.f16 data may be loaded using ldu.b16, and then converted to .f32 or .f64 using cvt.
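
The two-step pattern (move 16 untyped bits, then convert) can be modeled in Python using the half-precision format code of the standard struct module. The helper names and the little-endian byte order are assumptions of this sketch, and Python returns a double rather than a true .f32 value.

```python
import struct


def ldu_b16(memory, addr):
    """Model of the b16 load: fetch 16 untyped bits (little-endian assumed)."""
    return int.from_bytes(memory[addr:addr + 2], "little")


def cvt_f32_f16(bits16):
    """Model of the cvt step: reinterpret 16 bits as IEEE half, then widen."""
    return struct.unpack("<e", bits16.to_bytes(2, "little"))[0]


memory = bytes.fromhex("003c00c0")     # halves 1.0 (0x3C00) and -2.0 (0xC000)
first = cvt_f32_f16(ldu_b16(memory, 0))
second = cvt_f32_f16(ldu_b16(memory, 2))
```

The load itself never interprets the bits; only the conversion step gives them a floating-point meaning.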


PTX ISA Notes Introduced in PTX ISA version 2.0.

Target ISA Notes ldu.f64 requires sm_13 or later.


ldu.global.f32      d, [a];
ldu.global.b32      d, [p+4];
ldu.global.v4.f32   Q, [p];

PTX ISA Version 2.0


January 24, 2010
