ptx_isa_2.0

Published by Mariusz Wawrzyński

Published by: Mariusz Wawrzyński on Dec 06, 2010
Copyright:Attribution Non-commercial

ldu

Load read-only data from an address that is common across threads in the warp.

Syntax

ldu{.ss}.type      d, [a];   // load from address
ldu{.ss}.vec.type  d, [a];   // vec load from address

.ss   = { .global };         // state space
.vec  = { .v2, .v4 };
.type = { .b8, .b16, .b32, .b64,
          .u8, .u16, .u32, .u64,
          .s8, .s16, .s32, .s64,
          .f32, .f64 };

Description

Load read-only data into register variable d from the location specified by the source address operand a in the global state space, where the address is guaranteed to be the same across all threads in the warp. If no state space is given, perform the load using generic addressing.

In generic addressing, an address maps to global memory unless it falls within the local memory window or the shared memory window. Within these windows, an address maps to the corresponding location in local or shared memory, i.e. to the address formed by subtracting the window base from the generic address to form the offset in the implied state space. For ldu, only generic addresses that map to global memory are legal.
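The window rule above can be modeled in a few lines of Python. This is only a sketch: the window bases and sizes (`LOCAL_BASE`, `SHARED_BASE`, and so on) are hypothetical values chosen for illustration, since the text does not state them, and the function names are not part of PTX.

```python
# Hypothetical window layout, for illustration only. Real window bases and
# sizes are implementation-specific and are not given in the text above.
LOCAL_BASE, LOCAL_SIZE = 0x0100_0000, 0x0010_0000
SHARED_BASE, SHARED_SIZE = 0x0200_0000, 0x0001_0000


def resolve_generic(addr):
    """Map a generic address to (state_space, offset_in_that_space)."""
    if LOCAL_BASE <= addr < LOCAL_BASE + LOCAL_SIZE:
        # Inside the local window: subtract the window base.
        return ("local", addr - LOCAL_BASE)
    if SHARED_BASE <= addr < SHARED_BASE + SHARED_SIZE:
        # Inside the shared window: subtract the window base.
        return ("shared", addr - SHARED_BASE)
    # Anything outside both windows maps to global memory unchanged.
    return ("global", addr)


def ldu_address_is_legal(addr):
    """For ldu, only generic addresses that map to global memory are legal."""
    return resolve_generic(addr)[0] == "global"
```

Under these assumed windows, an address such as `0x0200_0010` resolves to offset `0x10` in shared memory and would therefore not be a legal ldu address.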

The addressable operand a is one of:

[var]          the name of an addressable variable var,
[reg]          a register reg containing a byte address,
[reg+immOff]   a sum of register reg containing a byte address plus a constant integer byte offset (signed, 32-bit), or
[immAddr]      an immediate absolute byte address (unsigned, 32-bit).

The address must be naturally aligned to a multiple of the access size. If an address is not properly aligned, the resulting behavior is undefined; i.e., the access may proceed by silently masking off low-order address bits to achieve proper rounding, or the instruction may fault.
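As a rough illustration of the alignment rule, the following Python sketch computes the access size and checks natural alignment. It assumes that the access size of a vector load is the element size times the number of elements, and it does not check which type and vector combinations are actually permitted. The helper names are hypothetical.

```python
# Element sizes in bytes for the .type suffixes listed in the syntax.
TYPE_BYTES = {
    "b8": 1, "b16": 2, "b32": 4, "b64": 8,
    "u8": 1, "u16": 2, "u32": 4, "u64": 8,
    "s8": 1, "s16": 2, "s32": 4, "s64": 8,
    "f32": 4, "f64": 8,
}
VEC_LANES = {None: 1, "v2": 2, "v4": 4}


def access_size(type_name, vec=None):
    """Total bytes accessed (assumption: element size times vector length)."""
    return TYPE_BYTES[type_name] * VEC_LANES[vec]


def is_naturally_aligned(addr, type_name, vec=None):
    """True if addr is a multiple of the access size."""
    return addr % access_size(type_name, vec) == 0


def masked_address(addr, type_name, vec=None):
    """One possible outcome of a misaligned access: low-order bits masked off.

    The access size is always a power of two here, so a bit mask suffices.
    """
    return addr & ~(access_size(type_name, vec) - 1)
```

For example, under this model an `ldu.global.v4.f32` access is 16 bytes wide, so address `0x24` is misaligned, and masking would silently turn it into `0x20`. Because the behavior is undefined, a fault is an equally valid outcome.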

The data at the specified address must be read-only.

The address size may be either 32-bit or 64-bit. Addresses are zero-extended to the specified width as needed, and truncated if the register width exceeds the state space address width for the target architecture.
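The width adjustment can be sketched as below, treating register contents as unsigned integers. The function name and parameters are hypothetical and serve only to illustrate zero-extension and truncation.

```python
def fit_address(value, reg_bits, addr_bits):
    """Adjust an address held in a reg_bits-wide register to addr_bits.

    A narrower register is zero-extended, which leaves the value unchanged.
    A wider register is truncated to the low addr_bits bits.
    """
    value &= (1 << reg_bits) - 1          # register contents, unsigned
    return value & ((1 << addr_bits) - 1)  # keep only the address bits
```

With this model, a 64-bit register holding `0x1_2345_6789` yields `0x2345_6789` when the address width is 32 bits, while a 32-bit register holding `0xFFFF_FFFF` is unchanged when widened to 64 bits.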

A register containing an address may be declared as a bit-size type or integer type.

Semantics

d = a;             // named variable a
d = *a;            // register
d = *(a+immOff);   // register-plus-offset
d = *(immAddr);    // immediate address

Notes

Destination d must be in the .reg state space.

A destination register wider than the specified type may be used. The value loaded is sign-extended to the destination register width for signed integers, and is zero-extended to the destination register width for unsigned and bit-size types.
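The extension rule can be checked with a small Python model that returns the bit pattern left in the destination register. The function name and signature are hypothetical, not part of PTX.

```python
def extend_loaded(raw, src_bits, dst_bits, signed):
    """Return the dst_bits-wide register pattern for a src_bits-wide load.

    signed=True models .sN types (sign extension); signed=False models
    .uN and .bN types (zero extension).
    """
    raw &= (1 << src_bits) - 1
    if signed and raw >> (src_bits - 1):
        # Top bit set: interpret as a negative two's-complement value.
        raw -= 1 << src_bits
    # Re-encode in the destination width.
    return raw & ((1 << dst_bits) - 1)
```

In this model, loading the byte `0x80` into a 32-bit register gives `0xFFFFFF80` for a signed type and `0x00000080` for an unsigned or bit-size type.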

.f16 data may be loaded using ldu.b16, and then converted to .f32 or .f64 using cvt.
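To see what the conversion step produces, the sketch below decodes a 16-bit pattern, as it would sit in a register after a 16-bit load, into a Python float. It uses the standard `struct` module's IEEE 754 half-precision format code `e` and is a host-side illustration of the numeric result only, not a model of the cvt instruction itself.

```python
import struct


def f16_bits_to_float(bits):
    """Interpret the low 16 bits as an IEEE 754 half-precision value."""
    packed = struct.pack("<H", bits & 0xFFFF)  # raw 16-bit pattern
    return struct.unpack("<e", packed)[0]      # decode as binary16
```

For instance, the pattern `0x3C00` decodes to 1.0 and `0xC000` decodes to -2.0.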

PTX ISA Notes

Introduced in PTX ISA version 2.0.

Target ISA Notes

ldu.f64 requires sm_13 or later.

Examples

ldu.global.f32 d,[a];

ldu.global.b32 d,[p+4];

ldu.global.v4.f32 Q,[p];

PTX ISA Version 2.0

116

January 24, 2010
