Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Standard view
Full view
of .
Save to My Library
Look up keyword or section
Like this

Table Of Contents

Chapter 1 Introduction
1.1. Scalable Data-Parallel Computing Using GPUs
1.2. Goals of PTX
1.3. PTX ISA Version 2.0
1.3.1. Improved Floating-Point Support
1.3.2. Generic Addressing
1.3.3. Support for an Application Binary Interface
1.3.4. New Instructions
1.4. The Document’s Structure
Chapter 2 Programming Model
2.1. A Highly Multithreaded Coprocessor
2.2. Thread Hierarchy
2.2.1. Cooperative Thread Arrays
2.2.2. Grid of Cooperative Thread Arrays
2.3. Memory Hierarchy
Chapter 3 Parallel Thread Execution Machine Model
3.1. A Set of SIMT Multiprocessors with On-Chip
Chapter 4 Syntax
4.1. Source Format
4.3. Statements
4.3.1. Directive Statements
Table 1. PTX Directives
4.3.2. Instruction Statements
Table 2. Reserved Instruction Keywords
4.4. Identifiers
Table 3. Predefined Identifiers
4.5. Constants
4.5.1. Integer Constants
4.5.2. Floating-Point Constants
4.5.3. Predicate Constants
4.5.4. Constant Expressions
Table 4. Operator Precedence
4.5.5. Integer Constant Expression Evaluation
4.5.6. Summary of Constant Expression Evaluation Rules
Table 5. Constant Expression Evaluation Rules
Chapter 5 State Spaces, Types, and Variables
5.1. State Spaces
Table 6. State Spaces
Table 7. Properties of State Spaces
5.1.1. Register State Space
5.1.2. Special Register State Space
5.1.3. Constant State Space
5.1.4. Global State Space
5.1.5. Local State Space
5.1.6. Parameter State Space
5.1.7. Shared State Space
5.1.8. Texture State Space (deprecated)
5.2. Types
5.2.1. Fundamental Types
Table 8. Fundamental Type Specifiers
Basic Type Fundamental Type Specifiers
5.2.2. Restricted Use of Sub-Word Sizes
5.3. Texture, Sampler, and Surface Types
Table 9. Opaque Type Fields in Unified Texture Mode
Table 10. Opaque Type Fields in Independent Texture Mode
5.4. Variables
5.4.1. Variable Declarations
5.4.2. Vectors
5.4.3. Array Declarations
5.4.4. Initializers
5.4.5. Alignment
5.4.6. Parameterized Variable Names
Chapter 6 Instruction Operands
6.1. Operand Type Information
6.2. Source Operands
6.3. Destination Operands
6.4. Using Addresses, Arrays, and Vectors
6.4.1. Addresses as Operands
6.4.2. Arrays as Operands
6.4.3. Vectors as Operands
6.4.4. Labels and Function Names as Operands
6.5. Type Conversion
6.5.1. Scalar Conversions
6.5.2. Rounding Modifiers
Table 12. Floating-Point Rounding Modifiers
Table 13. Integer Rounding Modifiers
6.6. Operand Costs
Table 14. Cost Estimates for Accessing State-Spaces
Chapter 7 Abstracting the ABI
7.1. Function declarations and definitions
7.1.1. Changes from PTX 1.x
7.2. Variadic functions
7.3. Alloca
Chapter 8 Instruction Set
8.1. Format and Semantics of Instruction Descriptions
8.2. PTX Instructions
8.3. Predicated Execution
8.3.1. Comparisons
Table 16. Floating-Point Comparison Operators
Table 17. Floating-Point Comparison Operators Accepting NaN
Table 18. Floating-Point Comparison Operators Testing for NaN
8.3.2. Manipulating Predicates
8.4. Type Information for Instructions and Operands
8.4.1. Operand Size Exceeding Instruction-Type Size
8.5. Divergence of Threads in Control Constructs
8.6. Semantics
8.6.1. Machine-Specific Semantics of 16-bit Code
8.7. Instructions
8.7.1. Integer Arithmetic Instructions
Table 22. Integer Arithmetic Instructions: add
Table 23. Integer Arithmetic Instructions: sub
Table 24. Integer Arithmetic Instructions: add.cc
Table 25. Integer Arithmetic Instructions: addc
Table 26. Integer Arithmetic Instructions: sub.cc
Table 27. Integer Arithmetic Instructions: subc
Table 28. Integer Arithmetic Instructions: mul
Table 29. Integer Arithmetic Instructions: mad
Table 30. Integer Arithmetic Instructions: mul24
Table 31. Integer Arithmetic Instructions: mad24
Table 32. Integer Arithmetic Instructions: sad
Table 33. Integer Arithmetic Instructions: div
Table 34. Integer Arithmetic Instructions: rem
Table 35. Integer Arithmetic Instructions: abs
Table 36. Integer Arithmetic Instructions: neg
Table 37. Integer Arithmetic Instructions: min
Table 38. Integer Arithmetic Instructions: max
Table 39. Integer Arithmetic Instructions: popc
Table 40. Integer Arithmetic Instructions: clz
Table 41. Integer Arithmetic Instructions: bfind
Table 42. Integer Arithmetic Instructions: brev
Table 43. Integer Arithmetic Instructions: bfe
Table 44. Integer Arithmetic Instructions: bfi
Table 45. Integer Arithmetic Instructions: prmt
8.7.2. Floating-Point Instructions
Table 46. Summary of Floating-Point Instructions
Table 47. Floating-Point Instructions: testp
Table 48. Floating-Point Instructions: copysign
Table 49. Floating-Point Instructions: add
Table 50. Floating-Point Instructions: sub
Table 51. Floating-Point Instructions: mul
Table 52. Floating-Point Instructions: fma
Table 54. Floating-Point Instructions: div
Table 55. Floating-Point Instructions: abs
Table 56. Floating-Point Instructions: neg
Table 57. Floating-Point Instructions: min
Table 58. Floating-Point Instructions: max
Table 59. Floating-Point Instructions: rcp
Table 60. Floating-Point Instructions: sqrt
Table 61. Floating-Point Instructions: rsqrt
Table 62. Floating-Point Instructions: sin
Table 63. Floating-Point Instructions: cos
Table 64. Floating-Point Instructions: lg2
Table 65. Floating-Point Instructions: ex2
8.7.3. Comparison and Selection Instructions
Table 67. Comparison and Selection Instructions: setp
Table 68. Comparison and Selection Instructions: selp
Table 69. Comparison and Selection Instructions: slct
8.7.4. Logic and Shift Instructions
Table 70. Logic and Shift Instructions: and
Table 71. Logic and Shift Instructions: or
Table 72. Logic and Shift Instructions: xor
Table 73. Logic and Shift Instructions: not
Table 74. Logic and Shift Instructions: cnot
Table 75. Logic and Shift Instructions: shl
Table 76. Logic and Shift Instructions: shr
8.7.5. Data Movement and Conversion Instructions
Table 77. Cache Operators for Memory Load Instructions
Table 78. Cache Operators for Memory Store Instructions
Table 79. Data Movement and Conversion Instructions: mov
Table 80. Data Movement and Conversion Instructions: mov
Table 81. Data Movement and Conversion Instructions: ld
Table 82. Data Movement and Conversion Instructions: ldu
Table 83. Data Movement and Conversion Instructions: st
Table 85. Data Movement and Conversion Instructions: isspacep
Table 86. Data Movement and Conversion Instructions: cvta
Table 87. Data Movement and Conversion Instructions: cvt
8.7.6. Texture and Surface Instructions
Table 88. Texture and Surface Instructions: tex
Table 89. Texture and Surface Instructions: txq
Table 90. Texture and Surface Instructions: suld
Table 91. Texture and Surface Instructions: sust
Table 92. Texture and Surface Instructions: sured
Table 93. Texture and Surface Instructions: suq
8.7.7. Control Flow Instructions
Table 94. Control Flow Instructions: { }
Table 95. Control Flow Instructions: @
Table 96. Control Flow Instructions: bra
Table 97. Control Flow Instructions: call
Table 98. Control Flow Instructions: ret
Table 99. Control Flow Instructions: exit
8.7.8. Parallel Synchronization and Communication
8.7.9. Video Instructions
Table 107. Video Instructions: vmad
Table 108. Video Instructions: vset
8.7.10. Miscellaneous Instructions
The Miscellaneous instructions are:
Table 109. Miscellaneous Instructions: trap
Table 110. Miscellaneous Instructions: brkpt
Table 111. Miscellaneous Instructions: pmevent
Chapter 9 Special Registers
Table 112. Special Registers: %tid
Table 113. Special Registers: %ntid
Table 114. Special Registers: %laneid
Table 115. Special Registers: %warpid
Table 116. Special Registers: %nwarpid
Table 117. Special Registers: %ctaid
Table 118. Special Registers: %nctaid
Table 119. Special Registers: %smid
Table 120. Special Registers: %nsmid
Table 121. Special Registers: %gridid
Table 122. Special Registers: %lanemask_eq
Table 123. Special Registers: %lanemask_le
Table 124. Special Registers: %lanemask_lt
Table 125. Special Registers: %lanemask_ge
Table 126. Special Registers: %lanemask_gt
Table 127. Special Registers: %clock
Table 128. Special Registers: %clock64
Chapter 10 Directives
10.1. PTX Version and Target Directives
Table 130. PTX File Directives: .version
Table 131. PTX File Directives: .target
10.2. Specifying Kernel Entry Points and Functions
Table 132. Kernel and Function Directives: .entry
Table 133. Kernel and Function Directives: .func
10.3. Performance-Tuning Directives
Table 134. Performance-Tuning Directives: .maxnreg
Table 135. Performance-Tuning Directives: .maxntid
Table 137. Performance-Tuning Directives: .minnctapersm
Table 138. Performance-Tuning Directives: .pragma
10.4. Debugging Directives
Table 139. Debugging Directives: @@DWARF
Table 140. Debugging Directives: .section
Table 141. Debugging Directives: .file
Table 142. Debugging Directives: .loc
10.6. Linking Directives
Table 143. Linking Directives: .extern
Chapter 11 Release Notes
11.1. Changes in Version 2.0
11.1.1. New Features
11.1.2. Semantic Changes and Clarifications
11.1.3. Unimplemented Features Remaining
0 of .
Results for:
No results containing your search query
P. 1


Ratings: (0)|Views: 137 |Likes:
Published by Mariusz Wawrzyński

More info:

Published by: Mariusz Wawrzyński on Dec 06, 2010
Copyright:Attribution Non-commercial


Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less





You're Reading a Free Preview
Pages 4 to 99 are not shown in this preview.
You're Reading a Free Preview
Pages 103 to 187 are not shown in this preview.

You're Reading a Free Preview

/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->