Mark Leone
Amortizes interpretive overhead over batches of points. Shading is dominated by floating point calculations.
SIMD interpreter
For each instruction in shader:
    Decode and dispatch instruction.
    For each point in batch:
        If runflag is on:
            Load operands.
            Compute.
            Store result.
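The loop above can be sketched in C++ as follows. This is a minimal illustration, not the interpreter of any particular renderer: the two-opcode instruction set, the `Instr` layout, and the names `run_batch` and `Register` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical three-address instruction set, for illustration only.
enum class Op { Add, Mul };

struct Instr {
    Op op;
    int dst, a, b;  // register indices
};

// Each register holds one float per point in the batch.
using Register = std::vector<float>;

// Runs every instruction across the whole batch, so decode and dispatch
// happen once per instruction rather than once per point.
void run_batch(const std::vector<Instr>& shader,
               std::vector<Register>& regs,
               const std::vector<bool>& runflags) {
    const std::size_t n = runflags.size();
    for (const Instr& in : shader) {
        assert(regs[in.dst].size() == n);
        assert(regs[in.a].size() == n && regs[in.b].size() == n);
        switch (in.op) {  // decode and dispatch once per instruction
        case Op::Add:
            for (std::size_t i = 0; i < n; ++i)
                if (runflags[i])  // skip points that are switched off
                    regs[in.dst][i] = regs[in.a][i] + regs[in.b][i];
            break;
        case Op::Mul:
            for (std::size_t i = 0; i < n; ++i)
                if (runflags[i])
                    regs[in.dst][i] = regs[in.a][i] * regs[in.b][i];
            break;
        }
    }
}
```

The cost of the `switch` is paid once per instruction, while the inner loops do only loads, arithmetic, and stores, which is where the amortization comes from.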
Why vectorize?
Consider batch execution of a compiled shader:
For each point in batch:
    Load inputs.
    For each instruction in shader:
        Compute.
    Store outputs.
Why vectorize?
Consider batch execution of a vectorized shader:
For each block of 4 or 8 points in batch:
    Load inputs.
    For each instruction in shader:
        Compute on vector registers (with mask).
    Store outputs.
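A rough C++ sketch of this blocked loop is given below. It uses a plain `std::array` of 4 floats as a stand-in for a vector register instead of real SSE intrinsics, and the shader body (`a * b + a`) and the names `shade_block` and `shade_batch` are hypothetical. The mask here only covers the partial block at the end of the batch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 4;       // lanes per block (4 floats in SSE)
using Vec = std::array<float, kWidth>;  // stand-in for a vector register
using Mask = std::array<bool, kWidth>;  // one flag per lane

// Hypothetical compiled shader body, applied to one block: a * b + a.
static Vec shade_block(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t l = 0; l < kWidth; ++l)
        r[l] = a[l] * b[l] + a[l];
    return r;
}

void shade_batch(const std::vector<float>& a,
                 const std::vector<float>& b,
                 std::vector<float>& out) {
    assert(a.size() == b.size() && out.size() == a.size());
    const std::size_t n = a.size();
    for (std::size_t base = 0; base < n; base += kWidth) {
        Vec va{}, vb{};
        Mask m{};
        // Load inputs; lanes past the end of the batch stay masked off.
        for (std::size_t l = 0; l < kWidth; ++l) {
            m[l] = base + l < n;
            if (m[l]) {
                va[l] = a[base + l];
                vb[l] = b[base + l];
            }
        }
        const Vec r = shade_block(va, vb);  // compute on all lanes
        // Store outputs only for active lanes.
        for (std::size_t l = 0; l < kWidth; ++l)
            if (m[l]) out[base + l] = r[l];
    }
}
```

Inputs are loaded once per block and intermediate values never leave the registers, unlike the interpreter, which loads and stores operands for every instruction.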
Vector utilization
Shader vectorization
To vectorize, first scalarize:
float dot(vector v1, vector v2) {
    vector v0 = v1 * v2;
    return v0.x + v0.y + v0.z;
}

float dot(vector v1, vector v2) {
    float x = v1.x * v2.x;
    float y = v1.y * v2.y;
    float z = v1.z * v2.z;
    return x + y + z;
}
Vector load instructions (in SSE) require contiguous data. Store batch of vectors as a struct of arrays (SOA):
x x x x . . . y y y y . . . z z z z . . .
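As an illustration of the layout above, the following C++ sketch converts an array of structs to a struct of arrays and evaluates the scalarized dot product over a whole batch. The type and function names (`Vec3`, `Vec3Batch`, `to_soa`, `dot_batch`) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // array-of-structs element

// Struct of arrays: each component is stored contiguously.
struct Vec3Batch {
    std::vector<float> x, y, z;
};

Vec3Batch to_soa(const std::vector<Vec3>& aos) {
    Vec3Batch soa;
    for (const Vec3& v : aos) {
        soa.x.push_back(v.x);
        soa.y.push_back(v.y);
        soa.z.push_back(v.z);
    }
    return soa;
}

// Scalarized dot product over a batch. Every statement reads and writes
// contiguous floats, so each maps onto plain vector loads and arithmetic.
std::vector<float> dot_batch(const Vec3Batch& a, const Vec3Batch& b) {
    assert(a.x.size() == b.x.size());
    const std::size_t n = a.x.size();
    std::vector<float> result(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float x = a.x[i] * b.x[i];
        const float y = a.y[i] * b.y[i];
        const float z = a.z[i] * b.z[i];
        result[i] = x + y + z;
    }
    return result;
}
```

With this layout, four consecutive `x` values sit next to each other in memory, which is what a contiguous vector load needs.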
Masking / blending
Use a mask to avoid clobbering components of registers used by the other branch.
No masking in SSE.
No need to blend each instruction. Blend at basic block boundaries (at phi nodes in SSA).
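The idea of blending once at the join point can be sketched as follows. The example shader (`if (x > 0) y = 2*x + 1; else y = -x;`) and the names `select` and `branch_example` are hypothetical, and the 4-wide `std::array` again stands in for a vector register. On real hardware the `select` would be built from whatever the target offers, for example a bitwise and/andnot/or sequence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWidth = 4;
using Vec = std::array<float, kWidth>;
using Mask = std::array<bool, kWidth>;

// Per-lane select: takes t where the mask is set, e elsewhere.
Vec select(const Mask& m, const Vec& t, const Vec& e) {
    Vec r{};
    for (std::size_t l = 0; l < kWidth; ++l)
        r[l] = m[l] ? t[l] : e[l];
    return r;
}

// Vectorized form of: if (x > 0) y = 2*x + 1; else y = -x;
Vec branch_example(const Vec& x) {
    Mask m{};
    for (std::size_t l = 0; l < kWidth; ++l) m[l] = x[l] > 0.0f;

    // "Then" block: two instructions on all lanes, no blending.
    // Each result goes to a fresh temporary, as in SSA form.
    Vec t1{}, t2{};
    for (std::size_t l = 0; l < kWidth; ++l) t1[l] = 2.0f * x[l];
    for (std::size_t l = 0; l < kWidth; ++l) t2[l] = t1[l] + 1.0f;

    // "Else" block: also computed on all lanes.
    Vec e1{};
    for (std::size_t l = 0; l < kWidth; ++l) e1[l] = -x[l];

    // Join point (the phi node): a single blend picks the live value.
    return select(m, t2, e1);
}
```

Because each branch writes only its own temporaries, neither can clobber the other's values, and a single `select` per variable at the join replaces a blend after every instruction.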
Partitioning
Nf = faceforward( normalize(N), I);
Ci = Os * Cs * ( Ka*ambient() + Kd*diffuse(Nf) );
[Figure: dataflow graph of the shader above. Nodes: normalize, faceforward, diffuse, ambient, scale, add, mult. Inputs: Ka, Kd, Cs, Os. Output: Ci.]
Issues: summary
CPU code generation (perhaps JIT). Vectorization. GPU code generation. Multi-pass partitioning.
Introduction to LLVM
Mid-level intermediate representation (IR). High-level types: structs, arrays, vectors, functions. Control-flow graph: basic blocks with branches. Many modular analysis and optimization passes. Code generation for x86, x64, ARM, ... Just-in-time (JIT) compiler too.
Advantages of LLVM
Well-designed intermediate representation (IR). Wide range of optimizations (configurable). JIT code generation. Interoperability.
Interoperability
Shaders can call out to the renderer via the C ABI. We can inline library code into compiled shaders. Compile C++ to LLVM IR with Clang. This greatly simplifies code generation.
Weaknesses of LLVM
No automatic vectorization. Poor support for vector-oriented code generation. No predication. Few vector instructions; must resort to SSE/AVX intrinsics.
LLVM resources
www.llvm.org/docs: Language Reference Manual, Getting Started Guide, LLVM Tutorial (section 3).
Relevant open source projects: ispc.github.com, github.com/MarkLeone/PostHaste
Questions?
Mark Leone mleone@wetafx.co.nz