A Hardware Pipeline For Accelerating Ray Traversal Algorithms

A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors
Introduction
Ray tracing
Ray tracing algorithms
Ray traversal hardware pipeline
Streaming processors
GPGPU
Performance degradation of 1.5X-2.5X
Roll No:7 Mtech CSIS FISAT January 11
2-stage traversal process
1. Hardware implementation
2. User-defined algorithm
Performance simulator created
Streaming processor architecture
Kd-tree as the software traversal algorithm
Software traversal steps reduced by 32X
Instructions executed reduced by 2.15X

Previous Work
Accelerated Data Structures
Hierarchical Space Subdivision Schemes
Bounding Volume Hierarchies
GPU implementations
Vector operations
Large programmable multi-core architectures
Graphics computations in parallel
Multiple threads on each processor
Software kernels
Vector operations and vectorized processors
Graphics Hardware
Pipeline Traversal Algorithm

Group Uniform Grid (GrUG)
Axis-aligned subdivision of space
Two hierarchical layers
Top Layer
Lower Layer
Grid Concepts
Hierarchical Bounding Volume
Grid Concepts
Spatial Subdivisions
Stepping Between Neighbours

The DDA method is used, with tmax, delta and step values
Ray projection from original GrUG grouping in A to next GrUG grouping in B. To compute the next point along the ray for the hash function, the ray is projected by the tmin value.
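The stepping between neighbours described above can be sketched in a few lines. This is a minimal Python sketch of a 3D DDA grid walk, assuming a cubic grid of unit cells with one corner at the origin; the function name dda_cells and its parameters are illustrative and do not come from the slides. For each axis it keeps tmax (the ray parameter at the next cell boundary), delta (the parameter increment needed to cross one cell) and step (+1, -1 or 0).

```python
import math


def dda_cells(origin, direction, grid_size, cell_size=1.0):
    """Yield the grid cells visited by a ray, in order (3D DDA sketch).

    Assumption: the grid spans [0, grid_size * cell_size) on every axis.
    """
    cell = [int(math.floor(o / cell_size)) for o in origin]
    step, t_max, delta = [], [], []
    for axis in range(3):
        d = direction[axis]
        if d > 0:
            step.append(1)
            boundary = (cell[axis] + 1) * cell_size
            t_max.append((boundary - origin[axis]) / d)
            delta.append(cell_size / d)
        elif d < 0:
            step.append(-1)
            boundary = cell[axis] * cell_size
            t_max.append((boundary - origin[axis]) / d)
            delta.append(-cell_size / d)
        else:
            # The ray never crosses a boundary on this axis.
            step.append(0)
            t_max.append(math.inf)
            delta.append(math.inf)

    while all(0 <= cell[a] < grid_size for a in range(3)):
        yield tuple(cell)
        # Advance along the axis whose next boundary is closest.
        axis = t_max.index(min(t_max))
        if t_max[axis] == math.inf:
            break  # zero direction vector: the ray never leaves this cell
        cell[axis] += step[axis]
        t_max[axis] += delta[axis]
```

For example, list(dda_cells((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 4)) walks the four cells along the x axis.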
KD-Tree
[Figure: KD-tree example with splitting planes X, Y, Z and leaf nodes A, B, C, D; the ray segment is bounded by tmin and tmax]
KD-Tree Traversal
[Figure: KD-tree traversal over splitting planes X, Y, Z and leaf nodes A, B, C, D]
Observation
[Figure: KD-tree with splitting planes X, Y, Z and leaf nodes A, B, C, D]
Current leaf's tmax = next leaf's tmin
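The observation above is what makes a restart-style kd-tree walk possible: once a leaf has been processed, its tmax becomes the tmin of the next descent from the root. The following Python sketch illustrates this under stated assumptions. The dictionary node layout (keys axis, split, left, right, leaf) and the function name leaves_along_ray are hypothetical, and the code is not the traversal kernel used in the presentation.

```python
def leaves_along_ray(root, origin, direction, t_min, scene_t_max):
    """Return the names of the leaves a ray visits, front to back.

    Restart-style sketch: every descent starts at the root, and the
    tmax of the leaf just visited becomes the tmin of the next descent.
    """
    visited = []
    while t_min < scene_t_max:
        node, t_max = root, scene_t_max
        while "leaf" not in node:
            axis, split = node["axis"], node["split"]
            o, d = origin[axis], direction[axis]
            # The near child is the one on the ray origin's side of the plane.
            below_first = o < split or (o == split and d <= 0)
            if below_first:
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            if d == 0.0:
                node = near  # ray is parallel to the splitting plane
                continue
            t_split = (split - o) / d
            if t_split >= t_max or t_split <= 0.0:
                node = near
            elif t_split <= t_min:
                node = far
            else:
                node, t_max = near, t_split
        visited.append(node["leaf"])
        t_min = t_max  # current leaf's tmax = next leaf's tmin
    return visited


# Hypothetical tree: four slabs A, B, C, D along the x axis.
EXAMPLE_TREE = {
    "axis": 0, "split": 2.0,
    "left": {"axis": 0, "split": 1.0, "left": {"leaf": "A"}, "right": {"leaf": "B"}},
    "right": {"axis": 0, "split": 3.0, "left": {"leaf": "C"}, "right": {"leaf": "D"}},
}
```

Because the descent restarts from the root for every leaf, no traversal stack is needed, at the cost of repeating the upper tree levels on each restart.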
Overview of GrUG
2 spatial separation methods
Uniform Grid
GrUG groups
Traversal of GrUG Hash Table

Performs 2 mappings
Input: ray location
Output: memory address of GrUG group
Hash function starting with X,Y,Z coordinates and outputting the memory address of a GrUG grouping that can be passed to a software traversal algorithm.
Hash function implementation

3 axes concatenated to form CellID
Allows parallel processing
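The concatenation step can be illustrated with a short Python sketch. It assumes 9 bits per axis (a 512 grid) and places x in the most significant bits; the bit order, the function names make_cell_id and lookup_group_address, and the use of a dictionary in place of the hardware hash table are all assumptions made for illustration. Since each axis index is computed independently, the three per-axis results can be produced in parallel and only joined at the end.

```python
BITS_PER_AXIS = 9  # assumption: 512 cells per axis


def make_cell_id(ix, iy, iz, bits=BITS_PER_AXIS):
    """Concatenate three per-axis cell indices into one integer CellID."""
    limit = 1 << bits
    for index in (ix, iy, iz):
        if not 0 <= index < limit:
            raise ValueError(f"cell index {index} does not fit in {bits} bits")
    # Assumed layout: [ x bits | y bits | z bits ]
    return (ix << (2 * bits)) | (iy << bits) | iz


def lookup_group_address(hash_table, ix, iy, iz):
    """Second mapping: CellID -> memory address of the GrUG group."""
    return hash_table[make_cell_id(ix, iy, iz)]
```

With a table such as {make_cell_id(1, 2, 3): 0x1000}, the call lookup_group_address(table, 1, 2, 3) returns the stored address.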
Hash Function Implementation
Architecture of Group Uniform Grid
Data Structure Creation

2 memory spaces
Hash table
User-defined tree data structure
Starts at GrUG groupings
Kd-tree is used
Uniform grid structure
Only leaf nodes need to be present in memory
Pipeline Architecture
Standalone processing block inside the processor
Fixed hardware
Memory address registers
Ray projection
Ray undergoes GrUG traversal
Read bounding box of the GrUG group
tmax value is computed
Pipeline architecture
Rays per clock cycle
Pipeline stages can be vectorized
Ideal for streaming processors
Integration of the GrUG pipeline into a multi-core graphics processor and the fixed hardware stages for the GrUG pipeline.
Hash Function
Determine the grid cell of a ray
Grid cell ID to memory address
Locate the root node for software traversal
Input: ray location (x, y, z)
Output: 9-bit value from each hash function pipeline
Maximum supported grid size: 512 x 512 x 512
Floating-point values from -1.0 to 1.0
Architecture of GrUG hash function for one axis using a 512 grid
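As a rough software analogue of the per-axis hash function, the sketch below maps a coordinate in [-1.0, 1.0] to a 9-bit cell index for a 512 grid. It uses plain arithmetic and is not a model of the hardware datapath shown in the figure; the function name axis_to_cell_index and the clamping of the upper boundary are assumptions.

```python
import math


def axis_to_cell_index(value, grid_size=512):
    """Map a coordinate in [-1.0, 1.0] to a cell index in [0, grid_size - 1]."""
    if not -1.0 <= value <= 1.0:
        raise ValueError("coordinate must lie in [-1.0, 1.0]")
    # Shift to [0, 2], scale to [0, grid_size], then truncate.
    index = int(math.floor((value + 1.0) * 0.5 * grid_size))
    # Assumption: a point exactly on the upper boundary belongs to the last cell.
    return min(index, grid_size - 1)
```

Running this once per axis gives the three 9-bit values that are then combined into a cell ID.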
Implementation
Simulator
GPGPU-Sim simulator
PTX assembly files generated by the NVIDIA NVCC compiler
PTX assembly code modification
Implementation
Kernel Code
Ray generation
Post-GrUG traversal operation
Read the selected GrUG grouping's bounding box
Compute the ray's tmax value
Kd tree algorithm
Radius CUDA
Ray-triangle intersection
Wald's algorithm
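The post-GrUG step listed above (read the group's bounding box, then compute the ray's tmax) can be sketched with the standard slab test. This Python version is only an illustration of the arithmetic, not the CUDA kernel from the presentation; the function name ray_box_exit_t is hypothetical, and the sketch assumes the ray actually passes through the axis-aligned box.

```python
import math


def ray_box_exit_t(origin, direction, box_min, box_max):
    """Return the ray parameter at which the ray leaves an axis-aligned box.

    Assumption: the ray intersects the box, so the exit parameter is the
    smallest of the per-axis far-plane distances.
    """
    t_exit = math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d > 0:
            t_exit = min(t_exit, (hi - o) / d)  # leaves through the max plane
        elif d < 0:
            t_exit = min(t_exit, (lo - o) / d)  # leaves through the min plane
        # d == 0: the ray never crosses either plane on this axis
    return t_exit
```

The returned value bounds the software traversal inside the group, and it can serve as the tmin used to project the ray into the next group.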
Kernel Code
Benchmark Scenes
8 scenes
Resolution: 512 x 512
Results a) Performance
[Chart: relative speedup over brute-force intersection; recoverable labels: Box, Bunny, Robots, Kitchen; recoverable value: 12.9]
Performance Results
Reduced the number of tree traversal steps by 32.5X for visible rays
Overall speedup: 1.6X on average for visible rays
Performance for a grid size of 128 is improved over the software implementation by 1.9X, compared to 2.15X for a grid size of 512
Conference benchmark scene at resolution 128
Results
b) Memory
Memory Requirements
Overhead of storing the hash table in memory
4 bytes / grid cell -> 4,294,967,296 GrUG groups
512 MB hash table
2 bytes / grid cell -> 65,536 GrUG groups
256 MB hash table
Smaller grid size -> up to 4 MB hash table
128 grid size -> 1.5 times the memory of the kd-tree
512 grid size -> 27.6 times the memory of the kd-tree
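The hash table sizes quoted above follow from one entry per grid cell. The short Python check below reproduces them; the helper names are illustrative, and pairing the 4 MB figure with a 128 grid and 2-byte entries is an inference from the arithmetic rather than something the slides state.

```python
MIB = 1024 ** 2


def hash_table_bytes(grid_size, bytes_per_cell):
    """Hash table size: one entry per cell of a cubic grid."""
    return grid_size ** 3 * bytes_per_cell


def addressable_groups(bytes_per_cell):
    """Number of distinct GrUG groups an entry of this width can name."""
    return 2 ** (8 * bytes_per_cell)


if __name__ == "__main__":
    for grid, width in ((512, 4), (512, 2), (128, 2)):
        size_mib = hash_table_bytes(grid, width) // MIB
        print(f"grid {grid}, {width} B/cell: {size_mib} MB, "
              f"{addressable_groups(width):,} groups")
```

A 512 grid has 512^3 = 134,217,728 cells, so 4-byte entries give 512 MB and 2-byte entries give 256 MB, matching the figures listed.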
Memory Requirements
Smaller grid sizes are more efficient
Balance between performance and memory
Stores the kd-tree structure and the bounding dimensions of threshold nodes
Similar memory requirement to storing a full kd-tree
Results
c) Bandwidth
Bandwidth requirements
Average memory bandwidth per frame is smaller
Fewer down-tree traversals -> fewer device memory transactions
Bandwidth is used for post-GrUG software traversal
GrUG memory bandwidth + down-tree traversal < down-tree traversals by a full software implementation
Advantages
Maintains user programmability
Increases ray tracing performance
Diverse implementation scope
Conclusion
New graphics hardware architecture
Small fixed hardware pipeline
Offloads part of the acceleration traversal computations
Diverse implementation scope of processor architecture
User programmability
Overall run-time performance
Future Work
References
[1] Algorithm for 3D digital differential algorithm. CG351-551 Raytracing Algorithm for 3DDDA.htm
[2] Introduction to GRIDS. flipcode - Raytracing Topics & Techniques.mht
[3] Tim Foley and Jeremy Sugerman. KD-Tree Acceleration Structures for a GPU Raytracer. Stanford University.
[4] Michael Steffen and Joseph Zambreno. Design and Evaluation of a Hardware Accelerated Ray Tracing Data Structure. Department of Electrical and Computer Engineering, Iowa State University, USA.
[5] Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong and Tor M. Aamodt. Analyzing CUDA Workloads Using a Detailed GPU Simulator. University of British Columbia, Vancouver, BC, Canada. {bakhoda,gyuan,wwlfung,henryw,aamodt}@ece.ubc.ca
[6] Martin Zlatuška. Ray Tracing on a GPU with CUDA: Comparative Study of Three Algorithms. Czech Technical University in Prague, Faculty of Electrical Engineering, Czech Republic. zlatum1{@}fel.cvut.cz
[7] Wikipedia. Ray Tracing basics.
Thank you