
A Hardware Pipeline for Accelerating Ray Traversal Algorithms on Streaming Processors

Introduction
- Ray tracing
- Ray tracing algorithms
- Ray traversal hardware pipeline
- Streaming processors
- GPGPU
- Performance degradation of 1.5X-2.5X
Roll No: 7, MTech CSIS, FISAT, January 11

Introduction
Two-stage traversal process:
1. Hardware implementation
2. User-defined algorithm


Introduction
Performance simulator created:
- Models a streaming processor architecture
- Uses a kd-tree as the software traversal algorithm

- Software traversal steps reduced by 32X
- Instructions executed reduced by 2.15X



Previous Work
Acceleration Data Structures
- Hierarchical space subdivision schemes
- Bounding volume hierarchies
- GPU implementations
- Vector operations

Graphics Hardware
- Large programmable multi-core architectures
- Graphics computations in parallel
- Multiple threads on each processor
- Software kernels
- Vector operations and vectorized processors

Pipeline Traversal Algorithm


Group Uniform Grid (GrUG)
- Axis-aligned subdivision of space
- Two hierarchical layers

[Figure: top layer and lower layer of the GrUG hierarchy]


Grid Concepts
Hierarchical Bounding Volume


Grid Concepts
Spatial Subdivisions


Stepping Between Neighbours


The 3D DDA (digital differential analyzer) method is used, tracking per-axis tmax, delta and step values.
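A minimal Python sketch of the 3D DDA stepping described above, in the style of the classic Amanatides and Woo voxel traversal. It assumes a cubic grid of `n` cells per axis with minimum corner `grid_min` and cell edge `cell_size`, and a ray origin inside the grid; the function name and parameters are illustrative, not taken from the GrUG hardware. The lists `t_max`, `t_delta` and `step` play the roles of the tmax, delta and step values named on the slide.

```python
import math


def dda_cells(origin, direction, grid_min, cell_size, n, max_steps=10_000):
    """Return the grid cells visited by a ray, in order, using a 3D DDA.

    Assumes a cubic grid of n x n x n cells starting at grid_min, each with
    edge length cell_size, and a ray origin that lies inside the grid.
    """
    # Cell containing the origin, clamped in case it sits on the far boundary.
    cell = [min(max(int((origin[a] - grid_min[a]) // cell_size), 0), n - 1)
            for a in range(3)]
    step = [0, 0, 0]          # +1 / -1: direction to move along each axis
    t_max = [math.inf] * 3    # ray parameter of the next boundary crossing
    t_delta = [math.inf] * 3  # ray parameter needed to cross one whole cell
    for a in range(3):
        d = direction[a]
        if d > 0:
            step[a] = 1
            boundary = grid_min[a] + (cell[a] + 1) * cell_size
        elif d < 0:
            step[a] = -1
            boundary = grid_min[a] + cell[a] * cell_size
        else:
            continue  # the ray never crosses planes perpendicular to this axis
        t_max[a] = (boundary - origin[a]) / d
        t_delta[a] = cell_size / abs(d)

    visited = []
    for _ in range(max_steps):
        visited.append(tuple(cell))
        # Step across whichever cell boundary the ray reaches first.
        axis = min(range(3), key=lambda a: t_max[a])
        if t_max[axis] == math.inf:
            break  # zero direction vector: nothing more to visit
        cell[axis] += step[axis]
        if not 0 <= cell[axis] < n:
            break  # the ray has left the grid
        t_max[axis] += t_delta[axis]
    return visited
```

For example, `dda_cells((0.5, 0.5, 0.5), (1.0, 1.0, 0.0), (0.0, 0.0, 0.0), 1.0, 4)` alternates x and y steps from (0, 0, 0) to (3, 3, 0); when two axes tie, this sketch steps along the lower axis index first.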


Ray projection from the original GrUG grouping in A to the next GrUG grouping in B. To compute the next point along the ray for the hash function, the ray is projected by its tmin value.
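A small sketch of that projection step, assuming the ray is stored as an origin and a direction vector, so the point handed to the hash function is origin + tmin * direction. The optional `nudge` parameter is a hypothetical addition, not something the slides describe; it only shows one way an implementation might push the sample point just past a boundary shared by two groupings.

```python
def project_ray(origin, direction, t_min, nudge=0.0):
    """Return the point on the ray at parameter t_min (+ optional nudge).

    The result is the (x, y, z) location that would be fed to the GrUG
    hash function to find the grouping the ray enters next.
    """
    t = t_min + nudge
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical layout: grouping A ends and grouping B begins at x = -0.5.
# A ray leaving A at t = 0.25 lands exactly on that shared boundary; a tiny
# nudge places the sample point inside B instead.
print(project_ray((-0.75, 0.0, 0.0), (1.0, 0.0, 0.0), 0.25))  # (-0.5, 0.0, 0.0)
print(project_ray((-0.75, 0.0, 0.0), (1.0, 0.0, 0.0), 0.25, 1e-6))
```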


KD-Tree
[Figure: kd-tree with split planes X, Y, Z and leaves A, B, C, D; the ray's tmin and tmax are marked]


KD-Tree Traversal
[Figure: ray traversal of the kd-tree with split planes X, Y, Z and leaves A, B, C, D]


Observation
[Figure: kd-tree with split planes X, Y, Z and leaves A, B, C, D]

Current leaf's tmax = next leaf's tmin
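A minimal recursive front-to-back kd-tree traversal sketch that records each visited leaf with its [tmin, tmax] interval, which makes the observation above easy to check: each leaf's tmax is the next leaf's tmin. The `Node` layout and the example tree are hypothetical, loosely mirroring the split planes X, Y, Z and leaves A to D on the slides, and the example assumes the scene box is [-1, 1]^3. The near/far decision follows the common textbook scheme, not necessarily the Radius CUDA code used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    leaf: Optional[str] = None     # leaf label; None for inner nodes
    axis: int = 0                  # split axis: 0 = x, 1 = y, 2 = z
    split: float = 0.0             # position of the split plane
    left: Optional["Node"] = None  # child on the low side of the plane
    right: Optional["Node"] = None # child on the high side of the plane


def traverse_kd(node, origin, direction, t_min, t_max):
    """Front-to-back kd-tree traversal; returns (leaf, tmin, tmax) in ray order."""
    out: List[Tuple[str, float, float]] = []

    def visit(n, t0, t1):
        if n.leaf is not None:
            out.append((n.leaf, t0, t1))
            return
        o, d = origin[n.axis], direction[n.axis]
        # The near child is the one on the same side of the plane as the origin.
        if o < n.split or (o == n.split and d <= 0):
            near, far = n.left, n.right
        else:
            near, far = n.right, n.left
        if d == 0:
            visit(near, t0, t1)  # ray parallel to the plane never crosses it
            return
        t_split = (n.split - o) / d
        if t_split >= t1 or t_split <= 0:
            visit(near, t0, t1)       # plane is past t1 or behind the origin
        elif t_split <= t0:
            visit(far, t0, t1)        # plane was already crossed before t0
        else:
            visit(near, t0, t_split)  # near part of the interval first,
            visit(far, t_split, t1)   # then the far part

    visit(node, t_min, t_max)
    return out


# Hypothetical tree: plane X (x = 0) at the root, plane Y (y = 0) splitting the
# left half into A | B and plane Z (z = 0) splitting the right half into C | D.
root = Node(axis=0, split=0.0,
            left=Node(axis=1, split=0.0, left=Node("A"), right=Node("B")),
            right=Node(axis=2, split=0.0, left=Node("C"), right=Node("D")))

leaves = traverse_kd(root, (-1.0, -0.5, -0.5), (1.0, 1.0, 1.0), 0.0, 1.5)
print(leaves)  # [('A', 0.0, 0.5), ('B', 0.5, 1.0), ('D', 1.0, 1.5)]
```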

Overview of GrUG
Two spatial separation methods:
- Uniform grid
- GrUG groups

Traversal of GrUG Hash Table
- Performs two mappings
- Input: ray location
- Output: memory address of the GrUG group

Hash function that starts with the X, Y, Z coordinates and outputs the memory address of a GrUG grouping, which can be passed to a software traversal algorithm.
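A sketch of the two mappings in software, assuming ray locations in [-1, 1]^3 and an n x n x n top-level grid. The flat `hash_table` of group indices, `base_addr` and `group_bytes` are hypothetical stand-ins for however the hardware actually lays out its table and group records. For a power-of-two `n`, the linear cell index computed here equals the bit concatenation of the three per-axis indices.

```python
def grug_lookup(point, n, hash_table, base_addr, group_bytes):
    """Map a ray location to the memory address of its GrUG group.

    Mapping 1: (x, y, z) in [-1, 1]^3 -> cell ID in an n x n x n grid.
    Mapping 2: cell ID -> group index (hash table) -> memory address.
    """
    idx = []
    for c in point:
        i = int((c + 1.0) * 0.5 * n)       # scale [-1, 1] onto [0, n]
        idx.append(min(max(i, 0), n - 1))  # clamp, so c = 1.0 maps to n - 1
    cell_id = (idx[0] * n + idx[1]) * n + idx[2]
    group_index = hash_table[cell_id]      # several cells may share one group
    return base_addr + group_index * group_bytes


# Hypothetical 2 x 2 x 2 grid: the x < 0 half maps to group 5, x >= 0 to group 7.
table = [5, 5, 5, 5, 7, 7, 7, 7]
print(hex(grug_lookup((0.5, -0.5, 0.5), 2, table, 0x1000, 64)))  # 0x11c0
```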


Hash function implementation


- The values for the three axes are concatenated to form the CellID
- Allows the axes to be processed in parallel
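A sketch of the concatenation, assuming 9 bits per axis (the 512-cell maximum given for the hash function) and an x | y | z field order, which is our assumption. Because each axis owns its own bit field, the three per-axis results can be computed independently and simply placed side by side, which is why the axes can be processed in parallel.

```python
BITS = 9                  # 9 bits per axis -> up to 512 cells per axis
MASK = (1 << BITS) - 1


def make_cell_id(ix, iy, iz):
    """Concatenate three 9-bit axis indices into one 27-bit CellID: x | y | z."""
    return ((ix & MASK) << (2 * BITS)) | ((iy & MASK) << BITS) | (iz & MASK)


def split_cell_id(cell_id):
    """Recover the three axis indices from a CellID."""
    return (cell_id >> (2 * BITS)) & MASK, (cell_id >> BITS) & MASK, cell_id & MASK


cid = make_cell_id(1, 2, 3)
print(cid, split_cell_id(cid))  # 263171 (1, 2, 3)
```

With 9-bit fields the concatenated value is the same number as the row-major index (ix * 512 + iy) * 512 + iz, so no multiplier is needed to form it.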


Hash Function Implementation


Architecture of Group Uniform Grid


Data Structure Creation


Two memory spaces:
- Hash table
- User-defined tree data structure

- Starts at the GrUG groupings
- A kd-tree is used
- Uniform grid structure
- Only leaf nodes need to be present in memory

Pipeline Architecture
- Standalone processing block inside the processor
- Fixed hardware
- Memory address registers
- Ray projection
- Ray undergoes GrUG traversal
- Bounding box of the GrUG group is read
- tmax value is computed


Pipeline architecture
- Rays per clock cycle
- Pipeline stages can be vectorized
- Ideal for streaming processors


Integration of the GrUG pipeline into a multi-core graphics processor and the fixed hardware stages for the GrUG pipeline.


Hash Function
- Determines the grid cell of a ray
- Maps the grid cell ID to a memory address
- Locates the root node for software traversal
- Input: ray location (x, y, z)
- Output: a 9-bit value from each per-axis hash function pipeline
- Maximum supported grid size: 512 X 512 X 512
- Input coordinates are floating-point values from -1.0 to 1.0
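A software stand-in for one per-axis hash pipeline, assuming a uniform mapping of the coordinate range [-1.0, 1.0] onto 512 cells so the result fits in 9 bits. The hardware may compute the same index in a cheaper way; plain floating-point arithmetic is used here only to show the mapping itself.

```python
import math

GRID = 512  # cells per axis; indices fit in 9 bits


def axis_index(coord, grid=GRID):
    """Quantize one coordinate in [-1.0, 1.0] to a cell index in [0, grid - 1]."""
    i = math.floor((coord + 1.0) * (grid / 2.0))  # map [-1, 1] onto [0, grid]
    return min(max(i, 0), grid - 1)               # clamp; 1.0 falls in the last cell


print([axis_index(c) for c in (-1.0, -0.001, 0.0, 0.5, 1.0)])  # [0, 255, 256, 384, 511]
```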

Architecture of the GrUG hash function for one axis, using a grid of 512 cells per axis


Implementation
Simulator
- GPGPU-Sim simulator
- PTX assembly files generated by the NVIDIA NVCC compiler
- PTX assembly code modified


Implementation
Kernel Code
- Ray generation
- Post-GrUG traversal operation
  - Read the selected GrUG grouping's bounding box
  - Compute the ray's tmax value
- Kd-tree algorithm
  - Radius CUDA
- Ray-triangle intersection
  - Wald's algorithm
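The slides name Wald's algorithm for the ray-triangle test but do not detail it. As a clearly labeled stand-in, here is a sketch of a different, widely used test, the Möller-Trumbore algorithm; it is not Wald's method, but it answers the same question (hit distance t, or a miss) for a single ray and triangle.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return the hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                    # ray parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det    # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det           # distance along the ray
    return t if t > eps else None      # ignore hits behind the origin


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))  # 1.0
print(ray_triangle((0.75, 0.75, 1.0), (0.0, 0.0, -1.0), *tri))  # None
```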

Kernel Code


Benchmark Scenes
- 8 scenes
- Resolution 512 X 512


Results
a) Performance
[Figure: relative speedup over brute-force intersection for the Box, Bunny, Robots and Kitchen scenes]

Performance Results
- Reduced the number of tree traversal steps by 32.5X for visible rays
- Overall speedup: average 1.6X for visible rays
- For a grid size of 128, performance improves over the software implementation by 1.9X, compared to 2.15X for a grid size of 512

Conference benchmark scene at resolution 128

Results
b) Memory


Memory Requirements
Overhead of storing the hash table in memory:
- 4 bytes per grid cell -> up to 4,294,967,296 GrUG groups
  - 512 MB hash table
- 2 bytes per grid cell -> up to 65,536 GrUG groups
  - 256 MB hash table
- Smaller grid size -> hash table of up to 4 MB
- Grid size 128 -> 1.5 times the memory of the kd-tree
- Grid size 512 -> 27.6 times the memory of the kd-tree
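The hash-table sizes above follow from simple arithmetic if one assumes (our reading, not stated on the slide) that the 512 MB and 256 MB figures refer to a 512 X 512 X 512 grid and the 4 MB figure to a 128 X 128 X 128 grid with 2-byte entries. An entry of b bytes can distinguish 2^(8b) groups.

```python
MB = 1024 ** 2


def hash_table_bytes(grid, bytes_per_cell):
    """Memory for a hash table with one entry per cell of a grid^3 uniform grid."""
    return grid ** 3 * bytes_per_cell


def max_groups(bytes_per_cell):
    """Number of distinct GrUG groups an entry of this width can reference."""
    return 2 ** (8 * bytes_per_cell)


for grid, width in [(512, 4), (512, 2), (128, 2)]:
    print(f"{grid}^3 grid, {width}-byte entries: "
          f"{hash_table_bytes(grid, width) // MB} MB, up to {max_groups(width):,} groups")
```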

Memory Requirements
- Smaller grid sizes are more efficient
  - Balance between performance and memory
- Stores the kd-tree structure and the bounding dimensions of threshold nodes
- Similar memory requirement to storing a full kd-tree

Results
c) Bandwidth


Bandwidth requirements
- Average memory bandwidth per frame is smaller
- Fewer down-tree traversals -> fewer device memory transactions
- Bandwidth is used for the post-GrUG software traversal
- GrUG memory bandwidth + down-tree traversal bandwidth < down-tree traversal bandwidth of a full software implementation

Advantages
- Maintains user programmability
- Increases ray tracing performance
- Diverse implementation scope


Conclusion
- New graphics hardware architecture
- Small fixed hardware pipeline
- Offloads part of the acceleration structure traversal computations
- Diverse implementation scope across processor architectures
- User programmability
- Improved overall run-time performance

Future Work



Thank you

